Reproducible Data Science with Docker Containers

Container Camp SF 2016 - Ben Hamner - CTO of Kaggle

One of the biggest pain points in data science today is that a data scientist's work isn't reproducible. This makes it hard for data scientists to come back to their own work six months down the road, and even harder for a colleague to build on analysis that has already been done. Docker containers enable a simple solution to this. At Kaggle, we maintain the public Docker containers kaggle/python, kaggle/rstats, and kaggle/julia, designed to make it easy for data scientists to get started on a new analytics task and to build on work that they or others have already done. In this talk, I'll cover how we're using Docker at Kaggle for reproducible data science, how our community has found it valuable, and how you can leverage this in your own workflows.
Length: 23:26
Views: 818, Likes: 15
Recorded on 2016-04-15 at Container Camp USA