Reproducible Data Science with Docker Containers

Container Camp SF 2016 - https://container.camp
Ben Hamner - CTO of Kaggle

One of the biggest pain points in data science today is that a data scientist's work isn't reproducible. This makes it hard for a data scientist to come back to their own work six months down the road, and even harder for a colleague to build on analyses that have already been done. Docker containers offer a simple solution to this. At Kaggle, we maintain the kaggle/python, kaggle/rstats, and kaggle/julia public Docker images, designed to make it easy for data scientists to get started on a new analytics task and to build on work that they or others have already done. In this talk, I'll cover how we use Docker at Kaggle for reproducible data science, how our community has found it valuable, and how you can apply it in your own workflows.