Datagrom | AI & Data Science Consulting


Docker for Data Scientists Made Simple: Why Pay for Data Science Software?


Want to deliver value with data science faster and at lower cost? You can, by applying lessons learned in software development to your data science projects. Try a Lean-Agile approach to data science with tools like Docker.

Here's what you need to know:

  1. Why agile data science

  2. Docker for data scientists: Benefits & Challenges

  3. Docker for data scientists made simple

Why Agile Data Science

We’ve all heard that up to 85% of data science projects fail. Perhaps it’s time to reexamine our approach. If only there were a more mature, similar-enough field we could learn something (or anything) useful from. The world of software development, perhaps?

Lean-Agile methodology replaced Waterfall in software development for good reasons. Notice whether any of the following sounds similar to how your organization approaches data science projects and data science technology.

Waterfall

  1. Spend months building product requirements that you anticipate will meet your internal or customer needs both now and in the future. 

  2. Invest significant resources to build (or buy) a big platform that meets your lengthy list of requirements.

  3. Launch and hope your customers (internal or external) flock to it, and you see a positive return on investment.

Lean-Agile

  1. Use the cheapest tools you have to build the simplest solution that you think may add value to your customers. Build and deliver it as quickly as possible.

  2. Gather customer feedback and build the next most-important project your customers tell you they need. Abandon failures quickly and cheaply. Build on successes.

  3. Continue to build in small increments in a tight feedback loop with your customers. You are well-positioned to adapt to the world as it changes.

Here's a thought. Don't waste too much time with up-front planning for your data science projects. Don't spend months planning which new expensive data science platform will meet your list of requirements that have probably already changed.

Instead, implement an appropriate governance framework, and empower your entire team to start experimenting with data science today with the cheapest and simplest tools available. The more experiments they run, the faster they will build something valuable.

Once you have something valuable, promote successful experiments to more costly production environments in the cloud.

Docker For Data Scientists


Docker for data scientists is a great, low-cost option for enabling widespread agile data science experimentation and collaboration across your organization. You can think of open-source Docker as a computer within a computer (a “container”) that bundles your code, data, and dependencies together. We can easily pass this Docker computer from one data scientist to the next without worrying about whether it will behave the same on a different machine or operating system.

For example, I can do all my data science work in a Jupyter notebook running inside a Docker container on my Mac. I can then share the entire project with you as a Docker container: the data, as well as the Python and machine learning libraries I used, are all included. Whether you have a Windows laptop or an Amazon EC2 Linux cloud instance, when you open the Docker computer, you will see everything installed and configured just as I had it.
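To make this concrete, here is a minimal sketch of what such a shareable environment might look like as a Dockerfile. The base image `jupyter/scipy-notebook` is a real community Jupyter image; the library versions and directory names are illustrative assumptions, not taken from the original project:

```dockerfile
# Start from the community Jupyter scientific-Python stack
FROM jupyter/scipy-notebook:latest

# Pin the extra machine learning libraries the project needs
RUN pip install --no-cache-dir scikit-learn==1.3.0 xgboost==1.7.6

# Copy notebooks and data into the image so collaborators get everything
COPY notebooks/ /home/jovyan/notebooks/
COPY data/ /home/jovyan/data/
```

Anyone could then run `docker build -t my-analysis .` followed by `docker run -p 8888:8888 my-analysis` and open the identical Jupyter environment, regardless of their host operating system.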

Benefits of Docker for Data Scientists

  1. Portability & reproducibility

  2. Low cost (it’s free)

Given the benefits, why haven't all organizations already adopted Docker for data scientists?

Challenges of Docker for Data Scientists

  1. You need to know Docker — The container must capture compute resources, code, data, tools, and packages. This is tough to get right.

  2. You need to know Git — As projects evolve, you need to track changes to code and data. No easy task.

Most data scientists don't have the software infrastructure experience it takes to set up open-source Docker and Git for collaborative data science projects at a large scale. And they shouldn't have to!
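To see why, here is a rough sketch of the manual workflow each collaborator would otherwise need to master (the image name, branch name, and paths are illustrative, not from any particular project):

```
# Build an image from the project's Dockerfile, then run it,
# mounting the working directory so edits survive the container
docker build -t churn-model .
docker run --rm -p 8888:8888 -v "$PWD":/home/jovyan/work churn-model

# Meanwhile, every change to code (and, awkwardly, data) must be
# versioned by hand and pushed for teammates to pull
git checkout -b feature/new-model
git add notebooks/ data/
git commit -m "Try gradient boosting on cleaned dataset"
git push origin feature/new-model
```

Large data files add yet another wrinkle, since plain Git handles them poorly and teams often reach for extra tooling such as Git LFS — one more thing to learn.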

Docker For Data Scientists Made Simple


Fortunately, there is now a solution made specifically for data scientists to take advantage of the reproducibility and low cost that Docker offers, without any prior knowledge of either Docker or Git!

Gigantum took open-source Docker and built all of the extra functionality around it that data scientists need. They created a seamless experience for data scientists (or anyone) to download, install, and open a fully configured environment for data science without writing a single line of code.

Ready to share your work and collaborate with others? Simply save your work to the GigantumHub, which will automate all of the code and data Git version control for you and make your project available to others. You can think of GigantumHub as GitHub for data science.

To put the cherry on top, Gigantum has open-sourced GigantumClient, which is the entire data science environment in a box, built on Docker. Whether you're a hardcore data scientist, casual observer, or Chief Data Officer, you should try it. Download Gigantum Desktop and work locally for free.

The setup experience is refreshingly simple. Sure the name might sound a little funny, but the tech is real, and it just plain works.


Stay Current in Data Science Technology

Subscribe to our free weekly newsletter.
