2. Embrace DevOps
Much has been written about DevOps and its ability to speed up time-to-value and innovation, and machine learning is no exception. New approaches and algorithms, deep learning among them, are coming out all the time, and data scientists increasingly try them out in code rather than through GUI-based interfaces. After a new approach has been tested in a sandbox environment with limited scope, it’s time to move it through development, QA, and finally production. Each of these environments can be provisioned and deployed to automatically using DevOps tools like Jenkins, Puppet, Chef, Ansible, and Docker. ML, like any type of analytics, can be a one-off exercise done to make a quick decision; however, companies should think of ML models as software products: something to be deployed and maintained with quick development cycles using agile and continuous integration methodologies.
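To make the "ML as software" idea concrete, here is a minimal sketch of an automated quality check that a continuous integration job could run on every commit before a model is promoted to the next environment. Everything in it is an assumption made for illustration: the toy one-feature classifier, the function names, the 0.8 accuracy threshold, and the hand-written data points. None of it is tied to a specific CI tool.

```python
# Hypothetical quality gate: train a toy model and check it on held-out data.
# The data, names, and threshold below are illustrative assumptions.

def train_threshold_model(samples):
    """Fit a one-feature classifier: the cut-off is the midpoint of the class means."""
    positives = [x for x, label in samples if label == 1]
    negatives = [x for x, label in samples if label == 0]
    mean_pos = sum(positives) / len(positives)
    mean_neg = sum(negatives) / len(negatives)
    return (mean_pos + mean_neg) / 2


def accuracy(threshold, samples):
    """Share of samples where 'x >= threshold' agrees with the label."""
    correct = sum(1 for x, label in samples if (x >= threshold) == (label == 1))
    return correct / len(samples)


def quality_gate(train, holdout, min_accuracy=0.8):
    """Return (passed, score); a CI job would fail the build when passed is False."""
    threshold = train_threshold_model(train)
    score = accuracy(threshold, holdout)
    return score >= min_accuracy, score


# Made-up (feature, label) pairs, used only to exercise the gate.
TRAIN = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
HOLDOUT = [(1.5, 0), (2.5, 0), (4.0, 0), (6.5, 1), (8.5, 1)]

passed, score = quality_gate(TRAIN, HOLDOUT)
print(f"holdout accuracy={score:.2f} passed={passed}")
```

In a real pipeline the same pattern would wrap the team's actual training code, and the build step would turn a failed gate into a non-zero exit status so the model never reaches QA or production.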
Code in languages like R and Python can be versioned with Git-based tools, just like any other software. Typically, several data scientists work on the same model by branching to try out different input variables and training data sets, and to tune the model’s parameters in different ways. When one scientist’s branch achieves high predictive accuracy, they open a “pull request” to merge that code back into the main branch. In this way, machine learning models can be thought of as “features” in a product-driven world.
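The branching workflow described above could be supported by a small helper that decides which experiment branch, if any, deserves a pull request. This is only a sketch under stated assumptions: the `Experiment` record, the branch names, the feature lists, the accuracy figures, and the `min_gain` margin are all hypothetical, and in practice the scores would come from the team's own evaluation runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    """One model variant, as it might be recorded on a Git branch (hypothetical)."""
    branch: str
    features: tuple
    params: dict
    holdout_accuracy: float


def best_candidate(baseline, experiments, min_gain=0.01):
    """Return the experiment worth a pull request, or None.

    A candidate qualifies only if it beats the baseline on the main branch
    by at least min_gain, so that noise-level differences are not merged.
    """
    best = max(experiments, key=lambda e: e.holdout_accuracy, default=None)
    if best is None:
        return None
    if best.holdout_accuracy - baseline.holdout_accuracy < min_gain:
        return None
    return best


# Illustrative values only; these are not real results.
baseline = Experiment("main", ("age", "income"), {"depth": 3}, 0.80)
candidates = [
    Experiment("feature/add-tenure", ("age", "income", "tenure"), {"depth": 3}, 0.84),
    Experiment("feature/deeper-trees", ("age", "income"), {"depth": 6}, 0.805),
]

winner = best_candidate(baseline, candidates)
print(winner.branch if winner else "no branch beats main by the required margin")
```

Requiring a minimum gain is a design choice: it keeps the main branch stable when two variants differ by less than the evaluation can reliably measure.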
3. Use pre-made cloud environments
Machine learning development shouldn’t be done on local computers using locally stored data. Data scientists can be much more productive with a pre-made environment that has all the tools, packages, languages, and data sets ready for development, typically hosted in the cloud. These environments can be built and “rented” for periods of time, then shut down when they’re not in use. Even better, with proper imaging, virtualization, and snapshots, new data scientists who come on board can use an environment identical to everyone else’s instead of creating their own from scratch; the pre-made dev, QA, and production environments are already there. Firms like Domino have made a business of building such environment platforms.
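One way to confirm that an environment really "looks the same as others" is to compare its installed packages against a pinned specification. The sketch below uses Python's standard `importlib.metadata` module for the lookup. The `check_environment` helper and the `PINNED` package list are assumptions made for illustration; a real team would generate the list from its own environment image.

```python
from importlib import metadata


def check_environment(pinned, lookup=metadata.version):
    """Compare installed package versions against a pinned specification.

    Returns {package: (expected, found)} for every mismatch; found is None
    when the package is not installed. `lookup` is injectable for testing.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


# Hypothetical pinned spec; the packages and versions are illustrative.
PINNED = {"numpy": "1.26.4", "pandas": "2.2.2"}

for package, (expected, found) in check_environment(PINNED).items():
    print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

Running a check like this when an environment starts up gives an early warning that it has drifted from the shared image, before any model results are compared across machines.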