Everyone knows that creating data science/AI solutions is not easy—especially people who’ve tried to do it! But the process is getting easier as tools improve. Exhibit A: Emerging ML Ops tools from Google Cloud Platform.
These tools are approachable even for people without AI experience. Let's home in on one area of AI usage, ML Ops, and then look at some of the new tools that are reducing the barriers to entry.
Deploying and maintaining AI with ML Ops
Often, organizations that want to leverage AI don’t consider a plan for production at the outset. When there are only a handful of models running in production, a data science team can likely manage the data engineering and modeling processes manually. But as the number of models grows, you need a more structured approach.
It's considerably more challenging to build an integrated machine learning (ML) system and iterate on it, in both a safe development space and production, than it is to build a single ML model. ML Ops applies concepts from the DevOps discipline to unlock scale through automation, covering key steps of the data science process: data extraction, analysis, preparation, model training, model evaluation, validation, serving, and continuous monitoring.
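As a minimal sketch of the stages listed above, the steps can be chained so each one feeds the next, with validation gating deployment. All function names and data here are hypothetical placeholders for illustration, not a Google Cloud API:

```python
# Minimal sketch of the ML Ops stages described above, chained as a pipeline.
# All function names and data are hypothetical placeholders, not a real API.

def extract_data():
    # Stand-in for pulling rows from a data warehouse such as BigQuery.
    return [{"feature": x, "label": x % 2} for x in range(100)]

def prepare(rows):
    # Data preparation: here, just splitting features from labels.
    X = [r["feature"] for r in rows]
    y = [r["label"] for r in rows]
    return X, y

def train(X, y):
    # Trivial stand-in "model": always predict the majority label.
    majority = max(set(y), key=y.count)
    return lambda x: majority

def evaluate(model, X, y):
    # Model evaluation: fraction of correct predictions.
    correct = sum(model(x) == label for x, label in zip(X, y))
    return correct / len(y)

def run_pipeline(min_accuracy=0.4):
    # Each stage feeds the next; validation gates serving.
    rows = extract_data()
    X, y = prepare(rows)
    model = train(X, y)
    accuracy = evaluate(model, X, y)
    if accuracy < min_accuracy:
        raise ValueError(f"Model failed validation: accuracy={accuracy:.2f}")
    return model, accuracy
```

In a real system each of these functions would be a separately deployed, monitored component; the point is only that the stages form one automatable chain rather than a set of manual handoffs.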
At this point, clients often say, “The idea of an automated AI pipeline is great, but how do we get there?” To break things down a bit, it’s useful to think of AI system maturity along a spectrum. Google recommends assessing ML systems based on three levels of maturity:
Level 0: Manually driven processes.
Every step of the process is manual. There’s an explicit handoff of a prepared model from the data science team to DevOps to serve predictions. This process is easier to implement initially, particularly for smaller teams. But scale and speed become problems over time.
Level 1: Automated delivery pipelines.
Data and model evaluation steps are automated, and ETL jobs (extract, transform, load) are triggered by event-driven workflows. Metadata around models and underlying data assets is explicitly managed. The pipelines and triggers themselves are still deployed manually. However, the same pipelines used in model development can be deployed to production, which decreases time to market. Models are automatically retrained to address data drift.
Level 2: Pipelines fully automated via CI/CD (continuous integration / continuous delivery).
Data scientists can rapidly build, test and deploy new components to existing pipelines. Source control, testing, and monitoring are all automatic and respond to changes in a modular fashion.
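To make the Level 1 idea of drift-triggered retraining concrete, here is a toy sketch. The mean-shift check, threshold, and function names are illustrative assumptions of ours, not part of any Google Cloud SDK:

```python
# Toy sketch of Level 1-style automation: a drift check on incoming data
# decides whether the training pipeline should be re-triggered.
# The threshold and names are illustrative assumptions, not a real API.

from statistics import mean

def drift_detected(training_values, live_values, threshold=0.25):
    # Naive drift signal: relative shift in the feature mean versus the
    # distribution the model was trained on.
    baseline = mean(training_values)
    shift = abs(mean(live_values) - baseline)
    return shift / abs(baseline) > threshold

def maybe_retrain(training_values, live_values):
    # Event-driven workflow: retrain only when drift crosses the threshold.
    if drift_detected(training_values, live_values):
        return "retraining triggered"
    return "model unchanged"
```

A production system would use a statistically sound drift test and fire a pipeline run rather than return a string, but the control flow is the same: monitoring feeds an automated trigger instead of a human.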
No matter an organization's level of data maturity, the focus should first be on developing the fundamentals of thorough data analysis at a manageable scale. Once that basic engine is in place, additional automation and complexity can be introduced to accelerate delivery.
New Google Cloud AI products
Fortunately, Google Cloud provides a host of tools on their AI Platform to help with every part of the end-to-end AI lifecycle, from data warehousing with BigQuery to machine learning development with AI Platform. Perhaps the most impressive of these tools is the new AI Platform Pipelines solution. With this tool, Google is uniting all the complicated processes involved with AI development—from data extraction to model deployment.
Google is also folding its existing AutoML solution into AI Platform in a more democratized manner, so it fits into workflows more effectively. Contact Center AI and Document AI have counterparts in one form or another on the other cloud providers, so it's great to see Google offering managed solutions in both areas, as large enterprises are beginning to try AI-based customer service and document analytics.
A word about responsible AI
Two controversial issues have emerged in the field of AI over the past few years: data privacy and bias. Many companies cancel AI projects outright when they discover ethical problems in their processes. Of the two, data privacy concerns are the more likely to derail a project; AI bias is often fixable with the right process.
Any team building an AI solution that will affect customers should act immediately to ensure that consumers will be treated fairly and ethically. Google recommends three overarching priorities to reinforce ethical and responsible AI behavior: principles, programs, and tools. If someone raises their hand to say, "I think that could be irresponsible," it's everyone's duty to take those concerns into account and make sure there are safeguards in place to protect consumers.
It’s imperative to ensure an AI team is aware of problems like gender and racial misidentification, and is empowered to prioritize and solve them.
Ultimately not a tech problem
AI problems don’t get solved overnight. Solutions will be complicated. That’s why it’s essential to work with business and product teams to get the proper buy-in necessary to get a project done. Every AI project should have a steering committee of functional experts to ensure that data scientists are driving toward the right outcome for the business.
So remember: AI isn’t just a technology problem. To get real, tangible AI value out of data, it’s imperative to get alignment from the right people and ensure red tape is cleared to get access to the necessary data.