How to get started with Microsoft Azure Machine Learning
Azure Machine Learning's algorithms help companies predict and plan for the future.
Kevin Feit and Oliver Asmus | December 16, 2015
Companies are accumulating data at an ever faster rate, and the need to translate that data into meaningful, actionable insights for key decision makers and stakeholders is growing with it. What’s affecting the bottom line? What’s the next trend-setting product? Where do our customers’ interests lie today, tomorrow, and beyond?
That demand has put increasing pressure on solution providers to deliver technology that addresses these questions, and Microsoft is responding to the call. Azure Machine Learning (Azure ML), one of Microsoft’s cloud-based solutions, helps companies unlock key insights from their available data resources.
Whereas traditional machine learning offerings focus on very rigid, tightly controlled processes and experiments, Azure ML’s key benefit is its highly flexible machine learning model architecture. This architecture allows developers to try any number of experimental options just by dragging and dropping different algorithms into the model, retraining that model, and then reviewing the results. These types of modern features are true game-changers for the machine learning industry, offering new predictive insights that can help businesses get ahead of the curve.
Designed for high-performance data mining, Azure ML excels at applying the latest in artificial intelligence algorithms to predict future outcomes. Since this service is based in Microsoft’s Azure cloud, any other Azure services can be used to complement the ML service. Once a model is built and trained, you can generate code to allow it to be called from your application with just a few clicks.
Setting up Azure ML
Setting up Azure ML is pretty simple. A Microsoft account is all that’s required to activate an Azure subscription and begin creating Azure ML models using various algorithms.
You have a couple of options for loading your data into Azure ML:
- If your data is already in the Azure cloud, you can connect to it directly from Azure ML.
- You can also extract your data into a flat file such as TXT or CSV and then upload it into Azure ML.
Deploying an Azure ML model
Once a data set has been established, it’s time to build the ML model and perform analysis on the data set. You can choose from a host of different types of algorithms to manipulate your data sets. All Azure ML algorithms are pre-built and designed to give developers the ability to quickly synthesize large data sets and derive meaningful results with little effort.
Linear regression
For example, suppose you wanted to analyze data on car-buying trends from the last five years and determine how those trends will change in the next five years. You also want to factor several variables into your experiment, including economic growth and demographics. In this case, you could use a statistical algorithm called linear regression to forecast car-buying trends from those variables in Azure ML. As one of the many Azure ML algorithms, linear regression models the relationship between an outcome and the variables that influence it.
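To make the idea concrete, here is a minimal sketch of a linear regression fit in plain Python. It is not what Azure ML runs internally; it uses a single predictor (the year) and made-up sales figures, whereas the scenario above would add further variables such as economic growth and demographics. The function names and data are hypothetical.

```python
# Minimal ordinary least squares fit for one predictor.
# All names and figures are hypothetical and for illustration only.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Hypothetical history: year index vs. cars sold (in thousands).
years = [1, 2, 3, 4, 5]
cars_sold = [100.0, 110.0, 120.0, 130.0, 140.0]

slope, intercept = fit_line(years, cars_sold)
forecast_year_6 = predict(slope, intercept, 6)
```

The fitted slope is the estimated change in sales per year, and `predict` extends the line into years that have not happened yet.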
Another popular algorithm included in Azure ML is k-means clustering. In k-means clustering, similar data points are grouped together to form clusters, which makes it easier to identify outliers.
Suppose you wanted to identify possible fraudulent credit card activity on a consumer’s account. You could use k-means clustering to group like transactions together into one or more clusters. Outlier transactions would appear farther away from the clusters and allow you to track potentially fraudulent activity on an account.
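The clustering idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Azure ML's implementation: each transaction is reduced to two hypothetical numeric features, the starting centroids are supplied by hand, and the distance threshold that marks an outlier is an arbitrary value chosen for this sample.

```python
import math


def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    distances = [math.dist(point, c) for c in centroids]
    index = min(range(len(centroids)), key=lambda i: distances[i])
    return index, distances[index]


def k_means(points, centroids, iterations=10):
    """Basic k-means (Lloyd's algorithm) starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            index, _ = nearest(p, centroids)
            groups[index].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


# Hypothetical transactions described by two numeric features.
transactions = [
    (10, 10), (11, 9), (9, 11), (10, 11),      # first group of similar purchases
    (50, 50), (51, 49), (49, 51), (50, 51),    # second group of similar purchases
    (30, 90),                                  # unusual transaction
]

centroids = k_means(transactions, centroids=[(10, 10), (50, 50)])

# Flag transactions that sit far from every cluster center (threshold is an assumption).
THRESHOLD = 20.0
outliers = [t for t in transactions if nearest(t, centroids)[1] > THRESHOLD]
```

In practice the features would need to be scaled to comparable ranges first, and a flagged transaction is only a candidate for review, not proof of fraud.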
Extensions also exist so that data scientists can leverage the full capabilities of the R and Python scripting languages to create custom model components. Both languages are fully supported within an Azure ML experiment. Additional algorithms that are not delivered with Azure ML can be found on the Microsoft Azure Marketplace. This storefront contains hundreds of pre-built models, algorithms, and components that can be snapped into a model and used immediately upon purchase.
Walkthrough of an Azure ML experiment
Suppose your organization is trying to predict which customers are more or less likely to renew their magazine subscriptions. Based on this, the organization might offer different discounts or incentives to retain the customers.
The first step is to acquire historical data on attributes that might predict whether a customer will renew, along with whether each customer actually renewed. This data will be used to train the model. For example, you may consider attributes such as the reader’s income level, how long they have been a subscriber, and how many other magazines they subscribe to. You should include any attributes that you suspect may affect the outcome based on the business.
That said, don’t include arbitrary attributes. For example, there appears to be a correlation between the league of the Super Bowl winner and stock market performance, but that’s pure coincidence—the Super Bowl winner is not an attribute you should include in a stock market performance model.
Don’t forget to include what the customer actually decided to do in your data set. In the example shown below, the last column, Renewed Subscription, is the outcome you are trying to predict. It’s also sometimes referred to as the Target Variable.
This is how your Azure Machine Learning data set might look if your company was trying to predict magazine subscription renewals.
The existing data will be used to “train” a model in Azure ML that predicts the likelihood that a subscriber will renew. Once the model is trained with actual data, you will provide new data with the same information, except for the column you are trying to predict.
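As a rough sketch of that layout, the snippet below parses a small CSV and separates the attribute columns from the target column. The column names and values are hypothetical stand-ins for the attributes discussed above; only the shape of the data matters here.

```python
import csv
import io

# Hypothetical sample in the layout described above; the last column is the target.
TRAINING_CSV = """income,years_subscribed,other_magazines,renewed_subscription
52000,3,1,Yes
38000,1,0,No
75000,6,2,Yes
"""

TARGET = "renewed_subscription"


def split_features_and_target(csv_text, target=TARGET):
    """Return (features, outcomes): attribute dicts and the known target values."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    outcomes = [row[target] for row in rows]
    return features, outcomes


features, outcomes = split_features_and_target(TRAINING_CSV)
```

Training uses both `features` and `outcomes`; the new data you score later has the same attribute columns but no target column.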
Once you have the data set, upload it to Azure ML and add it to your experiment. If your data is not perfect, you can decide how to handle cases with missing data. Options include removing those cases completely or replacing the missing values with the average value from the other cases.
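The two options for missing data can be sketched as follows. This is plain Python for illustration rather than the Azure ML feature itself, and the records and column names are hypothetical.

```python
def drop_incomplete(rows):
    """Option 1: remove every record that has a missing (None) value."""
    return [row for row in rows if None not in row.values()]


def fill_with_mean(rows, column):
    """Option 2: replace missing values in one column with the mean of the known values."""
    known = [row[column] for row in rows if row[column] is not None]
    mean = sum(known) / len(known)
    return [
        {**row, column: mean if row[column] is None else row[column]}
        for row in rows
    ]


# Hypothetical subscriber records; the second one is missing an income value.
subscribers = [
    {"income": 50000, "years_subscribed": 2},
    {"income": None, "years_subscribed": 5},
    {"income": 70000, "years_subscribed": 3},
]

dropped = drop_incomplete(subscribers)
filled = fill_with_mean(subscribers, "income")
```

Dropping records is simpler but throws away the other attributes of those subscribers; filling with the mean keeps every record at the cost of an estimated value.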
Next, you will select an algorithm to use for prediction. Azure ML makes it easy to try multiple algorithms and compare the results. Each algorithm also has parameters that control its behavior. Azure ML has a great feature called Sweep Parameters that will try different options and identify the best one.
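The general idea behind a parameter sweep is simple: try each candidate setting, score it, and keep the best. The sketch below illustrates that idea only; it does not reproduce how Sweep Parameters works internally. The "model" is a deliberately trivial rule with one parameter, and the data is hypothetical.

```python
def accuracy(threshold, examples):
    """Fraction of examples where the rule 'years >= threshold' matches the actual outcome."""
    correct = sum((years >= threshold) == renewed for years, renewed in examples)
    return correct / len(examples)


def sweep(candidates, examples):
    """Try every candidate parameter value and return (best_value, best_score)."""
    scored = [(accuracy(value, examples), value) for value in candidates]
    best_score, best_value = max(scored, key=lambda pair: pair[0])
    return best_value, best_score


# Hypothetical records: (years subscribed, whether the customer renewed).
examples = [
    (1, False), (2, False), (3, True), (4, True),
    (5, False), (6, True), (7, True),
]

best_threshold, best_accuracy = sweep(range(1, 8), examples)
```

A real sweep would score each setting on data held back from training, so that the winning setting is not simply the one that memorized the training set best.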
The screenshot below shows an example of an experiment that compares the results of two different algorithms, Two-Class Logistic Regression and Two-Class Decision Forest, so you can identify which gives better results.
Testing two algorithms in Azure ML: Two-Class Logistic Regression vs. Two-Class Decision Forest.
After running the training experiment, you can compare the two algorithms by clicking on the circle at the bottom of Evaluate Model and selecting Visualize.
After you’ve run your experiment in Azure ML, click “visualize” to see the results.
The blue line represents Two-Class Logistic Regression; the red line represents the algorithm it is being compared against, Two-Class Decision Forest. The light gray line represents random guesses. The farther a line sits from the light gray line, the better the algorithm performs, so in this case, you’d select Two-Class Logistic Regression.
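Assuming the chart is a receiver operating characteristic (ROC) curve, the distance from the random-guess line can be summarized by one number, the area under the curve (AUC): 0.5 matches random guessing and 1.0 is a perfect ranking. The sketch below computes AUC in plain Python for two sets of hypothetical scores; the labels, scores, and model names are made up for illustration.

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = renewed) and the scores two models assigned to the same customers.
labels = [1, 1, 1, 0, 0, 0]
scores_model_a = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
scores_model_b = [0.9, 0.5, 0.3, 0.6, 0.4, 0.2]

auc_a = auc(labels, scores_model_a)
auc_b = auc(labels, scores_model_b)
better = "model_a" if auc_a > auc_b else "model_b"
```

The model with the larger AUC is the one whose curve sits farther from the random-guess line overall.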
Once you are satisfied with the results using your training data set, you can create a Scoring Experiment. A scoring experiment is a streamlined version of your training experiment that removes components that aren’t necessary for production and adds the web service input and output.
A scoring experiment ready for deployment as a web service.
The scoring experiment can then be published as a web service. When you do that, Azure ML will provide sample code in C#, R, and Python that you can use to run your model against new data. From there, you can make predictions with that new data. The predictions can either be for a single record at a time (for example, providing guidance for a customer service representative regarding what offer to make during a customer call) or in batch mode (for example, deciding which subscribers to include in a mailing).
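Calling the published service for a single record might look roughly like the sketch below. Treat it as an outline rather than the generated sample code: the URL and API key are placeholders, the column names are hypothetical, and the payload layout is an assumption that you should replace with whatever the sample code for your own web service shows.

```python
import json
import urllib.request

# Placeholders (assumptions): use the URL and key shown for your own web service.
SERVICE_URL = "https://example.invalid/score"
API_KEY = "replace-with-your-api-key"


def build_request(column_names, rows, url=SERVICE_URL, api_key=API_KEY):
    """Build a JSON POST request; the payload layout here is an assumed shape."""
    payload = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
    )


def score(request):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# One new subscriber record, with no target column (hypothetical column names).
request = build_request(
    ["income", "years_subscribed", "other_magazines"],
    [["52000", "3", "1"]],
)
```

Nothing is sent until you call `score(request)`, which requires a real service URL and key.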
The tremendous expansion of devices that gather data has only increased companies’ appetite for real-time insight into their business. Microsoft’s Azure ML product provides an easy-to-use tool that enables organizations to quickly take advantage of cutting-edge algorithms. Combine these resources from Microsoft with a trusted business partner, and the sky’s the limit.
Kevin Feit is no longer with Slalom.
Oliver Asmus is a data management solution principal for Slalom New York. He’s a Microsoft Certified Professional and has over a decade of diversified experience delivering Microsoft solutions with an emphasis on data and cloud technologies. Follow Oliver on Twitter: @OliverAsmus.