This repository contains a set of scripts and commands that help you build a scalable solution for scoring many models in parallel using Azure Machine Learning (AML).
The solution can be used as a template and generalizes to other problems. The problem addressed here is monitoring the operation of a large number of devices in an IoT setting, where each device continuously sends sensor readings. We assume there are pre-trained anomaly detection models, one for each sensor of a device. The models predict whether a series of measurements, aggregated over a predefined time interval, corresponds to an anomaly.
To get started, read through the Design section, then go through the following sections to create the Python environment, Azure resources, and the scoring pipeline:
- Design
- Prerequisites
- Create Environment
- Steps
  - Create Azure Resources
  - Create and Schedule the Scoring Pipeline
  - Validate Deployments and Pipeline Execution
- Cleanup
This solution consists of several Azure cloud services that allow upscaling and downscaling resources according to need. The services and their role in this solution are described below.
Blob containers store the pre-trained models, the data, and the output predictions. The models uploaded to blob storage in the 01_create_resources.ipynb notebook are One-class SVM models trained on data representing the values of different sensors of different devices. We assume that the data values are aggregated over a fixed interval of time. In real-world scenarios, this could be a stream of sensor readings that needs to be filtered and aggregated before being used in training or real-time scoring. For simplicity, we use the same data file when executing scoring jobs.
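The aggregation step mentioned above can be illustrated with a minimal sketch. It assumes raw readings arrive as (timestamp in seconds, device ID, sensor ID, value) tuples and that a simple mean per window is an acceptable aggregate; the function name, the tuple layout, and the sample values are hypothetical and are not taken from the repository's notebooks.

```python
from collections import defaultdict
from statistics import mean


def aggregate_readings(readings, interval_seconds):
    """Average raw readings per (device, sensor) over fixed, non-overlapping windows.

    `readings` is an iterable of (timestamp_seconds, device_id, sensor_id, value).
    Returns a dict mapping (device_id, sensor_id, window_start) to the mean value.
    """
    buckets = defaultdict(list)
    for timestamp, device_id, sensor_id, value in readings:
        # Floor the timestamp to the start of the window it falls into.
        window_start = int(timestamp // interval_seconds) * interval_seconds
        buckets[(device_id, sensor_id, window_start)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Made-up sample readings, for illustration only.
readings = [
    (0, "device1", "sensor1", 1.0),
    (30, "device1", "sensor1", 3.0),
    (70, "device1", "sensor1", 5.0),
    (10, "device1", "sensor2", 2.0),
]
aggregated = aggregate_readings(readings, interval_seconds=60)
```

Each key in `aggregated` identifies one sensor of one device in one time window, which is the granularity at which a per-sensor model would be applied.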
Azure Machine Learning (AML) is a cloud service for training, scoring, managing, and deploying machine learning models at scale. It can execute training, scoring, or other demanding jobs on remote compute targets, such as a cluster of virtual machines, that scale according to need. In this solution guide, we use AML to run scoring jobs for many sensors in parallel. We do that by creating an AML pipeline with parallel steps, where each step runs a Python scoring script for one sensor. AML manages queueing and executing the steps on a scalable compute target.
In addition, we use AML to create a schedule that runs the pipeline repeatedly at a specified time interval.
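The fan-out pattern described above, one independent scoring task per sensor, can be sketched locally. This is not the AML pipeline API: it uses Python's `concurrent.futures` as a stand-in for AML's step scheduling, and a simple threshold rule as a stand-in for a pre-trained model. The `score_sensor` function, the task layout, and the sample values are all hypothetical. The -1/1 output convention follows scikit-learn's One-class SVM, where -1 marks an outlier.

```python
from concurrent.futures import ThreadPoolExecutor


def score_sensor(task):
    """Hypothetical stand-in for the per-sensor scoring script."""
    device_id, sensor_id, values, threshold = task
    # A real step would load the sensor's pre-trained model; here a simple
    # threshold rule plays that role: -1 marks an anomaly, 1 a normal value.
    predictions = [-1 if abs(v) > threshold else 1 for v in values]
    return (device_id, sensor_id), predictions


# One task per (device, sensor) pair, with made-up values for illustration.
tasks = [
    ("device1", "sensor1", [0.1, 0.2, 9.5], 3.0),
    ("device1", "sensor2", [1.0, 1.1, 0.9], 3.0),
    ("device2", "sensor1", [-7.0, 0.3, 0.2], 3.0),
]

# The tasks are independent, so they can run in parallel in any order.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = dict(pool.map(score_sensor, tasks))
```

Because no task depends on another, adding sensors only adds tasks, which is what lets a scalable compute target absorb the load.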
For more information on these services, check the documentation links provided in the Links section.
All scripts and commands were tested on an Ubuntu 16.04 LTS system.
Once all prerequisites are installed:
- Clone or download this repository:
    git clone https://github.com/Microsoft/AMLBatchScoringPipeline.git
- Create and activate the conda environment from the yml file:
    conda env create -f environment.yml
    conda activate amlmm
- Log in to Azure and select your subscription:
    az login --use-device-code
    az account set -s "<subscription name or ID>"
- Start Jupyter in the same environment:
    jupyter notebook
- Open Jupyter Notebook in your browser and make sure your environment's kernel is selected:
    Kernel > Change Kernel > Python [conda env:amlmm]
Start creating the required resources in the next section.
The 01_create_resources.ipynb notebook contains all Azure CLI commands needed to create resources in your Azure subscription, as well as configurations of the AML pipeline and the compute target.
In Jupyter Notebook, navigate to the cloned or downloaded directory, open AMLBatchScoringPipeline/01_create_resources.ipynb, and execute the cells to create the required Azure resources.
The 02_create_pipeline.ipynb notebook contains Python code that creates the AML scoring pipeline and schedules it to run on a predefined interval.
After all resources are created, you can check your resource group in the portal and validate that all components have been deployed successfully.
After the pipeline runs successfully, you should see the prediction CSV files in the preds container, under Storage Account > Blobs.
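Once a predictions file has been downloaded from the preds container, a quick sanity check can confirm that it parses and contains plausible values. The sketch below assumes a `prediction` column holding 1 for normal and -1 for anomalous rows; that column name, the `timestamp` column, and the sample content are hypothetical, since the actual schema is defined by the scoring script in the repository.

```python
import csv
import io


def summarize_predictions(csv_file):
    """Count total rows and rows flagged as anomalies (-1) in a predictions CSV.

    Assumes a header row with a `prediction` column (hypothetical name).
    """
    reader = csv.DictReader(csv_file)
    total = 0
    anomalies = 0
    for row in reader:
        total += 1
        if int(row["prediction"]) == -1:
            anomalies += 1
    return {"rows": total, "anomalies": anomalies}


# Made-up file content standing in for a downloaded predictions file.
sample = io.StringIO("timestamp,prediction\n0,1\n60,-1\n120,1\n")
summary = summarize_predictions(sample)
```

For a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open("predictions.csv", newline="")`, where the file name is a placeholder.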
If you wish to delete all created resources, run the following CLI command to delete the resource group and all underlying resources.
az group delete --name <resource_group_name>
- End-to-End Anomaly Detection Jobs using Azure Batch AI
- Azure Machine Learning Documentation
- Azure Blob Storage Documentation
- Distributing inference across many compute nodes using ParallelRunStep
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
Microsoft AI GitHub: Find other Best Practice projects and Azure AI design patterns in our central repository.