Making Data Scientists Productive in Azure

Doing data science today is far more difficult than it will be in the next 5 to 10 years. Sharing and collaborating on workflows is painful, and pushing models into production is challenging. Let’s explore what Azure provides to ease data scientists’ pains.

In this post, you will learn about the Azure Machine Learning Studio, Azure Machine Learning, Azure Databricks, Data Science Virtual Machine, and Cognitive Services. What tools and services can we choose based on a problem definition, skillset, or infrastructure requirements?

One thing about Microsoft: they have many ways to solve the same problem.

Picking a good name for your classes, methods, or variables is essential (and difficult). Finding a good name for a product or service seems to be even more challenging. When I look at the Azure service names (“machine learning” this, “machine learning” that), it is clear that even big companies, like Microsoft, have difficulties finding catchy and straightforward names.

[Figure: Too many services with built-in ML capabilities]

As a result, there are many different services with similar names. For example, what is the difference between Azure Machine Learning Service and Azure Machine Learning Studio? Is Microsoft Machine Learning Server the same thing as Data Science Virtual Machine?

But before we dive into the details, I need to explain the title of this post, “Making Data Scientists Productive in Azure.” Matei Zaharia, the author of Spark, in one of his presentations, pointed out the main aspects of the machine learning lifecycle.

[Figure: Machine Learning lifecycle (labels: Logging, Model Management)]

In the above lifecycle, we start with data. Next, we run data preparation scripts, train a model, and deploy it. Then, if our application is doing anything important, we want to monitor it to see how it’s doing, collect extra data, and feed it back into the process. Each step has many tools that often need tuning for better results and performance. To experiment effectively, we need to be able to find which parameters were used at each stage to produce a specific result. And everything needs to happen at scale.
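
To make the "which parameters produced this result" point concrete, here is a minimal sketch of a run record in plain Python. The class name `RunRecord`, the stage names, and all the values are hypothetical; a real project would more likely rely on a tracking tool than on a hand-rolled class.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RunRecord:
    """One experiment run: parameters used at each stage plus resulting metrics."""
    run_id: str
    params: dict = field(default_factory=dict)   # stage name -> parameters used
    metrics: dict = field(default_factory=dict)  # metric name -> value

    def log_params(self, stage: str, **params) -> None:
        # Merge the parameters into whatever was already logged for this stage.
        self.params.setdefault(stage, {}).update(params)

    def log_metric(self, name: str, value: float) -> None:
        self.metrics[name] = value

    def to_json(self) -> str:
        # Serialize the whole run so it can be stored and compared later.
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical run: every value below is made up for illustration.
run = RunRecord(run_id="run-001")
run.log_params("data_prep", test_split=0.2, drop_nulls=True)
run.log_params("training", algorithm="boosted_tree", learning_rate=0.1)
run.log_metric("accuracy", 0.91)

# Reading the record back answers "what settings gave us this accuracy?"
restored = json.loads(run.to_json())
```

Storing one such record per run is enough to trace any result back to the settings of each stage.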

If a tool or service supports all the lifecycle phases (on its own or by solid integrations), I think it makes Data Scientists productive.

Azure Cognitive Services

What is it?

Azure services with pre-built AI and ML models

What can you do with it?

Add intelligent features to your apps.

Azure Cognitive Services is a set of services that lets software developers (no machine learning knowledge required) use pre-trained machine learning models and integrate them into other applications by calling APIs or importing SDKs.

[Figure: Azure Cognitive Services]

Cognitive Services keep expanding with new features, so check the current list of services (many of them offer free demos).

[Figure: Cognitive Services free demo]

For example, here is the face detection API returning my face parameters. You will find attributes like hair color, smile, and gender. But the first property is BALD: 0.17! By the way, it has increased by 4 percentage points since last year 🙂
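
Calling such an API from code is mostly a matter of building an HTTP request. Below is a rough sketch that assembles (but does not send) a face-detection request. The endpoint host, key, and image URL are placeholders. The `/face/v1.0/detect` path, the `returnFaceAttributes` parameter, and the `Ocp-Apim-Subscription-Key` header reflect the REST API as I understand it, and the set of available attributes has changed over time, so verify everything against the current API reference.

```python
import json
from urllib.parse import urlencode


def build_face_detect_request(endpoint: str, key: str, image_url: str) -> dict:
    """Assemble (but do not send) an HTTP request for a face-detection call."""
    # Ask only for a couple of attributes; availability may differ today.
    query = urlencode({"returnFaceAttributes": "hair,smile"})
    return {
        "method": "POST",
        "url": f"{endpoint.rstrip('/')}/face/v1.0/detect?{query}",
        "headers": {
            "Ocp-Apim-Subscription-Key": key,  # the key of your own resource
            "Content-Type": "application/json",
        },
        # The image is passed by URL in a small JSON body.
        "body": json.dumps({"url": image_url}),
    }


# Placeholder values, not a real resource or key.
request = build_face_detect_request(
    endpoint="https://example.cognitiveservices.azure.com/",
    key="<your-key>",
    image_url="https://example.com/me.jpg",
)
```

The resulting dictionary can be handed to any HTTP client. The response is a JSON list with one entry per detected face.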

Azure Cognitive Services – Summary

Key benefits:

  • Minimal development effort.
  • Easy integration via HTTP REST.
  • Built-in integrations with other Azure services.
  • Containers support.
  • Azure Virtual Network for enhanced data security.

Considerations:

  • Limited customization allowed.
  • Limited support for non-English languages.

ML.NET

What is it?

An open source and cross-platform ML framework.

What can you do with it?

Create custom ML models using C# or F# without leaving the .NET ecosystem.

[Figure: ML.NET framework]

ML.NET Summary

Key benefits:

  • High performance.
  • AutoML functionality.
  • Leverage TensorFlow or ONNX.
  • Expose a model via an ASP.NET Core Web API.
  • Integrate with Spark via .NET for Apache Spark (preview).
  • Use ML.NET in Jupyter Notebooks (preview).

Considerations:

  • Limited support for popular ML libraries (e.g. Scikit-learn, NumPy).

Azure Machine Learning Studio

What is it?

Drag-and-drop visual interface for ML.

What can you do with it?

Build, experiment with, and deploy models using pre-configured algorithms.

Azure Machine Learning Studio (ML Studio) is a collaborative, drag-and-drop visual workspace where you can build, test, and deploy machine learning solutions without needing to write code. It uses pre-built and pre-configured machine learning algorithms and data-handling modules. Business analysts/statisticians without R/Python knowledge would be productive with this tool.

Azure Machine Learning Studio is an impressive service that can make people productive quickly. Yet, an experienced Data Scientist might find the tool very limiting and slow.

Use ML Studio when you want to experiment with machine learning models quickly and easily, and the built-in machine learning algorithms are enough for your solutions.

[Figure: Binary Classification: Direct marketing]

The whole experiment looks like a graph, with inputs at the top and outputs (predictions) at the bottom. In the example above, “Binary Classification: Direct marketing,” I compare two algorithms (two-class boosted decision tree and two-class support vector machine). The tool makes it easy to deploy the better-performing model as a web service.
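
The comparison step in that experiment boils down to scoring both models on the same labeled data and keeping the winner. Here is a minimal sketch in plain Python. The labels and predictions are made up, and I use accuracy as the metric only for simplicity; ML Studio reports several metrics, and the right one depends on the problem.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def pick_better_model(y_true, predictions_by_model):
    """Return (name, score) of the model with the highest accuracy."""
    scores = {
        name: accuracy(y_true, preds)
        for name, preds in predictions_by_model.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up labels and predictions for eight customers
# (1 = responded to the campaign, 0 = did not).
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
predictions = {
    "boosted_decision_tree": [1, 0, 0, 1, 0, 0, 0, 0],
    "support_vector_machine": [1, 0, 1, 0, 0, 0, 0, 0],
}
best_name, best_score = pick_better_model(y_true, predictions)
```

With this toy data, the boosted decision tree gets 7 of 8 predictions right and the support vector machine 5 of 8, so the tree would be the one to deploy.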

[Figure: ML workflow]

Azure Machine Learning Studio – Summary

Key benefits:

  • Interactive visual interface.
  • Built-in Jupyter Notebooks for data exploration.
  • Direct deployment of trained models as web services.
  • Built-in integrations with other Azure services.

Considerations:

  • Online only.
  • Limited scalability (the maximum size of a training dataset is 10 GB).
  • Limited number of supported input and output connectors.
  • Limited support for custom Python/R code.

Power BI Auto ML

What is it?

An automated machine learning (AutoML) component built into Power BI for building ML models without writing any code.

What can you do with it?

Using AutoML in Power BI, business analysts without a strong background in machine learning can build ML models.

[Figure: Power BI Auto ML]

Built-in model explanation functionality shows the top predictors found during training and an explanation for each prediction.

Power BI Auto Machine Learning – Summary

Key benefits:

  • Use Power BI dataflows to load data, transform it and build models on top of it.
  • Deploy models as services via Azure ML.
  • Get top predictors during training and explanations for each prediction.

Considerations:

  • Limited selection of model types (binary prediction, general classification, regression).
  • Paid Pro or Premium license needed.

Azure Machine Learning

What is it?

Managed cloud service for ML.

What can you do with it?

Train, deploy, and manage models in Azure.

First of all, be aware that we are discussing Azure Machine Learning, NOT STUDIO (presented earlier).

Azure Machine Learning (Azure ML) provides a cloud-based environment you can use to develop, train, test, deploy, manage, and track machine learning models. It supports open source technologies so you can use Python packages with machine learning components.

[Figure: Azure Machine Learning]

By using Azure ML, you can start training on your local machine and then scale out to the cloud. With many available compute targets and advanced hyperparameter tuning services, you can build better models faster by using the power of the cloud.
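
To give a feel for what a hyperparameter tuning service automates, here is a small random-search sketch in plain Python. It is not the Azure ML API. The `validation_score` function is a hypothetical stand-in for "train a model and return its validation score", and the parameter ranges are arbitrary.

```python
import random


def validation_score(learning_rate: float, depth: int) -> float:
    """Hypothetical stand-in for training a model and scoring it.

    A toy function that peaks at learning_rate=0.1 and depth=6.
    """
    return 1.0 - abs(learning_rate - 0.1) - 0.01 * abs(depth - 6)


def random_search(n_trials: int, seed: int = 0):
    """Sample hyperparameters at random and keep the best-scoring trial."""
    rng = random.Random(seed)  # fixed seed keeps the search reproducible
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(0.01, 0.3),
            "depth": rng.randint(2, 10),
        }
        score = validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(n_trials=50)
```

Each trial is independent of the others, which is why this kind of search scales out so naturally across cloud compute targets.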

[Figure: Azure ML workflow]

Azure ML supports the whole cycle, from data ingestion to deployment using Docker containers. Data should be available in Azure Blob Storage. For data preparation and training, you can use any Python open source package. For deployment, the easiest setup is achievable with Azure Container Instances or Azure Kubernetes Service.
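
For the deployment step, a containerized model is typically driven by a small scoring script. As far as I know, Azure ML entry scripts follow an `init()` / `run()` convention, which the sketch below imitates. The "model" here is a hypothetical set of fixed linear weights rather than a real trained artifact, and the JSON input shape is my assumption.

```python
import json

model = None  # populated by init()


def init():
    """Called once when the container starts: load the model into memory."""
    global model
    # Hypothetical stand-in for loading a trained model from disk:
    # a linear model with fixed weights.
    model = {"weights": [0.5, -0.25], "bias": 1.0}


def run(raw_data: str) -> str:
    """Called per request: parse JSON input, score each row, return JSON."""
    rows = json.loads(raw_data)["data"]
    predictions = [
        sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]
        for row in rows
    ]
    return json.dumps({"predictions": predictions})


# Simulate what the hosting container would do with a single request.
init()
response = run(json.dumps({"data": [[2.0, 4.0], [0.0, 0.0]]}))
```

Keeping the expensive model load in `init()` and only the per-request work in `run()` is what lets the same script serve many requests quickly.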

Azure ML got a redesigned interface in 2019 that aims to make end-to-end tasks more seamless.

Azure Machine Learning – Summary

Key benefits:

  • Central management of scripts and run history.
  • Run model training scripts locally (offline), and then scale out to the cloud.
  • Management and deployment of models to the cloud or edge devices.
  • Integration with Azure DevOps.
  • Added support for R (preview).

Considerations:

  • Investigate MLflow to track metrics and manage models.

Azure Databricks

What is it?

Spark-based analytics platform.

What can you do with it?

Build and deploy models and data workflows.

[Figure: Building and deploying models and data workflows]

Databricks provides a managed cloud platform built around Spark that delivers:

  1. Fully managed Spark clusters.
  2. An interactive workspace for exploration and visualization.
  3. A production pipeline scheduler.
  4. A platform for powering your Spark-based applications.

[Figure: Creating a new cluster]

The main concepts:

  • Databricks Runtime (Apache Spark, concurrent clusters, REST APIs, libraries).
  • Collaborative workspace (notebooks, user access, git integration).
  • Deploy Jobs and Workflows (job scheduler, notifications and logs, multi-stage pipelines).
  • Security (single sign-on (SSO), access control list (ACL), secrets via Azure Key Vault).

[Figure: Databricks ML workflow]

Azure Databricks, with the help of extra libraries and services, supports the complete machine learning cycle.

Azure Databricks – Summary

Key benefits:

  • The most mature development environment for ML on the Azure platform.
  • Seamless integration with MLflow & Azure ML.
  • Integrated with other Azure services (e.g., Azure Data Factory, Azure Key Vault).
  • Delta Lake support.

Considerations:

  • Online only.
  • Cost includes the price of virtual machines and Databricks fee.
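
The two-part pricing is easier to reason about with a quick calculation. Databricks bills its fee in Databricks Units (DBUs) on top of the virtual machine price. The sketch below uses made-up rates, not real Azure prices, so plug in the figures from the pricing pages for your region and tier.

```python
def databricks_hourly_cost(nodes, vm_price_per_hour, dbu_per_node_hour, dbu_price):
    """Hourly cluster cost = nodes * (VM price + DBUs consumed * price per DBU)."""
    return nodes * (vm_price_per_hour + dbu_per_node_hour * dbu_price)


# Hypothetical figures for illustration only.
hourly = databricks_hourly_cost(
    nodes=4,
    vm_price_per_hour=0.5,   # price of one virtual machine per hour
    dbu_per_node_hour=0.75,  # DBUs one node consumes per hour
    dbu_price=0.5,           # price of one DBU
)

# Assume the cluster runs 8 hours a day, 22 working days a month.
monthly = hourly * 8 * 22
```

With these assumed rates the cluster costs 3.50 per hour, and the Databricks fee makes up a large share of it, which is a good reason to auto-terminate idle clusters.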

Data Science Virtual Machine

What is it?

An Azure virtual machine with pre-installed data science tools.

What can you do with it?

Develop ML solutions in a pre-configured environment.

Data Science Virtual Machine (DSVM) is a set of pre-installed and pre-configured images for Windows or Linux virtual machines. It includes the most popular data science tools. Since it has access to Azure networking and scalability (it is a virtual machine, after all), DSVM can be a great environment even for data science teams.

[Figure: Data Science Virtual Machine]

Data Science Virtual Machine can be useful for learning and comparing different machine learning tools.

Data Science Virtual Machine – Summary

Key benefits:

  • The most complete development environment for ML on the Azure platform.
  • Reduced time to install, manage, and troubleshoot data science tools and frameworks.
  • Includes the latest versions of all commonly used tools and frameworks.
  • Virtual machine options include scalable GPU images.

Considerations:

  • Online only.
  • Infrastructure as a service (IaaS), not a managed data science solution.

If you can’t use Azure, I suggest you look into Microsoft Machine Learning Server and SQL Server Machine Learning Services.

Microsoft Machine Learning Server

What is it?

Cross-platform standalone server for predictive analysis.

What can you do with it?

Build and deploy models written in R or Python.

Microsoft Machine Learning Server (ML Server) is a flexible choice for analyzing data at scale, building intelligent apps, and discovering insights. It includes a collection of R packages, Python packages, interpreters, and infrastructure for developing and deploying distributed R and Python-based machine learning solutions on a range of platforms across on-premises and cloud.

[Figure: Microsoft Machine Learning Server]

ML Server offers operationalization by generating web services on top of ML models. These web services are hosted on a server grid on-premises or in the cloud and can be integrated with line-of-business applications. Additionally, ML Server integrates with Active Directory and Azure Active Directory and includes role-based access control to satisfy the security and compliance needs of your enterprise.

[Figure: Microsoft Machine Learning Server workflow]

ML Server has full support for the data science lifecycle of R and Python-based analytics.

Microsoft Machine Learning Server – Summary

Key benefits:

  • Built on the legacy of Microsoft R Server and Revolution R Enterprise.
  • Advanced security options.
  • Deploy R and Python models as web services.

Considerations:

  • You need to deploy and manage Machine Learning Server in your enterprise.

SQL Server Machine Learning Services

What is it?

A built-in SQL Server feature to support machine learning.

What can you do with it?

Execute Python and R scripts with relational data.

SQL Server Machine Learning Services – Summary

Key benefits:

  • Run your scripts where the data resides and eliminate the transfer of data across the network to another server.
  • Encapsulate predictive logic in a database function or as a library.
  • Use base distributions of Python, R, and Java (extensibility framework).

Considerations:

  • Assumes a SQL Server database as the data tier for your application.
  • Limited scalability.
  • A long list of known issues.

Summary

Overall, the most popular Data Science service in Azure is Azure Machine Learning. Azure Databricks provides amazing data engineering capabilities and a best-in-class Spark environment. However, if you are looking for something simpler, investigate Power BI Auto ML and Azure Machine Learning Studio. Software Developers should try out ML.NET.
