Data Science and MLOps use case
To operationalize data analysis and model creation, your enterprise needs integrated systems and processes. Cloud Pak for Data provides the processes and technologies to enable your enterprise to develop and deploy machine learning models and other data science applications.
Watch this video to see the data fabric use case for implementing a Data Science and MLOps solution in Cloud Pak for Data.
This video provides a visual method as an alternative to following the written steps in this documentation.
Challenges
Establishing Data Science and MLOps solutions for enterprises involves tackling these challenges:
- Accessing high-quality data
- Organizations need to provide easy access to high quality, governed data for data science teams who use the data to build models.
- Operationalizing model building and deploying
- Organizations need to implement repeatable processes to quickly and efficiently build and deploy models to production environments.
- Monitoring and retraining models
- Organizations need to automate the monitoring and retraining of models based on production feedback.
You can solve these challenges by implementing a data fabric on Cloud Pak for Data.
Example: Golden Bank's challenges
Follow the story of Golden Bank as it implements a Data Science and MLOps process to expand its business by offering low-rate mortgage renewals for online applications. Data scientists at Golden Bank need to create a mortgage approval model that avoids risk and treats all applicants fairly. They must also automate the model retraining to optimize model performance.
Process
To implement Data Science and MLOps for your enterprise, your organization can follow this process:
- Prepare and share the data
- Build and train models
- Deploy models
- Monitor models
- Automate the AI lifecycle
The Watson Studio, Watson Machine Learning, Watson OpenScale, Watson Knowledge Catalog, IBM Watson Pipelines, and AI Factsheets services in Cloud Pak for Data provide all of the tools and processes that your organization needs to implement a Data Science and MLOps solution.
2. Build and train models
To get predictive insights based on your data, data scientists, business analysts, and machine learning engineers can build and train models. Data scientists use Cloud Pak for Data services to build the AI models, ensuring that the right algorithms and optimizations are used to make predictions that help to solve business problems.
What you can use | What you can do | Best to use when |
---|---|---|
AutoAI | Use AutoAI in Watson Studio to automatically select algorithms, engineer features, generate pipeline candidates, and train model pipeline candidates. Then, evaluate the ranked pipelines and save the best as models. Deploy the trained models to a space, or export the model training pipeline that you like from AutoAI into a notebook to refine it. | You want an advanced and automated way to build a good set of training pipelines and models quickly. You want to be able to export the generated pipelines to refine them. |
Notebooks and scripts | Use notebooks and scripts in Watson Studio to write your own feature engineering, model training, and evaluation code in Python or R. Use training data sets that are available in the project, or connections to data sources such as databases, data lakes, or object storage. Use your favorite open source frameworks and libraries. | You want to use Python or R coding skills to have full control over the code that is used to create, train, and evaluate the models. |
SPSS Modeler flows | Use SPSS Modeler flows in Watson Studio to create your own model training, evaluation, and scoring flows. Use training data sets that are available in the project, or connections to data sources such as databases, data lakes, or object storage. | You want to visually code on a graphical builder. You want to create repeatable flows to explore data and define model training, evaluation, and scoring. |
RStudio Server with R 3.6 | Analyze data and build and test models by working with R in an RStudio Server development environment. | You want to use a development environment to work in R. |
JupyterLab IDE | Analyze data and build and test models by working with the JupyterLab development environment. | You want to use a development environment to work in Python. |
Visual Studio Code editor | Use the Watson Studio extension to connect to a Cloud Pak for Data cluster directly from Visual Studio Code. Using the extension, you can start and stop your runtimes, securely connect to your runtimes on the cluster through SSH, and edit the files inside your Watson Studio Git-based project through SSH. | You want to edit and run code in Visual Studio Code. |
Watson Machine Learning Accelerator | Train neural networks by using a deep learning experiment builder. | You want to train thousands of models, train deeper neural networks, and explore more complicated hyperparameter spaces. |
Decision Optimization | Prepare data, import models, solve problems and compare scenarios, visualize data, find solutions, produce reports, and save models to deploy with Watson Machine Learning. | You need to evaluate millions of possibilities to find the best solution to a prescriptive analytics problem. |
Analytics Engine powered by Apache Spark | Run Jupyter notebooks and jobs from other tools in Watson Studio projects by selecting a Spark environment runtime. Run Spark SQL or jobs for data transformation, data science, or machine learning by using Spark job APIs. | You have a Spark cluster for running distributed jobs. |
Federated learning | Train a common model that uses distributed data. | You need to train a model without moving, combining, or sharing data that is distributed across multiple locations. |
Example: Golden Bank's model building and training
Data scientists at Golden Bank create a model, "Mortgage Approval Model," that avoids unanticipated risk and treats all applicants fairly. They want to track the history and performance of the model from the beginning, so they add a model use case to the "Mortgage Approval Catalog". They run a notebook to build the model and predict which applicants qualify for mortgages. The details of the model training are automatically captured as metadata in the model use case.
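The notebook code itself is not shown in this documentation, but the core of such a model-training notebook can be sketched in plain Python. Everything below is illustrative: the feature names, thresholds, and synthetic data are hypothetical, and a real project would load governed data from the catalog and typically use an open source framework such as scikit-learn rather than hand-rolled gradient descent.

```python
import math
import random

random.seed(42)

# Hypothetical applicant features and labeling rule, used only to generate
# synthetic training data for this sketch.
def make_applicant():
    credit = random.uniform(300, 850)
    debt_ratio = random.uniform(0.0, 0.6)
    approved = 1 if (credit > 620 and debt_ratio < 0.4) else 0
    # Scale the credit score to [0, 1] so gradient descent behaves well.
    return [(credit - 300) / 550, debt_ratio], approved

data = [make_applicant() for _ in range(500)]

# Logistic regression trained with stochastic gradient descent.
w, b, lr = [0.0, 0.0], 0.0, 0.1

def predict(x):
    z = max(-30.0, min(30.0, w[0] * x[0] + w[1] * x[1] + b))
    return 1.0 / (1.0 + math.exp(-z))

for _ in range(1000):
    for x, y in data:
        err = predict(x) - y  # gradient of the log loss w.r.t. z
        w[0] -= lr * err * x[0]
        w[1] -= lr * err * x[1]
        b -= lr * err

accuracy = sum((predict(x) > 0.5) == bool(y) for x, y in data) / len(data)
print(f"training accuracy: {accuracy:.2f}")
```

In a real notebook, the evaluation metrics and training details like these are what get captured as metadata in the model use case.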
3. Deploy models
When operations team members deploy your AI models, the models become available for applications to use for scoring and predictions to help drive actions.
What you can use | What you can do | Best to use when |
---|---|---|
Spaces user interface (UI) | Use the Spaces UI to deploy models and other assets from projects to spaces. | You want to deploy models and view deployment information in a collaborative workspace. |
Command-line tool (cpdctl) | Use the cpdctl command-line tool in Watson Machine Learning to manage the lifecycle of models and to automate an end-to-end flow that includes training the model, saving it, creating a deployment space, and deploying the model. | You want to deploy and manage models to test or production environments from a command-line. |
Example: Golden Bank's model deployment
The operations team members at Golden Bank promote the "Mortgage Approval Model" from the project to a deployment space and then create an online model deployment.
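Once an online deployment exists, applications score against it over REST. As a minimal sketch, the code below only builds a request body in the input format that Watson Machine Learning online deployments accept (field names plus rows of values); the endpoint path, host, and field names are hypothetical placeholders, and no cluster is actually called.

```python
import json

# Placeholder endpoint; the real URL comes from the deployment's details
# page in the deployment space.
SCORING_URL = "https://<cluster-host>/ml/v4/deployments/<deployment-id>/predictions"

# Hypothetical feature fields for the mortgage model; a single request can
# score one or more rows of values.
payload = {
    "input_data": [{
        "fields": ["CREDIT_SCORE", "DEBT_RATIO", "INCOME"],
        "values": [[720, 0.31, 85000], [580, 0.45, 42000]],
    }]
}

body = json.dumps(payload)
print(body)
```

An application would POST this body to the scoring URL with its authentication token and read the predictions from the response.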
4. Monitor deployed models
After models are deployed, it is important to monitor them to make sure that they are performing well. Data scientists must watch for model performance and data consistency issues.
What you can use | What you can do | Best to use when |
---|---|---|
Watson OpenScale | Monitor model fairness issues across multiple features. Monitor model performance and data consistency over time. Explain how the model arrived at certain predictions with weighted factors. Maintain and report on model governance and lifecycle across your organization. | You have features that are protected or that might contribute to prediction fairness. You want to trace model performance and data consistencies over time. You want to know why the model gives certain predictions. |
Example: Golden Bank's model monitoring
Data scientists at Golden Bank use Watson OpenScale to monitor the deployed "Mortgage Approval Model" to make sure that it is accurate and treating all Golden Bank mortgage applicants fairly. They run a notebook to set up monitors for the model and then tweak the configuration by using the Watson OpenScale user interface. Using metrics from the Watson OpenScale quality monitor and fairness monitor, the data scientists determine how well the model predicts outcomes and if it produces any biased outcomes. They also get insights for how the model comes to decisions so that the decisions can be explained to the mortgage applicants.
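One fairness metric that such a monitor reports is disparate impact: the rate of favorable outcomes for a monitored group divided by the rate for a reference group, where values below roughly 0.8 are commonly flagged as potential bias. The sketch below computes it in plain Python on made-up records; it illustrates the metric itself, not the Watson OpenScale API.

```python
# Disparate impact ratio between a monitored group and a reference group.
def disparate_impact(records, group_key, monitored, reference, favorable="approved"):
    def favorable_rate(group):
        rows = [r for r in records if r[group_key] == group]
        return sum(r["outcome"] == favorable for r in rows) / len(rows)
    return favorable_rate(monitored) / favorable_rate(reference)

# Made-up scoring records for illustration only.
records = [
    {"gender": "F", "outcome": "approved"},
    {"gender": "F", "outcome": "denied"},
    {"gender": "M", "outcome": "approved"},
    {"gender": "M", "outcome": "approved"},
]

ratio = disparate_impact(records, "gender", monitored="F", reference="M")
print(f"disparate impact: {ratio:.2f}")  # 0.5 approval rate vs 1.0 -> 0.50
```

A ratio of 0.50, as in this toy data, would fall well below the common 0.8 threshold and trigger the fairness monitor.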
5. Automate the AI lifecycle
Your team can automate and simplify the MLOps and AI lifecycle with Watson Pipelines.
What you can use | What you can do | Best to use when |
---|---|---|
Watson Pipelines | Use pipelines to create repeatable and scheduled flows that automate notebook, Data Refinery, and machine learning pipelines, from data ingestion to model training, testing, and deployment. | You want to automate some or all of the steps in an MLOps flow. |
Example: Golden Bank's automated ML lifecycle
The data scientists at Golden Bank can use pipelines to automate their complete Data Science and MLOps lifecycle, simplifying the model retraining process.
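Conceptually, such a pipeline chains data ingestion, model training, evaluation, and a quality-gated deployment. The sketch below models those stages as plain Python functions with toy logic; in Watson Pipelines each stage would be a node on the drag-and-drop canvas, and every name, rule, and threshold here is illustrative.

```python
# Conceptual stages of an automated retraining pipeline.
def ingest():
    # Stand-in for a data ingestion / Data Refinery step.
    return [{"credit": 700, "approved": 1}, {"credit": 500, "approved": 0}]

def train(rows):
    # Toy "model": approve at or above the mean credit score of approved rows.
    approved = [r["credit"] for r in rows if r["approved"]]
    return {"threshold": sum(approved) / len(approved)}

def evaluate(model, rows):
    correct = sum((r["credit"] >= model["threshold"]) == bool(r["approved"])
                  for r in rows)
    return correct / len(rows)

def deploy(model):
    return f"deployed model with threshold {model['threshold']:.0f}"

rows = ingest()
model = train(rows)
quality = evaluate(model, rows)
# Gate deployment on model quality, as a pipeline condition node would.
status = deploy(model) if quality >= 0.8 else "retrain required"
print(status)
```

Scheduling this flow to rerun on fresh production data is what automates the retraining that Golden Bank needs.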
Tutorials for Data Science and MLOps
Tutorial | Description | Expertise for tutorial |
---|---|---|
Orchestrate an AI pipeline with model monitoring | Train a model, promote it to a deployment space, and deploy the model. | Run a notebook. |
Orchestrate an AI pipeline with data integration | Create an end-to-end pipeline that prepares data and trains a model. | Use the Watson Pipelines drag and drop interface to create a pipeline. |
Learn more
- Data fabric tutorials
- Watson Studio overview
- Watson Machine Learning overview
- Watson OpenScale overview
- Watson Knowledge Catalog overview
- Watson Pipelines
- Videos
Parent topic: Data fabric solution overview