ModelOps use case

By setting up a ModelOps process for your data, your company can benefit from a full end-to-end AI lifecycle that optimizes your data and AI investments.

Overview

Your company needs to ensure that data is collected and explored efficiently and that the AI models that use the data are properly built and governed. You need integrated systems and processes to manage data and model assets across the AI lifecycle.

With Cloud Pak for Data, your company can manage the full AI lifecycle from a single platform with integrated services that support the entire flow from collecting data all the way to monitoring your models in production.

You can install the Cloud Pak for Data services that support ModelOps to improve, simplify, and automate AI lifecycle operations and management. You can streamline and accelerate data collection and management, model development, model validation, and model deployment. On this unified, end-to-end data and AI platform, you can operate trusted AI through ongoing model monitoring and retraining, and then use the resulting predictions to decide which actions address your company's needs.

Watch the following video to see the steps in the ModelOps process.

This video provides an audio-visual presentation of the written use-case flow description in this documentation.

Process

You can use different tools and services for each step in the process, depending on how you want to implement your ModelOps use case in Cloud Pak for Data.

This image shows the steps in the ModelOps process and lists the services and tools that can be used in each step. The sections and tables that follow provide detailed information about the process, steps, services, and tools.

1. Collect the data

Collecting and organizing data is an important step in building your automated AI pipeline. Data scientists create projects, and data engineers collect data and add it to the projects so it can be organized and refined. You can collect data from multiple sources and ensure that it is secure and accessible for use by the Cloud Pak for Data tools and services that support your ModelOps AI lifecycle. You can address policy, security, and compliance issues to help you govern the data that is collected before you analyze the data and use it in your AI models.

Services and tools you can use | What you can do | Best to use when
Watson™ Knowledge Catalog
  • Create and catalog connections to diverse data sources from IBM® Cloud, on-premises data services, and third-party data services.
  • Create and catalog data assets, point to data sets that are accessible through connections, and upload files, such as CSV files, as data assets.
Use Watson Knowledge Catalog for data collection when you need an inventory of data connections and data sets at the organizational level so data scientists and analysts can work with the data for various projects.
Data Virtualization
  • Create virtual data tables that can combine, join, or filter data from various relational data sources.
  • Make the resulting combined live data available as data assets in Watson Knowledge Catalog.
With Data Virtualization, you can query many data sources as one. Use Data Virtualization for data collection when you need to combine live data from multiple sources to generate views for input for projects. For example, you can use the combined live data to feed dashboards, notebooks, and flows so that the data can be explored.
Data Refinery
  • Access and refine data from diverse data source connections.
  • Materialize the resulting data sets as snapshots in time that might combine, join, filter, or mask data to make it usable for data scientists to analyze and explore.
  • Make the resulting data sets available to the project in Watson Knowledge Catalog.
With Data Refinery, you can simplify the process of preparing large amounts of raw data for analysis. Use Data Refinery for data collection when you need to access and join or filter data and materialize the results as data assets that represent a point in time. You can use the data as input for analysis or model training.
DataStage®
  • Deploy ready-to-use, built-in business operations for your data flows.
  • Handle large volumes of data and complex data.
Use DataStage when you need to quickly design and run accurate data flows by using an intuitive interface that lets you connect to a wide range of data sources. You can integrate and transform data, and deliver it to your target system in batch or real time.
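The combine-and-filter operation that Data Virtualization and Data Refinery perform can be sketched in plain Python. This is a conceptual illustration only, not a Cloud Pak for Data API; the source names, field names (customer_id, region, balance), and threshold are all invented for the example.

```python
# Conceptual sketch: join "live" records from two illustrative sources
# the way a Data Virtualization view might, then filter the result.
# All data and field names are hypothetical.

customers = [  # e.g. rows from an on-premises database
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
]
accounts = [   # e.g. rows from a cloud data service
    {"customer_id": 1, "balance": 2500.0},
    {"customer_id": 2, "balance": 90.0},
]

def virtual_view(customers, accounts, min_balance):
    """Join the two sources on customer_id and keep rows at or above a threshold."""
    balances = {a["customer_id"]: a["balance"] for a in accounts}
    return [
        {**c, "balance": balances[c["customer_id"]]}
        for c in customers
        if balances.get(c["customer_id"], 0.0) >= min_balance
    ]

view = virtual_view(customers, accounts, min_balance=1000.0)
print(view)  # only the customer whose balance clears the threshold
```

In the real services, the equivalent of `virtual_view` is defined once (as a virtual table or a Data Refinery flow) and the resulting data asset is cataloged so projects can reuse it.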
To learn about the services and tools that you can use to collect the data for your AI models, see:

2. Explore the data

To gain new insights and make business decisions, you can analyze and explore the data that you will use to build AI models. Data engineers can further refine the data in the projects, and data scientists can use various Cloud Pak for Data services, tools, and features to import, explore, and analyze the data before it is used in your AI models.

Services and tools you can use | What you can do | Best to use when
Watson Studio
  • Connections
  • Connected data assets
  • Uploaded files
  • Add connections and connected data assets to the project.
  • Use the Asset Browser to select the data through a connection and add it to the project as connected data assets.
  • Upload files, such as CSV or image files that support the data analysis, to the project.
  • Search for connections and data assets in Watson Knowledge Catalog and add them to your project.
Use Watson Studio to add relevant data to your projects from connections, connected data assets, and uploaded files so you can visualize, explore, analyze, and train models.
Data Refinery
  • Visualize the data.
Use Data Refinery visualizations to view and explore the data interactively to better understand it.
Watson Studio
  • Dashboards
  • Notebooks
  • Visualize data in the dashboards.
  • Visualize data in notebooks.
Use Watson Studio dashboards and notebooks to view and explore the data interactively to better understand it.
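The kind of quick profiling you might run in a Watson Studio notebook before modeling can be sketched with the Python standard library. The column name and values here are made up for the example; real exploration would typically use pandas or the built-in visualization tools.

```python
import statistics

# Illustrative exploration of one numeric column, similar in spirit to a
# first pass in a notebook. The loan_amounts data is invented.
loan_amounts = [1200, 3400, 560, 8900, 4300, 2100]

summary = {
    "count": len(loan_amounts),
    "mean": statistics.mean(loan_amounts),
    "median": statistics.median(loan_amounts),
    "min": min(loan_amounts),
    "max": max(loan_amounts),
}
print(summary)
```

A large gap between mean and median, as here, is the kind of skew signal that would prompt further visualization before the column is used for training.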
To learn about the services and tools that you can use to explore and analyze the data that you use in your AI models, see:

3. Build the models

To get predictive insights based on the data that you collected, refined, and analyzed, the next step is to build and train models. Data scientists use Cloud Pak for Data services to build the AI models, ensuring that the right algorithms and optimizations are used to make predictions that help to solve business problems.

Services and tools you can use | What you can do | Best to use when
Watson Studio and Watson Machine Learning
  • AutoAI
  • Automatically select algorithms, engineer features, generate pipeline candidates, and train models from those candidates, and then evaluate and rank the resulting models and pipelines.
  • Deploy the models that you like to a space, or export the model training pipeline that you like from AutoAI into a notebook to refine it.
Use AutoAI when you want an advanced and automated way to build a good set of training pipelines and models quickly, and you want to be able to export the generated pipelines to refine them.
Watson Studio and Watson Machine Learning
  • Notebooks
  • Scripts
  • Write your own feature engineering and model training and evaluation code in Python or R, based on training data sets that are available in the project, or connections to data sources such as databases, data lakes, or object storage. Use your favorite algorithms and libraries.
Use notebooks and scripts to build models that use ML algorithms and frameworks when you want to use Python or R coding skills to have full control over the code that is used to create, train, and evaluate the models.
Watson Studio and Watson Machine Learning
  • SPSS® Modeler flows
  • Create your own model training, evaluation, and scoring flows based on training data sets that are available in the project, or connections to data sources such as databases, data lakes, or object storage.
With SPSS Modeler flows, you can build flows to prepare and blend data, build and manage models, and visualize the results. Use SPSS Modeler to build models when you want a simple way to explore data and define model training, evaluation, and scoring flows.
RStudio® Server with R 3.6
  • Analyze data and build and test models by working with R in an RStudio Server with R 3.6 development environment.
Use RStudio Server with R 3.6 when you want to use a development environment to work in R.

Watson Machine Learning Accelerator
  • Train neural networks by using a deep learning experiment builder.
Use Watson Machine Learning Accelerator when you want to train thousands of models, train deeper neural networks, and explore more complicated hyperparameter spaces.
Decision Optimization
  • Prepare data
  • Import models
  • Solve models and compare scenarios
  • Visualize data, find solutions, produce reports
  • Save models to deploy with Watson Machine Learning
Use Decision Optimization when you need to evaluate millions of possibilities to find the best solution to a prescriptive analytics problem.
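The train-and-evaluate loop that notebooks and scripts give you full control over can be reduced to a minimal sketch. This example fits a line by ordinary least squares in pure Python purely for illustration; in practice you would use libraries such as scikit-learn or XGBoost, and the training data here is invented.

```python
# Minimal sketch of notebook-style model training and evaluation.
# The data roughly follows y = 2x; all values are made up.

train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.1, 3.9, 6.2, 7.8]

def fit_line(xs, ys):
    """Ordinary least squares: fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return slope, my - slope * mx

def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

slope, intercept = fit_line(train_x, train_y)
error = mse(train_x, train_y, slope, intercept)
print(round(slope, 3), round(intercept, 3), round(error, 4))
```

Whatever the tool, the shape is the same: fit parameters on training data, score the model with an evaluation metric, and keep the candidate that performs best before moving on to deployment.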

To learn about the services and tools that you can use to build your AI models, see:

4. Deploy the models

When operations team members deploy your AI models, the models become available for applications to use for scoring and predictions to help drive actions.

Services and tools you can use | What you can do | Best to use when
Watson Machine Learning
  • Spaces user interface (UI)
  • Deploy models and other assets from projects to spaces by using the user interface.
Deploy models and other assets to test or production environments by using a simple user interface.

Watson Machine Learning
  • Command-line tool (cpdctl)
  • Use the cpdctl command-line tool to manage the lifecycle of models, including the configuration settings, and to automate an end-to-end flow that includes training the model, saving it, creating a deployment space, and deploying the model.
Deploy and manage models in test or production environments from the command line.

Watson OpenScale
  • Python SDK
  • Create and deploy models, and view detailed model results.
  • Test and deploy your models as APIs.
Use the Python SDK for development and automation when you want to configure data, add your machine learning engine, and select and monitor deployments.
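After a model is deployed to a space, applications score it over REST. The sketch below only builds and validates the request body; the feature names and values are hypothetical, and the body follows the `{"input_data": [{"fields": ..., "values": ...}]}` shape used by the Watson Machine Learning scoring API, as this documentation set describes it elsewhere.

```python
import json

# Illustrative helper that assembles a scoring request body for a
# deployed model. Field names and rows are invented for the example;
# sending the request (endpoint URL, deployment ID, auth token) is
# omitted because those values are environment-specific.

def scoring_payload(fields, rows):
    """Build a scoring request body; every row must match the field list."""
    if any(len(row) != len(fields) for row in rows):
        raise ValueError("every row must supply a value for every field")
    return {"input_data": [{"fields": fields, "values": rows}]}

payload = scoring_payload(
    fields=["age", "income", "loan_amount"],      # hypothetical features
    rows=[[42, 55000, 12000], [31, 72000, 8000]], # two records to score
)
body = json.dumps(payload)
print(body)
```

Validating the payload shape client-side, as `scoring_payload` does, catches mismatched rows before they reach the deployment endpoint.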
To learn about the services and tools that you can use to deploy your AI models, see:

5. Monitor the models

After models are deployed, it is important to govern and monitor them to make sure that they are explainable and transparent. Data scientists need to be able to explain how the models arrive at certain predictions so that they can determine whether the predictions have any implicit or explicit bias. In addition, it's a best practice to watch for model performance and data consistency issues during the lifecycle of the model.

Services and tools you can use | What you can do | Best to use when
Watson OpenScale
  • Monitor model fairness issues across multiple features.
  • Monitor model performance and data consistency over time.
  • Explain how the model arrived at certain predictions with weight factors.
  • Maintain and report on model governance and lifecycle across your organization.
Use Watson OpenScale to monitor models when you have features that are protected or that might contribute to prediction fairness, you need to trace model performance and data consistencies over time, or you need to know why the model gives certain predictions.
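One fairness check of the kind Watson OpenScale automates can be sketched in a few lines: the disparate impact ratio, which divides the rate of favorable predictions for a monitored group by the rate for the reference group. This is a conceptual illustration, not the OpenScale SDK; the predictions and group assignments are invented, and the 0.8 threshold is a commonly cited rule of thumb rather than a fixed standard.

```python
# Conceptual sketch of a fairness metric: the disparate impact ratio.
# Ratios well below ~0.8 are commonly flagged as potential bias.
# The prediction lists (1 = favorable outcome) are made up.

def favorable_rate(outcomes):
    """Fraction of predictions that are the favorable outcome (1)."""
    return sum(outcomes) / len(outcomes)

reference_group = [1, 1, 0, 1, 1]   # e.g. predictions for group A
monitored_group = [1, 0, 0, 1, 0]   # e.g. predictions for group B

ratio = favorable_rate(monitored_group) / favorable_rate(reference_group)
print(round(ratio, 2))  # → 0.5
```

A monitoring service computes metrics like this continuously on production scoring traffic and alerts when a monitored feature drifts past the configured threshold.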
To learn about the services and tools that you can use to monitor your AI models, see:

Examples

Case study

To see an end-to-end example of a ModelOps scenario that uses Cloud Pak for Data and some of the key services, read ModelOps approach to modernizing your bank loan department.

Industry accelerators

You can use industry accelerators to help you implement ModelOps processes with Cloud Pak for Data. An industry accelerator is a set of artifacts that help you address common business needs. For example, you might use the Financial Markets Customer Attrition Prediction accelerator, which uses Cloud Pak for Data with Watson Knowledge Catalog, Watson Studio, and Watson Machine Learning to help you predict which customers might leave. You can browse the Accelerators catalog for the Cloud Pak for Data industry accelerators and download the ones that you want to use.