IBM Cloud Pak® for Data Version 4.6 will reach end of support (EOS) on 31 July 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.
Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.6 reaches end of support. For more information, see Upgrading IBM Software Hub in the IBM Software Hub Version 5.1 documentation.
Tutorial: Training, deploying, and monitoring a model
This tutorial guides you through running a notebook that trains a machine learning model in Watson Studio, and creates an online deployment in Watson Machine Learning. You can then follow the steps to evaluate the deployed model in Watson OpenScale.
Tutorial overview
This tutorial uses the German Credit Risk data set to explore how to build, deploy, and evaluate a machine learning model. This is the same data set used in the Auto Setup for Watson OpenScale to demonstrate evaluation capabilities.
In this tutorial, you will complete the following steps:
- Add the Jupyter notebook for the tutorial as a project asset
- Run the notebook in Watson Studio
- Add the deployed model to your Watson OpenScale dashboard
- Monitor the model in Watson OpenScale
Note: This tutorial uses Watson Machine Learning as the machine learning provider, but you can perform all of the tasks described here with any supported model engine, such as Azure, Amazon Web Services, SPSS, or a custom engine.
Overview of the data sets
The German Credit Risk sample data is a collection of bank customer records that was used to train the sample model. It contains 20 attributes for each loan applicant. The sample models provisioned as part of the auto setup are trained to predict the level of credit risk for new customers. Two of the attributes considered for the prediction, sex and age, can be tested for bias to make sure that outcomes are consistent regardless of the gender or age of customers.
The sample data is structured (in rows and columns) and saved in a .csv file format.
Before you begin
Complete the following steps to prepare for the tutorial:
- Make sure you are provisioned to use Watson Studio, Watson Machine Learning, and Watson OpenScale.
- Create a project, where you can run the sample pipeline, and name it Credit risk.
- Create a deployment space, where you can view and test the results, and name it Credit risk - preproduction. Copy the space GUID from the Manage tab. You will need this when you run the notebook.
- Download the Sample training data file
You can view the sample data file in a text editor or spreadsheet program.
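If you prefer to inspect it programmatically, a quick pandas preview works too. This is a minimal sketch; the file name is an assumption, so substitute the name of the file that you downloaded.

```python
# Quick preview of the downloaded training data with pandas.
# The file name below is an assumption; use the name of your downloaded file.
import pandas as pd

df = pd.read_csv("german_credit_data_biased_training.csv")
print(df.shape)                    # number of applicants and columns
print(df.columns.tolist())         # the 20 attributes plus the Risk label
print(df["Risk"].value_counts())   # distribution of the target values
```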

Run the sample notebook in a project
In this section, you will run the sample notebook to train a model that can predict credit risk for prospective borrowers.
Adding the notebook to a project
- From the Credit risk project, select New asset and choose Jupyter notebook editor.
- Enter Credit risk as the name for your new notebook, then click From URL and paste this URL: https://github.com/IBM/watson-openscale-samples/blob/main/Cloud%20Pak%20for%20Data/WML/notebooks/binary/spark/Watson%20OpenScale%20and%20Watson%20ML%20Engine.ipynb
Configuring and running the notebook
You must specify a runtime environment, then provide credentials to run the notebook.
- Click the information icon, then choose a Spark kernel from the Environments list. For example, you can pick Spark 3.3 with Python 3.9.
- Follow the instructions in the notebook and provide the following information where indicated in the notebook cells:
- API credentials from IBM Cloud
- COS service credentials from IBM Cloud
- The deployment space GUID that you copied in the prerequisites
- Run each cell of the notebook to complete the following steps (a condensed code sketch follows this list):
- Load the sample data
- Train the model to predict a binary value for Risk.
- Store the model in Watson Machine Learning
- Create an online deployment
- Connect to Watson OpenScale and supply the model details required to connect to Cloud Object Storage
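The following sketch condenses what those cells do with the ibm-watson-machine-learning Python client. It is not the notebook's exact code: the credential values, the space GUID, and the pipeline_model, pipeline, and train_df variables are placeholders for objects that the tutorial notebook defines itself.

```python
# Condensed sketch of the notebook's Watson Machine Learning steps (not the
# notebook's exact code). Replace the placeholder credentials and space GUID;
# pipeline_model, pipeline, and train_df are produced by the notebook's
# data-loading and training cells.
from ibm_watson_machine_learning import APIClient

wml_credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",  # region endpoint (assumption)
    "apikey": "<IBM_CLOUD_API_KEY>",
}
client = APIClient(wml_credentials)
client.set.default_space("<DEPLOYMENT_SPACE_GUID>")  # GUID copied from the Manage tab

# Store the trained Spark ML pipeline model in the deployment space.
software_spec_uid = client.software_specifications.get_uid_by_name("spark-mllib_3.3")
model_meta = {
    client.repository.ModelMetaNames.NAME: "German credit risk model",
    client.repository.ModelMetaNames.TYPE: "mllib_3.3",
    client.repository.ModelMetaNames.SOFTWARE_SPEC_UID: software_spec_uid,
}
stored_model = client.repository.store_model(
    model=pipeline_model,    # trained pyspark.ml PipelineModel
    meta_props=model_meta,
    training_data=train_df,  # Spark DataFrame used for training
    pipeline=pipeline,       # the unfitted pyspark.ml Pipeline
)
model_uid = client.repository.get_model_uid(stored_model)

# Create the online deployment that Watson OpenScale evaluates later.
deployment = client.deployments.create(
    model_uid,
    meta_props={
        client.deployments.ConfigurationMetaNames.NAME: "German credit risk online",
        client.deployments.ConfigurationMetaNames.ONLINE: {},
    },
)
deployment_uid = client.deployments.get_uid(deployment)
```

If you want to verify the deployment before moving on, you can send a test record to it with client.deployments.score.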
You are now ready to evaluate the deployed model in Watson OpenScale.
Monitor the model in Watson OpenScale
To evaluate the credit risk model, you start by launching Watson OpenScale, then add the deployment to the Watson OpenScale dashboard and configure details about the model.
Adding the deployment to the Watson OpenScale dashboard
- Launch Watson OpenScale from the Services list.
- From the System setup tab, click Machine Learning providers, then click Add new provider to connect to the space in Watson Machine Learning that contains the deployed model.
- On the connection page, edit the name of the provider and enter Credit risk model.
- Edit the Connection section and choose Watson Machine Learning as the provider.
- Choose Credit risk - preproduction as the space and Pre-production as the space type. Save your choices. A notification confirms the new connection. You are now ready to configure evaluations for a model in the selected space.
Connecting to the Credit risk deployment
Now that you have connected to your deployment space, you can connect to the deployment you created in that space and set up evaluations.
- Click the Insights tab and choose Add to dashboard.
- Choose Credit risk - preproduction from the provider list.
- Select German credit risk online as the deployment to evaluate and click Configure.
- Your connection is confirmed. Click Configure monitors to configure an evaluation of the deployment.
- Choose Fairness as the monitor to configure.
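If you want to confirm these connections from a notebook instead of the UI, the ibm-watson-openscale Python client can list what the dashboard shows. This is a minimal sketch that assumes an IBM Cloud API key; on Cloud Pak for Data you would construct the client with CloudPakForDataAuthenticator and your platform credentials instead.

```python
# Minimal sketch: list the machine learning providers and subscriptions that
# the Watson OpenScale dashboard displays. Assumes the ibm-watson-openscale
# package and an IBM Cloud API key (placeholder value below).
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson_openscale import APIClient

wos_client = APIClient(authenticator=IAMAuthenticator(apikey="<IBM_CLOUD_API_KEY>"))

wos_client.service_providers.show()  # should list the "Credit risk model" provider
wos_client.subscriptions.show()      # should list the German credit risk online deployment
```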
Configuring the deployment
In this section, you provide some details about the model and how to connect to the labeled training data you used to train the Credit risk model. You will use the training data to evaluate whether the model is properly trained to predict fair outcomes.
- Edit the Model input section.
- Choose Numerical/categorical as the data type.
- Choose Binary classification as the algorithm type.
Because the notebook supplied metadata such as the authentication credentials for IBM Cloud and the location of your resources in Cloud Object Storage, the rest of the deployment configuration is completed automatically.

Evaluating the Credit Risk model for fairness
To evaluate whether response outputs from the model are fair, results are divided into groups. The Reference groups are the groups that are considered most likely to receive favorable outcomes. In this case, the Reference groups are male customers and customers over the age of 25. The Monitored groups are the groups that you review to make sure that their results do not differ greatly from the results for the Reference groups. In this case, the Monitored groups are female customers and customers aged 18 - 25.
To set the thresholds:
- Click Fairness from the Evaluations list to configure the Fairness monitor.
- Edit the Favorable section to specify "No risk" as the favorable outcome and "Risk" as the unfavorable outcome.
- Specify 100 as the minimum sample size and leave the maximum size blank.
- Add Age as a feature to evaluate, where the Monitored group is 18 - 25 and the Reference group is 26 - 55, and assign a Fairness threshold value of 95%.
- Add Sex as a second feature to evaluate, where the Monitored group is "female" and the Reference group is "male", and assign a Fairness threshold value of 95%.
With these settings, an alert is triggered if the evaluation finds that the rate of favorable outcomes for a Monitored group falls more than 5% below the rate for the corresponding Reference group.
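To illustrate how the threshold works, the following sketch applies the same ratio calculation (often called disparate impact) to made-up counts; the numbers are hypothetical and only show why a result below 95% raises an alert.

```python
# Simplified fairness (disparate impact) calculation:
# the favorable-outcome rate of the monitored group divided by the
# favorable-outcome rate of the reference group.
def fairness_score(monitored_favorable, monitored_total,
                   reference_favorable, reference_total):
    monitored_rate = monitored_favorable / monitored_total
    reference_rate = reference_favorable / reference_total
    return monitored_rate / reference_rate

# Hypothetical example: 70% of the reference group (male) but only 63% of the
# monitored group (female) receive the favorable "No Risk" outcome.
score = fairness_score(monitored_favorable=63, monitored_total=100,
                       reference_favorable=70, reference_total=100)
print(f"Fairness score: {score:.0%}")  # 90%, below the 95% threshold, so an alert is raised
```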
Running the evaluation and viewing the results
To run the Fairness evaluation:
- Return to the dashboard and click Evaluate now for the deployment.
- When you are prompted to add Test data for the evaluation, upload the sample test data file in CSV format that you downloaded as part of setup.
- Click Evaluate to start the test.
The results of the evaluation show that the fairness test for age passed, but that the fairness test for sex failed, as the outcome for the monitored group (female) was below the fairness threshold set in relation to the outcome for the reference group (male).

The evaluation demonstrated that there could be bias in your model. The source of the bias could be an insufficient number of records in your sample training data for the monitored group, or it could indicate a problem with your model.
Next steps
Use the sample deployment to configure other evaluations, such as monitoring for quality. Learn how to interpret and address results in wos-insight-overview.html.
Parent topic: Evaluating AI models with Watson OpenScale