Quality metrics overview
Use quality monitoring to determine how well your model predicts outcomes. When quality monitoring is enabled, it generates a set of metrics every hour by default. You can generate these metrics on demand by clicking the Check quality now button or by using the Python client.
Quality metrics are calculated based on the following information:
- manually labeled feedback data
- the monitored deployment's responses for that data
For proper monitoring, feedback data must be logged to Watson OpenScale on a regular basis. You can provide feedback data by using the Add feedback data option, the Python client, or the REST API.
For machine learning engines other than IBM Watson Machine Learning, such as Microsoft Azure ML Studio, Microsoft Azure ML Service, or Amazon SageMaker, quality monitoring creates additional scoring requests on the monitored deployment.
You can review all metric values over time on the Watson OpenScale dashboard. Some metrics also provide related details, such as the confusion matrix for binary and multiclass classification; click the chart to review them.
Supported quality metrics
The following quality metrics are supported by Watson OpenScale:
Binary classification problems
For binary classification models, Watson OpenScale tracks when the quality of the model falls below an acceptable level by checking the Area under ROC score, which measures the model's ability to distinguish between the two classes. The higher the Area under ROC score, the better the model is at identifying class A as class A and class B as class B.
- Area under ROC
- True positive rate (TPR)
- Logarithmic loss
- False positive rate (FPR)
- Area under PR
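As an illustrative sketch (using scikit-learn rather than the Watson OpenScale client), the binary metrics in the list above can be computed from hypothetical feedback labels `y_true` and the deployment's predicted probabilities `y_prob`:

```python
# Hypothetical feedback labels and deployment scores; not OpenScale API calls.
from sklearn.metrics import (
    roc_auc_score, log_loss, average_precision_score, confusion_matrix
)

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                     # manually labeled feedback
y_prob = [0.1, 0.4, 0.8, 0.7, 0.9, 0.3, 0.6, 0.2]     # predicted P(class 1)
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]       # thresholded predictions

# Confusion-matrix cells give the rate-based metrics directly.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
metrics = {
    "area_under_roc": roc_auc_score(y_true, y_prob),
    "true_positive_rate": tp / (tp + fn),
    "false_positive_rate": fp / (fp + tn),
    "log_loss": log_loss(y_true, y_prob),
    "area_under_pr": average_precision_score(y_true, y_prob),
}
print(metrics)
```

With this toy data every positive scores above every negative, so Area under ROC is 1.0; real feedback data rarely separates so cleanly.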
Regression problems
For regression models, Watson OpenScale tracks when the quality of the model falls below an acceptable level by checking the R squared score. R squared measures the correlation between predicted values and actual values; the higher the R squared score, the better the model fits the actual values.
- R squared
- Proportion explained variance
- Root mean squared error
- Mean absolute error
- Mean squared error
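A minimal sketch of the regression metrics above, again with scikit-learn on hypothetical feedback labels `y_true` and deployment predictions `y_pred` (this is not the OpenScale client):

```python
# Hypothetical regression feedback data; not OpenScale API calls.
from math import sqrt
from sklearn.metrics import (
    r2_score, explained_variance_score, mean_absolute_error, mean_squared_error
)

y_true = [3.0, -0.5, 2.0, 7.0]   # manually labeled actual values
y_pred = [2.5, 0.0, 2.0, 8.0]    # monitored deployment responses

mse = mean_squared_error(y_true, y_pred)
metrics = {
    "r_squared": r2_score(y_true, y_pred),
    "explained_variance": explained_variance_score(y_true, y_pred),
    "root_mean_squared_error": sqrt(mse),
    "mean_absolute_error": mean_absolute_error(y_true, y_pred),
    "mean_squared_error": mse,
}
print(metrics)
```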
Multiclass classification problems
For multiclass classification models, Watson OpenScale tracks when the quality of the model falls below an acceptable level by checking the Accuracy score, which is the percentage of predictions the model got right.
- Weighted True Positive Rate (wTPR)
- Weighted False Positive Rate (wFPR)
- Weighted recall
- Weighted precision
- Weighted F1-Measure
- Logarithmic loss
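The weighted metrics above average the per-class scores, weighting each class by its number of feedback records. A sketch with scikit-learn on hypothetical multiclass feedback data (not the OpenScale client; weighted recall equals the weighted true positive rate):

```python
# Hypothetical multiclass feedback labels and predictions.
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

y_true = ["cat", "dog", "bird", "cat", "dog", "bird"]
y_pred = ["cat", "dog", "cat", "cat", "dog", "bird"]

# average="weighted" weights each class's score by its support.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "weighted_precision": precision,
    "weighted_recall": recall,   # weighted true positive rate
    "weighted_f1_measure": f1,
}
print(metrics)
```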
Supported quality details
The following details for quality metrics are supported by Watson OpenScale:
The confusion matrix helps you to understand which of your feedback records the monitored deployment scored correctly and which it did not.
For more information, see Confusion matrix.
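To illustrate the structure behind this detail view, here is a sketch that builds a confusion matrix with scikit-learn from hypothetical feedback labels and deployment responses (the class names are made up for the example):

```python
# Hypothetical binary feedback data with made-up class names.
from sklearn.metrics import confusion_matrix

labels = ["approved", "rejected"]
y_true = ["approved", "rejected", "approved", "approved", "rejected"]
y_pred = ["approved", "approved", "approved", "rejected", "rejected"]

# Rows are the actual classes, columns the predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
```

Each cell counts feedback records: the diagonal holds correct predictions, and off-diagonal cells show which classes the deployment confuses with which.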
After Watson OpenScale detects a quality problem, such as an accuracy threshold violation, you must build a new version of the model that fixes the problem: retrain the model on the original training data combined with the manually labeled data from the feedback table.