AI for the Enterprise

Keep your AI applications on track with runtime accuracy monitoring

Learn how IBM Watson OpenScale can help you close the feedback loop for your machine learning models:

  • Why is maintaining model accuracy a challenge?
  • What is runtime accuracy monitoring, and how does it work?
  • How can we configure accuracy monitoring in IBM Watson OpenScale?

When we talk about measuring the accuracy of a machine learning model, we need to consider two distinct parts of the model’s lifecycle: the build stage and the deployment stage.

In the build stage, data scientists train a model against a training data set, and then evaluate its performance by comparing the model’s output against the labeled test data to calculate appropriate accuracy metrics.

The type of metric depends on the type of model: for a regression algorithm, you would use the coefficient of determination (R²); for a binary classification algorithm, the area under the ROC curve; and for a multi-class classification algorithm, the number of correctly predicted data points divided by the total number of data points.
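To make the three metrics concrete, here is a minimal pure-Python sketch of what the build stage measures (toy helper functions and data, not any particular library's implementation):

```python
def r2(y_true, y_pred):
    """Coefficient of determination (R^2) for a regression model."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def roc_auc(y_true, y_score):
    """Area under the ROC curve for a binary classifier: the probability
    that a random positive example is scored above a random negative one."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def multiclass_accuracy(y_true, y_pred):
    """Correct predictions of any class, normalized by data-point count."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labeled test data, as a data scientist would use in the build stage.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])          # 0.75
acc = multiclass_accuracy(["a", "b", "c"], ["a", "b", "b"])  # 2 of 3 correct
```

In practice a library such as scikit-learn provides equivalent functions; the point is simply that all three metrics require labeled test data, which is exactly what disappears after deployment.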

These measurements help the data science team judge whether a model is ready for production; if so, it can be integrated into business applications and move into the deployment stage.

Deployment creates a blind spot for accuracy

The deployment stage is when accuracy becomes truly important, because it’s the point when the business starts using the output of the model to make decisions. However, ironically, it’s also the point when accuracy becomes much more difficult to measure. Instead of having a neat set of labeled training and testing data to compare against, the model is handling unlabeled real-world data at scale, and in many cases, there is no easy way to tell whether its accuracy is degrading over time.

If the model does start to falter, and the business doesn’t realize that its recommendations are no longer reliable, it can have a negative impact on decision-making and potentially damage the business. A bank might find that it has rejected customers’ loan applications unnecessarily, for example; or a manufacturer might suffer downtime on its production line because its predictive maintenance model fails to anticipate a breakdown in one of its machines.

To mitigate these risks, data science teams will periodically retrain and redeploy models. However, the choice of when to retrain is typically ad hoc, and often comes too late. As a result, the data science team may spend valuable time retraining models that are still performing well, or may leave invalid models in production for too long, allowing them to make poor predictions that impact business decision-making.

Runtime accuracy monitoring with IBM Watson OpenScale

IBM Watson OpenScale offers a better approach by using its built-in payload and feedback databases to enable accuracy monitoring during runtime.

The payload database captures the real-time data that each model is scoring. A sample of this payload data can be assigned to human reviewers (who could be subject-matter experts from the relevant area of the business, or external crowdsourcing specialists such as Figure Eight) for manual labeling.

Once these experts have labeled the data, it is automatically added to the feedback database. Watson OpenScale uses the feedback database to calculate a new accuracy metric for the model and displays it in the dashboard. From the dashboard, data scientists can click through to examine the specific transactions that generated the inaccurate results and decide whether retraining is justified. This evaluation runs continuously as long as sufficient data is available in the feedback database, providing an up-to-date view of model accuracy at runtime.
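Conceptually, the runtime check reduces to recomputing accuracy over the labeled feedback records. The sketch below illustrates the idea with a hypothetical record schema (the field names are invented, not OpenScale's actual payload format):

```python
# Each feedback record pairs the model's runtime prediction with the label
# a human reviewer later assigned (hypothetical schema for illustration).
feedback = [
    {"prediction": "approve", "label": "approve"},
    {"prediction": "reject",  "label": "approve"},  # an inaccurate transaction
    {"prediction": "approve", "label": "approve"},
    {"prediction": "reject",  "label": "reject"},
]

def runtime_accuracy(records):
    """Fraction of feedback records where the model agreed with the reviewer."""
    correct = sum(r["prediction"] == r["label"] for r in records)
    return correct / len(records)

score = runtime_accuracy(feedback)  # 3 of 4 correct -> 0.75
# The mismatched records are what a data scientist would drill into
# from the dashboard when deciding whether retraining is justified.
inaccurate = [r for r in feedback if r["prediction"] != r["label"]]
```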

By generating real-time alerts if the accuracy score falls below an acceptable threshold, Watson OpenScale helps to ensure that data scientists and operations personnel are immediately notified as the model starts to degrade—reducing the risk of unknowingly using an unreliable model to make decisions.
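The alerting logic amounts to comparing the latest runtime score against the configured threshold. A simplified sketch, where the `notify` hook is a stand-in for OpenScale's actual alert mechanism and the threshold value is an example:

```python
ACCURACY_THRESHOLD = 0.80  # acceptable minimum accuracy (example value)

def check_accuracy(score, threshold=ACCURACY_THRESHOLD, notify=print):
    """Fire an alert when the runtime accuracy falls below the threshold.

    Returns True if the model is within the acceptable range.
    """
    if score < threshold:
        notify(f"ALERT: model accuracy {score:.2%} is below {threshold:.2%}")
        return False
    return True

check_accuracy(0.73)  # triggers the alert; the model has degraded
```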

At the same time, the solution helps to unburden the data science team and empower line-of-business subject-matter experts to play a more active role in model management. These subject-matter experts know the data and the intended business outcomes, so they are often the best people to provide feedback on the model’s predictions. If real feedback data is not available, they are also in the best position to assess whether it is appropriate to source labeled data from alternate sources, such as a crowdsourcing vendor. As a result, by giving these experts greater control of the process, Watson OpenScale can help businesses troubleshoot problems faster and improve results.

Watson OpenScale walkthrough

Let’s take a look at how to set up an Accuracy Monitor in Watson OpenScale. First, navigate to the “What is Accuracy?” page, and click Next.

If your model was created using Apache Spark, you need to select a Spark instance to act as an engine for re-evaluating and retraining the model. You can either select an existing Spark instance or create a new one if necessary.

On the next screen, you specify the type of the model: binary classification, multi-class classification, or regression. This enables Watson OpenScale to calculate the appropriate accuracy metrics for the model type.

Next, you set the accuracy threshold as a percentage between 0 and 100. Once the Accuracy Monitor is up and running, it will automatically notify the model owner when the accuracy score falls below this threshold.

Finally, you set a minimum sample size for the evaluation data set in the feedback database. This will prevent the Accuracy Monitor from running the evaluation and updating the accuracy score until the selected number of records have been added to the feedback database. Setting an appropriate sample size is important, because a small sample could potentially skew the results.
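The skew risk is easy to quantify: under a binomial approximation, the standard error of a measured accuracy shrinks with the square root of the sample size. A pure-Python illustration (not part of OpenScale):

```python
import math

def accuracy_std_error(accuracy, n):
    """Standard error of an accuracy measured over n feedback records,
    using the binomial approximation sqrt(p * (1 - p) / n)."""
    return math.sqrt(accuracy * (1 - accuracy) / n)

# With a true accuracy near 80%, a 25-record sample can easily read several
# points high or low, while 1,000 records pins the estimate down tightly.
small = accuracy_std_error(0.8, 25)    # 0.08 -> roughly +/- 8 points
large = accuracy_std_error(0.8, 1000)  # ~0.013 -> roughly +/- 1 point
```

This is why a too-small minimum sample size can make a healthy model look degraded (or vice versa) purely by chance.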

When you click Save, the Accuracy Monitor will be provisioned and will start to run. To see the results, simply navigate to the Insights tab on the dashboard, where each model will now appear as a separate tile, displaying its current accuracy score and a warning if the score is lower than the threshold.

For more detail, you can click on one of the tiles to view charts and visualizations, showing how accuracy and other key metrics such as fairness and performance are changing over time.

This is just a brief glimpse of how IBM Watson OpenScale can transform the day-to-day management of machine learning models and make it easier to run AI-infused applications safely and at scale.

Discover IBM Watson OpenScale for yourself.
