Published: 18 January 2024
Contributor: Jim Holdsworth
Model drift refers to the degradation of model performance due to changes in data or changes in relationships between input and output variables. Model drift—also known as model decay—can negatively impact model performance, resulting in faulty decision-making and bad predictions.
To detect and mitigate drift, organizations can monitor and manage performance on their data and artificial intelligence platform. Performance of the model might begin well, but if not properly monitored over time, even the most well-trained, unbiased AI model can “drift” from its original parameters and produce unwanted results once deployed.
If an AI model’s training doesn’t align with incoming data, it can’t accurately interpret that data or use that live data to reliably make accurate predictions. If drift isn’t detected and mitigated quickly, the degradation can compound, increasing the harm to operations.
Models built using historical data can quickly become stagnant. In many cases, new data points are always coming in—new variations, new patterns, new trends—that the old historical data cannot capture.
The world is constantly changing, so with constantly changing data, models used to make sense of the world must be constantly reviewed and updated. Here are three types of model drift that need to be addressed, each with a different cause.
First is concept drift, which occurs when the relationship between the input variables and the target variable shifts; at that point the algorithm begins to provide incorrect answers because the patterns it learned are no longer valid. The shift can take effect over a variety of time periods:
The concept drift reoccurs and recedes on a regular basis, such as for the seasonality of buying behavior in response to weather changes. In wintry climates, snow shovel and snow blower sales will normally increase in the late autumn and early winter. Geographic adjustments must also be made for expected snowfall.
An unexpected development can drive new buying patterns. An example would be the sudden publicity around ChatGPT creating increased demand for AI hardware and software products, and a boost to the stock value of AI-related companies. A forecasting model trained before those news stories were published could not predict subsequent results. Another example is the arrival of the COVID-19 pandemic, which also created a sudden shift in behavior: sales of games and exercise equipment spiked, while restaurants and hotels saw far fewer visitors.
Some drift happens gradually, or at an expected pace. For example, spammers and hackers have used a variety of tools and tricks over the years. As protective software and spam filters have improved, bad actors have upped their game accordingly. Any AI designed to provide protection for digital interactions needs to keep pace; a static model will soon be useless.
Second is data drift, where the underlying data distribution of the input data has changed. In retail, the sales of a product could be impacted by the introduction of another new product or the retirement of a competing product. Or if a website is first adopted by young people, but then gains acceptance from older people, the original model based on the usage patterns of younger users might not perform as well with the older user base.
Third is an upstream data change, which occurs when there is a change in the data pipeline. For example, upstream data could be changed to a different currency, such as USD vs Euros, or measurements in miles rather than kilometers or temperatures in Fahrenheit instead of Celsius. Such a change would throw off a model that wasn’t built to account for the change in how the data was labeled.
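One simple guard against such upstream changes is to compare the scale of incoming features against the training baseline. The sketch below is illustrative, not from the article: the tolerance ratio and feature values are assumptions, and a real pipeline would track this per feature with proper statistics.

```python
# Hypothetical sketch: flag a sudden unit change in an upstream feature by
# comparing the live mean against the training baseline. The tolerance
# ratio and toy values are illustrative assumptions.

def detect_scale_shift(baseline_mean, live_values, ratio_tolerance=0.25):
    """Return True if the live mean deviates from the baseline mean by more
    than the tolerance ratio -- a crude signal of a unit or currency change."""
    live_mean = sum(live_values) / len(live_values)
    ratio = live_mean / baseline_mean
    return abs(ratio - 1.0) > ratio_tolerance

# A distance feature trained in kilometers but now arriving in miles
# (1 km ~= 0.62 miles) shifts the mean by roughly 38%, tripping the check.
baseline_km_mean = 120.0
live_miles = [70.0, 78.0, 74.0, 76.0]
print(detect_scale_shift(baseline_km_mean, live_miles))  # True
```

A check like this catches only gross scale shifts (miles vs kilometers, Fahrenheit vs Celsius); subtler pipeline changes need distribution-level monitoring.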
To detect and help correct model drift, organizations should consider the following.
The accuracy of an AI model can degrade within days of deployment because production data diverges from the model’s training data. This can lead to incorrect predictions and significant risk exposure. Organizations should use an AI program and monitoring tools that automatically detect when a model’s accuracy decreases (or drifts) below a pre-set threshold. This program for detecting model drift should also track which transactions caused the drift, enabling them to be re-labelled and used to retrain the model, restoring its predictive power during runtime.
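The threshold-and-trace pattern described above can be sketched in a few lines. The threshold value and record format here are illustrative assumptions, not part of any specific monitoring product.

```python
# Hypothetical sketch of a threshold-based drift alert: compare live accuracy
# to a pre-set threshold and, when it drops below, return the misclassified
# transactions so they can be re-labelled and used for retraining.

def check_accuracy_drift(predictions, labels, threshold=0.85):
    """Return (accuracy, indices of transactions to re-label)."""
    correct = [p == y for p, y in zip(predictions, labels)]
    accuracy = sum(correct) / len(correct)
    if accuracy >= threshold:
        return accuracy, []
    # Accuracy drifted below the threshold: collect the transactions
    # that drove the drift for re-labelling and retraining.
    misses = [i for i, ok in enumerate(correct) if not ok]
    return accuracy, misses

acc, to_relabel = check_accuracy_drift([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])
print(acc, to_relabel)  # 0.6 [1, 3]
```

In production, the flagged transactions would be routed to a labelling queue rather than returned as a list, but the control flow is the same.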
There are two ways to measure drift. The first is statistical, using statistical metrics; this is often easier to implement because most of the metrics are usually already in use within the enterprise. The second is model-based, which measures the similarity between a point (or group of points) and a reference baseline.
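As an example of the statistical approach, one widely used metric is the Population Stability Index (PSI), which compares the binned distribution of a feature in production against its distribution in training. The bin fractions and the 0.2 alert threshold below are common conventions, not values from the article.

```python
import math

# Minimal sketch of a statistical drift metric: the Population Stability
# Index (PSI), comparing a live histogram against the training baseline.

def psi(expected_fracs, actual_fracs, eps=1e-6):
    """Sum of (actual - expected) * ln(actual / expected) over bins.
    Common rule of thumb: < 0.1 stable, 0.1-0.2 moderate shift,
    > 0.2 significant drift."""
    total = 0.0
    for e, a in zip(expected_fracs, actual_fracs):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) on empty bins
        total += (a - e) * math.log(a / e)
    return total

training_dist   = [0.25, 0.25, 0.25, 0.25]  # reference (training) histogram
production_dist = [0.10, 0.20, 0.30, 0.40]  # live-data histogram
print(psi(training_dist, production_dist) > 0.2)  # True: significant drift
```

A model-based measure would instead train a classifier to distinguish training points from production points; if it succeeds better than chance, the distributions have diverged.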
Organizations should test their AI models periodically throughout their lifecycle. This testing ideally includes:
According to a Forrester Total Economic Impact study, “By building, running and managing models in a unified data and AI environment, [organizations] can ensure that the AI models remain fair, explainable and compliant anywhere. This end-to-end AI approach also uniquely empowers an organization to detect and help correct model drift and bias, and manage model risk when an AI model is in production.”
A best practice is to manage all models from a central dashboard. An integrated approach can help an organization track metrics continually and alert teams to drift in accuracy and data consistency through development, validation and deployment. A centralized, holistic view can help organizations break down silos and provide more transparency across the entire data lineage.
Detect drift scenarios and magnitude through an AI model that compares production and training data and model predictions in real time. This way, drift can be found quickly and retraining begun immediately. This detection is iterative, just as machine learning operations (MLOps) is iterative.
Time-based analysis is helpful for seeing how drift evolved and when it occurred. For example, if checks are run daily, the results show how drift evolved day by day. Analyzing timelines can also help determine whether the drift was gradual or sudden.
Use a new training dataset that has more recent and relevant samples added to it. The goal is to get your large language models (LLMs) back into production quickly and correctly. If retraining the model doesn’t resolve the issue, then a new model may be needed.
Rather than train a model with batch data, organizations can practice “online learning” by having their machine learning (ML) models updated using the latest real-world data as soon as it is available.
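Online learning can be sketched as a model whose weights are nudged by each new observation as it arrives, instead of being refit on a batch. The model, learning rate and toy data stream below are illustrative assumptions.

```python
# Minimal sketch of online learning: a one-feature linear model updated one
# observation at a time via stochastic gradient descent, rather than
# retrained in batch. Learning rate and data are illustrative assumptions.

class OnlineLinearModel:
    def __init__(self, lr=0.05):
        self.w = 0.0   # slope
        self.b = 0.0   # intercept
        self.lr = lr   # learning rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        """Nudge the weights toward the newest real-world observation."""
        error = self.predict(x) - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error

model = OnlineLinearModel()
# Stream data that follows y = 2x; the model adapts as points arrive.
for x, y in [(1, 2), (2, 4), (3, 6)] * 200:
    model.update(x, y)
print(round(model.predict(4), 1))
```

Libraries such as scikit-learn expose the same idea through incremental-fit APIs, so the model can keep pace with live data instead of waiting for the next batch retrain.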
A model can appear to drift because the data used to train it is significantly different from the actual production data that will be used. In a medical use case, if high-resolution scans are used in training, but only low-resolution scans are available in the field, then the results will be incorrect.