Adaptive thresholds in Smart Alerts
Adaptive thresholds are a powerful feature in Instana's alerting system that automatically adjusts to changing patterns in your metrics. Unlike static thresholds, which remain fixed regardless of your application's behavior, adaptive thresholds continuously evolve with your data, making them ideal for dynamic environments.
Adaptive thresholds (red line) follow the natural pattern of the metric data (blue line), adjusting automatically as the metric behavior changes.
When to use adaptive thresholds
Adaptive thresholds work best in the following scenarios:
- Metrics with gradual changes: When the underlying metric is expected to change gradually over time, but sudden deviations from this trend are undesirable.
- Seasonal metrics: When metrics follow daily or weekly patterns, and you need different thresholds for different times of the day or week.
- Metrics that evolve over time: When baseline behavior changes as your application grows or usage patterns shift.
How adaptive thresholds work
Forecasting algorithms
Adaptive thresholds use an advanced time-series forecasting algorithm that captures both trends and seasonal patterns in your data. The system performs the following steps:
- Analyzes historical data to identify patterns.
- Predicts future values based on these patterns.
- Calculates confidence intervals around these predictions.
- Uses these intervals to determine appropriate thresholds.
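The four steps above can be illustrated with a deliberately simple stand-in. The following sketch is not the forecasting algorithm that Instana uses. It assumes a seasonal-naive forecast (the mean of past values at the same position in the season) and a band of `z` residual standard deviations. The function name `seasonal_forecast_band` and the parameters `season_length` and `z` are hypothetical.

```python
import statistics


def seasonal_forecast_band(history, season_length, z=3.0):
    """Forecast the next point and a band around it.

    Illustrative only: a seasonal-naive forecast with a band of
    z residual standard deviations (an assumption, not Instana's model).
    """
    if len(history) < 2 * season_length:
        raise ValueError("need at least two full seasons of history")

    # Steps 1-2: identify the pattern and predict the next value from the
    # past values observed at the same position within the season.
    phase = len(history) % season_length
    same_phase = history[phase::season_length]
    prediction = statistics.fmean(same_phase)

    # Step 3: estimate the spread from season-over-season differences.
    residuals = [
        history[i] - history[i - season_length]
        for i in range(season_length, len(history))
    ]
    sigma = statistics.pstdev(residuals)

    # Step 4: the band edges are candidate lower and upper thresholds.
    return prediction, prediction - z * sigma, prediction + z * sigma
```

For example, with `history = [10, 20, 30, 40, 12, 20, 30, 40, 14, 20, 30, 40]` and `season_length=4`, the next point falls on the first position of the season, so the prediction is the mean of 10, 12, and 14.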
Training period
The system requires historical data to establish reliable patterns. The required training period varies by alert type:
- Application Smart Alerts: At least 14 days of continuous metric data.
- Website Smart Alerts: At least 5 days of continuous metric data.
When you configure an adaptive threshold, the system:
- Analyzes existing historical metric data.
- Identifies baseline behavior.
- Detects seasonality patterns (if any).
- Establishes initial threshold values.
If sufficient historical data is available, the adaptive threshold becomes operational upon configuration. Once initialized, the system continuously updates as new data arrives, gradually adapting to changes in your application's behavior.
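Before you configure an adaptive threshold, you might want to check whether a metric already has enough history. The following sketch encodes the minimum training periods listed above. The function name `has_enough_history` and the `max_gap` parameter (how large a gap still counts as continuous data) are assumptions for illustration and are not part of any Instana API.

```python
from datetime import datetime, timedelta

# Minimum training periods from the list above, in days.
MIN_TRAINING_DAYS = {"application": 14, "website": 5}


def has_enough_history(alert_type, timestamps, max_gap=timedelta(hours=1)):
    """Return True if the most recent gap-free run of sorted timestamps
    spans at least the required training period.

    max_gap is a hypothetical tolerance for what counts as continuous.
    """
    required = timedelta(days=MIN_TRAINING_DAYS[alert_type])
    if len(timestamps) < 2:
        return False

    # Walk backwards from the newest sample and stop at the first gap
    # that is larger than max_gap.
    start = timestamps[-1]
    pairs = zip(reversed(timestamps[:-1]), reversed(timestamps[1:]))
    for earlier, later in pairs:
        if later - earlier > max_gap:
            break
        start = earlier
    return timestamps[-1] - start >= required
```

With hourly samples that cover exactly 14 days without gaps, the check passes for both alert types. A long gap in the middle of the data resets the continuous run, so only the samples after the gap count.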
Model retraining
The system periodically retrains to make sure that it stays current with the evolving nature of your metrics. This retraining is important when:
- The seasonality pattern of a metric changes, for example, from nonseasonal to daily seasonal.
- Usage patterns shift because of business changes.
- Application behavior evolves after deployments or configuration changes.
This automatic retraining makes sure that your thresholds remain relevant without manual intervention, even as the fundamental characteristics of your metrics change over time. Currently, the system retrains models once per day.
Key parameters
Sensitivity
The sensitivity parameter controls how tightly the threshold fits around your metric data:
- Range: 1-16.
- Function: Controls the number of standard deviations (sigma) used to calculate threshold values.
- Effect:
  - Lower values correspond to higher sensitivity and create tighter thresholds that detect smaller changes.
  - Higher values correspond to lower sensitivity and create wider thresholds that detect only major changes.
High sensitivity setting (lower value) creates tighter thresholds around the metric (blue line), detecting smaller deviations.
Low sensitivity setting (higher value) creates wider thresholds around the metric (blue line), only detecting major deviations.
For alert conditions that use the > operator (for example, when response latency exceeds a threshold), the threshold is calculated by adding a margin to the predicted value. The margin is the sensitivity value multiplied by the calculated deviation.
For alert conditions that use the < operator (for example, when availability drops below a threshold), the threshold is calculated by subtracting a margin from the predicted value. The margin is calculated in the same way: the sensitivity value multiplied by the calculated deviation.
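The relationship between the predicted value, the deviation, and the sensitivity can be written as a short calculation. This is a minimal sketch of the description above, not Instana code. The function names `adaptive_threshold` and `is_violated` are hypothetical, and `deviation` stands for whatever deviation estimate the system calculates internally.

```python
def adaptive_threshold(predicted, deviation, sensitivity, operator):
    """Return the threshold for one point in time.

    The margin is the sensitivity (1-16) multiplied by the deviation.
    It is added for '>' conditions and subtracted for '<' conditions.
    """
    if not 1 <= sensitivity <= 16:
        raise ValueError("sensitivity must be between 1 and 16")
    margin = sensitivity * deviation
    if operator == ">":
        return predicted + margin
    if operator == "<":
        return predicted - margin
    raise ValueError("operator must be '>' or '<'")


def is_violated(value, threshold, operator):
    """Check a metric value against the threshold for the given operator."""
    return value > threshold if operator == ">" else value < threshold
```

For a predicted latency of 200 ms with a deviation of 10 ms, a sensitivity of 3 places the > threshold at 230 ms, while a sensitivity of 8 places it at 280 ms.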
How the system handles outliers
To prevent extreme values from distorting the thresholds, the system uses a clipping mechanism:
- During initialization: Extreme values are clipped based on percentiles of the historical data.
- After initialization: Values are clipped based on the system's predictions and confidence intervals.
This mechanism makes sure that occasional spikes or drops do not influence the threshold calculations while still allowing the system to adapt to genuine changes in metric behavior.
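The two clipping phases can be sketched as follows. The exact percentiles and interval widths that Instana uses are not documented here, so the defaults `lower_pct=1`, `upper_pct=99`, and `width=4.0` are assumptions, as are the function names `clip_initial` and `clip_online`.

```python
import statistics


def clip_initial(history, lower_pct=1, upper_pct=99):
    """Initialization phase: clip values to percentiles of the history.

    The percentile choices are illustrative assumptions.
    """
    # quantiles(n=100) returns 99 cut points; index p - 1 is percentile p.
    cuts = statistics.quantiles(history, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(value, low), high) for value in history]


def clip_online(value, predicted, deviation, width=4.0):
    """After initialization: clip a new value to a band around the prediction.

    width (in deviations) is an illustrative assumption.
    """
    low = predicted - width * deviation
    high = predicted + width * deviation
    return min(max(value, low), high)
```

In this sketch, a single spike in otherwise flat history is pulled down to the upper percentile before it can widen the thresholds, and a new value far outside the predicted band is limited to the band edge.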
Best practices
Choose the right sensitivity
- Higher sensitivity (1-2): Use for critical metrics where small deviations matter.
- Medium sensitivity (2-4): Use as a default for most metrics.
- Lower sensitivity (4-16): Use for noisy metrics or when you only want alerts for major deviations.
Choosing the right adaptability (private preview)
- Lower adaptability (0.0-0.1): Use when threshold stability is more important than responsiveness to metric fluctuations.
- Medium adaptability (0.1-0.5): Use for a balance between threshold stability and responsiveness.
- Higher adaptability (0.5-1.0): Use when thresholds must adjust rapidly to changing metric patterns.
Seasonality selection (private preview)
- Start with Auto detection.
- Switch to No seasonality if your metric does not follow regular patterns.
- Select Daily or Weekly if you know that your metric follows these specific patterns.
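If you are unsure whether a metric follows a daily or weekly pattern, you can inspect an export of its values yourself before you choose a setting. The following sketch uses autocorrelation at lags of 24 and 168 hours on hourly data. It is one common heuristic and is not how the Auto detection option works internally. The function names and the `min_strength` cutoff are assumptions.

```python
import statistics


def autocorrelation(values, lag):
    """Sample autocorrelation of values at the given lag."""
    n = len(values)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = statistics.fmean(values)
    variance = sum((v - mean) ** 2 for v in values)
    if variance == 0:
        return 0.0  # A constant series has no pattern to detect.
    covariance = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


def suggest_seasonality(hourly_values, min_strength=0.5):
    """Suggest 'Daily', 'Weekly', or 'No seasonality' for hourly data.

    min_strength is an illustrative cutoff, not a documented value.
    """
    candidates = {"Daily": 24, "Weekly": 168}
    scores = {
        name: autocorrelation(hourly_values, lag)
        for name, lag in candidates.items()
        if lag < len(hourly_values)
    }
    if not scores:
        return "No seasonality"
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_strength else "No seasonality"
```

A series needs several full cycles for the score to be meaningful, so use at least two weeks of hourly data for a daily check and about four weeks for a weekly check.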
Known limitations
Adaptive thresholds provide powerful capabilities for dynamic alerting, but the following limitations apply:
- Tag filtering restrictions: Some tags might not be available for filtering with adaptive thresholds. The system limits which tags you can combine with adaptive threshold configurations.
- Application Smart Alerts grouping: Application Smart Alerts do not support grouping by endpoint when you use adaptive thresholds. If you need endpoint-level grouping, use static thresholds.
Conclusion
Adaptive thresholds provide a powerful way to create dynamic, self-adjusting alerts that evolve with your application. By understanding how the sensitivity, adaptability, and seasonality settings work together, you can fine-tune your alerts to catch real issues while minimizing false positives.
Adaptive thresholds require sufficient historical data to establish reliable patterns. The specific training period depends on the alert type (see the Training period section for details). When enough data is available, the threshold becomes operational immediately after configuration. With proper configuration, adaptive thresholds can significantly reduce alert noise and help you focus on what truly matters.