Service level objective (SLO) Configuration Examples
Example 1: Application SLO with latency blueprint
Objective: Ensure 90% of the calls of the "Robot-shop" application have an average latency better than 100 ms over a rolling period of 1 week.
The configuration of the SLO would be:
- Entity: Robot-shop application
- Scope:
- Boundary: All services
- Include internal calls: false
- Include synthetic calls: false
- Service: All services
- Endpoint: All endpoints
- Indicator:
- Blueprint: Latency
- Type: Time
- Aggregation: mean
- Threshold: 100 ms
- Objective:
- SLO target: 90%
- Time window type: Rolling
- Time window length: 1 week
Scenario: Assuming the SLO had 400 bad minutes over the week-long SLO time window (400 minutes had mean latency > 100 ms) starting from 2025-03-04:
The error budget for this SLO would be calculated as:
- Minutes in time period x (1 - SLO target percentage)
- Total minutes in time window: 24 × 60 × 7 minutes in 1 week = 10080 minutes
- SLO target percentage: 90% (0.9)
- Error budget: 10080 x (1 - 0.9) = 1008 minutes
The SLO status would be calculated as:
- SLO status = 100% x (total minutes in time window - bad minutes in time window) / total minutes in time window
- 100% x (10080 total minutes - 400 bad minutes) / 10080 total minutes = 96.03%
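The arithmetic above can be reproduced with a short Python sketch. The function name `time_based_slo` and its result keys are illustrative assumptions, not part of the product. The remaining error budget is derived the same way as in the later examples.

```python
def time_based_slo(total_minutes: int, bad_minutes: int, target: float) -> dict:
    """Compute error budget and status for a time-based SLO (hypothetical helper)."""
    # Error budget: minutes in the window that are allowed to be bad.
    error_budget = round(total_minutes * (1 - target))
    return {
        "error_budget": error_budget,
        "remaining": error_budget - bad_minutes,
        "status_pct": 100 * (total_minutes - bad_minutes) / total_minutes,
    }


# Example 1: 1-week window, 400 bad minutes, 90% target
result = time_based_slo(24 * 60 * 7, 400, 0.90)
print(result["error_budget"], round(result["status_pct"], 2))  # 1008 96.03
```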
Example 2: Website SLO with event-based availability blueprint
Objective: Ensure that HTTP requests to the shopping cart page (cart.html) of the Demo website achieve 95% availability over a fixed period of 4 days beginning on 1 March 2025.
The configuration of the SLO would be:
- Entity: Demo website
- Beacon: HTTP requests
- Custom filter: Location > Page name = Cart
- Indicator:
- Blueprint: Availability
- Type: Event count (Total count of good calls vs bad calls)
- Objective:
- SLO target: 95%
- Time window type: Fixed
- Time window length: 4 days
- Start: 2025-03-01 0:00
Scenario: Assuming there are 234 successful HTTP requests and 11 failed HTTP requests during the SLO time window starting from 1 March 2025:
The error budget for this SLO would be calculated as:
- Event:
- Good events count: 234 beacons
- Bad events count: 11 beacons
- Total events count: 234 + 11 = 245 beacons
- SLO target percentage: 95% (0.95)
- Error budget: 245 x (1 - 0.95) = 12.25, rounded to 12 beacons
- Error budget remaining: 12 - 11 = 1 beacon
- Error budget remaining percentage: 100% * (12 - 11) / 12 = 8.33%
The SLO status would be calculated as:
- SLO status = 100% x total good events count in time window / total events count in time window
- 100% x 234 beacons / 245 beacons = 95.5%
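As a minimal sketch of the event-based calculation, the following Python function reproduces the figures of this example. The name `event_based_slo` and the result keys are hypothetical, and rounding the error budget to a whole number of events is an assumption taken from the worked numbers above.

```python
def event_based_slo(good: int, bad: int, target: float) -> dict:
    """Compute error budget and status for an event-based SLO (hypothetical helper)."""
    total = good + bad
    # Assumption: the budget is rounded to whole events, as in the example (12.25 -> 12).
    error_budget = round(total * (1 - target))
    remaining = error_budget - bad
    return {
        "total": total,
        "error_budget": error_budget,
        "remaining": remaining,
        # Guard against a zero budget, which occurs for very small event counts.
        "remaining_pct": 100 * remaining / error_budget if error_budget else None,
        "status_pct": 100 * good / total,
    }


# Example 2: 234 good beacons, 11 bad beacons, 95% target
result = event_based_slo(234, 11, 0.95)
print(result["error_budget"], result["remaining"])  # 12 1
print(round(result["remaining_pct"], 2), round(result["status_pct"], 2))  # 8.33 95.51
```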
Example 3: Synthetic monitoring with traffic blueprint
Objective: Ensure that 3 synthetic monitoring tests (shopping cart, home page, and product list page) run at least 15 times per minute for 99% of the time over a fixed period of 1 week beginning on 2025-03-18.
The configuration of the SLO would be:
- Entities:
- shopping cart test
- home page test
- product list page test
- Indicator:
- Blueprint: Traffic
- Threshold: > 15 results per minute
- Objective:
- SLO target: 99%
- Time window type: Fixed
- Time window length: 1 week
- Start: 2025-03-18 0:00
Scenario: Assuming the SLO had 21 minutes in which the synthetic tests ran fewer than 15 times during the SLO time window starting from 2025-03-18:
The error budget for this SLO would be calculated as:
- Minutes in time period x (1 - SLO Target Percentage)
- Total minutes: 24 × 60 × 7 minutes in 1 week = 10080 minutes
- SLO target Percentage: 99% (0.99)
- Error budget: 10080 x (1 - 0.99) = 100.8, rounded to 101 minutes
- Error budget remaining: 101 - 21 = 80 minutes
- Error budget remaining percentage: 100% * (101 - 21) / 101 = 79.21%
The SLO status would be calculated as:
- SLO status = 100% x (total minutes in time window - bad minutes in time window) / total minutes in time window
- 100% x (10080 total minutes - 21 bad minutes) / 10080 total minutes = 99.79%
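The way bad minutes arise for a traffic indicator can be sketched as follows. The per-minute result counts are made-up sample data, and the helper `count_bad_minutes` is hypothetical. Following the scenario, the sketch assumes that a minute with fewer than 15 results is bad and a minute with exactly 15 results is good.

```python
def count_bad_minutes(results_per_minute, minimum=15):
    """Count minutes whose number of synthetic test results is below the minimum."""
    return sum(1 for n in results_per_minute if n < minimum)


total_minutes = 24 * 60 * 7
# Sample data (assumption): 21 minutes with 14 results, all other minutes with 15.
samples = [14] * 21 + [15] * (total_minutes - 21)

bad = count_bad_minutes(samples)
status = 100 * (total_minutes - bad) / total_minutes
print(bad, round(status, 2))  # 21 99.79
```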
Example 4: Synthetic monitoring SLO with event-based availability using tag filters
Objective: Ensure that synthetic checkout journey tests achieve at least 99% availability over a fixed period of 1 day.
The configuration of the SLO would be:
- Entity: Synthetic
- Scope:
- Selection method: Based on filters
- Tag Filter Expression: synthetic.testName startsWith "checkout"
- Indicator:
- Blueprint: Availability
- Type: Event count (Total count of good test results vs bad test results)
- Good event: call.erroneous = false
- Bad event: call.erroneous = true
- Objective:
- SLO target: 99%
- Time window type: Fixed
- Time window length: 1 day
- Start: 2026-01-18 0:00
Scenario: Assume 14,850 successful synthetic test executions and 150 failed executions during the SLO time window that starts on 2026-01-18.
The error budget for this SLO would be calculated as:
- Event:
- Good events count: 14,850 test executions
- Bad events count: 150 test executions
- Total events count: 14,850 + 150 = 15,000 executions
- SLO target percentage: 99% (0.99)
- Error budget: 15,000 × (1 − 0.99) = 150 executions
- Error budget remaining: 150 − 150 = 0 executions
- Error budget remaining percentage: 100% × (0 / 150) = 0%
The SLO status would be calculated as:
- SLO status = 100% × total good events count / total events count
- 100% × 14,850 / 15,000 = 99%
Example 5: Application SLO with Custom blueprint
Objective: Ensure 98% of the calls of the “Robot-shop” application do not result in an HTTP status code of 400 over a rolling period of 1 day.
The configuration of the SLO would be:
- Entity: Robot-shop application
- Scope:
- Boundary: All services
- Include internal calls: false
- Include synthetic calls: false
- Service: All services
- Endpoint: All endpoints
- Indicator:
- Blueprint: Custom
- Type: Event count (Total count of good calls vs bad calls)
- Good Filter: HTTP status code != 400
- Bad Filter: HTTP status code = 400
- Objective:
- SLO target: 98%
- Time window type: Rolling
- Time window length: 1 day
Scenario: Assuming the SLO had 25000 good calls and 200 bad calls over the one-day SLO time window starting from 2025-03-10:
The error budget for this SLO would be calculated as:
- Event:
- Good events count: 25000 calls
- Bad events count: 200 calls
- Total events count: 25000 + 200 = 25200 calls
- SLO target percentage: 98% (0.98)
- Error budget: 25200 x (1 - 0.98) = 504 calls
- Error budget remaining: 504 - 200 = 304 calls
- Error budget remaining percentage: 100% * (504 - 200) / 504 = 60.32%
The SLO status would be calculated as:
- SLO status = 100% x total good events count in time window / total events count in time window
- 100% x 25000 calls / 25200 calls = 99.2%
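To illustrate how the good and bad filters of a custom blueprint partition calls, here is a small Python sketch. The list of status codes is invented sample data and `classify_calls` is a hypothetical helper. Note that with these filters any status code other than 400, including other error codes such as 404, counts as a good event.

```python
from collections import Counter


def classify_calls(status_codes):
    """Split calls into good and bad events using the filters from this example."""
    counts = Counter("bad" if code == 400 else "good" for code in status_codes)
    return counts["good"], counts["bad"]


# Sample data (assumption): 404 responses are still good under "HTTP status code != 400".
codes = [200] * 24000 + [404] * 1000 + [400] * 200
good, bad = classify_calls(codes)
status = 100 * good / (good + bad)
print(good, bad, round(status, 2))  # 25000 200 99.21
```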
Example 6: Application SLO with event-based latency blueprint
Objective: Ensure 92% of the calls of the “Robot-shop” application have a latency better than 100 ms over a fixed period of 2 weeks.
The configuration of the SLO would be:
- Entity: Robot-shop application
- Scope:
- Boundary: All services
- Include internal calls: false
- Include synthetic calls: false
- Service: All services
- Endpoint: All endpoints
- Indicator:
- Blueprint: Latency
- Type: Event count (Total count of good calls vs bad calls)
- Good call: Latency < 100 ms
- Bad call: Latency >= 100 ms
- Objective:
- SLO target: 92%
- Time window type: Fixed
- Time window length: 2 weeks
- Start: 2025-03-10 0:00
Scenario: Assuming the SLO had 50000 good calls and 1000 bad calls over the two-week SLO time window starting from 2025-03-10:
The error budget for this SLO would be calculated as:
- Event:
- Good events count: 50000 calls
- Bad events count: 1000 calls
- Total events count: 50000 + 1000 = 51000 calls
- SLO target percentage: 92% (0.92)
- Error budget: 51000 x (1 - 0.92) = 4080 calls
- Error budget remaining: 4080 - 1000 = 3080 calls
- Error budget remaining percentage: 100% * (4080 - 1000) / 4080 = 75.49%
The SLO status would be calculated as:
- SLO status = 100% x total good events count in time window / total events count in time window
- 100% x 50000 calls / 51000 calls = 98.04%
Example 7: Website SLO with time-based availability blueprint
Objective: Ensure that HTTP requests to the Demo website achieve 92% availability, with an error rate below 5%, over a rolling period of 3 days.
The configuration of the SLO would be:
- Entity: Demo website
- Beacon: HTTP requests
- Indicator:
- Blueprint: Availability
- Type: Time
- Error Rate Threshold: 5%
- Objective:
- SLO target: 92%
- Time window type: Rolling
- Time window length: 3 days
Scenario: Assuming the SLO had 200 bad minutes over the 3-day SLO time window (200 minutes had a mean error rate greater than 5%) starting from 2025-03-05:
The error budget for this SLO would be calculated as:
- Minutes in time period x (1 - SLO target percentage)
- Total minutes in time window: 24 × 60 × 3 minutes in 3 days = 4320 minutes
- SLO target percentage: 92% (0.92)
- Error budget: 4320 x (1 - 0.92) = 345.6, rounded to 346 minutes
The SLO status would be calculated as:
- SLO status = 100% x (total minutes in time window - bad minutes in time window) / total minutes in time window
- 100% x (4320 total minutes - 200 bad minutes) / 4320 total minutes = 95.37%
Example 8: Infrastructure SLO with time-based saturation blueprint (CPU)
Objective: Ensure CPU utilization on production hosts stays below 75% for 99% of the time over a rolling period of 7 days.
The configuration of the SLO would be:
- Entity: Infrastructure
- Infrastructure type: Host
- Tag Filter Expression: availabilityZone = "us-east-1"
- Indicator:
- Blueprint: Saturation
- Type: Time
- Metric: cpu.used
- Aggregation: Mean
- Operator: >=
- Threshold: 75%
- Objective:
- SLO target: 99%
- Time window type: Rolling
- Time window length: 7 days
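Scenario: Assuming the SLO had 25 bad minutes over the 7-day SLO time window (25 minutes had a mean CPU utilization of 75% or higher):
The error budget for this SLO would be calculated as:
- Minutes in time period x (1 - SLO target percentage)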
- Total minutes in time window: 24 × 60 × 7 minutes in 1 week = 10080 minutes
- SLO target percentage: 99% (0.99)
- Error budget: 10080 x (1 - 0.99) = 101 minutes
The SLO status would be calculated as:
- SLO status = 100% x (total minutes in time window - bad minutes in time window) / total minutes in time window
- 100% x (10080 total minutes - 25 bad minutes) / 10080 total minutes = 99.75%
Example 9: Infrastructure SLO with event-based saturation blueprint (Memory)
Objective: Ensure memory utilization on database servers stays below 85% for 99.9% of metric snapshots over a rolling period of 1 day.
The configuration of the SLO would be:
- Entity: Infrastructure
- Infrastructure type: Host
- Tag Filter Expression: host.fqdn contains "company-name" AND availabilityZone = "us-east-1"
- Indicator:
- Blueprint: Saturation
- Type: Event count (Total count of good metric snapshots vs bad metric snapshots)
- Metric: memory.used
- Operator: >=
- Threshold: 85%
- Objective:
- SLO target: 99.9%
- Time window type: Rolling
- Time window length: 1 day
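Scenario: Assuming the SLO had 8,640 good metric snapshots and 5 bad metric snapshots over the 1-day SLO time window:
The error budget for this SLO would be calculated as:
- Event: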
- Good events count: 8,640 snapshots
- Bad events count: 5 snapshots
- Total events count: 8,640 + 5 = 8,645 snapshots
- SLO target percentage: 99.9% (0.999)
- Error budget: 8,645 x (1 - 0.999) = 8.65 snapshots (rounded to 9)
- Error budget remaining: 9 - 5 = 4 snapshots
- Error budget remaining percentage: 100% * (9 - 5) / 9 = 44.44%
The SLO status would be calculated as:
- SLO status = 100% x total good events count in time window / total events count in time window
- 100% x 8,640 snapshots / 8,645 snapshots = 99.94%
Example 10: Infrastructure SLO with custom blueprint for Kubernetes cluster
Objective: Ensure the production Kubernetes cluster maintains at least 6 available nodes for 99.9% of the time over a rolling period of 7 days.
The configuration of the SLO would be:
- Entity: Infrastructure
- Infrastructure type: Kubernetes Cluster
- Tag Filter Expression: kubernetes.cluster.name = "prod-cluster"
- Indicator:
- Blueprint: Custom
- Type: Event count (Total count of good metric snapshots vs bad metric snapshots)
- Good events: nodes.count >= 6
- Bad events: nodes.count < 6
- Objective:
- SLO target: 99.9%
- Time window type: Rolling
- Time window length: 7 days
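Scenario: Assuming the SLO had 60,400 good metric snapshots and 80 bad metric snapshots over the 7-day SLO time window:
The error budget for this SLO would be calculated as:
- Event: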
- Good events count: 60,400 snapshots
- Bad events count: 80 snapshots
- Total events count: 60,400 + 80 = 60,480 snapshots
- SLO target percentage: 99.9% (0.999)
- Error budget: 60,480 x (1 - 0.999) = 60.48 snapshots (rounded to 60)
- Error budget remaining: 60 - 80 = -20 snapshots (exceeded)
- Error budget remaining percentage: 100% x (-20 / 60) = -33.33% (overspent)
The SLO status would be calculated as:
- SLO status = 100% x total good events count in time window / total events count in time window
- 100% x 60,400 snapshots / 60,480 snapshots = 99.87%
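The event-based examples end in three different budget situations: budget left over (Example 9), budget exactly used up (Example 4), and budget overspent (Example 10). The following sketch labels them. The function `budget_state` and its labels are hypothetical, and it assumes the error budget is rounded to whole events as in the worked numbers.

```python
def budget_state(good: int, bad: int, target: float) -> str:
    """Label the error budget situation of an event-based SLO (hypothetical helper)."""
    # Assumption: the budget is rounded to whole events, as in the examples.
    budget = round((good + bad) * (1 - target))
    remaining = budget - bad
    if remaining > 0:
        return "within budget"
    if remaining == 0:
        return "exhausted"
    return "overspent"


print(budget_state(8640, 5, 0.999))    # within budget (Example 9: 9 - 5 = 4)
print(budget_state(14850, 150, 0.99))  # exhausted (Example 4: 150 - 150 = 0)
print(budget_state(60400, 80, 0.999))  # overspent (Example 10: 60 - 80 = -20)
```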
Example 11: SLO behavior with time zone binding
Objective: Ensure that the SLO time window follows calendar time in a specific time zone, so that it starts at the same local time each day, even across Daylight Saving Time (DST) changes, for accurate and consistent reporting. When the SLO calculation is bound to a time zone, a DST transition affects the time window according to the configured time zone.
The configuration of the SLO would be:
- Entity: Robot-shop application
- Scope:
- Boundary: All services
- Include internal calls: false
- Include synthetic calls: false
- Service: All services
- Endpoint: All endpoints
- Indicator:
- Blueprint: Latency
- Type: Time
- Aggregation: mean
- Threshold: 100 ms
- Objective:
- SLO target: 90%
- Time window type: Fixed
- Time window length: 3 days
- Bind Time zone: Enable
- Time zone: Europe/Berlin
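The effect of a DST transition on a time zone bound window can be sketched with Python's standard `zoneinfo` module. The sketch assumes that a bound 3-day window runs from local midnight to local midnight, which the text above implies but does not spell out. The helper `window_minutes` is hypothetical, and `zoneinfo` needs time zone data to be available on the system.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def window_minutes(start_local: datetime, days: int, tz_name: str) -> int:
    """Elapsed minutes of a window spanning `days` local calendar days."""
    tz = ZoneInfo(tz_name)
    start = start_local.replace(tzinfo=tz)
    end = (start_local + timedelta(days=days)).replace(tzinfo=tz)
    # Convert to UTC first: subtracting two datetimes that share a tzinfo
    # compares wall-clock values and would hide the DST shift.
    elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return int(elapsed.total_seconds() // 60)


# No DST change inside the window: 3 x 24 x 60 minutes.
print(window_minutes(datetime(2025, 3, 1), 3, "Europe/Berlin"))   # 4320
# The window contains 30 March 2025, when clocks in Berlin move forward one hour.
print(window_minutes(datetime(2025, 3, 29), 3, "Europe/Berlin"))  # 4260
```

With these assumptions, the window that spans the spring transition is 60 minutes shorter, so its error budget in minutes is slightly smaller as well.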
Example 12: SLO with team association
Objective: Assign the SLO to one or more teams associated with the user. These team associations are then used to enforce access restrictions.
The configuration of the SLO would be:
- Entity: Robot-shop application
- Scope:
- Boundary: All services
- Include internal calls: false
- Include synthetic calls: false
- Service: All services
- Endpoint: All endpoints
- Indicator:
- Blueprint: Latency
- Type: Time
- Aggregation: mean
- Threshold: 100 ms
- Objective:
- SLO target: 90%
- Time window type: Rolling
- Time window length: 1 week
- Details:
- Name: Sample SLO
- Tags (optional): Sample Tag
- Teams: Team 1, Team 2
Example 13: Calendar month SLO created mid-month
Objective: Ensure that 99% of calls to the "Payment Service" application have an average latency less than 200 ms over calendar-month periods, with the SLO created on 15 January.
The configuration of the SLO would be:
- Entity: Payment Service application
- Scope:
- Boundary: Inbound calls
- Include internal calls: false
- Include synthetic calls: false
- Service: All services
- Endpoint: All endpoints
- Indicator:
- Blueprint: Latency
- Type: Time
- Aggregation: mean
- Threshold: 200 ms
- Objective:
- SLO target: 99%
- Time window type: Fixed
- Time window length: 1 calendar month
- Bind Time zone: Enable
- Time zone: America/New_York
- Start: 2025-01-15 00:00
Scenario: The SLO is created on 15 January 2025. Calendar month SLOs align with month boundaries, creating a partial first period when created mid-month.
First period (partial month, 15 to 31 January 2025):
- Duration: 17 days (January 15 through January 31)
- Total minutes: 17 × 24 × 60 = 24,480 minutes
- Error budget: 24,480 × (1 - 0.99) = 245 minutes
- Bad minutes recorded: 30 minutes
- SLO status: 100% × (24,480 - 30) / 24,480 = 99.88%
- Error budget remaining: 245 - 30 = 215 minutes
Second period (February 2025):
- Duration: 28 days (complete calendar month)
- Total minutes: 28 × 24 × 60 = 40,320 minutes
- Error budget: 40,320 × (1 - 0.99) = 403 minutes
- Bad minutes recorded: 50 minutes
- SLO status: 100% × (40,320 - 50) / 40,320 = 99.88%
- Error budget remaining: 403 - 50 = 353 minutes
Third period (March 2025):
- Duration: 31 days (complete calendar month)
- Total minutes: 31 × 24 × 60 = 44,640 minutes
- Error budget: 44,640 × (1 - 0.99) = 446 minutes
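The period lengths and error budgets above can be derived programmatically. The sketch below is a hedged illustration: `calendar_month_periods` is a hypothetical helper, it rounds the budget to whole minutes, and it treats every day as 24 hours, as the figures above do, so it does not account for DST changes in the bound time zone.

```python
import calendar
from datetime import date


def calendar_month_periods(created: date, count: int, target: float):
    """Return (month, days, total_minutes, error_budget_minutes) per period."""
    periods = []
    year, month, first_day = created.year, created.month, created.day
    for _ in range(count):
        days_in_month = calendar.monthrange(year, month)[1]
        days = days_in_month - first_day + 1  # partial only for the first period
        minutes = days * 24 * 60              # assumption: 24-hour days
        periods.append((f"{year}-{month:02d}", days, minutes, round(minutes * (1 - target))))
        first_day = 1
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return periods


for period in calendar_month_periods(date(2025, 1, 15), 3, 0.99):
    print(period)
# ('2025-01', 17, 24480, 245)
# ('2025-02', 28, 40320, 403)
# ('2025-03', 31, 44640, 446)
```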
Example 14: Calendar month SLO created on first day of month
Objective: Ensure that HTTP requests to the "E-commerce website" achieve 95% availability over calendar-month periods, with the SLO created on 1 March.
The configuration of the SLO would be:
- Entity: E-commerce Website
- Beacon: HTTP requests
- Custom filter: None
- Indicator:
- Blueprint: Availability
- Type: Time
- Error Rate Threshold: 5%
- Objective:
- SLO target: 95%
- Time window type: Fixed
- Time window length: 1 calendar month
- Bind Time zone: Enable
- Time zone: UTC
- Start: 2025-03-01 00:00
Scenario: The SLO is created on 1 March 2025 (the first day of the month). All measurement periods are complete calendar months.
First period (March 2025):
- Duration: 31 days (complete calendar month)
- Total minutes: 31 × 24 × 60 = 44,640 minutes
- Error budget: 44,640 × (1 - 0.95) = 2,232 minutes
- Bad minutes recorded: 400 minutes
- SLO status: 100% × (44,640 - 400) / 44,640 = 99.10%
- Error budget remaining: 2,232 - 400 = 1,832 minutes
Second period (April 2025):
- Duration: 30 days (complete calendar month)
- Total minutes: 30 × 24 × 60 = 43,200 minutes
- Error budget: 43,200 × (1 - 0.95) = 2,160 minutes
- Bad minutes recorded: 350 minutes
- SLO status: 100% × (43,200 - 350) / 43,200 = 99.19%
- Error budget remaining: 2,160 - 350 = 1,810 minutes
Service levels smart alerts configuration examples
Example 1: Service levels smart alert to monitor the status of an SLO
Objective: Alert and raise an issue if the status of the Vending Machine Reliability SLO configuration is less than 90%.
The configuration of the Service levels smart alert would be:
Rule:
Alert Type: Service Levels Objective
Metric: Status
Threshold:
Operator: <
Value: 0.90
SLOs: Vending Machine Reliability
Time Threshold:
Expiry: 5 Minutes
Time window: 10 Minutes
Once the Smart Alert configuration is set up, the system will begin monitoring the status of the Vending Machine Reliability SLO configuration.
Scenario: Monitoring and Event Triggering
- SLO Drops Below Threshold
- Assume the Vending Machine Reliability SLO drops to 89%, below the defined threshold of 90%.
- With the time threshold set to 10 minutes, the system will wait for the entire 10-minute window before taking any action.
- If the SLO remains below 90% after 10 minutes, the system triggers an event, raising an issue.
- SLO Returns Above Threshold
- If, after some time, the SLO status recovers and rises above 90%, the system continues monitoring.
- If the status stays above 90%, the system waits for the expiry time threshold of 5 minutes.
- If the SLO remains above 90% for the full 5 minutes, the event will be automatically closed.
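The open and close behavior described in this scenario can be modeled as a small state machine. This is a simplified sketch under stated assumptions: it evaluates one SLO status value per minute, opens an event after 10 consecutive violating minutes, and closes it after 5 consecutive healthy minutes. The function `simulate_alert` and the sample data are hypothetical and do not describe how the product evaluates alerts internally.

```python
def simulate_alert(statuses, threshold=0.90, window=10, expiry=5):
    """Return (minute, event) tuples for a per-minute series of SLO status values."""
    events = []
    is_open = False
    violating = healthy = 0  # lengths of the current consecutive streaks
    for minute, status in enumerate(statuses):
        if status < threshold:
            violating, healthy = violating + 1, 0
        else:
            violating, healthy = 0, healthy + 1
        if not is_open and violating >= window:
            is_open = True
            events.append((minute, "opened"))
        elif is_open and healthy >= expiry:
            is_open = False
            events.append((minute, "closed"))
    return events


# Sample data (assumption): 3 healthy minutes, 12 minutes at 89%, then recovery.
statuses = [0.95] * 3 + [0.89] * 12 + [0.93] * 6
print(simulate_alert(statuses))  # [(12, 'opened'), (19, 'closed')]
```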
Example 2: Service levels smart alert to monitor the error budget of an SLO
Objective: Alert and raise an issue if the error budget consumption percentage of the Vending Machine Reliability SLO configuration is more than 50%.
The configuration of the Service levels smart alert is:
Rule:
Alert Type: Error Budget
Metric: Burned Percentage
Threshold:
Operator: >
Value: 0.50
SLOs: Vending Machine Reliability
Time Threshold:
Expiry: 5 Minutes
Time window: 10 Minutes
Once the smart alert configuration is set up, the system will begin monitoring the error budget consumption percentage of the Vending Machine Reliability SLO configuration.
Scenario: Monitoring and Event Triggering
- Error budget consumption exceeds 50%
- Assume the error budget consumption exceeds 50%.
- With the time threshold set to 10 minutes, the system will wait for the full 10-minute window before taking any action.
- If the error budget consumption remains above 50% after 10 minutes, the system triggers an event, raising an issue.
- Error budget consumption drops below 50%
- If, after some time, the error budget consumption drops back below 50%, the system will continue monitoring.
- If the error budget consumption stays below 50%, the system will wait for the expiry time threshold of 5 minutes.
- If the error budget consumption remains below 50% for the full 5 minutes, the event will be automatically closed.
Service levels burn rate smart alert calculation
The burn rate is calculated using the formula:
Burn Rate = (Error Budget Consumed * SLO Time Window) / Alerting Window
For example:
- Assume the error budget consumed over the last 12 hours is 70%, and the SLO time window for the Vending Machine Reliability SLO is 1 day (24 hours).
- The burn rate for the last 12 hours would be: (0.70 * 24) / 12 = 1.4
- Similarly, if the error budget consumed over the last 2 hours is 20%, the burn rate for the last 2 hours would be: (0.20 * 24) / 2 = 2.4
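The burn rate formula translates directly into code. In this sketch, the function and parameter names are illustrative assumptions. The SLO time window and the alerting window must be given in the same unit (hours here).

```python
def burn_rate(consumed_fraction: float, slo_window_hours: float, alert_window_hours: float) -> float:
    """Burn Rate = (Error Budget Consumed * SLO Time Window) / Alerting Window."""
    return consumed_fraction * slo_window_hours / alert_window_hours


# 70% of the budget consumed in the last 12 hours of a 24-hour SLO window
print(round(burn_rate(0.70, 24, 12), 2))  # 1.4
# 20% of the budget consumed in the last 2 hours
print(round(burn_rate(0.20, 24, 2), 2))   # 2.4
```

A burn rate of 1 means the error budget would be fully consumed exactly at the end of the SLO time window if consumption continued at the same pace. Values above 1 mean it would run out earlier.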
Example 3: Smart Alert to monitor the burn rate of an SLO with a single alerting window and threshold
Objective: Alert and raise an issue if the burn rate of the Vending Machine Reliability SLO configuration is more than 1 for the last 12 hours.
The configuration of the Service levels smart alert should be:
Rule:
Alert Type: Error Budget
Metric: Burn Rate V2
Burn Rate Config:
[
Alert Window Type: SINGLE
Duration: 12 Hours
Duration Unit Type: Hour
Threshold:
Operator: >
Value: 1
]
SLOs: Vending Machine Reliability
Time Threshold:
Expiry: 5 Minutes
Time window: 10 Minutes
After the Smart Alert configuration is set up, the system begins monitoring the burn rate of the Vending Machine Reliability SLO configuration for the specified alerting window.
Scenario: Monitoring and Event Triggering
- Burn rate exceeds 1 for the alerting window (last 12 hours)
- Assume the calculated burn rate for the last 12 hours starts exceeding 1.
- With the time threshold set to 10 minutes, the system waits for the full 10-minute window before taking any action.
- If the burn rate still remains above 1 after 10 minutes, the system triggers an alerting event, raising an issue.
- Burn rate drops below 1 for the alerting window (last 12 hours)
- If, after some time, the burn rate for the alerting window drops below 1, the system continues monitoring.
- If the burn rate stays below 1, the system will wait for the expiry time threshold of 5 minutes.
- If the burn rate remains below 1 for the full 5 minutes, the event will be automatically closed.
Example 4: Smart Alert to monitor the burn rate of an SLO with multiple alerting windows and respective thresholds
Objective: Alert and raise an issue if the burn rate of the Vending Machine Reliability SLO configuration is more than 1 for the last 24 hours and more than 4 for the last 2 hours.
The configuration of the Service levels smart alert should be:
Rule:
Alert Type: Error Budget
Metric: Burn Rate V2
Burn Rate Config:
[
Alert Window Type: LONG
Duration: 24 Hours
Duration Unit Type: Hour
Threshold:
Operator: >
Value: 1
,
Alert Window Type: SHORT
Duration: 2 Hours
Duration Unit Type: Hour
Threshold:
Operator: >
Value: 4
]
SLOs: Vending Machine Reliability
Time Threshold:
Expiry: 5 Minutes
Time window: 10 Minutes
After the Smart Alert configuration is set up, the system begins monitoring the burn rate of the Vending Machine Reliability SLO configuration for both long and short alerting windows.
Scenario: Monitoring and Event Triggering
- Burn rate exceeds the thresholds for both the long and short alerting windows (last 24 hours and last 2 hours)
- Assume the calculated burn rate for the last 24 hours starts exceeding 1 and for the last 2 hours starts exceeding 4.
- With the time threshold set to 10 minutes, the system waits for the full 10-minute window before taking any action.
- If the burn rate of both alerting windows still violates the thresholds after 10 minutes, the system triggers an alerting event, raising an issue.
- Burn rate drops below 1 for the long alerting window but stays above 4 for the short alerting window
- If, after some time, the burn rate for the long alerting window drops below 1, but the burn rate for the short alerting window stays above 4, the system continues monitoring.
- If the burn rate remains below 1 for the long alerting window, the system waits for the expiry time threshold of 5 minutes.
- If the burn rate stays below 1 for the long alerting window for the full 5 minutes, the event is automatically closed regardless of the short alerting window's value, because both thresholds must be violated for an alert to be sent. The same applies in reverse: if the short alerting window drops below its threshold while the long alerting window still violates its threshold, the event is also closed.
- Burn rate drops below 1 for the long alerting window and below 4 for the short alerting window
- If, after some time, the burn rate for both alerting windows drops below their respective thresholds, the system continues monitoring.
- If the burn rate remains below the thresholds, the system waits for the expiry time threshold of 5 minutes.
- If the burn rate stays below the thresholds for the full 5 minutes, the event is automatically closed.
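The rule that both alerting windows must violate their thresholds can be expressed as a simple conjunction. The following sketch uses a hypothetical function `should_alert` and plain dictionaries keyed by the alert window type. It covers only the threshold comparison, not the time threshold or expiry handling.

```python
def should_alert(burn_rates: dict, thresholds: dict) -> bool:
    """True only if every alerting window exceeds its own threshold."""
    return all(burn_rates[name] > limit for name, limit in thresholds.items())


thresholds = {"LONG": 1, "SHORT": 4}  # thresholds from this example

print(should_alert({"LONG": 1.2, "SHORT": 4.5}, thresholds))  # True
print(should_alert({"LONG": 0.8, "SHORT": 4.5}, thresholds))  # False
print(should_alert({"LONG": 1.2, "SHORT": 3.0}, thresholds))  # False
```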
Troubleshooting
The following are suggestions to resolve commonly occurring problems with configuring SLOs.
- Problem: No error budget is consumed, and the SLO status is always 100%.
- Solution: Use the indicator chart on the SLO dashboard to verify if the indicator is never exceeding the threshold during the SLO time window, resulting in no consumption of error budget. You may consider modifying the threshold accordingly.
- Problem: No error budget is consumed, and the SLO status is always 100%.
- Solution: Use the traffic chart on the SLO dashboard to verify the entity is receiving traffic during the SLO time window. If not, the error budget and SLO status will not be impacted.
- Problem: Error budget is consistently consumed rapidly, SLO status remains negative.
- Solution: Use the indicator chart on the SLO dashboard to verify if the indicator is consistently exceeding the threshold during the SLO time window, resulting in rapid consumption of error budget. You may consider modifying the threshold accordingly.
- Problem: Burn rate alert is not triggered due to time window misalignment.
- Solution:
- Fixed time window SLO: If the SLO is configured with a fixed time window, the alert might not trigger if the burn rate calculation is based on an alerting window that is longer than the actual elapsed time in the SLO time period. For example, if the alerting window requires data from a 12-hour period, but the SLO time window just started, there might not be enough time for the burn rate to exceed the threshold. As a result, no alert is triggered even if the burn rate is high during the elapsed time.
- Rolling time window SLO: If the SLO is set to a rolling time window, the burn rate calculation might not trigger an alert if the alerting window extends beyond the SLO's creation time. For instance, if the alerting window goes past the period when the SLO was created or active, the burn rate cannot be calculated properly because the data is not available for the full alerting window.
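A quick way to reason about both cases is to check whether the alerting window is already fully covered by available data. The sketch below is an assumption-based illustration: `alerting_window_covered` is a hypothetical helper, and `reference_start` stands for the start of the current fixed time window or for the creation time of a rolling SLO.

```python
from datetime import datetime, timedelta


def alerting_window_covered(reference_start: datetime, now: datetime,
                            alerting_window: timedelta) -> bool:
    """True if at least one full alerting window has elapsed since reference_start."""
    return now - reference_start >= alerting_window


start = datetime(2025, 3, 10, 0, 0)  # fixed window start or SLO creation time
window = timedelta(hours=12)

print(alerting_window_covered(start, datetime(2025, 3, 10, 3, 0), window))   # False
print(alerting_window_covered(start, datetime(2025, 3, 10, 12, 0), window))  # True
```

If the check returns False, a missing burn rate alert is more likely explained by insufficient elapsed time than by a misconfigured threshold.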