Service level objectives
Service level objectives (SLOs) are observability tools that organizations use to monitor the performance and availability of their services systematically and objectively. With SLOs, you define clear targets for system performance, which helps you identify degraded user experience and platform instability. You can also use SLOs to track the performance of a service over time and identify trends and patterns.
SLOs can help if you answer yes to any of the following questions:
- Do you want to monitor the availability of your applications?
- Do you want to monitor the latency of your websites?
- Do you want to monitor the success rate of your synthetic tests?
- Do you want to customize which calls are good or bad and track them?
You can set a target for system performance over a time period, based on the selected indicator. When the target is not reached, you can analyze further to determine the cause of the degraded performance.
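As a rough illustration of comparing an indicator against a target, the following Python sketch computes an event-based success percentage and checks it against a target. The function name, the event counts, and the 99.9% target are hypothetical values chosen for the example. They are not part of any Instana API, and Instana's own evaluation might differ in detail.

```python
def success_percentage(good_events: int, total_events: int) -> float:
    """Return the share of good events as a percentage."""
    if total_events == 0:
        # Assumption for this sketch: no traffic counts as meeting the objective.
        return 100.0
    return 100.0 * good_events / total_events


target = 99.9  # hypothetical target, in percent
measured = success_percentage(good_events=99_850, total_events=100_000)
target_met = measured >= target

print(f"measured={measured:.2f}% target={target}% met={target_met}")
```

In this example the measured value falls below the target, which is the situation where further analysis of the degraded performance is warranted.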
Terminology
Service level indicator (SLI): Defines a quantitative measure of a specific characteristic of the service that is provided to a customer. Common examples include error rate and response latency. In the SLO configuration, the type of SLI is represented by the Blueprint.
SLO target: Defines the target value for the SLO that is measured by an SLI. For example, an SLO might specify that a particular SLI, such as availability, must reach 99.9% over the defined time window.
SLO status: The current evaluation of the SLI over the SLO time window, expressed as a percentage. The status is used to determine if the service is meeting the objectives specified by the SLO target.
Error budget: The target value of an SLO implicitly defines a limited allowance for service unreliability. This error budget accounts for both planned and unplanned downtime, recognizing that some level of service disruption is unavoidable in practice.
Burn rate: A ratio that indicates how quickly a service consumes its error budget relative to the target duration of its SLO. Its purpose is to detect early warning conditions that could lead to an SLO violation before the violation occurs.
Correction window: A specific time window, either one time or recurring, that is excluded from SLO status and error budget calculations.
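The terms above can be tied together with a small amount of arithmetic. The following Python sketch assumes the common conventions that the error budget is 1 minus the target, and that the burn rate is the observed error ratio divided by the error budget. The `Sample` class, the `slo_report` function, and the sample data are hypothetical and only illustrate the definitions. They do not reproduce Instana's exact calculation.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    minute: int  # minute offset within the SLO time window
    good: int    # events counted as good in that minute
    total: int   # all events in that minute


def in_any_window(minute, windows):
    """Return True if the minute falls inside a correction window [start, end)."""
    return any(start <= minute < end for start, end in windows)


def slo_report(samples, target, correction_windows=()):
    """Compute status, error budget, and burn rate as fractions between 0 and 1."""
    # Samples inside a correction window are excluded from all calculations.
    kept = [s for s in samples if not in_any_window(s.minute, correction_windows)]
    good = sum(s.good for s in kept)
    total = sum(s.total for s in kept)
    status = good / total if total else 1.0

    error_budget = 1.0 - target                 # allowed share of bad events
    burn_rate = (1.0 - status) / error_budget   # 1.0 means the budget lasts exactly one window
    return {"status": status, "error_budget": error_budget, "burn_rate": burn_rate}


samples = [
    Sample(minute=0, good=1000, total=1000),
    Sample(minute=1, good=990, total=1000),   # for example, planned maintenance
    Sample(minute=2, good=998, total=1000),
]

without_correction = slo_report(samples, target=0.99)
with_correction = slo_report(samples, target=0.99, correction_windows=[(1, 2)])

print(round(without_correction["burn_rate"], 3))
print(round(with_correction["burn_rate"], 3))
```

With a 99% target, the uncorrected data burns the budget at roughly 0.4 times the sustainable rate. Excluding minute 1 through a correction window lowers the burn rate to roughly 0.1, because the errors in that minute no longer count against the error budget.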
SLOs in Instana
Instana provides the Service Levels area for creating and managing SLOs, Apdex configurations, Smart Alerts, and correction windows across multiple entity types.
To access the Service Levels area, click Service levels in the navigation menu. The Service Levels area includes dedicated tabs for Service level objectives, Apdex, Smart Alerts, and Correction windows.
You can create custom dashboard widgets for your SLOs (lite and legacy) and Apdex measurements.
You can automate the configuration of Instana SLOs with the API or Terraform, and you can import SLO data into Grafana. For more information, see SLO integrations.
For examples and troubleshooting information, see Getting started with service level objectives and Apdex configuration examples.
SLO legacy
SLO widgets from previous iterations of Instana SLOs remain supported and are now referred to as SLO Legacy in the custom dashboard widgets. The SLO Legacy widgets and their associated SLI configurations are not integrated with SLOs that are created in the SLO dashboard. For more information, see SLO widgets (legacy).
SLO lite
SLO widgets that are used on self-hosted Classic Edition (Docker) are still supported and are now referred to as SLO Lite in the custom dashboard widgets. SLO Lite has limited capability, and only widgets for Application SLOs are available. These widgets and indicator configurations cannot be migrated into the current SLO dashboard. For more information, see SLO widgets (lite).
Installation options
SLOs are supported on Instana SaaS, Standard Edition, and Custom Edition. Only the SLO Lite configuration is supported on Classic Edition.
Data retention
The metric data used to calculate SLOs is retained according to the granularity of the time window configuration as follows:
- Time-based minute granularity data: 13 months
- Event-based minute granularity data: 48 hours
- Event-based hour granularity data: 13 months
- Apdex minute granularity data: 48 hours
- Apdex hour granularity data: 91 days