Service level objectives (SLOs)

Service level objectives (SLOs) are observability tools that organizations use to monitor the performance and availability of their services systematically and objectively. SLOs let you define clear targets for system performance, which helps you identify degraded user experience and platform instability. SLOs also track the performance of a service over time, so that you can identify trends and patterns in service performance.

  • Do you want to monitor the availability of your applications?
  • Do you want to monitor the latency of your websites?
  • Do you want to monitor the success rate of your synthetic tests?
  • Do you want to customize which calls are good or bad and track them?

You can set a target for system performance over a time window, based on the selected indicator. When the target is not met, you can analyze further to determine the cause of the degraded performance.

Terminology

Service level indicator (SLI): A quantitative measure of one characteristic of the level of service that is provided to a customer. Common examples include the error rate or response latency of a service. In the SLO configuration, the type of SLI is represented by the Blueprint.
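To make the idea of an SLI concrete, the following Python sketch computes one common kind of indicator: the fraction of good calls, where a call counts as good if it succeeded and completed within a latency threshold. The `Call` record, the `sli_good_call_ratio` function, and the 500 ms threshold are hypothetical illustrations, not Instana's implementation; in Instana, the SLI type is selected through the Blueprint.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """A hypothetical record of one call to a service."""
    latency_ms: float
    is_error: bool


def sli_good_call_ratio(calls, latency_threshold_ms=500.0):
    """Return the fraction of calls that are good.

    A call counts as good if it did not fail and finished within the
    latency threshold. Returns None when there are no calls, because
    the SLI is undefined without data.
    """
    if not calls:
        return None
    good = sum(
        1 for c in calls
        if not c.is_error and c.latency_ms <= latency_threshold_ms
    )
    return good / len(calls)


# Two good calls, one too slow, one failed.
calls = [Call(120, False), Call(80, False), Call(900, False), Call(60, True)]
print(sli_good_call_ratio(calls))  # 0.5
```

Combining the error and latency conditions into a single good/bad decision mirrors the idea of customizing which calls are good or bad.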

SLO target: The target value for the SLO, as measured by an SLI. For example, the SLO might specify that a particular SLI must be met 99.9% of the time within the defined time window.

SLO status: The current evaluation of the SLI over the SLO time window, expressed as a percentage. The status is used to determine if the service is meeting the objectives specified by the SLO target.
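As a rough illustration of how a status relates to a target, the sketch below assumes a time-based SLO in which each minute of the window is classified as good or bad, and computes the status as the percentage of good minutes. The per-minute classification and the function names are assumptions for illustration; Instana's actual evaluation may differ in detail.

```python
def slo_status_percent(good_flags):
    """Return the SLO status as a percentage of good units in the window.

    `good_flags` holds one boolean per unit of the time window (for example,
    one per minute in a time-based SLO): True if the SLI was met.
    """
    if not good_flags:
        raise ValueError("the time window contains no data")
    return 100.0 * sum(good_flags) / len(good_flags)


def meets_target(status_percent, target_percent):
    """Return True if the status meets or exceeds the SLO target."""
    return status_percent >= target_percent


# Hypothetical window of 1,000 minutes, 2 of which missed the SLI.
window = [True] * 998 + [False] * 2
status = slo_status_percent(window)
print(round(status, 2), meets_target(status, 99.9))  # 99.8 False
```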

Error budget: The target value of an SLO implicitly defines a limited budget within which the service is allowed to be less than fully reliable. The error budget accommodates the planned or unplanned downtime that is unavoidable in practice.
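The arithmetic behind an error budget is simple. The sketch below assumes a time-based SLO and expresses the budget in minutes: a 99.9% target over a 30-day window (43,200 minutes) leaves 0.1% of the window, or 43.2 minutes, as budget. The function names and the 30-day window are illustrative assumptions, not Instana defaults.

```python
def error_budget_minutes(target_percent, window_days):
    """Total error budget, in minutes, for a time-based SLO.

    The budget is the share of the window that the target does not
    require: (1 - target) * window length.
    """
    window_minutes = window_days * 24 * 60
    return (1 - target_percent / 100) * window_minutes


def remaining_budget_percent(target_percent, window_days, bad_minutes):
    """Percentage of the error budget still unspent (negative if exceeded)."""
    budget = error_budget_minutes(target_percent, window_days)
    return 100 * (budget - bad_minutes) / budget


print(round(error_budget_minutes(99.9, 30), 1))            # 43.2
print(round(remaining_budget_percent(99.9, 30, 10.8), 1))  # 75.0
```

In this example, 10.8 bad minutes spend one quarter of the budget, so 75% remains.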

Burn rate: A unitless value that measures how quickly a service consumes its error budget relative to the time window of its SLO. Burn rate helps you detect conditions that might result in an SLO violation before the violation happens.
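Burn rate is commonly defined in SRE practice as the observed error rate divided by the error rate that the SLO target allows. The sketch below uses that common definition for an event-based SLO; Instana's exact calculation, for example its choice of observation windows, may differ. With a 99.9% target, 20 bad events out of 10,000 is an error rate of 0.2%, twice the allowed 0.1%, so the burn rate is 2 and a full 30-day budget would last about 15 days at that pace. All function names are hypothetical.

```python
def burn_rate(bad_events, total_events, target_percent):
    """Burn rate over an observation period.

    Uses the common definition: observed error rate divided by the error
    rate the SLO allows. At a constant rate, a value of 1 spends a full
    budget exactly over the SLO window; 2 spends it in half the window.
    """
    if total_events == 0:
        return 0.0  # no traffic, so no budget consumed
    observed_error_rate = bad_events / total_events
    allowed_error_rate = 1 - target_percent / 100
    return observed_error_rate / allowed_error_rate


def days_until_budget_exhausted(rate, window_days):
    """Days until a full budget is exhausted at a constant burn rate."""
    return float("inf") if rate == 0 else window_days / rate


rate = burn_rate(bad_events=20, total_events=10_000, target_percent=99.9)
print(round(rate, 2), round(days_until_budget_exhausted(rate, 30), 1))  # 2.0 15.0
```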

SLOs in Instana

Instana supports the creation and management of SLOs for multiple entity and indicator types.

To access the Service Levels area, click Service Levels in the navigation menu. There, you can view the list of existing SLOs, select an SLO to view its detailed dashboard, and create new SLOs.

You can create custom dashboard widgets for your SLOs and Apdex measurements.

You can configure Smart Alerts for your SLOs to notify users when the thresholds that you set for status, error budget, or burn rate are exceeded.

You can automate the configuration of Instana SLOs with the API or Terraform, and you can import SLO data into Grafana. For more information, see SLO Integrations.

For examples and troubleshooting information, see SLO Getting Started.

SLO dashboard

SLO Legacy

SLO widgets from previous iterations of Instana SLOs remain supported and are now referred to as SLO Legacy in the custom dashboard widgets. The SLO Legacy widgets and their associated SLI configurations are not integrated with SLOs created in the SLO dashboard. For more information, see Service level objectives widgets (legacy).

SLO Lite

SLO widgets used on self-hosted Classic Edition (Docker) are still supported and are now referred to as SLO Lite in the custom dashboard widgets. SLO Lite has limited capabilities: only widgets for Application SLOs are available. These widgets and their indicator configurations cannot be migrated to the current SLO dashboard. For more information, see Service level objectives widgets (lite).

Installation options

SLOs are supported on Instana SaaS, Standard Edition, and Custom Edition. Only the SLO Lite configuration is supported on Classic Edition.

Data retention

The metric data used to calculate SLOs is retained for different periods, depending on the SLO type and the data granularity:

  • Time-based minute granularity data: 13 months
  • Event-based minute granularity data: 48 hours
  • Event-based hour granularity data: 13 months