Service level objectives widgets (lite)
The SLO lite widgets are a subset of the legacy SLI widgets; only widgets for application entities are supported. The underlying infrastructure of lite widgets is not compatible with legacy widgets. There is no migration path between SLO widgets and SLO lite widgets.
When the feature.slo.full.enabled feature flag is disabled and the feature.slo.lite.enabled feature flag is enabled, the lite widgets are available. This capability is only available on self-hosted Classic Edition.
SLO widget (lite)
Instana enables users to create custom dashboard widgets for their SLOs to display and analyze the performance of their services over time. The widget can display either Time-based or Event-based SLI configurations.
The following image illustrates an example SLO widget that is called Robot Shop SLO and is configured by using a Time-based SLI configuration and an SLO target value of 95%. The SLI in the example is based on the 90th percentile of the latency metric and a threshold value of 2 seconds. The error budget is 1 minus the SLO target value, multiplied by the length of the time window. For the 7-day time window that is shown on the widget, this gives an error budget of 504 minutes:
504 minutes = (1 - 95%) * 10080 minutes
The example SLO was not achieved in the selected time frame because the spent error budget of 565 minutes exceeds that limit.
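The arithmetic in this example can be reproduced with a short Python sketch. The function names and the use of minutes as the unit are illustrative assumptions and are not part of Instana:

```python
MINUTES_PER_DAY = 24 * 60


def error_budget_minutes(slo_target_percent: float, window_days: int) -> float:
    """Error budget = (1 - SLO target) * length of the time window, in minutes."""
    window_minutes = window_days * MINUTES_PER_DAY
    return (100 - slo_target_percent) / 100 * window_minutes


def slo_achieved(spent_minutes: float, slo_target_percent: float, window_days: int) -> bool:
    """The SLO is met as long as the spent budget does not exceed the total budget."""
    return spent_minutes <= error_budget_minutes(slo_target_percent, window_days)


# Values from the Robot Shop SLO example: 95% target, 7-day window, 565 minutes spent.
budget = error_budget_minutes(95, 7)
print(round(budget), slo_achieved(565, 95, 7))  # prints: 504 False
```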
Configuration
Adding SLO widgets (lite)
You can set up an SLO widget for any of your application perspectives or websites. To add an SLO legacy widget, go to one of your custom dashboards to open the dialog for adding a widget. Next, follow these steps:
- In the dialog sidebar, click SLO legacy > Next. The dialog opens the configuration section to set up an SLO widget.
- Select whether you want to monitor an Application Perspective or website.
- SLOs for application perspectives exclude factors beyond your control such as poor internet connectivity of users, and might not accurately reflect user experience.
- SLOs for websites most accurately reflect user experience and do not exclude factors beyond your control such as poor internet connectivity of users.
- Select the Application Perspective or website for the SLO from the list.
- Select the Service Level Indicator for the particular SLO type from the list. If no SLI is available for the previously selected SLO type, create an SLI as described in the SLI configuration.
- Enter the wanted SLO Target value, for example 99.9%.
- Select the Time Window Type, which defines the context and displayed time frame of the widget:
- Dynamic time window: The SLO is calculated for the time window that is selected in the global time picker.
- Rolling time window: A time window with a fixed window size, where the end is defined by the global time picker’s end date and time selection. For example, with a rolling time window you can always see the last week without adjusting the global time picker.
- Fixed time interval: A time window with a defined start and duration. As an example, you can configure a fixed one-month window that starts on 2020-01-01. The time window is automatically reset to the next month (2020-02-01) when the month is completed.
- Enter a title for the widget.
- Verify your widget in the preview. If no preview is displayed, click Highlight missing configuration to immediately see what is missing.
- To create the SLO widget configuration, click Create.
- To save the SLO widget configuration on your custom dashboard, click Save changes.
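The three time window types can be sketched as follows. This is only an illustration of the described behavior, not Instana's implementation. All function names are hypothetical, and the fixed interval is simplified to a one-month window that starts at midnight on the first day of a month:

```python
from datetime import datetime, timedelta


def dynamic_window(picker_start: datetime, picker_end: datetime):
    """Dynamic: use exactly the range selected in the global time picker."""
    return picker_start, picker_end


def rolling_window(picker_end: datetime, size: timedelta):
    """Rolling: fixed size, ending at the time picker's end date and time."""
    return picker_end - size, picker_end


def fixed_monthly_window(first_start: datetime, now: datetime):
    """Fixed: the one-month window that contains `now`.

    Assumes `first_start` is midnight on the first day of a month.
    """
    if now < first_start:
        raise ValueError("the fixed window has not started yet")
    start = datetime(now.year, now.month, 1)
    if now.month == 12:
        end = datetime(now.year + 1, 1, 1)
    else:
        end = datetime(now.year, now.month + 1, 1)
    return start, end


# A window configured to start on 2020-01-01 has moved on to February by mid-February.
print(fixed_monthly_window(datetime(2020, 1, 1), datetime(2020, 2, 15)))
```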
SLI configuration
SLI types
Independent of the SLO's type, you can select one of two SLI types for your configuration:
- Event-based SLI uses defined groups of good and bad events to measure the service's reliability. Because every call is given the same weight, it more accurately reflects the actual user experience. However, handling the error budget is more difficult because the budget depends on the number of events.
- Time-based SLI measures the service's reliability aggregated by minute. Because the error budget is always a constant number of bad minutes, it is easier to manage. However, it is less accurate than event-based SLI because bad events carry more weight in minutes with less traffic.
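The difference between the two SLI types can be illustrated with a small Python sketch. The traffic data, the 5% error-rate threshold, and the function names are hypothetical. The sketch also assumes that every minute contains at least one call:

```python
# Hypothetical traffic: (good_calls, bad_calls) for each minute.
minutes = [(990, 10), (1, 1)]
MAX_ERROR_RATE = 0.05  # a minute counts as bad above this error rate


def event_based(per_minute):
    """Every call has the same weight, regardless of the minute it falls in."""
    good = sum(g for g, _ in per_minute)
    bad = sum(b for _, b in per_minute)
    return good / (good + bad) * 100


def time_based(per_minute, max_error_rate):
    """Every minute has the same weight, regardless of its traffic."""
    violated = sum(1 for g, b in per_minute if b / (g + b) > max_error_rate)
    return (1 - violated / len(per_minute)) * 100


print(event_based(minutes))                 # about 98.9
print(time_based(minutes, MAX_ERROR_RATE))  # 50.0
```

A single failed call in the low-traffic minute marks that whole minute as bad, so the time-based SLI drops to 50% while the event-based SLI stays close to 99%.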
Managing SLIs through the UI
To create an SLI configuration or clone an existing SLI configuration, go to the SLI Management dialog by clicking Manage SLIs on the SLO widget.
Next, follow these steps to either create a configuration from scratch or clone an existing configuration.
Creating SLI configuration for application perspectives
- On the SLI Management dialog, click Create SLI.
- Provide the SLI configuration Details:
- Enter a name for the SLI configuration to identify the configuration uniquely.
- Select the type of SLI configuration, which can be time-based or event-based.
- Click Create to save the new SLI configuration.
Time-based SLI for applications
Complete the following configuration of Time-based SLI:
- Select the boundary scope, either Inbound Calls or All Calls.
- Inbound calls: Include only calls that are initiated from outside the application and where the destination service is part of the selected application perspective.
- All calls: Include both inbound calls from outside the application and calls that occur within the application perspective itself.
- You can choose a specific service in your application or leave the default of All Services selected to apply to the entire Application Perspective.
- If you want to narrow down further to an endpoint, you can select an endpoint from the list. Similar to service selection, you can choose to leave the default of All Endpoints to apply to the entire service.
- Choose a metric on which the SLI configuration must be evaluated from the list of supported metrics.
- The following metrics are supported:
- Latency
- Call Count
- Error rate
- Erroneous Calls
- Select the aggregation for the selected metric.
- Enter the threshold value for the selected metric.
After the metric and threshold are selected, SLI is computed as follows:
SLI = (1 - #minutes_where_threshold_is_violated / #minutes_in_time_window) * 100%
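The formula can be expressed as a minimal Python sketch. It assumes one aggregated metric value per minute (for example, the 90th percentile latency in milliseconds) and treats a minute as violated when its value is greater than the threshold. The function name and the sample values are hypothetical:

```python
def time_based_sli(per_minute_values, threshold):
    """SLI = (1 - violated_minutes / minutes_in_time_window) * 100%."""
    if not per_minute_values:
        raise ValueError("the time window contains no minutes")
    # Assumption: a minute violates the threshold when its value exceeds it.
    violated = sum(1 for value in per_minute_values if value > threshold)
    return (1 - violated / len(per_minute_values)) * 100


# Four minutes of latency values in ms, evaluated against a 25 ms threshold.
print(time_based_sli([10, 20, 30, 40], threshold=25))  # prints: 50.0
```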
The following image shows how a time-based SLI is configured for an application k8s-demo. The Boundary scope of the application is restricted to Inbound Calls for the Endpoint DELETE /cart/:id on a Service called cart. The Metric to be evaluated is set to Latency with a recommended Aggregation of 90th percentile and a Threshold of 25 ms.
Event-based SLI
Event-based SLI configuration gives the full flexibility of the Unbounded Analytics query builder to select a subset of good events and bad events.
- Good events: The set of calls that indicate the success criteria of a particular service. For example, all HTTP requests of an HTTP Service that have the status code 2XX.
- Bad events: The set of calls that indicate the failure criteria of a particular service. For example, all HTTP requests of an HTTP Service that have the status code 5XX.
Complete the following configuration of Event-based SLI:
- Select the boundary scope, either Inbound Calls or All Calls.
- Inbound calls: Include only calls that are initiated from outside the application and where the destination service is part of the selected application perspective.
- All calls: Include both inbound calls from outside the application and calls that occur within the application perspective itself.
- Optional: You can include internal calls or Synthetic calls. By default, both types of calls are excluded.
- Internal calls: A particular type of calls that represent work that is done inside a service. These calls can be created from intermediate spans that are sent through custom tracing. For more information about internal calls, see Concepts of Tracing.
- Synthetic calls: Calls with a Synthetic endpoint as the destination, such as calls to the health-check endpoints.
When bad events and good events are defined, the SLI is calculated as follows:
SLI = #good_events / (#good_events + #bad_events) * 100%
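As a minimal Python sketch of this formula, the following function classifies HTTP status codes by using the 2XX and 5XX examples from the definitions above. In Instana, the good and bad events are selected through the query builder. The function name and the classification by status code are assumptions made only for illustration:

```python
def event_based_sli(status_codes):
    """SLI = good_events / (good_events + bad_events) * 100%."""
    good = sum(1 for code in status_codes if 200 <= code < 300)  # 2XX
    bad = sum(1 for code in status_codes if 500 <= code < 600)   # 5XX
    if good + bad == 0:
        raise ValueError("no good or bad events in the time window")
    # Calls that are neither good nor bad (here: 404) do not affect the SLI.
    return good / (good + bad) * 100


print(event_based_sli([200, 200, 200, 500, 404]))  # prints: 75.0
```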
The image shows how an event-based SLI is configured for an application, whose boundary scope is restricted to Inbound Calls. The good events are defined as calls with a status code of 200 and bad events are calls with a status code of 500.
The resulting widget of this configuration shows the error budget of calls as opposed to minutes.
Cloning an existing SLI configuration
The parameters of an SLI cannot be modified, because changing them would invalidate the spent budgets that were already calculated. Therefore, to change any parameter, you must clone the SLI configuration.
- From the SLI Management dialog, click the View/Clone SLI configuration icon on the selected SLI configuration that you want to clone.
- Edit the SLI configuration Details:
- Change the name for the SLI configuration to identify the configuration as a clone from the already-existing configuration.
- Edit the SLI configuration as required.
- Click Clone to create a clone of the SLI configuration.