Getting started with service level objectives
This guide walks you through creating your first Service Level Objective (SLO) in Instana and explains the key concepts and configuration options.
Before you begin
- Access permissions: The "Access Service Levels" permission and "Create, configure, and delete SLO configurations" permission
- Monitored entities: At least one of the following must already be configured in Instana:
  - Application perspective
  - Website with beacon data
  - Synthetic test
  - Infrastructure entities (hosts, containers, etc.)
Understanding SLO concepts
Before you create an SLO, it is important to understand how the key components work together.
The SLI/SLO/Error Budget relationship
┌─────────────────────────────────────────────────────────────┐
│  Service Level Indicator (SLI)                              │
│  "What you measure"                                         │
│  Example: Response time, error rate, availability           │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────┐
│  Service Level Objective (SLO)                              │
│  "Your target"                                              │
│  Example: 99% of requests < 100ms                           │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────┐
│  Error Budget                                               │
│  "Allowed failures"                                         │
│  Example: 1% = 101 minutes/week OR 252 failed calls         │
└─────────────────────────────────────────────────────────────┘
- Service level indicator (SLI): A quantitative measure of service performance (for example, latency, availability, or traffic)
- Blueprint: The type of SLI you're measuring (Latency, Availability, Traffic, Saturation, or Custom)
- SLO target: Your desired performance level (for example, 99%)
- Error budget: The complement of your target (100% − 99% = 1% in this example), representing the share of failures you can tolerate
- Good vs bad events/minutes:
  - Good: Metrics that meet your threshold (for example, response time < 100ms)
  - Bad: Metrics that do not meet your threshold (for example, response time ≥ 100ms)
- Burn rate: How quickly you are consuming your error budget relative to the SLO time window
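The relationship between these terms can be sketched in a few lines of Python. This is an illustrative calculation, not Instana's implementation: the function names and the sample latencies are hypothetical, and the good/bad split follows the threshold example above (good below 100 ms, bad at or above it).

```python
def error_budget_fraction(target_percent: float) -> float:
    """Error budget as a fraction: the complement of the SLO target."""
    return (100.0 - target_percent) / 100.0


def classify(latencies_ms: list[float], threshold_ms: float) -> tuple[int, int]:
    """Count good (< threshold) and bad (>= threshold) events."""
    good = sum(1 for value in latencies_ms if value < threshold_ms)
    return good, len(latencies_ms) - good


# Hypothetical sample: five calls measured against a 100 ms threshold.
latencies = [42.0, 87.5, 100.0, 63.1, 250.0]
good, bad = classify(latencies, threshold_ms=100.0)
status_percent = 100.0 * good / (good + bad)

print(good, bad)                               # 3 2
print(status_percent)                          # 60.0
print(round(error_budget_fraction(99.0), 4))   # 0.01
```

With a 99% target, this sample would be far outside its 1% error budget, because 2 of 5 events are bad.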
Decision guide: Choosing your SLO configuration
Step 1: Select your entity type
| Entity type | Best for | Common use cases |
|---|---|---|
| Application | Backend services, APIs | API latency, service availability, error rates |
| Website | User-facing web applications | Page load times, user experience, frontend errors |
| Synthetic tests | Proactive monitoring | Uptime monitoring, multi-step user flows |
| Infrastructure | System resources | CPU, memory, disk utilization |
Step 2: Choose your blueprint
| Blueprint | Measures | When to use |
|---|---|---|
| Latency | Response time | When speed matters (APIs, page loads) |
| Availability | Success rate | When uptime is critical (services, websites) |
| Traffic | Request volume | When load consistency matters |
| Saturation | Resource usage | For infrastructure capacity planning |
| Custom | User-defined criteria | For specific business requirements |
Step 3: Select measurement type
| Type | Error budget unit | Best for | Calculation method |
|---|---|---|---|
| Time-based | Minutes | Consistent traffic patterns | Aggregates metrics per minute |
| Event-based | Events (calls/beacons/results) | Variable traffic | Counts individual good/bad events |
- Time-based SLOs have a static error budget: 1% of 10,080 minutes (1 week) ≈ 101 minutes, regardless of traffic
- Event-based SLOs have a dynamic error budget: 1% of the total requests in the time window, so it varies with traffic
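The difference between the two budget types can be shown with a short calculation. This is a sketch under stated assumptions: the function names are hypothetical, rounding to the nearest whole unit is assumed from the "101 minutes" figure, and the request counts (25,200 and 50,000) are made-up traffic volumes.

```python
def time_based_budget_minutes(target_percent: float, window_days: int) -> int:
    """Static budget: a fixed share of the minutes in the window."""
    window_minutes = window_days * 24 * 60
    return round(window_minutes * (100.0 - target_percent) / 100.0)


def event_based_budget_events(target_percent: float, total_events: int) -> int:
    """Dynamic budget: a share of however many events actually occurred."""
    return round(total_events * (100.0 - target_percent) / 100.0)


print(time_based_budget_minutes(99.0, 7))        # 101 (1% of 10,080 minutes, rounded)
print(event_based_budget_events(99.0, 25_200))   # 252
print(event_based_budget_events(99.0, 50_000))   # 500
```

The time-based result depends only on the target and the window length, while the event-based result grows and shrinks with the number of requests.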
Tutorial: Creating your first SLO
This tutorial creates an SLO to monitor application latency.
Scenario
Goal: Ensure 95% of API calls to your "Payment Service" application respond within 200ms over a rolling 7-day period.
Step-by-step instructions
1. Navigate to Service Levels
   - From the Instana UI navigation menu, click Service Levels
   - Click Create service level objective
2. Select entity
   - Entity type: Application
   - Select your application: Payment Service (from the searchable list)
   - Click Next
3. Set scope
   - Calls in scope: Inbound calls (calls from outside the application)
   - Include hidden calls (optional):
     - Internal calls: Unchecked (exclude internal service calls)
     - Synthetic calls: Unchecked (exclude health checks)
   - Services and endpoints: Select from the dropdown menus
     - Service: All services (or select a specific service)
     - Endpoint: All endpoints (or select a specific endpoint)
   - Click Next
4. Set indicator
   - Blueprint: Latency
   - Measurement type: Time-based (aggregates metrics per minute)
   - Aggregation: Mean (average latency per minute)
   - Threshold: 200 ms
   - Click Next
   What this means: Each minute, Instana calculates the average latency. If the average exceeds 200 ms, that minute is marked as "bad" and consumes error budget.
5. Set objective
   - SLO Target: 95% (95% of minutes must meet the threshold)
   - Time Window: Rolling (continuously evaluates the last 7 days)
   - Length: 7 days
   - Bind time zone: Disabled (uses UTC by default)
   Error budget preview: 504 minutes (7 days × 24 hours × 60 minutes × 5%)
6. Enter details
   - Name: Payment Service - Latency SLO
   - Tags: production, payment, critical (optional, for filtering)
   - Teams: payment team, audit team (optional)
   - Click Create
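The indicator and objective settings from this tutorial can be approximated in code to see how minutes become "good" or "bad". This is a minimal sketch, not Instana's implementation: it assumes calls arrive as hypothetical `(epoch_seconds, latency_ms)` tuples, and it assumes a minute whose mean latency is at or above the threshold counts as bad (the exact boundary handling is an assumption).

```python
from collections import defaultdict
from statistics import mean

THRESHOLD_MS = 200.0          # tutorial threshold
TARGET_PERCENT = 95.0         # tutorial SLO target
WINDOW_MINUTES = 7 * 24 * 60  # rolling 7-day window


def bad_minutes(calls: list[tuple[int, float]]) -> list[int]:
    """Return the minute indexes whose mean latency misses the threshold.

    Each call is a hypothetical (epoch_seconds, latency_ms) tuple.
    """
    per_minute: dict[int, list[float]] = defaultdict(list)
    for epoch_seconds, latency_ms in calls:
        per_minute[epoch_seconds // 60].append(latency_ms)
    # Assumption: a mean at or above the threshold marks the minute as bad.
    return sorted(m for m, values in per_minute.items() if mean(values) >= THRESHOLD_MS)


# Hypothetical calls spread over three consecutive minutes.
calls = [
    (0, 120.0), (30, 180.0),    # minute 0: mean 150 ms, good
    (60, 150.0), (90, 350.0),   # minute 1: mean 250 ms, bad
    (120, 90.0),                # minute 2: mean 90 ms, good
]

budget_minutes = round(WINDOW_MINUTES * (100.0 - TARGET_PERCENT) / 100.0)
print(bad_minutes(calls))   # [1]
print(budget_minutes)       # 504
```

Note that the 150 ms call in minute 1 is fast on its own, but the minute is still bad because the aggregation is the per-minute mean.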
Understanding your SLO dashboard
- Status: Current performance percentage (for example, 96.5%) compared to the target (95%)
- Error budget remaining: Minutes left in the error budget (for example, 450 of 504 minutes)
- Burn rate: How fast the error budget is being consumed (for example, 1.2x means 20% faster than the rate that would use up the budget exactly at the end of the time window)
- Indicator chart: Latency over time with threshold line
- Error budget chart: Error budget consumption over time
- Traffic chart: Request volume over time
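The headline numbers on the dashboard can be reproduced with simple arithmetic. The sketch below uses the common SRE definition of burn rate (observed bad fraction divided by the allowed bad fraction); Instana's exact calculation may differ, and the function names and the 900-minute lookback period are hypothetical.

```python
def slo_status_percent(good: int, total: int) -> float:
    """Share of good minutes (or events) in the observed period."""
    return 100.0 * good / total


def burn_rate(bad: int, total: int, target_percent: float) -> float:
    """Observed bad fraction relative to the fraction the target allows."""
    allowed_bad_fraction = (100.0 - target_percent) / 100.0
    return (bad / total) / allowed_bad_fraction


budget_minutes = 504   # 5% of a 7-day window
bad_so_far = 54        # hypothetical bad minutes consumed so far

print(budget_minutes - bad_so_far)                                  # 450
print(round(burn_rate(bad=54, total=900, target_percent=95.0), 2))  # 1.2
print(round(slo_status_percent(good=846, total=900), 1))            # 94.0
```

A burn rate of 1.0 would consume the budget exactly by the end of the window; values above 1.0 exhaust it early.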
Next steps
- Add Smart Alerts: Get notified when SLO status, error budget, or burn rate crosses thresholds
- Create correction windows: Exclude planned maintenance or nonbusiness hours
- Add SLO widgets: Display SLOs on custom dashboards
- Explore more examples: Learn about different SLO configurations
- Automate with API: Manage SLOs programmatically
Common questions
Q: Should I use time-based or event-based measurement?
A: Use time-based when traffic patterns are consistent and you want a predictable error budget. Use event-based when traffic is variable or when individual request success rates matter.
Q: What is a good SLO target to start with?
A: You can start with 95% for non-critical services, 99% for important services, and 99.9% for critical services. Adjust these targets based on actual performance and business requirements.
Q: How long should my time window be?
A: Choose a length that matches how quickly you need feedback:
- 1 day: Provides fast feedback; useful for development or testing
- 7 days: Balances responsiveness with stability
- 28 days: Shows long-term trends; recommended for production services
- Calendar month: Aligns with business reporting cycles, which suits monthly SLA reviews and financial reporting periods. Available only with fixed time windows.
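The window length also determines the size of a time-based error budget. The following sketch assumes the budget is simply the window's minutes multiplied by the complement of the target; the function names are hypothetical, and the calendar-month lines show how the budget varies with the number of days in the month.

```python
import calendar


def window_minutes(days: int) -> int:
    """Number of minutes in a window of the given length."""
    return days * 24 * 60


def calendar_month_minutes(year: int, month: int) -> int:
    """Number of minutes in a specific calendar month."""
    _, days_in_month = calendar.monthrange(year, month)
    return window_minutes(days_in_month)


def budget_minutes(total_minutes: int, target_percent: float) -> float:
    """Time-based error budget for a window, in minutes."""
    return round(total_minutes * (100.0 - target_percent) / 100.0, 1)


for days in (1, 7, 28):
    print(days, budget_minutes(window_minutes(days), 99.0))
# 1 14.4
# 7 100.8
# 28 403.2

print(budget_minutes(calendar_month_minutes(2024, 2), 99.0))  # 417.6 (29 days)
print(budget_minutes(calendar_month_minutes(2024, 3), 99.0))  # 446.4 (31 days)
```

A 1-day window at 99% leaves only about 14 minutes of budget, which is one reason short windows react quickly to incidents.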
Q: What if my SLO status is always 100%?
A: Your threshold may be too lenient. Review the indicator chart and adjust the threshold to make it more challenging but still achievable.
Q: Can I change an SLO after creation?
A: Yes, you can update the name, target, time window type and length, time zone, and tags. However, you cannot change the entity, scope, or indicator configuration.
Q: When is it appropriate to use calendar month time windows?
A: Use calendar month time windows when:
- You need to align SLO reporting with business calendars and monthly reviews
- Your organization tracks SLAs on a monthly basis
- You want consistent month-over-month comparisons
- Financial or operational reporting follows calendar month boundaries
Q: Do I select synthetic tests individually or by using filters?
A: Use individual synthetic test selection to have the SLO monitor a specific, fixed set of tests. Use filter-based selection to have the SLO automatically include all synthetic tests that match attributes such as test name, location ID, or application ID. Filter-based selection creates a dynamic scope, so newly created tests that meet the filter criteria are automatically included in the SLO.
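The difference between a fixed and a dynamic scope can be illustrated with a small model. This is a conceptual sketch only: the `SyntheticTest` class, its fields, and both selection functions are hypothetical and do not represent Instana's data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticTest:
    test_id: str
    name: str
    location_id: str
    application_id: str


def select_individually(tests: list[SyntheticTest], chosen_ids: set[str]) -> list[SyntheticTest]:
    """Fixed scope: only the tests that were explicitly picked."""
    return [t for t in tests if t.test_id in chosen_ids]


def select_by_filter(tests: list[SyntheticTest], application_id: str) -> list[SyntheticTest]:
    """Dynamic scope: every test that currently matches the filter."""
    return [t for t in tests if t.application_id == application_id]


tests = [
    SyntheticTest("t1", "checkout-ping", "eu-1", "shop"),
    SyntheticTest("t2", "login-flow", "us-1", "shop"),
]
chosen = {"t1"}

# A new test is created after the SLO was configured.
tests.append(SyntheticTest("t3", "cart-flow", "eu-1", "shop"))

print([t.test_id for t in select_individually(tests, chosen)])   # ['t1']
print([t.test_id for t in select_by_filter(tests, "shop")])      # ['t1', 't2', 't3']
```

The individually selected scope ignores the new test, while the filter picks it up without any change to the SLO.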