What is a service level agreement (SLA)?

Published: 27 November 2023
Contributor: Michael Goodwin

What is an SLA?

A service level agreement (SLA) is a contract between a service provider and a customer that outlines the service to be provided, the level of performance to be expected, how performance will be measured and approved and what happens if performance levels are not met.

SLAs are generally formed between a vendor and an external customer, but are also used internally, between two departments or teams within the same company. 

SLAs are an important part of outsourcing and technology vendor contracts, providing an end-to-end view of how the working relationship will go. They help ensure that all stakeholders have an accurate understanding of the service agreement.

SLAs set customer expectations, define review and redress procedures and, ultimately, help optimize the end-user experience. SLAs pave the way for a more seamless working relationship, with fewer issues down the line, and help protect the interests of all parties involved.

Types of service level agreements

There are three primary types of service level agreements: customer-level SLAs, service-level SLAs and multi-level SLAs.

Customer-level SLA

A customer-level SLA is an agreement between a service provider and a customer, whether the customer is external or internal. This agreement describes the service or different services that will be provided to the customer. For example, this agreement might be between a third-party cloud services provider and a tech company outlining the performance expectations of applications hosted in the cloud.

An internal SLA is an agreement between two different departments, teams, or sites within the same organization. This agreement could be between development and business teams outlining the deployment cadence and overall expectations for a certain application or product.

Service-level SLA

A service-level SLA is a contract that details a defined service that is provided to multiple customers. If a provider offers a product with the same level of service and support regardless of the customer, they might use a service-level SLA.

Multi-level SLA

A multi-level SLA is an agreement split into different levels to incorporate more than two parties, or different levels of service, into the same agreement. A multi-level SLA might be used between an organization and multiple external providers, such as in a multicloud model with numerous public cloud providers. The agreement can also be set up between more than two internal teams or departments.

An organization that offers a product at different pricing plans or service levels, such as a SaaS product, might also use a multi-level SLA that describes the service level and expectations for each product tier.

Components of SLAs

SLAs will vary by company, product and the specific business needs of each organization, but most SLAs contain similar key features:

Overview

An overview section introduces the agreement and its most basic features, like the parties involved, a broad outline of the services to be provided, and the start date and duration of the agreement.

Description of services

This section delineates the specific services provided and all related details. It will include information on service delivery, turnaround times for deliverables, maintenance schedules, dependencies and any other relevant information. The description should be thorough enough to cover every factor and circumstance that affects how the service is delivered.

Stakeholder breakdown

A stakeholder section lists all parties involved in the agreement, what their roles and responsibilities are and how each can be contacted. A primary contact will often be designated as the go-to contact for reporting end-user issues.

Performance tracking and reporting

A performance section details the service availability and service performance standards to be maintained, and what key performance indicators (KPIs) will be used to measure performance. This is usually defined within a service level objective (SLO)—an agreement within an SLA that establishes an agreed-upon performance target for a particular service over a period of time.

The section often includes a workflow outlining how information will be collected and shared with stakeholders. Both the performance levels and the metrics used to gauge them should be carefully considered by all parties, as they are central to the entire agreement.
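
As a rough illustration of how an SLO ties a KPI to a target and a measurement window, the sketch below models one as a small Python record. The `SLO` class, its field names and the example service and targets are all hypothetical; they are not taken from any particular SLA or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """One performance target inside an SLA (illustrative field names)."""
    service: str                    # service the target applies to
    kpi: str                        # KPI used to measure performance
    target: float                   # agreed-upon threshold
    window_days: int                # period over which the KPI is measured
    higher_is_better: bool = True   # False for KPIs such as latency

    def is_met(self, measured: float) -> bool:
        """Return True if the measured value satisfies the target."""
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


# Hypothetical targets, invented for this example.
uptime_slo = SLO("checkout-api", "uptime_percent", 99.5, 30)
latency_slo = SLO("checkout-api", "p95_latency_ms", 300, 30, higher_is_better=False)

print(uptime_slo.is_met(99.7))   # measured uptime is above the target
print(latency_slo.is_met(450))   # measured latency exceeds the target
```

Recording the direction of each KPI explicitly avoids a common reporting mistake, where a "lower is better" metric such as latency is compared the wrong way round.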

Exclusions

This section lists services or aspects of service delivery that are exempted from the agreement. Typical exclusions cover downtime caused by issues with the customer’s equipment or by factors outside the provider’s reasonable control (force majeure). The section might also include exceptions for scheduled maintenance, dictating that such windows do not count against guaranteed uptime.
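
To show how an exclusion can change a reported figure, here is a minimal sketch that computes uptime while ignoring excluded outages. The `Outage` record and the sample numbers are assumptions made for illustration. Real agreements differ on details, for example whether excluded time is also removed from the measurement period, which this sketch does not do.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float
    excluded: bool  # True if covered by an exclusion, e.g. scheduled maintenance


def uptime_percent(period_minutes: float, outages: list[Outage]) -> float:
    """Uptime over the period, counting only outages that are not excluded."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    counted = sum(o.minutes for o in outages if not o.excluded)
    return 100 * (period_minutes - counted) / period_minutes


# Hypothetical 30-day period: one maintenance window and one unplanned outage.
period = 30 * 24 * 60
outages = [Outage(minutes=120, excluded=True), Outage(minutes=216, excluded=False)]
print(f"Reported uptime: {uptime_percent(period, outages):.2f}%")
```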

Security protocols

A security section describes the security protocols and standards the provider maintains and provides information on how customer data will be protected. It also lists non-disclosure agreements (NDAs) and any measures involved with protecting sensitive information or intellectual property.

Redressing

This section defines the penalties that will be incurred by either side should they not fulfill the terms of the agreement. It details escalation procedures, time frames for resolutions and the compensation to be provided should the service provider not fulfill the terms of the SLA. This compensation could take the form of financial payments, service credits or other remedies. It also lists redemption terms such as earn-backs, which let a provider recover penalties or credits by meeting service levels over a later period.
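
Service credits are often tied to how far measured performance falls below the target. The sketch below shows one possible tiered scheme. The tier boundaries and credit percentages are invented for illustration and do not come from any real agreement.

```python
# Hypothetical tiers: (minimum measured uptime %, credit as % of monthly fee).
# Ordered from best to worst performance.
CREDIT_TIERS = [
    (99.5, 0),    # target met, no credit owed
    (99.0, 10),   # minor miss
    (0.0, 25),    # major miss
]


def service_credit(measured_uptime: float, monthly_fee: float) -> float:
    """Return the credit owed to the customer for one billing period."""
    if not 0 <= measured_uptime <= 100:
        raise ValueError("measured_uptime must be between 0 and 100")
    for floor, credit_percent in CREDIT_TIERS:
        if measured_uptime >= floor:
            return monthly_fee * credit_percent / 100
    return 0.0  # unreachable with the tiers above; kept as a safe default


print(service_credit(99.2, monthly_fee=1000))  # falls in the "minor miss" tier
```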

Review and adjustment process

Vendor capabilities, workloads and customer requirements evolve over time. Accordingly, there should be an established process and timetable for reviewing and revising the agreed-upon terms and the KPIs being used to measure performance. This review allows the SLA to incorporate the most recent features of the provider’s product or service and address current customer needs.

Termination process and terms

The agreement should include a section that outlines the circumstances that allow for the cancellation of the service agreement prior to its expiration date, as well as the notice period required by each party if such action is pursued.

Signatures

The agreement will be signed by authorized stakeholders on each side, binding all parties involved to the terms of the agreement while it is in effect.

Determining SLA metrics

Service level objectives (SLOs) are a part of SLAs that set a performance threshold for a specific aspect of service, like error rates, request latency or uptime. These performance metrics are used to evaluate the quality of service provided and determine if the SLA is being met.

Monitoring the appropriate metrics is an important part of an SLA’s success. Without the right data, it will be difficult to know how the arrangement is serving the organization.

Common SLA metrics include:

Availability and uptime

Uptime is the amount of time that services are working properly and available for use. This metric is usually given as a percentage over a period of time, for example 99.5% per 30 days, which allows 3.6 hours of downtime. Uptime requirements will vary by business type, and the SLA will reflect that.

For instance, 3.6 hours of downtime per month may be far too much for an e-commerce platform doing business globally. Such a company might need a stronger availability guarantee and would seek an SLA that reflects that.
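
The arithmetic behind an uptime percentage is simple: the allowed downtime is the share of the period not covered by the guarantee. A minimal sketch, with a hypothetical function name and a 30-day default period:

```python
def allowed_downtime_hours(uptime_percent: float, period_days: int = 30) -> float:
    """Return the downtime, in hours, that an uptime target permits over a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_hours = period_days * 24
    return total_hours * (100 - uptime_percent) / 100


# Compare a few common targets over a 30-day period.
for target in (99.5, 99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% uptime allows {hours:.2f} hours of downtime")
```

Running the comparison makes the trade-off concrete: each additional "nine" cuts the permitted downtime by roughly a factor of ten.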

Error rates

Error rate is a measurement that tracks production or service failures and the percentage of time that an IT service provider's service level falls below expected performance targets. The agreement might include SLOs for things like missed deadlines, delays in feature or update releases, negative help desk interactions, coding error rates, defect rates and other measures of technical quality.
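
For a request-based service, one simple way to express an error rate is failed requests as a percentage of all requests. The sketch below assumes that definition; the 0.5% threshold and the request counts are hypothetical.

```python
def error_rate_percent(failed: int, total: int) -> float:
    """Return failed requests as a percentage of all requests."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    return 100 * failed / total


MAX_ERROR_RATE = 0.5  # hypothetical SLO threshold, in percent

rate = error_rate_percent(failed=25, total=10_000)
print(f"Error rate: {rate:.2f}% (SLO met: {rate <= MAX_ERROR_RATE})")
```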

Response time

Response time establishes the acceptable amount of time for a provider to log and respond to a client issue or request.

Resolution time

The resolution time establishes the acceptable amount of time for an issue to be resolved once it has been logged by the provider.
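
Response and resolution targets can be checked against ticket timestamps. The sketch below is one way to do that. The `Ticket` fields and the one-hour and eight-hour targets are assumptions, and it treats the issue as logged at the moment it is reported.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Ticket:
    reported_at: datetime    # when the client raised the issue (assumed to be when it was logged)
    responded_at: datetime   # first response from the provider
    resolved_at: datetime    # when the issue was resolved


RESPONSE_TARGET = timedelta(hours=1)     # hypothetical target
RESOLUTION_TARGET = timedelta(hours=8)   # hypothetical target


def check_ticket(ticket: Ticket) -> dict[str, bool]:
    """Report whether a ticket met the response and resolution targets."""
    return {
        "response_met": ticket.responded_at - ticket.reported_at <= RESPONSE_TARGET,
        "resolution_met": ticket.resolved_at - ticket.reported_at <= RESOLUTION_TARGET,
    }


ticket = Ticket(
    reported_at=datetime(2023, 11, 27, 9, 0),
    responded_at=datetime(2023, 11, 27, 9, 40),
    resolved_at=datetime(2023, 11, 27, 18, 30),
)
print(check_ticket(ticket))
```

In this sample the provider responded within the target but took longer than eight hours to resolve the issue, so only one of the two checks passes.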

Mean time to recovery

This metric is the average time it takes to recover a product, service or system following a failure or outage.
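
As a small worked example, mean time to recovery can be computed by averaging the duration of each incident from failure to restoration. The incident timestamps below are invented for illustration.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average the time between failure and restoration across incidents.

    Each incident is a (failed_at, restored_at) pair.
    """
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum((restored - failed for failed, restored in incidents), timedelta())
    return total / len(incidents)


# Hypothetical incidents lasting 30 minutes and 90 minutes.
incidents = [
    (datetime(2023, 11, 1, 10, 0), datetime(2023, 11, 1, 10, 30)),
    (datetime(2023, 11, 15, 22, 0), datetime(2023, 11, 15, 23, 30)),
]
print(mean_time_to_recovery(incidents))
```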

First call resolution rate

This metric is the percentage of customers who have their issue resolved by the provider during their first interaction with the service desk or chatbot.

Abandonment rate

This is a key metric for customer service providers, or organizations whose service includes a customer service component. It is the rate at which customers abandon their customer support inquiry before they receive an answer from the help desk.
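
The two help desk ratios described above, first call resolution rate and abandonment rate, can both be derived from a log of support inquiries. The sketch below assumes a hypothetical `Inquiry` record and assumes that the first call resolution rate is calculated over answered inquiries only; some organizations use all inquiries as the denominator instead.

```python
from dataclasses import dataclass


@dataclass
class Inquiry:
    abandoned: bool                  # customer gave up before receiving an answer
    resolved_on_first_contact: bool  # only meaningful when not abandoned


def abandonment_rate(inquiries: list[Inquiry]) -> float:
    """Percentage of all inquiries abandoned before an answer was given."""
    if not inquiries:
        raise ValueError("at least one inquiry is required")
    return 100 * sum(i.abandoned for i in inquiries) / len(inquiries)


def first_call_resolution_rate(inquiries: list[Inquiry]) -> float:
    """Percentage of answered inquiries resolved during the first interaction."""
    answered = [i for i in inquiries if not i.abandoned]
    if not answered:
        raise ValueError("at least one answered inquiry is required")
    return 100 * sum(i.resolved_on_first_contact for i in answered) / len(answered)


# Hypothetical log: 2 abandoned, 6 resolved on first contact, 2 needing follow-up.
log = (
    [Inquiry(abandoned=True, resolved_on_first_contact=False)] * 2
    + [Inquiry(abandoned=False, resolved_on_first_contact=True)] * 6
    + [Inquiry(abandoned=False, resolved_on_first_contact=False)] * 2
)
print(f"Abandonment rate: {abandonment_rate(log):.1f}%")
print(f"First call resolution rate: {first_call_resolution_rate(log):.1f}%")
```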

Security

A variety of security indicators might be tracked, like undisclosed vulnerabilities, antivirus updates or software patches, to evaluate a provider’s commitment to IT security.

Business results

By using the appropriate KPIs, organizations can determine how a provider’s services or products are contributing to broader business goals. For example, a company undergoing a digital transformation might ask: are the provider’s cloud resourcing tools helping us bring our cloud computing spend back under control? Tracking the right KPIs will help answer that question.

Benefits of an SLA

SLAs yield benefits for both the service provider and the customer. SLAs help to:

Improve quality of service and customer experience

In creating SLAs, organizations have an opportunity to closely examine their products, services and processes—and associated customer experiences—to determine what’s working well and what can be improved upon. An SLA establishes clear performance goals that provide benchmarks for measuring performance and customer experience success.

Facilitate communication

SLAs clarify the roles and responsibilities of all stakeholders, as well as processes and channels for troubleshooting issues and handling disputes. This clarity helps eliminate confusion and promote clear communication both internally and with external clients.

Increase service continuity

SLAs define expectations around service availability, set policies for downtime and lay out procedures for failure and disaster recovery. These measures help to minimize disruptions and unexpected downtime, and quickly resolve technical issues and service outages. Once satisfactory processes are in place, organizations can leverage automation to enhance service consistency.  

Minimize risk

The SLA process offers an opportunity to be proactive with risk management. The process identifies potential risks and threats ahead of time, and helps develop plans to avoid or mitigate such issues. Organizations can improve service delivery and response times, create stronger contingency plans, and bolster their overall risk management strategy.
