Published: 27 November 2023
Contributor: Michael Goodwin
A service level agreement (SLA) is a contract between a service provider and a customer that outlines the service to be provided, the level of performance to be expected, how performance will be measured and approved and what happens if performance levels are not met.
SLAs are generally formed between a vendor and an external customer, but are also used internally, between two departments or teams within the same company.
SLAs are an important part of outsourcing and technology vendor contracts, providing an end-to-end view of how the working relationship will go. They help ensure that all stakeholders have an accurate understanding of the service agreement.
SLAs set customer expectations, define review and redress procedures and, ultimately, help optimize the end user experience. SLAs pave the way for a more seamless working relationship, with fewer issues down the line, and help protect the interests of all parties involved.
There are three primary types of service level agreements: customer-level SLAs, service-level SLAs and multi-level SLAs.
A customer-level SLA is an agreement between a service provider and a customer, whether the customer is external or internal. This agreement describes the service or different services that will be provided to the customer. For example, this agreement might be between a third-party cloud services provider and a tech company outlining the performance expectations of applications hosted in the cloud.
An internal SLA is an agreement between two different departments, teams, or sites within the same organization. This agreement could be between development and business teams outlining the deployment cadence and overall expectations for a certain application or product.
A service-level SLA is a contract that details a defined service that is provided to multiple customers. If a provider offers a product with the same level of service and support regardless of the customer, they might use a service-level SLA.
A multi-level SLA is an agreement split into different levels to incorporate more than two parties, or different levels of service, into the same agreement. A multi-level SLA might be used between an organization and multiple external providers, such as in a multicloud model with numerous public cloud providers. The agreement can also be set up between more than two internal teams or departments.
An organization that offers a product at different pricing plans or service levels, such as a SaaS product, might also use a multi-level SLA that describes the service level and expectations for each product tier.
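A tiered, multi-level SLA can be pictured as a lookup from product tier to the targets committed for that tier. The following is a minimal Python sketch of that idea. The tier names, the `TierTargets` fields and every numeric value are hypothetical and chosen only for illustration; a real agreement defines its own tiers and targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierTargets:
    """Service targets committed for one product tier (illustrative values)."""
    uptime_percent: float        # guaranteed availability per measurement period
    response_time_hours: float   # maximum time to respond to a logged issue


# Hypothetical tiers for a SaaS product; real agreements define their own.
MULTI_LEVEL_SLA = {
    "basic": TierTargets(uptime_percent=99.0, response_time_hours=24.0),
    "standard": TierTargets(uptime_percent=99.5, response_time_hours=8.0),
    "premium": TierTargets(uptime_percent=99.9, response_time_hours=1.0),
}


def targets_for(tier: str) -> TierTargets:
    """Return the targets for a tier, failing loudly on an unknown tier."""
    try:
        return MULTI_LEVEL_SLA[tier]
    except KeyError:
        raise ValueError(f"no SLA defined for tier {tier!r}") from None
```

Keeping every tier in one structure mirrors the point of a multi-level SLA: a single agreement that still states distinct expectations per level.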
SLAs will vary by company, product and the specific business needs of each organization, but most SLAs contain similar key features:
An overview section introduces the agreement and its most basic features, like the parties involved, a broad outline of the services to be provided, and the start date and duration of the agreement.
A description of services section delineates the specific services provided and all related details. It will include information on service delivery, turnaround times for deliverables, maintenance schedules, dependencies and any other relevant information. This section should provide a thorough accounting of all factors and circumstances.
A stakeholder section lists all parties involved in the agreement, what their roles and responsibilities are and how each can be contacted. A primary contact will often be designated as the go-to contact for reporting end-user issues.
A performance section details the service availability and service performance standards to be maintained, and what key performance indicators (KPIs) will be used to measure performance. This is usually defined within a service level objective (SLO)—an agreement within an SLA that establishes an agreed-upon performance target for a particular service over a period of time.
It often includes a workflow outlining how information will be collected and shared with stakeholders. Both the performance levels and the metrics used to gauge performance should be carefully considered by all parties, as they are central to the entire agreement.
An exclusions section lists services or aspects of service delivery that are exempted from the agreement. It commonly excludes downtime due to issues with the customer’s equipment or factors outside of reasonable control (force majeure). It might also include exceptions for scheduled maintenance, dictating that such windows do not count against guaranteed uptime agreements.
A security section describes the security protocols and standards the provider maintains and provides information on how customer data will be protected. It also lists non-disclosure agreements (NDAs) and any measures involved with protecting sensitive information or intellectual property.
A penalties section defines the consequences incurred by either side should it fail to fulfill the terms of the agreement. It details escalation procedures, time frames for resolutions and the compensation to be provided should the service provider not fulfill the terms of the SLA. This compensation could take the form of financial payment, service credits or other remedies. It also lists redemption terms like earnbacks.
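Service credits are often expressed as a schedule that maps measured performance to a share of the fee credited back to the customer. The sketch below assumes such a schedule keyed on monthly uptime. The thresholds, credit percentages and function names are invented for illustration and do not come from any particular agreement.

```python
def service_credit_percent(measured_uptime: float) -> float:
    """Return the share of the monthly fee credited back, as a percentage."""
    # Hypothetical schedule: (minimum uptime %, credit %), checked top down.
    schedule = [
        (99.5, 0.0),    # target met, no credit owed
        (99.0, 10.0),
        (95.0, 25.0),
    ]
    for minimum_uptime, credit in schedule:
        if measured_uptime >= minimum_uptime:
            return credit
    return 50.0  # below every tier in this made-up schedule


def service_credit_amount(monthly_fee: float, measured_uptime: float) -> float:
    """Convert the credit percentage into a currency amount."""
    return monthly_fee * service_credit_percent(measured_uptime) / 100.0
```

Under these assumed numbers, a month at 99.2% uptime on a fee of 1,000 would yield a credit of 100.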
Vendor capabilities, workloads and customer requirements evolve over time. Accordingly, there should be an established process and timetable for reviewing and revising the agreed-upon terms and the KPIs being used to measure performance. This review allows the SLA to incorporate the most recent features of the provider’s product or service and address current customer needs.
The agreement should include a section that outlines the circumstances that allow for the cancellation of the service agreement prior to its expiration date, as well as the notice period required by each party if such action is pursued.
The agreement will be signed by authorized stakeholders on each side, binding all parties involved to the terms of the agreement while it is in effect.
Service level objectives (SLOs) are the parts of an SLA that set a performance threshold for a specific aspect of service, such as error rates, request latency or uptime. These performance metrics are used to evaluate the quality of service provided and determine if the SLA is being met.
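Checking measured values against SLO thresholds can be sketched in a few lines of Python. The `SLO` class, the `evaluate` helper, the metric names and the targets below are all hypothetical. The one design point worth noting is that some metrics must stay above their target (uptime) while others must stay below it (error rate, latency).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """One performance threshold for one aspect of service."""
    name: str
    target: float
    higher_is_better: bool  # True for uptime, False for error rate or latency

    def is_met(self, measured: float) -> bool:
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


def evaluate(slos, measurements):
    """Map each SLO name to True/False; a missing measurement counts as not met."""
    results = {}
    for slo in slos:
        measured = measurements.get(slo.name)
        results[slo.name] = measured is not None and slo.is_met(measured)
    return results


# Illustrative targets and measurements, not taken from any real agreement.
slos = [
    SLO("uptime_percent", 99.5, higher_is_better=True),
    SLO("error_rate_percent", 1.0, higher_is_better=False),
    SLO("p95_latency_ms", 300.0, higher_is_better=False),
]
measurements = {
    "uptime_percent": 99.7,
    "error_rate_percent": 1.4,
    "p95_latency_ms": 250.0,
}
report = evaluate(slos, measurements)
```

Here `report` marks the uptime and latency objectives as met and the error rate objective as missed, since 1.4% exceeds the assumed 1.0% ceiling.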
Monitoring the appropriate metrics is an important part of an SLA’s success. Without the right data, it will be difficult to know how the arrangement is serving the organization.
Common SLA metrics include:
Uptime is the amount of time that services are working properly and available for use. This metric is usually given as a percentage over a period of time, for example 99.5% over 30 days (which allows 3.6 hours of downtime). Uptime requirements will vary by business type, and the SLA will reflect that.
For instance, 3.6 hours of downtime per month may be far too much for an e-commerce platform doing business globally. Such a company might need a higher availability guarantee and would seek an SLA that provides it.
Error rate is a measurement that tracks production or service failures and the percentage of time that an IT service provider's service level falls below expected performance targets. The agreement might include SLOs for things like missed deadlines, delays in feature or update releases, negative help desk interactions, coding error rates, defect rates and other measures of technical quality.
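The relationship between an uptime percentage and the downtime it permits is simple arithmetic: the period length multiplied by the fraction of time the service may be unavailable. A short Python sketch follows. The function names are chosen here for illustration, and the calculation assumes the whole period counts toward the measurement, with no carve-outs for scheduled maintenance.

```python
def allowed_downtime_hours(uptime_percent: float, period_days: float) -> float:
    """Hours of downtime permitted by an uptime guarantee over a period."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    period_hours = period_days * 24
    return period_hours * (100.0 - uptime_percent) / 100.0


def measured_uptime_percent(downtime_hours: float, period_days: float) -> float:
    """Uptime actually achieved, given the downtime observed in a period."""
    period_hours = period_days * 24
    if not 0.0 <= downtime_hours <= period_hours:
        raise ValueError("downtime_hours must fit within the period")
    return 100.0 * (period_hours - downtime_hours) / period_hours
```

With 99.5% over 30 days, `allowed_downtime_hours(99.5, 30)` gives the 3.6 hours quoted above. Tightening the guarantee to 99.9% shrinks that budget to roughly 0.72 hours, or about 43 minutes.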
Response time establishes the acceptable amount of time for a provider to log and respond to a client issue or request.
Resolution time establishes the acceptable amount of time for an issue to be resolved once it has been logged by the provider.
Mean time to recovery (MTTR) is the average time it takes to recover a product, service or system following a failure or outage.
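The three time-based metrics above can all be derived from timestamps on incident records. The Python sketch below assumes a hypothetical `Incident` record with three timestamps. It also approximates recovery time as the span from logging to resolution, which is an assumption: an agreement may instead measure from the moment the failure began.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record; field names are illustrative."""
    logged_at: datetime      # when the provider logged the issue
    responded_at: datetime   # first response to the client
    resolved_at: datetime    # issue resolved or service restored


def response_time(incident: Incident) -> timedelta:
    return incident.responded_at - incident.logged_at


def resolution_time(incident: Incident) -> timedelta:
    return incident.resolved_at - incident.logged_at


def mean_time_to_recovery(incidents) -> timedelta:
    """Average of resolution times; assumes recovery is measured from logging."""
    incidents = list(incidents)
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum((resolution_time(i) for i in incidents), timedelta())
    return total / len(incidents)


# Two made-up incidents: one resolved in 2 hours, one in 1 hour.
incidents = [
    Incident(datetime(2023, 11, 1, 9, 0), datetime(2023, 11, 1, 9, 15),
             datetime(2023, 11, 1, 11, 0)),
    Incident(datetime(2023, 11, 2, 14, 0), datetime(2023, 11, 2, 14, 30),
             datetime(2023, 11, 2, 15, 0)),
]
mttr = mean_time_to_recovery(incidents)  # averages 2 hours and 1 hour
```

Each per-incident value can then be compared with the limit the SLA sets, while the mean is tracked as a trend across the reporting period.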
First-call resolution rate is the percentage of customers who have their issue resolved by the provider during their first interaction with the service desk or chatbot.
Abandonment rate is the rate at which customers abandon their support inquiry before they receive an answer from the help desk. It is a key metric for customer service providers, or organizations whose service includes a customer service component.
A variety of security indicators might be tracked, like undisclosed vulnerabilities, antivirus updates or software patches, to evaluate a provider’s commitment to IT security.
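The two service desk ratios described above, issues resolved on the first interaction and inquiries abandoned before an answer, are plain percentages of a total. A minimal Python sketch, with function names and sample counts chosen here purely for illustration:

```python
def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole, validating the counts."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    if not 0 <= part <= whole:
        raise ValueError("part must be between 0 and whole")
    return 100.0 * part / whole


def first_call_resolution_rate(resolved_on_first_contact: int,
                               total_issues: int) -> float:
    """Share of issues resolved during the first interaction."""
    return percentage(resolved_on_first_contact, total_issues)


def abandonment_rate(abandoned_inquiries: int, total_inquiries: int) -> float:
    """Share of inquiries dropped before the help desk answered."""
    return percentage(abandoned_inquiries, total_inquiries)
```

For example, 150 first-contact resolutions out of 200 issues gives a rate of 75%, and 12 abandoned inquiries out of 240 gives 5%.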
By using the appropriate KPIs, organizations can determine how a provider’s services or products are contributing to broader business goals. For example, a company undergoing a digital transformation might ask: are the provider’s cloud resourcing tools helping us bring our cloud computing spend back under control? Tracking the right KPIs will help answer that question.
SLAs yield benefits for both the service provider and the customer, including the following:
In creating SLAs, organizations have an opportunity to closely examine their products, services and processes—and associated customer experiences—to determine what’s working well and what can be improved upon. An SLA establishes clear performance goals that provide benchmarks for measuring performance and customer experience success.
SLAs clarify the roles and responsibilities of all stakeholders, as well as processes and channels for troubleshooting issues and handling disputes. This clarity helps eliminate confusion and promote clear communication both internally and with external clients.
SLAs define expectations around service availability, set policies for downtime and lay out procedures for failure and disaster recovery. These measures help to minimize disruptions and unexpected downtime, and quickly resolve technical issues and service outages. Once satisfactory processes are in place, organizations can leverage automation to enhance service consistency.
The SLA process offers an opportunity to be proactive with risk management. The process identifies potential risks and threats ahead of time, and helps develop plans to avoid or mitigate such issues. Organizations can improve service delivery and response times, create stronger contingency plans, and bolster their overall risk management strategy.