A service-level agreement (SLA) is a formal contract between a service provider and a client guaranteeing quantifiable network performance at defined levels. SLAs offer service providers a way to distinguish themselves from their competitors in today's volatile and competitive market. A service provider may be an internal IT organization, an application service provider (ASP), a network service provider (NSP), an Internet service provider (ISP), a managed service provider (MSP), or any other type of service provider.
SLAs can be either very general or extremely detailed, and generally include the steps that the service provider and the client should take in the event of failure. The service provider guarantees that the services it provides will be available for a certain percentage of time (for example, 99.9%). The provider can also set limits on maximum and average response times, notify the client of downtime or of pending changes to network interfaces, and take advantage of Internet-based workflow automation, distribution, and reporting technologies. The client obtains rights and remedies if the provider fails to meet defined performance levels over the course of specified time periods, and also agrees to accept specified exceptions to the general terms of the agreement. These rights, remedies, and exceptions vary from one SLA to another.
In each SLA, the service level must be precisely defined; otherwise, the parties will not agree on which service or performance criteria the SLA measures, or at what level. For instance, a client may believe that an agreed service level measures Network A, Network B, and Network C, with the latter two connected to the first, while the service provider believes it measures Network A only. Also important are the decimal places in the uptime availability percentage: 99.999% uptime allows far less downtime than 99.9% uptime, roughly five minutes a year compared with almost nine hours. The SLA should include an exit clause for the client; the client will want the option to terminate the agreement should its business operations be interrupted too often because recurring availability, reliability, and security problems are not resolved satisfactorily.
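To make the effect of the decimal places concrete, here is a minimal sketch of the arithmetic. It assumes a 365-day year and a guarantee measured over the whole year; the function name `allowed_downtime_hours` is my own and is used only for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def allowed_downtime_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Return the downtime, in hours, that an uptime guarantee permits over a period."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (1.0 - uptime_percent / 100.0)


for pct in (99.5, 99.9, 99.999):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% uptime allows about {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

Under these assumptions, 99.9% leaves a budget of about 8.76 hours a year, while 99.999% leaves a little over 5 minutes.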
SLAs have been around for a while. In the 1960s, they were the general operating procedures for achieving defined service levels and responding to service problems to which a user organization agreed when buying or renting machine time on a mainframe. The big iron was the enterprise system by default, as no other technologies could match its processing capabilities.
When client/server and networked desktop systems entered the world of computers, the notion of the distributed network system was conceived. These systems later evolved into enterprise-wide systems that ran enterprise resource planning (ERP), supply chain management (SCM), and customer relationship management (CRM) applications across the networks.
During this evolution, enterprise dependence on the Internet has raised the visibility of the effects of network delays on a company's suite of applications. At the same time, the user (that is, the client) has committed itself to certain quality of service guarantees encompassing availability, reliability, and response time to ensure uninterrupted business operations, and has relied on external service providers to provide application, Internet, network, managed, and other services. As a result, SLAs have become more complex and broader in scope, with the user having several SLAs with different providers. A provider, in turn, may have its own SLAs with other providers, each with a different set of requirements, measurement criteria, and exceptions.
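One consequence of chained SLAs is worth working through. As a rough sketch, assume a service that is up only when every provider it depends on is up, and assume those providers fail independently of one another (both are simplifying assumptions of mine, not terms of any real agreement). The combined availability is then the product of the individual availabilities, which is always lower than the weakest single guarantee.

```python
from math import prod


def composite_availability(percentages):
    """Availability, in percent, of a chain in which every provider must be up.

    Assumes independent failures; real providers may share failure causes.
    """
    return 100.0 * prod(p / 100.0 for p in percentages)


# Hypothetical guarantees from an application, a network, and a hosting provider.
chain = [99.9, 99.9, 99.5]
print(f"Combined availability: {composite_availability(chain):.4f}%")
```

With these hypothetical figures the combined availability is about 99.30%, so the user cannot promise its own customers more than that without further arrangements.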
New directions in the Internet (and also corporate intranets) provide new means and opportunities to converge and integrate disparate systems from different sources -- via Web services. Web services have made SLAs more challenging as the relationships among providers become more complex in the ever-expanding world of distributed network systems. These SLAs are seen as more than guaranteeing network performance and uptime availability; they are used to guarantee application performance as well, as each Web service has different characteristics and network requirements. Today, some SLAs can be or are already exposed as public Web services.
All Web services provide the flexibility of integrating and modifying system components over the Web to allow the user to change requirements and cope with competition for network resources under certain traffic conditions. This flexibility, however, is constrained by Simple Object Access Protocol (SOAP) and Universal Description, Discovery, and Integration (UDDI) interoperability issues, since major vendors have interpreted the standard specifications for these protocols differently. This means that interoperability issues must be resolved for a Web service before it is launched into the production environment and exposed as a public service in UDDI or in another public registry. This is true of a SLA-covered Web service (which we'll sometimes refer to as an SLA Web service) as well, whether that service stands alone or is part of a suite of Web services. A good example of the latter would be a single SLA that applies to every piece of a Web infrastructure, from the Internet to Web services applications.
Let me take a look at the architecture for a SLA-covered Web service before I go any further. This architecture, as shown below in Figure 1, requires three service roles: a service provider, a service client, and a service broker. The service provider publishes a SLA-covered Web service by creating a Web service on an appropriate platform and generating a WSDL document and a basic SLA for the service. It next sends service details to the service broker for storage in a repository. The service client registers with the broker, and then searches for and finds the appropriate Web service in the broker's repository, retrieving the WSDL and SLA for the service. It then negotiates with the provider to formalize and finalize the SLA and binds to its Web service.
Figure 1. Architecture for a Web service covered by a SLA
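The publish, register, find, negotiate, and bind steps can be sketched in a few lines. This is a toy in-memory model under my own assumptions, not a UDDI client: the class and function names, the placeholder WSDL URL, and the SLA fields are all hypothetical, and the "negotiation" simply overlays the client's agreed requests on the provider's basic SLA.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    name: str
    wsdl_url: str    # where the WSDL document lives (placeholder value below)
    basic_sla: dict  # the provider's starting SLA terms


@dataclass
class ServiceBroker:
    repository: dict = field(default_factory=dict)
    registered_clients: set = field(default_factory=set)

    def publish(self, entry):
        """Provider side: store the service details in the repository."""
        self.repository[entry.name] = entry

    def register(self, client_name):
        """Client side: register with the broker before searching."""
        self.registered_clients.add(client_name)

    def find(self, client_name, service_name):
        """Return the stored entry (WSDL location and basic SLA) to a registered client."""
        if client_name not in self.registered_clients:
            raise PermissionError("client must register with the broker first")
        return self.repository[service_name]


def negotiate(basic_sla, requested):
    """Toy negotiation: start from the basic SLA and apply the agreed changes."""
    final = dict(basic_sla)
    final.update(requested)
    return final


broker = ServiceBroker()
broker.publish(ServiceEntry("quotes", "http://provider.example/quotes?wsdl",
                            {"uptime_percent": 99.5}))
broker.register("client-a")
entry = broker.find("client-a", "quotes")
final_sla = negotiate(entry.basic_sla,
                      {"uptime_percent": 99.9, "max_response_seconds": 10})
print(entry.wsdl_url, final_sla)
```

In a real deployment the client would bind to the service described by the retrieved WSDL once the final SLA is in place; that step is left out here because it depends on the SOAP toolkit in use.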
Any Web service covered by a SLA must be monitored for scalability and performance as they pertain to HTTP, HTTPS, SOAP, UDDI, and Web Services Description Language (WSDL). All SOAP, WSDL, and other interoperability issues must be resolved before a SLA-covered Web service is launched into a production environment. Since the provider offering the service may be financially liable under the terms of the SLA if the service does not meet certain standards, it is particularly important to make sure that all such problems are under control.
Before setting up a SLA-covered Web service, testing mechanisms such as the tools and scripts from PushToTest (see the Resources section below for a link) should be used to test various protocols and components of that Web service. After the service is launched, these testing tools may act as a SLA monitor.
Table 1 is an example checklist of features that a developer should consider when testing a service that will be covered by a SLA:
Table 1. Web service features that should be tested before a service is made public
|Feature|What to check|
|---|---|
|Statefulness|When you use SOAP to set a server value, does the server respond correctly in the subsequent states?|
|Access|Can an unauthorized user successfully access a control that only the administrators are authorized to use?|
|Response time|Is the Web service taking too long to respond (for example, more than 10 seconds)?|
|Time-out|What happens when the Web service times out?|
|Versioning|Can a new build break an existing Web service's functions?|
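The response-time and time-out rows lend themselves to a simple automated check. The sketch below is an assumption-laden illustration rather than a PushToTest script: `check_response_time` is a hypothetical helper that times any callable standing in for a Web service request, and the 10-second default comes from the example threshold in the table.

```python
import time


def check_response_time(call, limit_seconds=10.0):
    """Run call() once; report elapsed time, any error, and whether it met the limit."""
    start = time.perf_counter()
    try:
        call()
        error = None
    except Exception as exc:  # for example, a time-out raised by the request
        error = exc
    elapsed = time.perf_counter() - start
    return {
        "elapsed": elapsed,
        "error": error,
        "within_limit": error is None and elapsed <= limit_seconds,
    }


def fake_request():
    """Stand-in for a real SOAP call; sleeps briefly to simulate work."""
    time.sleep(0.01)


result = check_response_time(fake_request)
print(f"elapsed={result['elapsed']:.3f}s within_limit={result['within_limit']}")
```

To test a real service, replace `fake_request` with a function that issues the actual request using whatever client library you have chosen, with its own time-out set.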
Like any agreement or insurance policy, a SLA will usually specify exceptions to its terms. The SLA for a Web service should include details on exceptions. I'll arbitrarily divide them into four areas: failures, network issues not within the direct control of the service provider, denial of service, and scheduled maintenance. Table 2 explains certain specific situations that may fall under these categories.
Other exceptions can be added to suit a provider's situation, as long as the client companies can get reasonable compensation for downtime. By including exceptions in a SLA, a provider can protect itself from liability in case of problems or network outages. On the other hand, if competing services offer SLAs with fewer exceptions, the client has the option of choosing those agreements that offer more uptime in business operations with better service guarantees. Even the differences among 99.5%, 99.9%, and 99.999% uptime availability guarantees can influence decision makers when they select a SLA.
Table 2. Exceptions that can be potentially included in a SLA
|Category|Exception|
|---|---|
|Failures|Hardware failure (note that faulty hardware is rare)|
||Telecommunication failure (for example, a provider accidentally cuts a fiber line)|
||Monitoring/measurement system failure|
|Network issues not within direct control of service provider|Backbone peering point issues (for example, UUnet has a router in California go down, denying Internet services to the entire West Coast)|
||DNS issues not within the direct control of the service provider|
|Denial of service|Client negligence/willful misconduct|
||Network floods, hacks, and attacks|
||Acts of God, war, strikes, unavailability of telecommunications, inability to get supplies or equipment needed for the provision of the SLA|
|Scheduled maintenance|Hardware upgrades|
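Exceptions change how measured availability is calculated. As a minimal sketch, assume each outage is logged with a duration and a category, and that outages in an excepted category simply do not count against the provider (some agreements also subtract excused time from the measurement period, which this sketch does not do). The category labels and figures are hypothetical.

```python
# Hypothetical labels for the four exception areas discussed in the text.
EXCUSED_CATEGORIES = frozenset({
    "failure", "external_network", "denial_of_service", "scheduled_maintenance",
})


def measured_availability(period_minutes, outages, excused=EXCUSED_CATEGORIES):
    """Return availability in percent, counting only outages that are not excused.

    outages is a list of (minutes, category) pairs.
    """
    counted = sum(minutes for minutes, category in outages if category not in excused)
    return 100.0 * (period_minutes - counted) / period_minutes


month = 30 * 24 * 60  # a 30-day month, in minutes
outages = [
    (120, "scheduled_maintenance"),
    (30, "provider_error"),  # not an excepted category, so it counts
    (15, "denial_of_service"),
]
print(f"With exceptions:    {measured_availability(month, outages):.3f}%")
print(f"Without exceptions: {measured_availability(month, outages, excused=frozenset()):.3f}%")
```

In this example the same month comes out above a 99.9% guarantee with the exceptions applied and below it without them, which is why a client comparing SLAs should read the exception list as carefully as the headline percentage.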
While SLAs focus on maximum uptime availability and guaranteed bandwidths, SLAs cannot guarantee consistent response times for latency-sensitive Web service applications. Latency is the amount of time (usually measured in milliseconds) that a packet of data takes to get from one point to another and then back in a round trip. Latency problems occur when it takes too long for the packet to complete its trip. You would notice them if, for example, the audio produced by a Web service begins to stutter or the mouse cursor starts to shake a little.
The SLA should specify an average round-trip latency and packet loss over a given time period -- within a month, say. It should define the average round-trip latency as the average time a packet takes to travel from the network to its destination and back, and packet loss as the percentage of packets that are lost during a round-trip data transmission. The agreement should limit this loss to a certain level -- say, 1% or less -- and specify remedies, including payments or refunds, if the loss exceeds this level over the course of the agreed time frame.
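Both definitions translate directly into a small calculation. The sketch below assumes that a monitor records one round-trip time in milliseconds per probe packet and records `None` when a packet is lost. The function names and the 100 ms latency ceiling are my own assumptions; the 1% loss limit is the example figure from the text.

```python
def summarize_probes(rtts_ms):
    """Return (average round-trip latency in ms, packet loss in percent).

    rtts_ms holds one entry per probe: the round-trip time, or None if lost.
    The average is None when every probe was lost.
    """
    if not rtts_ms:
        raise ValueError("at least one probe is required")
    received = [r for r in rtts_ms if r is not None]
    loss_percent = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    average_rtt = sum(received) / len(received) if received else None
    return average_rtt, loss_percent


def meets_sla(rtts_ms, max_avg_rtt_ms, max_loss_percent=1.0):
    """True when both the latency and the packet-loss terms are satisfied."""
    average_rtt, loss_percent = summarize_probes(rtts_ms)
    return (average_rtt is not None
            and average_rtt <= max_avg_rtt_ms
            and loss_percent <= max_loss_percent)


probes = [80.0] * 98 + [None, None]  # 100 probes, 2 of them lost
average, loss = summarize_probes(probes)
print(f"average RTT {average:.1f} ms, loss {loss:.1f}%, "
      f"meets SLA: {meets_sla(probes, max_avg_rtt_ms=100.0)}")
```

Here the latency term is met but 2% loss exceeds the 1% limit, so the remedies in the agreement would apply if this held over the agreed time frame.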
So far, I have explained the technical parameters of a SLA-covered Web service. If you plan to offer Web services to your paying customers, they will generally want a SLA to make sure that they get the return on their investment that they expect. The topics discussed in this article should give you a head start in preparing your Web services to meet a SLA's requirements.
This article did not cover the various tools that could be used to measure the customer's expectations; in the real business world, you may find that even though your service meets the agreed service levels, your customer may still be dissatisfied with the service because technical service delivery has not met business expectations. In these cases, the client and provider must renegotiate the terms of the agreement to establish a level that satisfies the customer. For the developer, it is important to keep this in mind during the process of creating and implementing a Web service. The developer must consider both the customer's business and technical expectations.
- Get information on how to use SLAs in a Web services context from the other papers in this series:
- "Guarantee second-generation Web services applications with a SLA" (developerWorks, August 2004)
- "Integrate Web services into EAI with a SLA guarantee" (developerWorks, October 2004)
- "Secure multiple Web services with a SLA guarantee" (developerWorks, November 2004)
- "Firewall Web services with a SLA guarantee" (developerWorks, December 2004)
- "Localize Web services with a SLA guarantee" (developerWorks, January 2005)
- "Mitigate risk for vulnerability with a SLA guarantee" (developerWorks, January 2005)
- Learn more about PushToTest to test and monitor Web services.
- Read The Complete Book of Middleware, which focuses on the essential principles and priorities of system design and emphasizes the new requirements brought forward by the rise of e-commerce and distributed integrated systems.
- Read Enterprise Systems Integration, Second Edition, to provide yourself with the business insight and the technical know-how that ensures successful systems integration.
- Read "Understanding quality of service for Web services" by Anbazhagan Mani and Arun Nagarajan (developerWorks, January 2002) to improve the performance of your Web services.
- This IBM Redbook for Domino administrators goes into the nuts and bolts of developing a service-level agreement.
- Browse for books on these and other technical topics.
- Want more? The developerWorks SOA and Web services zone hosts hundreds of informative articles and introductory, intermediate, and advanced tutorials on how to develop Web services applications.
Judith M. Myerson is a systems architect and engineer. Her areas of interest include middleware technologies, enterprise-wide systems, database technologies, application development, network management, distributed systems, component-based technologies, and project management. You can contact her at email@example.com.