Use SLAs in a Web services context, Part 1: Guarantee your Web service with a SLA

Introduction, architecture, and testing mechanisms

Many businesses are demanding service-level agreements (SLAs) that guarantee the reliability of the IT services they pay for. As Web services enter the mainstream, customers will be asking for SLAs that ensure their quality. In this article, Judith M. Myerson explains how you can establish a service-level agreement (SLA) that covers a Web service. She covers the exceptions that should be included in a SLA, and gives examples of testing a Web service for SOAP interoperability before launching it into the production environment as an exposed Web service that is covered by a SLA.

Judith Myerson (jmyerson@bellatlantic.net), Systems architect and engineer

Judith M. Myerson is a systems architect and engineer. Her areas of interest include middleware technologies, enterprise-wide systems, database technologies, application development, network management, distributed systems, component-based technologies, and project management. You can contact her at jmyerson@bellatlantic.net.



29 October 2004 (First published 01 April 2002)

Also available in Japanese

Introduction: What is a SLA?

A service-level agreement (SLA) is a formal contract between a service provider and a client guaranteeing quantifiable network performance at defined levels. SLAs offer service providers a way to distinguish themselves from their competitors in today's volatile and competitive market. A service provider may be an internal IT organization, an application service provider (ASP), a network service provider (NSP), an Internet service provider (ISP), a managed service provider (MSP), or any other type of service provider.

SLAs can be either very general or extremely detailed, and generally include the steps that the service provider and the client should take in the event of failure. The service provider guarantees that the services it provides will be available for a certain percentage of time (for example, 99.9%). The provider may also commit to limits on maximum and average response times, agree to notify the client of downtime and before changes to network interfaces take place, and take advantage of Internet-based workflow automation, distribution, and reporting technologies. The client obtains rights and remedies if the provider fails to meet defined performance levels over the course of specified time periods. These rights, remedies, and exceptions vary from one SLA to another. The client also agrees to accept specified exceptions to the general terms of the agreement.

In each SLA, the service level must be precisely defined; otherwise, the parties will not agree on which service or performance criteria the SLA is measuring, or at what level. For instance, a client may believe that an agreed service level measures Network A, Network B, and Network C, with the latter two connected to the first, while the service provider believes it measures Network A only. The decimal places in the uptime availability percentage also matter: 99.9% uptime allows roughly 8.8 hours of downtime a year, while 99.999% uptime allows only about 5 minutes. The SLA should include an exit clause for the client; the client will want the option to terminate the agreement if its business operations are interrupted too often because recurring availability, reliability, and security problems are not resolved satisfactorily.
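To make the downtime arithmetic concrete, here is a minimal Python sketch that converts an availability percentage into the downtime it permits over a period. It assumes a 365-day year (8,760 hours) and ignores leap years; the function name is purely illustrative.

```python
# Hours in a non-leap year (simplifying assumption: leap years add 24 hours).
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime, in hours, that an availability guarantee permits."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period_hours * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    for pct in (99.0, 99.5, 99.9, 99.99, 99.999):
        hours = allowed_downtime_hours(pct)
        print(f"{pct}% uptime allows {hours:.2f} hours "
              f"({hours * 60:.1f} minutes) of downtime per year")
```

Each extra "9" cuts the permitted downtime by a factor of ten: 99% allows about 3.65 days a year, 99.9% about 8.76 hours, and 99.999% about 5.3 minutes.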


SLA evolution

SLAs have been around for a while. In the 1960s, they took the form of general operating procedures, agreed to by a user organization when it bought or rented machine time on a mainframe, for achieving defined service levels and responding to service problems. The big iron was the enterprise system by default, as no other technology could match its processing capabilities.

When client/server and networked desktop systems entered the world of computing, the notion of the distributed network system was born. These systems later evolved into enterprise-wide systems that ran enterprise resource planning (ERP), supply chain management (SCM), and customer relationship management (CRM) applications across the network.

During this evolution, enterprise dependence on the Internet has made the effects of network delays on a company's suite of applications more visible. At the same time, the user (that is, the client) has come to require quality-of-service guarantees covering availability, reliability, and response time to ensure uninterrupted business operations, and has relied on external service providers for application, Internet, network, managed, and other services. As a result, SLAs have become more complex and broader in scope, with a user often holding several SLAs with different providers. A provider, in turn, may have its own SLAs with other providers, each with a different set of requirements, measurement criteria, and exceptions.


New directions: Covering Web services with SLAs

New directions in the Internet (and in corporate intranets) provide new means and opportunities to converge and integrate disparate systems from different sources through Web services. Web services have also made SLAs more challenging, as the relationships among providers grow more complex in the ever-expanding world of distributed network systems. These SLAs are expected to do more than guarantee network performance and uptime availability; they are used to guarantee application performance as well, because each Web service has different characteristics and network requirements. Today, some SLAs can be, or already are, exposed as public Web services.

Web services provide the flexibility of integrating and modifying system components over the Web, allowing the user to change requirements and cope with competition for network resources under certain traffic conditions. This flexibility, however, is constrained by Simple Object Access Protocol (SOAP) and Universal Description, Discovery, and Integration (UDDI) interoperability issues, because major vendors have interpreted the standard specifications for these protocols differently. This means that interoperability issues must be resolved for a Web service before it is launched into the production environment and exposed as a public service in UDDI or in another public registry. This is true of a SLA-covered Web service (which we'll sometimes refer to as an SLA Web service) as well, whether that service stands alone or is part of a suite of Web services. A good example of the latter would be a single SLA that applies to every piece of a Web infrastructure, from the Internet connection to the Web services applications.


SLA Web services architecture

Let me take a look at the architecture for a SLA-covered Web service before I go any further. This architecture, as shown below in Figure 1, requires three service roles: a service provider, a service client, and a service broker. The service provider publishes a SLA-covered Web service by creating a Web service on an appropriate platform and generating a WSDL document and a basic SLA for the service. It next sends service details to the service broker for storage in a repository. The service client registers with the broker, then searches for and finds the appropriate Web service in the broker's repository, retrieving the WSDL and SLA for the service. It then negotiates with the provider to formalize and finalize the SLA, and binds to the provider's Web service.
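The publish, register, find, and negotiate steps can be sketched as a small in-memory model. This is a hypothetical illustration only: the `Broker`, `ServiceEntry`, and `SLATerms` classes, the `negotiate` rule, and the WSDL URL are invented for this sketch and do not correspond to any UDDI API. A real broker would be a registry service, and real negotiation covers far more than two numbers.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class SLATerms:
    # Hypothetical SLA fields; a real agreement carries many more clauses.
    availability_pct: float   # guaranteed uptime, e.g. 99.9
    max_response_s: float     # maximum allowed response time, in seconds


@dataclass
class ServiceEntry:
    # What the provider sends to the broker: a name, its WSDL, and a basic SLA.
    name: str
    wsdl_url: str
    basic_sla: SLATerms


@dataclass
class Broker:
    # Hypothetical in-memory stand-in for the broker's repository.
    repository: Dict[str, ServiceEntry] = field(default_factory=dict)
    clients: Set[str] = field(default_factory=set)

    def publish(self, entry: ServiceEntry) -> None:
        """Provider role: store service details in the repository."""
        self.repository[entry.name] = entry

    def register(self, client_id: str) -> None:
        """Client role: register with the broker before searching."""
        self.clients.add(client_id)

    def find(self, client_id: str, name: str) -> ServiceEntry:
        """Client role: look up a service and retrieve its WSDL and basic SLA."""
        if client_id not in self.clients:
            raise PermissionError(f"{client_id} has not registered with the broker")
        return self.repository[name]


def negotiate(offered: SLATerms, required: SLATerms) -> Optional[SLATerms]:
    """Hypothetical rule: finalize the provider's terms only if they meet the
    client's minimum requirements; otherwise there is no agreement."""
    if (offered.availability_pct >= required.availability_pct
            and offered.max_response_s <= required.max_response_s):
        return offered
    return None


if __name__ == "__main__":
    broker = Broker()
    # Provider publishes (the URL is a placeholder).
    broker.publish(ServiceEntry("QuoteService", "http://provider.example.com/quote?wsdl",
                                SLATerms(availability_pct=99.9, max_response_s=5.0)))
    # Client registers, finds the service, and negotiates before binding.
    broker.register("client-a")
    entry = broker.find("client-a", "QuoteService")
    agreed = negotiate(entry.basic_sla, SLATerms(availability_pct=99.5, max_response_s=10.0))
    print("bind to", entry.wsdl_url if agreed else "nothing: no agreement")
```

Keeping the basic SLA next to the WSDL in the repository entry mirrors the architecture: the client retrieves both in one lookup and binds only after negotiation succeeds.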

Figure 1. Architecture for a Web service covered by a SLA

Getting ready for the real world: Testing mechanisms

Any Web service covered by a SLA must be monitored for scalability and performance as they pertain to HTTP, HTTPS, SOAP, UDDI, and Web Services Description Language (WSDL). All SOAP, WSDL, and other interoperability issues must be resolved before a SLA-covered Web service is launched into a production environment. Because the provider offering the service may be financially liable under the terms of the SLA if the service does not meet certain standards, it is particularly important to make sure that all such problems are under control.

Before setting up a SLA-covered Web service, testing mechanisms such as the tools and scripts from PushtoTest (see the Resources section below for a link) should be used to test various protocols and components of that Web service. After the service is launched, these testing tools may act as a SLA monitor.
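As a rough idea of what such a test or monitoring script does, here is a minimal SOAP 1.1 probe using only the Python standard library. It is not PushtoTest's API. The endpoint, operation name, service namespace, and SOAPAction value are placeholders you would take from the service's WSDL. The probe builds a request envelope, posts it with a time-out, and checks the reply for a SOAP Fault.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ET.register_namespace("soap", SOAP_ENV_NS)


def build_envelope(operation: str, service_ns: str, params: dict) -> bytes:
    """Build a SOAP 1.1 request envelope for one operation (names are placeholders)."""
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    op = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{service_ns}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def is_fault(response_xml: bytes) -> bool:
    """Return True if the response body carries a SOAP Fault."""
    root = ET.fromstring(response_xml)
    return root.find(f"{{{SOAP_ENV_NS}}}Body/{{{SOAP_ENV_NS}}}Fault") is not None


def send_probe(endpoint: str, soap_action: str, payload: bytes,
               timeout_s: float = 10.0) -> bytes:
    """POST the envelope and return the raw response body."""
    request = urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": f'"{soap_action}"'},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # SOAP 1.1 over HTTP reports faults with status 500, so keep the body.
        return err.read()
```

Running the same probe on a schedule after launch, and logging faults and time-outs, is one simple way such a script can double as an SLA monitor.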

Table 1 is a checklist example that a developer should consider when testing a service that will be covered by a SLA:

Table 1. Web service features that should be tested before a service is made public
Testing type | Questions
Statefulness | When you use SOAP to set a server value, does the server respond correctly in the subsequent states?
Access | Can an unauthorized user successfully access a control that only the administrators are authorized to use?
Response time | Is the Web service taking too long to respond (for example, more than 10 seconds)?
Time-out | What happens when the Web service times out?
Versioning | Can a new build break an existing Web service's functions?
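Two of the rows in Table 1 lend themselves to simple automated checks. The following Python sketch classifies a single request as "ok", "slow", or "timeout" against the 10-second example limit, and it flags versioning regressions by replaying the same requests against an old build and a new build. The function names are hypothetical. The sketch assumes `call` raises `TimeoutError` when the client gives up waiting, and that each build can be wrapped as a plain callable.

```python
import time
from typing import Callable, Iterable, List


def timed_check(call: Callable[[], object], max_response_s: float = 10.0) -> dict:
    """Response-time and time-out rows: run one request and classify it.

    Assumes `call` raises TimeoutError when the client stops waiting.
    """
    start = time.perf_counter()
    try:
        call()
    except TimeoutError:
        return {"status": "timeout", "elapsed_s": time.perf_counter() - start}
    elapsed = time.perf_counter() - start
    return {"status": "ok" if elapsed <= max_response_s else "slow",
            "elapsed_s": elapsed}


def versioning_regressions(requests: Iterable[object],
                           old_build: Callable[[object], object],
                           new_build: Callable[[object], object]) -> List[object]:
    """Versioning row: return the requests whose answers changed in the new build."""
    return [r for r in requests if old_build(r) != new_build(r)]
```

A "slow" result differs from a "timeout" result. The service did answer, but it missed the response-time limit, which may still count against the SLA.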

Exceptions to the rule

Like any agreement or insurance policy, a SLA will usually specify exceptions to its terms. The SLA for a Web service should include details on exceptions. I'll arbitrarily divide them into four areas: failures, network issues not within the direct control of the service provider, denial of service, and scheduled maintenance. Table 2 explains certain specific situations that may fall under these categories.

Other exceptions can be added to suit a provider's situation, as long as the client companies can get reasonable compensation for downtime. By including exceptions in a SLA, a provider can protect itself from liability in case of problems or network outages. On the other hand, if competing services offer SLAs with fewer exceptions, the client can choose those agreements instead, gaining more uptime for its business operations and better service guarantees. Even the differences among 99.5%, 99.9%, and 99.999% uptime availability guarantees can influence decision makers when they select a SLA.

Table 2. Exceptions that can be potentially included in a SLA
Type | Examples
Failures | Hardware failure (note that faulty hardware is rare); telecommunication failure (for example, a provider accidentally cuts a fiber line); software bugs/flaws; monitoring/measurement system failure
Network issues not within direct control of service provider | Backbone peering point issues (for example, UUnet has a router in California go down, denying Internet services to the entire West Coast); DNS issues not within the direct control of the service provider
Denial of service | Client negligence/willful misconduct; network floods, hacks, and attacks; acts of God, war, strikes, unavailability of telecommunications, inability to get supplies or equipment needed for the provision of the SLA
Scheduled maintenance | Hardware upgrades; software upgrades; backups
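Exceptions matter because they change how downtime is counted. Here is a minimal sketch of measuring availability over a billing period while excluding outages whose cause the SLA treats as an exception. The cause labels and the choice of which ones are exempt are hypothetical. In practice that list is whatever the negotiated SLA says, and ordinary failures are assumed here to count against the provider.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable

# Hypothetical cause labels; which causes are exempt depends on the negotiated SLA.
EXEMPT_CAUSES = frozenset({"scheduled_maintenance", "third_party_network",
                           "denial_of_service"})


@dataclass(frozen=True)
class Outage:
    minutes: float
    cause: str


def measured_availability(outages: Iterable[Outage], period_minutes: float,
                          exempt: AbstractSet[str] = EXEMPT_CAUSES) -> float:
    """Availability (%) over the period, counting only non-exempt outages."""
    counted = sum(o.minutes for o in outages if o.cause not in exempt)
    return 100.0 * (1.0 - counted / period_minutes)


if __name__ == "__main__":
    month = 30 * 24 * 60  # a 30-day month, in minutes
    outages = [Outage(60, "scheduled_maintenance"), Outage(30, "hardware_failure")]
    with_exceptions = measured_availability(outages, month)
    without_exceptions = measured_availability(outages, month, exempt=frozenset())
    for target in (99.5, 99.9, 99.999):
        print(target, with_exceptions >= target, without_exceptions >= target)
```

With these made-up numbers, excluding the 60-minute maintenance window leaves about 99.93% availability, which meets a 99.9% guarantee. Counting both outages drops the month to about 99.79%, which does not.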

Latency problems

While SLAs focus on maximum uptime availability and guaranteed bandwidth, they cannot guarantee consistent response times for latency-sensitive Web service applications. Latency is the amount of time (usually measured in milliseconds) that a packet of data takes to get from one point to another; round-trip latency covers the trip there and back. Latency problems occur when packets take too long to complete the trip. You would notice them if, for example, the audio produced by a Web service began to stutter or the mouse cursor started to shake a little.

The SLA should specify an average round-trip latency and packet loss over a given time period, such as a month. It should define the average round-trip latency as the average time a packet takes to travel from the network to its destination and back, and packet loss as the percentage of packets that are lost during a round-trip data transmission. The agreement should limit this loss to a certain level (say, 1% or less) and specify remedies, including payments or refunds, if the loss exceeds this level over the course of the agreed time frame.
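The two metrics can be computed from a batch of probe results, as in the Python sketch below. Each probe records its round-trip time in milliseconds, or `None` if no reply came back. The 1% loss ceiling comes from the example above. The average-latency limit (`max_avg_ms`) is a hypothetical parameter, because the article does not suggest a value.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def summarize_probes(rtts_ms: Sequence[Optional[float]]) -> Tuple[float, float]:
    """Return (average round-trip latency in ms, packet loss in %).

    Each entry is one probe's round-trip time, or None if the probe was lost.
    """
    if not rtts_ms:
        raise ValueError("no probes recorded")
    answered = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(answered)) / len(rtts_ms)
    avg_ms = mean(answered) if answered else float("inf")
    return avg_ms, loss_pct


def within_sla(avg_ms: float, loss_pct: float, max_avg_ms: float,
               max_loss_pct: float = 1.0) -> bool:
    """Check both limits; max_avg_ms is a hypothetical, negotiated value."""
    return avg_ms <= max_avg_ms and loss_pct <= max_loss_pct
```

For example, 200 probes with one lost reply give 0.5% packet loss, inside the 1% limit. Averaging only the answered probes keeps lost packets from distorting the latency figure, because they are already counted as loss.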


Conclusion

So far, I have explained the technical parameters of a SLA-covered Web service. If you plan to offer Web services to your paying customers, they will generally want a SLA to make sure that they get the return on their investment that they expect. The topics discussed in this article should give you a head start in preparing your Web services to meet a SLA's requirements.

This article did not cover the various tools that could be used to measure the customer's expectations. In the real business world, you may find that even though your service meets the agreed service levels, your customer is still dissatisfied because technical service delivery has not met business expectations. In these cases, the client and provider must renegotiate the terms of the agreement to establish a level that satisfies the customer. For the developer, it is important to keep this in mind during the process of creating and implementing a Web service. The developer must consider both the customer's business and technical expectations.

Resources
