Let's first look at some limitations of web services (in contrast to EAI applications), in order of ascending importance, and then discuss how system interruption thresholds affect the way they function.
Limitation 1: Multiple web services rarely execute themselves. In SOA, it takes a series of business processes or higher-level services to orchestrate the execution of multiple services to achieve a goal. EAI applications, on the other hand, can execute themselves, sometimes acting as the conductor of web services.
Limitation 2: Although web services are based on a growing list of industry standards, SOAP is not fully interoperable industry-wide. Most EAI standards, by contrast, are proprietary.
Limitation 3: It can be difficult to integrate web services that complete business processes in a short time with web services that involve long-running applications based on a complex set of business process rules. This is because web services are loosely coupled and must be executed synchronously, so web service consumers must be present at all times. EAI applications, on the other hand, are tightly coupled and can be either synchronous or asynchronous, so their consumers do not need to be present at all times.
Limitation 4: Web services focusing on relationship management, chain management, and resource planning have different sets of integration rules, although they can collaborate with each other on integrating and aggregating applications between enterprises. In contrast, the components of an EAI system communicate with one another through an integration hub that serves as middleware among EAI applications (including chain management and resource planning), legacy systems, databases, web services, and non-web services.
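Limitations 3 and 4 can be illustrated together with a toy integration hub. This is a minimal sketch under stated assumptions: the `IntegrationHub` class, its methods, and the endpoint names are hypothetical, and a real EAI hub would also handle message transformation, persistence, and security. It shows every component talking only to the hub, with a synchronous path (the caller waits for the reply) and an asynchronous path (the reply waits in a queue until the consumer returns).

```python
import queue
from typing import Callable, Dict


class IntegrationHub:
    """Hypothetical EAI-style hub: components register with the hub, and
    all messages between components pass through it."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, Callable[[str], str]] = {}
        self._mailboxes: Dict[str, queue.Queue] = {}

    def register(self, name: str, handler: Callable[[str], str]) -> None:
        self._endpoints[name] = handler
        self._mailboxes[name] = queue.Queue()

    def route(self, target: str, message: str) -> str:
        """Synchronous delivery: the caller waits for the reply."""
        return self._endpoints[target](message)

    def post(self, target: str, reply_to: str, message: str) -> None:
        """Asynchronous delivery: the reply is parked in the sender's
        mailbox, so the sender need not be present when it is ready."""
        self._mailboxes[reply_to].put(self._endpoints[target](message))

    def collect(self, name: str) -> str:
        """Pick up a queued reply later (raises queue.Empty if none)."""
        return self._mailboxes[name].get_nowait()


hub = IntegrationHub()
hub.register("erp", lambda m: f"erp handled {m}")            # resource planning
hub.register("legacy_db", lambda m: f"legacy stored {m}")    # legacy system
hub.register("web_service", lambda m: f"ws replied to {m}")  # web service

print(hub.route("legacy_db", "invoice-42"))  # legacy stored invoice-42
hub.post("erp", reply_to="web_service", message="order-7")
# ...later, the web service comes back for its reply
print(hub.collect("web_service"))            # erp handled order-7
```

The mailbox per registered component is one simple way to decouple sender and receiver in time; production middleware would typically use durable message queues instead of in-memory ones.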
Consider the major limitation that web services rarely execute themselves. What is the impact of a system interruption on the uptime availability of web services in the SLA? What system interruption thresholds need to be established when you develop performance metrics to indicate the impact of thresholds on uptime availability?
Are we talking about a threshold for a single web service? Multiple web services? Or are we referring to the higher-level services in the SOA as the conductor of web services and non-web services? What happens when a web service times out? What is the impact of timing out on determining a threshold?
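To make Limitation 1 and the timeout question concrete, here is a small sketch of a higher-level service acting as the conductor. The service functions (`check_credit`, `approve_purchase`, `issue_check`, `slow_inventory_service`) and the `orchestrate` function are hypothetical stand-ins for real web service calls: none of them runs on its own, and the orchestrator stops at the first failing step, such as one that raises `TimeoutError`.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Callable[[Dict], Dict]


def check_credit(order: Dict) -> Dict:
    return {**order, "credit_ok": order["amount"] <= 500}


def approve_purchase(order: Dict) -> Dict:
    if not order["credit_ok"]:
        raise ValueError("credit check failed")
    return {**order, "approved": True}


def issue_check(order: Dict) -> Dict:
    return {**order, "check_issued": order["approved"]}


def slow_inventory_service(order: Dict) -> Dict:
    # Hypothetical service that never answers in time.
    raise TimeoutError("no reply within the time limit")


def orchestrate(order: Dict, steps: List[Step]) -> Tuple[Dict, Optional[str]]:
    """Conductor: run each service in order, passing results along.
    Stop at the first failure and report the name of the failed step."""
    for step in steps:
        try:
            order = step(order)
        except (TimeoutError, ValueError):
            return order, step.__name__
    return order, None


ok_result, ok_failed = orchestrate({"item": "laptop", "amount": 450},
                                   [check_credit, approve_purchase, issue_check])
print(ok_failed)    # None
_, failed_step = orchestrate({"item": "laptop", "amount": 450},
                             [check_credit, slow_inventory_service, issue_check])
print(failed_step)  # slow_inventory_service
```

Because only the conductor knows the full sequence, a threshold could be defined at the level of one service, of the whole sequence, or of the conductor itself, which is exactly the question raised above.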
To begin, consider three possible thresholds: 50%, 67%, and 98%. What does each mean? As shown in Figure 1, a 50% system interruption threshold means that an interruption affecting between 0% and 49.999% of the entire system is not significant and should not be counted in the performance metrics. It is too low to count toward, for example, 99.5% uptime availability.
Figure 1. System interruption at 50% threshold
On the other hand, an interruption affecting between 50% and 100% of the system is significant and should be counted in the metrics. The same logic applies to the 67% and 98% thresholds (see Figures 2 and 3). A higher threshold can count toward, for example, 99.7% uptime availability.
Figure 2. System interruption at 67% threshold
Figure 3. System interruption at 98% threshold
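One way to read Figures 1 to 3 is as a simple classification rule: an interruption counts toward the performance metrics only when its extent reaches the threshold. The sketch below assumes interruptions are expressed as a percentage of the entire system; the function name `is_significant` is hypothetical.

```python
def is_significant(interruption_pct: float, threshold_pct: float) -> bool:
    """Return True if an interruption (as % of the entire system) is at or
    above the threshold and should count in the performance metrics."""
    if not 0.0 <= interruption_pct <= 100.0:
        raise ValueError("interruption_pct must be between 0 and 100")
    return interruption_pct >= threshold_pct


for threshold in (50.0, 67.0, 98.0):
    print(threshold, [is_significant(p, threshold) for p in (49.999, 60.0, 99.0)])
# 50.0 [False, True, True]
# 67.0 [False, False, True]
# 98.0 [False, False, True]
```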
The higher the threshold, the more a business has to pay for the threshold-monitoring service. Likewise, the higher the threshold, the higher the penalty a provider has to pay if it fails to meet that threshold. The threshold penalty is in addition to the uptime availability penalty specified in a web services SLA.
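The two penalties described above can be sketched as separate charges that are added together. The pricing model here (a charge per percentage point of uptime shortfall plus a charge per threshold breach) and all rates are invented for illustration; an actual SLA would define its own formula, and could set a higher `breach_rate` for a higher threshold.

```python
def total_penalty(guaranteed_uptime: float, measured_uptime: float,
                  threshold_breaches: int, uptime_rate: float,
                  breach_rate: float) -> float:
    """Hypothetical penalty model.

    uptime_rate: charge per percentage point of uptime below the guarantee.
    breach_rate: charge per significant interruption that breached the
    agreed threshold. The threshold penalty is added on top of the
    uptime availability penalty.
    """
    shortfall = max(0.0, guaranteed_uptime - measured_uptime)
    uptime_penalty = shortfall * uptime_rate
    threshold_penalty = threshold_breaches * breach_rate
    return uptime_penalty + threshold_penalty


# 0.4 points below a 99.9% guarantee plus two threshold breaches
print(round(total_penalty(99.9, 99.5, 2, 1000.0, 250.0), 2))  # 900.0
```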
In my article "Guarantee your web service with a SLA" (see Resources), I mention that packet loss can make a digital voice stutter a little as it streams to a web site you are viewing, or make your mouse cursor shake briefly. Both the stuttering voice and the shaky cursor might reflect a system interruption lasting a split second: too brief for a user to notice, but long enough for a performance metric indicator to log automatically, for example, as an interruption at 0.6%.
The shaky cursor is just a slight annoyance that any user can endure for up to 8 or 10 seconds. Such an interruption is clearly insignificant, because the impact of the packet loss is minimal.
A good example of a significant system interruption is a denial of service caused by provider negligence. Denials of service caused by client negligence or willful misconduct, or by a failure of the monitoring or measuring system, are excluded. Table 1 lists other exceptions.
All these exceptions might be included in the SLA. However, if competing services offer SLAs with fewer exceptions, the client should be given the option of choosing the agreements that offer more uptime in business operations and better service guarantees, including higher system interruption thresholds. (Acts of God, war, and strikes are strictly exceptions unless otherwise indicated in terms of insurance and disaster recovery costs.)
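When incidents are tallied against the SLA, the exceptions can be applied as a filter before any threshold or uptime calculation. Table 1 is not reproduced here, so the exclusion set below is limited to the exceptions named in the text; the cause labels and the `Incident` type are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Causes excluded from SLA accounting (only those named in the text).
EXCLUDED_CAUSES = {
    "client_negligence",
    "willful_misconduct",
    "monitoring_failure",
    "act_of_god",
    "war",
    "strike",
}


@dataclass
class Incident:
    cause: str
    minutes_down: float


def countable_incidents(incidents: List[Incident]) -> List[Incident]:
    """Keep only incidents whose cause is not an SLA exception."""
    return [i for i in incidents if i.cause not in EXCLUDED_CAUSES]


log = [
    Incident("provider_negligence", 12.0),
    Incident("client_negligence", 30.0),
    Incident("monitoring_failure", 5.0),
]
print([i.cause for i in countable_incidents(log)])  # ['provider_negligence']
```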
What happens when a web service times out? Will it have a domino effect, causing associated web services to time out as well and resulting in a system interruption? If the timeout affects only a single web service and results in a message asking the user to log in again or come back later, the interruption is insignificant.
On the other hand, if multiple web services time out at almost the same time, you might see a "Not Found" page that significantly denies service to users in various stages of business processes (for example, approving and issuing a check to purchase an item). And while a shaky cursor might be due to packet loss, it might also be the result of a web service timing out.
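The domino effect can be modeled as a dependency graph: when one service times out, every service that depends on it, directly or indirectly, is affected as well. The service names and the dependency map below are hypothetical; the traversal itself is an ordinary breadth-first search.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical map: service -> services that call it (its dependents).
DEPENDENTS: Dict[str, List[str]] = {
    "auth": ["approve_purchase", "issue_check"],
    "approve_purchase": ["issue_check"],
    "issue_check": [],
    "catalog": [],
}


def affected_by_timeout(service: str, dependents: Dict[str, List[str]]) -> Set[str]:
    """Return the timed-out service plus every service that transitively depends on it."""
    affected = {service}
    pending = deque([service])
    while pending:
        current = pending.popleft()
        for dep in dependents.get(current, []):
            if dep not in affected:
                affected.add(dep)
                pending.append(dep)
    return affected


print(sorted(affected_by_timeout("catalog", DEPENDENTS)))  # ['catalog']
print(sorted(affected_by_timeout("auth", DEPENDENTS)))
# ['approve_purchase', 'auth', 'issue_check']
```

A timeout in an isolated service such as `catalog` stays contained, matching the insignificant case; a timeout in a shared service such as `auth` spreads to everything downstream.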
Now apply the thresholds to the second-generation web services applications paradigm (see Resources), as shown in Figure 4, particularly the thresholds between the provider and the broker, between the broker and the client, and between the client and the provider.
Figure 4. Thresholds for each player in the web services paradigm
When the application provider shown in Figure 4 sends the application broker a request to publish a service, and an unanticipated interruption occurs along the way, the system interruption threshold must be 98.7% or higher. Likewise, when the broker sends the provider an alert on the success or failure of the service publication and packet loss occurs, the network interruption is significant only at or above the 98.7% threshold.
However, when the broker resends an alert to the client on the status of service discovery after a temporary, unanticipated service interruption, the system interruption threshold is slightly higher, at 99.5%, because of the importance of discovering a web service as specified in the SLA. The threshold for system interruption between the client and the provider is set at 96.97%, slightly lower than the other two thresholds.
The thresholds for the application provider, broker, and client in the second-generation web services applications paradigm need not be the same in both directions. The thresholds can vary not only by direction for each role player but also over time (for example, between peak and off-peak network traffic). This depends on factors such as network traffic, competition for bandwidth as traffic enters bottlenecks, and differing bandwidth requirements among the web services and non-web services in SOA.
For instance, the threshold is set at 98.5% when the provider sends the broker a service publication request and at 98.7% when the broker sends an alert to the provider. Likewise, the threshold is set at 99.2% when the client sends the broker a service discovery request and at 99.5% when the broker sends an alert to the client.
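Because the thresholds differ by direction, they can be kept in a lookup table keyed by sender and receiver. The values are the example figures given in the text (with the 96.97% client-provider threshold applied in both directions); the table layout and the function name `link_is_significant` are assumptions.

```python
from typing import Dict, Tuple

# (sender, receiver) -> threshold in percent, using the example values above.
THRESHOLDS: Dict[Tuple[str, str], float] = {
    ("provider", "broker"): 98.5,   # service publication request
    ("broker", "provider"): 98.7,   # publication success/failure alert
    ("client", "broker"): 99.2,     # service discovery request
    ("broker", "client"): 99.5,     # discovery status alert
    ("client", "provider"): 96.97,
    ("provider", "client"): 96.97,
}


def link_is_significant(sender: str, receiver: str, interruption_pct: float) -> bool:
    """True if an interruption on this directed link meets its threshold."""
    return interruption_pct >= THRESHOLDS[(sender, receiver)]


print(link_is_significant("provider", "broker", 98.6))  # True  (98.6 >= 98.5)
print(link_is_significant("broker", "provider", 98.6))  # False (98.6 < 98.7)
```

To reflect thresholds that change over time, the key could be extended with a period label (for example, peak or off-peak); that extension is not shown here.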
Most SLAs use uptime availability measurements as a blanket that hides the range of system interruptions they do not address. One way around this problem is to develop measurable thresholds that developers can use to test uptime availability in the SOA.
Which system interruption thresholds to list in the SLA depends on which unanticipated interruptions of web service applications the client has agreed to treat as significant when the SLA specifies, for example, 99.9% uptime availability. For instance, if the client agrees to a 98.1% threshold for a potentially significant system interruption incident, you can apply this threshold to the guarantee of 99.9% uptime availability. You can help clients reach such a threshold agreement.
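To see how a 98.1% threshold interacts with a 99.9% uptime guarantee, the sketch below counts only the downtime from interruptions at or above the threshold and compares the result with the guarantee. The 30-day measurement period and the incident data are assumptions for illustration.

```python
from typing import List, Tuple

PERIOD_MINUTES = 30 * 24 * 60   # assumed 30-day measurement period (43,200 min)
UPTIME_GUARANTEE = 99.9          # percent, from the SLA example
THRESHOLD = 98.1                 # percent, agreed significance threshold


def measured_uptime(incidents: List[Tuple[float, float]]) -> float:
    """incidents: (interruption_pct, minutes_down) pairs.
    Only incidents at or above THRESHOLD count as downtime."""
    downtime = sum(minutes for pct, minutes in incidents if pct >= THRESHOLD)
    return 100.0 * (PERIOD_MINUTES - downtime) / PERIOD_MINUTES


# Allowed downtime: 0.1% of 43,200 minutes = 43.2 minutes.
budget = PERIOD_MINUTES * (100.0 - UPTIME_GUARANTEE) / 100.0

incidents = [(99.0, 20.0), (98.5, 15.0), (0.6, 2.0)]  # hypothetical log
uptime = measured_uptime(incidents)
print(round(budget, 1))                          # 43.2
print(round(uptime, 3), uptime >= UPTIME_GUARANTEE)  # 99.919 True
```

Here the 0.6% interruption is logged but ignored, so only 35 minutes count against the 43.2-minute budget and the 99.9% guarantee is met.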
Consider the range of system interruptions due to packet loss and traffic bottlenecks that are most likely to occur, based on historical system interruptions that had significant negative impacts on uptime availability. Then develop a list of solutions for integrating web services applications into EAI to maximize uptime availability as specified in an SLA. The higher the system interruption thresholds, the better the guarantee the SLA gives on uptime availability for web services integration into EAI.
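A first pass at reviewing historical system interruptions could be as simple as totaling past significant downtime by cause and ranking the causes, so the list of solutions starts with the biggest contributors. The cause names and figures below are invented; a real analysis would draw on your own monitoring logs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical history: (cause, interruption_pct, minutes_down)
HISTORY: List[Tuple[str, float, float]] = [
    ("traffic_bottleneck", 98.9, 25.0),
    ("packet_loss", 0.6, 1.0),
    ("traffic_bottleneck", 99.1, 40.0),
    ("service_timeout", 98.4, 18.0),
    ("packet_loss", 98.2, 6.0),
]


def rank_causes(history: List[Tuple[str, float, float]],
                threshold: float) -> List[Tuple[str, float]]:
    """Total significant downtime per cause, largest first."""
    totals: Dict[str, float] = defaultdict(float)
    for cause, pct, minutes in history:
        if pct >= threshold:
            totals[cause] += minutes
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(rank_causes(HISTORY, threshold=98.1))
# [('traffic_bottleneck', 65.0), ('service_timeout', 18.0), ('packet_loss', 6.0)]
```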
- Get information on how to use SLAs in a web services context from the other papers in this series:
- "Guarantee your web service with a SLA" (developerWorks, April 2002)
- "Guarantee second-generation web services applications with a SLA" (developerWorks, August 2004)
- "Secure multiple web services with a SLA guarantee" (developerWorks, November 2004)
- "Firewall web services with a SLA guarantee" (developerWorks, December 2004)
- "Localize web services with a SLA guarantee" (developerWorks, January 2005)
- "Mitigate risk for vulnerability with a SLA guarantee" (developerWorks, January 2005)
- Learn more about PushtoTest to test and monitor web services
- Read Judith M. Myerson's The Complete Book of Middleware, which focuses on the essential principles and priorities of system design and emphasizes the new requirements brought forward by the rise of e-commerce and distributed integrated systems.
- Get the business insight and the technical know-how to ensure successful systems integration by reading Enterprise Systems Integration, Second Edition.
- Publish your web service or application in the IBM UDDI version 2 registry, which features a graphical user interface and conformant APIs for public use.
- Explore and learn more about "Complex datatypes in SOAP-based web services" when you build them from existing Java code (developerWorks, May 2001).
- Go into the nuts and bolts of developing a service-level agreement in this IBM Redbook for Domino administrators.
Judith M. Myerson is a systems architect and engineer. Her areas of interest include middleware technologies, enterprise-wide systems, database technologies, application development, network management, security, and project management. You can contact her at firstname.lastname@example.org.