Optimize resource usage and reduce costs, Part 3: Use resilient design patterns to reduce total cost of ownership

Murali Narasimhadevara (muralin@us.ibm.com), Senior IT Architect and Webmaster, IBM, Software Group
Murali Narasimhadevara is a Senior IT architect with the IBM CIO office. Murali is also the Senior Webmaster for the IBM intranet, and has been helping develop it for the past eight years. He has extensive experience in building and managing high volume Web sites, application and Web server administration with a focus in WebSphere, performance/capacity planning, and enterprise application design. His areas of interest are in autonomic and utility computing for managing Web infrastructures.
Mahi R. Inampudi (inampudi@us.ibm.com), Lead IT Architect, IBM, Software Group
Mahi R. Inampudi is the lead IT architect for IBM's On Demand Workplace expertise location system (BluePages). Other responsibilities include the architecture and solution design for several of IBM's internal offerings and collaborating with the CIO office and IBM Research helping design applications using the latest SOA methods. Recent interests include leveraging newer technologies, such as WebSphere Extended Deployment, the Rational product suite, and IBM's intraGrid architecture.

Summary:  This is the third article in our series about why and how the IBM intranet portal team implemented WebSphere® Extended Deployment to upgrade the IBM internal enterprise application infrastructure. This article discusses design patterns that can help an application achieve autonomic resiliency to protect, optimize, and reconfigure itself, and heal from outages. The authors also discuss a short-circuit pattern to circumvent a deadlock situation, a service availability pattern, and show how to apply a service availability pattern to a sample application.

Date:  14 Feb 2006
Level:  Intermediate

Overview

This article discusses how characteristics of autonomic computing can affect the way we build software products, IT infrastructure architectures, and application architectures. Typical enterprise architectures do not always give optimum performance at 100% availability. At times, service exploiters that are dependent on the availability of your services could get into trouble.

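With the help of design patterns, you can build autonomic resiliency into an application so that it can protect, optimize, and reconfigure itself. In this article we discuss a short-circuit pattern to circumvent a deadlock situation, a service availability pattern, and how to apply a service availability pattern to a sample application.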

Review

Autonomic computing technology originates from the study of human anatomy. It's really all about how:

  • The human brain predicts events and sends signals to the body parts to react to those predictions, preventing damage
  • The human body heals itself
  • The body adjusts to changes in the environment

Do any of those principles apply to IT, applications, or software? IBM Research (see Resources) started studying autonomic computing based on the principle of building computer systems that regulate themselves similar to how our nervous system regulates and protects our bodies.

The fundamental, resilient features of the autonomic computing architecture for application development, often called self-CHOP, are:

  • Self-configuring
  • Self-healing
  • Self-optimizing
  • Self-protecting

Resilient design patterns

Autonomic application resilient design patterns aren't new. Until recently, most were basic patterns supporting simple and obvious goals, or for handling a simple problem. We'll describe how you can use a few autonomic application resiliency design patterns to reduce total cost of ownership. First we'll review a simpler resilient design pattern, then we'll discuss more complicated design patterns.

Typically, an autonomic application resilient design pattern involves four steps for building the logic into the application. You also need software products that provide the autonomic behavior and application resiliency for the dynamic, run time aspects of the enterprise architecture. The four steps, called a MAPE loop, are:

Monitor
With most enterprise architecture, it's common to have monitoring tools to capture events, alert the administration team, and help build metrics for application or service availability.
Analyze
In autonomic computing it's important to analyze the data gathered by the monitoring tools to understand the type of event and the root cause of the issue.
Plan
Knowledge gathered in the analysis step needs to be converted into a plan: a resiliency pattern whose logic handles an event that could otherwise cause undesired results.
Execute
When monitors trigger an event and the analysis identifies it as serious, the application code must contain logic to act on the plan, that is, the actions that predefined policy definitions assign to certain events.
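
The four steps above can be sketched as one pass of a loop. The following Java fragment is only an illustration under stated assumptions: the class name, the enum values, the response-time threshold, and the "reject new calls" action are all hypothetical and do not come from any IBM product.

```java
import java.util.function.LongSupplier;

// Illustrative sketch of one MAPE pass. Every name here is hypothetical.
class MapeLoopSketch {
    enum Severity { NORMAL, SERIOUS }
    enum Action { NONE, REJECT_NEW_CALLS }

    // Analyze: classify the monitored response time against a policy threshold.
    static Severity analyze(long responseMillis, long thresholdMillis) {
        return responseMillis > thresholdMillis ? Severity.SERIOUS : Severity.NORMAL;
    }

    // Plan: map the analysis result to a predefined action (the policy definition).
    static Action plan(Severity severity) {
        return severity == Severity.SERIOUS ? Action.REJECT_NEW_CALLS : Action.NONE;
    }

    // Execute: apply the action; returns true when service calls remain allowed.
    static boolean execute(Action action) {
        return action != Action.REJECT_NEW_CALLS;
    }

    // One pass of the loop: monitor, analyze, plan, execute.
    static boolean runOnce(LongSupplier monitor, long thresholdMillis) {
        long observedMillis = monitor.getAsLong();                    // Monitor
        Severity severity = analyze(observedMillis, thresholdMillis); // Analyze
        Action action = plan(severity);                               // Plan
        return execute(action);                                       // Execute
    }

    public static void main(String[] args) {
        System.out.println(runOnce(() -> 120L, 500L));  // healthy service: calls allowed
        System.out.println(runOnce(() -> 4000L, 500L)); // slow service: calls rejected
    }
}
```

In a real deployment the monitor would be fed by the monitoring tools, and the plan would come from policy definitions rather than a hard-coded threshold.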

A short-circuit pattern

When applications call their own services, or services that share infrastructure components, a single component that slows down or stops responding can result in a deadlock situation. The deadlock then aggravates the problem, and eventually results in an outage of all applications that depend on the shared components.

Figure 1 below shows a scenario of an enterprise architecture with applications causing deadlocks at the enterprise level, potentially causing outages. In Scenario 1 in the figure, all applications are healthy in terms of performance, and all shared components are healthy. Application A is a Web application that also makes calls to Service Provider B, which uses the same front-end proxy infrastructure.

In Scenario 2 in Figure 1, Service Provider B slows down, which increases the number of threads on the proxy servers. Because Application A also makes calls to Service Provider B, and those calls go through the same proxy front end, Application A's requests add to the number of threads waiting for a response on the proxy servers. It's similar to www.abcd.com code making calls to www.abcd.com code over HTTP. The result is a sudden increase in the number of threads waiting to be processed or waiting in the proxy servers' queues, which escalates the issue to an outage for all applications, including Application B, as shown in Scenario 3.


Figure 1. Applications causing deadlocks
Scenario of an enterprise architecture which has applications causing deadlocks at enterprise level, potentially causing outages

An application making service calls needs better control over service components in the architecture, and more options for reacting to run time status changes such as performance degradation or outages. Relying on TCP-level timeouts is often not an ideal choice, depending on the load an application gets and how reliable an external service is.

In this pattern, a thread timeout mechanism is the monitor for external service calls: a timeout is the "event," and the kill-thread logic is the predefined mechanism that executes the plan. The timeout mechanism can also be thought of as the policy definition for this design pattern.

Figure 2 shows a possible workaround for the deadlock situation. A service call is made using a thread, putting governance and predefined rules such as timeouts on the thread, which then gives much better control over the service call. You need to weigh the architectural advantages versus the drawbacks of initiating threads from a servlet. At times, the former could be a lot more valuable and justify the intentional use of such anti-patterns.
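
The idea of running the service call on a governed thread can be sketched in Java as follows. This is a minimal illustration, not the authors' implementation: the class name, the pool size of four, the fallback string, and the helper `callWithTimeout` are assumptions, and the sketch uses the standard `java.util.concurrent` API rather than container-managed threads.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative short-circuit sketch. Names and sizes are hypothetical.
class ShortCircuitSketch {
    // A small, dedicated pool bounds how many threads slow service calls can hold.
    // Daemon threads let the JVM exit even if a call never returns.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(4, runnable -> {
        Thread thread = new Thread(runnable, "service-call");
        thread.setDaemon(true);
        return thread;
    });

    // Runs the service call on a pool thread and waits at most timeoutMillis.
    // On timeout the call is cancelled (its thread is interrupted) and the
    // fallback is returned, so the caller's thread is released quickly.
    static String callWithTimeout(Callable<String> serviceCall, long timeoutMillis, String fallback) {
        Future<String> future = POOL.submit(serviceCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // the "kill thread" step: interrupt the worker
            return fallback;
        } catch (ExecutionException e) {
            return fallback;     // the service call itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        }
    }

    public static void main(String[] args) {
        String fast = callWithTimeout(() -> "response", 2000, "unavailable");
        String slow = callWithTimeout(() -> { Thread.sleep(5000); return "late"; }, 200, "unavailable");
        System.out.println(fast); // response
        System.out.println(slow); // unavailable
    }
}
```

Note that `cancel(true)` only interrupts the worker thread; a call blocked in non-interruptible I/O may still hold its thread until the underlying socket times out, which is one of the drawbacks to weigh.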


Figure 2. Sequence diagram for short circuit pattern

Service availability pattern

Service availability patterns aren't new. You can see their applications in various places in our daily lives. We'll look at one scenario where this pattern helps, then review how the advantages can be applied to IT architectures.

Figure 3 shows a common daily scenario for those who drive. Traffic is monitored by different tools. After analyzing the data, a traffic controller normally broadcasts the traffic congestion areas and warns about delays. Drivers can then change their plans to choose alternate routes to help themselves and to smooth an uneven traffic situation.


Figure 3. Traffic control system

IT applications get lots of "traffic" in the form of Web transactions or application requests from end users, or from other applications exploiting Web services. For various reasons, applications don't always give good response times, a condition called a slowdown. Because of costs, it's very common for application architectures to share infrastructure components in the enterprise architecture.

As explained in A short-circuit pattern, one heavily used application or service having degraded performance issues or outages can affect the entire enterprise architecture. Let's apply the traffic controller pattern to one such situation in an IT enterprise architecture, and find a solution for a much healthier application infrastructure during outages or degraded performance situations.

Enterprise applications typically have many external interfaces, within or outside the data center, such as directory calls, database calls, MQ calls, or SOAP-based Web services calls. These applications have no knowledge of the run time availability or performance of those external interfaces, so most of the interfaces have timeout settings. Sometimes, however, the timeouts come too late to prevent an application outage because too many threads in the Web application servers are waiting to time out.

For example, one application might open a URL connection to another servlet to receive an XML document as the response, and another might make a SOAP request over HTTP to retrieve a SOAP response. The only controls available in this scenario are the TCP-level timeout value and the default HTTP return codes returned by the service's Web application servers. Now imagine that the server is available and reachable but is responding very slowly. For a heavily used application with many concurrent users, too many threads in the application server's Web container would be waiting for a response or waiting to time out. This can often have a cascading effect on the shared front-end servers, as shown in Figure 1.

As shown in Figure 4, this design pattern builds on the regular enterprise practice of setting up monitoring tools, such as Tivoli® products. These tools monitor the application infrastructure, report events such as degraded performance or outages, and send alert messages so that the administration team can act immediately.


Figure 4. Service availability pattern

Imagine that a Web service or an LDAP server is suffering degraded performance. Applications that use it could get into trouble. Based on alerts from the monitoring tools, administrators and application support teams take appropriate actions; in a worst-case scenario, the applications might need to be recycled. From the autonomic computing perspective, because the monitoring tools have already detected the failure, why not let them hook into the application server infrastructure through a mediator called the Service Availability Broadcast System? The Service Availability System simply analyzes data from the monitoring tools and passes messages on to the interested applications, which are subscribers of a given service availability topic. For example, an application that uses LDAP heavily might want to know about any events related to LDAP availability so that it can react to them more gracefully.

The use case in Figure 5 shows the flow of how applications could subscribe to the availability event notifications of a target application, service, or an infrastructure component.


Figure 5. Use case for service availability pattern

Traditional monitoring tools monitor the infrastructure and post any availability or performance-related events to the Service Availability System. Applications can subscribe to a particular target infrastructure component, such as a Web service, LDAP server, or DB2® server, or even to the network availability between two geographies that could impact them. Based on the availability data it receives from the monitoring tools, the Service Availability System then broadcasts event notifications to the subscriber applications that are interested in the availability status of that enterprise component.
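
The subscribe-and-broadcast flow described above can be sketched as a small in-process publish/subscribe broker. This is an assumption-laden illustration: the class name, the `Status` values, and the topic names ("ldap", "db2") are hypothetical, and a real Service Availability System would more likely sit on a messaging product than on in-memory callbacks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Illustrative broker for availability events. All names are hypothetical.
class AvailabilityBroadcastSketch {
    enum Status { AVAILABLE, DEGRADED, DOWN }

    // Topic (for example "ldap") mapped to the applications subscribed to it.
    private final Map<String, List<Consumer<Status>>> subscribers = new ConcurrentHashMap<>();

    // An application registers interest in one infrastructure component.
    void subscribe(String topic, Consumer<Status> listener) {
        subscribers.computeIfAbsent(topic, key -> new CopyOnWriteArrayList<>()).add(listener);
    }

    // The monitoring tool posts an event; only subscribers of that topic are notified.
    void publish(String topic, Status status) {
        List<Consumer<Status>> listeners =
                subscribers.getOrDefault(topic, Collections.emptyList());
        for (Consumer<Status> listener : listeners) {
            listener.accept(status);
        }
    }

    public static void main(String[] args) {
        AvailabilityBroadcastSketch broker = new AvailabilityBroadcastSketch();
        List<String> received = new ArrayList<>();
        broker.subscribe("ldap", status -> received.add("Application A saw ldap " + status));
        broker.publish("db2", Status.DOWN);      // no subscribers, nothing delivered
        broker.publish("ldap", Status.DEGRADED); // delivered to Application A
        System.out.println(received);
    }
}
```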

This pattern applies very well to a Web services model, too. Figure 6 shows the services triangle with the interaction between service providers, service consumer, and the dynamic discovery and invocation of services.


Figure 6. Service availability pattern for Web services model

A SOAP Fault only handles cases of known application issues; Listing 1 shows a sample.


Listing 1. SOAP Fault specification
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope
 xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <soap:Fault>
      <faultcode>-1000</faultcode>
      <faultstring>Database is unavailable.</faultstring>
      <detail/>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>

But the question remains: what happens if the Web application server hosting this Web service has performance issues? Even a SOAP Fault response has to travel through the same troubled infrastructure, so it does not help the situation much.

A service availability pattern can help dynamic services not only to discover the service providers, but also to find their current run time status. The consumer knows the availability of the service provider at that moment, and can decide whether to call it or to search for another service in the service registry/UDDI.
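
The consumer-side decision can be sketched as a simple selection over candidate endpoints. The fragment below is hypothetical throughout: the endpoint URLs are made up, the registry lookup is reduced to an ordered list, and treating a provider with no reported event as healthy is an assumption of this sketch rather than something the article specifies.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Illustrative provider selection. Endpoints and names are hypothetical.
class ProviderSelectionSketch {
    // candidates: endpoints in preference order, as a registry lookup might return them.
    // availability: last known status per endpoint, fed by availability notifications.
    // Assumption: an endpoint with no reported event is treated as available.
    static Optional<String> chooseProvider(List<String> candidates,
                                           Map<String, Boolean> availability) {
        for (String endpoint : candidates) {
            if (availability.getOrDefault(endpoint, Boolean.TRUE)) {
                return Optional.of(endpoint);
            }
        }
        return Optional.empty(); // every known provider is currently unavailable
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList(
                "http://provider-b.example/service",
                "http://provider-c.example/service");
        Map<String, Boolean> availability = new HashMap<>();
        availability.put("http://provider-b.example/service", Boolean.FALSE);

        // Provider B is reported unavailable, so the consumer falls through to provider C.
        System.out.println(chooseProvider(candidates, availability).orElse("none available"));
    }
}
```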


Applying a service availability pattern

In the following example, Application A receives over seven million requests a day, with a throughput of over 250 servlet requests per second during peak hours. Most of the application's transactions need to make LDAP calls with varying levels of complexity in the search filter. During LDAP performance issues or outages, even the optimal timeout value (which has to allow for searches that legitimately take longer) leaves too many Web container threads waiting. The threads that are waiting either to be timed out by the LDAP server or to get a response from it have a cascading effect on the front-end IP sprayers and caching proxy servers. In this infrastructure, the caching proxy servers are shared across the entire portfolio of applications, so the waiting threads almost cause an outage for all applications supported by the shared caching proxy infrastructure.

Figure 7 shows how a service availability pattern was applied to yield a much healthier Application A during LDAP issues, and also protect caching proxy servers or front-end edge infrastructure.


Figure 7. Example of service availability pattern

Application A makes many concurrent calls. A custom monitoring tool for the LDAP cluster was already in place for the infrastructure. The LDAP/ED availability tool gathers information from the LDAP monitoring tool and captures any availability events. As soon as the tool detects LDAP issues, it notifies the application by updating an application property file. The application is scheduled to pick up any changes to the property file, so it learns of all LDAP availability events. After an outage or performance event is noted, the application simply does not attempt any LDAP calls. It already knows the request would time out anyway, so it returns a graceful error message to the end-user transaction saying that the LDAP infrastructure is having issues and to please try again later.

As soon as the availability status changes, the LDAP/ED availability tool interprets the change from the LDAP monitoring tool data and notifies the application in the same way. After the application knows the LDAP cluster is again available and performing well, it starts making LDAP calls again and service is restored. This sample outlines how to get better event handling in typical Service-Oriented Architectures (SOA).
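
The property-file notification described in this example can be sketched as follows. The sketch makes several assumptions that are not in the article: the property key `ldap.available`, the class and method names, and the use of a function argument to stand in for the real LDAP search. The article does not show the actual property file or application code.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.function.Function;

// Illustrative gate in front of LDAP calls. Key and names are hypothetical.
class LdapAvailabilityGate {
    // Assumed healthy until the property file says otherwise.
    private volatile boolean ldapAvailable = true;

    // Re-reads the property file; intended to be invoked on a schedule.
    void reload(Path propertyFile) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(propertyFile, StandardCharsets.UTF_8)) {
            props.load(in);
        }
        ldapAvailable = Boolean.parseBoolean(props.getProperty("ldap.available", "true"));
    }

    // Fails fast with a graceful message instead of tying up a thread until timeout.
    String search(String filter, Function<String, String> ldapCall) {
        if (!ldapAvailable) {
            return "The LDAP infrastructure is having issues. Please try again later.";
        }
        return ldapCall.apply(filter);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("application", ".properties");
        LdapAvailabilityGate gate = new LdapAvailabilityGate();

        // The availability tool reports an LDAP problem.
        Files.write(file, "ldap.available=false\n".getBytes(StandardCharsets.UTF_8));
        gate.reload(file);
        System.out.println(gate.search("(uid=jdoe)", f -> "entry for " + f));

        // The availability tool reports recovery.
        Files.write(file, "ldap.available=true\n".getBytes(StandardCharsets.UTF_8));
        gate.reload(file);
        System.out.println(gate.search("(uid=jdoe)", f -> "entry for " + f));

        Files.delete(file);
    }
}
```

The trade-off in this design is latency of notification: the application only learns of a status change at its next scheduled reload, so the reload interval bounds how long threads can still pile up.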


Resources

Learn

Get products and technologies

Discuss

  • Join the discussion: Drop in on the "Autonomic computing: an insider's perspective" discussion forum.

  • Blog: Dave Bartlett, IBM VP of Autonomic Computing, shares his perspective.

  • developerWorks blogs: Get involved in the developerWorks community.
