Achieving high availability across multiple sites using IBM PureApplication System

This article describes how multiple IBM® PureApplication™ System racks can be located in geographically distributed data centers to host business critical workloads that require the highest levels of availability and rapid recovery following a disaster.

Andrew J. F. Bravery (andrewjf_bravery@uk.ibm.com), Senior Technical Staff Member, IBM

Andy Bravery is a Senior Technical Staff Member for IBM Cloud Labs, a division of IBM Software Group. He has been with IBM for 24 years and currently works with customers in designing cloud computing solutions. Before joining Cloud Labs, Andy led a team in IBM’s CIO Lab organization, exploring and implementing platform-as-a-service solutions to support internal business processes. As a member of the Emerging Technology Services team in Hursley, United Kingdom, he has a long history of working with early adopter clients on innovative solutions to gain a competitive edge.



Rehan Altaf (rehan@us.ibm.com), Cloud Architect, IBM

Rehan Altaf is a Cloud Architect for IBM Cloud Labs, a division of IBM Software Group. He has been with IBM for six years and currently works with customers in designing cloud computing solutions. Before joining Cloud Labs, Rehan was part of the IBM Support Services for the Tivoli team, where he implemented private cloud solutions for customers in production data centers. He also worked in Lotus, helping customers develop applications on Domino.



Animesh Singh (singhan@us.ibm.com), Senior Cloud Architect, IBM

Animesh Singh is a Senior Cloud Architect for IBM Cloud Labs, a division of IBM Software Group. He has been with IBM for seven years and currently works with customers in designing cloud computing solutions. He has led cutting-edge projects around cloud and virtualization technologies for IBM enterprise customers in the telco, banking, and healthcare industries, and has a proven track record of driving the design and implementation of private and public cloud solutions from concept to production. He also led the design and development of early versions of the IBM public cloud offering, IBM SmartCloud Enterprise.



22 May 2013

Introduction

PureApplication System provides a flexible platform for running a diverse range of application workloads in a cloud infrastructure. Its design helps to eliminate single points of failure, with the goal of allowing applications running in the rack to achieve high levels of availability.

Businesses that are seeking the very highest levels of resiliency must consider how to run their workloads across multiple systems and even across geographically distributed data centers, so that a problem affecting any one piece of hardware, local network, or power supply does not result in a potentially lengthy outage for that service.

This article describes a design for such a multisite environment using PureApplication System, and describes how application workloads deployed across both sites can continue to provide high availability through a series of possible failure scenarios. It is written for infrastructure and application architects, to help them understand how PureApplication System can be used to help meet more demanding non-functional requirements.

The article High availability topologies for PureApplication System introduces some topologies for achieving high availability for applications running on a single PureApplication System rack and for a multirack setup. This is recommended reading before continuing with this article.


Setting up web applications for high availability

A true high availability solution needs to span multiple geographically distributed data centers. In this article, we discuss a design that assumes we have PureApplication System racks running the same application in two different data centers, with a wide area network between them. The networking infrastructure includes load balancing capabilities to distribute user requests for the application equally across the two sites.

Before we discuss our design further, we need to choose an application type to focus on. Application architecture is a key consideration when designing a topology to support high availability. Issues such as where in the application topology data is stored and what approaches are used to handle successive user requests in an application all affect the choices that can be made about how to achieve a high availability service. For example, if the state of a user's shopping cart is held in memory between successive user requests, the high availability architecture needs to take account of the required behavior if that component is affected by an outage. By selecting a simple common example application architecture and making a few associated assumptions, we can provide a concise introduction to this subject.

We focus on a classic three-tier web application architecture composed of an application server fronted by a web server and backed by a database. To further simplify the discussion, we assume that all application data is stored in the database, and that no data is stored in the application layer. In this way, we can focus the discussion on how to achieve high availability of applications running on PureApplication System, without complicating the story with considerations around data replication and concurrency. These important topics will be covered in a future article.

We also assume that the database component of the application is managed on a separate infrastructure from the web application, as is the practice in most customer environments. The database needs to be highly available too, of course, so we rely on the database infrastructure to provide those levels of service. The application running on PureApplication System remotely connects into that highly available database infrastructure from each data center. Our final assumption is that the client has redundant load balancing that can distribute requests across the remote data centers as part of their core network capability.

Figure 1 shows an overview of a sample setup with the features we describe here.

Figure 1. Overview of a sample multisite setup

This architecture allows for the racks to be in geographically distributed data centers and operate in what is described as "Active-Active" mode, meaning that both racks handle live production requests from users during normal operational conditions. If either rack or data center has a problem, the user requests are routed to the operational rack and continue to be serviced. When the problem has been solved, the load is again handled by both racks.


Deploying an application across two systems

Let's take a more detailed look at the PureApplication System racks in this setup. We are focusing on web applications, so there needs to be a web server, an application server, and the application itself, arranged in a topology that supports high availability. It is useful to talk about a real application so we use TradeLite as an example.

Sample web application and database

TradeLite is the IBM WebSphere® Application Server end-to-end benchmark and performance sample application. The benchmark is designed and developed to cover WebSphere's extensive programming model. This provides a real world workload driving WebSphere's implementation of Java Platform Enterprise Edition 1.4 and web services, including key WebSphere performance components and features.

TradeLite does not rely on any state being maintained in the application server tier, and its associated database can run on a remote database server, which fits well with the design of our sample setup.

To achieve high availability service levels, the application needs to run on a clustered application server, so that on each rack, application requests are serviced by multiple nodes. To achieve this on PureApplication System, we can select the standard WebSphere pattern that provides this clustering capability: the WebSphere cluster pattern.

WebSphere cluster pattern

The WebSphere cluster pattern is a virtual system pattern. For a detailed description of virtual system patterns, and other types of patterns available in PureApplication System, refer to the article Preparing for IBM PureApplication System, Part 1: Onboarding applications overview.

Figure 2 shows the customized WebSphere cluster pattern in the PureApplication System pattern editor.

Figure 2. The customized WebSphere cluster virtual system pattern

The WebSphere cluster pattern contains the components to build a WebSphere cluster such as a deployment manager node, custom nodes, and web server nodes. By configuring the pattern to deploy two custom nodes and two web server nodes, you can help attain high availability by eliminating potential single points of failure.

Script packages in the pattern automate configuration tasks, such as installing and configuring database drivers, tuning the Java virtual machine (JVM), and installing the TradeLite application.
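The redundancy argument behind this configuration can be sketched in a few lines of Python. This is an illustrative model only, not the PureApplication System pattern format or any product API: the part names, the `PATTERN_PARTS` table, and both helper functions are hypothetical, and the sketch assumes that the deployment manager performs administration rather than serving user requests.

```python
from typing import Dict, List, Tuple

# Hypothetical description of the customized cluster pattern:
# part name -> (number of instances, whether the part sits on the user request path).
PATTERN_PARTS: Dict[str, Tuple[int, bool]] = {
    "deployment_manager": (1, False),  # assumption: administrative only, not on the request path
    "custom_node": (2, True),          # cluster members that run the application
    "web_server": (2, True),           # web servers in front of the cluster
}


def total_virtual_machines(parts: Dict[str, Tuple[int, bool]]) -> int:
    """Number of virtual machines one deployment of the pattern creates."""
    return sum(count for count, _ in parts.values())


def single_points_of_failure(parts: Dict[str, Tuple[int, bool]]) -> List[str]:
    """Parts on the request path that have fewer than two instances."""
    return sorted(
        name for name, (count, on_path) in parts.items() if on_path and count < 2
    )


if __name__ == "__main__":
    print(total_virtual_machines(PATTERN_PARTS))
    print(single_points_of_failure(PATTERN_PARTS))
```

Reducing either request-serving part to a single instance makes it appear in the list returned by `single_points_of_failure`, which is the situation the two-by-two configuration is meant to avoid.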

Deploying this pattern to both racks creates separate but identical copies of the application, one running on each PureApplication System. There is no communication between the application instances, but the pattern allows both to be configured to use the same database for reading and storing data. Figure 3 shows the pattern deployed to PureApplication System. It depicts the five virtual machines that are instantiated to create the cluster where the TradeLite application runs, and the connection to the external database that is configured.

Figure 3. The WebSphere cluster pattern deployed to PureApplication System

All the expertise needed to create this application configuration was captured by the pattern designer. The pattern can now be deployed by users who are not skilled in WebSphere cluster configuration, and a consistent deployment can be achieved every time.


High availability scenarios

This section describes some scenarios that demonstrate what is possible with PureApplication System.

Normal operation

In a normal operation, user requests to the application are distributed across the two PureApplication System racks running identical copies of the application in independent WebSphere clusters, as shown in Figure 4.

Figure 4. Application workload being handled normally

The load balancer distributes roughly equal numbers of requests to the two racks, and continues to do so as long as it detects that both application instances are in a healthy state.
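As a rough illustration of this behavior, the following Python sketch models a round-robin balancer that skips any site marked unhealthy. It is a simplified model under stated assumptions, not the configuration of any particular load balancer product: the `Site` and `HealthAwareBalancer` classes, the `simulate` helper, and the site names are all hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, Iterator, List


@dataclass
class Site:
    """One PureApplication System rack as seen by the load balancer (hypothetical model)."""
    name: str
    healthy: bool = True


class HealthAwareBalancer:
    """Round-robin across sites, skipping any site currently marked unhealthy."""

    def __init__(self, sites: List[Site]) -> None:
        self._sites = sites
        self._ring: Iterator[Site] = cycle(sites)

    def route(self) -> Site:
        # Look at each site at most once per request, starting from the ring position.
        for _ in range(len(self._sites)):
            site = next(self._ring)
            if site.healthy:
                return site
        raise RuntimeError("no healthy site available")


def simulate(balancer: HealthAwareBalancer, requests: int) -> Dict[str, int]:
    """Route a number of requests and count how many each site receives."""
    counts: Dict[str, int] = {}
    for _ in range(requests):
        name = balancer.route().name
        counts[name] = counts.get(name, 0) + 1
    return counts
```

With both sites healthy, `simulate` splits requests evenly between them. Setting `healthy = False` on one site sends every subsequent request to the other, and setting it back to `True` restores the even split.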


Operational failover scenarios

This section looks at a number of possible failure scenarios that might occur, and examines how our multisite setup is able to cope with them and continue to service user requests. The intention here is to give confidence that PureApplication System can provide a range of capabilities to help minimize impact to the business of these possible failures.

Cluster member failure

If a member of the WebSphere cluster fails on one rack, requests should continue to be handled by other members of the cluster and by the cluster on the other rack. This is standard WebSphere cluster functionality. Under ideal circumstances, it is anticipated that there is no discernible effect on any user and only minor loss of bandwidth.

Other patterns are available for PureApplication System that include the WebSphere Version 8.5 Intelligent Management capabilities. These patterns include elastic scaling, where a failed cluster member is compensated for by automatically starting a new member of the cluster. This can help businesses where maintaining bandwidth and performance is also important. For more information on the Intelligent Management features of WebSphere V8.5, see What's new in WebSphere Application Server V8.5.

Cluster failure

If an entire cluster fails, the load balancer detects that the cluster is not in a healthy state and starts to route transactions away from the failed cluster. Transactions continue to be served by the cluster on the other rack, and if capacity allows and systems are properly configured and maintained, the user should generally see no impact on performance.
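One common way for a load balancer to decide that a cluster is unhealthy is to count consecutive failed health probes. The sketch below shows that idea in Python. It is an assumption about how such detection could work, not a description of a specific product: the `HealthMonitor` class and its threshold values are hypothetical, and the probe itself (for example, an HTTP request to the application) is left to the caller.

```python
class HealthMonitor:
    """Tracks probe results for one cluster and decides whether to route traffic to it."""

    def __init__(self, fail_threshold: int = 3, recover_threshold: int = 2) -> None:
        # Hypothetical defaults: three failed probes in a row mark the cluster down,
        # two successful probes in a row bring it back into rotation.
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.healthy = True
        self._failures = 0
        self._successes = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the resulting health state."""
        if probe_ok:
            self._failures = 0
            self._successes += 1
            if not self.healthy and self._successes >= self.recover_threshold:
                self.healthy = True
        else:
            self._successes = 0
            self._failures += 1
            if self.healthy and self._failures >= self.fail_threshold:
                self.healthy = False
        return self.healthy
```

Requiring several consecutive failures avoids routing traffic away because of a single slow response, and requiring several consecutive successes avoids returning traffic to a cluster that is still restarting.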

Compute node failure

The high availability features of PureApplication System help prevent compute nodes from being single points of failure. PureApplication System automatically evacuates workload from a failing compute node and brings it up on an operational node. These recovery functions rely on a properly configured system that is fully operational, with cloud resources consisting of at least two compute nodes.

Depending on the nature of the failure, this should happen without any discernible effect on the users because cluster members are already distributed across multiple compute nodes. Even if a compute node fails instantaneously, members on the other node should be available to continue service. In a high availability setup, service may also be provided by the cluster on the other rack.
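The evacuation idea can be illustrated with a small Python sketch. This is a toy model of workload placement, not the algorithm PureApplication System uses: the `evacuate` function, the least-loaded placement rule, and the virtual machine and node names are all hypothetical.

```python
from typing import Dict, List


def evacuate(placement: Dict[str, str], nodes: List[str], failed: str) -> Dict[str, str]:
    """Return a new VM -> compute node mapping with every VM moved off the failed node.

    Assumption: each evacuated VM goes to the surviving node that currently
    hosts the fewest VMs (ties broken by node name).
    """
    survivors = [n for n in nodes if n != failed]
    if not survivors:
        # Mirrors the requirement for at least two compute nodes.
        raise RuntimeError("no operational compute node to receive workload")

    # Count the VMs already running on each surviving node.
    load = {n: 0 for n in survivors}
    for node in placement.values():
        if node in load:
            load[node] += 1

    new_placement = dict(placement)
    for vm in sorted(vm for vm, node in placement.items() if node == failed):
        target = min(survivors, key=lambda n: (load[n], n))
        new_placement[vm] = target
        load[target] += 1
    return new_placement
```

The function raises an error when the failed node is the only one available, which reflects why recovery depends on cloud resources with at least two compute nodes.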

Rack failure

If a rack fails, the load balancer notices that requests cannot be responded to and routes the workload to the other rack. See Figure 5.

Figure 5. Application workload being handled despite a rack outage

Depending on the application, users may notice a break in service. For example, if the user is logged in, they may find that they have to log in again. Transient data that has not been written to the database is lost when establishing a new session with the other rack. Other than this, service is maintained, although bandwidth can be reduced. This can be mitigated with elastic scaling and sufficient spare capacity on the racks, so that additional cluster members can be started to take the increased load.
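The need for spare capacity can be made concrete with some simple arithmetic, sketched here in Python. The function name, the request-rate units, and the example figures are hypothetical, and the model assumes that load is split evenly across identical racks in normal operation.

```python
def utilization_after_failover(
    load_per_rack: float, capacity_per_rack: float, racks: int = 2
) -> float:
    """Utilization of each surviving rack after one rack is lost.

    Assumes identical racks, an even split of load in normal operation,
    and that the full load is spread over the remaining racks.
    """
    if racks < 2:
        raise ValueError("failover needs at least two racks")
    total_load = load_per_rack * racks
    return total_load / (capacity_per_rack * (racks - 1))


if __name__ == "__main__":
    # Two racks, each sized for 1000 requests/s (illustrative figures only).
    print(utilization_after_failover(400, 1000))  # each rack at 40% before the failure
    print(utilization_after_failover(600, 1000))  # each rack at 60% before the failure
```

With two racks, the surviving rack sees double its normal load, so a rack that runs above half of its capacity in normal operation cannot absorb a failover without either reduced throughput or additional capacity for new cluster members.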


Conclusion

This article described a sample design for running web applications on geographically distributed PureApplication System racks to help achieve high availability for critical business workloads. This design describes ways that can help to achieve these demanding high availability service levels, providing resilience against component failures up to and including the loss of an entire data center.

PureApplication System is fundamental to this design and because of its pattern-based deployment approach, setting up and maintaining a multirack, multisite environment can be easy, repeatable, and in ideal circumstances, free from significant errors. PureApplication System coupled with the resilience and scalability of WebSphere clustering technology makes for an outstanding platform for today's businesses to run their business critical application workloads.

While this article has concentrated on the design of a high availability setup and how it behaves in various operational scenarios, businesses also need to take the application lifecycle into account when working out how to support their IT environments. One key consideration is how to efficiently apply maintenance to the operating systems, middleware, and applications while maintaining high availability service levels.

PureApplication System has a number of facilities that can be used to streamline application, middleware, and operating system maintenance tasks, while still promoting high availability of the business application. A sequel to this article, High availability during operational maintenance, describes these facilities in more detail.

Acknowledgements

The authors would like to thank Peter Van Sickel, Shaun Murakami, Simeon D. Monov, and Venkata Gadepalli for their contributions to the creation of this article. Thanks also to Kyle Brown, Jeffrey Coveyduc, and Susan Holic for reviewing the article prior to publishing.
