PureApplication System provides a flexible platform for running a diverse range of application workloads in a cloud infrastructure. Its design helps to eliminate single points of failure, with the goal of allowing applications running in the rack to achieve high levels of availability.
Businesses seeking the very highest levels of resiliency must consider how to run their workloads across multiple systems, and even across geographically distributed data centers, so that a problem with any one piece of hardware, local network, or power supply does not result in a potentially lengthy outage for that service.
This article describes a design for such a multisite environment using PureApplication System, and shows how application workload deployed across both sites can continue to provide high availability through a series of possible failure scenarios. It is written with infrastructure and application architects in mind, to help them understand how PureApplication System can be used to help meet more demanding non-functional requirements.
The article High availability topologies for PureApplication System introduces some topologies for achieving high availability for applications running on a single PureApplication System rack and for a multirack setup. This is recommended reading before continuing with this article.
Setting up web applications for high availability
A true high availability solution needs to span multiple geographically distributed data centers. In this article, we discuss a design that assumes we have PureApplication System racks running the same application in two different data centers, with a wide area network between them. The networking infrastructure includes load balancing capabilities to distribute user requests for the application equally across the two sites.
Before we discuss our design further, we need to choose an application type to focus on. Application architecture is a key consideration when designing a topology to support high availability. Considerations such as where data is stored in the application topology and how successive user requests are handled affect the choices that can be made about how to achieve a highly available service. For example, if the state of a user's shopping cart is held in memory between successive requests, the high availability architecture needs to account for the required behavior if that component is affected by an outage. By selecting a simple, common example application architecture and making a few associated assumptions, we can provide a concise introduction to this subject.
We focus on a classic three-tier web application architecture consisting of an application server fronted by a web server and backed by a database. To further simplify the discussion, we assume that all application data is stored in the database, and that no data is stored in the application layer. In this way, we can focus the discussion on how to achieve high availability of applications running on PureApplication System, without complicating the story with considerations around data replication and concurrency. These important topics will be covered in a future article.
We also assume that the database component of the application is managed on a separate infrastructure from the web application, as is the practice in most customer environments. The database needs to be highly available too, of course, so we rely on the database infrastructure to provide those levels of service. The application running on PureApplication System remotely connects into that highly available database infrastructure from each data center. Our final assumption is that the client has redundant load balancing that can distribute requests across the remote data centers as part of their core network capability.
Figure 1 shows an overview of a sample setup with the features we describe here.
Figure 1. Overview of a sample multisite setup
This architecture allows for the racks to be in geographically distributed data centers and operate in what is described as "Active-Active" mode, meaning that both racks handle live production requests from users during normal operational conditions. If either rack or data center has a problem, the user requests are routed to the operational rack and continue to be serviced. When the problem has been solved, the load is again handled by both racks.
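The active-active behavior described above can be sketched in a few lines of code. The sketch below is illustrative only — the names `Site` and `route_request` are assumptions for this example, not part of any product API — but it captures the routing logic: round-robin across healthy sites during normal operation, with all traffic flowing to the survivor when one site fails.

```python
from dataclasses import dataclass
from itertools import cycle

@dataclass
class Site:
    name: str
    healthy: bool = True

def route_request(sites, rotation):
    """Return the next healthy site in round-robin order.

    Skips unhealthy sites; raises if no site can serve traffic.
    """
    for _ in range(len(sites)):
        site = next(rotation)
        if site.healthy:
            return site
    raise RuntimeError("no healthy site available")

sites = [Site("datacenter-a"), Site("datacenter-b")]
rotation = cycle(sites)

# Normal operation: requests alternate between the two racks.
print([route_request(sites, rotation).name for _ in range(4)])
# ['datacenter-a', 'datacenter-b', 'datacenter-a', 'datacenter-b']

# Rack outage: everything is routed to the surviving rack.
sites[0].healthy = False
print([route_request(sites, rotation).name for _ in range(3)])
# ['datacenter-b', 'datacenter-b', 'datacenter-b']
```

When the failed site recovers and its health flag is restored, the round-robin rotation naturally resumes distributing load across both racks, matching the "load is again handled by both racks" behavior described above.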
Deploying an application across two systems
Let's take a more detailed look at the PureApplication System racks in this setup. We are focusing on web applications, so there needs to be a web server, an application server, and the application itself, arranged in a topology that supports high availability. It is useful to talk about a real application, so we use TradeLite as an example.
Sample web application and database
TradeLite is the IBM WebSphere® Application Server end-to-end benchmark and performance sample application. The benchmark is designed to exercise WebSphere's extensive programming model, providing a real-world workload that drives WebSphere's implementation of Java Platform, Enterprise Edition 1.4 and web services, including key WebSphere performance components and features.
TradeLite does not rely on any state being maintained in the application server tier, and its associated database can run on a remote database server, which fits nicely with the design of our sample setup.
To achieve high availability service levels, the application needs to run on a clustered application server, so that on each rack, application requests are serviced by multiple nodes. To achieve this on PureApplication System, we can select one of the standard WebSphere patterns that provides this clustering capability: the WebSphere cluster pattern.
WebSphere cluster pattern
The WebSphere cluster pattern is a virtual system pattern. For a detailed description of virtual system patterns, and other types of patterns available in PureApplication System, refer to the article Preparing for IBM PureApplication System, Part 1: Onboarding applications overview.
Figure 2 shows the customized WebSphere cluster pattern in the PureApplication System pattern editor.
Figure 2. The customized WebSphere cluster virtual system pattern
The WebSphere cluster pattern contains the components to build a WebSphere cluster such as a deployment manager node, custom nodes, and web server nodes. By configuring the pattern to deploy two custom nodes and two web server nodes, you can help attain high availability by eliminating potential single points of failure.
Script packages in the pattern automate configuration tasks, such as installing and configuring database drivers, tuning the Java virtual machine (JVM), and installing the TradeLite application.
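As an illustration of the kind of work a script package automates, the sketch below fills in an environment-specific JDBC URL from deployment-time parameters. The parameter names and values here are hypothetical examples; real script packages are typically shell or wsadmin scripts supplied with the pattern.

```python
import string

# Hypothetical template of the kind a script package might parameterize at
# deploy time; the placeholder names are illustrative, not product-defined.
DATASOURCE_TEMPLATE = string.Template("jdbc:db2://$db_host:$db_port/$db_name")

def render_datasource_url(params):
    """Fill in the JDBC URL from pattern deployment parameters."""
    return DATASOURCE_TEMPLATE.substitute(params)

# Both racks deploy the same pattern, but both point at the same external
# highly available database (example values only).
params = {"db_host": "db.example.com", "db_port": "50000", "db_name": "TRADEDB"}
print(render_datasource_url(params))
# jdbc:db2://db.example.com:50000/TRADEDB
```

Because the same pattern carries the same script packages to both racks, the two deployments end up identically configured apart from these deployment-time parameters, which is what makes the two sites interchangeable from the load balancer's point of view.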
Deploying this pattern to both racks creates separate, but identical, copies running in each PureApplication System. There is no communication between the application instances, but both are configured through the pattern to access the same database for storing and retrieving data. Figure 3 shows the pattern deployed to PureApplication System, depicting the five virtual machines that are instantiated to create the cluster where the TradeLite application runs, and the connection to the external database that gets configured.
Figure 3. The WebSphere cluster pattern deployed to PureApplication System
All the expertise to create this application configuration was captured by the pattern designer. The pattern can now be deployed by users who are not skilled in WebSphere cluster configuration, and a consistent deployment can be achieved every time.
High availability scenarios
This section describes some scenarios that demonstrate what is possible with PureApplication System.
In normal operation, user requests to the application are distributed across the two PureApplication System racks running identical copies of the application in independent WebSphere clusters, as shown in Figure 4.
Figure 4. Application workload being handled normally
The load balancer helps enable a situation in which roughly equal numbers of requests are being served between the two racks, and continues to permit this as long as it is able to detect that both application instances are in a healthy state.
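The health detection the load balancer relies on can be illustrated with a simple HTTP probe. The endpoint URL and function name below are assumptions for this sketch, not part of any product configuration; the key idea is that any error or non-200 response takes an instance out of rotation.

```python
import urllib.request

def is_healthy(url, timeout=2.0):
    """Probe an application endpoint; treat any error or non-200 as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, timeouts, refused connections
        return False

# An unreachable site is simply reported unhealthy and taken out of rotation.
print(is_healthy("http://127.0.0.1:1/health", timeout=0.5))
# False
```

Real load balancers typically probe each instance on a short interval and require several consecutive failures before marking it down, to avoid reacting to transient glitches.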
Operational failover scenarios
This section looks at a number of possible failure scenarios that might occur, and examines how our multisite setup can cope with them and continue to service user requests. The intention is to give confidence that PureApplication System provides a range of capabilities to help minimize the business impact of these possible failures.
Cluster member failure
If a member of the WebSphere cluster fails on one rack, requests should continue to be handled by other members of the cluster and by the cluster on the other rack. This is standard WebSphere cluster functionality. Under ideal circumstances, it is anticipated that there is no discernible effect on any user and only minor loss of bandwidth.
Other patterns are available for PureApplication System that include the WebSphere Version 8.5 Intelligent Management capabilities. These patterns include elastic scaling, where a failed cluster member is compensated for by automatically starting a new member of the cluster. This can help businesses where maintaining bandwidth and performance is also important. For more information on the Intelligent Management features of WebSphere V8.5, see What's new in WebSphere Application Server V8.5.
If an entire cluster fails, the load balancer detects that the cluster is not in a healthy state and starts to route transactions away from the failed cluster. Transactions continue to be served by the cluster on the other rack, and if capacity allows and systems are properly configured and maintained, the user should generally see no impact on performance.
Compute node failure
The high availability features of PureApplication System help ensure that compute nodes are not single points of failure. PureApplication System automatically evacuates workload from a failing compute node and brings it up on an operational node. These recovery functions rely on a properly configured, fully operational system whose cloud resources consist of at least two compute nodes.
Depending on the nature of the failure, this should happen without any discernible effect on the users, because cluster members are already distributed across multiple compute nodes. Even if a compute node fails without warning, members on the other node should be available to continue service. In a high availability setup, service may also be provided by the cluster on the other rack.
If a rack fails, the load balancer detects that requests are no longer being answered and routes the workload to the other rack. See Figure 5.
Figure 5. Application workload being handled despite a rack outage
Depending on the application, users may notice a break in service. For example, if the user is logged in, they may find that they have to log in again. Transient data that has not been written to the database is lost when establishing a new session with the other rack. Other than this, service is maintained, although bandwidth can be reduced. This can be mitigated with elastic scaling and sufficient spare capacity on the racks, such that additional cluster members can be started to take the increased load.
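The spare-capacity point can be made concrete with a back-of-the-envelope check; the utilization figures below are purely illustrative.

```python
def surviving_rack_utilization(total_load, rack_capacity, racks=2, failed=1):
    """Utilization of the surviving racks after `failed` racks go down."""
    return total_load / ((racks - failed) * rack_capacity)

# Two racks each running at 60% of capacity under normal load: a single-rack
# outage pushes the survivor to 120% unless elastic scaling adds members.
print(surviving_rack_utilization(total_load=1.2, rack_capacity=1.0))
# 1.2

# Sizing each rack at 50% or below keeps a full failover within capacity.
print(surviving_rack_utilization(total_load=1.0, rack_capacity=1.0))
# 1.0
```

In other words, for a two-rack active-active design, each rack should either be sized to carry the entire load on its own, or the environment should rely on elastic scaling to add cluster members quickly during a failover.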
Conclusion
This article described a sample design for running web applications on geographically distributed PureApplication System racks, helping to achieve high availability for critical business workloads. The design provides resilience against component failures up to and including the loss of an entire data center.
PureApplication System is fundamental to this design: its pattern-based deployment approach makes setting up and maintaining a multirack, multisite environment easy, repeatable, and, in ideal circumstances, free from significant errors. Coupled with the resilience and scalability of WebSphere clustering technology, PureApplication System makes an outstanding platform for running business-critical application workloads.
While this article has concentrated on the design of a high availability setup and how it behaves in various operational scenarios, other considerations exist around the lifecycle of applications that businesses need to consider when working out how to support their IT environments. One key consideration is how to efficiently apply maintenance to the operating systems, middleware, and applications while maintaining high availability service levels.
PureApplication System has a number of facilities that can be used to streamline application, middleware, and operating system maintenance tasks, while still promoting high availability of the business application. A sequel to this article, High availability during operational maintenance, describes these facilities in more detail.
Acknowledgements
The authors would like to thank Peter Van Sickel, Shaun Murakami, Simeon D. Monov, and Venkata Gadepalli for their contributions to the creation of this article. Thanks also to Kyle Brown, Jeffrey Coveyduc, and Susan Holic for reviewing the article prior to publishing.
Resources
- High availability topologies for PureApplication System
- Preparing for IBM PureApplication System, Part 1: Onboarding applications overview
- What's new in WebSphere Application Server V8.5
- Achieving high availability during operational maintenance using IBM PureApplication System
- IBM PureApplication System web site
- IBM PureSystems resource page on developerWorks