Achieving high availability during operational maintenance using IBM PureApplication System

This article describes how IBM® PureApplication™ System and IBM WebSphere® Application Server combine to streamline and simplify updates to operating systems, applications, and middleware, reducing human error and total update time while maintaining high availability.

Animesh Singh (singhan@us.ibm.com), Senior Cloud Architect, IBM

Animesh Singh is a Senior Cloud Architect for IBM Cloud Labs, a division of IBM Software Group. He has been with IBM for seven years and currently works with customers in designing cloud computing solutions. He has been leading cutting-edge projects for IBM enterprise customers in the telco, banking, and healthcare industries around cloud and virtualization technologies, and has a proven track record of driving design and implementation of private and public cloud solutions from concept to production. He also led the design and development of early versions of the IBM public cloud offering, IBM SmartCloud Enterprise.



Andrew J. F. Bravery (andrewjf_bravery@uk.ibm.com), Senior Technical Staff Member, IBM

Andy Bravery is a Senior Technical Staff Member for IBM Cloud Labs, a division of IBM Software Group. He has been with IBM for 24 years and currently works with customers in designing cloud computing solutions. Before joining Cloud Labs, Andy led a team in IBM’s CIO Lab organization, exploring and implementing platform-as-a-service solutions to support internal business processes. As a member of the Emerging Technology Services team in Hursley, United Kingdom, he has a long history of working with early adopter clients on innovative solutions to gain a competitive edge.



Rehan Altaf (rehan@us.ibm.com), Cloud Architect, IBM

Rehan Altaf is a Cloud Architect for IBM Cloud Labs, a division of IBM Software Group. He has been with IBM for six years and currently works with customers in designing cloud computing solutions. Before joining Cloud Labs, Rehan was part of the IBM Support Services for the Tivoli team, where he implemented private cloud solutions for customers in production data centers. He also worked in Lotus, helping customers develop applications on Domino.



22 May 2013


Introduction

Operating systems, middleware, and applications require updates, fixes, and patches throughout the lifecycle of applications. PureApplication System provides an integrated maintenance management system that helps reduce application downtime and the possibility of human error. It can also reduce the time spent on repetitive maintenance operations. The user interface and automated tasks simplify the administrative work of applying maintenance to multiple running systems.

This article is one of two on high availability in a multisite implementation of PureApplication System. Ideally, read the other article, Achieving high availability across multiple sites using IBM PureApplication System, first. It introduces design principles for highly available stateless web applications, describes a design for a multisite environment, and explains how application workload deployed across two sites can provide high availability through a series of potential failure scenarios.

This article is intended for architects and operations teams who manage updates to components of a PureApplication System environment. It introduces the approaches available in PureApplication System for maintaining middleware and applications. Using automation and knowledge of the environment, these approaches help minimize application downtime during updates. The article includes a sample architecture to demonstrate choices that help further reduce noticeable interruptions in application services. Finally, we look at how updates and fixes to application patterns, script packages, and images can be synchronized across multiple systems.


Overview of the sample architecture

This section briefly describes the sample application architecture for deployment of PureApplication System across two sites to achieve high availability through a series of maintenance scenarios. As stated above, design principles for highly available stateless web applications are described in the first article. This article uses the same architecture to discuss maintenance techniques and facilities. For your convenience, Figure 1 is provided as a reminder of this architecture.

Figure 1. Overview of the sample architecture

To summarize, the TradeLite application is deployed on highly available IBM WebSphere Application Server clusters running on two racks. The racks are located in geographically distributed data centers, and operate in an "Active-Active" mode. We refer to PureApplication System in "Data Center A" as primary, and the one in "Data Center B" as secondary to identify them individually for clarity during the discussion.

So let us move on to how the sample architecture can be used to help minimize application downtime during maintenance. We describe the types of updates that are commonly encountered during the lifecycle of a web application running on PureApplication System, and then describe the procedures to apply those updates while helping maintain application high availability.


Maintenance of application deployments

You may receive different types of updates for different components of your deployed application on PureApplication System. The types of updates commonly encountered during application lifecycle maintenance operations in a data center include:

  • Application update
  • Middleware update
  • Patterns, images, and script packages update
  • Other updates such as operating system updates and security patches

We describe those update types in the context of PureApplication System, and how PureApplication System helps in seamlessly applying those updates while promoting high availability by utilizing the architecture described above.

Application update

An application update is generally a fix for bugs in the application, or a new version of the application that needs to be deployed. In web applications, these are typically delivered as a WAR file. An administrator can use the facilities of the WebSphere cluster to push a rolling upgrade out on to the primary rack without interrupting service. When the upgrade has completed successfully, the same operation can be carried out on the WebSphere cluster running on the secondary rack.

Middleware update

A middleware update is a fix for the middleware container hosting the application, or a new version of the middleware itself. In our scenario, the middleware container is WebSphere Application Server. We show how the integrated maintenance tooling in PureApplication System can allow an administrator to apply the service update to the WebSphere nodes in the primary rack cluster while the user requests are handled by the secondary rack. After the primary rack is successfully serviced, the same process can be carried out on the secondary rack.

You must consider two types of middleware updates: an emergency fix and a service level update. An emergency fix is an interim fix that addresses minor issues with the previous major release. A service level update is a version upgrade with major fixes and new features. They are handled differently in PureApplication System.

Patterns, images, and script packages update

PureApplication System uses patterns for application deployments. Patterns as implemented in PureApplication System enable repeatable, rapid deployment of applications into the cloud. This series of articles focuses on virtual system patterns, a further description of which can be found at Design a virtual system pattern.

Updates may need to be applied to pattern-related artifacts, such as images, scripts, and the pattern definitions themselves. A pattern creator might make these updates to add a new capability to the pattern (for example, a new application version if the application is part of the pattern), or to fix an issue with the current pattern version, such as an incorrect configuration parameter.

In a multirack environment where the same applications are being run, it is important to synchronize these artifacts so that deployments are identical on each rack. We discuss a process that helps keep these artifacts synchronized, and some of the utilities that PureApplication System provides to automate this work.

Other updates

Updates may need to be applied to running virtual machine instances, and may be related to the operating system or the security setup. The recommended way is to use the next version of hypervisor edition images containing those fixes as they are released, or use an emergency fix released for a critical update. Procedures for this are similar to how we apply updates for middleware.

Customers may also prefer to use their existing operating system update mechanisms, such as the Yellowdog Updater, Modified (YUM) repository, or IBM Tivoli® Endpoint Manager. Synchronization using these products is governed by the update mechanism used, and is outside the scope of this article.


Achieving high availability during maintenance

This section describes three scenarios for applying the updates mentioned above, and the steps to take to maintain application high availability during these updates.

Scenario 1: Continuous operation during application updates

Applications need to be updated to fix problems or add new features. Let's consider how to manage application updates across the multirack configuration detailed in our sample architecture. The goal is to update the application without any loss of service to users.

Let's imagine we created a slightly modified version of our sample application, TradeLite. We assume that the code change does not require any corresponding change to the database.

Two features of our design help promote uninterrupted service for users during the update. The first feature is that our update procedure addresses one rack at a time, meaning that the other rack is always available to service requests. The other is the rolling update feature of WebSphere Application Server, which allows the new application code to be installed on to each of the nodes in the cluster one at a time while the remaining nodes continue to service user requests. As long as the cluster contains at least two nodes, it should be able to service requests during the update.

During an application update, the web server nodes are not stopped so the load balancer never detects an interruption in service at the HTTP layer of the cluster where the application is being updated. This means it continues to route requests to the cluster to be serviced. As application server nodes are stopped because of the update operation, the web server nodes route incoming requests to the other running cluster members. When the updated application servers are restarted, the web servers resume routing requests to the restarted, and now updated, cluster members.
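The rolling behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: it does not call WebSphere or PureApplication System, and the function name `rolling_update` and the node names are hypothetical. It shows one node at a time leaving the in-service set, being updated, and rejoining, while the remaining nodes keep serving.

```python
from typing import Callable, List


def rolling_update(nodes: List[str], update: Callable[[str], None]) -> List[List[str]]:
    """Update nodes one at a time; return the in-service set recorded at each step."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to keep serving during the update")
    in_service = list(nodes)
    history = []
    for node in nodes:
        in_service.remove(node)           # web servers stop routing to this member
        history.append(list(in_service))  # members still serving during this step
        update(node)                      # stand-in for installing the new code
        in_service.append(node)           # member restarts and rejoins the cluster
    return history


# Hypothetical three-node cluster; the "update" just records the node name.
updated: List[str] = []
steps = rolling_update(["node1", "node2", "node3"], updated.append)
```

In this three-node example, two nodes remain in service at every step, which corresponds to the slightly reduced capacity on the rack being updated.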

Figure 2 shows how both racks continue to service requests while the update is being performed on the primary rack. The primary rack has slightly reduced bandwidth as application server nodes are taken out of the cluster to be upgraded.

Figure 2. Rolling update of application code on primary rack

Having successfully completed the application update on the primary rack, the same procedure can be used to update the secondary rack. Again, some slight degradation of bandwidth is to be expected on the rack being updated, but service can be maintained nevertheless. Figure 3 shows this second part of the procedure.

Figure 3. Rolling update of application code on secondary rack

Scenario 2: Continuous operation during middleware updates

Scenario 2 focuses on middleware updates. In particular, we describe how to apply maintenance updates to WebSphere Application Server while helping maintain operations. Again, the goal is to perform this maintenance without interrupting application availability.

WebSphere interim fixes are classified and handled as emergency fixes in the PureApplication System console. You can upload WebSphere interim fixes to the PureApplication System emergency fix catalog and associate them with the existing virtual images to which they apply. You can then apply the emergency fix to the deployed instances of that image, and the existence of the fix is made clear to image owners.

Fixes that are major releases with version number upgrades are delivered as WebSphere fix packs. These are made available as new virtual images that can be imported into the PureApplication System virtual image catalog. The virtual image with the new WebSphere fix pack can then be declared a service level update, and is then available to be applied to deployed instances of the original image.

An emergency fix or a fix pack virtual image that has been uploaded to PureApplication System becomes available to apply through a Service button on the action bar of any deployed virtual system instance to which that service is relevant. A service level (fix pack) update takes more time than an emergency fix because it involves replacing a set of virtual images rather than applying a relatively small patch or set of patches to existing images.

On PureApplication System, rolling updates such as those described for application updates in the previous scenario are currently not supported for WebSphere patches. WebSphere Application Server fixes and updates are applied to all nodes in the cluster at the same time. All nodes in the cluster are taken out of service and then restarted when the update has completed. Keep this in mind when planning an update.

Figure 4 shows how our sample architecture maintains service to users through the primary rack while service is being applied to the secondary rack. Because the WebSphere cluster is taken out of commission during the service update, the load balancer stops routing requests to the secondary rack. When the load balancer detects that the secondary rack cluster is operational, it resumes spreading the requests across both racks.

Figure 4. Service being applied to WebSphere on secondary rack
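The load balancer behavior in this scenario can be sketched as follows. This is an illustrative model only, not the configuration of any real load balancer: the `route` function and the even-split policy are assumptions, and the rack names follow the article's primary and secondary labels.

```python
from typing import Dict, List


def route(requests: int, rack_healthy: Dict[str, bool]) -> Dict[str, int]:
    """Spread requests evenly across racks whose health check passes."""
    healthy: List[str] = [rack for rack, ok in rack_healthy.items() if ok]
    if not healthy:
        raise RuntimeError("no rack is available to service requests")
    share, extra = divmod(requests, len(healthy))
    counts = {rack: 0 for rack in rack_healthy}
    for i, rack in enumerate(healthy):
        # Any remainder goes to the first racks in the healthy list.
        counts[rack] = share + (1 if i < extra else 0)
    return counts


# Both racks operational, then the secondary cluster taken out for service.
normal = route(100, {"primary": True, "secondary": True})
during_service = route(100, {"primary": True, "secondary": False})
```

While the secondary cluster is out of commission, all requests go to the primary rack. When its health check passes again, the even split resumes.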

The key benefit here is that PureApplication System helps automate these repetitive maintenance tasks and keeps an association between the middleware images and related fixes, which makes the process less error-prone. The PureApplication System Information Center section, Applying service to virtual system instances, describes how to apply an emergency fix or a service level update.

Scenario 3: Maintenance of patterns, images, and scripts

As described before, we used virtual system patterns for deploying the application in our scenarios, and the system pattern is made up of images and scripts. Patterns, images, and scripts are not runtime artifacts: after a pattern has been deployed, the running application is not affected if a change is made to any of the resources it was created from. However, to help ensure that multiple racks running the same workload use the same assets as the basis of their deployments, procedures are needed to keep those assets synchronized.

PureApplication System provides a set of commands that you can use to export artifacts associated with a pattern into a compressed archive file format, and also provides matching import commands to load that export package into another PureApplication System rack.
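Before exporting, it helps to know which artifacts actually differ between racks. The following is a minimal sketch, assuming you already have an inventory of artifact names and versions for each rack. It does not query PureApplication System, and the function name, artifact names, and version strings are all hypothetical.

```python
from typing import Dict, List


def artifacts_to_sync(source: Dict[str, str], target: Dict[str, str]) -> List[str]:
    """Names of artifacts that are missing or at a different version on the target."""
    return sorted(name for name, version in source.items()
                  if target.get(name) != version)


# Hypothetical inventories: artifact name -> version on each rack.
primary = {
    "TradeLite pattern": "1.1",
    "configureTrade script": "2.0",
    "WAS image": "7.0.0.23",
}
secondary = {
    "TradeLite pattern": "1.0",
    "WAS image": "7.0.0.23",
}
stale = artifacts_to_sync(primary, secondary)
```

Here only the pattern and the script would need to be exported from the primary rack. The unchanged image can be left out of the package.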

Updates to patterns

To keep a pattern synchronized across racks when it is updated, it needs to be exported from the source rack and imported onto any target racks. Listing 1 is an example of the export command.

Listing 1. Sample export command
pure -h 10.60.0.10 -u singhan -p singhan -a -f samples/exportPattern.py --pattern "WebSphere 7.0.0.23 Cluster" --target MyWAS7Cluster.tgz

You can choose specific parts of the pattern to include in the archive using a filter file, which lets you select exactly what to export rather than exporting everything in the pattern. This is an important feature for pattern maintenance because the updated part may be just a couple of scripts, and it would be highly inefficient to include copies of base images that did not change in the update. Scripts typically add a few kilobytes of data to the package, but images can be several gigabytes. Including unchanged images could severely slow down export, transmission to and from storage across the network, and import.
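The size difference can be made concrete with some simple arithmetic. The artifact names and sizes below are hypothetical and chosen only to match the orders of magnitude mentioned above (kilobytes for scripts, gigabytes for images).

```python
from typing import Dict, Set

# Hypothetical artifact sizes in bytes, for illustration only.
ARTIFACT_SIZES: Dict[str, int] = {
    "base-image.ova": 6 * 1024**3,   # a multi-gigabyte virtual image
    "configure-app.sh": 4 * 1024,    # a few kilobytes of script
    "install-app.py": 8 * 1024,
}


def package_size(selected: Set[str]) -> int:
    """Total size in bytes of the artifacts selected for export."""
    return sum(ARTIFACT_SIZES[name] for name in selected)


full = package_size(set(ARTIFACT_SIZES))
scripts_only = package_size({"configure-app.sh", "install-app.py"})
```

With these assumed sizes, the scripts-only package is 12 KB, while the full package is more than 6 GB, a difference of several hundred thousand times.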

Listing 2 is an example of the matching import command that installs a pattern update package onto a target rack.

Listing 2. Matching import command
pure -h 10.60.0.20 -u singhan -p singhan -a -f samples/importPatterns.py -MyWAS7Cluster.tgz

If the package was built using a filter file, only the filtered artifacts are in the exported package, and therefore, they are the only parts that are imported into the target rack.

We recommend having external, off-rack storage available on which to perform these export and import operations, because it can also serve as a backup mechanism for your pattern library.
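If the export step is scripted, the command from Listing 1 can be assembled as an argument list. The sketch below uses only the flags that appear in Listing 1. The helper name `export_pattern_args` is hypothetical, and the code only builds the list without running anything.

```python
from typing import List


def export_pattern_args(host: str, user: str, password: str,
                        pattern: str, target: str) -> List[str]:
    """Return the argument list for the export command shown in Listing 1."""
    return [
        "pure", "-h", host, "-u", user, "-p", password, "-a",
        "-f", "samples/exportPattern.py",
        "--pattern", pattern,  # one list element, so no shell quoting is needed
        "--target", target,
    ]


# Values taken from Listing 1.
args = export_pattern_args("10.60.0.10", "singhan", "singhan",
                           "WebSphere 7.0.0.23 Cluster", "MyWAS7Cluster.tgz")
```

On a workstation where the pure command line tool is installed, a list like this could be passed to Python's `subprocess.run(args, check=True)`. Passing the pattern name as a single list element avoids quoting problems with the spaces in the name.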

Updates to images

If an image is updated on one rack and needs to be transferred to another, either export and import the image using the pattern commands and filter file as mentioned above, or use the explicit export(d) command in a script.

This method starts the asynchronous process of exporting a virtual image to a remote appliance or a local directory. The various pieces of the image are extracted from the internal repository and used to generate an open virtual appliance (OVA) file. This file is then copied to the specified location. We recommend using a remote web accessible location for these files because they can take up significant local storage.

Updates to script packages

If script packages need to be updated, they can be exported/imported using the patterns commands and filter file methods mentioned above. Scripts that are not yet part of any pattern can be transferred using a dummy pattern name.


Conclusion

This article described methodologies for applying updates to running web applications on IBM PureApplication System while helping maintain high availability. It described the kinds of updates that are commonly encountered during application lifecycle maintenance operations in a data center, and how PureApplication System can apply those updates while helping maintain application high availability. Application updates, middleware updates, and changes to the underlying cloud artifacts such as patterns, scripts, and images can be made using simple, repeatable tooling and procedures that help reduce the potential for errors.

PureApplication System running in a multisite setup with the features of WebSphere cluster administration provides a robust platform for managing business critical applications in a high availability environment.

Acknowledgements

The authors would like to thank Jeff Coveyduc, Venkata Gadepalli, Peter Van Sickel, Shaun Murakami, Simeon D. Monov, and Manuel Silveyra for their contributions to the creation of this article.
