Implementing a Highly Available Infrastructure for WebSphere Application Server Network Deployment, Version 5.0 without Clustering

This article discusses how to implement a highly available infrastructure for the Deployment Manager component of WebSphere Application Server Network Deployment, Version 5.0. The article describes how to do this without having to purchase multiple servers and hardware clustering software for each component in the infrastructure.

Tom Alcott, Consulting IT Specialist, WebSphere World Wide Technical Sales Support, IBM

Tom Alcott is an advisory I/T specialist with IBM U.S. He has been a member of the WorldWide WebSphere Technical Sales Support team since its inception. Before he started working with WebSphere, he worked as a systems engineer for IBM's Transarc Lab supporting TXSeries. His background includes over 20 years of application design and development on both mainframe-based and distributed systems. He has written and presented extensively on WebSphere run-time and security issues.



21 July 2003

Introduction

Network Deployment is a J2EE and Web services application server with deployment services that include clustering, edge services, and high availability for distributed configurations. Network Deployment provides a name service and security service in each application server to isolate the applications and application servers from administrative process failures. The result is that application servers and the applications running on them can continue to run uninterrupted in the event of an administrative process failure. However, if the Deployment Manager fails, you might not have a means to administer your Network Deployment cell.

Before discussing how you can implement a highly available infrastructure on a low budget, the article first describes the administrative domain for Network Deployment, and the processes that comprise a domain, which is also referred to as a cell. Network Deployment provides a distributed administrative domain to administer multiple physical servers and application server processes. The Deployment Manager is the administrative process used to provide a centralized management view for all nodes in a cell, and to manage clusters and workload balancing of application servers across one or several nodes. The Deployment Manager hosts the administrative console; it provides a single, central point of administrative control for all elements of the entire WebSphere Application Server distributed cell. A node agent manages all WebSphere Application Server servers, as well as JMS servers, on a node. The following figure depicts a sample Network Deployment cell:

A Network Deployment cell

The above diagram shows that a cell can contain multiple servers, and therefore multiple node agents as well as multiple application servers. However, as the central controller of administration and configuration for the administrative domain (or cell), the Deployment Manager represents a single point of failure within the cell. The Deployment Manager works with the node agents to carry out all administrative and configuration tasks. It does not participate directly in distributing requests to cluster members; however, the application server run time depends on data distributed by the Deployment Manager. When the Deployment Manager is unavailable, several capabilities are impacted:

  • the ability to make configuration changes
  • the ability for changes to be propagated to the application servers (including the stopping and starting of application servers)
  • the ability to have up-to-date cluster information regarding server state, which can affect the performance of EJB workload management (WLM). (Note: The Deployment Manager propagates changes in a Server Cluster's run-time state to the application servers running in the cluster, which in turn propagate them to EJB clients. If the Deployment Manager fails, EJB clients could be using stale Server Cluster information. The extent of any performance impact depends on how many application servers (if any) in a Server Cluster are stopped after the Deployment Manager fails. As EJB clients attempt to route requests to the stopped servers, those requests fail and are redirected to another application server in the cluster. Performance suffers while the EJB clients wait for requests to the stopped application servers to time out before marking those servers unavailable.)

While application servers will continue to run and respond to client requests in the event of a Node Agent or Deployment Manager failure, the cell cannot be administered effectively in such cases. You can choose to manually modify the local XML files that make up the configuration repository, and you can stop and start individual server processes by either running the startServer and stopServer scripts or by using wsadmin. However, this involves a great amount of manual effort and would likely be subject to errors. You therefore need to make some provision to allow the Deployment Manager to be highly available, so that even in the event of a catastrophic server failure, you would still be able to effectively administer your WebSphere Application Server cell.
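As an illustration of the kind of manual effort involved, restarting a single application server without the Deployment Manager might look like the following sketch. The WAS_HOME path and the server name server1 are assumptions for this example, not values from this article; substitute your own installation's values.

```shell
# Manual restart of one application server when the Deployment Manager
# is down. WAS_HOME and SERVER_NAME below are placeholder assumptions.
WAS_HOME=${WAS_HOME:-/opt/WebSphere/AppServer}
SERVER_NAME=${SERVER_NAME:-server1}

if [ -x "$WAS_HOME/bin/stopServer.sh" ]; then
    # Stop and restart the server using the local scripts.
    "$WAS_HOME/bin/stopServer.sh"  "$SERVER_NAME"
    "$WAS_HOME/bin/startServer.sh" "$SERVER_NAME"
else
    # No WebSphere installation found at WAS_HOME; nothing to do here.
    echo "stopServer.sh not found under $WAS_HOME/bin"
fi
```

Multiply this by every server on every node and the administrative burden (and opportunity for error) becomes clear.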

Ideally, you would configure the Deployment Manager in a high availability cluster using the clustering software appropriate for your operating system, such as HACMP for AIX, Sun Cluster for Solaris, MC/ServiceGuard for HP-UX, or Microsoft Cluster Service for Windows 2000 and NT. Such a configuration provides automatic failover and recovery of the Deployment Manager, and is the recommended approach, but it requires additional hardware and software and, therefore, a higher budget. Fortunately, for those of you working with a lower budget, there are other options. While the approach described in this article does not provide fully automatic failover and recovery, it is relatively inexpensive and easy to implement. The steps consist of:

  1. Make regular backups of the cell configuration using the backupConfig script
  2. Install Network Deployment on a backup or alternate server
  3. Restore the configuration on a backup server using the restoreConfig script
  4. Change the IP address on the backup server to match the IP address of the original server
  5. Start Deployment Manager on the backup server

Make regular backups of the cell configuration using the backupConfig script

First, you must back up your cell configuration on a regular basis, so that a current backup is available if you experience an outage. WebSphere Application Server V5 provides a command-line tool for this purpose: backupConfig.sh (or backupConfig.bat on Windows). The tool is located in the bin directory of both the WebSphere Application Server and the Network Deployment run time. Because this article is concerned with the cell configuration, you should run the script provided with Network Deployment. The following figure depicts the execution of this script:

Running the backupConfig script

Notice that the default execution stops the Deployment Manager. While it is a good idea to do this (this prevents changes from being made while the backup is running), this action is not necessary. If you execute backupConfig using the -nostop option, the Deployment Manager will not be stopped. Once you have the backup, place a copy of it in a highly available file system; otherwise, a disk outage on your Deployment Manager server could make the file unavailable to you. As you can see, backupConfig creates a file with the name WebSphereConfig_date.zip. A single backup per day is sufficient for most production environments. But note that subsequent backups on a given day have a number appended to the file name, for example, WebSphereConfig_2003-01-17_1.zip.
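A minimal nightly backup wrapper might be sketched as follows. The Deployment Manager root and the highly available mount point /mnt/ha-backups are assumptions for this example, not paths mandated by the product.

```shell
# Sketch of a nightly cell-configuration backup. DMGR_HOME and HA_DIR
# below are placeholder assumptions; adjust them for your environment.
DMGR_HOME=${DMGR_HOME:-/opt/WebSphere/DeploymentManager}
HA_DIR=${HA_DIR:-/mnt/ha-backups}
BACKUP_FILE="WebSphereConfig_$(date +%Y-%m-%d).zip"

if [ -x "$DMGR_HOME/bin/backupConfig.sh" ]; then
    # -nostop leaves the Deployment Manager running during the backup.
    "$DMGR_HOME/bin/backupConfig.sh" "$HA_DIR/$BACKUP_FILE" -nostop
else
    # No Network Deployment installation found; report what would happen.
    echo "backupConfig.sh not found; would have written $HA_DIR/$BACKUP_FILE"
fi
```

You could then schedule the script from cron (for example, an entry such as 0 2 * * * to run it at 2 a.m.) so that a fresh backup lands on the highly available file system every night.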

The remaining steps assume that the server hosting the Deployment Manager has failed. When such a failure occurs, remove the failed machine from the network to free up its IP address for use by the backup server.


Install Network Deployment on a backup or alternate server

Next, you must install Network Deployment on your backup server. The important part of this step is that you specify the Node Name, IP Address, and Cell Name from the original server. In this example, the installation takes place on the server ojai, which is being used as a backup for the server talcott. Using the Installation Wizard, override the values populated by the installation with the ones from the original server (see the following figure):

Completing the Network Deployment Installation Wizard

You need to specify the values for the original server because Java™ remembers the IP resolution of a hostname. As a result, all of the running Node Agent JVMs have cached the IP address of the original server that the Deployment Manager was running on, and Java does not provide an API to clear this IP cache from the JVM once a connection has been made. If Java did not have this behavior, or if there were an API to clear the cache, then you could use a DNS switching approach, where the DNS entry for the server is modified, instead of having to specify the original IP address and names. As part of this process, you will also need to copy the keyrings from the <deployment manager root>/etc directory (for example, /opt/WebSphere/DeploymentManager/etc) to the new machine. This assumes that you no longer use the dummy keyring shipped with the product. Using the dummy keyring for production purposes is not recommended, since the private key contained in this keyring is the same in every copy of WebSphere Application Server; therefore, it is not private!
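Assuming the original server is still reachable when you prepare the backup machine, the keyring copy might be sketched as follows. The host name talcott and the etc path follow the example in this article; the sketch prints the scp command for review rather than executing it, since the hosts are examples.

```shell
# Print the command that would copy the keyrings (.jks files) from the
# original Deployment Manager to this backup server. SRC_HOST and
# DMGR_ETC are placeholder assumptions; adjust the file pattern if
# your keyrings use another format.
DMGR_ETC=${DMGR_ETC:-/opt/WebSphere/DeploymentManager/etc}
SRC_HOST=${SRC_HOST:-talcott}
echo "scp ${SRC_HOST}:${DMGR_ETC}/*.jks ${DMGR_ETC}/"
```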


Restore the configuration on a backup server using the restoreConfig script

Once you have finished installing Network Deployment on the backup server, you are ready to restore the cell configuration from the original server. Use the restoreConfig.sh (or restoreConfig.bat) script to do this. The following figure depicts the execution of this script:

Running the restoreConfig script

Although the Deployment Manager is not running in the example depicted above, restoreConfig also accepts a -nostop option, which you can specify if the Deployment Manager is running.
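The restore can also be scripted. The following sketch picks the newest backup from a highly available share; the /mnt/ha-backups path is an assumption for this example.

```shell
# Restore the most recent cell-configuration backup. DMGR_HOME and
# HA_DIR are placeholder assumptions for this sketch.
DMGR_HOME=${DMGR_HOME:-/opt/WebSphere/DeploymentManager}
HA_DIR=${HA_DIR:-/mnt/ha-backups}

# The date-stamped names sort lexically, so the last one is the newest.
LATEST=$(ls -1 "$HA_DIR"/WebSphereConfig_*.zip 2>/dev/null | sort | tail -1)

if [ -n "$LATEST" ] && [ -x "$DMGR_HOME/bin/restoreConfig.sh" ]; then
    "$DMGR_HOME/bin/restoreConfig.sh" "$LATEST"
else
    echo "nothing to restore (no backup found or restoreConfig.sh missing)"
fi
```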


Change the IP address on the backup server to match the IP address of the original server

Next, change the IP address on the backup server to match that of the original server or add a Network Interface Card (NIC) with the IP address of the original server. The steps to do this differ depending on the operating system used; you simply need to use the appropriate command or tool for your operating system.
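For example, on AIX or Linux you could add the original server's address as an alias on an existing interface. The address 192.168.1.10 and the interface names below are placeholders, not values from this article, and the sketch prints the command (which requires root authority) rather than executing it.

```shell
# Print the OS-appropriate command to alias the original Deployment
# Manager's address onto this server. ORIG_IP and the interface names
# (en0, eth0) are placeholder assumptions.
ORIG_IP=${ORIG_IP:-192.168.1.10}

case "$(uname -s 2>/dev/null || echo unknown)" in
    AIX)   echo "ifconfig en0 alias $ORIG_IP netmask 255.255.255.0" ;;
    Linux) echo "ifconfig eth0:1 $ORIG_IP netmask 255.255.255.0 up" ;;
    *)     echo "use your operating system's network configuration tool to add $ORIG_IP" ;;
esac
```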


Start Deployment Manager on the backup server

Lastly, start the Deployment Manager by running the startManager script. When you see the ADMU3000I: Server dmgr open for e-business; process id is xxxx message, as shown in the figure below, you are ready to administer your cell using the Administrative Console or wsadmin. You can continue to do this until your original server is repaired.
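This step, too, can be scripted. The sketch below starts the Deployment Manager and then checks the start log for the ADMU3000I message; the log location under logs/dmgr is an assumption based on a default V5 layout.

```shell
# Start the Deployment Manager and confirm it came up. DMGR_HOME and
# the log path are placeholder assumptions for this sketch.
DMGR_HOME=${DMGR_HOME:-/opt/WebSphere/DeploymentManager}

if [ -x "$DMGR_HOME/bin/startManager.sh" ]; then
    "$DMGR_HOME/bin/startManager.sh"
    # Look for the "open for e-business" message to confirm startup.
    grep "ADMU3000I" "$DMGR_HOME/logs/dmgr/startServer.log"
else
    echo "startManager.sh not found under $DMGR_HOME/bin"
fi
```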

Message indicating that the Deployment Manager has started

Finishing up

You can accomplish these steps relatively quickly and at little cost. When I walked through the steps for this article, the time required was as follows:

Task                                    Time
----                                    ----
Installing Network Deployment           10 minutes
Backing up the cell configuration       1 minute
Restoring the configuration             2 minutes
Changing the IP address                 1 minute
Starting the Deployment Manager         1 minute
Total time (approximate)                15 minutes

Of course, the amount of time required to perform these tasks in your environment will vary depending on your server CPU speed, network speed, and the size of the configuration backup file. It is important that you rehearse this procedure regularly to ensure that you will have no surprises in the event of a serious outage.


Conclusion

If you follow the steps in this article, you will be prepared in the event of a prolonged administrative outage of a Network Deployment cell, such as one caused by a CPU or disk failure. By backing up on a regular basis, ensuring that the backups are highly available, and scripting some of the process to ensure repeatability, you can guard against an extended outage. Best of all, you can use an existing server by changing its IP address or by adding a NIC to it, thereby minimizing the hardware and software required to accomplish these tasks. You can use this as leverage with your boss when you ask for a raise!


Acknowledgments

The author would like to thank Keys Botzum for his valuable comments and suggestions.
