Make a quicker and easier recovery from an unplanned deployment manager failover

Rohith Ashok (rashok@us.ibm.com), Software Architect, IBM
Rohith Ashok is a Software Engineer at IBM Research Triangle Park in Raleigh, North Carolina. As an architect, he works with the WebSphere z/OS Systems Management team to drive the evolution of WebSphere Application Server.

Summary:  Few things are more stressful and frustrating than an unplanned outage. This is especially true when the unplanned outage is the deployment manager of a large Network Deployment cell. This article walks through examples of some features available in IBM® WebSphere® Application Server Network Deployment to help you prepare for an easy recovery from such an outage.

Date:  22 Nov 2006
Level:  Intermediate

Introduction

By its very nature, WebSphere Application Server Network Deployment is a distributed system ranging across many machines. While few things are more stressful and frustrating than an unplanned outage, there are ways you can lessen the impact. The goal of this article is to show how you can harness this system and make recovery a quick and simple task.

This article assumes you have a good understanding of WebSphere Application Server Network Deployment and some knowledge of network configuration.


Sample cell

For the purposes of this article, let's assume we have a cell with five nodes (Figure 1). The node count isn't significant, but the more nodes you have, the greater the impact when the deployment manager is lost.


Figure 1. Sample cell

This cell is spread across five different systems:

  • System 1: Node 1 (1 NodeAgent / 20 application servers)
    Host: node1.samplecompany.com (10.10.0.1)

  • System 2: Node 2 (1 NodeAgent / 20 application servers)
    Host: node2.samplecompany.com (10.10.0.2)

  • System 3: Node 3 (1 NodeAgent / 20 application servers)
    Host: node3.samplecompany.com (10.10.0.3)

  • System 4: Node 4 (1 NodeAgent / 20 application servers)
    Host: node4.samplecompany.com (10.10.0.4)

  • System 5: Deployment Manager node
    Host: dmgr.samplecompany.com (10.10.0.5)

On each cluster, a single application is installed, for a total of 20 applications. If this were a production cell and the deployment manager's machine simply died and was completely unrecoverable, it would be very difficult to reconstitute the cell quickly. Your best bet would be to try to restore the deployment manager from a backup, assuming one was taken.

The deployment manager holds the configuration -- the master repository -- for the entire cell. If you look at the file system on the deployment manager, you will see all four nodes defined, as well as the deployment manager node. This configuration makes up the entire cell.


Listing 1. System 5
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/dmgrNode/...
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/node1/...
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/node2/...
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/node3/...
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/node4/...

The application server nodes, however, do not need the entire cell configuration; rather, they need only the subset of the configuration that they themselves require. Each individual node has the entire configuration pertaining to its own node and just the serverindex.xml of the other nodes.


Listing 2. System 1
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/node1/...
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/node2/
   serverindex.xml
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/node3/
   serverindex.xml
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/node4/
   serverindex.xml
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/dmgrNode/
   serverindex.xml
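
The difference between Listing 1 and Listing 2 can be checked mechanically. The following is a minimal Python sketch, not part of the product: the function name classify_nodes is hypothetical, and it assumes the directory layout shown in the listings above. It reports, for each node directory under a cell, whether the local copy holds the full node configuration or only serverindex.xml.

```python
import os


def classify_nodes(config_root, cell):
    """Return {node_name: "full" | "index-only" | "empty"} for one cell.

    config_root is a profile's config directory; the layout
    cells/<cell>/nodes/<node>/ is taken from the listings above.
    """
    nodes_dir = os.path.join(config_root, "cells", cell, "nodes")
    result = {}
    for name in sorted(os.listdir(nodes_dir)):
        node_path = os.path.join(nodes_dir, name)
        if not os.path.isdir(node_path):
            continue
        entries = os.listdir(node_path)
        if not entries:
            result[name] = "empty"
        elif entries == ["serverindex.xml"]:
            # Only the index is present: another node's subset.
            result[name] = "index-only"
        else:
            result[name] = "full"
    return result


if __name__ == "__main__":
    # Path and cell name taken from the listings; adjust for your installation.
    root = "/WebSphere/AppServer/profiles/Dmgr01/config"
    if os.path.isdir(os.path.join(root, "cells", "productionCell", "nodes")):
        for node, kind in classify_nodes(root, "productionCell").items():
            print(node, kind)
```

Run on the deployment manager, every node would be expected to show as full; run on an ordinary application server node, only that node's own directory would.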


Method 1: Set up the backup node

The plan here is to use an application server node as a backup. By replicating the entire configuration tree from the actual deployment manager node to another application server node, the deployment manager can be quickly restarted on the backup node. To begin the process, you first need an application server node. In a production environment, it is best not to use a node that serves applications, but rather to choose a new node dedicated as a backup.

A. Create backup node

  1. We will first install WebSphere Application Server on a new system: System 6. When installing the product, install it into the same location on the file system as the deployment manager system. If you installed WebSphere Application Server in the /WebSphere/AppServer directory on the deployment manager system, make sure it is installed on the backup system in the same place.

  2. Create a new Custom or Standalone profile. The node name we will give is backupNode. Then, federate the new node into the existing cell.


Listing 3. System 6
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/backupNode/...

  • System 6: Backup node
    Host: backup.samplecompany.com (10.10.0.6)

This node's purpose is strictly for backup/failover, not for hosting any application servers. When the backup deployment manager is running, it will create its logs in the exact same place as does the deployment manager system. Even though /WebSphere/AppServer/profiles/Dmgr01/logs has not been created by the profile management tools, it will be created by the running server.

B. Configure backup node

Right now, the backupNode is just like any other node: it manages only its own configuration and a small subset (serverindex.xml) for the other nodes. To set up this node as a backup, you need it to automatically replicate the deployment manager's cell-wide configuration. To do this, set a custom property on the backupNode's nodeagent, using either the administrative console or a simple Jython script.

To use the administrative console:

  1. The custom property can be set on the administrative console (Figure 2) by navigating to System Administration => Node Agents => nodeagent => File Synchronization Service => Custom Properties => New.



    Figure 2. Add custom property using administrative console

    The new property to create is:

    • Name: recoveryNode
    • Value: true
    • Description: (optional)
  2. Once this property is set, the nodeagent must be restarted. The next time the nodeagent synchronizes, it will extract the entire cell configuration, not just its own. Any time a configuration change is made on the deployment manager and synchronized, the backupNode will also download the change, regardless of which node the change was directed to.

  3. You do not need to create any application servers on the backupNode. You only need the nodeagent up and running to keep the configuration current.

Alternatively, you can use a Jython script to set the custom property. Below is a simple script that illustrates how to set the custom property on the backupNode's nodeagent. (Before running, change all occurrences of <node> in the script to the actual node name.)


Listing 4. Jython script to set a custom property
# Script to create the recoveryNode custom property
# Run with wsadmin -lang jython; replace <node> with the backup node's name

# Step 1: Get a handle to the nodeagent
nodeagent = AdminConfig.getid('/Node:<node>/Server:nodeagent/')
print nodeagent

# Step 2: Get a handle to the ConfigSynchronizationService
syncservice = AdminConfig.list('ConfigSynchronizationService', nodeagent)
print syncservice

# Step 3: Create the custom property
AdminConfig.create('Property', syncservice, [['name', 'recoveryNode'], ['value', 'true']])

# Step 4: Save the change (the parentheses are required to call the method)
AdminConfig.save()
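
Whichever way the property is set, it is worth confirming that the backup node really does hold the whole cell configuration after the next synchronization. The following is a minimal Python sketch of one way to do that, under the assumption that both configuration trees are readable from the machine running it (for example, through a copy or a mount of the deployment manager's config directory). The function names tree_digest and compare_trees are hypothetical, not part of WebSphere.

```python
import hashlib
import os


def tree_digest(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            with open(full, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(full, root)] = digest
    return digests


def compare_trees(master_root, backup_root):
    """Return (missing, differing) relative paths, judged from the master's side.

    missing:   files present under master_root but absent under backup_root
    differing: files present in both trees whose contents differ
    """
    master = tree_digest(master_root)
    backup = tree_digest(backup_root)
    missing = sorted(set(master) - set(backup))
    differing = sorted(
        path for path in master if path in backup and master[path] != backup[path]
    )
    return missing, differing
```

Treat the result as a hint rather than a verdict: a file that changed on the deployment manager after the last synchronization will show up as differing until the nodeagent synchronizes again.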

C. Create a script to start the deployment manager

Now that you have the deployment manager configuration automatically replicated on the backupNode, you need a way to start it. You cannot use the standard startServer.sh script, because it makes some assumptions about running on the local node. startServer [.sh | .bat] and startManager [.sh | .bat] work by reading the configuration of the local node and starting a server on that node. If you try to run startManager [.sh | .bat] on the backupNode, it will not let you start the deployment manager, because the deployment manager's node name is dmgrNode, not backupNode.

Since you cannot use the existing startManager script, you must create another script to start the server:

  1. Run startNode [.sh | .bat] -script. This command generates start_nodeagent.sh (or start_nodeagent.bat on Windows). You must then edit the script to point to the deployment manager configuration rather than the nodeagent.

  2. Rename the script to start_backup_dmgr.sh.

  3. Change the contents of the file; the key change is at the end of the file in the exec call:

    Listing 5. Modify start_backup_dmgr.sh

    exec "/build/websphere/WASX/u0622.06.wasx/AppServer/java/bin/java"
    ...
    "/build/websphere/WASX/u0622.06.wasx/AppServer/profiles/AppSrv01/config"
    "productionCell" "backupNode" "nodeagent" ""
    

  4. Change the "backupNode" to dmgrNode and "nodeagent" to dmgr to read:

    Listing 6. Modify start_backup_dmgr.sh

    exec "/build/websphere/WASX/u0622.06.wasx/AppServer/java/bin/java"
    ...
    "/build/websphere/WASX/u0622.06.wasx/AppServer/profiles/AppSrv01/config"
       "productionCell" "dmgrNode" "dmgr" ""

The output of your new start_backup_dmgr.sh is very different from the expected startManager [.sh | .bat] output. The backup deployment manager should only be used for operational control, since any changes made to the system are not reflected back to the original deployment manager.
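
The edit described in steps 3 and 4 can also be scripted so that it is repeatable. The following is a minimal Python sketch under the assumption that the generated script contains the node and server names as adjacent quoted arguments, as in Listing 5. The function name retarget_start_script is hypothetical, and the sketch changes only that argument pair; anything else in the generated script that refers to the nodeagent is left for you to review by hand.

```python
import os
import re


def retarget_start_script(text, old_node, old_server, new_node, new_server):
    """Swap the quoted "<node>" "<server>" argument pair in a start script.

    Raises ValueError if the expected pair is not found, so that a script
    with an unexpected layout is never silently written out unchanged.
    """
    pattern = re.compile(
        r'"%s"(\s+)"%s"' % (re.escape(old_node), re.escape(old_server))
    )
    # Keep the original whitespace between the two arguments.
    new_text, count = pattern.subn(
        lambda match: '"%s"%s"%s"' % (new_node, match.group(1), new_server),
        text,
    )
    if count == 0:
        raise ValueError("node/server argument pair not found")
    return new_text


if __name__ == "__main__" and os.path.exists("start_nodeagent.sh"):
    # File names follow the steps above; run from the directory holding the script.
    with open("start_nodeagent.sh") as source:
        updated = retarget_start_script(
            source.read(), "backupNode", "nodeagent", "dmgrNode", "dmgr"
        )
    with open("start_backup_dmgr.sh", "w") as target:
        target.write(updated)
```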


Method 2: Shared file system

A second option is to use a file system that is shared between the primary deployment manager system and a backup system. You can set up a second system to share the deployment manager profile itself, and start the deployment manager on that second system. If you examine a V6.0.X or V6.1 installation, all the data pertaining to a deployment manager (or any profile for that matter) is stored under the WAS_HOME/profiles/<profile> directory.

Consider the earlier example where the deployment manager is installed on System 5 under the HFS: /WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/dmgrNode/...

Again, you need to install WebSphere Application Server on System 6. This time, however, you do not need to create any profiles; instead, you share the deployment manager's profile directory. By mounting the shared directory in exactly the same place on System 6, you have essentially duplicated the system. The duplication of the profile enables the same deployment manager to be started on the other system. Once this directory is shared, you can run startManager [.sh | .bat] from the backup system.

The downside of this method is that you must maintain the shared file system yourself. There can also be considerable performance costs to using a shared file system, both on the master system and on the backup system. One major advantage of this method is that any changes made to the deployment manager's master repository while using the backup system are committed. Once the primary deployment manager machine is rebuilt, the changes will not be lost when the file system is remounted.
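
Before running startManager on the backup system, a quick check that the shared profile is actually visible at the expected path can save a confusing failure. The following is a minimal Python sketch; the function name missing_profile_paths is hypothetical, and the paths it checks are only those that appear in the listings earlier in this article.

```python
import os


def missing_profile_paths(profile_root, cell, node):
    """Return the expected paths that are not visible under profile_root.

    An empty list suggests the shared profile is mounted and readable;
    it does not prove that the deployment manager will start.
    """
    expected = [
        os.path.join(profile_root, "config", "cells", cell),
        os.path.join(
            profile_root, "config", "cells", cell, "nodes", node, "serverindex.xml"
        ),
    ]
    return [path for path in expected if not os.path.exists(path)]


if __name__ == "__main__":
    # Profile path, cell name, and node name as used in this article's example.
    problems = missing_profile_paths(
        "/WebSphere/AppServer/profiles/Dmgr01", "productionCell", "dmgrNode"
    )
    for path in problems:
        print("not found:", path)
```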


Configuring DNS for multi-homed hosts

Once you have found a way to start a deployment manager on a remote system, you may still have some issues to resolve. When you originally configured the deployment manager, you gave it a specific host name that resolved to a specific IP address. In our example, the host name dmgr.samplecompany.com resolved to 10.10.0.5. To enable the deployment manager to be located on the backup system (10.10.0.6), we need to set up what's called a multi-homed DNS entry. In our example, we are using multiple interfaces across multiple machines, each with a single IP address per machine. Here is an example of our DNS entry:


Listing 7. DNS entry
ADMIN:SYSTEM6:/u/ADMIN>nslookup dmgr.samplecompany.com
Defaulting to nslookup version 4
Starting nslookup version 4
Server:  dnsserver.samplecompany.com
Address:  10.0.0.100

Name:    dmgr.samplecompany.com
Addresses:  10.10.0.5, 10.10.0.6

Configuring the DNS entry for two hosts enables an easy transition between them, since no reconfiguration is needed. As long as the deployment manager is running on only one of the two hosts at any given time, only one deployment manager will be used. The node agents and clients will automatically try the first (primary) address and then fail over to the second (backup) address. Once a cell has been set up in this way, the deployment manager and all node agents must be restarted to re-read the configuration. By default, the JVM caches DNS entries, and the restart is needed to refresh them.
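
The try-the-first-address-then-the-next behavior can be illustrated, and the DNS entry tested, from any machine in the cell. The following is a minimal Python sketch, not the mechanism WebSphere itself uses: the function names are hypothetical, and the port number 8879 is an assumption about the deployment manager's SOAP connector port, so substitute the port from your own configuration. The sketch takes addresses in whatever order the local resolver returns them.

```python
import socket


def resolve_all(hostname, port):
    """Return the distinct IPv4 addresses for hostname, in resolver order."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def tcp_connects(address, port, timeout=3.0):
    """Return True if a TCP connection to address:port succeeds within timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(addresses, port, probe=tcp_connects):
    """Return the first address that the probe accepts, or None if none does."""
    for address in addresses:
        if probe(address, port):
            return address
    return None


if __name__ == "__main__":
    host, port = "dmgr.samplecompany.com", 8879  # port is an assumption
    try:
        candidates = resolve_all(host, port)
    except socket.gaierror:
        candidates = []
    print("resolved:", candidates)
    print("active deployment manager:", first_reachable(candidates, port))
```

The probe argument is injectable so that the ordering logic can be exercised without a network, for example by passing a function that accepts only the backup address.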


Conclusion

This article discussed ways of building a more fault-tolerant deployment manager using features that exist in WebSphere Application Server Network Deployment V6.x. It described how to use either a WebSphere Application Server configuration option or a shared file system to automatically replicate the entire cell-wide configuration to a backup system. This backup system then uses a preconfigured DNS entry to start up and replace the failed deployment manager. The loss of a deployment manager is a great blow to a production cell, so it is critical that this server remain running and functional to ensure a highly available topology.

