By its very nature, WebSphere Application Server Network Deployment is a distributed system ranging across many machines. While few things are more stressful and frustrating than an unplanned outage, there are ways you can lessen the impact. The goal of this article is to show how you can harness this system and make recovery a quick and simple task.
This article assumes you have a good understanding of WebSphere Application Server Network Deployment and some knowledge of network configuration.
For the purposes of this article, let's assume we have a cell with five nodes (Figure 1). The node count isn't significant, but the more nodes you have, the greater the impact when the deployment manager is lost.
Figure 1. Sample cell
This cell is spread across five different systems:
System 1: Node 1 (1 NodeAgent / 20 application servers)
Host: node1.samplecompany.com (10.10.0.1)
System 2: Node 2 (1 NodeAgent / 20 application servers)
Host: node2.samplecompany.com (10.10.0.2)
System 3: Node 3 (1 NodeAgent / 20 application servers)
Host: node3.samplecompany.com (10.10.0.3)
System 4: Node 4 (1 NodeAgent / 20 application servers)
Host: node4.samplecompany.com (10.10.0.4)
System 5: Deployment Manager node
Host: dmgr.samplecompany.com (10.10.0.5)
On each cluster, a single application is installed for a total of 20 applications. If this were a production cell, and the deployment manager's machine simply died and was completely unrecoverable, it would be very difficult to reconstitute this cell quickly. Your best bet would be to try to restore the deployment manager from a backup, assuming one was taken.
The deployment manager holds the configuration -- the master repository -- for the entire cell. If you look at the file system on the deployment manager, you will see all four nodes defined, as well as the deployment manager node. This configuration makes up the entire cell.
Listing 1. System 5
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/dmgrNode/...
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/node1/...
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/node2/...
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/node3/...
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/node4/...
The application server nodes, however, do not need the entire cell configuration; rather, they need only the subset of the configuration that they themselves require. Each individual node has the entire configuration pertaining to its own node and just the serverindex.xml of the other nodes.
Listing 2. System 1
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/node1/...
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/node2/serverindex.xml
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/node3/serverindex.xml
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/node4/serverindex.xml
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/dmgrNode/serverindex.xml
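To see this difference on a real system, you could walk the nodes directory of a cell and report which nodes carry a full configuration and which carry only serverindex.xml. The following is a minimal sketch in plain Python (run outside wsadmin). The function name classify_nodes and its labels are illustrative assumptions, and the check is a simple heuristic based on the layout shown in the listings above.

```python
import os


def classify_nodes(cell_dir):
    """Map each node under <cell_dir>/nodes to "full", "index-only", or "empty".

    Heuristic: a node directory holding nothing but serverindex.xml is
    treated as the reduced copy that other nodes keep.
    """
    nodes_dir = os.path.join(cell_dir, "nodes")
    result = {}
    for name in sorted(os.listdir(nodes_dir)):
        node_path = os.path.join(nodes_dir, name)
        if not os.path.isdir(node_path):
            continue  # ignore stray files next to the node directories
        entries = set(os.listdir(node_path))
        if not entries:
            result[name] = "empty"
        elif entries == {"serverindex.xml"}:
            result[name] = "index-only"
        else:
            result[name] = "full"
    return result


# Example call (the path follows the listings above):
# classify_nodes("/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell")
```

On the deployment manager, every node would be expected to come back as "full"; on an ordinary application server node, only the local node would.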
Method 1: Set up the backup node
The plan here is to use an application server node as a backup. By replicating the entire configuration tree from the actual deployment manager node to another application server node, the deployment manager can be quickly restarted on the backup node. To begin the process, you first need an application server node. In a production environment, it is best not to use a node that serves applications, but rather to choose a new node dedicated as a backup.
We will first install WebSphere Application Server on a new system: System 6. When installing the product, install it into the same location on the file system as the deployment manager system. If you installed WebSphere Application Server in the /WebSphere/AppServer directory on the deployment manager system, make sure it is installed on the backup system in the same place.
Create a new Custom or Standalone profile. The node name we will give is backupNode. Then, federate the new node into the existing cell.
Listing 3. System 6
/WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/backupNode/...
System 6: Backup node
Host: backup.samplecompany.com (10.10.0.6)
This node's purpose is strictly for backup/failover, not for hosting any application servers. When the backup deployment manager is running, it will create its logs in the exact same place as does the deployment manager system. Even though /WebSphere/AppServer/profiles/Dmgr01/logs has not been created by the profile management tools, it will be created by the running server.
Right now, the backupNode is just like any other node; it only manages its own configuration and a small subset (serverindex.xml) for the other nodes. To set up this node as a backup, you need it to automatically replicate the deployment manager's cell-wide configuration. To do this, set a custom property on the backupNode's nodeagent, either using the administrative console or by using a simple Jython script.
To use the administrative console:
The custom property can be set on the administrative console (Figure 2) by navigating to System Administration => Node Agents => nodeagent => File Synchronization Service => Custom Properties => New.
Figure 2. Add custom property using administrative console
The new property to create is:
- Name: recoveryNode
- Value: true
- Description: (optional)
Once this property is set, the nodeagent must be restarted. The next time the nodeagent is synchronized, it will extract the entire cell configuration, not just its own. Anytime a configuration change is made on the deployment manager and synchronized, this backupNode will also download the new change, regardless of the node the change was actually directed to.
You do not need to create any application servers on the backupNode. You only need the nodeagent up and running to keep the configuration current.
Alternatively, you can use a Jython script to set the custom property. Below is a simple script that illustrates how to set the custom property on the backupNode's nodeagent. (Before running, change all occurrences of <node> in the script to the actual node name.)
Listing 4. Jython script to set a custom property
#Script to create the recoveryNode custom property
#Step 1: Get a handle to the nodeagent
nodeagent = AdminConfig.getid('/Node:<node>/Server:nodeagent/')
print nodeagent
#Step 2: Get a handle to the ConfigSynchronizationService
syncservice = AdminConfig.list('ConfigSynchronizationService', nodeagent)
print syncservice
#Step 3: Create the custom property
AdminConfig.create("Property", syncservice, [["name", "recoveryNode"],["value", "true"]])
#Step 4: Save
AdminConfig.save()
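The script above creates the property unconditionally, so running it twice would add a second recoveryNode entry. A hedged variant that checks first might look like the sketch below. The function name ensure_recovery_node is hypothetical, and the scripting object is passed in as a parameter (in wsadmin you would pass AdminConfig itself). The sketch assumes that AdminConfig.list('Property', scope) returns the properties defined under the synchronization service, one configuration ID per line. It does not update an existing property whose value is not true.

```python
def ensure_recovery_node(admin_config, node_name):
    """Create recoveryNode=true on the node agent's sync service if absent.

    Returns True when the property was created, False when a property
    named recoveryNode already existed (its value is left untouched).
    """
    nodeagent = admin_config.getid("/Node:%s/Server:nodeagent/" % node_name)
    if not nodeagent:
        raise ValueError("no nodeagent found for node %r" % node_name)
    syncservice = admin_config.list("ConfigSynchronizationService", nodeagent)
    # Assumption: list() returns one configuration ID per line.
    for prop in admin_config.list("Property", syncservice).splitlines():
        if prop and admin_config.showAttribute(prop, "name") == "recoveryNode":
            return False
    admin_config.create("Property", syncservice,
                        [["name", "recoveryNode"], ["value", "true"]])
    admin_config.save()
    return True


# Inside wsadmin (Jython), the call would be:
# ensure_recovery_node(AdminConfig, "backupNode")
```

Passing the scripting object as an argument keeps the function free of wsadmin globals, which also makes it possible to exercise the logic with a stand-in object outside WebSphere.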
C. Create a script to start the deployment manager
Now that you have the deployment manager configuration automatically replicated on the backupNode, you need a way to start it. You cannot use the standard startServer.sh script, because it assumes that it is running on the local node. startServer [.sh | .bat] and startManager [.sh | .bat] work by reading the configuration of the local node and starting a server on that node. If you try to run startManager [.sh | .bat] on the backupNode, it will not let you start the deployment manager, because the deployment manager's node name is dmgrNode, not backupNode.
Since you cannot use the existing startManager script, you must create another script to start the server:
- Run ./startNode[.sh | .bat] -script. This command generates start_nodeagent.sh (or .bat on Windows). You must then edit the script to point to the deployment manager configuration rather than the nodeagent.
- Rename the script to start_backup_dmgr.sh.
- Change the contents of the file; the key change is at the end of the file in the exec call:
Listing 5. Modify start_backup_dmgr.sh
exec "/build/websphere/WASX/u0622.06.wasx/AppServer/java/bin/java" ... "/build/websphere/WASX/u0622.06.wasx/AppServer/profiles/AppSrv01/config" "productionCell" "backupNode" "nodeagent" ""
Change "backupNode" to dmgrNode and "nodeagent" to dmgr, so that the call reads:
Listing 6. Modify start_backup_dmgr.sh
exec "/build/websphere/WASX/u0622.06.wasx/AppServer/java/bin/java" ... "/build/websphere/WASX/u0622.06.wasx/AppServer/profiles/AppSrv01/config" "productionCell" "dmgrNode" "dmgr" ""
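If you prefer not to edit the generated script by hand, the same substitution can be scripted. The sketch below, in plain Python, rewrites only the last exec line of the script text and fails loudly if the expected node and server arguments are not there. The function name retarget_exec_line is hypothetical, and the sketch assumes the two arguments appear next to each other as quoted strings, as they do in the listings above.

```python
def retarget_exec_line(script_text, old_node, old_server, new_node, new_server):
    """Swap the node and server arguments on the last exec line of a script."""
    old_args = '"%s" "%s"' % (old_node, old_server)
    new_args = '"%s" "%s"' % (new_node, new_server)
    lines = script_text.splitlines(True)  # True keeps the line endings
    # Search from the end, because the exec call is the final command.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].lstrip().startswith("exec "):
            if old_args not in lines[i]:
                raise ValueError("exec line does not contain %s" % old_args)
            lines[i] = lines[i].replace(old_args, new_args)
            return "".join(lines)
    raise ValueError("no exec line found")


# Example use against the renamed script:
# with open("start_backup_dmgr.sh") as f:
#     text = f.read()
# text = retarget_exec_line(text, "backupNode", "nodeagent", "dmgrNode", "dmgr")
# with open("start_backup_dmgr.sh", "w") as f:
#     f.write(text)
```

Raising an error instead of silently returning the input makes it obvious when the generated script does not have the expected shape.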
The output of your new start_backup_dmgr.sh is very different from the expected startManager [.sh | .bat] output. Use the backup deployment manager only for operational control, because any changes made to the system through it are not reflected back to the original deployment manager.
A second option is to use a file system that is shared between the primary deployment manager system and a backup system. You can set up a second system to share the deployment manager profile itself, and start the deployment manager on that second system. If you examine a V6.0.X or V6.1 installation, all the data pertaining to a deployment manager (or any profile for that matter) is stored under the WAS_HOME/profiles/<profile> directory.
Consider the earlier example where the deployment manager is installed on System 5 under the HFS: /WebSphere/AppServer/profiles/Dmgr01/config/cells/productionCell/nodes/dmgrNode/...
Again, you need to install WebSphere Application Server on System 6. This time, however, you do not need to create any profiles; instead, you share the deployment manager's profile directory. By mounting the shared directory in exactly the same place on System 6, you have essentially duplicated the system. The duplication of the profile enables the same deployment manager to be started on the other system. Once this directory is shared, you can run startManager [.sh | .bat] from the backup system.
The downside to this method is that you must maintain the shared file system yourself. Using a shared file system can also carry a considerable performance cost, both on the master system and on the backup system. One major advantage of this method is that any changes made to the deployment manager's master repository while using the backup system are committed. Once the primary deployment manager machine is rebuilt and the file system is remounted, those changes are still there.
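Before running startManager on the backup system, it can be worth confirming that the shared profile is actually mounted and populated. The following plain-Python sketch lists the expected paths that are missing. The function name missing_profile_paths is hypothetical, and the locations of the bin directory and of servers/<server>/server.xml inside the profile are assumptions about the profile layout that you should adjust to your installation.

```python
import os


def missing_profile_paths(profile_root, cell, node, server):
    """Return the expected profile paths that do not exist under profile_root."""
    node_dir = os.path.join(profile_root, "config", "cells", cell, "nodes", node)
    required = [
        os.path.join(profile_root, "bin"),               # assumed script location
        os.path.join(node_dir, "serverindex.xml"),
        os.path.join(node_dir, "servers", server, "server.xml"),  # assumed layout
    ]
    return [path for path in required if not os.path.exists(path)]


# Example call on the backup system, using the names from this article:
# missing_profile_paths("/WebSphere/AppServer/profiles/Dmgr01",
#                       "productionCell", "dmgrNode", "dmgr")
```

An empty result suggests the mount is in place; any entries in the list point at what is missing before you attempt the start.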
Configuring DNS for multi-homed hosts
Once you have found a way to start a deployment manager on a remote system, you may still have some issues to resolve. When you originally configured that deployment manager, you gave it a specific hostname that resolved to a specific IP address. In our example, the hostname dmgr.samplecompany.com resolved to 10.0.0.5. To enable the deployment manager to be located on the backup system (10.0.0.6), we need to set up what is called a multi-homed DNS entry. In our example, the entry spans multiple interfaces across multiple machines, with a single IP address per machine. Here is an example of our DNS entry:
Listing 7. DNS entry
ADMIN:SYSTEM6:/u/ADMIN>nslookup dmgr.samplecompany.com
Defaulting to nslookup version 4
Starting nslookup version 4
Server: dnsserver.samplecompany.com
Address: 10.0.0.100
Name: dmgr.samplecompany.com
Addresses: 10.0.0.5, 10.0.0.6
Configuring the DNS entry for two hosts enables an easy transition between them, since no reconfiguration is needed. As long as the deployment manager is running on only one of the two hosts at any given time, only one deployment manager will be used. The node agents and clients will automatically try the first (primary) address and then fail over to the second (backup) address. Once a cell has been set up in this way, the deployment manager and all node agents must be restarted to re-read the configuration: by default, the JVM caches DNS entries, and the restart is needed to refresh them.
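The try-the-first-address-then-the-next behavior can be illustrated with a short sketch in plain Python. This is not how WebSphere implements its lookup; it only shows the idea of resolving every address behind one hostname and taking the first one that answers. The function names are hypothetical, and the port in the usage comment (8879, commonly the deployment manager's SOAP connector port) is an assumption that you should replace with the port your cell uses.

```python
import socket


def resolve_all(hostname):
    """Return the distinct IPv4 addresses for hostname, in resolver order."""
    addresses = []
    for info in socket.getaddrinfo(hostname, None, socket.AF_INET,
                                   socket.SOCK_STREAM):
        address = info[4][0]
        if address not in addresses:
            addresses.append(address)
    return addresses


def tcp_probe(port, timeout=3.0):
    """Build a probe that reports whether a TCP connect to (address, port) works."""
    def probe(address):
        try:
            sock = socket.create_connection((address, port), timeout)
        except OSError:
            return False
        sock.close()
        return True
    return probe


def first_reachable(addresses, probe):
    """Return the first address the probe accepts, or None if none respond."""
    for address in addresses:
        if probe(address):
            return address
    return None


# Example use (port 8879 is an assumption):
# first_reachable(resolve_all("dmgr.samplecompany.com"), tcp_probe(8879))
```

Keeping the probe separate from the selection loop lets you swap the TCP check for any other health test without changing the failover logic.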
This article discussed one way of building a more fault tolerant deployment manager using features that exist in WebSphere Application Server Network Deployment V6.x, and described how to use either a WebSphere Application Server configuration option or a shared file system to automatically replicate the entire cell-wide configuration to a backup system. This backup system then uses a preconfigured DNS entry to start up and replace the failed deployment manager. The loss of a deployment manager is a great blow to a production cell, and so it is critical that this server remain running and functional to ensure a highly available topology.