Network Deployment is a J2EE and Web services application server with deployment services that include clustering, edge services, and high availability for distributed configurations. Network Deployment provides a name service and security service in each application server to isolate the applications and application servers from administrative process failures. As a result, application servers and the applications running on them can continue to run uninterrupted if an administrative process fails. However, if the Network Deployment Deployment Manager fails, you might have no means of administering your Network Deployment cell.
Before discussing how you can implement a highly available infrastructure on a low budget, the article first describes the administrative domain for Network Deployment, and the processes that comprise a domain, which is also referred to as a cell. Network Deployment provides a distributed administrative domain to administer multiple physical servers and application server processes. The Deployment Manager is the administrative process used to provide a centralized management view for all nodes in a cell, and to manage clusters and workload balancing of application servers across one or several nodes. A deployment manager hosts the administrative console; it provides a single, central point of administrative control for all elements of the entire WebSphere Application Server distributed cell. A node agent manages all WebSphere Application Server servers as well as JMS servers on a node. The following figure depicts a sample Network Deployment cell:
A Network Deployment cell
The diagram above shows that a cell can span multiple servers, and therefore multiple node agents as well as multiple application servers. However, as the central controller of administration and configuration for the administrative domain (or cell), the Deployment Manager represents a single point of failure within the cell. The Deployment Manager works with node agents to carry out all administrative and configuration tasks. It does not participate directly in distributing requests to cluster members; however, the application server run time depends on the data distributed by the Deployment Manager. When the Deployment Manager is unavailable, the following are affected:
- the ability to make configuration changes
- the ability for changes to be propagated to the application servers (including the stopping and starting of application servers)
- the ability to have optimally up-to-date cluster information regarding server state, which can affect the performance of EJB WLM (Note: The Deployment Manager propagates changes in the run-time state of a Server Cluster to the application servers running in that Server Cluster, which in turn propagate them to EJB clients. If the Deployment Manager fails, the EJB clients could be using outdated Server Cluster information. The extent of any performance impact depends on how many (if any) application servers in the Server Cluster are stopped after the Deployment Manager fails. When EJB clients attempt to route requests to the stopped servers, those requests fail and are redirected to another application server in the Server Cluster. Performance suffers because each EJB client waits for its requests to a stopped application server to time out before it marks that application server as unavailable.)
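The effect described in the EJB WLM note can be illustrated with a toy model. The sketch below is not the WLM routing algorithm; the class name, server names, and timeout value are all hypothetical. It only shows why a client holding an outdated member list pays one timeout per stopped server before it stops routing to that server.

```python
class StaleRoutingClient:
    """Toy model of a client that routes with an outdated cluster member list."""

    def __init__(self, members, timeout_s):
        self.members = list(members)   # member list as last propagated to the client
        self.timeout_s = timeout_s     # assumed wait before a request is given up
        self.unavailable = set()       # members this client has already marked down

    def send(self, running):
        """Route one request; return (server_used, seconds_spent_waiting)."""
        waited = 0.0
        for member in self.members:
            if member in self.unavailable:
                continue               # already known to be down, skip at no cost
            if member in running:
                return member, waited
            # The member was stopped after the last update reached this client:
            # the request times out, and only then is the member marked down.
            waited += self.timeout_s
            self.unavailable.add(member)
        return None, waited
```

With three hypothetical members, two of which were stopped after the Deployment Manager failed, the first request waits for two timeouts and later requests go straight to the surviving server.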
While application servers will continue to run and respond to client requests in the event of a Node Agent or Deployment Manager failure, the cell cannot be administered effectively in such cases. You can choose to manually modify the local XML files that make up the configuration repository, and you can stop and start individual server processes by either running the
stopServer scripts or by using
wsadmin. However, this involves a great amount of manual effort and would likely be subject to errors. You therefore need to make some provision to allow the Deployment Manager to be highly available, so that even in the event of a catastrophic server failure, you would still be able to effectively administer your WebSphere Application Server cell.
Ideally, you would configure the Deployment Manager in a high availability cluster using the clustering software appropriate for your operating system, such as HACMP for AIX, SunCluster for Solaris, MC/Serviceguard for HP-UX, and MS Cluster for Windows 2000 and NT. Such a configuration provides automatic failover and recovery of the Deployment Manager and is the recommended approach, but it requires additional hardware and software and therefore a higher budget. Fortunately, if you are working with a lower budget, you have other options. While the approach described in this article does not provide fully automatic failover and recovery, it is relatively inexpensive and easy to implement. The steps are:
- Make regular backups of the cell configuration using the backupConfig script
- Install Network Deployment on a backup or alternate server
- Restore the configuration on a backup server using the restoreConfig script
- Change the IP address on the backup server to match the IP address of the original server
- Start Deployment Manager on the backup server
Make regular backups of the cell configuration using the backupConfig script
First, you must make backups of your cell on a regular basis (in case you were to experience an outage). WebSphere Application Server V5 provides a command line tool for this purpose:
backupConfig.sh/bat. This tool is located in the
bin directories for both WebSphere Application Server and the Network Deployment run-time. Because this article is concerned with the cell configuration, you should run the script provided with Network Deployment. The following figure depicts the execution of this script:
Running the backupConfig script
Notice that the default execution stops the Deployment Manager. While it is a good idea to do this (this prevents changes from being made while the backup is running), this action is not necessary. If you execute
backupConfig using the
-nostop option, the Deployment Manager will not be stopped. Once you have the backup, place a copy of it in a highly available file system; otherwise, a disk outage on your Deployment Manager server could make the file unavailable to you. As you can see,
backupConfig creates a file with the name
WebSphereConfig_date.zip. A single backup per day is sufficient for most production environments. But note that subsequent backups on a given day have a number appended to the file name, for example,
The remaining steps apply after a Deployment Manager failure has occurred. Before you continue to the next step, remove the failed machine from the network. This frees up its IP address for use by the backup server.
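The regular backup described in this section lends itself to scripting. The following is a minimal sketch, assuming that backupConfig accepts an explicit backup file name as its first argument (verify this against the command help for your release) and that a second, highly available file system is mounted somewhere on the server. The function names and any paths you pass in are hypothetical; only the script name and the -nostop option come from the text above.

```python
import shutil
from pathlib import Path


def build_backup_command(dmgr_bin, backup_file, stop_manager=True):
    """Return the argument list for one backupConfig run.

    dmgr_bin is the Network Deployment bin directory. Passing an explicit
    backup file name is an assumption; check the script's usage output.
    """
    cmd = [str(Path(dmgr_bin) / "backupConfig.sh"), str(backup_file)]
    if not stop_manager:
        cmd.append("-nostop")  # leave the Deployment Manager running
    return cmd


def copy_to_ha_storage(backup_file, ha_dir):
    """Copy a finished backup archive to a second location; return the new path."""
    ha_dir = Path(ha_dir)
    ha_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(backup_file, ha_dir))
```

A scheduled job could pass the returned list to `subprocess.run(cmd, check=True)` and then call `copy_to_ha_storage` on the resulting archive, so that a disk outage on the Deployment Manager server does not take the only copy with it.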
Install Network Deployment on a backup or alternate server
Next, you must install Network Deployment on your backup server. The important part of this step is that you specify the Node Name, IP Address, and Cell Name from the original server. The installation takes place on the server
ojai, but it is now being used as a backup for the server
talcott. Using the Installation Wizard, you override the values that were populated by the installation with the ones for the original server (see the following figure):
Completing the Network Deployment Installation Wizard
You need to specify the values for the original server because Java remembers the IP resolution of a hostname. As a result, all of the running Node Agent JVMs have cached the IP address of the original server that the Deployment Manager was running on, and once a connection has been made, Java does not provide an API to clear this IP cache from the JVM. If Java did not have this behavior, or if there were an API to clear the cache, you could use a DNS switching approach, where the DNS entry for the server is modified, instead of having to specify the original IP address and names. As part of this process, you will also need to copy the keyrings from the
was deployment manager root/etc directory (for example,
/opt/WebSphere/DeploymentManager/etc) to the new machine. This assumes that you no longer use the dummy keyring shipped with the product. Using the dummy keyring for production purposes is not recommended since the private key contained in this keyring is the same in every copy of WebSphere Application Server; therefore, it is not private!
Restore the configuration on a backup server using the restoreConfig script
Once you have completed installing Network Deployment on the backup server, you are now ready to restore the cell configuration from the original server. Use the
restoreConfig.sh/bat script to do this. The following figure depicts the execution of this script:
Running the restoreConfig script
Although the Deployment Manager is not running in the example depicted above,
restoreConfig has a
-nostop option that you can specify if the Deployment Manager is running.
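Before restoring, it can be worth confirming that the archive copied from highly available storage is intact. The sketch below is one way to do that and to assemble the restoreConfig command line; the function names are hypothetical, and the assumption that the backup file is passed as the first argument should be checked against the script's usage output.

```python
import zipfile
from pathlib import Path


def archive_looks_valid(backup_file):
    """Return True if the file exists, is a zip archive, and has at least one entry."""
    if not zipfile.is_zipfile(backup_file):
        return False  # missing file or not a zip archive
    with zipfile.ZipFile(backup_file) as archive:
        return len(archive.namelist()) > 0


def build_restore_command(dmgr_bin, backup_file, stop_manager=True):
    """Return the argument list for one restoreConfig run."""
    cmd = [str(Path(dmgr_bin) / "restoreConfig.sh"), str(backup_file)]
    if not stop_manager:
        cmd.append("-nostop")  # do not stop a running Deployment Manager
    return cmd
```

The check only confirms that the file has a readable zip structure; it says nothing about whether the configuration inside it is current.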
Change the IP address on the backup server to match the IP address of the original server
Next, change the IP address on the backup server to match that of the original server or add a Network Interface Card (NIC) with the IP address of the original server. The steps to do this differ depending on the operating system used; you simply need to use the appropriate command or tool for your operating system.
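Because the command for changing the address differs by operating system, a platform-neutral check that the change took effect can be useful before starting the Deployment Manager. The following sketch assumes that being able to bind a socket to an address means the address is assigned to a local interface. The function name is hypothetical, and on systems configured to allow binding to non-local addresses the check is not meaningful.

```python
import socket


def ip_is_assigned_locally(ip_address):
    """Return True if this host can bind a TCP socket to ip_address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip_address, 0))  # port 0 lets the OS pick any free port
        except OSError:
            return False  # typically "cannot assign requested address"
    return True
```

Calling this with the IP address of the original server on the backup server should return True once the address change or the additional NIC is in place.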
Start Deployment Manager on the backup server
Lastly, start the Deployment Manager by running the
startManager script. When you see the
ADMU3000I: Server dmgr open for e-business; process id is xxxx message, as shown in the figure below, you are ready to administer your cell using the Administrative Console or
wsadmin. You can continue to do this until your original server is repaired.
Message indicating that the Deployment Manager has started
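If you script the startup, the script can look for the same message instead of relying on someone reading the console. The sketch below assumes that the output of startManager has been captured as text; the function name is hypothetical, and the pattern is taken from the message quoted above.

```python
import re

# Pattern for the message quoted in the text, with the server name and
# process id captured as groups.
STARTED = re.compile(
    r"ADMU3000I: Server (\S+) open for e-business; process id is (\d+)"
)


def find_started_server(output):
    """Return (server_name, process_id) if the start message is present, else None."""
    match = STARTED.search(output)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

A result of None after the script exits is a signal to inspect the startup logs rather than to proceed with administration.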
You can accomplish the steps relatively quickly at little cost. When walking through the steps in this article, the time required was as follows:
|Task|Time required|
|---|---|
|Installing Network Deployment|10 minutes|
|Backing up the cell configuration|1 minute|
|Restoring the configuration|2 minutes|
|Changing the IP address|1 minute|
|Starting the Deployment Manager|1 minute|
|Total time (approximate)|15 minutes|
Of course, the amount of time required to perform these tasks in your environment will vary depending on your server CPU speed, network speed, and size of the configuration backup file. It is important that you rehearse this procedure regularly to ensure that you will have no surprises in the event of a serious outage.
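When you rehearse the procedure, recording how long each step takes in your own environment gives you a realistic recovery estimate. The following is a minimal sketch of such a timer; the function name is hypothetical, and each step is assumed to be wrapped in a Python callable that you supply.

```python
import time


def time_steps(steps):
    """Run each (name, action) pair in order.

    Returns a list of (name, seconds) pairs and the total elapsed seconds.
    """
    timings = []
    for name, action in steps:
        started = time.perf_counter()
        action()  # for example, a function that runs one recovery step
        timings.append((name, time.perf_counter() - started))
    total = sum(seconds for _, seconds in timings)
    return timings, total
```

Keeping the results from successive rehearsals makes it easier to notice when a step, such as restoring a growing configuration archive, starts to take noticeably longer.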
If you follow the steps in this article, you will be prepared in the event of a prolonged administrative outage of a Network Deployment cell, which might occur from a CPU or disk failure. By simply backing up on a regular basis and ensuring that the backups are highly available, and scripting some of the process to ensure repeatability, you can guard against an extended outage. Best of all, you can use an existing server by changing its IP address or by adding an NIC to it, thereby minimizing the hardware and software required to accomplish these tasks. You can use this as leverage with your boss when you ask for a raise!
The author would like to thank Keys Botzum for his valuable comments and suggestions.