Maintaining maximum system uptime is becoming increasingly critical to the success of on-demand computing. WebSphere MQ is an important piece of messaging middleware that helps an enterprise accelerate its transformation into an on-demand business, so architects need to consider how to include WebSphere MQ in a highly available configuration. This article describes how to achieve high availability for WebSphere MQ using hardware-level clustering managed by open source software.
WebSphere MQ provides asynchronous message and queuing capabilities with assured, once-only delivery of messages. By using WebSphere MQ and heartbeat together, it is possible to further enhance the availability of WebSphere MQ queue managers.
In Part 1 of this series, I introduced you to high availability (HA) concepts and how to install and configure heartbeat. In this article, I will discuss the HA implementation for WebSphere MQ in a cold standby configuration using heartbeat. In this implementation, heartbeat detects a problem with the primary node, which could be a hardware or a software failure. The standby machine will then:
- Take over the IP address.
- Take over the shared disks that store the queue and log files of the queue manager.
- Start the queue manager and associated processes.
To get the most out of this article, you should have a basic understanding of WebSphere MQ and high availability clusters. You should also be familiar with the first article in this series, High-availability middleware on Linux, Part 1: Heartbeat and Apache Web server.
Implementing HA for WebSphere MQ
A queue manager that is to be used in a heartbeat cluster needs to have its logs and data on shared disks so that they can be accessed by a surviving node in the event of a node failure. A node running a queue manager must also maintain a number of files on internal disks. These files include files that relate to all queue managers on the node, such as /var/mqm/mqs.ini, and queue manager-specific files that are used to generate internal control information. Files related to a queue manager are therefore divided between internal and shared disks.
Regarding the queue manager files that are stored on shared disk, it is possible to use a single shared disk for all the recovery data (logs and data) related to a queue manager. However, for optimal performance in a production environment, it is a recommended practice to place logs and data in separate filesystems such that they can be separately tuned for disk I/O.
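As a rough way to check how a given setup is laid out, the following sketch reports whether two directories sit on the same mounted filesystem. The helper names `mount_point` and `same_filesystem` are made up for illustration and are not WebSphere MQ or heartbeat tools. The sketch assumes mount points contain no spaces.

```shell
#!/bin/sh
# Sketch: report whether two directories live on the same mounted filesystem.
# mount_point and same_filesystem are illustrative names (assumptions).

# Print the mount point that holds the given path (last field of `df -P`).
mount_point() {
    df -P "$1" 2>/dev/null | awk 'NR == 2 { print $NF }'
}

# Succeed (exit 0) only when both paths resolve to the same mount point.
same_filesystem() {
    mp_a=$(mount_point "$1")
    mp_b=$(mount_point "$2")
    [ -n "$mp_a" ] && [ "$mp_a" = "$mp_b" ]
}

# Example for the layout used in this article (run where /ha is mounted):
#   if same_filesystem /ha/ha.queue.manager/data /ha/ha.queue.manager/log; then
#       echo "logs and data share one filesystem"
#   fi
```

With the single /ha filesystem used in this article, the check would be expected to succeed; in a production layout with separately tuned filesystems for logs and data, it should fail.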
Figure 1 shows the organization of the filesystem for our setup. The links shown were created automatically using shell scripts, which will be explained below.
Figure 1. Filesystem organization for the queue manager - ha.queue.manager
In the sections to follow, I will take you through the steps of installing WebSphere MQ and creating and testing a highly available queue manager configuration.
Follow the steps outlined in this section to install WebSphere MQ 5.3.0.2 and Fixpack 7 on both the primary and backup nodes. For more information, you can refer to the WebSphere MQ for Linux for Intel and Linux for zSeries Quick Beginnings document (see Resources for a link):
- Extract the WebSphere MQ 5.3.0.2 RPMs using the following commands:
    rm -rf /tmp/mq5.3.0.2-install
    mkdir /tmp/mq5.3.0.2-install
    tar xzf C48UBML.tar.gz -C /tmp/mq5.3.0.2-install
    tar xf /tmp/mq5.3.0.2-install/MQ53Server_LinuxIntel.tar -C /tmp/mq5.3.0.2-install
  Here, C48UBML.tar.gz is the installation image file for WebSphere MQ.
- Set the kernel level:
    export LD_ASSUME_KERNEL=2.4.19
- Replace the Java runtime environment (JRE) that comes with WebSphere MQ with the IBM 1.4.2 JDK JRE:
    mv /tmp/mq5.3.0.2-install/lap/jre /tmp/mq5.3.0.2-install/lap/jre.mq
    ln -s /opt/IBMJava2-142/jre /tmp/mq5.3.0.2-install/lap/jre
- Accept the license:
    /tmp/mq5.3.0.2-install/mqlicense.sh -accept -text_only
- Install the WebSphere MQ RPMs:
    rpm -Uvh /tmp/mq5.3.0.2-install/MQSeriesRuntime-5.3.0-2.i386.rpm
    rpm -Uvh /tmp/mq5.3.0.2-install/MQSeriesSDK-5.3.0-2.i386.rpm
    rpm -Uvh /tmp/mq5.3.0.2-install/MQSeriesServer-5.3.0-2.i386.rpm
    rpm -Uvh /tmp/mq5.3.0.2-install/MQSeriesClient-5.3.0-2.i386.rpm
    rpm -Uvh /tmp/mq5.3.0.2-install/MQSeriesSamples-5.3.0-2.i386.rpm
    rpm -Uvh /tmp/mq5.3.0.2-install/MQSeriesJava-5.3.0-2.i386.rpm
    rpm -Uvh /tmp/mq5.3.0.2-install/MQSeriesMan-5.3.0-2.i386.rpm
- Clean up:
    rm -rf /tmp/mq5.3.0.2-install/
- Extract the fixpack 7 RPMs:
    rm -rf /tmp/mq5.3.0.7-install/
    mkdir /tmp/mq5.3.0.7-install/
    tar xzf U496732.nogskit.tar.gz -C /tmp/mq5.3.0.7-install/
- Install the fixpack 7 RPMs:
    rpm -Uvh /tmp/mq5.3.0.7-install/MQSeriesRuntime-U496732-5.3.0-7.i386.rpm
    rpm -Uvh /tmp/mq5.3.0.7-install/MQSeriesSDK-U496732-5.3.0-7.i386.rpm
    rpm -Uvh /tmp/mq5.3.0.7-install/MQSeriesServer-U496732-5.3.0-7.i386.rpm
    rpm -Uvh /tmp/mq5.3.0.7-install/MQSeriesClient-U496732-5.3.0-7.i386.rpm
    rpm -Uvh /tmp/mq5.3.0.7-install/MQSeriesSamples-U496732-5.3.0-7.i386.rpm
    rpm -Uvh /tmp/mq5.3.0.7-install/MQSeriesJava-U496732-5.3.0-7.i386.rpm
    rpm -Uvh /tmp/mq5.3.0.7-install/MQSeriesMan-U496732-5.3.0-7.i386.rpm
- Clean up again:
    rm -rf /tmp/mq5.3.0.7-install/
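Because the same installation has to be repeated on both nodes, it can help to confirm afterwards that no component was skipped. The following is a minimal sketch of such a check, not part of the article's download package. The helper name `missing_mq_components` is hypothetical, and it assumes that installed package names begin with `MQSeries<Component>-`, as the RPM file names above suggest.

```shell
#!/bin/sh
# Sketch: confirm that every WebSphere MQ component installed above is present.
# missing_mq_components is a hypothetical helper. It reads package names on
# stdin (for example, the output of `rpm -qa`) and prints each component for
# which no package starting with MQSeries<Component>- is listed.
# Assumption: installed package names follow the RPM file names shown above.
missing_mq_components() {
    pkgs=$(cat)
    for comp in Runtime SDK Server Client Samples Java Man; do
        printf '%s\n' "$pkgs" | grep -q "^MQSeries${comp}-" || echo "$comp"
    done
}

# Typical use on each node:
#   rpm -qa | missing_mq_components
# No output means all seven components were found.
```

Reading the package list from standard input keeps the check independent of rpm itself, so the same function can be run against a saved listing from the other node.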
Create a highly available MQ manager and queue
On some platforms, the creation of a highly available queue manager is automated by scripts in WebSphere MQ HA Support Packs such as MC63 and IC61. However, these support packs are not available for Linux.
The scripts used in this section are modified versions of the scripts in the MC63 support pack and have the following limitations:
- One filesystem only for log and data (/ha).
- One queue manager at a time.
Follow the steps outlined below to create a highly available queue manager, ha.queue.manager:
- Create the following directories on the shared disk (/ha):
- /ha/ha.queue.manager
- /ha/ha.queue.manager/data
- /ha/ha.queue.manager/log
- On the primary node (ha1), create a highly available queue manager using the command shown below (as root):
    /ha/hacode/mq/hascripts/hacrtmqm ha.queue.manager
  The hacrtmqm command creates the queue manager and ensures that its directories are arranged to allow for HA operation. The source code for the hacrtmqm script is included with this article (see Downloads for a link).
- Add the following two lines to the .bashrc (startup script) file of the mqm user on both nodes:
    LD_ASSUME_KERNEL=2.4.19
    export LD_ASSUME_KERNEL
- Run the setmqcap command, inputting the number of processors that you have paid for. Run this command on ha1:
    /opt/mqm/bin/setmqcap 4
- Start the queue manager, ha.queue.manager, using the strmqm command as user mqm:
    /opt/mqm/bin/strmqm ha.queue.manager
- Enable MQSC commands by typing:
    /opt/mqm/bin/runmqsc ha.queue.manager
  A message tells you that an MQSC session has started. MQSC has no command prompt.
- Create a local queue, HA.QUEUE, by entering the following command:
    define qlocal (HA.QUEUE)
- Create a channel, HA.CHANNEL, by entering the following command:
    define channel(HA.CHANNEL) chltype(svrconn) trptype(tcp) mcauser('mqm')
- Stop MQSC by typing end. Some messages are displayed, and the command prompt is displayed again.
- Stop the queue manager, ha.queue.manager, manually, using the endmqm command:
    /opt/mqm/bin/endmqm ha.queue.manager
- On the backup node (ha2), create the queue manager as user mqm. Enter the halinkmqm command shown below on a single line. You may have to mount /ha as root:
    cd /ha/hacode/mq/hascripts/
    ./halinkmqm ha.queue.manager ha\!queue\!manager /ha/ha.queue.manager/data standby
  Internally, hacrtmqm uses a script called halinkmqm to relink the subdirectories used for IPC keys and create a symlink from /var/mqm/qmgrs/$qmgr to the /ha/$qmgr/data/qmgrs/$qmgr directory. Do not run halinkmqm on the node on which you created the queue manager with hacrtmqm; it has already been run there. The source code for the halinkmqm script is included with this article (in the file accessible from Downloads).
- Run the setmqcap command, inputting the number of processors that you have paid for:
    /opt/mqm/bin/setmqcap 4
- Start the queue manager, ha.queue.manager, using the strmqm command, on the backup node. Make sure it starts.
- Stop the queue manager on the backup node.
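The second argument to halinkmqm in the steps above is the queue manager name with each period replaced by an exclamation mark. The sketch below derives the four arguments for a given queue manager name. The helper names `mangle_qmgr_name` and `halinkmqm_args` are hypothetical, and the sketch handles only the period-to-exclamation-mark case seen in this article. It also assumes the /ha/<name>/data directory convention used here.

```shell
#!/bin/sh
# Sketch: derive the arguments passed to halinkmqm on the backup node.
# mangle_qmgr_name and halinkmqm_args are hypothetical helper names.

# Replace each "." in a queue manager name with "!", matching the second
# argument shown in the halinkmqm command in this article.
mangle_qmgr_name() {
    printf '%s\n' "$1" | tr '.' '!'
}

# Print the four halinkmqm arguments, one per line, assuming the data
# directory is /ha/<name>/data as in this article's setup.
halinkmqm_args() {
    qmgr=$1
    printf '%s\n' "$qmgr" "$(mangle_qmgr_name "$qmgr")" "/ha/$qmgr/data" standby
}

# Example:
#   halinkmqm_args ha.queue.manager
```

When the arguments are typed at an interactive bash prompt, the exclamation marks still need to be escaped or single-quoted, as the command in the steps above does.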
Configure heartbeat to manage WebSphere MQ Server
The steps needed to configure heartbeat to manage the MQ server are outlined below:
- As mentioned before, resources that are managed by heartbeat are basically just start/stop scripts. You'll create a script to start and stop the WebSphere MQ queue manager and any associated processes. A very basic script is shown in Listing 1. You can further customize it to suit your setup. This script has to be placed in the /etc/rc.d/init.d directory.
Listing 1. mqseries script
    #!/bin/bash
    #
    # /etc/rc.d/init.d/mqseries
    #
    # Starts the MQ Server
    #
    # chkconfig: 345 88 57
    # description: Runs MQ

    . /etc/init.d/functions   # Source function library.

    PATH=/usr/bin:/bin:/opt/mqm/bin
    QMGRS="ha.queue.manager"
    PS=/bin/ps
    GREP=/bin/grep
    SED=/bin/sed

    #======================================================================
    SU="sh"
    if [ "`whoami`" = "root" ]; then
        SU="su - mqm"
    fi

    #======================================================================
    killproc() {
        # kill the named process(es)
        pid=`$PS -e | $GREP -w $1 | $SED -e 's/^ *//' -e 's/ .*//'`
        [ "$pid" != "" ] && kill -9 $pid
    }

    #======================================================================
    start() {
        for qmgr in $QMGRS ; do
            export qmgr
            echo "$0: starting $qmgr"
            $SU -c "strmqm $qmgr"
            $SU -c "nohup runmqlsr -m $qmgr -t tcp -p 1414 >> /dev/null 2>&1 < /dev/null &"
        done
    }

    #======================================================================
    stop() {
        for qmgr in $QMGRS ; do
            export qmgr
            echo ending $qmgr
            killproc runmqlsr
            $SU -c "endmqm -w $qmgr &"
            sleep 30
        done
    }

    case $1 in
    'start')
        start
        ;;
    'stop')
        stop
        ;;
    'restart')
        stop
        start
        ;;
    *)
        echo "usage: $0 {start|stop|restart}"
        ;;
    esac
- Now you'll configure the /etc/ha.d/haresources file (on both nodes) to include the above mqseries script, like so:
    ha1.haw2.ibm.com 9.22.7.46 Filesystem::hanfs.haw2.ibm.com:/ha::/ha::nfs::rw,hard mqseries
  This dictates that on startup, heartbeat will have ha1 serve the cluster IP address, mount the shared filesystem /ha, and then start the WebSphere MQ processes. On shutdown, heartbeat will first stop the WebSphere MQ processes, then unmount the filesystem, and, finally, give up the IP address.
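The ordering described above (resources are acquired left to right and released right to left) can be made concrete with a small sketch. It only parses a haresources line and prints the two orders. The helpers `start_order` and `stop_order` are illustrative names, not heartbeat commands, and the sketch assumes the whole resource group is written on one line.

```shell
#!/bin/sh
# Sketch: show the order in which the resources on one haresources line are
# started and stopped. start_order and stop_order are illustrative helpers,
# not part of heartbeat.

# Print the resources left to right, skipping the first field (the node name).
start_order() {
    # Intentionally unquoted: split the line into whitespace-separated fields.
    set -- $1
    [ $# -gt 0 ] && shift
    for res in "$@"; do
        printf '%s\n' "$res"
    done
}

# Print the same resources in reverse, which is the shutdown order.
stop_order() {
    reversed=""
    for res in $(start_order "$1"); do
        reversed="$res${reversed:+ $reversed}"
    done
    for res in $reversed; do
        printf '%s\n' "$res"
    done
}

line="ha1.haw2.ibm.com 9.22.7.46 Filesystem::hanfs.haw2.ibm.com:/ha::/ha::nfs::rw,hard mqseries"
echo "start order:"; start_order "$line"
echo "stop order:";  stop_order "$line"
```

For the line used in this article, the start order lists the IP address, then the Filesystem resource, then mqseries. The stop order lists them in the opposite sequence.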
This section outlines the steps needed to test the high availability of the queue manager, ha.queue.manager.
- Start the heartbeat service on the primary and then on the backup node:
    /etc/rc.d/init.d/heartbeat start
  If it fails, look in /var/log/messages to determine the reason and then correct it. After heartbeat starts successfully, you should see a new interface with the cluster IP address that you configured in the haresources file. You can display it by running the ifconfig command. Listing 2 shows the relevant portion of the output for this setup:
Listing 2. Interface for cluster IP address
    ...
    eth0:0    Link encap:Ethernet  HWaddr 00:D0:59:DA:01:50
              inet addr:9.22.7.46  Bcast:9.22.7.127  Mask:255.255.255.128
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:76541 errors:0 dropped:0 overruns:0 frame:0
              TX packets:61411 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:8830515 (8.4 Mb)  TX bytes:6755709 (6.4 Mb)
              Interrupt:11 Base address:0x6400 Memory:c0200000-c0200038
    ...
Once you've started heartbeat, take a peek at your log file (default is /var/log/ha-log) before testing it. If everything is peachy, the primary machine's log (ha1 in this example) should look something like the one shown in Listing 3 (some lines have been wrapped for layout purposes).
Listing 3. Contents of the ha-log file
    ...
    heartbeat: 2004/09/01_11:17:13 info: **************************
    heartbeat: 2004/09/01_11:17:13 info: Configuration validated. Starting heartbeat 1.2.2
    heartbeat: 2004/09/01_11:17:13 info: heartbeat: version 1.2.2
    heartbeat: 2004/09/01_11:17:13 info: Heartbeat generation: 10
    heartbeat: 2004/09/01_11:17:13 info: Starting serial heartbeat on tty /dev/ttyS0 (19200 baud)
    heartbeat: 2004/09/01_11:17:13 info: ping heartbeat started.
    heartbeat: 2004/09/01_11:17:13 info: pid 9226 locked in memory.
    heartbeat: 2004/09/01_11:17:13 info: Local status now set to: 'up'
    heartbeat: 2004/09/01_11:17:14 info: pid 9229 locked in memory.
    heartbeat: 2004/09/01_11:17:14 info: pid 9230 locked in memory.
    heartbeat: 2004/09/01_11:17:14 info: pid 9231 locked in memory.
    heartbeat: 2004/09/01_11:17:14 info: pid 9232 locked in memory.
    heartbeat: 2004/09/01_11:17:14 info: pid 9233 locked in memory.
    heartbeat: 2004/09/01_11:17:14 info: Link 9.22.7.1:9.22.7.1 up.
    heartbeat: 2004/09/01_11:17:14 info: Status update for node 9.22.7.1: status ping
    ...
    heartbeat: 2004/09/01_11:19:18 info: Acquiring resource group: ha1.haw2.ibm.com 9.22.7.46 mqseries
    heartbeat: 2004/09/01_11:19:18 info: Running /etc/ha.d/resource.d/IPaddr 9.22.7.46 start
    heartbeat: 2004/09/01_11:19:18 info: /sbin/ifconfig eth0:0 9.22.7.46 netmask 255.255.255.128 broadcast 9.22.7.127
    heartbeat: 2004/09/01_11:19:18 info: Sending Gratuitous Arp for 9.22.7.46 on eth0:0 [eth0]
    ...
    heartbeat: 2004/09/01_11:19:49 info: Running /etc/init.d/mqseries start
    ...
You can see that it is doing the IP takeover and then starting the WebSphere MQ processes. Use the ps command to make sure WebSphere MQ is running on the primary node. Put a few persistent messages on the queue HA.QUEUE. You can do this by running the MQ Sender program, send.bat or send.sh (based on your OS). You should run this program from a machine that has the MQ Client installed. Listing 4 shows the output of the run on the node ha1.
Listing 4. Putting persistent messages on the HA queue
    [root@ha1 mq]# ./send.sh
    MSender is running
    Hostname = ha.haw2.ibm.com
    QManager = ha.queue.manager
    Channel Name = HA.CHANNEL
    Channel Port = 1414
    Q = HA.QUEUE
    Enter a message:
    Hello
    This
    is
    a
    test
    Done Sending Message
    [root@ha1 mq]#
Browse and get the messages. Use the MQ Browse program, receive.bat or receive.sh (based on your OS). You will fetch all the messages put onto the queue before, except the last one, "test." You will get the last message after failover has happened. Listing 5 shows the output of the run on the node ha1.
Listing 5. Getting persistent messages from the HA queue
    [root@ha1 mq]# ./receive.sh
    MBrowse is running
    Hostname = ha.haw2.ibm.com
    QManager = ha.queue.manager
    Channel Name = HA.CHANNEL
    Channel Port = 1414
    Q = HA.QUEUE
    Browsed message: Hello
    Actually get message?y
    Actually getting the message
    Browsed message: This
    Actually get message?y
    Actually getting the message
    Browsed message: is
    Actually get message?y
    Actually getting the message
    Browsed message: a
    Actually get message?y
    Actually getting the message
    Browsed message: test
    Actually get message?n
    MQJE001: Completion Code 2, Reason 2033
    MQ exception: CC = 2 RC = 2033
Ignore the MQ Exception, with reason code 2033, at the end. It occurs because there are no more messages to get from the queue.
- Simulate failover. This can be done simply by stopping heartbeat on the primary system using the command:
    /etc/rc.d/init.d/heartbeat stop
  You should see all the services come up on the second machine in less than a minute. If you do not, look in /var/log/messages to determine the problem and correct it. You can fail back over to the primary by starting heartbeat again. Heartbeat will always give preference to the primary system and will start to run there if possible. Make sure WebSphere MQ is running by checking the /var/log/ha-log file and running the ps command on the backup machine. Browse and get the last message. Run the MQ Browse program, receive.bat or receive.sh (based on your OS). You will get the last message this time. Listing 6 shows the output of the run on the node ha2.
Listing 6. Getting persistent messages from the HA queue
    [root@ha2 mq]# ./receive.sh
    MBrowse is running
    Hostname = ha.haw2.ibm.com
    QManager = ha.queue.manager
    Channel Name = HA.CHANNEL
    Channel Port = 1414
    Q = HA.QUEUE
    Browsed message: test
    Actually get message?y
    Actually getting the message
    MQJE001: Completion Code 2, Reason 2033
    MQ exception: CC = 2 RC = 2033
- Start the heartbeat service back on the primary. This should stop the WebSphere MQ server processes on the secondary and start them on the primary. The primary should also take over the cluster IP.
You have now seen how, by using a shared disk, the messages put on a queue before a failover can be recovered afterwards.
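The ps checks used during these tests can be scripted so that the same check is run on whichever node currently owns the resources. The following is a minimal sketch under stated assumptions. The helper name `mq_processes_missing` is hypothetical. runmqlsr is the listener started by the mqseries script in Listing 1. amqzxma0 is assumed here to be the name of the queue manager's main process, so adjust the list to match what ps shows on your system.

```shell
#!/bin/sh
# Sketch: check a process listing for the WebSphere MQ processes of interest.
# mq_processes_missing is a hypothetical helper. It reads `ps -e` style
# output on stdin and prints each expected process name that is absent.
# Assumption: amqzxma0 is the queue manager's main process; runmqlsr is the
# listener started by the mqseries init script.
mq_processes_missing() {
    listing=$(cat)
    for proc in amqzxma0 runmqlsr; do
        printf '%s\n' "$listing" | grep -qw "$proc" || echo "$proc"
    done
}

# On the node that should currently own the resources:
#   ps -e | mq_processes_missing
# No output means both processes were found.
```

Running the same pipeline on the other node should list both names as missing while that node is on standby.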
In this installment, you have seen how to implement high availability for WebSphere MQ using open source software on the Linux operating system. In the next installment we will discuss the HA implementation of IBM LoadLeveler scheduler.
The author wishes to thank Mike Burr for providing technical guidance for the installation of WebSphere MQ.
| Description | Name | Size | Download method |
|---|---|---|---|
| Sample code package for this article | hahbcode.tar.gz | 25 KB | HTTP |
- Read the other articles in this series:
- "High-availability middleware on Linux, Part 1: Heartbeat and Apache Web server"
- "High-availability middleware on Linux, Part 3: IBM LoadLeveler"
- "High-availability middleware on Linux, Part 4: IBM WebSphere Application Server"
- Check out the High-Availability Linux project Web site for more information on heartbeat, including heartbeat success stories.
- You can download most of the software needed for this series of articles at these locations (note that not all of the downloads are free):
- Red Hat Enterprise Linux 3.0 (2.4.21-15.EL)
- Heartbeat 1.2.3
- IBM Java™ 2 SDK 1.4.2
- IBM WebSphere® MQ for Linux 5.3.0.2 with Fix Pack 7
- IBM WebSphere Base Edition 5.1.1 for Linux with Cumulative Fix 1
- IBM DB2® Universal Database™ Enterprise Server Edition V8.1 for Linux
- You'll find a host of WebSphere MQ-related technical documents, including Redbooks, case studies, and white papers, in the developerWorks WebSphere MQ Library.
- You'll find a list of WebSphere MQ family books on the WebSphere MQ documentation page, including a great guide to getting started with WebSphere MQ, WebSphere MQ for Linux for Intel and Linux for zSeries Quick Beginnings.
- For information on configuring WebSphere MQ for high availability on AIX, read MC63: WebSphere MQ for AIX -- Implementing with HACMP.
- WebSphere Business Integration Message Broker and high availability environments shows how to achieve high availability through a combination of software design and clustering (developerWorks, March 2004).
- Learn about the features in DB2 Universal Database that provide high-availability capabilities in "An Overview of High Availability and Disaster Recovery for DB2 UDB" (developerWorks, April 2003).
- For a detailed discussion of availability and how to plan for and maintain it in an enterprise middleware environment, read "Planning for Availability in the Enterprise" (developerWorks, December 2003).
- Get more information on load balancing and failover support for Linux on POWER in the article "Creating a WebSphere Application Server V5 cluster" (developerWorks, January 2004).
- Find more resources for Linux developers in the developerWorks Linux zone.
- Get involved in the developerWorks community by participating in developerWorks blogs.
- Browse for books on these and other technical topics.
- Order the SEK for Linux, a two-DVD set containing the latest IBM trial software for Linux from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
- Innovate your next Linux development project with IBM trial software, available for download directly from developerWorks.
Hidayatullah H. Shaikh is a Senior Software Engineer on the IBM T.J. Watson Research Center's On-Demand Architecture and Development Team. His areas of interest and expertise include business process modeling and integration, service-oriented architecture, grid computing, e-commerce, enterprise Java, database management systems, and high-availability clusters. You can contact Hidayatullah at hshaikh@us.ibm.com.




