Question & Answer
Answer
IBM MQ supports multi-instance queue managers, which require a shared file system on network storage, such as a NAS or a cluster file system like IBM Spectrum Scale (formerly IBM General Parallel File System, or GPFS). To work reliably with IBM MQ, a shared file system must provide data write integrity, guarantee exclusive access to files, and release locks on failure. Further details are described in the testing statement for IBM MQ multi-instance queue manager file systems.
To gain confidence that a file system not listed by IBM will work successfully with IBM MQ multi-instance queue managers, follow this test plan:
- Run amqmfsck with each command-line option
Run the amqmfsck utility with each command-line option to verify the file system compatibility with multi-instance queue managers.
- Create a test multi-instance queue manager
Create a multi-instance queue manager to use for further verification testing.
- Run the amqsfhac integrity checker during failovers
Run the amqsfhac integrity checker sample program while failing over the queue manager in various ways to ensure no messages are lost or corrupted.
- Delete the test multi-instance queue manager
Delete the test multi-instance queue manager once testing is complete.
Run amqmfsck with each command-line option
On distributed platforms other than Windows, IBM MQ provides a utility called amqmfsck, which tests whether a file system is suitable for use with a multi-instance queue manager. Run amqmfsck against the shared file system once with each of the options below. In each case, replace directoryName with the location where the shared file system is mounted. On IBM i you must run amqmfsck from the Qshell (STRQSH).
-
Test the basic file system behavior (no command-line option). On one machine run:
Testing basic file system behavior on Linux and UNIX
amqmfsck directoryName
Testing basic file system behavior in the IBM i Qshell
/QSYS.LIB/QMQM.LIB/AMQMFSCK.PGM directoryName
-
Test writing to a file concurrently (the -c option). On both machines at the same time run:
Testing concurrent writes on Linux and UNIX
amqmfsck -c directoryName
Testing concurrent writes in the IBM i Qshell
/QSYS.LIB/QMQM.LIB/AMQMFSCK.PGM -c directoryName
-
Test waiting for and releasing file locks (the -w option). On both machines at the same time run:
Testing waiting and releasing file locks on Linux and UNIX
amqmfsck -w directoryName
Testing waiting and releasing file locks in the IBM i Qshell
/QSYS.LIB/QMQM.LIB/AMQMFSCK.PGM -w directoryName
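The three checks differ only in the option passed, so they can be wrapped in a small helper. The following is a minimal sketch for Linux and UNIX shells and is not part of IBM MQ: the function name run_fsck, the mode names, and the AMQMFSCK override variable are hypothetical. It runs one check per call, because the -c and -w checks still have to be started by hand on both machines at the same time.

```shell
#!/bin/sh
# Hypothetical wrapper around amqmfsck; not shipped with IBM MQ.
# AMQMFSCK can be overridden (for example with "echo") for a dry run.
AMQMFSCK=${AMQMFSCK:-amqmfsck}

# run_fsck MODE DIRECTORY
#   MODE is one of: basic (no option), concurrent (-c), locks (-w)
run_fsck() {
    if [ -z "$2" ]; then
        echo "usage: run_fsck basic|concurrent|locks directoryName" >&2
        return 2
    fi
    case $1 in
        basic)      opt= ;;
        concurrent) opt=-c ;;
        locks)      opt=-w ;;
        *) echo "unknown mode: $1" >&2; return 2 ;;
    esac
    # $opt is left unquoted on purpose so that an empty option disappears.
    "$AMQMFSCK" $opt "$2"
}
```

For example, run_fsck basic /shared on one machine, then run_fsck concurrent /shared and run_fsck locks /shared on both machines together. The helper passes through whatever exit status the utility returns, so still read the output of each check rather than relying on the status alone.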
Create a test multi-instance queue manager
These instructions create a queue manager called TESTQM with its data under /shared/data, logs under /shared/logs, and a listener on port 1414. They also create two queues called MQMI.TEST and MQMI.SIDE for the amqsfhac integrity checker program. You can use any queue manager name, data path, log path, listener port number, and queue names you wish if you modify the following commands accordingly.
- Creating multi-instance queue managers on IBM i
On IBM i systems you must use alternate instructions to create multi-instance queue managers. Refer to this IBM Knowledge Center link to determine the correct IBM i and IBM MQ commands and options to create the multi-instance queue manager and add it to the configuration of the second machine.
-
Create a multi-instance queue manager on one machine, specifying the queue manager's data directory and log directory on the shared file system to be tested. On IBM i, refer to the Knowledge Center link above:
crtmqm -ld /shared/logs -md /shared/data TESTQM
-
Add the new queue manager to the MQ configuration on the other machine. On IBM i, refer to the Knowledge Center link above:
addmqinf -s QueueManager -v Name=TESTQM -v Directory=TESTQM -v Prefix=/var/mqm -v DataPath=/shared/data/TESTQM
-
On both machines, start a queue manager instance. On IBM i use the STRMQM command:
Starting the multi-instance queue manager on Linux and UNIX
strmqm -x TESTQM
Starting the multi-instance queue manager on IBM i
STRMQM MQMNAME(TESTQM) STANDBY(*YES)
-
On both machines, display the queue manager status to see which is the active instance and which is the standby. On IBM i use the WRKMQM command:
Display the queue manager status on Linux and UNIX
dspmq -x
Display the queue manager status on IBM i
WRKMQM MQMNAME(TESTQM)
-
On the machine with the active instance, define a listener that the queue manager can start automatically after a failover. On IBM i use the CRTMQMLSR command:
Defining the listener on Linux and UNIX
echo "DEFINE LISTENER(PORT.1414) TRPTYPE(TCP) PORT(1414) CONTROL(QMGR)" | runmqsc TESTQM
Defining the listener on IBM i
CRTMQMLSR LSRNAME(PORT.1414) MQMNAME(TESTQM) PORT(1414) CONTROL(*QMGR)
-
On the machine with the active instance, end the queue manager so it fails over to the standby instance. On IBM i use the ENDMQM command:
Failing over the queue manager on Linux and UNIX
endmqm -is TESTQM
Failing over the queue manager on IBM i
ENDMQM MQMNAME(TESTQM) OPTION(*IMMED) ALWSWITCH(*YES) RECONN(*YES)
-
On both machines, display the queue manager status to confirm that the active instance shut down cleanly and the standby instance became active:
Display the queue manager status on Linux and UNIX
dspmq -x
Display the queue manager status on IBM i
WRKMQM MQMNAME(TESTQM)
-
Restart the queue manager as a standby instance on the machine where you ended it:
Starting the multi-instance queue manager on Linux and UNIX
strmqm -x TESTQM
Starting the multi-instance queue manager on IBM i
STRMQM MQMNAME(TESTQM) STANDBY(*YES)
-
On the machine with the active instance, make sure the listener is running. On IBM i use the WRKMQMLSR command:
Check the listener status on Linux and UNIX
echo "DISPLAY LSSTATUS(PORT.1414)" | runmqsc TESTQM
Check the listener status on IBM i
WRKMQMLSR MQMNAME(TESTQM)
-
On the machine with the active instance, create the local queues used by the amqsfhac integrity checker program, using any names you wish. On IBM i use the CRTMQMQ command. The integrity checker program needs a main queue to put and get messages, and another for the side messages:
Create the local queues on Linux and UNIX
echo "DEFINE QLOCAL(MQMI.TEST) MAXDEPTH(10000)" | runmqsc TESTQM
echo "DEFINE QLOCAL(MQMI.SIDE)" | runmqsc TESTQM
Create the local queues on IBM i
CRTMQMQ QNAME(MQMI.TEST) QTYPE(*LCL) MQMNAME(TESTQM) MAXDEPTH(10000)
CRTMQMQ QNAME(MQMI.SIDE) QTYPE(*LCL) MQMNAME(TESTQM)
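If you choose a different listener port or different queue names, the MQSC definitions used in this section change accordingly. As a rough sketch for Linux and UNIX shells, the hypothetical helper below (mqsc_setup is not an IBM MQ command) prints the same three DEFINE commands shown above for a given port and pair of queue names, so that they can be reviewed before being piped to runmqsc.

```shell
#!/bin/sh
# Hypothetical helper: print the MQSC definitions used in this test plan.
# mqsc_setup PORT MAIN_QUEUE SIDE_QUEUE
mqsc_setup() {
    if [ $# -ne 3 ]; then
        echo "usage: mqsc_setup port mainQueue sideQueue" >&2
        return 2
    fi
    # Listener named PORT.<port>, controlled by the queue manager.
    printf 'DEFINE LISTENER(PORT.%s) TRPTYPE(TCP) PORT(%s) CONTROL(QMGR)\n' "$1" "$1"
    # Main queue for the integrity checker, with the depth used above.
    printf 'DEFINE QLOCAL(%s) MAXDEPTH(10000)\n' "$2"
    # Side queue for the integrity checker.
    printf 'DEFINE QLOCAL(%s)\n' "$3"
}
```

For example, mqsc_setup 1414 MQMI.TEST MQMI.SIDE | runmqsc TESTQM on the machine with the active instance. This only generates the definitions; the failover and listener checks in the steps above still need to be carried out.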
Run the amqsfhac integrity checker sample program during failovers
IBM provides an integrity checker sample program called amqsfhac, along with its source code amqsfhac.c, in the installable samples component, which is available on both client and server installations. These steps use the amqsfhac sample program to test the integrity of your multi-instance queue manager file system in a variety of failover scenarios caused by normal and exceptional conditions.
-
On the client machine, run the amqsfhac sample program in the environment where you set the MQSERVER variable and point it at your test multi-instance queue manager and its two queues:
amqsfhac TESTQM MQMI.TEST MQMI.SIDE uowSize iterations verbose
- uowSize is the number of messages to put and get in a single transaction. It should range between 1 and the MAXUMSGS value of the queue manager (typically 10000).
- iterations is the number of times the messages will be put and gotten. It should be a number large enough to keep the program busy while failing over the queue manager.
- verbose controls the amount of output from the amqsfhac sample program:
- 0 - No verbose output
- 1 - Verbose output
- 2 - Very verbose output
For example: amqsfhac TESTQM MQMI.TEST MQMI.SIDE 1000 200 1
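The parameter rules above can be checked before the sample is started. The sketch below is an illustration only: the function names mqserver_value and amqsfhac_cmd are hypothetical, the channel name SYSTEM.DEF.SVRCONN is an assumption (any server-connection channel you have defined could be used), and listing both machines in MQSERVER assumes your MQ client version accepts a comma-separated connection name list. The MAXUMSGS shell variable is also an assumption, used here only to carry the queue manager's limit into the check.

```shell
#!/bin/sh
# Hypothetical helpers for preparing an amqsfhac run; not part of IBM MQ.

# mqserver_value CHANNEL HOST(PORT) [HOST(PORT) ...]
# Prints a value in the ChannelName/TransportType/ConnectionName form.
mqserver_value() {
    channel=$1; shift
    conn=$1; shift
    for c in "$@"; do
        conn="$conn,$c"
    done
    printf '%s/TCP/%s\n' "$channel" "$conn"
}

# amqsfhac_cmd QMGR MAIN_QUEUE SIDE_QUEUE UOWSIZE ITERATIONS VERBOSE
# Validates the numeric arguments and prints the command line to run.
amqsfhac_cmd() {
    max_uow=${MAXUMSGS:-10000}   # assumed limit, typically 10000
    case $4 in ''|*[!0-9]*) echo "uowSize must be a number" >&2; return 1 ;; esac
    case $5 in ''|*[!0-9]*) echo "iterations must be a number" >&2; return 1 ;; esac
    case $6 in 0|1|2) ;; *) echo "verbose must be 0, 1 or 2" >&2; return 1 ;; esac
    if [ "$4" -lt 1 ] || [ "$4" -gt "$max_uow" ]; then
        echo "uowSize must be between 1 and $max_uow" >&2
        return 1
    fi
    printf 'amqsfhac %s %s %s %s %s %s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}
```

For example, export MQSERVER="$(mqserver_value SYSTEM.DEF.SVRCONN 'host1(1414)' 'host2(1414)')" with your own host names, then run the line printed by amqsfhac_cmd TESTQM MQMI.TEST MQMI.SIDE 1000 200 1.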
-
While the amqsfhac program is running, make the queue manager fail over. The first time through these steps, fail the queue manager over normally by running this command on the active machine:
Failing over the queue manager on Linux and UNIX
endmqm -is TESTQM
Failing over the queue manager on IBM i
ENDMQM MQMNAME(TESTQM) OPTION(*IMMED) ALWSWITCH(*YES) RECONN(*YES)
-
After the failover is complete and the amqsfhac program has finished, check the status of the queue manager on both machines to confirm it is active on only one:
Display the queue manager status on Linux and UNIX
dspmq -x
Display the queue manager status on IBM i
WRKMQM MQMNAME(TESTQM)
-
Display the queues used by the amqsfhac program on the active machine to confirm they are both empty. On IBM i use the WRKMQMQ command:
Display the queue depths on Linux and UNIX
echo "DISPLAY QLOCAL(MQMI.*) CURDEPTH" | runmqsc TESTQM
Display the queue depths on IBM i
WRKMQMQ QNAME(MQMI.*) QTYPE(*LCL) MQMNAME(TESTQM)
-
Go to the queue manager errors directory and confirm that there are three error logs, and that all have the correct permissions, owner and group. Review the recent error log messages from the failover and look for any unexpected issues. Depending on why the queue manager failed over, some errors are to be expected.
Display the queue manager error logs on Linux and UNIX
ls -l /shared/data/TESTQM/errors/AMQ*.LOG
-rw-rw---- 1 mqm mqm 109634 13 Sep 08:30 AMQERR01.LOG
-rw-rw---- 1 mqm mqm 262355 12 Sep 20:16 AMQERR02.LOG
-rw-rw---- 1 mqm mqm 262174 10 Sep 15:43 AMQERR03.LOG
Display the queue manager error logs in the IBM i Qshell
ls -l /shared/data/TESTQM/errors/AMQ*.LOG
-rw-rw---- 1 QMQM QMQMADM 100278 17 May 13:33 AMQERR01.LOG
-rw-rw---- 1 QMQM QMQMADM 262519 16 May 29:50 AMQERR02.LOG
-rw-rw---- 1 QMQM QMQMADM 263144 15 Sep 07:12 AMQERR03.LOG
-
Look at the recent FFSTs (AMQ*.FDC files) in the /var/mqm/errors directory (/QIBM/UserData/mqm/errors on IBM i) on both systems. On the active system, FFSTs showing Probe Id AO074001 and KN673000 are normal when a standby instance takes over the queue manager. Depending on why the queue manager failed over, other FFSTs might be expected.
-
Restart the queue manager as a standby instance on the machine where it is not running:
Starting the multi-instance queue manager on Linux and UNIX
strmqm -x TESTQM
Starting the multi-instance queue manager on IBM i
STRMQM MQMNAME(TESTQM) STANDBY(*YES)
-
Repeat these steps for additional failover causes. Multi-instance failover can be triggered by hardware and software failures, including network problems which prevent the queue manager from writing to its data or log files. To be confident that a shared file system will provide integrity and work with a multi-instance queue manager when a problem occurs unexpectedly, test all possible failure scenarios multiple times, including:
- Shutting down the operating system including syncing the disks
- Halting the operating system without syncing the disks
- Physically pressing the server's reset button
- Physically pulling the network cable out of the server (test this at least five times)
- Physically pulling the power cable out of the server
- Physically switching the machine off
- Any other failover causes appropriate to your environment or system
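Because the same checks are repeated after every failover scenario in the list above, it can help to script them on Linux and UNIX. The sketch below is an illustration, not an IBM tool: the function names are hypothetical, and the parsing assumes that dspmq -x reports each instance with MODE(Active) or MODE(Standby) and that runmqsc prints queue depths as CURDEPTH(n). Each function reads command output on standard input, so it can also be run against saved output.

```shell
#!/bin/sh
# Hypothetical post-failover checks; each function reads command output on stdin.

# Print the number of instances reported as active in "dspmq -x" output.
count_active() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that status.
    grep -c 'MODE(Active)' || :
}

# Succeed only if at least one CURDEPTH(n) is present and every one is 0.
queues_empty() {
    awk '
        match($0, /CURDEPTH\([0-9]+\)/) {
            seen++
            # Skip the 9 characters of "CURDEPTH(" and drop the closing ")".
            depth = substr($0, RSTART + 9, RLENGTH - 10)
            if (depth + 0 != 0) bad++
        }
        END { if (seen > 0 && bad == 0) exit 0; exit 1 }
    '
}

# check_error_logs OWNER GROUP
# Succeed only if "ls -l" output lists exactly three AMQERR0n.LOG files,
# each with mode -rw-rw---- and the given owner and group.
check_error_logs() {
    awk -v owner="$1" -v group="$2" '
        $NF ~ /AMQERR0[1-3]\.LOG$/ {
            n++
            # Compare only the first 10 characters so a trailing "." or "+" is ignored.
            if (substr($1, 1, 10) != "-rw-rw----" || $3 != owner || $4 != group) bad++
        }
        END { if (n == 3 && bad == 0) exit 0; exit 1 }
    '
}
```

For example, dspmq -x -m TESTQM | count_active should print 1 once the failover has finished, echo "DISPLAY QLOCAL(MQMI.*) CURDEPTH" | runmqsc TESTQM | queues_empty should succeed on the active machine, and ls -l /shared/data/TESTQM/errors/AMQ*.LOG | check_error_logs mqm mqm should succeed. These checks do not replace reading the error logs and FFSTs for unexpected messages.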
Delete the test multi-instance queue manager
If you used an existing queue manager to test your file system, delete the TCP/IP listener and delete the two local queues used by the amqsfhac program. On IBM i, use the DLTMQMLSR and DLTMQMQ commands. If you wish to delete the queue manager, follow these instructions:
- Deleting multi-instance queue managers on IBM i
On IBM i systems you must use alternate instructions to delete multi-instance queue managers. Follow the instructions below to stop the test queue manager, then refer to the IBM Knowledge Center link to determine the correct IBM i and IBM MQ commands and options to delete the queue manager and its configuration information.
-
Stop the test queue manager on the active system without allowing it to fail over:
Stopping the multi-instance queue manager on Linux and UNIX
endmqm -i TESTQM
Stopping the multi-instance queue manager on IBM i
ENDMQM MQMNAME(TESTQM) OPTION(*IMMED)
-
If there is a standby instance running on the other system, stop it as well:
Stopping the standby instance on Linux and UNIX
endmqm -ix TESTQM
Stopping the standby instance on IBM i
ENDMQM MQMNAME(TESTQM) OPTION(*IMMED) INSTANCE(*STANDBY)
-
On one of the machines, delete the queue manager. On IBM i, refer to the Knowledge Center link above:
dltmqm TESTQM
-
On the other machine, remove the queue manager configuration. On IBM i, refer to the Knowledge Center link above:
rmvmqinf -s QueueManager TESTQM
Document Information
Modified date:
04 June 2020
UID
ibm16117868