Testing a shared file system for compatibility with IBM MQ Multi-instance Queue Managers

Question & Answer


Question

How can I test a shared file system for compatibility with IBM MQ multi-instance queue managers?

Cause

The testing statement for IBM MQ multi-instance queue manager file systems documents the requirements a shared file system must meet to be supported by IBM MQ multi-instance queue managers, as well as the shared file systems already tested by IBM. This document describes how to focus testing of a file system that is not listed there, to give the best possible confidence that the file system will maintain integrity and work successfully.

Answer

IBM MQ supports multi-instance queue managers, which require a shared file system on network storage, such as a NAS or a cluster file system like IBM Spectrum Scale (formerly IBM General Parallel File System, or GPFS). In order to work reliably with IBM MQ, shared file systems must provide data write integrity, guaranteed exclusive access to files, and must release locks on failure. Further details are described in the testing statement for IBM MQ multi-instance queue manager file systems.

To gain confidence that a file system not listed by IBM will work successfully with IBM MQ multi-instance queue managers, follow this test plan:

Run amqmfsck with each command-line option

On distributed platforms other than Windows, IBM MQ provides a utility called amqmfsck, which tests whether a file system is suitable for use with a multi-instance queue manager. Run amqmfsck against the shared file system once with each of the options below; a sketch that checks its exit code follows the list. In each of the cases below, replace directoryName with the directory where the shared file system is mounted. On IBM i, you must run amqmfsck from the Qshell (STRQSH).

  1. Test the basic file system behavior (no command-line option). On one machine run:

    Testing basic file system behavior on Linux and UNIX

    amqmfsck directoryName

    Testing basic file system behavior in the IBM i Qshell

    /QSYS.LIB/QMQM.LIB/AMQMFSCK.PGM directoryName

  2. Test writing to a file concurrently (the -c option). On both machines at the same time run:

    Testing concurrent writes on Linux and UNIX

    amqmfsck -c directoryName

    Testing concurrent writes in the IBM i Qshell

    /QSYS.LIB/QMQM.LIB/AMQMFSCK.PGM -c directoryName

  3. Test waiting for and releasing file locks (the -w option). On both machines at the same time run:

    Testing waiting and releasing file locks on Linux and UNIX

    amqmfsck -w directoryName

    Testing waiting and releasing file locks in the IBM i Qshell

    /QSYS.LIB/QMQM.LIB/AMQMFSCK.PGM -w directoryName
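
amqmfsck prints the result of each check and also sets an exit code, which is normally zero when a check completes successfully, so the checks are easy to script. The following is a minimal sketch for the basic check, assuming the shared file system is mounted at /shared/qmdata; the -c and -w checks can be wrapped in the same way, provided they are started on both machines at the same time as described above.

    Checking the amqmfsck result on Linux and UNIX

    # Run the basic check and report the outcome based on the exit code
    amqmfsck /shared/qmdata
    rc=$?
    if [ $rc -eq 0 ]; then
        echo "Basic file system check passed"
    else
        echo "Basic file system check failed with exit code $rc"
    fi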

Create a test multi-instance queue manager

These instructions create a queue manager called TESTQM with its data under /shared/data, logs under /shared/logs, and a listener on port 1414. They also create two queues called MQMI.TEST and MQMI.SIDE for the amqsfhac integrity checker program. You can use any queue manager name, data path, log path, listener port number, and queue names you wish if you modify the following commands accordingly.
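
Before creating the queue manager, the data and log directories must exist on the mounted shared file system and be owned by the mqm user and group, with read, write, and execute permission for both. The following is a minimal sketch, run as root on one machine, assuming the shared file system is mounted at /shared:

    Creating the shared data and log directories on Linux and UNIX

    # Create the directories used by -md and -ld below and give them to the mqm user and group
    mkdir -p /shared/data /shared/logs
    chown -R mqm:mqm /shared/data /shared/logs
    chmod -R 775 /shared/data /shared/logs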

  1. Create a multi-instance queue manager on one machine, specifying the queue manager's data directory and log directory on the shared file system to be tested. On IBM i, refer to the IBM MQ documentation for the equivalent steps:

    crtmqm -ld /shared/logs -md /shared/data TESTQM

  2. Add the new queue manager to the MQ configuration on the other machine. On IBM i, refer to the IBM MQ documentation for the equivalent steps:

    addmqinf -s QueueManager -v Name=TESTQM -v Directory=TESTQM -v Prefix=/var/mqm -v DataPath=/shared/data/TESTQM

  3. On both machines, start a queue manager instance. On IBM i use the STRMQM command:

    Starting the multi-instance queue manager on Linux and UNIX

    strmqm -x TESTQM

    Starting the multi-instance queue manager on IBM i

    STRMQM MQMNAME(TESTQM) STANDBY(*YES)

  4. On both machines, display the queue manager status to see which is the active instance and which is the standby. On IBM i use the WRKMQM command:

    Display the queue manager status on Linux and UNIX

    dspmq -x

    Display the queue manager status on IBM i

    WRKMQM MQMNAME(TESTQM)

  5. On the machine with the active instance, define a listener that the queue manager starts automatically after a failover. On IBM i use the CRTMQMLSR command:

    Defining the listener on Linux and UNIX

    echo "DEFINE LISTENER(PORT.1414) TRPTYPE(TCP) PORT(1414) CONTROL(QMGR)" | runmqsc TESTQM

    Defining the listener on IBM i

    CRTMQMLSR LSRNAME(PORT.1414) MQMNAME(TESTQM) PORT(1414) CONTROL(*QMGR)

  6. On the machine with the active instance, end the queue manager so it fails over to the standby instance. On IBM i use the ENDMQM command:

    Failing over the queue manager on Linux and UNIX

    endmqm -is TESTQM

    Failing over the queue manager on IBM i

    ENDMQM MQMNAME(TESTQM) OPTION(*IMMED) ALWSWITCH(*YES) RECONN(*YES)

  7. On both machines, display the queue manager status to confirm that the active instance shut down cleanly and the standby instance became active:

    Display the queue manager status on Linux and UNIX

    dspmq -x

    Display the queue manager status on IBM i

    WRKMQM MQMNAME(TESTQM)

  8. Restart the queue manager as a standby instance on the machine where you ended it:

    Starting the multi-instance queue manager on Linux and UNIX

    strmqm -x TESTQM

    Starting the multi-instance queue manager on IBM i

    STRMQM MQMNAME(TESTQM) STANDBY(*YES)

  9. On the machine with the active instance, make sure the listener is running. On IBM i use the WRKMQMLSR command:

    Check the listener status on Linux and UNIX

    echo "DISPLAY LSSTATUS(PORT.1414)" | runmqsc TESTQM

    Check the listener status on IBM i

    WRKMQMLSR MQMNAME(TESTQM)

  10. On the machine with the active instance, create the local queues used by the amqsfhac integrity checker program, using any names you wish. On IBM i use the CRTMQMQ command. The integrity checker program needs a main queue to put and get messages on, and a second queue for side messages:

    Create the local queues on Linux and UNIX

    echo "DEFINE QLOCAL(MQMI.TEST) MAXDEPTH(10000)" | runmqsc TESTQM

    echo "DEFINE QLOCAL(MQMI.SIDE)" | runmqsc TESTQM

    Create the local queues on IBM i

    CRTMQMQ QNAME(MQMI.TEST) QTYPE(*LCL) MQMNAME(TESTQM) MAXDEPTH(10000)

    CRTMQMQ QNAME(MQMI.SIDE) QTYPE(*LCL) MQMNAME(TESTQM)

Run the amqsfhac integrity checker sample program during failovers

IBM provides an integrity checker sample program called amqsfhac, along with its source code amqsfhac.c, in the installable samples component, which is available on both client and server installations. These steps use the amqsfhac sample program to test the integrity of your multi-instance queue manager file system in a variety of failover scenarios caused by normal and exceptional conditions.
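
amqsfhac runs as an MQ client application, so it needs a server-connection channel it is authorized to use and a client connection pointing at the listener defined earlier. The following is a minimal sketch, assuming a hypothetical channel named MQMI.SVRCONN, placeholder host names host1 and host2 for the two machines, and channel and connection authentication rules that allow the client to connect. Listing both hosts in MQSERVER lets the client reconnect to whichever instance is active after a failover.

    Defining a server-connection channel on the active machine (Linux and UNIX)

    echo "DEFINE CHANNEL(MQMI.SVRCONN) CHLTYPE(SVRCONN) TRPTYPE(TCP)" | runmqsc TESTQM

    Setting MQSERVER on the client machine

    export MQSERVER='MQMI.SVRCONN/TCP/host1(1414),host2(1414)'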

  1. On the client machine, set the MQSERVER environment variable so that the client can connect to your test multi-instance queue manager (as sketched above), then run the amqsfhac sample program, naming the queue manager and its two queues:

    amqsfhac TESTQM MQMI.TEST MQMI.SIDE uowSize iterations verbose

    • uowSize is the number of messages to put and get in a single transaction. It should range between 1 and the MAXUMSGS value of the queue manager (typically 10000).
    • iterations is the number of times the batch of messages is put and got. Choose a number large enough to keep the program busy while you fail over the queue manager.
    • verbose controls the amount of output from the amqsfhac sample program:
      • 0 - No verbose output
      • 1 - Verbose output
      • 2 - Very verbose output

    For example: amqsfhac TESTQM MQMI.TEST MQMI.SIDE 1000 200 1

  2. While the amqsfhac program is running, make the queue manager fail over. The first time through these steps, fail the queue manager over normally by running this command on the active machine:

    Failing over the queue manager on Linux and UNIX

    endmqm -is TESTQM

    Failing over the queue manager on IBM i

    ENDMQM MQMNAME(TESTQM) OPTION(*IMMED) ALWSWITCH(*YES) RECONN(*YES)

  3. After the failover is complete and the amqsfhac program has ended, check the status of the queue manager on both machines to confirm it is active on only one:

    Display the queue manager status on Linux and UNIX

    dspmq -x

    Display the queue manager status on IBM i

    WRKMQM MQMNAME(TESTQM)

  4. Display the queues used by the amqsfhac program on the active machine to confirm they are both empty. On IBM i use the WRKMQMQ command:

    Display the queue depths on Linux and UNIX

    echo "DISPLAY QLOCAL(MQMI.*) CURDEPTH" | runmqsc TESTQM

    Display the queue depths on IBM i

    WRKMQMQ QNAME(MQMI.*) QTYPE(*LCL) MQMNAME(TESTQM)

  5. Go to the queue manager errors directory and confirm that there are three error logs, and that all have the correct permissions, owner and group. Review the recent error log messages from the failover and look for any unexpected issues. Depending on why the queue manager failed over, some errors are to be expected.

    Display the queue manager error logs on Linux and UNIX

    ls -l /shared/data/TESTQM/errors/AMQ*.LOG

    -rw-rw---- 1 mqm mqm 109634 13 Sep 08:30 AMQERR01.LOG
    -rw-rw---- 1 mqm mqm 262355 12 Sep 20:16 AMQERR02.LOG
    -rw-rw---- 1 mqm mqm 262174 10 Sep 15:43 AMQERR03.LOG

    Display the queue manager error logs in the IBM i Qshell

    ls -l /shared/data/TESTQM/errors/AMQ*.LOG

    -rw-rw---- 1 QMQM QMQMADM 100278 17 May 13:33 AMQERR01.LOG
    -rw-rw---- 1 QMQM QMQMADM 262519 16 May 20:50 AMQERR02.LOG
    -rw-rw---- 1 QMQM QMQMADM 263144 15 May 07:12 AMQERR03.LOG

  6. Look at the recent FFSTs (AMQ*.FDC files) in the /var/mqm/errors directory (/QIBM/UserData/mqm/errors on IBM i) on both systems; a quick way to list their Probe Ids is sketched after this list. On the active system, FFSTs showing Probe Ids AO074001 and KN673000 are normal when a standby instance takes over the queue manager. Depending on why the queue manager failed over, other FFSTs might be expected.

  7. Restart the queue manager as a standby instance on the machine where it is not running:

    Starting the multi-instance queue manager on Linux and UNIX

    strmqm -x TESTQM

    Starting the multi-instance queue manager on IBM i

    STRMQM MQMNAME(TESTQM) STANDBY(*YES)

  8. Repeat these steps for additional failover causes. A multi-instance failover can be triggered by hardware and software failures, including network problems that prevent the queue manager from writing to its data or log files. To be confident that a shared file system will maintain integrity and work with a multi-instance queue manager when a problem occurs unexpectedly, test all possible failure scenarios multiple times, including:

    • Shutting down the operating system including syncing the disks
    • Halting the operating system without syncing the disks
    • Physically pressing the server's reset button
    • Physically pulling the network cable out of the server (test this at least five times)
    • Physically pulling the power cable out of the server
    • Physically switching the machine off
    • Any other failover causes appropriate to your environment or system
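
As mentioned in step 6, a quick way to review the FFSTs recorded on each system after a failover is to list the newest FDC files and the Probe Id line from their headers. The following is a minimal sketch for Linux and UNIX, assuming the default errors directory:

    Listing recent FFST files and their Probe Ids on Linux and UNIX

    ls -lt /var/mqm/errors/AMQ*.FDC | head
    grep 'Probe Id' /var/mqm/errors/AMQ*.FDC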

Delete the test multi-instance queue manager

If you used an existing queue manager to test your file system, delete the TCP/IP listener and delete the two local queues used by the amqsfhac program. On IBM i, use the DLTMQMLSR and DLTMQMQ commands. If you wish to delete the queue manager, follow these instructions:
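
For Linux and UNIX, the following is a minimal sketch of removing the test listener and queues with runmqsc, assuming the object names used above; the listener must be stopped before it can be deleted:

    Deleting the test listener and queues on Linux and UNIX

    echo "STOP LISTENER(PORT.1414)" | runmqsc TESTQM
    echo "DELETE LISTENER(PORT.1414)" | runmqsc TESTQM
    echo "DELETE QLOCAL(MQMI.TEST)" | runmqsc TESTQM
    echo "DELETE QLOCAL(MQMI.SIDE)" | runmqsc TESTQM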

  1. Stop the test queue manager on the active system without allowing it to fail over:

    Stopping the multi-instance queue manager on Linux and UNIX

    endmqm -i TESTQM

    Stopping the multi-instance queue manager on IBM i

    ENDMQM MQMNAME(TESTQM) OPTION(*IMMED)

  2. If there is a standby instance running on the other system, stop it as well:

    Stopping the standby instance on Linux and UNIX

    endmqm -ix TESTQM

    Stopping the standby instance on IBM i

    ENDMQM MQMNAME(TESTQM) OPTION(*IMMED) INSTANCE(*STANDBY)

  3. On one of the machines, delete the queue manager. On IBM i, refer to the IBM MQ documentation for the equivalent steps:

    dltmqm TESTQM

  4. On the other machine, remove the queue manager configuration. On IBM i, refer to the IBM MQ documentation for the equivalent steps:

    rmvmqinf -s QueueManager TESTQM

[{"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"SSYHRD","label":"IBM MQ"},"ARM Category":[{"code":"a8m0z00000008L1AAI","label":"Components and Features->High Availability (HA)->File System Requirements"}],"Platform":[{"code":"PF002","label":"AIX"},{"code":"PF010","label":"HP-UX"},{"code":"PF012","label":"IBM i"},{"code":"PF016","label":"Linux"},{"code":"PF027","label":"Solaris"}],"Version":"9.1;9.0;8.0","Edition":"","Line of Business":{"code":"LOB45","label":"Automation"}},{"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"SSFKSJ","label":"WebSphere MQ"},"ARM Category":[{"code":"","label":""}],"Platform":[{"code":"PF002","label":"AIX"},{"code":"PF010","label":"HP-UX"},{"code":"PF012","label":"IBM i"},{"code":"PF016","label":"Linux"},{"code":"PF027","label":"Solaris"}],"Version":"7.5;7.1;7.0.1","Edition":"","Line of Business":{"code":"LOB45","label":"Automation"}}]

Document Information

Modified date:
04 June 2020

UID

ibm16117868