Verifying shared file system behavior on Multiplatforms
Run amqmfsck to check whether a shared file system on UNIX and IBM® i systems meets the requirements for storing the queue manager data of a multi-instance queue manager. Run the IBM MQ MQI client sample program amqsfhac in parallel with amqmfsck to demonstrate that a queue manager maintains message integrity during a failure. (The only requirement for a Windows configuration is that it uses SMB 3 for shared storage provision.)
Before you begin
About this task
Requirements for shared file systems on Multiplatforms describes the file system requirements for using a shared file system with multi-instance queue managers. The IBM MQ technote Testing and support statement for IBM MQ multi-instance queue managers lists the shared file systems that IBM has already tested with. The procedure in this task describes how to test a file system to help you assess whether an unlisted file system maintains data integrity.
Failover of a multi-instance queue manager can be triggered by hardware or software failures, including networking problems that prevent the queue manager from writing to its data or log files. You are mainly interested in causing failures on the file server, but you must also cause the IBM MQ servers to fail, to test that any locks are successfully released. To be confident in a shared file system, test all of the following failures, and any other failures that are specific to your environment:
- Shutting down the operating system on the file server including syncing the disks.
- Halting the operating system on the file server without syncing the disks.
- Pressing the reset button on each of the servers.
- Pulling the network cable out of each of the servers.
- Pulling the power cable out of each of the servers.
- Switching off each of the servers.
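One way to keep track of the failures above is a checklist that pairs each failure with the servers it applies to. The following is a minimal sketch only: the server names (`fileserver`, `mqserver1`, `mqserver2`) are hypothetical placeholders, and the split between file-server failures and failures applied to every server follows the wording of the list.

```python
from itertools import product

# Hypothetical server names; substitute the hosts in your environment.
FILE_SERVER = "fileserver"
ALL_SERVERS = [FILE_SERVER, "mqserver1", "mqserver2"]

# Failures that the list describes for the file server only.
FILE_SERVER_FAILURES = [
    "shut down the operating system, syncing the disks",
    "halt the operating system without syncing the disks",
]

# Failures that the list describes for each of the servers.
EVERY_SERVER_FAILURES = [
    "press the reset button",
    "pull out the network cable",
    "pull out the power cable",
    "switch off the server",
]


def build_checklist():
    """Return (server, failure) pairs covering every failure in the list."""
    checklist = [(FILE_SERVER, failure) for failure in FILE_SERVER_FAILURES]
    checklist += [
        (server, failure)
        for failure, server in product(EVERY_SERVER_FAILURES, ALL_SERVERS)
    ]
    return checklist


if __name__ == "__main__":
    for server, failure in build_checklist():
        print(f"[ ] {server}: {failure}")
```

Add any failures that are specific to your environment to the appropriate list before you generate the checklist.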
Create the directory on the networked storage that you are going to use to share queue manager data and logs. The directory owner must be an IBM MQ Administrator, or in other words, a member of the mqm group on UNIX. The user who runs the tests must have IBM MQ Administrator authority.
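The ownership requirement can be checked before you run any tests. The following sketch assumes a UNIX system and reads the requirement as "the user who owns the directory belongs to the mqm group"; the function names are illustrative and are not part of IBM MQ.

```python
import grp
import os
import pwd


def user_in_group(user_name, primary_gid, group_gid, group_members):
    """Return True if the user belongs to the group, either as its
    primary group or as a listed supplementary member."""
    return primary_gid == group_gid or user_name in group_members


def directory_owner_in_group(path, group_name="mqm"):
    """Return True if the owner of `path` is a member of `group_name`.

    Raises KeyError if the owning user ID or the group is not defined
    on this system.
    """
    status = os.stat(path)
    owner = pwd.getpwuid(status.st_uid)
    group = grp.getgrnam(group_name)
    return user_in_group(owner.pw_name, owner.pw_gid, group.gr_gid, group.gr_mem)
```

For example, `directory_owner_in_group("/shared/qmdata")` checks the directory used in the examples in this topic. The check looks only at local user and group definitions as seen from the server where it runs, so repeat it on each server that mounts the directory.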
Use the example of exporting and mounting a file system in Create a multi-instance queue manager on Linux® or Mirrored journal configuration on an ASP using ADDMQMJRN to help you through configuring the file system. Different file systems require different configuration steps. Read the file system documentation.
Procedure
In each of the checks, cause all the failures in the previous list while the file system checker is running. If you intend to run amqsfhac at the same time as amqmfsck, do the task Running amqsfhac to test message integrity in parallel with this task.
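If you repeat the checks many times while causing failures, a small wrapper can save typing. The following is a minimal sketch, not an IBM-supplied tool: it uses only the invocation forms that appear in the examples in this topic (no option, `-v`, `-w`, and `-wv`), and it makes no assumption about the meaning of the exit code. The function names are illustrative.

```python
import subprocess


def build_command(directory, wait=False, verbose=False):
    """Build the amqmfsck argument list in the forms used in this topic."""
    flags = ("w" if wait else "") + ("v" if verbose else "")
    argv = ["amqmfsck"]
    if flags:
        argv.append("-" + flags)
    argv.append(directory)
    return argv


def run_check(directory, verbose=False):
    """Run the non-interactive form of the check and capture its output.

    The -w form prompts for Return, so run it in a terminal instead of
    capturing its output here.
    """
    result = subprocess.run(
        build_command(directory, verbose=verbose),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout + result.stderr
```

For example, `build_command("/shared/qmdata", wait=True, verbose=True)` produces the argument list for `amqmfsck -wv /shared/qmdata`.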
Results
Examples
The first set of three examples shows the command producing minimal output.
- Successful test of basic file locking on one server
  > amqmfsck /shared/qmdata
  The tests on the directory completed successfully.
- Failed test of basic file locking on one server
  > amqmfsck /shared/qmdata
  AMQ6245: Error Calling 'write()[2]' on file '/shared/qmdata/amqmfsck.lck' error '2'.
- Successful test of locking on two servers
  Table 2. Successful locking on two servers
  IBM MQ server 1:
    > amqmfsck -w /shared/qmdata
    Please start this program on a second machine with the same parameters.
    Lock acquired.
    Press Return or terminate the program to release the lock.
  IBM MQ server 2:
    > amqmfsck -w /shared/qmdata
    Waiting for lock...
  IBM MQ server 1:
    [ Return pressed ]
    Lock released.
  IBM MQ server 2:
    Lock acquired.
    The tests on the directory completed successfully
The second set of three examples shows the same commands in verbose mode.
- Successful test of basic file locking on one server
  > amqmfsck -v /shared/qmdata
  System call: stat("/shared/qmdata")
  System call: fd = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
  System call: fchmod(fd, 0666)
  System call: fstat(fd)
  System call: fcntl(fd, F_SETLK, F_WRLCK)
  System call: write(fd)
  System call: close(fd)
  System call: fd = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
  System call: fcntl(fd, F_SETLK, F_WRLCK)
  System call: close(fd)
  System call: fd1 = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
  System call: fcntl(fd1, F_SETLK, F_RDLCK)
  System call: fd2 = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
  System call: fcntl(fd2, F_SETLK, F_RDLCK)
  System call: close(fd2)
  System call: write(fd1)
  System call: close(fd1)
  The tests on the directory completed successfully.
- Failed test of basic file locking on one server
  > amqmfsck -v /shared/qmdata
  System call: stat("/shared/qmdata")
  System call: fd = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
  System call: fchmod(fd, 0666)
  System call: fstat(fd)
  System call: fcntl(fd, F_SETLK, F_WRLCK)
  System call: write(fd)
  System call: close(fd)
  System call: fd = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
  System call: fcntl(fd, F_SETLK, F_WRLCK)
  System call: close(fd)
  System call: fd = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
  System call: fcntl(fd, F_SETLK, F_RDLCK)
  System call: fdSameFile = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
  System call: fcntl(fdSameFile, F_SETLK, F_RDLCK)
  System call: close(fdSameFile)
  System call: write(fd)
  AMQxxxx: Error calling 'write()[2]' on file '/shared/qmdata/amqmfsck.lck', errno 2 (Permission denied).
- Successful test of locking on two servers
  Table 3. Successful locking on two servers - verbose mode
  IBM MQ server 1:
    > amqmfsck -wv /shared/qmdata
    Calling 'stat("/shared/qmdata")'
    Calling 'fd = open("/shared/qmdata/amqmfsck.lkw", O_EXCL | O_CREAT | O_RDWR, 0666)'
    Calling 'fchmod(fd, 0666)'
    Calling 'fstat(fd)'
    Please start this program on a second machine with the same parameters.
    Calling 'fcntl(fd, F_SETLK, F_WRLCK)'
    Lock acquired.
    Press Return or terminate the program to release the lock.
  IBM MQ server 2:
    > amqmfsck -wv /shared/qmdata
    Calling 'stat("/shared/qmdata")'
    Calling 'fd = open("/shared/qmdata/amqmfsck.lkw", O_EXCL | O_CREAT | O_RDWR, 0666)'
    Calling 'fd = open("/shared/qmdata/amqmfsck.lkw", O_RDWR, 0666)'
    Calling 'fcntl(fd, F_SETLK, F_WRLCK)'
    Waiting for lock...
  IBM MQ server 1:
    [ Return pressed ]
    Calling 'close(fd)'
    Lock released.
  IBM MQ server 2:
    Calling 'fcntl(fd, F_SETLK, F_WRLCK)'
    Lock acquired.
    The tests on the directory completed successfully
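When you collect output from many test runs, it can help to summarize each run automatically. The following sketch assumes that the captured output has one message per line and recognizes only the message forms that appear in the examples above (`System call: ...`, `Calling '...'`, lines beginning with `AMQ`, and the success message). The function name and the shape of the returned dictionary are illustrative, not part of IBM MQ.

```python
SUCCESS_TEXT = "The tests on the directory completed successfully"
SYSTEM_CALL_PREFIX = "System call: "
CALLING_PREFIX = "Calling '"


def parse_output(output):
    """Summarize captured amqmfsck output.

    Returns a dict with:
      success - True if the success message is present and no AMQ error line is
      calls   - the traced system calls, in order (verbose mode only)
      errors  - lines that begin with an AMQ message identifier
    """
    calls = []
    errors = []
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if line.startswith(SYSTEM_CALL_PREFIX):
            calls.append(line[len(SYSTEM_CALL_PREFIX):])
        elif line.startswith(CALLING_PREFIX) and line.endswith("'"):
            calls.append(line[len(CALLING_PREFIX):-1])
        elif line.startswith("AMQ"):
            errors.append(line)
    return {
        "success": SUCCESS_TEXT in output and not errors,
        "calls": calls,
        "errors": errors,
    }


if __name__ == "__main__":
    # Tail of the failed verbose example shown in this topic.
    sample = "\n".join([
        "System call: close(fdSameFile)",
        "System call: write(fd)",
        "AMQxxxx: Error calling 'write()[2]' on file "
        "'/shared/qmdata/amqmfsck.lck', errno 2 (Permission denied).",
    ])
    summary = parse_output(sample)
    print("success:", summary["success"])
    print("last call before error:", summary["calls"][-1])
```

In a failed verbose run, the last traced call before the error line indicates which file system operation the checker was attempting when the failure occurred.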
Running amqsfhac to test message integrity
amqsfhac checks that a queue manager using networked storage maintains data integrity following a failure.
Before you begin
You require four servers for this test: two servers for the multi-instance queue manager, one for the file system, and one for running amqsfhac as an IBM MQ MQI client application.
Follow step 1 in Procedure to set up the file system for a multi-instance queue manager.
About this task
Procedure
Results
An example of running amqsfhac in step 6 is shown in Figure 9. The test is a success.
If the test detected a problem, the output would report the failure. In some test runs MQRC_CALL_INTERRUPTED might report "Resolving to backed out". It makes no difference to the result. The outcome depends on whether the write to disk was committed by the networked file storage before or after the failure took place.