[AIX][IBM i][Linux]

Verifying shared file system behavior on Multiplatforms

Run amqmfsck to check whether a shared file system on AIX®, Linux®, or IBM® i meets the requirements for storing the queue manager data of a multi-instance queue manager.

Before you begin

You need a server with networked storage, and two other servers connected to it that have IBM MQ installed. You must have administrator (root) authority to configure the file system, and be an IBM MQ Administrator to run amqmfsck.

About this task

Requirements for shared file systems on Multiplatforms describes the file system requirements for using a shared file system with multi-instance queue managers. The IBM MQ technote Testing statement for IBM MQ multi-instance queue manager file systems lists the shared file systems that IBM has already tested. The procedure in this task describes how to test a file system to help you assess whether an unlisted file system maintains data integrity.

Failover of a multi-instance queue manager can be triggered by hardware or software failures, including networking problems that prevent the queue manager from writing to its data or log files. You are mainly interested in causing failures on the file server, but you must also cause the IBM MQ servers to fail, to test that any locks are successfully released. To be confident in a shared file system, test all of the following failures, and any other failures that are specific to your environment:

  1. Shutting down the operating system on the file server including syncing the disks.
  2. Halting the operating system on the file server without syncing the disks.
  3. Pressing the reset button on each of the servers.
  4. Pulling the network cable out of each of the servers.
  5. Pulling the power cable out of each of the servers.
  6. Switching off each of the servers.

Create the directory on the networked storage that you are going to use to share queue manager data and logs. The directory owner must be an IBM MQ Administrator, or in other words, a member of the mqm group on AIX and Linux. The user who runs the tests must have IBM MQ Administrator authority.
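
The ownership rule above can be checked before you start testing. The following is a minimal sketch, not part of IBM MQ: the function name owner_in_group is hypothetical, and it assumes a POSIX shell with ls, awk, and id available. It reads the directory owner from ls -ld and checks whether that user belongs to the named group.

```shell
# Return 0 if the owner of directory $1 is a member of group $2, else 1.
owner_in_group() {
    dir=$1
    group=$2
    # Field 3 of `ls -ld` output is the owning user name.
    owner=$(ls -ld "$dir" | awk '{print $3}')
    # `id -Gn USER` lists every group the user belongs to, space separated.
    for g in $(id -Gn "$owner"); do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}
```

For example, owner_in_group /shared/qmdata mqm returns zero when the owner of /shared/qmdata is a member of mqm. The sketch does not handle group names that contain spaces.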

Use the example of exporting and mounting a file system in Create a multi-instance queue manager on Linux or Creating a multi-instance queue manager using journal mirroring and NetServer on IBM i to help you through configuring the file system. Different file systems require different configuration steps. Read the file system documentation.

Note: Run the IBM MQ MQI client sample program amqsfhac in parallel with amqmfsck to demonstrate that a queue manager maintains message integrity during a failure.

Procedure

In each of the checks, cause all the failures in the previous list while the file system checker is running. If you intend to run amqsfhac at the same time as amqmfsck, perform the task Running amqsfhac to test message integrity in parallel with this task.
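
Because every check is repeated for every failure, it can help to generate a checklist of the combinations. The following is a hypothetical sketch, not an IBM MQ tool: the function name, the check labels (taken from the amqmfsck invocations in this task), and the shortened failure labels are illustrative assumptions. Add rows for any failures that are specific to your environment.

```shell
# Print one line per (check, failure) pair so that each combination
# can be ticked off as it is tested.
print_test_matrix() {
    for check in "basic" "-c" "-w" "-a"; do
        for failure in "clean shutdown" "halt without sync" "reset button" \
                       "network cable" "power cable" "power switch"; do
            printf '%s | %s | not run\n' "$check" "$failure"
        done
    done
}
```

Calling print_test_matrix prints 24 lines, one for each of the four checks combined with each of the six failures.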

  1. Mount the exported directory on the two IBM MQ servers.

    On the file system server create a shared directory shared, and a subdirectory to save the data for multi-instance queue managers, qmdata. For an example of setting up a shared directory for multi-instance queue managers on Linux, see Create a multi-instance queue manager on Linux.

  2. Check basic file system behavior.
    On one IBM MQ server, run the file system checker with no options other than the directory name.
    On IBM MQ server 1:
    
    amqmfsck /shared/qmdata
    
  3. Check concurrently writing to the same directory from both IBM MQ servers.
    On both IBM MQ servers, run the file system checker at the same time with the -c option.
    On IBM MQ server 1:
    
    amqmfsck -c /shared/qmdata
    
    On IBM MQ server 2:
    
    amqmfsck -c /shared/qmdata
    
  4. Check waiting for and releasing locks on both IBM MQ servers.
    On both IBM MQ servers run the file system checker at the same time with the -w option.
    On IBM MQ server 1:
    
    amqmfsck -w /shared/qmdata
    
    On IBM MQ server 2:
    
    amqmfsck -w /shared/qmdata
    
  5. Check for data integrity.
    1. Format the test file.
      Create a large file in the directory being tested. The file is formatted so that the subsequent phases can complete successfully. The file must be large enough that there is sufficient time to interrupt the second phase to simulate the failover. Try the default value of 262144 pages (1 GB). The program automatically reduces this default on slow file systems so that formatting completes in about 60 seconds.
      On IBM MQ server 1:
      
      amqmfsck -f /shared/qmdata
      
      The server responds with the following messages:
      
      Formatting test file for data integrity test.
      
      
      Test file formatted with 262144 pages of data.
      
    2. Write data into the test file using the file system checker while causing a failure.

      Run the test program on two servers at the same time. Start the test program on the server which is going to experience the failure, then start the test program on the server that is going to survive the failure. Cause the failure you are investigating.

      The first test program stops with an error message. The second test program obtains the lock on the test file and writes data into the test file starting where the first test program left off. Let the second test program run to completion.

      Table 1. Running the data integrity check on two servers at the same time

      On IBM MQ server 1:

      amqmfsck -a /shared/qmdata

      Please start this program on a second machine
      with the same parameters.

      File lock acquired.

      Start a second copy of this program
      with the same parameters on another server.

      Writing data into test file.

      To increase the effectiveness of the test,
      interrupt the writing by ending the process,
      temporarily breaking the network connection
      to the networked storage,
      rebooting the server or turning off the power.

      On IBM MQ server 2:

      amqmfsck -a /shared/qmdata

      Waiting for lock...
      Waiting for lock...
      Waiting for lock...
      Waiting for lock...
      Waiting for lock...
      Waiting for lock...

      On IBM MQ server 1:

      Turn the power off here.

      On IBM MQ server 2:

      File lock acquired.

      Reading test file

      Checking the integrity of the data read.

      Appending data into the test file
      after data already found.

      The test file is full of data.
      It is ready to be inspected for data integrity.


      The timing of the test depends on the behavior of the file system. For example, it typically takes 30 - 90 seconds for a file system to release the file locks obtained by the first program following a power outage. If you have too little time to introduce the failure before the first test program has filled the file, use the -x option of amqmfsck to delete the test file. Try the test from the start with a larger test file.

    3. Verify the integrity of the data in the test file.
      On IBM MQ server 2:
      
      amqmfsck -i /shared/qmdata
      
      The server responds with the following messages:
      
      File lock acquired
      
      
      Reading test file checking the integrity of the data read.
      
      
      The data read was consistent.
      
      
      The tests on the directory completed successfully.
      
  6. Delete the test files.
    On IBM MQ server 2:
    
    amqmfsck -x /shared/qmdata
    

    The server responds with the message:

    
    Test files deleted.
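
The formatting step quotes a default of 262144 pages for 1 GB, which implies 4096-byte pages. If you need to retry with a larger test file, the arithmetic can be sketched as follows. The function names are hypothetical, and the 4096-byte page size is an assumption inferred from those figures rather than stated in this task. See the amqmfsck command reference for how to specify the test file size.

```shell
PAGE_BYTES=4096   # assumption: inferred from 262144 pages = 1 GB

# Convert a page count to a size in bytes.
pages_to_bytes() {
    echo $(( $1 * PAGE_BYTES ))
}

# Convert a size in whole gigabytes (2^30 bytes each) to a page count.
gb_to_pages() {
    echo $(( $1 * 1024 * 1024 * 1024 / PAGE_BYTES ))
}
```

Under that assumption, gb_to_pages 4 gives the page count for a 4 GB test file, and pages_to_bytes converts a page count back to bytes so that you can confirm the directory has enough free space.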
    

Results

The program returns an exit code of zero if the tests complete successfully, and non-zero otherwise.
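
Because success is reported through the exit code, the checks can be driven from a script. The following is a minimal sketch under stated assumptions: run_check is a hypothetical helper, not part of IBM MQ, and it relies only on the zero or non-zero exit code described above.

```shell
# Run a command and report whether it passed, based only on its exit code.
# Usage: run_check LABEL COMMAND [ARGS...]
run_check() {
    label=$1
    shift
    if "$@"; then
        echo "PASS: $label"
        return 0
    else
        rc=$?   # exit code of the command that was just run
        echo "FAIL: $label (exit code $rc)"
        return "$rc"
    fi
}
```

For example, run_check "basic locking" amqmfsck /shared/qmdata prints a PASS or FAIL line after the output of amqmfsck and returns the same exit code, so a calling script can stop at the first failed check.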

Examples

The first set of three examples shows the command producing minimal output.

Successful test of basic file locking on one server

> amqmfsck /shared/qmdata
The tests on the directory completed successfully.

Failed test of basic file locking on one server

> amqmfsck /shared/qmdata
AMQ6245: Error Calling 'write()[2]' on file '/shared/qmdata/amqmfsck.lck' error '2'.

Successful test of locking on two servers

Table 2. Successful locking on two servers

On IBM MQ server 1:

> amqmfsck -w /shared/qmdata
Please start this program on a second
machine with the same parameters.
Lock acquired.
Press Return
or terminate the program to release the lock.

On IBM MQ server 2:

> amqmfsck -w /shared/qmdata
Waiting for lock...

On IBM MQ server 1:

[ Return pressed ]
Lock released.

On IBM MQ server 2:

Lock acquired.
The tests on the directory completed successfully

The second set of three examples shows the same commands using verbose mode.

Successful test of basic file locking on one server

> amqmfsck -v /shared/qmdata
System call: stat("/shared/qmdata")
System call: fd = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
System call: fchmod(fd, 0666)
System call: fstat(fd)
System call: fcntl(fd, F_SETLK, F_WRLCK)
System call: write(fd)
System call: close(fd)
System call: fd = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
System call: fcntl(fd, F_SETLK, F_WRLCK)
System call: close(fd)
System call: fd1 = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
System call: fcntl(fd1, F_SETLK, F_RDLCK)
System call: fd2 = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
System call: fcntl(fd2, F_SETLK, F_RDLCK)
System call: close(fd2)
System call: write(fd1)
System call: close(fd1)
The tests on the directory completed successfully.

Failed test of basic file locking on one server

> amqmfsck -v /shared/qmdata
System call: stat("/shared/qmdata")
System call: fd = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
System call: fchmod(fd, 0666)
System call: fstat(fd)
System call: fcntl(fd, F_SETLK, F_WRLCK)
System call: write(fd)
System call: close(fd)
System call: fd = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
System call: fcntl(fd, F_SETLK, F_WRLCK)
System call: close(fd)
System call: fd = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
System call: fcntl(fd, F_SETLK, F_RDLCK)
System call: fdSameFile = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
System call: fcntl(fdSameFile, F_SETLK, F_RDLCK)
System call: close(fdSameFile)
System call: write(fd)
AMQxxxx: Error calling 'write()[2]' on file '/shared/qmdata/amqmfsck.lck', errno 2
(Permission denied).

Successful test of locking on two servers

Table 3. Successful locking on two servers - verbose mode

On IBM MQ server 1:

> amqmfsck -wv /shared/qmdata
Calling 'stat("/shared/qmdata")'
Calling 'fd = open("/shared/qmdata/amqmfsck.lkw",
O_EXCL | O_CREAT | O_RDWR, 0666)'
Calling 'fchmod(fd, 0666)'
Calling 'fstat(fd)'
Please start this program on a second
machine with the same parameters.
Calling 'fcntl(fd, F_SETLK, F_WRLCK)'
Lock acquired.
Press Return
or terminate the program to release the lock.

On IBM MQ server 2:

> amqmfsck -wv /shared/qmdata
Calling 'stat("/shared/qmdata")'
Calling 'fd = open("/shared/qmdata/amqmfsck.lkw",
O_EXCL | O_CREAT | O_RDWR, 0666)'
Calling 'fd = open("/shared/qmdata/amqmfsck.lkw",
O_RDWR, 0666)'
Calling 'fcntl(fd, F_SETLK, F_WRLCK)'
Waiting for lock...

On IBM MQ server 1:

[ Return pressed ]
Calling 'close(fd)'
Lock released.

On IBM MQ server 2:

Calling 'fcntl(fd, F_SETLK, F_WRLCK)'
Lock acquired.
The tests on the directory completed successfully