Troubleshooting
Problem
IBM MQ provides guidance on the functional capabilities for a file system and the testing a user should complete in their own environment. If a file system has been verified by the lab, or known issues identified, these are documented here: https://www.ibm.com/support/pages/testing-statement-ibm-mq-multi-instance-queue-manager-file-systems
This provides functional verification, but users often also need to understand the performance characteristics and whether any bottlenecks occur when scaling. The recommended approach is for users to run a comprehensive performance test in their own environment to verify that it behaves as desired, and a stress test to understand its characteristics in extreme situations.
In the case of AWS EFS (https://aws.amazon.com/efs/), we have identified aspects that are important to consider and might affect the scalability and suitability of using EFS in general, depending on your IBM MQ usage pattern:
- EFS Storage Class: AWS provides various options that define the throughput of EFS storage. IBM MQ is sensitive to the performance of the file system, especially for persistent messaging. AWS also documents upper limits on the total throughput possible from an NFS client, which could affect some scenarios. For the latest information, consult https://docs.aws.amazon.com/efs/latest/ug/limits.html#limits-client-specific
- IBM MQ file lock verification: IBM MQ multi-instance queue managers periodically check the validity of the locks they hold. By default, if lock health cannot be confirmed within 20 seconds, this is treated as a trigger for failover to the standby instance. The AWS EFS support team recommended that all applications using EFS set file system IO timeouts to at least 30 seconds. To achieve a 30-second timeout for the lock verify thread instead of the default 20 seconds, set FileLockHeartBeatLen=15 (or higher) in the TuningParameters stanza of qm.ini. For further details on this tuning parameter, consult https://www.ibm.com/docs/en/ibm-mq/9.3?topic=network-queue-manager-health-check-behavior#qmgr_healthchk__filelocking__title__1.
- If EFS IO delays are encountered, the queue manager will be unable to complete messaging operations until IO responses are received, which may lead to delays or other impacts observed at the application.
- File Locks: IBM MQ relies on file locking, yet EFS limits the number of file locks available to an application through the NFS client. These limits are set by AWS. For the current restrictions, see https://docs.aws.amazon.com/efs/latest/ug/limits.html#limits-client-specific
Update: Prior to May 2022, locks were restricted based on a combination of the number of locks per owner (file/process pair), the total number of locks, and the number of locks per file. This placed a direct limit on the total number of IBM MQ queues (implemented as separate files) that could be in use at any one time. Since May 2022, the limit on the number of separate files has been removed, reducing the restrictions. As of May 2022, the limits are a total of 65,536 locks, and 512 locks per file.
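For the file lock verification item in the list above, the timeout arithmetic can be sketched as follows. This is a minimal illustration rather than IBM tooling. It assumes that the lock verify timeout is twice FileLockHeartBeatLen, which is inferred from the figures above (a value of 15 gives 30 seconds, and the default timeout is 20 seconds). The helper names are hypothetical.

```python
import math

# Assumption: the lock verify timeout is twice FileLockHeartBeatLen,
# inferred from the text (15 -> 30 seconds; default timeout of 20 seconds).
TIMEOUT_MULTIPLIER = 2
DEFAULT_HEARTBEAT_LEN = 10  # implied by the 20-second default timeout


def heartbeat_len_for_timeout(timeout_seconds: float) -> int:
    """Return the smallest whole-second heartbeat length whose lock verify
    timeout is at least timeout_seconds, never going below the default."""
    needed = math.ceil(timeout_seconds / TIMEOUT_MULTIPLIER)
    return max(DEFAULT_HEARTBEAT_LEN, needed)


def tuning_stanza(heartbeat_len: int) -> str:
    """Render a qm.ini fragment that sets the given heartbeat length."""
    return f"TuningParameters:\n   FileLockHeartBeatLen={heartbeat_len}\n"


# 30 seconds is the minimum IO timeout recommended for EFS in the text above.
print(tuning_stanza(heartbeat_len_for_timeout(30)))
```

The printed fragment is intended to be merged into the existing qm.ini of the queue manager by hand. Verify the value against the IBM MQ documentation linked above before applying it.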
To understand the impact of this restriction you need to understand how MQ uses file locks. MQ gets file locks in many situations, for example:
- When the queue manager starts, file locks are obtained for activities such as logging, reading configuration, and administrative operations. This often means that at least 20 file locks are consumed before any messaging traffic is handled. Depending on your individual configuration, this number can be slightly higher or lower.
- As queues are opened by an application, locks are taken against the individual queue files representing those queues.
- As application connections grow, the number of separate queue manager processes increases, and each process requires locks on the queues used by its connections.
- To ensure that only a single active instance of the queue manager is running, each queue manager process locks a shared file. Because EFS limits the number of locks per file, this sets a maximum number of queue manager processes per queue manager.
- When a transaction is being processed, locks are taken to ensure the consistency of the processing across failure situations.
- For persistent messaging operations the queue manager periodically checks for consistency between its recovery logs and the queue files. This operation is called checkpointing and can take locks on the log and queue files.
The file locks can be observed by using standard operating system utilities such as lslocks on Linux:
root@ubuntu:~# lslocks | grep mqm
amqzmuf0 19757 POSIX 47B READ 0 0 0 /var/mqm/qmgrs/QM1/active
amqfcxba 19813 POSIX 47B READ 0 0 0 /var/mqm/qmgrs/QM1/active
amqzxma0 19710 POSIX 47B WRITE 0 0 0 /var/mqm/qmgrs/QM1/master
amqzxma0 19710 POSIX 47B READ 0 0 0 /var/mqm/qmgrs/QM1/active
amqzmuc0 19727 POSIX 47B READ 0 0 0 /var/mqm/qmgrs/QM1/active
amqzmuc0 19727 POSIX 6.1K READ 0 0 0 /var/mqm/log/QM1/amqhlctl.lfh
amqzmur0 19742 POSIX 47B READ 0 0 0 /var/mqm/qmgrs/QM1/active
amqzfuma 19719 POSIX 47B READ 0 0 0 /var/mqm/qmgrs/QM1/active
amqrrmfa 19760 POSIX 47B READ 0 0 0 /var/mqm/qmgrs/QM1/active
amqzlaa0 19792 POSIX 47B READ 0 0 0 /var/mqm/qmgrs/QM1/active
amqfqpub 19796 POSIX 47B READ 0 0 0 /var/mqm/qmgrs/QM1/active
java 19852 POSIX 47B READ 0 0 0 /var/mqm/qmgrs/QM1/active
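Output like the listing above can be summarized per file to see where locks concentrate. The following is a minimal sketch, assuming the nine-column lslocks layout shown above (COMMAND PID TYPE SIZE MODE M START END PATH). The function name tally_locks is hypothetical, and the sample is a subset of the lines above.

```python
from collections import Counter

# Sample lines copied from the lslocks output above.
SAMPLE = """\
amqzmuf0 19757 POSIX 47B READ 0 0 0 /var/mqm/qmgrs/QM1/active
amqzxma0 19710 POSIX 47B WRITE 0 0 0 /var/mqm/qmgrs/QM1/master
amqzxma0 19710 POSIX 47B READ 0 0 0 /var/mqm/qmgrs/QM1/active
amqzmuc0 19727 POSIX 6.1K READ 0 0 0 /var/mqm/log/QM1/amqhlctl.lfh
"""


def tally_locks(lslocks_text: str) -> Counter:
    """Count lock entries per file path.

    Assumes nine whitespace-separated columns with a numeric PID in the
    second column; header lines and anything else are skipped.
    """
    per_file = Counter()
    for line in lslocks_text.splitlines():
        fields = line.split(None, 8)  # at most 9 fields, so PATH keeps its spaces
        if len(fields) == 9 and fields[1].isdigit():
            per_file[fields[8]] += 1
    return per_file


counts = tally_locks(SAMPLE)
for path, n in counts.most_common():
    print(f"{n:4d}  {path}")
print(f"total locks: {sum(counts.values())}")
```

In the full listing above, most entries are READ locks on the shared active file, one per queue manager process, which is the file where the per-file lock limit is most likely to matter.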
Predicting when the limit will be exceeded is complicated. We therefore recommend completing performance and stress testing for your scenario, while carefully monitoring the associated file locks to ensure that you do not approach or reach the AWS EFS limits.
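During such testing, observed lock counts can be compared with the limits quoted earlier. The sketch below is illustrative only: the limit values are the May 2022 figures from this document and should be checked against the AWS page, the 80% warning threshold is an arbitrary assumption, and the function name is hypothetical. Counts observed on a single NFS client might not match how EFS accounts for locks, so treat the result as a rough indicator.

```python
# Limits quoted in this document (as of May 2022); confirm current values with AWS.
MAX_TOTAL_LOCKS = 65_536
MAX_LOCKS_PER_FILE = 512


def lock_warnings(per_file_counts: dict, warn_ratio: float = 0.8) -> list:
    """Return warning strings for lock counts at or above warn_ratio of a limit.

    per_file_counts maps a file path to the number of locks seen on it.
    warn_ratio is an assumed safety margin, not an AWS or IBM recommendation.
    """
    warnings = []
    total = sum(per_file_counts.values())
    if total >= warn_ratio * MAX_TOTAL_LOCKS:
        warnings.append(f"total locks: {total} of {MAX_TOTAL_LOCKS}")
    for path, count in sorted(per_file_counts.items()):
        if count >= warn_ratio * MAX_LOCKS_PER_FILE:
            warnings.append(f"{path}: {count} of {MAX_LOCKS_PER_FILE} locks")
    return warnings


# Example: 410 processes holding a lock on the shared active file.
for message in lock_warnings({"/var/mqm/qmgrs/QM1/active": 410}):
    print(message)
```

Running a check like this at intervals during a stress test gives an early signal before a hard limit is reached.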
Document Location
Worldwide
Product Synonym
WebSphere MQ;MQ
Document Information
Modified date:
09 October 2023
UID
ibm11125177