[UNIX][Linux]

Resource problems

How to determine and resolve problems related to IBM® MQ resources, including resource usage by IBM MQ processes, insufficient resources, and resource limit configurations.

Useful commands and the configuration file for investigating resource issues

Useful commands that display current values on your system or make a temporary change to the system:
ulimit -a
Display user limits
ulimit -Ha
Display user hard limits
ulimit -Sa
Display user soft limits
ulimit -<paramflag> <value>
Set a resource limit for the current shell and the processes started from it, where paramflag is the flag for the resource name, for example, s for stack.

To make permanent changes to the resource limits on your system, use /etc/security/limits.conf (on Linux) or /etc/security/limits (on AIX).

[Linux]You can obtain the current resource limit set for a process from the proc file system on Linux®. For example, cat /proc/<pid of MQ process>/limits.
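A minimal sketch for Linux that prints the process, stack, data and open-file limits in effect for one or more PIDs, as read from /proc/<pid>/limits. The function name show_limits is hypothetical. Pass it the PIDs of your IBM MQ processes, for example the output of pgrep amq.

```shell
#!/usr/bin/env bash
# Hypothetical helper (Linux only): print selected soft/hard limits for each
# PID given as an argument, as reported by the proc file system.
show_limits() {
  local pid
  for pid in "$@"; do
    if [ ! -r "/proc/$pid/limits" ]; then
      echo "PID $pid: /proc/$pid/limits not readable" >&2
      continue
    fi
    echo "PID $pid ($(cat "/proc/$pid/comm"))"
    # Keep the column header line plus the limits most often involved
    # in IBM MQ resource problems.
    grep -E '^(Limit|Max processes|Max stack size|Max data size|Max open files)' \
      "/proc/$pid/limits"
  done
}

# Usage:
#   show_limits $(pgrep amq)
```

Because a process inherits its limits when it starts, compare this output with ulimit -a for the user who started the queue manager.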

Basic checks before tuning IBM MQ or kernel parameters

You need to investigate the following:
  • Whether the number of active connections is within the expected limit.

    For example, suppose that your system is tuned to allow 2000 connections when the number of user processes is no greater than 3000. If the number of connections increases to more than 2000, then either the number of user processes has increased to more than 3000 (because new applications have been added), or there is a connection leak.

    To check for these problems, use the following commands:
    • [UNIX]Number of IBM MQ processes:
      ps -elf|egrep "amq|run"|wc -l
    • [Linux]Number of IBM MQ processes:
      ps -eLf|egrep "amq|run"|wc -l
    • [UNIX][Linux]Number of connections:
       echo "dis conn(*) all" | runmqsc <qmgr name>|grep EXTCONN|wc -l
    • [UNIX][Linux]Shared memory usage:
      ipcs -ma
    • [Solaris]Shared memory usage with project details:
      ipcs -mJ
  • If the number of connections is higher than the expected limit, check the source of the connections.
  • If the shared memory usage is very high, check the number of:
    • Topics
    • Open queue handles
  • From an IBM MQ perspective, the following resources need to be checked and tuned:
    • [Linux]Maximum number of threads allowed for a given number of user processes.
    • Data segment
    • Stack segment
    • File size
    • Open file handles
    • Shared memory limits
    • Thread limits, for example, threads-max on Linux
  • Use the mqconfig command to check the current resource usage.
Notes:
  1. Some of the resources listed in the preceding text need to be tuned at the user level, and some at the operating system level.
  2. The preceding list is not a complete list, but is sufficient for most common resource issues reported by IBM MQ.
  3. [Linux]Tuning is required at thread level, because each thread is a lightweight process (LWP).
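To check the source of the connections, you can group the output of the dis conn(*) all command by application tag. This is a minimal sketch. It assumes that the runmqsc output contains one APPLTAG(...) attribute for each connection. The function name conn_sources is hypothetical, and you can group by CONNAME or USERID in the same way.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "dis conn(*) all" output from runmqsc on stdin
# and print the number of connections per application tag, highest first.
conn_sources() {
  grep -o 'APPLTAG([^)]*)' |
    sed -e 's/^APPLTAG(//' -e 's/)$//' -e 's/ *$//' |
    sort | uniq -c | sort -rn
}

# Usage (QM1 is a placeholder queue manager name):
#   echo "dis conn(*) all" | runmqsc QM1 | conn_sources
```

An application tag whose count keeps growing while the workload stays the same is a likely candidate for a connection leak.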

Problem in creating threads or processes from IBM MQ or an application

Failure in xcsExecProgram and xcsCreateThread
Probe IDs, error messages, and components:
  • XY348010 from xtmStartTimerThread from an IBM MQ process (for example, amqzlaa0) or an application
  • XC037008 from xcsExecProgram with error code xecP_E_PROC_LIMIT from amqzxma0
  • XC035040 from xcsCreateThread
  • XC037007 from xcsExecProgram with error code xecP_E_NO_RESOURCE
  • xcsCreateThread fails with xecP_E_NO_RESOURCE, followed by a failure data capture (FDC), for example ZL000066 from zlaMain
  • Error messages reporting errno 11 from pthread_create, for example: AMQ6119S: An internal IBM MQ error has occurred ('11 - Resource temporarily unavailable' from pthread_create.)
The probe IDs might be different. Check for the error codes xecP_E_PROC_LIMIT and xecP_E_NO_RESOURCE.
[AIX][Linux]Resolving the problem on AIX® and Linux
EAGAIN
IBM MQ sets the error code xecP_E_PROC_LIMIT when pthread_create or fork fails with EAGAIN.
Check and increase the resource limits for the maximum number of user processes and for the stack size, using the ulimit command.
[Linux]Additional configuration required
Review and increase the limits for the kernel.pid_max (/proc/sys/kernel/pid_max) and kernel.threads-max (/proc/sys/kernel/threads-max) kernel parameters.
You need to increase the maximum user processes (nproc) and stack size resource limits for the mqm user, and for any other user that is used to start the queue manager and the IBM MQ applications.
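This is a minimal sketch for Linux only. It shows the kernel.pid_max and kernel.threads-max values and the nproc soft limit of the current shell, next to the number of threads currently in use. Each thread counts as a lightweight process, so the sketch counts threads (one ps -eLf line for each thread), not processes. The function name thread_headroom is hypothetical. Run it as the mqm user, or as whichever user starts the queue manager, so that the nproc value it reports applies to that user.

```shell
#!/usr/bin/env bash
# Hypothetical helper (Linux only): show thread-related kernel limits and the
# current user's nproc soft limit next to the number of threads in use.
thread_headroom() {
  local user pid_max threads_max total_threads user_threads nproc_soft
  user=$(id -un)
  pid_max=$(cat /proc/sys/kernel/pid_max)
  threads_max=$(cat /proc/sys/kernel/threads-max)
  # ps -eLf prints one line per thread (LWP); tail skips the header line.
  total_threads=$(ps -eLf | tail -n +2 | wc -l)
  # One line per thread owned by this user; lwp= suppresses the header.
  user_threads=$(ps -L -u "$user" -o lwp= | wc -l)
  # Soft nproc limit of this shell, which counts threads for this user.
  nproc_soft=$(ulimit -S -u)

  echo "kernel.pid_max: $pid_max"
  echo "kernel.threads-max: $threads_max"
  echo "threads (all users): $total_threads"
  echo "threads ($user): $user_threads"
  echo "nproc soft limit: $nproc_soft"
}

thread_headroom
```

If the thread count for the user is close to the nproc soft limit, raise the limit for that user. If the count for all users is close to kernel.threads-max or kernel.pid_max, raise the kernel parameter.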
ENOMEM
IBM MQ sets the error code xecP_E_NO_RESOURCE when pthread_create or fork fails with ENOMEM.
Check and increase the stack size and data resource limits.
Notes:
  • You can increase the user process resource limits by using the ulimit command, or by changing the resource limit configuration file.
  • The changes using the ulimit command are temporary. Modify /etc/security/limits or /etc/security/limits.conf to make the changes permanent. You must check the actual configuration on your operating system, as the configuration might be different.
  • Review your operating system documentation (for example, the man page for pthread_create) for more details on resource issues and tuning resource limits, and ensure that the resource limits are configured appropriately.
  • Also check whether the system is running short of memory or CPU resources.
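On Linux, you can get a quick view of memory and CPU pressure from /proc/meminfo and /proc/loadavg. This is a minimal sketch. It assumes a kernel that reports MemAvailable (Linux 3.14 or later). The function name resource_snapshot is hypothetical. Tools such as vmstat or top give a fuller picture.

```shell
#!/usr/bin/env bash
# Hypothetical helper (Linux only): print memory, swap, and load figures.
resource_snapshot() {
  local l1 l5 l15 rest
  # /proc/meminfo values are in kB; convert to MB for readability.
  # MemAvailable estimates memory usable by new processes without swapping.
  awk '/^(MemTotal|MemAvailable|SwapTotal|SwapFree):/ {printf "%-13s %10d MB\n", $1, $2 / 1024}' /proc/meminfo
  # The first three fields are the 1-, 5- and 15-minute load averages.
  read -r l1 l5 l15 rest < /proc/loadavg
  echo "Load average: $l1 $l5 $l15 (online CPUs: $(getconf _NPROCESSORS_ONLN))"
}

resource_snapshot
```

A load average well above the number of online CPUs, or MemAvailable close to zero, suggests that the system itself is short of resources, rather than a single user limit.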
[Solaris]Additional configuration required for ENOMEM and EAGAIN errors
Review and increase the stack (process.max-stack-size) and data resource limits for the project, using the projadd or projmod command.

Problems in creating shared memory

shmget fails with error number 28 (ENOSPC)
| Probe Id          :- XY132002                                               |
| Component         :- xstCreateExtent                                        |
| ProjectID         :- 0                                                      |
| Probe Description :- AMQ6119: An internal IBM MQ error has occurred         |
|   (Failed to get memory segment: shmget(0x00000000, 2547712) [rc=-1         |
|   errno=28] No space left on device)                                        |
| FDCSequenceNumber :- 0                                                      |
| Arith1            :- 18446744073709551615 (0xffffffffffffffff)              |
| Arith2            :- 28 (0x1c)                                              |
| Comment1          :- Failed to get memory segment: shmget(0x00000000,       |
|   2547712) [rc=-1 errno=28] No space left on device                         |
| Comment2          :- No space left on device                                |
+-----------------------------------------------------------------------------+
MQM Function Stack
ExecCtrlrMain
xcsAllocateMemBlock
xstExtendSet
xstCreateExtent
xcsFFST
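On Linux, the shmget(2) manual page gives two causes of ENOSPC. Either all shared memory identifiers are in use (kernel.shmmni), or the new segment would exceed the system-wide shared memory limit (kernel.shmall, counted in pages). This minimal sketch, for Linux only, prints both limits next to the current usage taken from /proc/sysvipc/shm. The function name shm_headroom is hypothetical. The size comparison is approximate, because the kernel rounds each segment up to whole pages.

```shell
#!/usr/bin/env bash
# Hypothetical helper (Linux only): compare System V shared memory usage with
# the kernel.shmmni and kernel.shmall limits that can make shmget fail with ENOSPC.
shm_headroom() {
  local shmmni shmall page_size
  shmmni=$(cat /proc/sys/kernel/shmmni)   # maximum number of segments
  shmall=$(cat /proc/sys/kernel/shmall)   # maximum total size, in pages
  page_size=$(getconf PAGE_SIZE)

  # /proc/sysvipc/shm has a header line, then one line per segment;
  # field 4 is the segment size in bytes. awk is used for the arithmetic
  # because shmall can be close to 2^64, which overflows shell arithmetic.
  awk -v mni="$shmmni" -v all="$shmall" -v pgsz="$page_size" '
    NR > 1 { segs++; bytes += $4 }
    END {
      printf "segments: %d of %d (kernel.shmmni)\n", segs, mni
      printf "size: %.0f of %.0f bytes (kernel.shmall x page size)\n", bytes, all * pgsz
    }' /proc/sysvipc/shm
}

shm_headroom
```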

shmget fails with error number 22 (EINVAL)
| Operating System  :- SunOS 5.10                                             |
| Probe Id          :- XY132002                                               |
| Application Name  :- MQM                                                    |
| Component         :- xstCreateExtent                                        |
| Program Name      :- amqzxma0                                               |
| Major Errorcode   :- xecP_E_NO_RESOURCE                                     |
| Probe Description :- AMQ6024: Insufficient resources are available to       |
|   complete a system request.                                                |
| FDCSequenceNumber :- 0                                                      |
| Arith1            :- 18446744073709551615 (0xffffffffffffffff)              |
| Arith2            :- 22 (0x16)                                              |
| Comment1          :- Failed to get memory segment: shmget(0x00000000,       |
|   9904128) [rc=-1 errno=22] Invalid argument                                |
| Comment2          :- Invalid argument                                       |
| Comment3          :- Configure kernel (for example, shmmax) to allow a      |
|   shared memory segment of at least 9904128 bytes                           |
+-----------------------------------------------------------------------------+
MQM Function Stack
ExecCtrlrMain
zxcCreateECResources
zutCreateConfig
xcsInitialize
xcsCreateSharedSubpool
xcsCreateSharedMemSet
xstCreateExtent
xcsFFST
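Comment3 in this FDC gives the smallest segment size that the kernel must allow. On Linux, the limit for a single segment is kernel.shmmax, in bytes. On Solaris 10, the corresponding control is the project.max-shm-memory resource control. This is a minimal sketch for Linux only, and the function name shmmax_allows is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (Linux only): succeed if kernel.shmmax allows a single
# shared memory segment of the given size in bytes, for example the size
# reported in an FDC Comment field.
shmmax_allows() {
  local required="$1" shmmax
  shmmax=$(cat /proc/sys/kernel/shmmax)
  # Compare in awk: on many systems shmmax is close to 2^64, which
  # overflows shell arithmetic.
  awk -v req="$required" -v max="$shmmax" 'BEGIN { exit !(req + 0 <= max + 0) }'
}

if shmmax_allows 9904128; then
  echo "kernel.shmmax allows 9904128 bytes"
else
  echo "increase kernel.shmmax to at least 9904128 bytes"
fi
```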

[Solaris]Resolving the problem on Solaris
You should:
  • Find the project ID associated with the IBM MQ processes and applications, and match it against the output of the projects -l command. You can find the project ID by using:
    • The ps command:
      ps -eo user,pid,uid,projid,args|egrep "mq|PROJID"
    • The ProjectID attribute in the failure data capture (FDC) header
    • The ipcs -J command
  • Increase the shared memory resource limit (project.max-shm-memory) for the project used by IBM MQ.
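If you have several FDC files, you can extract the ProjectID value from each one. This is a minimal sketch. It assumes the FDC header layout shown in the examples in this topic (| Field :- value |) and the default FDC directory /var/mqm/errors. The function name fdc_project_ids is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ProjectID value from each FDC file given,
# prefixed with the file name. Assumes header lines such as
#   | ProjectID         :- 0        |
fdc_project_ids() {
  local f id
  for f in "$@"; do
    id=$(sed -n 's/^| *ProjectID *:- *\([0-9][0-9]*\).*/\1/p' "$f" | head -n 1)
    echo "$f: ${id:-not found}"
  done
}

# Usage:
#   fdc_project_ids /var/mqm/errors/*.FDC
```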

Unexpected process termination and queue manager crash

Process ending unexpectedly, followed by FDCs from amqzxma0
Example FDC:
Date/Time         :- Mon May 02 2016 01:00:58 CEST
Host Name         :- test.ibm.com
LVLS              :- 8.0.0.4
Product Long Name :- IBM MQ for Linux (x86-64 platform)
Probe Id          :- XC723010
Component         :- xprChildTermHandler
Build Date        :- Oct 17 2015
Build Level       :- p800-004-151017
Program Name      :- amqzxma0
Addressing mode   :- 64-bit
Major Errorcode   :- xecP_E_USER_TERM
Minor Errorcode   :- OK
Probe Description :- AMQ6125: An internal IBM MQ error has occurred.
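To find the FDC files that match a given Probe Id or Component, such as XC723010 or an accompanying xehExceptionHandler FDC, you can search the FDC headers. This is a minimal sketch. It assumes the FDC files are in /var/mqm/errors, the default location, and that they use the Field :- value header layout shown in this topic, with or without the surrounding | characters. The function name find_fdcs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list FDC files whose header FIELD has the given VALUE.
#   $1  field name, for example "Probe Id" or "Component"
#   $2  value, for example XC723010 or xehExceptionHandler
#   $3  directory to search (defaults to /var/mqm/errors)
find_fdcs() {
  local field="$1" value="$2" dir="${3:-/var/mqm/errors}"
  # Header lines look like "| Component         :- xehExceptionHandler |";
  # the leading "|" is optional so that plain-text copies also match.
  grep -lE "^\|? *${field} +:- +${value}( |\$)" "$dir"/*.FDC 2>/dev/null
}

# Usage:
#   find_fdcs "Probe Id" XC723010
#   find_fdcs Component xehExceptionHandler
```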

Possible Causes and Solutions
  • Check whether a user has ended any IBM MQ process.
  • Check if the IBM MQ process ended because of a memory exception:
    • Did the process end with an FDC of Component :- xehExceptionHandler?
    • Apply any available fixes for known issues in this area.
  • Check if the operating system ended the process because of high memory usage by the process:
    • Has the IBM MQ process consumed a lot of memory?
    • Has the operating system ended the process?
      Review the operating system log, for example, for messages from the OOM killer on Linux:
      
      Jan 2 01:00:57 ibmtest kernel: amqrmppa invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
    • Apply any available fixes for known memory leak issues.
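This minimal sketch lists OOM-killer events that involve IBM MQ processes. It matches both the "invoked oom-killer" line and the "Killed process" line that names the process that was ended. The log location varies by distribution (for example, /var/log/messages, /var/log/syslog, or the output of dmesg or journalctl -k), so the function reads whatever files or stream you give it. The function name mq_oom_events and the amq/runmq process-name pattern are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print kernel OOM-killer lines that name an IBM MQ
# process (program names starting with amq or runmq). Reads the files given
# as arguments, or standard input if none are given.
mq_oom_events() {
  grep -E '(amq|runmq)[[:alnum:]]* invoked oom-killer|Kill(ed)? process [0-9]+ \((amq|runmq)' "$@"
}

# Usage:
#   mq_oom_events /var/log/messages
#   dmesg | mq_oom_events
```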

Difference in user limits used by a process against the configured limits

The user limits used by a process might be different from the configured limits. This is likely to happen if the process is started by a different user, or by a script, for example a user script or a high availability script. It is important that you check which user starts the queue manager, and set the appropriate resource limits for that user.
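One way to make sure that the queue manager runs with the intended limits, whichever script starts it, is to set the limits explicitly in the start script before strmqm. This is a minimal sketch. It assumes that the hard limits for the starting user are already configured correctly, for example in /etc/security/limits.conf. It raises each soft limit to the hard limit, which an unprivileged user is allowed to do. The function name raise_soft_to_hard and the selection of limits are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical start-script fragment: raise the soft limits that IBM MQ most
# often needs to this user's hard limits, before starting the queue manager.
raise_soft_to_hard() {
  local flag
  # -u: user processes (nproc), -n: open files, -s: stack, -d: data segment
  for flag in u n s d; do
    ulimit -S "-$flag" "$(ulimit -H "-$flag")" ||
      echo "could not raise soft limit -$flag" >&2
  done
}

# Usage in a start script (QM1 is a placeholder queue manager name):
#   raise_soft_to_hard
#   ulimit -a      # record the effective limits in the script's log
#   strmqm QM1
```

Limits set this way apply only to the shell that runs the script and to the processes it starts, which include the queue manager processes started by strmqm.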