Determining problems with applications, commands and messages on Linux

If you encounter problems with IBM® MQ applications, commands, and messages, there are a number of questions that you can consider to help you to determine the cause of the problem.

About this task

As you go through the list, make a note of anything that might be relevant to the problem. Even if your observations do not suggest a cause straight away, they might be useful later if you need to carry out a systematic problem determination exercise.

When you open a case with IBM, you can include additional IBM MQ troubleshooting information (MustGather data) that you have collected to help with investigating the problem. For more information, see Collecting troubleshooting information.

Procedure

  1. Are messages failing to arrive on the queue?
    If messages do not arrive when you are expecting them, check whether the message has been put on the queue successfully:
    • Has the queue been defined correctly? For example, is MAXMSGL sufficiently large?
    • Is the queue enabled for putting?
    • Is the queue already full?
    • Has another application got exclusive access to the queue?
    Also check whether you are able to get any messages from the queue:
    • Do you need to take a sync point? If messages are being put or retrieved within sync point, they are not available to other tasks until the unit of recovery has been committed.
    • Is your wait interval long enough? You can set the wait interval as an option for the MQGET call. Ensure that you are waiting long enough for a response.
    • Are you waiting for a specific message that is identified by a message or correlation identifier (MsgId or CorrelId)? Check that you are waiting for a message with the correct MsgId or CorrelId. A successful MQGET call sets both of these fields to the values of the message retrieved, so you might need to reset them before you can get another message successfully. Also, check whether you can get other messages from the queue.
    • Can other applications get messages from the queue?
    • Was the message you are expecting defined as persistent? If not, and IBM MQ has been restarted, the message has been lost.
    • Has another application got exclusive access to the queue?
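
    The MsgId and CorrelId behavior described in the list above can be illustrated with a small in-memory model. This is a sketch only and does not call IBM MQ: the Descriptor class and the model_get and reset_match_fields functions are hypothetical names, and the model assumes the default matching behavior, in which both identifiers are compared unless they are set to 24 null bytes (MQMI_NONE and MQCI_NONE).

```python
# Simplified in-memory model of MQGET matching; this is not the IBM MQ API.
from dataclasses import dataclass

MQMI_NONE = bytes(24)  # 24 null bytes: match any MsgId
MQCI_NONE = bytes(24)  # 24 null bytes: match any CorrelId


@dataclass
class Descriptor:
    """Hypothetical stand-in for the MsgId and CorrelId fields of an MQMD."""
    msg_id: bytes = MQMI_NONE
    correl_id: bytes = MQCI_NONE


def model_get(queue, md):
    """Remove and return the first message that matches md, or None.

    As with a real MQGET, a successful call overwrites md.msg_id and
    md.correl_id with the values of the message that was retrieved.
    """
    for i, (msg_id, correl_id, body) in enumerate(queue):
        id_ok = md.msg_id == MQMI_NONE or md.msg_id == msg_id
        correl_ok = md.correl_id == MQCI_NONE or md.correl_id == correl_id
        if id_ok and correl_ok:
            del queue[i]
            md.msg_id, md.correl_id = msg_id, correl_id
            return body
    return None


def reset_match_fields(md):
    """Reset both identifiers so that the next get matches any message."""
    md.msg_id = MQMI_NONE
    md.correl_id = MQCI_NONE


queue = [
    (b"A" * 24, b"X" * 24, "first"),
    (b"B" * 24, b"Y" * 24, "second"),
]
md = Descriptor()
first = model_get(queue, md)    # matches any message
stuck = model_get(queue, md)    # md still holds the first message's MsgId
reset_match_fields(md)
second = model_get(queue, md)   # matches the remaining message
print(first, stuck, second)
```

    In this model, the second call finds nothing even though a message is still on the queue, because the descriptor still carries the identifiers of the first message. Resetting the fields before the next call lets the remaining message be retrieved.
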
    If you cannot find anything wrong with the queue, and IBM MQ is running, check the process that you expected to put the message onto the queue for the following:
    • Did the application start? If it should have been triggered, check that the correct trigger options were specified.
    • Did the application stop?
    • Is a trigger monitor running?
    • Was the trigger process defined correctly?
    • Did the application complete correctly? Look for evidence of an abnormal end in the job log.
    • Did the application commit its changes, or were they backed out?

    If multiple transactions are serving the queue, they can conflict with one another. For example, suppose one transaction issues an MQGET call with a buffer length of zero to find out the length of the message, and then issues a specific MQGET call specifying the MsgId of that message. If another transaction issues a successful MQGET call for that message in the meantime, the first transaction receives a reason code of MQRC_NO_MSG_AVAILABLE. Applications that are expected to run in a multiple server environment must be designed to cope with this situation.
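
    One way to cope with this situation is to treat MQRC_NO_MSG_AVAILABLE on the second call as a signal to look at the queue again rather than as a failure. The following sketch models that pattern in memory and does not call IBM MQ. The ModelQueue class and the serve_one function are hypothetical names, and the interfering_get parameter exists only to simulate a competing server.

```python
# Simplified in-memory model; the names are illustrative, not the IBM MQ API.
MQRC_NONE = 0
MQRC_NO_MSG_AVAILABLE = 2033


class ModelQueue:
    def __init__(self):
        self._messages = {}  # msg_id -> body, kept in insertion order

    def put(self, msg_id, body):
        self._messages[msg_id] = body

    def peek_first(self):
        """Model of the first MQGET: report the id and length only."""
        for msg_id, body in self._messages.items():
            return msg_id, len(body)
        return None, 0

    def get_by_id(self, msg_id):
        """Model of the MQGET for a specific MsgId: (body, reason)."""
        if msg_id in self._messages:
            return self._messages.pop(msg_id), MQRC_NONE
        return None, MQRC_NO_MSG_AVAILABLE


def serve_one(queue, interfering_get=None):
    """Peek, then get by MsgId; if another server wins the race, look again."""
    attempts = 0
    while True:
        attempts += 1
        msg_id, _length = queue.peek_first()
        if msg_id is None:
            return None, attempts          # the queue is empty
        if interfering_get is not None:
            interfering_get(msg_id)        # a competing server gets it first
            interfering_get = None
        body, reason = queue.get_by_id(msg_id)
        if reason == MQRC_NONE:
            return body, attempts
        # MQRC_NO_MSG_AVAILABLE: the message has gone, so start again


q = ModelQueue()
q.put("m1", b"order-1")
q.put("m2", b"order-22")
result = serve_one(q, interfering_get=q.get_by_id)
print(result)
```

    In this run the competing server removes the first message between the two calls, so the server loops once and processes the second message instead.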

    Consider that the message could have been received, but that your application failed to process it in some way. For example, did an error in the expected format of the message cause your program to reject it? If so, refer to the subsequent information in this topic.

  2. Do messages contain unexpected or corrupted information?
    If the information contained in the message is not what your application was expecting, or has been corrupted in some way, consider the following:
    • Has your application, or the application that put the message onto the queue, changed? Ensure that all changes are simultaneously reflected on all systems that need to be aware of the change. For example, the format of the message data might have been changed, in which case, both applications must be recompiled to pick up the changes. If one application has not been recompiled, the data will appear corrupted to the other.
    • Is an application sending messages to the wrong queue? Check that the messages your application is receiving are not intended for an application servicing a different queue. If necessary, change your security definitions to prevent unauthorized applications from putting messages on to the wrong queues. If your application uses an alias queue, check that the alias points to the correct queue.
    • Has the trigger information been specified correctly for this queue? Check whether your application should have started, or whether a different application should have started instead.

    If these checks do not enable you to solve the problem, check your application logic, both for the program sending the message, and for the program receiving it.

  3. Are unexpected messages received when using distributed queues?
    If your application uses distributed queues, consider the following points:
    • Has IBM MQ been correctly installed on both the sending and receiving systems, and correctly configured for distributed queuing?
    • Are the links available between the two systems? Check that both systems are available, and connected to IBM MQ. Check that the connection between the two systems is active. You can use the MQSC command PING against either the queue manager (PING QMGR) or the channel (PING CHANNEL) to verify that the link is operable.
    • Is triggering set on in the sending system?
    • Is the message for which you are waiting a reply message from a remote system? Check that triggering is activated in the remote system.
    • Is the queue already full? If so, check if the message has been put onto the dead-letter queue. The dead-letter queue header contains a reason or feedback code explaining why the message could not be put onto the target queue. For more information, see Using the dead-letter (undelivered message) queue and MQDLH - Dead-letter header.
    • Is there a mismatch between the sending and receiving queue managers? For example, the message length could be longer than the receiving queue manager can handle.
    • Are the channel definitions of the sending and receiving channels compatible? For example, a mismatch in sequence number wrap can stop the distributed queuing component. For more information, see Distributed queuing and clusters.
    • Is data conversion involved? If the data formats between the sending and receiving applications differ, data conversion is necessary. Automatic conversion occurs when the MQGET call is issued if the format is recognized as one of the built-in formats. If the data format is not recognized for conversion, the data conversion exit is taken to allow you to perform the translation with your own routines. For more information, see Data conversion.
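
    To show where the reason code sits in a dead-letter header, the following sketch unpacks the fields of a version 1 MQDLH from the start of a message body. It is a minimal illustration under stated assumptions: integers are taken to be little-endian, character fields are decoded as ASCII, the parse_dlh function and the REASON_NAMES table are hypothetical, and the sample header is built by hand for the example rather than read from a queue.

```python
import struct

# MQDLH version 1 layout (172 bytes). "<" assumes little-endian integers,
# which is an assumption about the encoding of the message.
MQDLH_FORMAT = "<4sii48s48sii8si28s8s8s"
MQDLH_LENGTH = struct.calcsize(MQDLH_FORMAT)

# A few reason codes only; the complete list is in the product documentation.
REASON_NAMES = {
    2030: "MQRC_MSG_TOO_BIG_FOR_Q",
    2051: "MQRC_PUT_INHIBITED",
    2053: "MQRC_Q_FULL",
    2085: "MQRC_UNKNOWN_OBJECT_NAME",
}


def _text(raw):
    """Decode a fixed-width character field and drop its padding."""
    return raw.decode("ascii", errors="replace").rstrip(" \x00")


def parse_dlh(data):
    """Return selected MQDLH fields from the start of a message body."""
    if len(data) < MQDLH_LENGTH:
        raise ValueError("message is too short to contain an MQDLH")
    (struc_id, version, reason, dest_q, dest_qmgr, _encoding, _ccsid,
     fmt, _appl_type, put_appl, _put_date, _put_time) = struct.unpack_from(
        MQDLH_FORMAT, data)
    if struc_id != b"DLH ":
        raise ValueError("message does not start with an MQDLH")
    return {
        "version": version,
        "reason": reason,
        "reason_name": REASON_NAMES.get(reason, "not in this table"),
        "dest_queue": _text(dest_q),
        "dest_qmgr": _text(dest_qmgr),
        "format": _text(fmt),
        "put_appl_name": _text(put_appl),
    }


# Hand-built sample header followed by the original message data.
sample = struct.pack(
    MQDLH_FORMAT, b"DLH ", 1, 2053,
    b"APP.QUEUE".ljust(48), b"QM1".ljust(48),
    546, 1208, b"MQSTR".ljust(8), 6,
    b"sampleapp".ljust(28), b"20240101", b"12000000",
) + b"original message data"

parsed = parse_dlh(sample)
print(parsed["reason"], parsed["reason_name"], parsed["dest_queue"])
```

    In the sample, reason code 2053 (MQRC_Q_FULL) indicates that the message was placed on the dead-letter queue because the destination queue was full.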

    If you are unable to solve the problem, contact IBM Support for help.

  4. Have you received no response from a PCF command?
    If you have issued a command but have not received a response, consider the following checks:
    • Is the command server running? Use the dspmqcsv command to check the status of the command server. If the response to this command indicates that the command server is not running, use the strmqcsv command to start it. If the response to the command indicates that the SYSTEM.ADMIN.COMMAND.QUEUE is not enabled for MQGET requests, enable the queue for MQGET requests.
    • Has a reply been sent to the dead-letter queue? The dead-letter queue header structure contains a reason or feedback code describing the problem. For more information, see MQDLH - Dead-letter header and Using the dead-letter (undelivered message) queue. If the dead-letter queue contains messages, you can use the provided browse sample application (amqsbcg) to browse the messages using the MQGET call. The sample application steps through all the messages on a named queue for a named queue manager, displaying both the message descriptor and the message context fields for all the messages on the named queue.
    • Has a message been sent to the error log? For more information, see Error log directories on AIX, Linux, and Windows.
    • Are the queues enabled for put and get operations?
    • Is the WaitInterval long enough? If your MQGET call has timed out, a completion code of MQCC_FAILED and a reason code of MQRC_NO_MSG_AVAILABLE are returned. See WaitInterval (MQLONG) for information about the WaitInterval field, and completion and reason codes from MQGET.
    • If you are using your own application to put commands onto the SYSTEM.ADMIN.COMMAND.QUEUE, do you need to take a sync point? Unless you have excluded your request message from sync point, you need to take a sync point before receiving reply messages.
    • Are the MAXDEPTH and MAXMSGL attributes of your queues set sufficiently high?
    • Are you using the CorrelId and MsgId fields correctly? Set the values of MsgId and CorrelId in your application to ensure that you receive all messages from the queue.
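
    When an application waits for a reply from the command server, it helps to separate a timeout from other failures so that the checks above are applied in the right order. The following sketch maps a completion code and reason code to a suggested next check. The next_step function and its wording are hypothetical; the numeric constants are the standard MQI values.

```python
MQCC_OK = 0
MQCC_WARNING = 1
MQCC_FAILED = 2

MQRC_NONE = 0
MQRC_GET_INHIBITED = 2016
MQRC_NO_MSG_AVAILABLE = 2033


def next_step(comp_code, reason):
    """Suggest what to check next after an MQGET on the reply queue."""
    if comp_code == MQCC_OK:
        return "reply received"
    if comp_code == MQCC_FAILED and reason == MQRC_NO_MSG_AVAILABLE:
        # The wait interval expired without a matching message arriving.
        return ("timed out: check the command server, WaitInterval, "
                "sync point, and the MsgId and CorrelId values")
    if comp_code == MQCC_FAILED and reason == MQRC_GET_INHIBITED:
        return "reply queue is not enabled for get operations"
    return f"unexpected result: completion code {comp_code}, reason {reason}"


print(next_step(MQCC_FAILED, MQRC_NO_MSG_AVAILABLE))
```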

    Try stopping the command server and then restarting it, responding to any error messages that are produced. If the system still does not respond, the problem could be with either a queue manager or the whole of the IBM MQ system. First, try stopping individual queue managers to isolate a failing queue manager. If this step does not reveal the problem, try stopping and restarting IBM MQ, responding to any messages that are produced in the error log. If the problem still occurs after restart, contact IBM Support for help.

  5. Are only some of your queues failing?
    If you suspect that the problem occurs with only a subset of queues, check the local queues that you think are having problems.

    Use the MQSC command DISPLAY QUEUE to display the information about each queue. If the CURDEPTH is at MAXDEPTH, the queue is not being processed. Check that all applications are running normally.

    If the CURDEPTH is not at MAXDEPTH, check the following queue attributes to ensure that they are correct:
    • If triggering is being used, is the trigger monitor running? Is the trigger depth too great? That is, does it generate a trigger event often enough? Is the process name correct? Is the process available and operational?
    • Can the queue be shared? If not, another application could already have it open for input.
    • Is the queue enabled appropriately for GET and PUT?

    If there are no application processes getting messages from the queue, determine why this is so. It could be because the applications need to be started, a connection has been disrupted, or the MQOPEN call has failed for some reason. Check the queue attributes IPPROCS and OPPROCS. These attributes indicate whether the queue has been opened for input and output. If a value is zero, it indicates that no operations of that type can occur. The values might have changed, or the queue might have been open but is now closed.
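
    The attribute checks in this step can be sketched as a small script that reads the text produced by DISPLAY QUEUE and reports anything that needs attention. This is an illustration under assumptions: the sample output is made up for the example, the parse_attributes and diagnose functions are hypothetical, and only attributes displayed in the NAME(VALUE) form are examined.

```python
import re

# Matches attributes that MQSC displays in the form NAME(VALUE).
ATTRIBUTE = re.compile(r"(\w+)\(([^)]*)\)")


def parse_attributes(mqsc_output):
    """Collect NAME(VALUE) pairs from DISPLAY QUEUE output into a dict."""
    return {name: value.strip()
            for name, value in ATTRIBUTE.findall(mqsc_output)}


def diagnose(attrs):
    """Return a list of findings for the checks described in this step."""
    findings = []
    if int(attrs["CURDEPTH"]) >= int(attrs["MAXDEPTH"]):
        findings.append("queue is full: messages are not being processed")
    if attrs.get("IPPROCS") == "0":
        findings.append("no application has the queue open for input")
    if attrs.get("OPPROCS") == "0":
        findings.append("no application has the queue open for output")
    if attrs.get("GET") == "DISABLED":
        findings.append("queue is not enabled for get operations")
    if attrs.get("PUT") == "DISABLED":
        findings.append("queue is not enabled for put operations")
    return findings


# Made-up sample output for a hypothetical queue named APP.QUEUE.
sample_output = """
AMQ8409I: Display Queue details.
   QUEUE(APP.QUEUE)                        TYPE(QLOCAL)
   CURDEPTH(5000)                          GET(ENABLED)
   IPPROCS(0)                              MAXDEPTH(5000)
   OPPROCS(2)                              PUT(ENABLED)
"""

attrs = parse_attributes(sample_output)
findings = diagnose(attrs)
for finding in findings:
    print(finding)
```

    For the sample, the script reports that the queue is full and that no application has it open for input, which points to a getting application that is not running.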

    Check the status at the time you expect to put or get a message.

    If you are unable to solve the problem, contact IBM Support for help.

  6. Does the problem affect only remote queues?
    If the problem affects only remote queues, perform the following checks:
    • Check that required channels have started, can be triggered, and any required initiators are running.
    • Check that the programs that should be putting messages to the remote queues have not reported problems.
    • If you use triggering to start the distributed queuing process, check that the transmission queue has triggering set on. Also, check that the trigger monitor is running.
    • Check the error logs for messages indicating channel errors or problems.
    • If necessary, start the channel manually.

  7. Is your application or system running slowly?
    If your application is running slowly, it might be in a loop, or waiting for a resource that is not available, or there might be a performance problem.

    Perhaps your system is operating near the limits of its capacity. This type of problem is probably worst at peak system load times, typically at mid-morning and mid-afternoon. (If your network extends across more than one time zone, peak system load might seem to occur at some other time.)

    A performance problem might be caused by a limitation of your hardware.

    If you find that performance degradation is not dependent on system loading, but happens sometimes when the system is lightly loaded, a poorly designed application program is probably to blame. This could appear to be a problem that only occurs when certain queues are accessed.

    A common cause of slow application performance, or of a buildup of messages on a queue (usually a transmission queue), is one or more applications that write persistent messages outside a unit of work. For more information, see Message persistence.

    If the performance issue persists, the problem might lie with IBM MQ itself. If you suspect this, contact IBM Support for help.