Identifying characteristics of the problem on Linux

Consider these initial questions to help you identify the cause of the problem.

About this task

Use the following questions as pointers to help you to identify the cause of the problem:

Has IBM® MQ run successfully before?
Have any changes been made since the last successful run?
Have you applied any maintenance updates?
Has the application run successfully before?
Are you receiving errors when you use special characters in descriptive text for some commands?
Are there any error messages or return codes to help you to determine the location and cause of your problem?
Can you reproduce the problem?
Does the problem affect specific parts of the network?
Does the problem occur at specific times of the day?
Is the problem intermittent?

As you go through the list, make a note of anything that might be relevant to the problem. Even if your observations do not suggest a cause straight away, they might be useful later if you need to carry out a systematic problem determination exercise.

When you open a case with IBM, you can include additional IBM MQ troubleshooting information (MustGather data) that you have collected to help with investigating the problem. For more information, see Collecting troubleshooting information.

Procedure

Has IBM MQ run successfully before?
If IBM MQ has not run successfully before, it is likely that you have not yet set it up correctly. For more information, see IBM MQ installation overview and Installing and uninstalling IBM MQ on Linux.
To run the verification procedure, see Verifying an IBM MQ installation on Linux. Also look at Configuring IBM MQ for information about post-installation configuration of IBM MQ.
Have any changes been made since the last successful run?
Changes to your IBM MQ configuration, or to other applications that interact with IBM MQ, could be the cause of your problem.
When you are considering changes that might recently have been made, think about the IBM MQ system, and also about the other programs it interfaces with, the hardware, and any new applications. Consider also the possibility that a new application that you are not aware of might have been run on the system.
- Have you changed, added, or deleted any queue definitions?
- Have you changed or added any channel definitions? Changes might have been made to either IBM MQ channel definitions or any underlying communications definitions required by your application.
- Do your applications deal with return codes that they might get as a result of any changes you have made?
- Have you changed any component of the operating system that might affect the operation of IBM MQ?
Have you applied any maintenance updates?
If you have applied a maintenance update to IBM MQ, check that the update action completed successfully and that no error message was produced.
- Did the update have any special instructions?
- Was any test run to verify that the update was applied correctly and completely?
- Does the problem still exist if IBM MQ is restored to the previous maintenance level?
- If the installation was successful, check with IBM Support for any maintenance package errors.
- If a maintenance package has been applied to any other application, consider the effect it might have on the way IBM MQ interfaces with it.
Has the application run successfully before?
If the problem appears to involve one particular application, consider whether the application has run successfully before:
- Have any changes been made to the application since it last ran successfully?
  If so, it is likely that the error lies somewhere in the new or modified part of the application. Review the changes and see if you can find an obvious reason for the problem. Is it possible to retry using an earlier level of the application?
- Have all the functions of the application been fully exercised before?
  Could it be that the problem occurred when part of the application that had never been invoked before was used for the first time? If so, it is likely that the error lies in that part of the application. Try to find out what the application was doing when it failed, and check the source code in that part of the program for errors. If a program has run successfully on many previous occasions, check the current queue status and the files that were being processed when the error occurred. They might contain an unusual data value that invokes a rarely used path in the program.
- Does the application check all return codes?
  Has your IBM MQ system been changed, perhaps in a minor way, such that your application now receives return codes that it does not check? For example, does your application assume that the queues it accesses can be shared? If a queue has been redefined as exclusive, can your application deal with return codes indicating that it can no longer access that queue?
- Does the application run on other IBM MQ systems?
  Could it be that there is something different about the way that this IBM MQ system is set up that is causing the problem? For example, have the queues been defined with the same message length or priority?
Before you look at the code, and depending on which programming language the code is written in, examine the output from the translator, or the compiler and linkage editor, to see if any errors have been reported. If your application fails to translate, compile, or link-edit into the load library, it will also fail to run if you attempt to invoke it. For information about building your application, see Developing applications.
If the documentation shows that each of these steps was accomplished without error, consider the coding logic of the application. Do the symptoms of the problem indicate the function that is failing and, therefore, the piece of code in error? The errors in the following list illustrate the most common causes of problems encountered while running IBM MQ programs. Consider the possibility that the problem with your IBM MQ system could be caused by one or more of these errors:
- Assuming that queues can be shared, when they are in fact exclusive.
- Passing incorrect parameters in an MQI call.
- Passing insufficient parameters in an MQI call. This might mean that IBM MQ cannot set up completion and reason codes for your application to process.
- Failing to check return codes from MQI requests.
- Passing variables with incorrect lengths specified.
- Passing parameters in the wrong order.
- Failing to initialize MsgId and CorrelId correctly.
- Failing to initialize Encoding and CodedCharSetId following MQRC_TRUNCATED_MSG_ACCEPTED.
Are you receiving errors when you use special characters in descriptive text for some commands?
Some characters, for example the backslash (\) and the double quote ("), have special meanings when used with commands.
Precede special characters with a \, that is, enter \\ or \" if you want \ or " in your text. Not all characters are allowed to be used with commands. For more information about characters with special meanings and how to use them, see Characters with special meanings.
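The escaping rule above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `escape_description` is hypothetical, it handles just the two characters named in the text, and it does not account for any additional quoting that a shell might require.

```python
def escape_description(text: str) -> str:
    """Precede each backslash and double quote with a backslash.

    Hypothetical helper: covers only the two characters named in the text.
    """
    # Escape backslashes first, so that the backslashes added in front of
    # double quotes are not themselves doubled by the second replacement.
    return text.replace("\\", "\\\\").replace('"', '\\"')


if __name__ == "__main__":
    print(escape_description('Queue for "test" data in C:\\tmp'))
```

The order of the two replacements matters: escaping the double quotes first would leave their new backslashes to be doubled by the second pass.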
Are there any error messages or return codes to help you to determine the location and cause of your problem?
IBM MQ uses error logs to capture messages concerning its own operation, any queue managers that you start, and error data coming from the channels that are in use. Check the error logs to see if any messages have been recorded that are associated with your problem. For information about the locations and contents of the error logs, see Error logs on AIX, Linux, and Windows.
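When an error log is long, a quick tally of message identifiers can show which messages dominate. The following Python sketch assumes that identifiers take the form AMQ followed by four digits and an optional severity letter; the function name and the sample text are hypothetical placeholders, not real log content.

```python
import re
from collections import Counter

# Assumption: identifiers look like "AMQ" + four digits + optional
# severity letter, for example AMQ9999E. Adjust if your logs differ.
AMQ_ID = re.compile(r"\bAMQ\d{4}[A-Z]?\b")


def count_message_ids(log_text: str) -> Counter:
    """Count how often each message identifier appears in the text."""
    return Counter(AMQ_ID.findall(log_text))


if __name__ == "__main__":
    # Placeholder lines for illustration; real log entries look different.
    sample = "AMQ9999E: placeholder\nAMQ9208E: placeholder\nAMQ9999E: placeholder\n"
    for message_id, count in count_message_ids(sample).most_common():
        print(message_id, count)
```

To apply this to a real log, read the file contents into a string first and pass that string to `count_message_ids`.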
For each IBM MQ Message Queue Interface (MQI) and IBM MQ Administration Interface (MQAI) call, the queue manager or an exit routine returns a completion code and a reason code to indicate the success or failure of the call. If your application gets a return code indicating that an MQI call has failed, check the reason code to find out more about the problem. For a list of reason codes, see API completion and reason codes. The description of each MQI call contains detailed information about the return codes for that call.
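As an illustration of reading the two codes together, the following Python sketch turns a completion code and reason code pair into a short diagnostic string. The reason code table is a small, deliberately incomplete subset chosen for the example, and the `describe` function is a hypothetical helper, not part of any IBM MQ API.

```python
# Completion codes as defined by the MQI.
MQCC_OK, MQCC_WARNING, MQCC_FAILED = 0, 1, 2

COMPLETION = {MQCC_OK: "ok", MQCC_WARNING: "warning", MQCC_FAILED: "failed"}

# Deliberately incomplete subset of reason codes, for illustration only.
REASONS = {
    0: "MQRC_NONE",
    2033: "MQRC_NO_MSG_AVAILABLE",
    2035: "MQRC_NOT_AUTHORIZED",
    2042: "MQRC_OBJECT_IN_USE",
    2079: "MQRC_TRUNCATED_MSG_ACCEPTED",
    2085: "MQRC_UNKNOWN_OBJECT_NAME",
}


def describe(comp_code: int, reason: int) -> str:
    """Return a one-line description of a completion/reason code pair."""
    status = COMPLETION.get(comp_code, "unknown completion code")
    name = REASONS.get(reason, "unlisted reason code")
    return f"{status}: {name} ({reason})"


if __name__ == "__main__":
    print(describe(MQCC_FAILED, 2085))
```

Logging a line like this after every call, including calls that return a warning, makes it harder to overlook a code that the application does not otherwise handle.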
Can you reproduce the problem?
If you can reproduce the problem, consider the conditions under which it is reproduced:
- Is it caused by a command or an equivalent administration request? Does the operation work if it is entered by another method? If the command works when it is entered on the command line, but not otherwise, check that the command server has not stopped, and that the queue definition of the SYSTEM.ADMIN.COMMAND.QUEUE has not been changed.
- Is it caused by a program? Does it fail on all IBM MQ systems and all queue managers, or only on some?
- Can you identify any application that always seems to be running in the system when the problem occurs? If so, examine the application to see if it is in error.
Does the problem affect specific parts of the network?
Have you made any network-related changes, or changed any IBM MQ definitions, that might account for the problem?
You might be able to identify specific parts of the network that are affected by the problem (for example, remote queues). If the link to a remote queue manager is not working, messages cannot flow to the remote queues on that queue manager.
- Check that the connection between the two systems is available, and that the intercommunication component of IBM MQ has started.
- Check that messages are reaching the transmission queue, and check the local queue definition of the transmission queue and any remote queues.
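One way to check whether messages are accumulating on a transmission queue is to look at its current depth. The following Python sketch assumes that you have captured the output of an MQSC command such as DISPLAY QLOCAL(MY.XMITQ) CURDEPTH as text, and that attributes appear in the NAME(value) form. The queue name MY.XMITQ, the sample text, and the `parse_attributes` helper are all hypothetical.

```python
import re

# Assumption: attributes are printed as NAME(value) pairs.
ATTRIBUTE = re.compile(r"\b([A-Z][A-Z0-9]*)\(([^)]*)\)")


def parse_attributes(display_output: str) -> dict:
    """Collect NAME(value) pairs from captured MQSC display output."""
    return {name: value for name, value in ATTRIBUTE.findall(display_output)}


if __name__ == "__main__":
    # Illustrative text only; real output contains more lines and attributes.
    captured = "   QUEUE(MY.XMITQ)      TYPE(QLOCAL)\n   CURDEPTH(12)\n"
    attrs = parse_attributes(captured)
    print(attrs["QUEUE"], "depth:", int(attrs["CURDEPTH"]))
```

A depth that stays above zero or keeps growing suggests that messages are reaching the transmission queue but are not being sent onward. A depth of zero alone does not distinguish between messages that were sent and messages that never arrived.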
Does the problem occur at specific times of the day?
If the problem occurs at specific times of day, it might depend on system load. Typically, peak system loading is at mid-morning and mid-afternoon, so these are the times when load-dependent problems are most likely to occur.
If your IBM MQ network extends across more than one time zone, peak system loading might seem to occur at some other time of day.
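To see whether failures cluster at particular times, you can group the times at which the problem was observed by hour. The following Python sketch assumes that you have recorded each occurrence as an ISO 8601 timestamp that includes a UTC offset; it converts every timestamp to UTC so that systems in different time zones are compared on the same clock. The function name and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def failures_by_utc_hour(timestamps):
    """Count occurrences per hour of the day, normalized to UTC.

    Assumption: each timestamp carries an explicit offset such as +01:00.
    """
    hours = Counter()
    for stamp in timestamps:
        moment = datetime.fromisoformat(stamp)
        hours[moment.astimezone(timezone.utc).hour] += 1
    return hours


if __name__ == "__main__":
    observed = [
        "2024-03-05T10:15:00+01:00",
        "2024-03-05T04:40:00-05:00",
        "2024-03-05T15:05:00+00:00",
    ]
    for hour, count in sorted(failures_by_utc_hour(observed).items()):
        print(f"{hour:02d}:00 UTC  {count}")
```

In this sample, the first two occurrences fall in the same UTC hour even though their local clock times differ by several hours, which is the effect described above for networks that span time zones.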
Is the problem intermittent?
An intermittent problem could be caused by the way that processes can run independently of each other. For example, a program might issue an MQGET call without specifying a wait option before an earlier process has completed. An intermittent problem might also be seen if your application tries to get a message from a queue before the call that put the message has been committed.
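The timing dependency described above can be illustrated with an analogy that uses Python's standard `queue` and `threading` modules. This is not MQI code: the in-memory queue stands in for an IBM MQ queue, and the event stands in for the earlier process completing its work. In the MQI, the corresponding choice is whether the get request specifies the MQGMO_WAIT option with a suitable WaitInterval.

```python
import queue
import threading

q = queue.Queue()
producer_may_finish = threading.Event()


def producer():
    # Stand-in for an earlier process that has not yet completed its put.
    producer_may_finish.wait()
    q.put("order-1")


def get_without_wait():
    """Return a message if one is already available, otherwise None."""
    try:
        return q.get_nowait()
    except queue.Empty:
        return None


def get_with_wait(timeout_seconds):
    """Wait up to timeout_seconds for a message, otherwise return None."""
    try:
        return q.get(timeout=timeout_seconds)
    except queue.Empty:
        return None


worker = threading.Thread(target=producer)
worker.start()
first = get_without_wait()    # the message has not been put yet
producer_may_finish.set()     # let the earlier process complete
second = get_with_wait(5.0)   # waiting gives the producer time to finish
worker.join()
print(first, second)
```

Whether the non-waiting get succeeds in a real system depends on how the two processes happen to be scheduled, which is why this kind of failure appears only some of the time.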