Identifying characteristics of the problem on z/OS
Some initial questions to consider to help with identifying the cause of the problem.
About this task
- Has IBM® MQ for z/OS® run successfully before?
- Are there any error messages, return codes or other error conditions?
- Can you reproduce the problem?
- Have you applied any APARs or PTFs?
- Have any changes been made since the last successful run?
- Has the application run successfully before?
- Does the problem affect specific parts of the network?
- Does the problem occur at specific times of the day or affect specific users?
- Is the problem intermittent or does the problem occur with all z/OS, CICS®, or IMS systems?
- Do you have a program error?
As you go through the list, make a note of anything that might be relevant to the problem. Even if your observations do not suggest a cause straight away, they might be useful later if you need to carry out a systematic problem determination exercise.
Procedure
-
Has IBM MQ for z/OS run successfully before?
If the answer to this question is No, consider the following:
- Check your setup. If IBM MQ has not run successfully on z/OS before, it is likely that you have not yet set it up correctly. See the information about installing and customizing the queue manager in Installing the IBM MQ for z/OS product for further guidance.
- Verify the installation.
- Check that message CSQ9022I was issued in response to the START QMGR command (indicating normal completion).
- Ensure that z/OS displays IBM MQ as an installed subsystem. To determine if IBM MQ is an installed subsystem use the z/OS command
D OPDATA. - Check that the installation verification program (IVP) ran successfully.
- Use the command DISPLAY DQM to check that the channel initiator address space is running, and that the appropriate listeners are started.
- Are there any error messages, return codes or other error
conditions? Investigate any error messages, return codes, and conditions where the queue manager or channel initiator terminated. The problem might produce the following types of error message or return codes:
- CSQ messages and reason codes
IBM MQ for z/OS error messages have the prefix CSQ. If you receive any messages with this prefix (for example, in the console log, or the CICS log), see IBM MQ for z/OS messages, completion, and reason codes for an explanation.
- Other messages
For messages with a different prefix, look in the appropriate messages and codes topic for a suggested course of action.
- Unusual messages
Be aware of unusual messages associated with the startup of IBM MQ for z/OS, or issued while the system was running before the error occurred. Any unusual messages might indicate some system problem that prevented your application from running successfully.
- Application MQI return codes
If your application gets a return code indicating that an MQI call has failed, see Return codes for a description of that return code.
- CSQ messages and reason codes
- Can you reproduce the problem? If you can reproduce the problem, consider the conditions under which you can reproduce it. For example:
- Is it caused by a command? If so, is the command issued from the z/OS console, from CSQUTIL, from a program written to put commands onto the SYSTEM.COMMAND.INPUT queue, or by using the operations and control panels?
- Does the command work if it is entered by another method? If the command works when it is entered at the console, but not otherwise, check that the command server has not stopped, and that the queue definition of the SYSTEM.COMMAND.INPUT queue has not been changed.
- Is the command server running? Issue the command
DIS CMDSERVto check. - Is it caused by an application? If so, does it fail in CICS, IMS, TSO, or batch? Does it fail on all IBM MQ systems, or only on some?
- Is an application causing the problem? Can you identify any application that always seems to be running in the system when the problem occurs? If so, examine the application to see if it is in error.
- Have you applied any APARs or PTFs? APARs and PTFs can occasionally cause unexpected problems with IBM MQ. These fixes can have been applied to IBM MQ or to other z/OS systems.
If an APAR or PTF has been applied to IBM MQ for z/OS, check that no error message was produced. If the installation was successful, check with IBM Support for any APAR or PTF error.
If an APAR or PTF has been applied to any other product, consider the effect it might have on the way IBM MQ interfaces with it.
Ensure that you have followed any instructions in the APAR that affect your system. (For example, you might have to redefine a resource.)
- Have any changes been made since the last successful run? When you are considering changes that might recently have been made, think about IBM MQ, and also about the other programs it interfaces with, the hardware, and any new applications. Consider also the possibility that a new application that you do not yet know about might have been run on the system.
- Has your initialization procedure been changed? Consider whether that might be the cause of the problem. Have you changed any data sets, or changed a library definition? Has z/OS been initialized with different parameters? In addition, check for error messages sent to the console during initialization.
- Have you changed any queue definitions or security profiles? Consider whether some of your queues have been altered so that they are members of a cluster. This change might mean that messages arrive from different sources (for example, other queue managers or applications).
- Have you changed any definitions in your sysplex that relate to the support and implementation of shared queues? Consider the effect that changes to such definitions as your sysplex couple data set, or Coupling Facility resource management policy. These changes might have on the operation of shared queues. Also, consider the effect of changes to the Db2® data sharing environment.
- Has any of the software on your z/OS system been upgraded to a later release? Consider whether there are any necessary post-installation or migration activities that you need to perform.
- Has your z/OS subsystem name table been changed? Changes to levels of corequisite software like z/OS or LE might require additional changes to IBM MQ.
- Do your applications deal with return codes that they might get as a result of any changes you have made? Ensure that your applications deal with any new return codes that you introduce.
- Has the application run successfully before? If the problem appears to involve one particular application, consider whether the application has run successfully before.
- Have any changes been made to the application since it last ran successfully? If so, it is likely that the error lies somewhere in the new or modified part of the application. Investigate the changes and see if you can find an obvious reason for the problem.
- Have all the functions of the application been fully exercised before? Did problem occur when part of the application that had never been started before was used for the first time? If so, it is likely that the error lies in that part of the application. Try to find out what the application was doing when it failed, and check the source code in that part of the program for errors. If a program has been run successfully on many previous occasions, check the current queue status and files that were being processed when the error occurred. It is possible that they contain some unusual data value that causes a rarely used path in the program to be invoked.
- Does the application check all return codes? Has your system has been changed, perhaps in a
minor way. Check the return codes your application receives as a result of the change. For example:
- Does your application assume that the queues it accesses can be shared? If a queue has been redefined as exclusive, can your application deal with return codes indicating that it can no longer access that queue?
- Have any security profiles been altered? An MQOPEN call might fail because of a security violation; can your application recover from the resulting return code?
- Does the application expect particular message formats? If a message with an unexpected message format has been put onto a queue (for example, a message from a queue manager on a different platform), it might require data conversion or another different form of processing.
- Does the application run on other IBM MQ for z/OS systems? Is something different about the way that this queue manager is set up that is causing the problem? For example, have the queues been defined with the same maximum message length, or default priority?
- Does the application use the MQSET call to change queue attributes? Is the application is designed to set a queue to have no trigger, then process some work, then set the queue to have a trigger? The application might have failed before the queue had been reset to have a trigger.
- Does the application handle messages that cause an application to fail? If an application fails because of a corrupted message, the message retrieved is rolled back. The next application might get the same message and fail in the same way. Ensure that applications use the backout count; when the backout count threshold has been reached, the message in question is put onto the backout queue.
If your application has never run successfully before, examine your application carefully to see if you can find any of the following errors:- Translation and compilation problems
Before you look at the code, examine the output from the translator, the compiler or assembler, and the linkage editor, to see if any errors have been reported. If your application fails to translate, compile/assemble, or link edit into the load library, it also fails to run if you attempt to invoke it. See Developing applications for information about building your application, and for examples of the job control language (JCL) statements required.
- Batch and TSO programs
For batch and TSO programs, check that the correct stub has been included. There is one batch stub and two RRS stubs. If you are using RRS, check that you are not using the MQCMIT and MQBACK calls with the CSQBRSTB stub. Use the CSQBRRSI stub if you want to continue using these calls with RRS.
- CICS programs
For CICS programs, check that the program, the IBM MQ CICS stub, and the CICS stub have been linked in the correct order. Also, check that your program or transaction is defined to CICS.
-
IMS
programs
For IMS programs, check that the link includes the program, the IBM MQ stub, and the IMS language interface module. Ensure that the correct entry point has been specified. A program that is loaded dynamically from an IMS program must have the stub and language interface module linked also if it is to use IBM MQ.
- Possible code problems
If the documentation shows that each step was accomplished without error, consider the coding of the application. Do the symptoms of the problem indicate the function that is failing and, therefore, the piece of code in error? See Step 10 for some examples of common errors that cause problems with IBM MQ applications.
- Do applications report errors from IBM MQ?
For example, a queue might not be enabled for "gets". It receives a return code specifying this condition but does not report it. Consider where your applications report any errors or problems.
- Does the problem affect specific parts of the network? You might be able to identify specific parts of the network that are affected by the problem (for example, remote queues). If the link to a remote queue manager is not working, the messages cannot flow to a target queue on the target queue manager.
- Check that the connection between the two systems is available, and that the channel initiator and listener have been started. Use the MQSC PING CHANNEL command to check the connection.
- Check that messages are reaching the transmission queue, and check the local queue definition of
the transmission queue, and any remote queues. Use the MQSC BYTSSENT keyword of
the DISPLAY CHSTATUS
command to check that data is flowing along the channel. Use
DISPLAY QLOCAL (XMITQ) CURDEPTHto check whether there are messages to be sent on the transmission queue. Check for diagnostic messages at both ends of the channel informing you that messages have been sent to the dead-letter queue. - If you are using IBM MQ clusters, check that the clustering definitions have been set up correctly.
Have you made any network-related changes that might account for the problem? Have you changed any IBM MQ definitions, or any CICS or IMS definitions? Check the triggering attributes of the transmission queue.
- Does the problem occur at specific times of the day or affect specific
users? If the problem occurs at specific times of day, it might be that it is dependent on system loading. Typically, peak system loading is at mid-morning and mid-afternoon, and so these periods are the times when load-dependent problems are most likely to occur. (If your network extends across more than one time zone, peak system loading might seem to occur at some other time of day.) If you think that your IBM MQ for z/OS system has a performance problem, see Dealing with performance problems on z/OS.
If the problem only affects some users, is it because some users do not have the correct security authorization? See User IDs for security checking for information about user IDs checked by IBM MQ for z/OS.
- Is the problem intermittent or does the problem occur with all
z/OS, CICS, or IMS
systems? A problem might be caused by application interaction or be related to other z/OS systems.
An intermittent problem could be caused by failing to take into account the fact that processes can run independently of each other. For example, a program might issue an MQGET call, without specifying WAIT, before an earlier process has completed. You might also encounter this type of problem if your application tries to get a message from a queue while it is in sync point (that is, before it has been committed).
If the problem only occurs when you access a particular z/OS, IMS, or CICS system, consider what is different about this system. Also consider whether any changes have been made to the system that might affect the way it interacts with IBM MQ.
- Do you have a program error? The following examples show the most common causes of problems encountered while running IBM MQ programs. Consider the possibility that the problem with your system could be caused by one of these errors.
- Programs issue MQSET to change queue attributes and fail to reset attributes of a queue. For example, setting a queue to NOTRIGGER.
- Making incorrect assumptions about the attributes of a queue. This assumption could include assuming that queues can be opened with MQOPEN when they are MQOPEN-exclusive, and assuming that queues are not part of a cluster when they are.
- Trying to access queues and data without the correct security authorization.
- Linking a program with no stub, or with the wrong stub (for example, a TSO program with the CICS stub). This can cause either a long-running unit of work, or an X'0C4' or other abend.
- Passing incorrect or invalid parameters in an MQI call; if the wrong number of parameters are passed, no attempt can be made to complete the completion code and reason code fields, and the task is abended. (This is an X'0C4' abend.) This problem might occur if you attempt to run an application on an earlier version of MQSeries® than it was written for, where some of the MQI values are invalid.
- Failing to define the IBM MQ modules to z/OS correctly (this error causes an X'0C4' abend in CSQYASCP).
- Failing to check return codes from MQI requests. This problem might occur if you attempt to run an application on a later version of IBM MQ than it was written for, where new return codes have been introduced that are not checked for.
- Failing to open objects with the correct options needed for later MQI calls, for example using the MQOPEN call to open a queue but not specifying the correct options to enable the queue for subsequent MQGET calls.
- Failing to initialize MsgId and CorrelId correctly. This error is especially true for MQGET.
- Using incorrect addresses.
- Using storage before it has been initialized.
- Passing variables with incorrect lengths specified.
- Passing parameters in the wrong order.
- Failing to define the correct security profiles and classes to RACF®. This might stop the queue manager or prevent you from carrying out any productive work.
- Relying on default MQI options for a ported application. For example, z/OS defaults to MQGET and MQPUT in sync point. The distributed-platform default is out of sync point.
- Relying on default behavior at a normal or abnormal end of a portal application. On z/OS, a normal end does an implicit MQCMIT and an abnormal end does an implicit rollback.