Performance problems and hang situations

When the problem is a hang, determining the cause can be difficult.

Always look for any indication of a problem on the console log. If the log contains nothing of immediate interest, try to determine the extent of the hang. Try to put the users that are hung into groups with common characteristics:

Are only users of one particular application hung?

For example, local Db2 (on DBD1) requests are working, but requests to another remote Db2 (DBD2) are hanging.

Are local requests on the remote host (DBD2) working?

If not, this situation would appear to be a problem on the other host. Look for an abend, loop, or wait type problem. Check the remote host console for any relevant messages. Check whether Db2 commands on the remote host (DBD2) work.

For example, if no commands work, this situation could mean a problem in Db2. If most commands work, but a cancel thread command does not, this situation might indicate a problem in the DDF address space (ssnmDIST) or in VTAM® or TCP/IP.

If local requests are working, then the problem would appear to be in either the network or the DDF address space (ssnmDIST).
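The checks above amount to a small decision tree. The following Python sketch only restates that reasoning. The function name, parameter names, and returned strings are hypothetical, and the inputs are observations that an operator records by hand. Nothing in the sketch queries Db2 or the network.

```python
from typing import Optional


def likely_problem_area(remote_local_requests_work: bool,
                        remote_commands_work: Optional[bool] = None,
                        cancel_thread_works: Optional[bool] = None) -> str:
    """Map manual observations about the remote host (DBD2) to a likely problem area.

    All arguments are hypothetical flags filled in by the operator:
    remote_local_requests_work: do local requests on the remote host work?
    remote_commands_work: do Db2 commands on the remote host work?
    cancel_thread_works: does a cancel thread command work there?
    """
    if remote_local_requests_work:
        # Local work is fine on both hosts, so suspect what lies between them.
        return "network or DDF address space (ssnmDIST)"
    if remote_commands_work is False:
        # No Db2 commands respond on the remote host.
        return "Db2 on the remote host"
    if remote_commands_work and cancel_thread_works is False:
        # Most commands work, but a cancel thread command does not.
        return "DDF address space (ssnmDIST), VTAM, or TCP/IP"
    # Not enough detail yet: treat it as a general remote host problem.
    return "remote host: look for an abend, loop, or wait"
```

For example, calling likely_problem_area(False, True, False) corresponds to the case where most commands work on DBD2 but a cancel thread command does not.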

Is any network traffic flowing between the two hosts?

Check whether any other network users are hung. If so, this situation is most likely a network problem.

Are all the hung users in one particular part of the network?

For example, all hung users might have terminals on one controller. In this case, the problem could be the controller. Display the status of the controller; it might need to be activated before it can be used again. As another example, all the hung users might be accessing the network through one particular link that is having a large number of errors. This situation could result in slow response times because of error recovery and retries. Investigate the link problem.

When a device or line that is connected to an NCP goes inoperative (INOP), NCP generates a miscellaneous data record (MDR). This information is sent to the owning SSCP. These records are found in LOGREC and can be viewed by requesting an EREP report. The records can also be seen online by using NetView Hardware Monitor (or an equivalent). These records contain information about why the resource went INOP. They are very useful for problem determination.

Is any particular address space using very high CPU utilization?

If so, monitor this situation, as the address space might be in a loop.

Are all users hung?

If so, this could be a more fundamental problem with the operating system. Check whether any z/OS® commands are working.

In any hang situation, the results (or lack of results) from various displays can give a clearer picture of the scope of the problem. The more information that is available, the easier the problem diagnosis is, and, usually, the faster the resolution is. Finding a problem in a VTAM dump is a discouraging and time-consuming task when the only information given is that there was a hang.
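The grouping of hung users by common characteristics, as described above, can be sketched in a few lines of Python. This is only an illustration: the record layout and the field names ("user", "application", "controller") are assumptions, and the records themselves would be collected by hand from user reports and displays.

```python
from collections import defaultdict


def group_hung_users(users, characteristic):
    """Group hung users by one shared characteristic.

    users: list of dicts, each with a "user" key plus hypothetical
           attribute keys such as "application" or "controller".
    characteristic: the attribute name to group by.
    Returns a dict that maps each attribute value to a list of user names.
    """
    groups = defaultdict(list)
    for record in users:
        # Users with no recorded value are collected under "unknown".
        groups[record.get(characteristic, "unknown")].append(record["user"])
    return dict(groups)


# Hypothetical sample data, not taken from any real system.
hung = [
    {"user": "USERA", "application": "DBD2", "controller": "CTL1"},
    {"user": "USERB", "application": "DBD2", "controller": "CTL2"},
    {"user": "USERC", "application": "DBD2"},
]
by_application = group_hung_users(hung, "application")
by_controller = group_hung_users(hung, "controller")
```

If one grouping puts every hung user in a single bucket, as "application" does in this sample, that characteristic is a reasonable place to start the investigation.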

Questions to ask that are specific to VTAM

What is the status of the remote application LU (LUDBD2)?

The command D NET,ID=<remote DDF>,SCOPE=ALL, issued on the host that owns the application, indicates whether the application is active and whether it has any sessions with the local DDF. If the status is not 'ACTIV', check the meaning of the status in the manual and take appropriate action.

The same command, issued on the local host, displays the CDRSC status. Make sure that this status is also ACTIV (or ACT/S).

Are sessions already set up?

The D NET,ID=<remote DDF>,SCOPE=ALL command indicates whether there are active sessions. Three sessions are required for system use: one with a LOGMODE of SNASVCMG and two with a LOGMODE of SYSTOSYS. If there are no sessions, check the virtual route between the subareas.
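As a rough illustration of the session requirement, the following Python sketch counts LOGMODE names and reports which system sessions are missing. It assumes that the LOGMODE names have already been copied by hand from the display output into a list. It does not parse VTAM output, and the function and variable names are hypothetical.

```python
from collections import Counter

# Required system sessions as stated in the text: one SNASVCMG, two SYSTOSYS.
REQUIRED_SYSTEM_SESSIONS = {"SNASVCMG": 1, "SYSTOSYS": 2}


def missing_system_sessions(logmodes):
    """Return a dict of LOGMODE name to number of sessions still missing.

    logmodes: list of LOGMODE names, one entry per active session,
              transcribed manually from the display (an assumption).
    An empty result means that all required system sessions are present.
    """
    counts = Counter(name.strip().upper() for name in logmodes)
    return {
        mode: needed - counts[mode]
        for mode, needed in REQUIRED_SYSTEM_SESSIONS.items()
        if counts[mode] < needed
    }
```

If the function returns a shortfall for every LOGMODE, there are effectively no system sessions, and the virtual route between the subareas is the next thing to check.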

Are the sessions working?

By repeating the displays, check whether the send and receive counts on the user sessions are changing. Also, a DISPLAY THREAD(*) DETAIL command shows whether there is a conversation on each session. The users might be hung waiting for a conversation to end. When a session becomes available for a new conversation, the waiting user's remote database access is processed.
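Comparing the counts from two successive displays can be sketched as follows. The session identifiers and the snapshot layout are hypothetical: each snapshot is a dict, built by hand from the display output, that maps a session identifier to its (send count, receive count) pair.

```python
def stalled_sessions(earlier, later):
    """Return the sessions whose send and receive counts did not change.

    earlier, later: dicts that map a session identifier to a
                    (send_count, receive_count) tuple, taken from two
                    displays issued some time apart (an assumed layout).
    Sessions that appear only in the later snapshot are not reported,
    because there is nothing to compare them against.
    """
    return sorted(
        session_id
        for session_id, counts in later.items()
        if earlier.get(session_id) == counts
    )


# Hypothetical snapshots, for illustration only.
first = {"SID1": (10, 12), "SID2": (40, 41)}
second = {"SID1": (10, 12), "SID2": (55, 57), "SID3": (1, 0)}
not_moving = stalled_sessions(first, second)
```

A session that appears in the result is not necessarily broken. It might simply be idle, so treat the result as a list of candidates to check against the DISPLAY THREAD(*) DETAIL output.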

Is the virtual route between the two subareas open and active?

Use a D NET,ROUTE command to see whether the virtual route is operational. A blocked virtual route can indicate a storage shortage somewhere in the network, either on a host or in an NCP.

Is the CDRM to CDRM session between the two hosts active?

The command D NET,ID=<CDRM name> can be used to display the status of the CDRMs. Both should be 'ACTIV'.

Are all users that are already in session working, and are only the users whose requests require a new session to be established hung?

In this circumstance, the problem could be a VTAM problem. Most of the send and receive data processing is done in the user's address space. All session establishment is done in the VTAM address space.

Do VTAM commands work?

If so, then VTAM seems to be working. If not, then VTAM might be in a wait (unlikely) or loop (more likely) or perhaps VTAM is unable to get a share of the CPU.

VTAM paging: Is VTAM doing a large amount of paging?

If so, what looks like a loop in VTAM could be VTAM running a long chain of control blocks.

If the hang occurs only on a session when one particular request is made, often a VTAM buffer trace can be used to see the last PIUs flowing on the session. These PIUs can often hold the key to the problem.