WAIT/LOOP procedure

The procedures for the WAIT and LOOP keywords are combined because the WAIT and LOOP symptoms might not be distinguishable at first.

Determine the type of WAIT or LOOP that is in progress

Use the following procedure to determine the type of WAIT or LOOP occurring, and to find the appropriate keywords for the problem.

Maintenance might change the offsets in these control blocks. For a current version of the control blocks, assemble DFSADSCT.

Is IMS being shut down?
- If the operator issued a CHECKPOINT DUMPQ, PURGE, or FREEZE command before the manifestation of the wait/loop, go to Shutdown processing.
- If IMS is not being shut down, continue with the next step.
Determine whether IMS was in selective dispatching mode.
Find the dispatch work areas in the formatted dump. The dispatch work areas are created using the DISPATCH or ALL IMS dump formatting options. The dispatch work area eye catcher is **DSP.
The selective dispatch bits are in the SFLAGS field in the DYNAMIC SAP EXT. section, where the X'xxxxxx8x' bit represents selective dispatching. To determine whether selective dispatching was entered for save area prefixes (SAPs), search the DISPATCH AREA section for the following message:
```
*** NOTE: THIS TCB IS IN SELECTIVE DISPATCHING FOR SAPS
```
If you find this message, IMS wrote an X'450F' log record to the OLDS. This log record contains information about dynamic SAPs, such as the highest number of dynamic SAPs used and the number of times IMS was in selective dispatch for dynamic SAPs.

Examine this X'450F' log record to help determine what might have led to the shortage of dynamic SAPs. Then go to Determine the type of WAIT or LOOP that is in progress. While performing SAP analysis, keep in mind that the dynamic SAPs are labeled DYNAMIC SAP, and that the CURRENT TCB= field indicates the associated task control block (TCB).

If IMS is not in selective dispatching mode, continue with the next step.
Can the operator communicate with IMS through the z/OS® system console by using the IMS outstanding reply to enter an IMS command, such as /DISPLAY?
- If no, or if you are not sure, go to step 5 now.
- If yes, the problem might be caused by:
  - A data communication failure.
  - The inability of a task to acquire a resource.
  - Non-completion of an event, such as I/O.
  Continue with the next step.
Can the IMS master terminal operator (MTO) communicate with IMS by issuing various IMS commands, such as /DISPLAY?
- If yes, go to Determine the type of WAIT or LOOP that is in progress.
- If no, the problem might be data communication related. If IMS is still running, issue the following commands:
  - Issue the IMS /DIS NODE nodename command. Save the IMS console output.
  - Turn on the IMS node trace with the /TRA SET ON NODE nodename command.
    Data is captured in the IMS X'6701' log record. Save the IMS OLDS for execution with IMS utility programs DFSERA10 and DFSERA30.
  - Consider turning the VTAM® buffer trace and VTAM internal trace on to complement the IMS node trace, as follows:
```
F NET,TRACE,TYPE=BUF,ID=nodename
F NET,TRACE,TYPE=VTAM,MODE=EXT,OPT=(API,PIU,MSG)
```
    GTF must be active for this option.
  - Obtain a memory dump of the IMS and VTAM regions using this series of commands:
```
DUMP COMM=(dump title)
R id JOBNAME=(j1,j2,j3,j4,j5,j6,j7),SDATA=(CSA,PSA,RGN,SQA,SUM,TRT),END
```
    The variables have the following meanings:
    
    j1
    
    IMS CTL region job name.
    
    j2
    
    VTAM region job name.
    
    j3
    
    IMS DL/I region job name.
    
    j4
    
    Suspicious IMS dependent region job name, if any.
    
    j5
    
    Suspicious CCTL (CICS®) region name, if any.
    
    j6
    
    DBRC region job name.
    
    j7
    
    IRLM region job name (if IRLM database locking was used).
    
    The jobs are listed in order of importance.
    Recommendations: A memory dump of the IMS CTL, VTAM, DL/I, and suspicious dependent region or CCTL is usually sufficient to solve wait/hang problems. Occasionally, the DBRC and IRLM (if they are used for database locking) can be a factor. Obtain a memory dump of DBRC and IRLM as well to ensure that the problem can be resolved quickly.
    
    SYS1.DUMP data sets are often not large enough to hold all regions requested in the DUMP command. Make them large enough to hold the regions. If the z/OS SVC DUMP command fails due to lack of space, take separate memory dumps in smaller combinations to accommodate the smaller SYS1.DUMP data set size.
  - Go to Determine the type of WAIT or LOOP that is in progress.
Query the IMS Dispatch Work Areas.
1. Find the Dispatch Work Areas in the formatted dump. The Dispatch Work Areas are created using the DISPATCH or ALL IMS dump formatting options. The Dispatch Work Area eye catcher is **DSP.
2. Scan each Dispatch Work Area (STM, CTL, restart data set, and so on) except for the DRC and dependent region entries (labeled DEP, MPP, BMP, DBT, DRA, or IFP). Examine the QPOST field at offset X'1C'.
  If the high-order bit of the QPOST field is off, note the address and type of Dispatch Work Area.
3. After scanning all Dispatch Work Areas, except for the DBRC (DRC) task and dependent regions, one of the following situations applies:
  - If the QPOST high-order bit is set in every Dispatch Work Area, IMS is in an IMS WAIT (IWAIT) state. Go to Determine the type of WAIT or LOOP that is in progress now.
  - If the QPOST high-order bit is off in at least one Dispatch Work Area, a LOOP or operating system WAIT has occurred. Continue with the next step.
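The scan in step 5 amounts to a simple filter. The following Python sketch is illustrative only. It assumes that you have transcribed, by hand, each Dispatch Work Area's address, type label, and the first byte of its QPOST field (offset X'1C') from the formatted dump. The tuple layout, function names, and sample values are hypothetical and are not produced by any IMS utility.

```python
# Labels that the procedure says to skip: the DBRC task and dependent regions.
SKIPPED_TYPES = {"DRC", "DEP", "MPP", "BMP", "DBT", "DRA", "IFP"}

def unposted_work_areas(work_areas):
    """Return (address, type) for scanned areas whose QPOST high-order bit is off."""
    noted = []
    for address, area_type, qpost_byte in work_areas:
        if area_type in SKIPPED_TYPES:
            continue
        if not (qpost_byte & 0x80):
            noted.append((address, area_type))
    return noted

def classify(work_areas):
    """Map the scan result to the two outcomes named in step 5."""
    if unposted_work_areas(work_areas):
        return "LOOP or operating system WAIT"
    return "IWAIT"

# Hypothetical values transcribed from a dump.
sample = [
    (0x00A1B000, "CTL", 0x80),
    (0x00A1B200, "STM", 0x00),
    (0x00A1B400, "MPP", 0x00),  # dependent region: ignored by the scan
]
print(classify(sample))
```

The addresses returned by `unposted_work_areas` are the ones to carry forward into the TCB/RB chain query.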
Query the TCB/RB chain.
1. Find the current ECB, address space ID (ASID), and TCB address for each Dispatch Work Area noted previously in step 5b.
  - In IDSPWRK SECTION 1, find the field CECB at offset X'28'. This field contains the address of the currently dispatched ECB.
  - In IDSPWRK SECTION 1, find the field ASIDS at offset X'30'. The first halfword contains the ASID number for the task; the second halfword contains the CTL region ASID.
  - In IDSPWRK SECTION 1, find the field TCB at offset X'40'. This field contains the TCB address for the task.
2. Find the formatted TCB/RB chain in the z/OS formatted dump. Use the IPCS SUMMARY FORMAT ASID(X'__') command for the ASID/TCB found in step 6a. Use the following FIND command to locate the TCB:
```
F 'TCB: xxxxxxxx' 1 16
```
  where xxxxxxxx is the 8-character TCB address, including leading zeros.
3. Examine the request block (RB) structure (PRBs, SVRBs, or IRBs), focusing on the last RB in the chain for that TCB. The TCBRBP field at offset X'00' contains the address of the last RB. Use the following FIND command to locate the RB:
```
F 'RB: xxxxxxxx' 1 16
```
  where xxxxxxxx is the 8-character RB address, including leading zeros.
  Exception: Using the last RB in the TCB's RB chain is usually accurate. However, additional RBs are occasionally appended to the end of the chain to facilitate dump processing, and they have nothing to do with the problem. X'00020033' in the WLIC field of any RB in the chain normally indicates dump processing. In such a case, examine the RBs prior to the RB with WLIC=X'00020033'. If the RB before the RB containing WLIC=X'00020033' contains WLIC=X'0002000C', it might be necessary to examine the RB before the RB containing WLIC=X'0002000C'.
  Example:
```
PRB  WLIC = X'00020006'
PRB  WLIC = X'00020078'
SVRB WLIC = X'0002000C'  Examine prior RB.
SVRB WLIC = X'00020033'  <== Indicates dump processing
SVRB WLIC = X'00020078'
```
4. Examine the LINK field in the RB found in step 6c. The high-order byte of the LINK field is the wait count field.
  - If the wait count is X'00', the task is probably looping. Perform the following steps:
    - Perform system loop diagnostics. Obtain the OPSW and registers from the looping RB for a snapshot of the loop. (They are located in the following RB or, if this is the last RB (TCBRBP), in the TCB.)
    - Obtain the PSW address from the z/OS system trace table. Use the IPCS VERBX TRACE ASID(xx) command to obtain the entries for the ASID in question. Focus on the entries for the TCB found in step 6a. You can ignore entries between any SVC and associated SVCR because they reflect necessary z/OS operating system activity indirectly involved in the loop. (The IMS TYPE2 SVC is an exception to this since it results in execution of IMS code.) Sorting the pertinent addresses by OPSW address greatly aids in laying out the loop.
    - Resolve the PSW address found by using either IPCS BROWSE mode, the IPCS WHERE command, or by using an LPA or NUCLEUS MAP to obtain the name of the modules involved in the loop. The IPCS commands used to obtain the maps are LPAMAP and VERBX NUCMAP, respectively. Calculate the offset at which the instruction appears in the modules to outline the path of the loop.
    - Another source of information for the looping task can sometimes be found at the top of the IMS SAPS AND SAVEAREA section (**SSA) of the IMS formatted dump. Look for the **** A C T I V E **** save area set nearest the top of the **SSA with the SAPECB field matching the CECB field obtained in step 6a. The save area flow can indicate IMS modules involved in the loop or those passing control to the looping function.
  - If the wait count is not X'00' (for example, X'01' or X'02'), a system WAIT has probably occurred. Perform the following steps:
    - Obtain the address portion of the OPSW. It points to the waiting module.
    - Resolve the PSW address found by using either IPCS BROWSE mode, the IPCS WHERE command, or by using an LPA or NUCLEUS MAP to obtain the name of the waiting module. The IPCS commands used to obtain the maps are LPAMAP, and VERBX NUCMAP, respectively. Calculate the offset at which the wait occurred in the module. This information can be used for APAR searches and to assist IBM® Software Support representatives.
    - Use the CECB field obtained in step 6a to find the related SAP save area by scanning for the SAPECB match in the IMS formatted memory dump **SSA section.
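The RB selection rule in step 6c and the wait count test in step 6d can be sketched as follows. This Python fragment is an illustration under stated assumptions. It assumes that you have transcribed the WLIC values of one TCB's RB chain in the order shown in the example above, with the last RB in the chain at the end of the list, and that the LINK field is available as a 4-byte value. The function names and the sample LINK values are hypothetical. The procedure says that stepping back past an RB with WLIC=X'0002000C' "might be necessary"; this sketch always does so.

```python
DUMP_WLIC = 0x00020033   # normally indicates dump processing
EXTRA_WLIC = 0x0002000C  # might precede the dump-processing RB

def rb_to_examine(wlics):
    """Return the index of the RB to examine, or None if no candidate remains."""
    if DUMP_WLIC not in wlics:
        return len(wlics) - 1         # the last RB is usually the right one
    idx = wlics.index(DUMP_WLIC) - 1  # RB prior to the dump-processing RB
    if idx >= 0 and wlics[idx] == EXTRA_WLIC:
        idx -= 1                      # step back once more
    return idx if idx >= 0 else None

def wait_count(link_word):
    """The high-order byte of the LINK field is the wait count."""
    return (link_word >> 24) & 0xFF

def classify_rb(link_word):
    if wait_count(link_word) == 0:
        return "probable loop"
    return "probable system WAIT"

# WLIC values from the example in step 6c.
chain = [0x00020006, 0x00020078, 0x0002000C, 0x00020033, 0x00020078]
print(rb_to_examine(chain))      # index of the second PRB
print(classify_rb(0x01A3F000))   # hypothetical LINK value, wait count X'01'
```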

SAP analysis procedure

Find the formatted SAP AND SAVE AREA section in the IMS formatted dump.

Choose either the SAVEAREA, SYSTEM, ALL or SAVEAREA,SUM options of the IMS Offline Dump Formatter. The eye catcher of the SAP AND SAVE AREA section is **SSA.

The following table defines the key fields in SAP analysis.

Table 1. Key fields in SAP analysis
Offset	Field name	Length	Field description
SAP+X'00'	SAPFLAG1	1	X'80' = Active SAP; X'40' = Waiting SAP
SAP+X'01'	SAPDSPCD	1	IMS TCB number. This number matches the associated TCB number at offset X'3B' in the dispatch work area.
SAP+X'14'	SAPIWAIT	4	In waiting SAPs, this is the address of the last active save area. Those below this address are residual. In SAPs that are active but not waiting, this field is residual and should not be used. Exception: SAPIWAIT might not be valid for Fast Path save area sets (DBF-prefixed modules). The active save area set usually ends with DBFXSL30, the Fast Path wait module, unless DFSIWAIT or DFSISERW appears previously in a save area set.
SAP+X'18'	SAPECB	4	Address of the ECB associated with this ITASK. If the PST is used, this field points to the beginning of the PST.
SAP+X'24'	SAPCDSP	4	Address of the current dispatch work area.
SAP+X'30'	SAPSDPNO	4	Dispatch number for the ITASK.

Begin SAP analysis at the end of the sorted SAPs.
Find the end of the sorted SAPs. The eye catcher ***END OF SORTED SAP FORMATTING marks the end of the list. SAPs are sorted by SAPSDPNO (the system dispatch number). The least recently dispatched ITASKs are at the end of the sorted SAPs. These are the ITASKs that have been waiting the longest and are possibly causing other ITASKs to wait behind them by holding a resource, such as a lock or a latch.
Scan backwards from the end, examining only active or waiting SAPs. Focus only on the active save area sets (that is, those in which SAPFLAG1 has the X'80' bit turned on: X'80', X'Cx', X'Dx', X'Fx'). Active save area sets are marked with the eye catcher **** W A I T I N G **** or **** A C T I V E ****. To find waiting or active SAPs, use the following find command:
```
F '   **** ' PREV
```
The SAVEAREA,SUM option of the Offline Dump Formatter produces only active save area sets. Active running SAPs are marked with the eye catcher RUN. The end of this formatting is marked by the eye catcher ****** END SAP SUMMARY.
Skip all normal save area sets.
This step describes all normal save area sets. After you have identified all types of normal save area sets, you can disregard them because they are unrelated to the problem.
1. WAITING save area sets in which module name DFSIWAIT appears after label EP at the second-level save area are considered normal save area sets.
  The following example shows a normal save area set at the second level:
```
  ***SAVE AREA SET***
     EP DFSQMRT0-11/13/94
     SA 00133BC4         WD1 8091E430   HSA 80000000   LSA 00133C0C ...
 
     EP DFSIWAIT
     SA 00133C0C         WD1 00000000   HSA 00133BC4   LSA 00133C54 ...
 
     EP DFSFLLG0-220-PL46803
     SA 00133C54         WD1 00000000   HSA 00133C0C   LSA 00133C9C ...
     ......
```
2. The only normal save area sets in which the save area set contains DFSIWAIT at the third level are shown in the following example. Ensure that register 08 contains a value of X'00000003' for any of the first four save area sets, as shown in the example. Otherwise, it is abnormal and indicates an intent conflict, as described in Intent conflict. Use the SAPECB field to obtain the PST address for use in the intent conflict procedure.
```
     EP DFSSMIC0 --> EP SMSC2    --> EP DFSIWAIT with REG08 = x'00000003'
     EP DFSSMIC0 --> EP DFSSMSC2 --> EP DFSIWAIT with REG08 = x'00000003'
     EP DFSSMIC0 --> EP DFSSMSC1 --> EP DFSIWAIT with REG08 = x'00000003'
     EP DFSSMIC0 --> EP MPPENQ00 --> EP DFSIWAIT with REG08 = x'00000003'
 
     EP DFSFXC30 --> EP DFSFXC30-WFITEST  --> EP DFSIWAIT
     EP DFSVTP00 --> EP VTPOWORK --> EP DFSIWAIT
     EP DBFHCL00 --> EP DBFHGU10 --> DBFXSL30
```
3. The only normal save area sets in which the save area contains DFSIWAIT at the fourth level are those shown in the following example. Ensure that register 08 in the DFSIWAIT save area set contains X'00000003'. Otherwise, it is abnormal and indicates an intent conflict, as described in Intent conflict. Use the SAPECB field to obtain the PST address for use in the intent conflict procedure.
  The following examples show normal save area sets at the fourth level:
```
   DFSSMIC0 --> DFSSMSC0 --> SMSC1000 --> DFSIWAIT  REG08 = x'00000003'
   DFSFXC30 --> DFSDLA30 --> DLA32000 --> DFSIWAIT
```
4. The following active save area sets are probably normal, so you can ignore them.
  - Save area sets marked ACTIVE or RUN with SAPDSPCD=X'07'. This is a DRC task SAP. This condition is usually normal for the DBRC task.
  - Save area sets marked ACTIVE or RUN with SAPDSPCD=X'0F'. This is the ESI task SAP if SAPCDSP=X'00000000'.
  - Dependent region save area sets marked ACTIVE with SAPDSPCD=X'03'(MPP), X'04'(BMP), X'0D'(DRA), X'12' (IFP), X'13'(DBT), X'0C' (ESS), or X'00' (RESIDUAL), in which the top save area indicates it was returned. (The last bit of the address in the field labeled RET, which is register 14, is odd or has X'FF' in the high-order byte.)
  - If the SAPDSPCD=X'13'(DBT), and the first save area EPA is marked UNKNOWN with the second-level save area RET field marked returned (the last bit of the address in RET is odd), this is a normal save area set if the first save area EPA is within module DFSDASC0 or DFSDAST0.

Obtain abnormal save area set information.

The remaining save area sets (those that are ACTIVE or WAITING, but abnormal, as described in step 4) are involved in the wait in some way.

Recommendation: Concentrate on one save area set at a time, beginning with the first abnormal save area set. Remember to start from the end of the sorted SAPs.

If you find an abnormal save area set marked **** A C T I V E **** (SAPFLAG1=X'80'), the problem is associated with the TCB/RB save area set. Use the address of the current dispatch area in SAPCDSP to find the dispatch work area associated with this save area set. Go to step 6a in the WAIT/LOOP procedure. Continue from there, using the ASID/TCB obtained from the dispatch work area. If the high-order bit in QPOST is on (QPOST=X'8x'), this SAP is suspended. Record this save area set and continue to the next abnormal save area set. Discontinue step 6a because this save area set should probably be ignored. Otherwise, continue.

Record the following key fields from the abnormal save area sets flagged as **** W A I T I N G ****:

The address of the SAP.
For each save area in the save area set, from the first save area down to the save area pointed to by the SAPIWAIT field, obtain the following information. (See exception for SAPIWAIT in Table 1 before proceeding.)
1. EP module name
2. APAR level (the APAR number and last few letters of the changeID string)
3. RET address (this is register 14)
4. EPA address
If the module name is UNKNOWN and the module save area set begins with DFSDLA00, the EPA address can probably be resolved in the DL/I region dump by using IPCS BROWSE mode for the DL/I ASID.

The offset in the calling module from which DFSIWAIT, DFSISERW, or DBFXSL30 was invoked.

You can calculate the offset by subtracting the EPA address in the save area before the save area pointed to by SAPIWAIT from the RET address of the save area pointed to by SAPIWAIT.

The following table shows key data from an abnormal save area set.

Table 2. Key data from an abnormal save area set
EP module name	APAR number	Last few ChangeIDs	RET	EPA	Wait call offset
DFSCST00	PL45938	abcde	80A7BA14	00A8E110
DFSDBDR0	PL49770	..mnopr	60A8E6D6	00A07A58
DFSBML00	none		50A07AC2	00B5DAE0	X'10E'
DFSIWAIT	none		40B5DBEE	70A7C7F6
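The wait call offset in Table 2 can be reproduced with a short calculation. The following Python sketch is illustrative. It assumes that the high-order byte of each RET and EPA word carries flag bits rather than address bits, which fits the sample values in Table 2, so it masks that byte off before subtracting. For 31-bit addresses a different mask would presumably be needed. The function name is hypothetical.

```python
ADDRESS_MASK = 0x00FFFFFF  # assumption: drop the high-order byte of each word

def wait_call_offset(ret_of_wait_save_area, epa_of_caller):
    """Subtract the caller's EPA from the RET address in the wait save area."""
    ret = int(ret_of_wait_save_area, 16) & ADDRESS_MASK
    epa = int(epa_of_caller, 16) & ADDRESS_MASK
    return ret - epa

# RET from the DFSIWAIT row and EPA from the DFSBML00 row of Table 2.
offset = wait_call_offset("40B5DBEE", "00B5DAE0")
print(f"X'{offset:X}'")  # prints X'10E'
```

The result matches the Wait call offset column for DFSBML00, the module that invoked DFSIWAIT.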

Identify the reason for the WAIT.
To identify the reason for the WAIT, do the following:
1. Assemble the module that issued the wait. Use the offset obtained in step 5 as an approximate displacement into the module where an IWAIT or ISERWAIT was issued. Examine the code and comments at that point. Most modules give the reason for the IWAIT in the comments above the IWAIT issue point.
  The EP name might not be the actual module name, but rather a CSECT within a module. To find the actual module name, using IPCS BROWSE mode, scan backwards from the EPA address for the actual module name.
Repeat steps 5 and 6 for the first three abnormal save area sets you found.
You should be able to gather enough information from the first three abnormal save area sets to perform a search or determine the cause of the problem.

Keyword: WAIT

At this point, you can be sure that you are in an IMS WAIT. Therefore, WAIT is an appropriate keyword for the search argument.

Keyword: module name issuing IWAIT or ISERWAIT

The Module Name column in your worksheet indicates the modules that issued the IWAITs. These modules can provide useful search arguments. Use the 8-character module name for this keyword.

Keyword: WAIT reason

The IWAIT REASON column in your worksheet indicates the reason or resource, or both, that is causing the IMS WAIT.

For example, if the reason was a WAIT for the DPST latch, the IWAIT REASON keyword is DPST LATCH.

Keyword: additional related keywords

External events might trigger WAITs. These events might be indicated by console messages, or they might be related to a procedure that was being performed at the time the WAIT began.

You can use each of these additional keywords in the search argument when applicable.

Search argument example

Consider this scenario:

IMS went into an IWAIT after a WADS write error occurred.
Multiple unusual save area sets were found from module DFSFLLG0.
The reason for the IWAIT was found to be the LOG LATCH.

The broad search argument to use is:

5655J3800 WAIT LOG | LATCH | WADS | DFSFLLG0

For a structured database search, use this search argument:

PIDS/5655J3800 WAIT PCSS/LOG | PCSS/LATCH | PCSS/WADS | RIDS/DFSFLLG0

With this search argument, you might receive numerous search results, which will probably include the APAR describing your problem. You can then take various combinations of the additional keywords that were combined with the OR operator in the above example and use the AND operator on the keywords instead. You can use this technique to narrow your field of search until you find the appropriate APAR.

PST analysis

This section deals with analyzing regions for possible problems in scheduling, intent conflicts, and so forth.

Determine the number of active regions.
SCDREGCT at SCD+X'C8A' is a 2-byte field that contains the number of active regions, if any.

If SCDREGCT = X'0000', no regions are active. Go back to Determine the type of WAIT or LOOP that is in progress.

If SCDREGCT is not equal to X'0000', go to step 2.
Determine if the scheduler sequence queues (SSQs) have any entries.
Obtain the address of the transaction anchor block (TAB) from the SCDTAB field in the DSECT (label TABEP in the formatted dump). The TAB, which is mapped by DSECT DFSTAB, consists of:
- TAB header
- Headers for each of the six subqueues (SSQ1 - SSQ6)
- Class vector table (CVT)
- Transaction class tables (TCTs)
If the count of partition specification tables (PSTs) waiting on any subqueue (field TABSCHQC) equals 0, no region should be waiting on any subqueue. However, you should also check each subqueue header. Calculate the address of the subqueue header for a specific subqueue (SSQ#) as follows:
1. SSQ# × X'18' - X'8' = offset of header for SSQ#
2. Offset of header for SSQ# + SCDTAB address = address of header for SSQ#
Perform this calculation for each subqueue number. If field TABSSQnF, where n is the subqueue number, is not zero, this field contains the address of an entry on the SSQ for the specified subqueue.
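The header address calculation can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, and the SCDTAB address used in the loop is a made-up value, not one taken from a real dump.

```python
def ssq_header_address(scdtab_address, ssq_number):
    """Apply: SSQ# x X'18' - X'8' = offset; offset + SCDTAB address = header address."""
    if not 1 <= ssq_number <= 6:
        raise ValueError("the SSQ has six subqueues, numbered 1 through 6")
    offset = ssq_number * 0x18 - 0x8
    return scdtab_address + offset

tab = 0x0D200000  # hypothetical SCDTAB address
for n in range(1, 7):
    print(f"SSQ{n} header at {ssq_header_address(tab, n):08X}")
```

For subqueue 3, for example, the offset is X'40', so the header in this made-up case is at 0D200040.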
1. The SSQ consists of the following six subqueues. All subqueues are formatted in a dump.
  
  Subqueue 1
  
  Reserved for future use.
  
  Subqueue 2
  
  JMP region waiting for work.
  
  Subqueue 3
  
  MPP region waiting for work.
  
  Subqueue 4
  
  MPP/JMP region waiting for intent.
  
  Subqueue 5
  
  BMP/JBP region waiting for intent.
  
  Subqueue 6
  
  MPP/BMP/JMP region waiting for input.
2. Each subqueue represents a resource. A PST enqueued on a subqueue is waiting for that resource.
3. The TAB and SSQs are formatted after the SCD LATCH EXTENSION in an IMS formatted dump, as follows:
```
      **TAB - TRANSACTION ANCHOR BLOCK**  
      0D1873B0                    005800FF 00000000    *        ........*
      0D1873C0  0000000E 00000000 00000000 00000000    *................*
      0D1873D0  00000000 00000000 00000000 00000000    *................*
          LINES   0D1873E0-0D1873EF   SAME AS THE ABOVE
      0D1873F0  00000000 00000000 0CF18544 0CF00C40    *.........1...0. *
      0D187400  00000000 00000000 00003614 00000000    *................*
      0D187410  0CF18C40 0CF18C40 00000000 00000000    *.1. .1. ........*
      0D187420  00003AEB 00000000 00000000 00000000    *................*
      0D187430  00000000 00000000 0000396E 00000000    *................*
      0D187440  00000000 00000000 00000000 00000000    *................*
      0D187450  000010B4 00000000 0D187858 0D1878B0    *................*
      0D187460  0D187908 0D187960 0D1879B8 0D187A10    *................*
      0D187470  0D187A68 0D187AC0 0D187B18 0D187B70    *................*
      ........
      ........
      ........
      ........
 
          ***SCHEDULER SEQUENCE QUEUES***
 
          DFSPSTQE  00000000     SUBQ  1            NOT ACTIVE
                                 SUBQ  2            NOT ACTIVE
                                 SUBQ  3            NOT ACTIVE
                                 SUBQ  4            NOT ACTIVE
                                 SUBQ  5            NOT ACTIVE
                                 SUBQ  6            NOT ACTIVE
```
4. If the words NOT ACTIVE follow the subqueue entry, no PSTs are enqueued on that entry.
5. If entries are listed for subqueue 3, go to No work to do.
6. If no entries are listed for subqueue 3, go to step 3.
Are there subqueue 4 or 5 entries?
Subqueue 4 does not apply to a DBCTL environment.

Entries on subqueue 4 or 5 are waiting for intent conflicts to be resolved.
1. If entries are listed for subqueue 4 or 5, go to Intent conflict.
2. If not, go to step 4.
Are there subqueue 6 entries?
This step does not apply to a DBCTL environment. Continue with the next step.

Entries on subqueue 6 are waiting for input.
1. If there are entries listed for subqueue 6, go to WAIT for input.
2. If there are no entries, go to step 5.
Are all regions accounted for?
Compare the number of regions in the SCDREGCT (SCD+X'C8A') with the number of regions enqueued on the subqueues. (The SCDREGCT is 2 bytes.)
1. If the numbers of regions are equal, go to step 6.
2. If the numbers of regions are not equal, not all regions are accounted for. Go to PST active.
Report the problem.
This problem occurs when there are entries queued on the subqueues and no reason can be found to prevent their scheduling, but nothing schedules. Report the problem to the IBM Support Center.

PST active

You reach this point in the analysis when either of the following conditions is true:

- The SCDREGCT field is not equal to zero, and there are no entries on the scheduler sequence queues.
- No problem was found in analyzing the PSTs on the subqueues, and the number of PSTs on the subqueues is less than that in the SCDREGCT field.

Locate the PSTs.
Find the stack of dependent region PSTs in the dump. (Two stacks of PSTs exist in the dump. System PSTs are printed separately from the dependent region PSTs.)
Is the PST scheduled?
1. Find all the PSTs with PSTTERM (X'1BC') = X'02' (ACTIVE) and PSTCODE1 (X'B7A') = X'10' (SCHEDULED).
2. Ignore the PSTs without the SCHEDULED bit on.
For the scheduled PSTs, do SAP analysis.
1. PST at offset minus X'04' (field name PTR) is usually the SAP address. (The PTR field is the last entry on the line above the X'0000' line in the dump.) If not, PST + X'5B8' (PSTSAV1) is the address of the first Save Area in a set, and WD1 in that Save Area is the address of the SAP.
2. Go to Determine the type of WAIT or LOOP that is in progress. Return here after doing SAP analysis for the scheduled PSTs only.
Are there any ACTIVE non WAITING SAPs?
1. If any of the SAPs are marked ACTIVE go to step 5.
2. If SAPs are found WAITING, use normal SAP analysis to report the problem. Use the search argument format shown in Search argument example.
Is the dependent region active within an IMS save area set?
1. If SAP +X'08' (SAPCNTRL) = X'10', this region is in a DL/I call within IMS. Go to step 6.
2. Otherwise go to step 7.
Analyze the region dump.
You must analyze the region dump using the PSW address to identify the problem. Refer to WAIT/LOOP procedure, steps 6c and 6d.
Determine what the application program is doing.
You must analyze the region dump using the PSW address to identify what the application program is doing.

In a DBCTL environment, you must analyze the CCTL region dump using the PSW address to find out what the DRA, CCTL, or application program is doing. Refer to WAIT/LOOP procedure, steps 6c and 6d.
Determine the reason the latch is not freed.
If a latch is being waited for, and the owner is not waiting for I/O, use SAP analysis to identify the reason for the WAIT.

No work to do

This section does not apply to a DBCTL environment.

You came to this point because subqueue 3 contains PSTs.

Locate the PSTs on subqueue 3.
The addresses under the field name SQPSTADD are the PST addresses. In the formatted dump, the PSTs start with the eye catcher *** DB PST AREA ***. Locate the PSTs that are on subqueue 3.
Find the classes the PSTs can execute.
PST + X'C68' (PSTCLASS) is an 8-byte field. Each halfword indicates a transaction class that the PST is allowed to process. For example, if PSTCLASS = 00010003 00050006, the PST can process classes 0001, 0003, 0005, and 0006.
For each PST on subqueue 3, locate the transaction class table (TCT) for each class that the PST can process. There is one TCT for each class.
1. Obtain the TAB address from the SCDTAB. SCD+X'B88' points to SCDTAB and is labeled TABEP in the IMS Dump Formatter.
2. Take the first PSTCLASS value and subtract 1.
3. Multiply this result by 4.
4. Add this value to the TABCLASS offset value + X'A0'.
5. TCT = 4 x (first PSTCLASS value - 1) + X'A0'.
  When the high-order byte contains X'80', the TCT class is not active.
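The class lookup above can be sketched in Python. This fragment is illustrative and rests on assumptions: it treats the 8-byte PSTCLASS field as four halfwords, as the sample value 00010003 00050006 implies, it treats a halfword of zero as an unused slot, and it applies the formula from the last step of the list as written. The function names are hypothetical.

```python
def classes_from_pstclass(pstclass_hex):
    """Split the 8-byte PSTCLASS value into its nonzero halfword class numbers."""
    digits = pstclass_hex.replace(" ", "")
    if len(digits) != 16:
        raise ValueError("PSTCLASS is 8 bytes (16 hexadecimal digits)")
    halfwords = [int(digits[i:i + 4], 16) for i in range(0, 16, 4)]
    return [c for c in halfwords if c != 0]  # assumption: zero means unused

def tct_offset(class_number):
    """Apply: 4 x (class value - 1) + X'A0'."""
    return 4 * (class_number - 1) + 0xA0

for cls in classes_from_pstclass("00010003 00050006"):
    print(f"class {cls:04d}: offset X'{tct_offset(cls):X}'")
```

For the sample value, the classes are 1, 3, 5, and 6, and the corresponding offsets are X'A0', X'A8', X'B0', and X'B4'.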
Can any SMBs be scheduled?
TCT +X'04' = zero or the address of an SMB that can be scheduled.
1. If zero, no SMBs can be scheduled. Go to step 7.
2. If SMBs can be scheduled, locate the SMBs and then go to step 5.
Is SMB locked or stopped?
1. If SMB +X'24' (SMBSTATS) = X'10' (STOPPED) or X'08' (LOCKED), go to step 6.
2. Otherwise, go to step 9.
Are there any more SMBs on this class?
1. If SMB+X'04' (SMBQEFP) is not equal to zero, it is the address of the next SMB. Move on to the next SMB and repeat step 5.
2. If SMB+X'04' (SMBQEFP) = zero, there are no more SMBs. Go to step 7.
Are all classes accounted for?
1. If not all classes found in PST + X'C68' (PSTCLASS) are accounted for, repeat step 4 for each remaining class.
2. Otherwise, go to step 8.
Are all regions accounted for?
To determine whether all regions are accounted for, use SCDREGCT (SCD + X'C8A'). The SCDREGCT is 2 bytes. There is one PST for each region.
1. If the number of PSTs on subqueue 3 is equal to the SCDREGCT and they have been examined and accounted for, there are no transactions scheduled for the regions. This is a normal WAIT, and there is no work for IMS to perform. This is not a problem.
2. Otherwise, go back to step 3 of PST analysis to continue the scheduler queue analysis.
Locate the PSB directory (PDIR).
If the SMB is not locked or stopped, locate the PDIR: SMB+X'3C' (SMBPDIR) = address of the PDIR.
Can PDIR schedule?
Locate the PDIR entry. When any of the following bits in PDIR +X'20' (PDIRCODE) are ON, the PDIR is unable to schedule: X'40', X'10', X'08', or X'02'.
1. If the PDIR cannot schedule, go back to step 6.
2. Otherwise, go to step 11.
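The PDIRCODE test is a bit mask check. The following Python sketch is illustrative only; the constant and function names are hypothetical, and the input is the single PDIRCODE byte read from PDIR +X'20' in the dump.

```python
# Any of these PDIRCODE bits being on means the PDIR cannot schedule.
PDIR_BLOCKING_BITS = 0x40 | 0x10 | 0x08 | 0x02

def pdir_can_schedule(pdircode):
    """Return True when none of the blocking bits is on."""
    return (pdircode & PDIR_BLOCKING_BITS) == 0

print(pdir_can_schedule(0x04))  # only the X'04' bit is on
print(pdir_can_schedule(0x44))  # the X'40' bit is on
```

The check deliberately ignores bits outside the mask, such as X'04', because the procedure names only the four bits listed above as preventing scheduling.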
Is PDIR marked parallel?
1. If the PDIR is marked scheduled but not parallel:
```
PDIR+X'20' (PDIRCODE) = X'04' (Scheduled)
and:
PDIR+X'21' (PDIROPTC) is not equal to X'04' (Not parallel)
```
  If there are entries listed for subqueue 6, go to WAIT for input to determine if any of the waiters on subqueue 6 are pseudo WFIs scheduled against the same PDIR. If there is a pseudo WFI scheduled against the same PDIR, report the problem to the IBM Support Center.
  
  If there are no entries listed for subqueue 6 or none of the waiters on subqueue 6 point to the same PDIR, go back to step 6.
2. If marked parallel (PDIR +X'21' = X'04'), go to step 12.
Are enough messages enqueued for another PST?
If the PDIR is marked parallel, check if enough messages are enqueued on the SMB to schedule another PST.
1. You do this by finding:
  1. SMB+X'46' (SMBPARLM) = number of messages per region (2 bytes).
  2. SMB+X'44' (SMBRGNS) = number of message regions scheduled for the SMB (2 bytes).
  3. SMB+X'1A'(SMBENQCT) minus SMB +X'18' (SMBDEQCT) = number of messages currently enqueued. (To find the number currently enqueued, subtract the messages dequeued from those enqueued.)
2. If the number of messages currently enqueued (step 12a3) is greater than the number of messages per region (step 12a1) multiplied by the number of message regions scheduled (step 12a2), there are enough messages enqueued on the SMB to schedule another PST. Go back to step 6.
3. Otherwise, go to step 13.
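The comparison in step 12 can be sketched as follows. This Python fragment is an illustration: the function name and the sample counter values are hypothetical, and the sketch does not handle a 2-byte counter that has wrapped.

```python
def enough_messages_for_another_pst(smbenqct, smbdeqct, smbparlm, smbrgns):
    """Compare the messages currently enqueued with PARLM x regions scheduled."""
    currently_enqueued = smbenqct - smbdeqct
    return currently_enqueued > smbparlm * smbrgns

# Hypothetical values: 100 enqueued, 80 dequeued, 5 messages per region.
print(enough_messages_for_another_pst(100, 80, 5, 2))  # 20 > 10
print(enough_messages_for_another_pst(100, 80, 5, 4))  # 20 > 20 is false
```

A result of True corresponds to going back to step 6; False corresponds to continuing with step 13.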
Report the problem.
At this point, regions are waiting, enqueued on subqueue 3 with transactions that can be scheduled. Report the problem to the IBM Support Center.

Intent conflict

You reach this point by having entries on subqueue 4 or 5.

An intent problem is indicated when the PST is on the intent queue.

Locate the PSTs that are on subqueue 4 or subqueue 5, or both.
The addresses under the field name SQPSTADD are the PST addresses. To analyze the INTENT CONFLICT fields in a PST, you must locate the PST in the unformatted section of the dump.
Is the PSB work pool too small?
1. If PST + X'B7A' (PSTCODE1) = X'06', the PST is on the PSB WAIT queue for pool space. The PSB work pool is too small. You must increase the size of the PSBW parameter in the DFSPBxxx member.
2. Otherwise, go to step 3.
Is the Data Management Block (DMB) pool too small?
1. If PST + X'B7A' (PSTCODE1) = X'20', the DMB pool is too small. You must increase the size of the DMB parameter in the DFSPBxxx member.
2. Otherwise, go to step 4.
Can intent be satisfied?
1. If PST + X'B7A' (PSTCODE1) = X'40', the intent cannot be satisfied. Go to step 6.
2. Otherwise, go to step 5.
Is the region scheduled?
1. If any PST has the following:
  - PST +X'B7A' (PSTCODE1) = X'10'(SCHEDULED)
  - and:
  - PST +X'1BC' (PSTTERM) = X'02'(ACTIVE)
  the region is scheduled, and this is a normal WAIT for subqueue 4 and subqueue 5. Usually this is not a problem. Go back to the subqueue 6 entry of PST analysis, step 4, and continue.
2. Otherwise, go to step 7.
There is an intent conflict.
If you reach this point, there is an intent conflict. Usually, the intent conflict is caused by a PSB having the exclusive option. This option is defined during the PSBGEN. See the PSBGEN section of IMS Version 15.2 System Utilities. If the exclusive option did not cause the intent conflict, report the problem to the IBM Support Center.
Report the problem.
If you reach this point, the problem is that the last region to terminate should have posted the PST on subqueue 4 and subqueue 5 and did not. In a DBCTL environment, the last thread to unschedule a PSB did not post subqueue 4 or 5. Thus, there is a WAIT with a PST on subqueue 4 or subqueue 5 with no scheduled regions. Use subqueue 4 or subqueue 5 in your search argument, or report the problem to the IBM Support Center.
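
The decision sequence for an intent conflict can be summarized in a short sketch. The Python below is illustrative only: it assumes that PSTCODE1 and PSTTERM have already been extracted from the dump, it compares values exactly as the steps are written (the dump might require bit tests instead), and the function name and return strings are hypothetical.

```python
from typing import Iterable, Tuple


def intent_wait_outcome(waiter_code1: int,
                        all_psts: Iterable[Tuple[int, int]]) -> str:
    """Classify a PST found on subqueue 4 or 5.

    waiter_code1 is PSTCODE1 of the waiting PST; all_psts holds
    (PSTCODE1, PSTTERM) pairs for every PST examined.
    """
    if waiter_code1 == 0x06:
        return "PSB work pool too small: increase PSBW in DFSPBxxx"
    if waiter_code1 == 0x20:
        return "DMB pool too small: increase DMB in DFSPBxxx"
    if waiter_code1 == 0x40:
        return "intent conflict: check for a PSB with the exclusive option"
    # Step 5: is any region scheduled (X'10') and active (X'02')?
    if any(code1 == 0x10 and term == 0x02 for code1, term in all_psts):
        return "normal wait: a region is scheduled"
    return "report: subqueue 4 or 5 was not posted"
```
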

WAIT for input

You can reach this point only by having entries on subqueue 6.

Find the PSTs on subqueue 6.
The addresses under the field name SQPSTADD are the PST addresses. The PSTs are found in the stack of PSTs.
Find Scheduler Message Blocks (SMBs) for the PSTs.
For each PST enqueued on subqueue 6, find the related SMB: PST +X'C4' (PSTSMB) = address of the SMB.
Are any of the regions on subqueue 6 pseudo WFIs?
- If SMB+X'27' (SMBFLAG3) = X'08' (WFI transaction), the region is not a pseudo WFI.
- If the region is a pseudo WFI, check if the region is holding any resources needed by transactions waiting to be processed.
Are any messages enqueued on SMB?
There should be no messages enqueued on the SMB.
- SMB+X'1A' (SMBENQCT) minus SMB+X'18' (SMBDEQCT) = number of messages enqueued
  - If there are messages enqueued on the SMB, go to step 6.
  - If no messages are enqueued, go to step 5.
Are all regions accounted for?
Compare the count of regions enqueued on the subqueues with the count in SCDREGCT (SCD + X'C92') (2 bytes).
- If the counts are equal, all regions are accounted for, and the IMS regions are in a normal scheduling environment. The problem is not with scheduling.
- If not equal, other regions are active in IMS. Go to PST active.
Report the problem.
The problem is that IMS messages are enqueued on the SMB and wait-for-input (subqueue 6) is not posted. Report the problem to the IBM Support Center.
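
The checks in this section can be sketched as follows. This is a minimal illustration, assuming the field values were read from the dump beforehand, that SMBFLAG3 is tested as a bit, and that the message counters wrap at 2 bytes. All names are hypothetical.

```python
def is_pseudo_wfi(smbflag3: int) -> bool:
    """A region on subqueue 6 is a pseudo WFI when the WFI bit (X'08') is OFF."""
    return (smbflag3 & 0x08) == 0


def wait_for_input_outcome(smbenqct: int, smbdeqct: int,
                           regions_on_subqueues: int, scdregct: int) -> str:
    """Follow steps 4 to 6 for one SMB."""
    enqueued = (smbenqct - smbdeqct) & 0xFFFF  # assumption: 2-byte counters
    if enqueued > 0:
        return "report: messages enqueued but subqueue 6 not posted"
    if regions_on_subqueues == scdregct:
        return "normal: all regions accounted for"
    return "other regions active: go to PST active"
```
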

Loop

Use standard z/OS system diagnostic procedures for loops.

Using the RB found in step 6c of the WAIT/LOOP procedure, determine the PSW address. The PSW address is labeled OPSW and is always the second word following the label. This PSW address belongs to one of the modules involved in the loop.

You can use the z/OS system trace to examine entries for the ASID and TCB indicated in the Dispatch Work Area at step 5 of the WAIT/LOOP procedure. The PSW address in the system trace entries indicates the modules involved in the loop.

Locate the PSW addresses in the storage section of the dump and scan backward through the eye catchers on the right side of the dump until you find a module identifier.
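
If the storage is available as raw bytes, the backward scan can be approximated programmatically. The sketch below assumes that eye catchers are EBCDIC text (code page 037) and that the module identifier looks like an 8-character name beginning with DFS. Both are assumptions made for illustration; a looping module outside IMS would need a different pattern. The function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: IMS-style module identifier, "DFS" plus five more characters.
MODULE_ID = re.compile(r"DFS[A-Z0-9@#$]{5}")


def find_module_before(storage: bytes, psw_offset: int) -> Optional[Tuple[str, int]]:
    """Return (identifier, offset) of the last eye catcher before psw_offset."""
    # cp037 maps each byte to exactly one character, so string positions
    # are the same as byte offsets.
    text = storage[:psw_offset].decode("cp037")
    last = None
    for match in MODULE_ID.finditer(text):
        last = match
    if last is None:
        return None
    return last.group(0), last.start()
```

The offset that is returned is relative to the start of the byte string, so add the storage address of the first byte to obtain the address in the dump.
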

The looping module might not be an IMS module. Sometimes, the addresses are in the Link Pack Area (LPA) or the nucleus and might require an LPA or nucleus map.

Create the search argument

You can use the following additional keywords in the search argument to narrow the search, but they might not be necessary.

Keyword: LOOP

At this point, you can be sure that you are in a loop situation. Therefore, LOOP is an appropriate keyword for the search argument.

Keyword: module names involved in the loop

The module names derived in the loop procedure above are also valid keywords.

Keyword: label in module

If it is a tight loop, labels from the assembly listing of the modules involved might be useful keywords.

Keyword: additional related keywords

External events can trigger loops. These events might be indicated by console messages or be related to a procedure that was being performed at the time the LOOP began.

Search argument example

Consider the following scenario:

- IMS went into a loop.
- The active modules indicated in the RB chain and the z/OS system trace table were DFSCFEI0 and DFSCFE00.
- The loop began after the operator issued a /DISPLAY NODE command.

The broad search argument to use is:

5655J3800 LOOP DFSCFE00 | DFSCFEI0 | DISPLAY | NODE

For a structured database search, use this search argument:

PIDS/5655J3800 LOOP RIDS/DFSCFE00 | RIDS/DFSCFEI0 | PCSS/DIS | PCSS/NODE

With this search argument you might receive numerous hits, which will probably contain the APAR describing your problem. You can then take various combinations of the additional keywords that were compared with the OR operator in the above example and use the AND operator on them instead. You can use this technique to narrow the field of search until you find the appropriate APAR.

If the loop was not in an IMS module, do not use the IMS component ID, 5655J3800.
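
The two search argument forms can be assembled mechanically from the same keywords. The following Python is a sketch only: the function names are hypothetical, the prefixes PIDS/, RIDS/, and PCSS/ are taken from the example above, and the caller supplies command words in whatever abbreviated form the search facility expects (the example uses DIS for DISPLAY). Passing `None` as the component ID omits it, as recommended when the loop is not in an IMS module.

```python
from typing import Optional, Sequence


def broad_argument(component_id: Optional[str], symptom: str,
                   keywords: Sequence[str]) -> str:
    """Build the free-form search argument with OR-ed keywords."""
    head = " ".join(part for part in (component_id, symptom) if part)
    return f"{head} " + " | ".join(keywords)


def structured_argument(component_id: Optional[str], symptom: str,
                        modules: Sequence[str],
                        command_words: Sequence[str]) -> str:
    """Build the structured database search argument."""
    head = [f"PIDS/{component_id}"] if component_id else []
    head.append(symptom)
    terms = [f"RIDS/{name}" for name in modules]
    terms += [f"PCSS/{word}" for word in command_words]
    return " ".join(head) + " " + " | ".join(terms)
```

Calling `structured_argument("5655J3800", "LOOP", ["DFSCFE00", "DFSCFEI0"], ["DIS", "NODE"])` reproduces the structured argument shown in the example.
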

System wait

Use standard z/OS systems diagnostic procedures.

If the PSW address is for a system module, include that information when reporting the problem. You can use the module name in your search along with the WAIT keyword.

Shutdown processing

Use this analysis if the operator issued a /CHECKPOINT FREEZE, DUMPQ, or PURGE to IMS and IMS failed to come down normally. Before taking IMS out of the system, be sure to use a /DISPLAY SHUTDOWN STATUS command. Obtain the listing of the /DISPLAY command and any subsequent activity to find any unusual conditions that might have prevented an orderly termination of IMS.

You should also use this analysis if IMS shut itself down and failed to terminate normally. For example, when IMS runs low on message queue space, it shuts itself down.

Before starting this procedure, you need to obtain an IMS dump in order to examine bit settings. Be aware that if you received only the first part of the DFS994I message during shutdown processing, VTAM might be involved in the failure. (For a DBCTL environment, ignore any further instructions that refer to VTAM in this topic and in the next topic, Shutdown analysis (CHE FREEZE, DUMPQ, or PURGE).) If you received the DFS994I xxx (FREEZE, DUMPQ, PURGE), but not DFS994I IMS SHUTDOWN COMPLETED, be sure to obtain a dump of VTAM and IMS. Here are two ways to get a dump:

- Enter the z/OS DUMP command to dump the VTAM address space and then modify IMS down with a dump.
- Enter the z/OS DUMP command to dump the VTAM, IMS control, DL/I, and CCTL address spaces, and then modify IMS down without a dump.

Be sure to include the RGN option along with the other standard SDATA defaults in the DUMP command.

In the section Shutdown Analysis that follows, note the following:

- Displacements and test conditions can change when maintenance is applied to a system.
- The bit settings shown are cumulative. This means that they usually combine with any bits already set in the byte. Check the bit settings as described. If a bit was not set or reset as shown, include both the module name and the cumulative bit settings in each byte in your search argument.
- SET turns the bit ON. RESET turns the bit OFF. Other bits in the byte might already be ON.
- It is essential in using the following analysis to find out if the indicated bits were SET or RESET and to use only the DUMPQ/FREEZE or PURGE sections where applicable.
- The Save Areas (SAs) might not always identify the last module to have control. In some cases, control is passed back to the initiating module (such as DFSCST00), and you can find no trace of any lower modules in the SAs.
- The main control block in shutdown problem analysis is the system contents directory (SCD). This flow of control lists most of the modules involved. When you find a field that does not have the bits SET or RESET as indicated, stop the analysis and report the problem.
- Be aware that defective code can produce results that appear to contradict this information.
- The following analysis does not list every action that is taking place in IMS shutdown processing, but only activity that causes bit settings to be changed in key SCD fields.
- Comments scattered throughout the analysis are for information only. For example, the statement "If input or output is pending, return to DFSICIO0 with RC=C to complete" is for information. Do not look at return codes, but examine only the bit settings.

Shutdown analysis (CHE FREEZE, DUMPQ, or PURGE)

Remember that in this analysis you will be looking at bit settings, not hexadecimal values.
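
Because the settings are cumulative, a byte in the dump rarely equals any single value shown in the analysis. The following Python sketch illustrates the difference between testing a bit and comparing the whole byte. The helper names are hypothetical, and the starting value of X'00' is an assumption made only for the example.

```python
def set_bits(byte: int, mask: int) -> int:
    """SET: turn the masked bits ON, leaving the other bits unchanged."""
    return (byte | mask) & 0xFF


def reset_bits(byte: int, mask: int) -> int:
    """RESET: turn the masked bits OFF, leaving the other bits unchanged."""
    return byte & ~mask & 0xFF


def bits_on(byte: int, mask: int) -> bool:
    """Return True when every bit in the mask is ON."""
    return (byte & mask) == mask


# SCDSTOP1 during a FREEZE with regions still active, following the
# settings listed in the analysis (assumed to start at X'00').
scdstop1 = 0x00
scdstop1 = set_bits(scdstop1, 0xC0)  # DFSICL20, not PURGE
scdstop1 = set_bits(scdstop1, 0x04)  # DFSTRM00, DUMPQ or FREEZE
scdstop1 = set_bits(scdstop1, 0x02)  # DFSTRM00

print(f"SCDSTOP1 = X'{scdstop1:02X}'")           # cumulative value, not X'04'
print("X'04' bit ON:", bits_on(scdstop1, 0x04))
```

In this example the byte holds X'C6', so a comparison against X'04' fails even though the X'04' bit was set as expected.
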

These sections do not apply to DBCTL shutdown:

PURGE, DFSICL20, DFSICLX0, DFSICIO0, DFSIPCP0, and DFSCPCP0

- DFSICL20
  - If PURGE, then set SCDCKCTL(X'C08') = X'34' and then set SCDSTOP1(X'C02') = X'80'
  - If not PURGE, then:
    - If DUMPQ, set SCDCKCTL(X'C08') = X'1C'
    - If FREEZE, set SCDCKCTL(X'C08') = X'14'
      - Reset POLL the lines (not applicable to DBCTL), and then
      - Set SCDSTOP1(X'C02') = X'C0' (for DBCTL, set AWE to TRM1)
- DFSICLX0
- DFSICIO0
- DFSIPCP0
  - If SCDCFLG1(X'AC7') = X'08', then
    - Set SCDCQFLG(X'AC8') = X'04' and
    - Set SCDCNXW4(X'ACF') = X'40'
  - If input or output is pending, return to DFSICIO0 with RC=C to complete.
  - When there is no input or output pending, or when the input or output is finished, then:
    - Set SCDCPCTL(X'AC4') = X'80'
    - Set AWE to TRM1
- DFSCST00
- DFSTRM00
For PURGE
- AWE = TRM1, First phase of termination
- If SCDIDCNT +1 (X'BC8') is not equal to X'000000' and SCDCKCTL(X'C08') = X'20' (PURGE):
  - Set SCDSTOP1(X'C02') = X'10'
  - Set SCDSTOP1(X'C02') = X'02'
- If SCDFTFLG(X'290') = X'20' (Fast Path active), DBFTERM0 posts the Fast Path regions for SHUTDOWN
- DFSTRM00
For DUMPQ or FREEZE
- If SCDIDCNT+1(X'BC8') is not equal to X'000000' and SCDCKCTL(X'C08') is not equal to X'20' (Not PURGE)
  - Set SCDSTOP1(X'C02') = X'04'
  - Set SCDSTOP1(X'C02') = X'02'
- If SCDFTFLG(X'290') = X'20' (Fast Path Active), DBFTERM0 posts the Fast Path regions for SHUTDOWN
For DUMPQ, PURGE, or FREEZE
- If Fast Path was active on return from DBFTERM0, or if Fast Path was not active, and SCDREGCT(X'C8A') is not equal to X'0000' (ACTIVE REGIONS), then post the PSTs waiting in the scheduler.
- If SCDSHFL1(X'3A4') = X'80' (IRLM in system) or SCDIDCNT+1(X'BC8'), or both, is not equal to X'000000', then return to DFSCST00 to wait for regions to end. If DBCTL, notify DRA before returning to DFSCST00.
- When or if SCDIDCNT+1(X'BC8') = X'000000' (REGIONS ENDED), set SCDSTOP1(X'C02') = X'01'.
For PURGE only
- If SCDCKCTL(X'C08') = X'20' (PURGE)
- Set SCDSTOP1(X'C02') = X'20'
- IWAIT for all output to go.
For DUMPQ, PURGE, or FREEZE
When all output is done for PURGE or FREEZE or DUMPQ, then:
- If SCDFTFLG(X'290') = X'20' (Fast Path active), DBFTERM1 closes the areas.
- If SCDFTFLG(X'290') is not equal to X'20' or when Fast Path areas are closed then:
  - If SCDSMMS1(X'033') = X'02' (DLI SAS), then:
    
    Tell the DL/I region to close the databases (DFSSDL40).
    
    IWAIT for the databases to close.
  - If not DLI/SAS, then let DFSDLOC0 close the databases.
Then when all databases and areas are closed: Set SCDSTOP1+1(X'C02') = X'04'.
- DFSCPCP0
  Set return code (RC) = 8 to ask DFSIPCP0 if communication is still going on.
- DFSIPCP0 (DFSIPCP2)
  - If no output or no messages on Q3, set return code (RC) = 0 to inform DFSCPCP0.
  - If output or messages on Q3, set return code (RC) = 4 to inform DFSCPCP0, which causes DFSCPCP0 to IWAIT.
- DFSCPCP0
  - If output is pending (RC = 4)
    
    Set SCDCPCTL(X'AC4') = X'08'
    
    Set SCDSTOP1(X'C02') = X'40'
    
    IWAIT for DC to finish.
  - If no output or when output finishes
    
    Set off SCDCPCTL(X'AC4') = X'08' (reset the bit)
    
    Set SCDSTOP1+1(X'C02') = X'08'
    
    Reset Poll all lines that are candidates for the SHUTDOWN message
    
    Set CTBFLAG3(0D) = X'10' (for all terminals that are to receive the shutdown message)
- DFSICLX0
- DFSICIO0
- DFSIPCP0
  - If any CTBFLAG3(0D) = X'10':
    
    Set CTBACTL(10) = X'20'
    
    Set CTBACTL(10) = X'10'
    
    RC = 8 to DFSICIO0 (send SHUTDOWN message)
  - If NO CTBFLAG3(0D) = X'10':
    
    Set SCDDFLGS(X'718') = X'80'
    
    Set SCDCPCTL(X'AC4') = X'20'
    
    RC = 4 to DFSICIO0 (quiesce lines)
- DFSICIO0
  - If RC = 4, idle the lines
  - If RC = 8, send DFS991 - IMS SHUTDOWN message
  - The WRITE interrupt from the SHUTDOWN message results in the following:
    
    Set off CTBFLAG5(0F) = X'80' (reset the bit)
    
    Set off CTBFLAG3(0D) = X'10' (reset the bit)
    
    Set off CTBACTL(10) = X'30' (reset the bits)
- DFSIPCP0
  When all line activity is stopped
- DFSCPCP0
- DFSTRM00
  - If DBCTL, set SCDSTOP = SCDSTSNT, then set SCDSTOP1+1(X'C02') = X'01'
- DFSRCRT0
- DFSRCP00
  - Send DFS994I *CHKPT yyddd/hhmmss*ctype (first part of DFS994I message)
  - Set AWE = TRM2
  - Set off SCDCKCTL(X'C08') = X'04' (reset the bit)
- DFSTRM00
  Set SCDTRMFL(X'430') = X'40'
- DFSCST00
- DFSTRM00
  - If DLI/SAS SCDSMMS1(X'033') = X'02', pass AWE to DFSSDL40 to begin Normal Termination
  - If not DLI/SAS or when DFSSDL40 returns
  - If SCDRFPIN(X'C32') = X'80' (Fast Path errors):
    
    Print error message
    
    Set off SCDRFPIN(X'C32') = X'80' (reset the bit)
    
    Close queue data sets (not applicable to DBCTL)
    
    IWAIT for closing
    
    Set off SCDSTOP1(X'C02') = X'08' (reset the bit)
- DFSTERM0
  - Terminate DASD log
  - Set off SCDRECTL(X'146') = X'80' (reset the bit)
  - Terminate RDS
  - Terminate IMS system type tasks
  - Signoff DBRC
  - Quit IRLM
  - Close VTAM ACB (not applicable to DBCTL)
  - If DLI/SAS, SCDSMMS1(X'033') = X'02' and the ECB at SCDRSETF(X'D1C') is not equal to X'40' (posted):
    
    IWAIT for the DL/I region to end
    
    Set AWE = TRM3
    
    Set SCDTRMFL(X'430') = X'20'
    
    Send DFS994I IMS SHUTDOWN COMPLETED (second part of DFS994I message)
- DFSTRM00
- DFSCST00
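
When you decide which sections of this analysis apply, it helps to identify the type of checkpoint from SCDCKCTL. The sketch below is an inference from the values set by DFSICL20 (X'34' for PURGE, X'1C' for DUMPQ, X'14' for FREEZE) and from the X'20' test for PURGE used throughout the analysis. Treating X'08' as the bit that distinguishes DUMPQ is an assumption derived from those values, not a documented definition, and the function name is hypothetical.

```python
def checkpoint_type(scdckctl: int) -> str:
    """Guess the shutdown checkpoint type from the SCDCKCTL bit settings."""
    if scdckctl & 0x20:   # PURGE test used in the analysis
        return "PURGE"
    if scdckctl & 0x08:   # assumption: bit present in X'1C' but not in X'14'
        return "DUMPQ"
    if scdckctl & 0x10:   # bit common to all three settings
        return "FREEZE"
    return "unknown"
```

The function tests bits rather than whole values because DFSRCP00 later resets the X'04' bit in the same byte.
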

IRLM procedure

WAIT states can be encountered during IRLM processing in four areas:

- Deadlock involving non-IRLM resources
- Deadlock involving only IRLM resources
- Lock request not granted because holder did not release lock
- IRLM latch unavailable

Deadlock involving non-IRLM resources

Failure Description

Application programs waiting for non-IRLM resources and holding IRLM resources are waiting for other applications also holding IRLM resources. The IRLM cannot detect deadlocks involving non-IRLM resources.

Detection

Use the IMS WAIT diagnostic procedures to discover the non-IRLM resources being waited for. Follow the RLB chains representing resources held or requested for each requesting work unit (WHB) to discover the IRLM resources being waited for. If the wait state occurred as a result of an IRLM error, the function/subfunction is IRLM/DEADLK.

An example of a search argument is:

569516401 AR101 WAIT IRLM IRLM/DEADLK

For a structured database search, use this search argument:

PIDS/569516401 LVLS/101 WAIT RIDS/IRLM RIDS/DEADLK

Deadlock involving only IRLM resources

Failure Description

Application programs are deadlocked for IRLM resources. If all the application programs are waiting for IRLM resources (there are no application programs running which could release the locks that the other application programs are waiting for), this is a deadlock. The IRLM should detect this condition and post one of the waiters as unable to obtain the lock because of a deadlock.

Detection

Follow the RLB chains representing resources held or requested for each requesting work unit (WHB) to discover the IRLM resources being waited for. If the wait state occurred as a result of an IRLM error, the function/subfunction is IRLM/DEADLK.

An example of a search argument is:

569516401 AR101 WAIT IRLM IRLM/DEADLK

For a structured database search, use this search argument:

PIDS/569516401 LVLS/101 WAIT RIDS/IRLM RIDS/DEADLK
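
Once the held and requested resources have been collected for each work unit, the deadlock check amounts to looking for a cycle in a wait-for graph. The following Python is a conceptual sketch, not a description of how the IRLM detects deadlocks. It assumes the RLB chains have already been followed by hand and summarized in two dictionaries keyed by a work unit label; the labels, resource names, and function name are hypothetical.

```python
from typing import Dict, List, Optional, Set


def find_deadlock(held: Dict[str, Set[str]],
                  requested: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle of work units that wait on each other, or None."""
    # Map each resource to the work units that hold it.
    holders: Dict[str, Set[str]] = {}
    for unit, resources in held.items():
        for resource in resources:
            holders.setdefault(resource, set()).add(unit)

    # A unit waits for every other unit that holds a resource it requested.
    units = sorted(set(held) | set(requested))
    waits_for = {
        unit: sorted({holder
                      for resource in requested.get(unit, ())
                      for holder in holders.get(resource, ())
                      if holder != unit})
        for unit in units
    }

    state: Dict[str, str] = {}  # unit -> "visiting" or "done"
    path: List[str] = []

    def visit(unit: str) -> Optional[List[str]]:
        state[unit] = "visiting"
        path.append(unit)
        for nxt in waits_for[unit]:
            if state.get(nxt) == "visiting":
                return path[path.index(nxt):]  # cycle found
            if nxt not in state:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        state[unit] = "done"
        path.pop()
        return None

    for unit in units:
        if unit not in state:
            cycle = visit(unit)
            if cycle:
                return cycle
    return None
```

If the function returns a cycle in which every work unit waits only for IRLM resources, the IRLM should have detected the deadlock. If it returns `None`, at least one holder is not waiting on an IRLM resource, which points to one of the other wait conditions in this procedure.
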

Lock request not granted because holder did not release lock

Failure Description

An application program requested a lock, but the request was not granted because the holder of the resource did not release it. This does not result in a deadlock. However, if the requester is not timed out, its task and any others waiting after it might enter a wait state.

An example of a search argument is:

569516401 AR101 WAIT IRLM

For a structured database search, use this search argument:

PIDS/569516401 LVLS/101 WAIT RIDS/IRLM

IRLM latch unavailable

Failure Description

An error in IRLM processing can result in an IRLM latch being permanently unavailable. If this condition exists, no new IRLM requests can be processed.
If this error occurs, call the IBM Support Center for help in diagnosing the problem. The support representative will tell you what type of documentation to gather.