Case 10: Correcting errors within the common recall queue

Any of the following can indicate corruption of the common recall queue (CRQ):
  • DFSMShsm issues message ARC1506E.
  • DFSMShsm issues message ARC1187E.
  • DFSMShsm does not select recall requests for processing.

    For this last circumstance, issue the QUERY ACTIVE command and examine message ARC1541I. This message displays the factors that affect the selection of requests from the CRQ. A status of anything other than CONNECTED or a hold level of anything other than NONE is a probable reason why certain recall requests are not selected.
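
    The check described above can be sketched as a small helper. This is only an illustration: the function name is hypothetical, and it assumes that you have already read the status and hold level from message ARC1541I yourself, because the exact layout of the message text is not assumed here. The sample values in the comments are likewise hypothetical.

```python
from typing import List


def crq_selection_blockers(status: str, hold_level: str) -> List[str]:
    """Return probable reasons why recall requests are not selected from the CRQ.

    Both arguments are values read manually from message ARC1541I.
    Parsing the message text itself is deliberately left out.
    """
    reasons = []
    normalized_status = status.strip().upper()
    normalized_hold = hold_level.strip().upper()

    # Any status other than CONNECTED is a probable reason.
    if normalized_status != "CONNECTED":
        reasons.append(f"status is {normalized_status}, not CONNECTED")

    # Any hold level other than NONE is a probable reason.
    if normalized_hold != "NONE":
        reasons.append(f"hold level is {normalized_hold}, not NONE")

    return reasons


# An empty list means that neither factor explains the missing selections.
print(crq_selection_blockers("CONNECTED", "NONE"))
```

    An empty result does not prove that the CRQ is healthy; it only rules out these two factors.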

Note: An unexpected loss of connectivity to the CRQ may also introduce errors, but in most cases DFSMShsm corrects those errors automatically. When this occurs, you may see message ARC1102I issued multiple times. DFSMShsm issues message ARC1102I when it attempts to recall a data set that was previously recalled. Occurrences of this message at the time of a loss of connectivity are normal and do not indicate a problem with the CRQ list structure.

If you believe that the CRQ has become corrupted, issue the AUDIT COMMONQUEUE(RECALL) FIX command. When DFSMShsm processes this command, it scans the entries in the CRQ and attempts to correct logical inconsistencies within the structure. Note that the audit function cannot correct all types of errors and may report zero errors even though errors exist in the structure.

If messages ARC1506E and ARC1187E recur, or if certain data sets are not selected for recall processing after running the audit function, then perform the following procedure:

  1. Capture a dump of the CRQ structure.
    1. As a general rule for dumps, include COUPLE and XESDATA information in SDUMPs.
      To display the current options, issue the following command:
      D D,O
      If the display does not list both COUPLE and XESDATA as options for SDUMP, then issue the following command:
      CD SET,SDUMP=(COUPLE,XESDATA)
    2. Issue this dump command:
      DUMP COMM=(CRQ LIST STRUCTURE)

      The system responds with: * id IEE094D SPECIFY OPERAND(S) FOR DUMP COMMAND

    3. Issue the following command:
      R id,STRLIST=(STRNAME=SYSARC_basename_RCL,(LNUM=ALL,ADJ=CAP,EDATA=UNSER),CONT
      where basename is the base name of the CRQ structure, as specified on the following command:
      SETSYS COMMONQUEUE(RECALL(CONNECT(basename)))

      The system responds with: * id2 IEE094D SPECIFY OPERAND(S) FOR DUMP COMMAND

    4. Issue the following command:
      R id2,LOCKE,(EMC=ALL),ACC=NOLIM),END
  2. Reallocate the structure to remove the errors.
    1. Issue the SETSYS COMMONQUEUE(RECALL(DISC)) command on all DFSMShsm hosts that are connected to the CRQ structure. This will move all recall requests from the CRQ back to the local DFSMShsm hosts for processing. Each host will issue message ARC1502I after it disconnects from the structure.
      The errors within the CRQ may prevent a particular DFSMShsm host from disconnecting. If this is the case, you cannot perform the remaining steps until that DFSMShsm host is shut down.
      Recommendation: Discontinue using the CRQ until the DFSMShsm host can be shut down with minimal impact on other DFSMShsm functions. At this point, no further action is required to prevent DFSMShsm from attempting to place new requests onto the CRQ.
    2. Once all DFSMShsm hosts disconnect from the common recall structure, delete the structure.
      To determine if all DFSMShsm hosts have disconnected, issue the following command:
      D XCF,STR,STRNAME=SYSARC_basename_RCL

      The number of connections reported should be zero.

      To delete the structure, issue the following command:
      SETXCF FORCE,STR,STRNAME=SYSARC_basename_RCL
    3. Issue the following command on each DFSMShsm host that was previously connected:
      SETSYS COMMONQUEUE(RECALL(CONNECT(basename)))
      This command will cause each host to automatically move all of its local recall requests back onto the CRQ and to reallocate the structure.
  3. Capture the Problem Determination Aid (PDA) data.

    See z/OS DFSMShsm Diagnosis for information on how to perform this action. The PDA data that you collect should cover the period from the time that the problem first occurred through the time that the audit was performed.

  4. Contact IBM® Support to report the problem.
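
The command text used in steps 1 and 2 can be generated from the base name so that the structure name is spelled the same way everywhere. The following Python sketch is only an illustration: the function names, the sample base name PLEX1, and the sample reply ids 01 and 02 are hypothetical, and the script only builds strings. It does not issue commands or query XCF. The connection count must be read from the D XCF,STR display by the operator.

```python
from typing import List


def crq_structure_name(basename: str) -> str:
    """Build the CRQ list structure name used throughout the procedure."""
    return f"SYSARC_{basename}_RCL"


def dump_replies(basename: str, reply_id: str, reply_id2: str) -> List[str]:
    """Return the two replies to IEE094D, in the order they are entered."""
    strname = crq_structure_name(basename)
    first = (f"R {reply_id},STRLIST=(STRNAME={strname},"
             "(LNUM=ALL,ADJ=CAP,EDATA=UNSER),CONT")
    second = f"R {reply_id2},LOCKE,(EMC=ALL),ACC=NOLIM),END"
    return [first, second]


def parens_balance(replies: List[str]) -> bool:
    """Check that parentheses balance across the replies taken together.

    The first reply ends with CONT and leaves STRLIST=( open on purpose;
    the second reply closes it.
    """
    depth = 0
    for reply in replies:
        for ch in reply:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    return False
    return depth == 0


def reallocation_commands(basename: str, connections: int) -> List[str]:
    """Return the delete and reconnect commands once no host is connected.

    `connections` is the number of connections the operator read from
    D XCF,STR,STRNAME=...; a nonzero value means a host is still connected.
    """
    if connections != 0:
        raise ValueError("structure still has connected DFSMShsm hosts")
    strname = crq_structure_name(basename)
    return [
        f"SETXCF FORCE,STR,STRNAME={strname}",
        f"SETSYS COMMONQUEUE(RECALL(CONNECT({basename})))",
    ]


if __name__ == "__main__":
    replies = dump_replies("PLEX1", "01", "02")  # hypothetical values
    for line in replies + reallocation_commands("PLEX1", 0):
        print(line)
    print("parentheses balanced:", parens_balance(replies))
```

Checking the two replies together is deliberate: either reply looks unbalanced on its own, which is expected for a continued operand and is not a typing error.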