IBM Support

II14597: INFO APAR : VSAMRLS DOCUMENTATION COLLECTION GUIDE AND COMMON RECOVERY TIPS


APAR status

  • INTRAN

Error description

  • The purpose of this INFO APAR is to document the procedure
    to follow and documentation to gather if SMSVSAM HANGs. It will
    also document basic recovery procedures for most types of hangs
    or other RLS problems.
    
    The basic procedure for diagnosing RLS problems is as follows:
      1. Issue diagnostic commands to understand scope of problem
      2. Dump affected systems and regions
      3. Restart affected regions
    
    We will go over each item in detail below:
    
    -------------------------------
    1. DIAGNOSTIC COMMANDS
    -------------------------------
    
    When a hang involving RLS is encountered, the first and best
    action is to issue the following diagnostic commands.
    .
     - D GRS,C                - displays ENQ contention
     - D SMS,SMSVSAM,DIAG(C)  - displays RLS latch contention
     - D SMS,SMSVSAM,QUIESCE  - shows any outstanding quiesces
     - D SMS,CFLS             - displays lock structure information
     - D XCF,STR,STRNM=IGWLOCK00 - another view of the lock structure
    .
    The first three commands above are the most important. Each of
    these commands will identify specific information about which
    system is not moving.
    
    D GRS,C COMMAND
    ---------------------------
    There is a wide array of RLS ENQs, but the main ones to watch are
    major names SYSVSAM and SYSZIGW3. The D GRS,C command will show
    which systems and TCBs are holding these ENQs, which can help
    narrow down which system is at fault. Take note from these
    displays if there are any common systems/TCBs doing the holding.
    Note that RLS ENQs are plex-wide in scope.
    
    D SMS,SMSVSAM,DIAG(C) COMMAND
    -------------------------------
    The DIAG(C) command reports only on the system on which it was
    issued, so this command needs to be run on each system. You can
    also use RO *ALL,D SMS,SMSVSAM,DIAG(C) to simplify the process.
    The report will list all of the LATCHES within RLS that have
    contention. Take note of the TCBs and compare those with the
    TCBs holding the ENQs from the GRS display.
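
    The comparison described above can be sketched in Python. This is
    only an illustration: it assumes the (system, TCB address) pairs
    have been transcribed by hand from the D GRS,C and DIAG(C)
    output. The function name common_holders and the sample values
    are hypothetical.

```python
def common_holders(enq_holders, latch_holders):
    """Return the (system, TCB) pairs present in both displays, sorted."""
    return sorted(set(enq_holders) & set(latch_holders))


# Hypothetical values, transcribed by hand from the two displays.
enq_holders = [("SYS1", "007E2A30"), ("SYS2", "007FF1B8")]
latch_holders = [("SYS1", "007E2A30"), ("SYS3", "007D4C10")]

suspects = common_holders(enq_holders, latch_holders)
print(suspects)
```

    Any pair that appears in both lists is a good first candidate
    for the dump and restart stages.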
    
    At this point, you may notice a deadlock between two or three
    TCBs holding latches and/or ENQs. This information can be used
    to search the APAR database for potential fixes. Also, if a
    deadlock is found, the only solution is to restart RLS (see
    '#3 - recovery' below).
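
    One way to spot such a deadlock from the collected holder and
    waiter information is to look for a cycle in a "waits for"
    relationship. The sketch below assumes each waiting TCB has been
    paired by hand with the single TCB holding the latch or ENQ it
    wants; the TCB labels and the function name find_cycle are
    hypothetical, and real contention can involve more than one
    holder per waiter.

```python
def find_cycle(waits_for):
    """Return one cycle of TCBs as a list, or None if there is none.

    waits_for maps each waiting TCB to the TCB that holds the
    resource (latch or ENQ) it is waiting on.
    """
    for start in waits_for:
        path = []
        seen = set()
        node = start
        # Follow the chain of holders until it ends or repeats.
        while node in waits_for and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_for[node]
        if node in seen:
            # The chain came back to a TCB already on the path.
            return path[path.index(node):]
    return None


# Hypothetical example: A waits on B, B waits on C, C waits on A.
deadlocked = {"SYS1/TCB-A": "SYS2/TCB-B",
              "SYS2/TCB-B": "SYS1/TCB-C",
              "SYS1/TCB-C": "SYS1/TCB-A"}
print(find_cycle(deadlocked))
```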
    
    D SMS,SMSVSAM,QUIESCE COMMAND
    ------------------------------------
    The QUIESCE command is very important for helping identify if
    RLS is waiting on a registered CICS region. Note that this
    command is also system-specific, so it needs to be run on each
    system, or routed, as well.
    
    The output of the display will list the data set name and
    quiesce event, and then all of the registered CICS regions and
    their current status. Watch for any regions that have not
    responded in the same timeframe as the others, or in a timely
    manner. These are the problem CICS regions. Note these regions
    for the DUMP stage.
    
    Here is an example of the display:
    
    IGW540I 09.36.44 DISPLAY SMS,SMSVSAM,QUIESCE
    SPHERE NAME: MY.DATA.SET
    SYSTEM NAME: ST8      START TIME: 09.33.50 TOTAL ELAPSE TIME:
    00.02.54
    
    PARTICIPATING SUB-SYSTEM STATUS: SCHEDULED: COMPLETED: ELAPSE:
     SUB-SYSTEM NAME:  ........       09.33.50   09.33.51  00.00.00
     SUB-SYSTEM NAME:  CICS1          09.33.50   09.33.50  00.00.00
     SUB-SYSTEM NAME:  CICS2          09.33.50   00.00.00  02.54.00
     SUB-SYSTEM NAME:  CICS3          09.33.50   09.33.50  00.00.00
    
    Note the ELAPSE time above is 02.54.00. This indicates that the
    CICS region CICS2 has not responded in adequate time. Track and
    dump any erroneous regions such as this one.
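
    When the display is long, the check above can be scripted. The
    following Python sketch parses text captured from the display
    and lists the subsystems whose ELAPSE column is not 00.00.00.
    It assumes the column layout matches the example above, and that
    a non-zero ELAPSE is what marks a slow region; the function name
    slow_regions is hypothetical.

```python
import re

# One "SUB-SYSTEM NAME:" row: name, SCHEDULED, COMPLETED, ELAPSE.
ROW = re.compile(
    r"SUB-SYSTEM NAME:\s+(\S+)\s+"
    r"(\d\d\.\d\d\.\d\d)\s+(\d\d\.\d\d\.\d\d)\s+(\d\d\.\d\d\.\d\d)"
)


def slow_regions(display_text):
    """Return names of subsystems whose ELAPSE column is not 00.00.00."""
    slow = []
    for line in display_text.splitlines():
        match = ROW.search(line)
        if match and match.group(4) != "00.00.00":
            slow.append(match.group(1))
    return slow


SAMPLE = """\
IGW540I 09.36.44 DISPLAY SMS,SMSVSAM,QUIESCE
SPHERE NAME: MY.DATA.SET
SYSTEM NAME: ST8      START TIME: 09.33.50 TOTAL ELAPSE TIME:
00.02.54

PARTICIPATING SUB-SYSTEM STATUS: SCHEDULED: COMPLETED: ELAPSE:
 SUB-SYSTEM NAME:  ........       09.33.50   09.33.51  00.00.00
 SUB-SYSTEM NAME:  CICS1          09.33.50   09.33.50  00.00.00
 SUB-SYSTEM NAME:  CICS2          09.33.50   00.00.00  02.54.00
 SUB-SYSTEM NAME:  CICS3          09.33.50   09.33.50  00.00.00
"""

print(slow_regions(SAMPLE))  # prints ['CICS2']
```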
    
    -------------------------------
    2. DUMP AFFECTED REGIONS
    -------------------------------
    
    Using the commands above, you should have a list of SYSTEMS and
    REGIONS which are involved. Now we need to dump SMSVSAM and any
    affected CICS regions on the involved systems for later review.
    For simplicity and completeness, capturing RLS on all systems of
    the plex is ideal.
    
    On each system, dump SMSVSAM and any affected CICS regions,
    making sure to include the DATASPACEs for RLS, and SDATA parms
    GRSQ and XESDATA. Here is an example command to dump RLS and a
    CICS region on one system:
    
     DUMP COMM=(some meaningful dump title)
       R xx,JOBNAME=(SMSVSAM,XCFAS,CICS1),CONT
       R yy,DSPNAME=('SMSVSAM'.*,'XCFAS'.*),CONT
       R nn,SDATA=(PSA,NUC,SQA,LSQA,SUM,RGN,GRSQ,LPA,
                   TRT,CSA,XESDATA),END
    .
    You can also use the same command to dump SMSVSAM around the
    plex. Include 'CICS*' in the REMOTE SYSLIST to capture
    similarly-named CICS regions around the plex.
    
     DUMP COMM=(some meaningful dump title)
       R xx,JOBNAME=(SMSVSAM,XCFAS),CONT
       R yy,DSPNAME=('SMSVSAM'.*,'XCFAS'.*),CONT
       R nn,SDATA=(PSA,NUC,SQA,LSQA,SUM,RGN,GRSQ,LPA,
                   TRT,CSA,XESDATA),CONT
       R zz,REMOTE=(SYSLIST=(*('SMSVSAM')),DSPNAME,SDATA),END
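
    If the list of affected regions changes from one incident to the
    next, the command text can be assembled ahead of time. The
    Python sketch below only builds strings following the two
    examples above; it does not issue any command. The function name
    build_dump_command is hypothetical, the reply ids (xx, yy, nn,
    zz) remain placeholders for the ids the system assigns, and
    console line-length limits are not checked.

```python
SDATA = "PSA,NUC,SQA,LSQA,SUM,RGN,GRSQ,LPA,TRT,CSA,XESDATA"


def build_dump_command(title, jobnames, remote=False):
    """Return the DUMP command and its replies as a list of strings.

    With remote=True the SDATA reply ends in CONT and a REMOTE reply
    is added, as in the plex-wide example.
    """
    last = "CONT" if remote else "END"
    lines = [
        f"DUMP COMM=({title})",
        f"R xx,JOBNAME=({','.join(jobnames)}),CONT",
        "R yy,DSPNAME=('SMSVSAM'.*,'XCFAS'.*),CONT",
        f"R nn,SDATA=({SDATA}),{last}",
    ]
    if remote:
        lines.append(
            "R zz,REMOTE=(SYSLIST=(*('SMSVSAM')),DSPNAME,SDATA),END"
        )
    return lines


local = build_dump_command("RLS HANG", ["SMSVSAM", "XCFAS", "CICS1"])
plex = build_dump_command("RLS HANG", ["SMSVSAM", "XCFAS"], remote=True)
print("\n".join(plex))
```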
    
    You can also simplify the process by including an entry
    similar to this example in your IEADMCxx PARMLIB member:
    
      JOBNAME=(*MASTER*,SMSVSAM),DSPNAME=('SMSVSAM'.*),
      SDATA=(COUPLE,PSA,NUC,SQA,LSQA,SUM,RGN,GRSQ,LPA,TRT,
      CSA,XESDATA),REMOTE=(SYSLIST=(*('SMSVSAM')),DSPNAME,SDATA)
    
    Then issuing DUMP COMM=(title),PARMLIB=xx will dump RLS all
    around the plex.
    
    Please see the z/OS MVS System Commands manual for more
    information on dumping (book SA22-7627).
    
    In addition to the dumps, it is also very helpful to have:
      - SYSLOG from all systems involved
      - LOGREC for some time prior to the error
      - JOBLOGs, if available, of affected regions
    
    -------------------------------
    3. RESTART AFFECTED REGIONS
    -------------------------------
    
    To decide which regions to restart, follow this flow. After each
    step, check whether RLS is still hung. If it is, reissue the
    commands from step 1 to see if the nature of the problem has
    shifted, then return to the steps below.
    
     1. Restart any CICS regions that are holding up QUIESCE
        requests (from the D SMS,SMSVSAM,QUIESCE output)
     2. Restart SMSVSAM on any system that showed LATCH contention
     3. Restart SMSVSAM on any system that was holding an ENQ
     4. Restart SMSVSAM on all systems, one at a time.
    
    Note that it may not be necessary to do all steps. Depending on
    the hang, SMSVSAM in the plex may free up after any step above.
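
    The escalation flow above amounts to "run the next step only
    while the hang persists". A minimal Python sketch of that logic
    follows. The function name escalate, the still_hung check, and
    the stand-in actions are all hypothetical; in practice each
    action is an operator restarting regions and the check is a
    rerun of the diagnostic commands.

```python
def escalate(steps, still_hung):
    """Run recovery steps in order until still_hung() returns False.

    steps is a list of (description, action) pairs. Returns the
    descriptions of the steps that were actually run.
    """
    done = []
    for description, action in steps:
        if not still_hung():
            break  # the hang cleared, no further restarts needed
        action()
        done.append(description)
    return done


# Stand-in state: pretend the hang clears after the second step.
state = {"hung": True, "count": 0}


def fake_action():
    state["count"] += 1
    if state["count"] == 2:
        state["hung"] = False


steps = [
    ("Restart CICS regions holding up QUIESCE requests", fake_action),
    ("Restart SMSVSAM on systems with LATCH contention", fake_action),
    ("Restart SMSVSAM on systems holding an ENQ", fake_action),
    ("Restart SMSVSAM on all systems, one at a time", fake_action),
]

ran = escalate(steps, lambda: state["hung"])
print(ran)
```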
    
    The commands to restart SMSVSAM are
      - V SMS,SMSVSAM,TERMINATESERVER  .. watch for IGW408I
      - V SMS,SMSVSAM,ACTIVE           .. watch for IGW414I
    .
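
    Checking for the two messages in order can be scripted against a
    saved SYSLOG extract. The Python sketch below looks only for the
    message ids named above and treats the restart as confirmed when
    IGW408I is followed on a later line by IGW414I. The function
    name restart_confirmed is hypothetical, and the sample lines are
    placeholders rather than real message text.

```python
def restart_confirmed(syslog_lines):
    """True if IGW408I appears and IGW414I appears on a later line."""
    saw_terminate = False
    for line in syslog_lines:
        if "IGW408I" in line:
            saw_terminate = True
        elif saw_terminate and "IGW414I" in line:
            return True
    return False


# Placeholder lines; the real message text is not reproduced here.
log = [
    "V SMS,SMSVSAM,TERMINATESERVER",
    "IGW408I <message text omitted>",
    "V SMS,SMSVSAM,ACTIVE",
    "IGW414I <message text omitted>",
]
print(restart_confirmed(log))  # prints True
```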
    In rare cases where TERMINATESERVER fails, use
      - FORCE SMSVSAM,ARM
    to restart SMSVSAM. Note that this is reserved as a last resort
    and is not recommended in most cases.
    .
    Please contact your support representatives if the above does
    not solve the problem, and send the documentation for review.
    
    Additional keywords: RLS SMSVSAM VSAMRLS QUIESCE UNQUIESCE
    DOCUMENTATION DUMP DUMPS DOC DEBUG DIAGNOSIS DIAG LATCH ENQ CONT
    CONTENTION HANG DEADLOCK HOLD
    
    Related INFO APARs:
    II12927 - DOC COLLECTION GUIDELINES FOR SEVERAL COMPONENTS
    II12603 - RLS INIT AND RECOVERY BASICS
    II14171 - GETTING STARTED WITH VSAMRLS
    

Local fix

Problem summary

Problem conclusion

Temporary fix

Comments

APAR Information

  • APAR number

    II14597

  • Reported component name

    V2 LIB INFO ITE

  • Reported component ID

    INFOV2LIB

  • Reported release

    001

  • Status

    INTRAN

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2010-09-29

  • Closed date

  • Last modified date

    2012-12-19

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels


Document Information

Modified date:
19 December 2012