IBM Support

PM82843: DBRC PRA EXCESSIVE RETRY MITIGATION ENHANCEMENT

A fix is available

 

APAR status

  • Closed as new function.

Error description

  • When running with Parallel RECON Access (PRA), RLS timeouts or
    deadlocks can occur and repeat, seemingly endlessly, when lock
    promotions are needed to update a RECON record. This can
    severely impact IMS and utility processing.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All users of IMS Version 13 with Parallel    *
    *                 RECON Access enabled.                        *
    ****************************************************************
    * PROBLEM DESCRIPTION: Excessive RECON contention can occur    *
    *                      when a large number of DBRC requests    *
    *                      doing similar actions are processed in  *
    *                      parallel. This can result in excessive  *
    *                      MSGDSP1184W messages and elongate the   *
    *                      processing time.                        *
    ****************************************************************
    * RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
    ****************************************************************
    When PRA is enabled, the current DBRC logic issues a VSAM GET
    in a way that obtains a shared lock. If it then needs to
    update the record, it issues a subsequent GET for update, which
    obtains an exclusive lock, and then writes the changed record.
    If several DBRC instances make similar requests in this manner
    and try to update the same record, deadlocks or timeouts can
    occur, since each holds a shared lock and each is trying to
    obtain an exclusive lock for update. At this point the victims
    retry their processing. If the number of instances involved is
    large enough and the timing is just right, this can result in
    endless (or seemingly endless) retries.
    
    This SPE detects this condition and attempts to mitigate it by
    altering the first request from a GET NUP,CRE to a GET UPD so
    that the initial attempt is for an exclusive lock.
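
    The promotion deadlock described above can be illustrated with
    a small toy model. This is only a sketch under stated
    assumptions: the RecordLock class, the requester names DBRC1
    and DBRC2, and the wait-for check are hypothetical and are not
    DBRC or VSAM RLS code. It shows why two holders of a shared
    lock that both ask for promotion block each other, while an
    exclusive-first request simply makes the second requester wait.

```python
# Toy model of a single record lock. All names here are hypothetical;
# this is not how DBRC or VSAM RLS is implemented.
class RecordLock:
    def __init__(self):
        self.shared = set()     # requesters holding a shared lock
        self.exclusive = None   # requester holding the exclusive lock

    def blockers(self, who):
        """Other requesters whose locks prevent `who` from going exclusive."""
        holders = set(self.shared)
        if self.exclusive is not None:
            holders.add(self.exclusive)
        holders.discard(who)
        return holders

    def get_shared(self, who):
        """Grant a shared lock unless someone else holds the record exclusively."""
        if self.exclusive is not None and self.exclusive != who:
            return False
        self.shared.add(who)
        return True

    def get_exclusive(self, who):
        """Grant (or promote to) exclusive only if nobody else holds any lock."""
        if self.blockers(who):
            return False
        self.shared.discard(who)
        self.exclusive = who
        return True

    def release(self, who):
        self.shared.discard(who)
        if self.exclusive == who:
            self.exclusive = None


def is_deadlocked(lock, waiters):
    """True when every waiter is blocked by another waiter (a wait-for cycle)."""
    waiting = set(waiters)
    return len(waiting) > 1 and all(
        lock.blockers(w) & (waiting - {w}) for w in waiting
    )


def shared_then_promote():
    """Both requesters read with a shared lock, then both ask for promotion."""
    lock = RecordLock()
    for who in ("DBRC1", "DBRC2"):
        assert lock.get_shared(who)
    waiters = [w for w in ("DBRC1", "DBRC2") if not lock.get_exclusive(w)]
    return is_deadlocked(lock, waiters)


def exclusive_first():
    """The first requester asks for exclusive up front; the second just waits."""
    lock = RecordLock()
    assert lock.get_exclusive("DBRC1")
    waiters = [w for w in ("DBRC2",) if not lock.get_exclusive(w)]
    deadlocked = is_deadlocked(lock, waiters)
    lock.release("DBRC1")                 # DBRC1 finishes its update
    assert lock.get_exclusive("DBRC2")    # DBRC2 can now proceed
    return deadlocked
```

    In this model shared_then_promote() reports a deadlock and
    exclusive_first() does not, which mirrors the reasoning behind
    switching the first request to GET UPD.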
    

Problem conclusion

Temporary fix

Comments

  • These types of deadlocks could be eliminated if DBRC always did
    the initial GET with update intent to get an exclusive lock. In
    that case other DBRCs would wait behind the exclusive lock. But
    getting an exclusive lock when a record will not be updated
    can cause more contention and extra TVS logging, so getting all
    records exclusive would be detrimental. Also, DBRC request logic
    does not always know whether a given record will be updated at
    the time it does its initial GET.
    
    To handle this, the proposal is to add logic to track which
    records get updated by a given request, but only if the
    request has hit a retryable error such as a timeout or deadlock.
    If the request hits a retryable error a second time, the logic
    will then get an EX lock for any record that was updated, or
    that it attempted to update, on any prior iteration that hit a
    retryable error.
    
    As an example, let's say a request makes the following
    calls:
    GET RECA, GET RECB, GET RECC, GET RECD, UPD RECB, UPD RECD
    On the first attempt, a timeout occurs updating RECD;
    on the second attempt, a timeout occurs updating RECB.
    
    The VSAM requests would be
    GET NUP CRE RECA
    GET NUP CRE RECB
    GET NUP CRE RECC
    GET NUP CRE RECD
    GET UPD RECB
    PUT RECB
    GET UPD RECD ---timeout -> retry
    ---
    GET NUP CRE RECA
    GET NUP CRE RECB
    GET NUP CRE RECC
    GET NUP CRE RECD
    GET UPD RECB (save fact that RECB update attempted)
       ---timeout -> retry
    
    GET NUP CRE RECA
    GET UPD RECB   <since we know it will get updated>
    GET NUP CRE RECC
    GET NUP CRE RECD
    GET UPD RECB
    PUT RECB
    GET UPD RECD
    PUT RECD
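
    The trace above can be reproduced with a short simulation. This
    is a minimal sketch, assuming that update tracking starts only
    after the first retryable error and that the tracked keys are
    used for the initial GET only after the second one, as the text
    describes. The function run_request, its arguments, and the
    "retry" marker are hypothetical and exist only for illustration.

```python
def run_request(steps, timeouts):
    """Simulate the VSAM requests issued for one DBRC request.

    steps    -- list of ("GET" | "UPD", record_name) calls made by the request
    timeouts -- timeouts[i] names the record whose GET UPD times out on
                attempt i (0-based); attempts beyond the list succeed
    Returns the list of simulated VSAM requests, with "retry" markers.
    """
    trace = []
    attempted = set()   # keys updated, or attempted, while tracking is on
    attempt = 0
    while True:
        failing = timeouts[attempt] if attempt < len(timeouts) else None
        tracking = attempt >= 1        # track only after a retryable error
        use_exclusive = attempt >= 2   # use the tracked keys after a second error
        failed = False
        for op, rec in steps:
            if op == "GET":
                if use_exclusive and rec in attempted:
                    trace.append(f"GET UPD {rec}")      # exclusive lock up front
                else:
                    trace.append(f"GET NUP CRE {rec}")  # shared lock
            else:
                if tracking:
                    attempted.add(rec)   # save the fact that an update was attempted
                trace.append(f"GET UPD {rec}")
                if rec == failing:
                    trace.append("retry")
                    failed = True
                    break
                trace.append(f"PUT {rec}")
        if not failed:
            return trace
        attempt += 1


# The request from the example: timeout on RECD first, then on RECB.
steps = [("GET", "RECA"), ("GET", "RECB"), ("GET", "RECC"), ("GET", "RECD"),
         ("UPD", "RECB"), ("UPD", "RECD")]
trace = run_request(steps, ["RECD", "RECB"])
```

    The last eight entries of trace correspond to the third attempt,
    where the initial GET for RECB is already a GET UPD while RECA,
    RECC and RECD are still read with GET NUP CRE.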
    
    
    The concept is to limit overhead until we know we are hitting
    contention repeatedly, at which point the extra tracking of
    updates is done and a higher-level lock is obtained on
    subsequent GETs for that record until the request works. Note
    that this will not prevent all deadlocks. Also, because some
    DBRC locate processing does a GET KGE or GET BWD to find a
    record greater or less than a user-specified value, it cannot
    know ahead of time which record key will be returned, and so
    cannot check whether that record will eventually be updated.
    In this case we will continue to get the shared lock during
    the locate.
    
    Code changes:
    DSPURI00:
     -In DSPURILC for a direct locate of a record w/o a timestamp:
        If we are retrying a request, check if we attempted to
        update the record in a prior attempt, and if so set the RPL
        options to UPD instead of NUP.  For direct locates of
        records with a timestamp in the key, the routine GETWithCRE
        is used.
     -The only other locate type that we can check for update is a
      locate of type CLASS FIRST, as that also calls GETWithCRE.
     -In DSPURICH: Add call to AddKeytoList to track the update
      of the passed record if the request is being retried
     -In DSPURIDL: Add call to AddKeytoList to track the update
      of the passed record if the request is being retried
    
     - Add new routine: AddKeytoList - This routine will create
       the initial hash table if one does not exist.  It then
       checks if the key is already in the table.  If not, it adds
       it to a table that holds all the updated keys. The hash table
       entry is updated to point to the new entry, or the last
       synonym is updated to point to the new entry.
       Note: if we are unable to get storage for more keys, the
       table will be marked full.
    
     - Add new routine: EntryExists - This routine takes a key and
       checks if it is already in the hash table tracking records
       that were updated.  If the table is marked full, ALL
       records will be considered as in the table.
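
    The behavior described for AddKeytoList, EntryExists and
    ClearUpdHT can be sketched as a chained hash table. This is an
    illustrative model only: the class UpdateKeyTable, its method
    names, the byte-sum hash, and the max_keys limit (which stands
    in for "unable to get storage") are assumptions and do not
    reflect the actual DBRC data layout. Whether the real clear
    routine also resets the full indicator is likewise assumed.

```python
class UpdateKeyTable:
    """Hypothetical model of the update-tracking hash table."""

    def __init__(self, n_buckets=8, max_keys=4):
        self.n_buckets = n_buckets
        self.max_keys = max_keys   # assumption: models running out of storage
        self.buckets = None        # hash table, created on first add
        self.entries = []          # key table: [key, index of next synonym]
        self.full = False

    def _slot(self, key):
        # Simple deterministic hash; the real hash function is not documented.
        return sum(key.encode()) % self.n_buckets

    def entry_exists(self, key):
        """Like EntryExists: a full table treats every key as present."""
        if self.full:
            return True
        if self.buckets is None:
            return False
        i = self.buckets[self._slot(key)]
        while i is not None:           # walk the synonym chain
            if self.entries[i][0] == key:
                return True
            i = self.entries[i][1]
        return False

    def add_key(self, key):
        """Like AddKeytoList: create the table if needed, add the key if new."""
        if self.buckets is None:
            self.buckets = [None] * self.n_buckets
        if self.full or self.entry_exists(key):
            return
        if len(self.entries) >= self.max_keys:
            self.full = True           # no room for more keys
            return
        self.entries.append([key, None])
        new = len(self.entries) - 1
        slot = self._slot(key)
        i = self.buckets[slot]
        if i is None:
            self.buckets[slot] = new   # hash entry points to the new entry
        else:
            while self.entries[i][1] is not None:
                i = self.entries[i][1]
            self.entries[i][1] = new   # last synonym points to the new entry

    def clear(self):
        """Like ClearUpdHT: drop all tracked keys (assumed to reset `full`)."""
        self.buckets = None
        self.entries = []
        self.full = False
```

    Treating a full table as containing every key errs on the side
    of taking an exclusive lock, which costs extra contention but
    never skips a record that was actually tracked.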
    
    
    DSPBRQ00:  Call to clear hash table routine to clear the entries
      in the update tracking table if the request completed without
      the need for retry.
    
    DSPCRTR0:  Call to clear hash table routine to clear the entries
      in the update tracking table if the request completed without
      the need for retry.
    
    DSPCRTR1: Add subroutine ClearUpdHT which just clears all
      entries in the update hash table
    
    DSPDSS01: Add code to release the storage used for the key
      update hash table
    
    DSPPRAB: Add prab_retry_updhtbl and prab_retry_updrqst
    
    DSPURX00: Clear some PRAB_RETRY fields when request will not
      be retried. Add call to ClearUpdHT if the command being
      processed does not need to be retried.
      Add ClearUpdHT routine to clear the hash table entries.
    
    DSPEF06F: Recompile for DSPPRAB change.
    DSPEF00F: Recompile for DSPPRAB change.
    DSPEF02F: Recompile for DSPPRAB change.
    DSPEF0AF: Recompile for DSPPRAB change.
    

APAR Information

  • APAR number

    PM82843

  • Reported component name

    IMS V13

  • Reported component ID

    5635A0400

  • Reported release

    300

  • Status

    CLOSED UR1

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2013-02-14

  • Closed date

    2013-03-26

  • Last modified date

    2013-10-04

  • APAR is sysrouted FROM one or more of the following:

    PM82842

  • APAR is sysrouted TO one or more of the following:

    UK92905

Modules/Macros

  •    DSPBRQ00 DSPCRTR0 DSPDSS01 DSPEF0AF DSPEF00F
    DSPEF02F DSPEF06F DSPURI00 DSPURX00
    

Fix information

  • Fixed component name

    IMS V13

  • Fixed component ID

    5635A0400

Applicable component levels

  • R300 PSY UK92905

       UP13/03/30 P F303

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
14 December 2020