A fix is available
APAR status
Closed as new function.
Error description
When running with Parallel RECON Access (PRA), RLS timeouts or deadlocks can occur and repeat seemingly endlessly when a lock promotion is needed to update a RECON record. This can severely impact IMS and utility processing.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All users of IMS Version 13 with Parallel    *
*                 RECON Access enabled.                        *
****************************************************************
* PROBLEM DESCRIPTION: Excessive RECON contention can occur    *
*                      when a large number of DBRC requests    *
*                      doing similar actions are processed in  *
*                      parallel. This can result in excessive  *
*                      MSGDSP1184W messages and elongate the   *
*                      processing time.                        *
****************************************************************
* RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
****************************************************************
When PRA is enabled, the current DBRC logic issues a VSAM GET that obtains a shared lock. If the record then needs to be updated, DBRC issues a subsequent GET for update, which obtains an exclusive lock, and then writes the changed record.
If several DBRC instances make similar requests in this manner and try to update the same record, deadlocks or timeouts can occur, since each instance holds a shared lock and each is trying to obtain an exclusive lock for the update. The victims then retry their processing. If the number of instances involved is large enough and the timing is just right, this can result in endless (or seemingly endless) retries.
This SPE detects the condition and attempts to mitigate it by altering the first request from a GET NUP,CRE to a GET UPD, so that the initial attempt is for an exclusive lock.
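The lock promotion problem described above can be illustrated with a toy lock model. This is only a sketch: the class and function names are hypothetical, and the lock rules are a simplified assumption, not the VSAM RLS implementation. It shows that when every instance first takes a shared lock, no instance can promote to exclusive, whereas requesting exclusive first lets the instances proceed one at a time.

```python
# Toy model of a single record lock. Hypothetical names; NOT the VSAM RLS code.
class RecordLock:
    def __init__(self):
        self.shared = set()    # instances holding a shared lock
        self.exclusive = None  # instance holding the exclusive lock, if any

    def get_shared(self, who):
        """Grant shared unless another instance holds the record exclusive."""
        if self.exclusive not in (None, who):
            return False
        self.shared.add(who)
        return True

    def get_exclusive(self, who):
        """Grant exclusive only if no other instance holds the record at all."""
        if self.exclusive not in (None, who):
            return False
        if self.shared - {who}:
            return False  # other shared holders block the promotion
        self.shared.discard(who)
        self.exclusive = who
        return True

    def release(self, who):
        self.shared.discard(who)
        if self.exclusive == who:
            self.exclusive = None


def shared_then_promote(instances):
    """All instances read shared first; return those whose promotion is blocked."""
    lock = RecordLock()
    for who in instances:
        lock.get_shared(who)
    return [who for who in instances if not lock.get_exclusive(who)]


def exclusive_first(instances):
    """Each instance asks for exclusive up front and releases when done."""
    lock = RecordLock()
    completed = []
    for who in instances:
        if lock.get_exclusive(who):  # later instances simply wait their turn
            completed.append(who)
            lock.release(who)
    return completed


blocked = shared_then_promote(["DBRC1", "DBRC2", "DBRC3"])
done = exclusive_first(["DBRC1", "DBRC2", "DBRC3"])
```

In this model `blocked` contains all three instances, which corresponds to the deadlock or timeout case, while `done` contains all three in order, which corresponds to instances waiting behind the exclusive lock.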
Problem conclusion
Temporary fix
Comments
These types of deadlocks could be eliminated if DBRC always did the initial GET with update intent to obtain an exclusive lock. In that case other DBRCs would wait behind the exclusive lock. However, obtaining an exclusive lock when a record will not be updated can cause more contention and extra TVS logging, so getting all records exclusive would be detrimental. Also, DBRC request logic does not always know, at the time it does its initial GET, whether a given record will be updated.
To handle this, the proposal is to add logic that tracks which records are updated by a given request, but only after the request has hit a retryable error such as a timeout or deadlock. If the request hits a retryable error a second time, the logic will then get an EX lock for any record that was updated, or was attempted to be updated, on any prior iteration that hit a retryable error.
As an example, say a request makes the following calls:
  GET RECA
  GET RECB
  GET RECC
  GET RECD
  UPD RECB
  UPD RECD
On the first attempt, a timeout occurs updating RECD. On the second attempt, a timeout occurs updating RECB. The VSAM requests would be:
  GET NUP CRE RECA
  GET NUP CRE RECB
  GET NUP CRE RECC
  GET NUP CRE RECD
  GET UPD RECB
  PUT RECB
  GET UPD RECD
  ---timeout -> retry ---
  GET NUP CRE RECA
  GET NUP CRE RECB
  GET NUP CRE RECC
  GET NUP CRE RECD
  GET UPD RECB (save fact that RECB update attempted)
  ---timeout -> retry ---
  GET NUP CRE RECA
  GET UPD RECB <since we know it will get updated>
  GET NUP CRE RECC
  GET NUP CRE RECD
  GET UPD RECB
  PUT RECB
  GET UPD RECD
  PUT RECD
The concept is to limit overhead until we know we are hitting contention repeatedly. At that point the extra tracking of updates, and the higher-level lock on subsequent GETs for the tracked records, is done until the request works. Note that this will not prevent all deadlocks.
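The retry behavior in the example above can be sketched as a small simulation. This is a hedged illustration, not DBRC code: the function name `run_request`, its parameters, and the way timeouts are injected are all assumptions made for the sketch. Tracking of updated keys starts only after the first retryable error, and tracked keys are then fetched with GET UPD on later attempts.

```python
def run_request(gets, updates, timeouts):
    """Return the call trace of each attempt of one request (hypothetical model).

    gets     -- record keys read by the request, in order
    updates  -- record keys the request then updates, in order
    timeouts -- per attempt, the key whose GET UPD times out (None = no error)
    """
    tracked = set()    # keys updated or attempted after a retryable error
    tracking = False   # tracking is off until the first retryable error
    trace = []
    for failing in timeouts:
        calls = []
        for rec in gets:
            # Ask for the exclusive lock up front only for tracked keys.
            mode = "GET UPD" if rec in tracked else "GET NUP CRE"
            calls.append(f"{mode} {rec}")
        failed = False
        for rec in updates:
            calls.append(f"GET UPD {rec}")
            if tracking:
                tracked.add(rec)  # remember the update attempt
            if rec == failing:
                failed = True     # timeout or deadlock: abandon this attempt
                break
            calls.append(f"PUT {rec}")
        trace.append(calls)
        if not failed:
            break
        tracking = True
    return trace


attempts = run_request(
    gets=["RECA", "RECB", "RECC", "RECD"],
    updates=["RECB", "RECD"],
    timeouts=["RECD", "RECB", None],
)
for number, calls in enumerate(attempts, start=1):
    print(f"attempt {number}: " + ", ".join(calls))
```

With these inputs the third attempt begins with GET NUP CRE RECA followed by GET UPD RECB, matching the sequence listed in the example, because RECB was recorded during the second attempt while RECD was only touched before tracking started.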
Some DBRC locate processing does a GET KGE or GET BWD to find a record greater or less than a user-specified value, so it cannot know ahead of time which record key will be returned, and therefore cannot check whether that record will eventually be updated. In this case we will continue to get the shared lock during the locate.
Code changes:
DSPURI00:
- In DSPURILC, for a direct locate of a record without a timestamp: if we are retrying a request, check whether we attempted to update the record in a prior attempt, and if so set the RPL options to UPD instead of NUP. For direct locates of records with a timestamp in the key, the routine GETWithCRE will be used.
- The only other locate type that can be checked for update is a CLASS FIRST type locate, as that also calls GETwithCRE.
- In DSPURICH: Add call to AddKeytoList to track the update of the passed record if the request is being retried.
- In DSPURIDL: Add call to AddKeytoList to track the update of the passed record if the request is being retried.
- Add new routine AddKeytoList: This routine creates the initial hash table if one does not exist. It then checks whether the key is already in the table. If not, it adds the key to a table that holds all the updated keys. The hash table entry is updated to point to the new entry, or the last synonym is updated to point to the new entry. Note: if we are unable to get storage for more keys, the table will be marked full.
- Add new routine EntryExists: This routine takes a key and checks whether it is already in the hash table tracking records that were updated. If the table is marked full, ALL records will be considered as in the table.
DSPBRQ00: Call to clear hash table routine to clear the entries in the update tracking table if the request completed without the need for retry.
DSPCRTR0: Call to clear hash table routine to clear the entries in the update tracking table if the request completed without the need for retry.
DSPCRTR1: Add subroutine ClearUpdHT, which just clears all entries in the update hash table.
DSPDSS01: Add code to release the storage used for the key update hash table.
DSPPRAB: Add prab_retry_updhtbl and prab_retry_updrqst.
DSPURX00: Clear some PRAB_RETRY fields when the request will not be retried. Add call to ClearUpdHT if the command being processed does not need to be retried. Add ClearUpdHT routine to clear the hash table entries.
DSPEF06F: Recompile for DSPPRAB change.
DSPEF00F: Recompile for DSPPRAB change.
DSPEF02F: Recompile for DSPPRAB change.
DSPEF0AF: Recompile for DSPPRAB change.
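The update tracking table described in the code changes can be sketched roughly as follows. This is an illustrative model only: the class `UpdateKeyTable`, its method names, and the `max_keys` limit (standing in for a failure to obtain storage) are assumptions, and whether the real ClearUpdHT also resets the full indicator is not stated in this APAR. The methods loosely mirror AddKeytoList, EntryExists, and ClearUpdHT.

```python
class UpdateKeyTable:
    """Hypothetical model of the hash table that tracks updated record keys."""

    def __init__(self, buckets=8, max_keys=64):
        self.nbuckets = buckets
        self.max_keys = max_keys  # stand-in for "unable to get storage"
        self.buckets = None       # created lazily on the first add
        self.count = 0
        self.full = False

    def _chain(self, key):
        # Each bucket holds a synonym chain of keys that hash to the same slot.
        return self.buckets[hash(key) % self.nbuckets]

    def entry_exists(self, key):
        """Loosely mirrors EntryExists: a full table matches every key."""
        if self.full:
            return True
        if self.buckets is None:
            return False
        return key in self._chain(key)

    def add_key(self, key):
        """Loosely mirrors AddKeytoList: add the key unless already tracked."""
        if self.buckets is None:
            self.buckets = [[] for _ in range(self.nbuckets)]
        if self.full or key in self._chain(key):
            return
        if self.count >= self.max_keys:
            self.full = True      # no room for more keys: mark the table full
            return
        self._chain(key).append(key)  # new entry goes after the last synonym
        self.count += 1

    def clear(self):
        """Loosely mirrors ClearUpdHT (assumed here to reset the full flag)."""
        if self.buckets is not None:
            for chain in self.buckets:
                chain.clear()
        self.count = 0
        self.full = False


table = UpdateKeyTable(max_keys=2)
table.add_key("RECB")
table.add_key("RECD")
table.add_key("RECA")  # exceeds max_keys, so the table is marked full
```

Treating a full table as containing every key is the conservative choice described in the APAR: once tracking can no longer be exact, every record is fetched with update intent on retry rather than risking another failed lock promotion.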
APAR Information
APAR number
PM82843
Reported component name
IMS V13
Reported component ID
5635A0400
Reported release
300
Status
CLOSED UR1
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2013-02-14
Closed date
2013-03-26
Last modified date
2013-10-04
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UK92905
Modules/Macros
DSPBRQ00 DSPCRTR0 DSPDSS01 DSPEF0AF DSPEF00F DSPEF02F DSPEF06F DSPURI00 DSPURX00
Fix information
Fixed component name
IMS V13
Fixed component ID
5635A0400
Applicable component levels
R300 PSY UK92905
UP13/03/30 P F303
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
14 December 2020