A fix is available
APAR status
Closed as program error.
Error description
An IMS batch job ran the BACKUP.RECON command, which triggered RECON quiesce processing in an IMSplex environment. Right after quiesce processing completed with MSGDSP1133I, an Image Copy job failed to open the RECON with MSGDSP0002I:
DSP0002I UNABLE TO OPEN RECON1 DATA SET
DSP0002I VSAM RETURN CODE=08 ERROR CODE=000
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All V11 IMS DBRC & BPE PRA users will be     *
*                 affected.                                    *
****************************************************************
* PROBLEM DESCRIPTION: RECON open failure RC=08 RSN=000 for    *
*                      one of the DB Image Copy jobs that led  *
*                      to the ABENDU3312 failure for that      *
*                      job. This occurred when this job was    *
*                      started while a BACKUP.RECON job was    *
*                      being run.                              *
****************************************************************
* RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
****************************************************************

The initial problem encountered by the customer was a RECON open failure with RC=08 RSN=000 for one of the DB Image Copy (IC) jobs, which led to the job failing with ABENDU3312. This problem occurs when the IC job starts while a BACKUP.RECON job is running.

During initialization, DBRC waits for a not-quiesced notification before proceeding with RECON open processing. In this case, the RECONs were quiesced. While waiting, the new DBRC gets an end-quiesce notification and processes it. Part of end-quiesce processing opens any closed RECONs. Once that completes, initialization continues and tries to OPEN the RECONs again, which results in a failure.

While fixing this, we ran a stress test to emulate the customer processing, in which a large number of IC jobs start while a RECON quiesce is being issued. For the test, BACKUP.RECON was run and parallel RECON access was enabled and disabled. This uncovered other QUIESCE-related issues.

Case 1: Get a NOT QUIESCED response even when quiesced.
This problem occurs when SCI messages/notifications are received or processed in an order other than the order in which the events occurred. This can happen because order is not guaranteed, especially when multiple LPARs are involved. In this case, JOB1 and JOB2 are running when JOB1 sends a quiesce request. At the same time, JOB3 starts and waits for a not-quiesced notification. JOB2 gets the notification that JOB3 exists before the quiesce notification and sends JOB3 a not-quiesced message, allowing JOB3 to continue its RECON OPEN processing. JOB2 then processes the quiesce request from JOB1 and acknowledges it. JOB1 gets the quiesce acknowledgment before the 'new DBRC' notification for JOB3. So JOB1 considers the quiesce acknowledged by all and proceeds as if it owns the RECONs while JOB3 still has them open.

Case 2: Delete of another DBRC results in an implicit not-quiesced response to a new DBRC even when the RECONs are still quiesced.
If a new DBRC receives a notification that another DBRC is gone, the notification is treated as an implicit not-quiesced notification. The reasoning is as follows: if there are no DBRCs that were started before the quiesce, the assumption is that any DBRC started during the quiesce would also be waiting on a not-quiesced notification. If every DBRC started before the quiesce is gone, no quiesce could be in progress. The problem is that the logic in DSPRLI00 sets all pre-existing DBRCs as 'STARTED DURING QUIESCE'. So, if a quiesce is in progress when JOB3 starts, and JOB3 gets a notification that a DBRC is gone, the notification is treated as an implicit not-quiesced regardless of whether the DBRC that failed was the quiescer or any other DBRC.

Case 3: ABENDU2480 occurs when there are more logical closes than opens.
When a RECON loss notification is received, DBRC does a logical open and then a logical close call. If the logical open fails for some reason but the logical close is still called, the result is ABENDU2480 because there are more logical closes than opens.
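The Case 2 defect described above can be illustrated with a small model. The sketch below is not DBRC code: the Peer class, its started_during_quiesce field, and both helper functions are hypothetical names chosen for illustration. The only rule it encodes is the one stated in the text, namely that a 'DBRC gone' notification counts as an implicit not-quiesced only when no remaining DBRC was started before the quiesce.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """Another DBRC instance as seen by a newly started DBRC (hypothetical model)."""
    name: str
    started_during_quiesce: bool


def implicit_not_quiesced(remaining_peers):
    """Return True when a 'DBRC gone' notification may be read as not-quiesced.

    Rule from the text: if no remaining peer was started before the quiesce,
    no quiesce can still be in progress.
    """
    return all(p.started_during_quiesce for p in remaining_peers)


def on_peer_gone(peers, gone_name):
    """Drop the departed peer and evaluate the implicit rule on the rest."""
    remaining = [p for p in peers if p.name != gone_name]
    return implicit_not_quiesced(remaining)


# JOB3 starts while JOB1 (the quiescer) and JOB2 already exist.
# Faulty flagging: pre-existing peers are marked as started during the quiesce.
faulty = [Peer("JOB1", True), Peer("JOB2", True)]
# Intended flagging: pre-existing peers are marked as started before it.
intended = [Peer("JOB1", False), Peer("JOB2", False)]

# JOB2 (not the quiescer) goes away while JOB1 still holds the quiesce.
print(on_peer_gone(faulty, "JOB2"))    # True: JOB3 wrongly proceeds to open
print(on_peer_gone(intended, "JOB2"))  # False: JOB3 keeps waiting
```

With the faulty flagging, the loss of any DBRC releases JOB3 even though the quiescer is still active. With the intended flagging, JOB3 keeps waiting as long as a DBRC started before the quiesce remains.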
Problem conclusion
GEN:
KEYWORDS:
*** END IMS KEYWORDS ***

For the initial problem, code is added to DSPDSS10 to set an indication that DBRC is in the middle of SCI registration during physical open. Group services logic is changed to no longer invoke DSPRCLOS quiesce or DSPROPEN end-quiesce if this indication is set, as they are unnecessary.

Case 1: This problem requires a method for serializing start-up with quiesce. During quiesce processing, a SYSTEMS level enqueue with major name DSPURI03 and minor name equal to the plexname and group ID (PLEXGRP in note below) will be done to serialize the process. When obtaining a quiesce, DBRC will obtain an exclusive ENQ on DSPURI03.PLEXGRP after getting acknowledgements from all DBRCs that it knows about. Also, just prior to registering with SCI, a new DBRC will get a shared ENQ on DSPURI03.PLEXGRP. This will be released after physical OPEN processing completes. This ensures that new DBRCs will wait prior to opening RECONs if a quiesce is being done. It also ensures (for all intents and purposes) that a new DBRC will be known to SCI everywhere by the time the quiescer gets an exclusive lock.

To allow for a quiesce race, the quiescer issues a quiesce and waits for responses from all DBRCs. This will handle any race condition as it does today. Note that new DBRCs will not be allowed to do a quiesce if they hold the shared enqueue. Once everyone has responded, the quiesce winner will get an exclusive ENQ. This will wait for any new jobs it did not know about to complete open processing and release the shared ENQ. Once the ENQ is obtained, a query will be done to see if there are any DBRCs that it did not know about, and if so, these will be sent a quiesce notification and an acknowledgement will be waited on.

Because there is a timing issue that could lead to a deadlock if the exclusive ENQ just waits, DBRC needs to be allowed to process other SCI notifications while waiting for the exclusive ENQ. A separate TCB will be attached to do the exclusive ENQ and eventual DEQ. This code will request an exclusive ENQ. Once it has it, it will enqueue a new brlsb (brlbf2EN) on DSPRLN_RQ, which will get dequeued by DSPRLN00 and processed by DSPRLX10. In a BPE environment, a new AWE would be created and enqueued to DSPBGS00 for processing. The TCB code will then wait on its ECB before proceeding. When the new brlsb (or AWE) is processed, a CSLSCQRY will be done to check for new DBRCs (e.g. the ones that were holding the shared ENQ that we did not know about). If any exist, the quiescer will wait. It should get a notification that the DBRC started (ready) and will wait for a response. Once all have responded, the ENQ TCB's ECB will get posted to DEQ the global resource, and DSPRLX10 will WAIT on that TCB's term ECB to ensure it is done before proceeding.

Case 2: To fix this issue, existing DBRCs will no longer be considered as 'started during quiesce', but will be considered as started before a quiesce. To account for the situation where all new DBRCs started after a job failed while quiescing, any DBRC that determines that none_started_before the quiesce is an implicit 'not_quiesced' will broadcast not_quiesced to the other DBRCs. Note that only one DBRC needs to see none_started_before the quiesce (other than the one that failed), while the others will think that all of them started before the quiesce and will continue to wait, so the one that detects it must do the broadcast.

Case 3: To prevent more logical closes than opens, DSPRCLOS will not be called if DSPROPEN fails when processing RECON loss notifications.
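The shared/exclusive enqueue protocol described for Case 1 above can be sketched with an ordinary shared/exclusive lock. This is a simplified model, not the DBRC implementation: SharedExclusiveLock stands in for the SYSTEMS level ENQ, the registered set stands in for the result of an SCI query, and all class, function, and variable names are assumptions made for illustration. Unlike a real ENQ, this lock does not queue requests in arrival order, and the sketch releases the exclusive lock immediately instead of holding it until the newly found DBRCs have acknowledged.

```python
import threading


class SharedExclusiveLock:
    """Minimal shared/exclusive lock standing in for the ENQ (illustrative only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._shared_holders = 0
        self._exclusive_held = False

    def acquire_shared(self):
        with self._cond:
            while self._exclusive_held:
                self._cond.wait()
            self._shared_holders += 1

    def release_shared(self):
        with self._cond:
            self._shared_holders -= 1
            self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._exclusive_held or self._shared_holders:
                self._cond.wait()
            self._exclusive_held = True

    def release_exclusive(self):
        with self._cond:
            self._exclusive_held = False
            self._cond.notify_all()


def run_demo():
    enq = SharedExclusiveLock()
    events = []
    registered = set()              # what a query for registered DBRCs would return
    known_to_quiescer = {"JOB2"}    # DBRCs that already acknowledged the quiesce
    holding_shared = threading.Event()
    may_finish_open = threading.Event()

    def new_dbrc(name):
        enq.acquire_shared()        # taken just before registering
        registered.add(name)
        holding_shared.set()
        may_finish_open.wait()      # physical OPEN of the RECONs in progress
        events.append(f"{name} open complete")
        enq.release_shared()        # released once physical OPEN completes

    def quiescer():
        # All known DBRCs have acknowledged; now wait out any DBRC still opening.
        enq.acquire_exclusive()
        events.append("quiescer got exclusive")
        unknown = registered - known_to_quiescer
        events.append(f"must also quiesce: {sorted(unknown)}")
        enq.release_exclusive()

    t_new = threading.Thread(target=new_dbrc, args=("JOB3",))
    t_new.start()
    holding_shared.wait()           # JOB3 now holds the shared lock
    t_q = threading.Thread(target=quiescer)
    t_q.start()
    may_finish_open.set()
    t_new.join()
    t_q.join()
    return events


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

Because the quiescer cannot obtain the exclusive lock until JOB3 releases its shared lock, JOB3 is always registered and finished with its open by the time the quiescer checks for DBRCs it did not know about. In the fix itself the exclusive request runs on a separate TCB so that DBRC can keep processing notifications while the request waits; the sketch approximates that with a separate thread.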
----------------------------------------------------------------
THE FOLLOWING MODULES WERE CHANGED:
DFSBRLSB - Add BRLBF2GE for grp services related enqueue
DSPBCODE - Add AWGS_GOTENQ (message that we got exclusive enqueue) and AWGS_RELENQ (request to release shared ENQ)
DSPBDS00 - Fix the problem with more closes than opens
DSPBGS00 - Add code to track DBRCs started during quiesce and handle implicit 'not quiesced'
         - If Quiesce acknowledgement message has no DSN, ignore DSN list.
         - Do not invoke DSS if inOpen
         - In AWGS_QUACK and AWGS_DBRCDOWN processing, add logic to request an exclusive ENQ if all known DBRCs have responded to a quiesce request. If the exclusive ENQ is already held when we detect all have responded, then release the ENQ.
         - Add new AWGS_GOTENQ support.
         - Add new AWGS_RELENQ support
         - If we get a quiesce request while waiting for not_quiesced, release the shared enqueue if we hold it.
         - If all DBRCs we knew about when started go down and we are waiting on not_quiesced, treat as implicit end quiesce or not quiesced and POST Group services INIT waiter.
         - Add routine, found_new_dbrc, to issue CSLSCQRY to determine if any new DBRCs exist.
         - Add routine, GetENQForQuiesce, to attach a TCB, DSPRLNQ0, to request an exclusive ENQ.
         - Add routine ReleaseENQ to DEQ the shared ENQ or POST DSPRLNQ0 to DEQ the exclusive ENQ
         - Add code in subtract_a_dbrc to detect if any DBRC started before the quiesce.
         - Add routine, Reset_flags, to reset indication that DBRC was started during a quiesce.
DSPBGS10 - Set DSPDGML_started_during_quiesce based on whether DSPRLN_quiesced is set or not. This prevents a hang during quiesce.
DSPBGS20 - Get shared ENQ on new resource before registering with SCI
         - Add routine, GetENQForQuiesce, to get shared ENQ.
DSPBGS50 - Add support to broadcast a not_quiesced message
DSPBIN20 - Add code to set entry point for subtask DSPRLNQ0
DSPCRTR0 - Add code to set entry point for subtask DSPRLNQ0
DSPDCLMD - Add DSPRLNQ0 and DSPETXR0 to DSPCINT0
DSPDSS01 - Do not free deferred irc if quiesce race in BPE
         - Make sure propagate quiesce reach when deferred end quiesce
DSPDSS10 - Indicate in open
         - Prevent quiesce if still own shared ENQ on new resource
         - Invoke DSPGRPSV to DEQ shared ENQ
DSPEF040 - Add support for missing brlbf2 functions (check for more)
DSPGDB   - Add GDBRLIinOpen
         - Remove gdb_quiend_state - see if can manage in DSPRLN only!
DSPGRPSV - Add REL_DEQ function
DSPPRAB  - Fix formatting of pras flags
DSPRLI00 - Get shared ENQ on new resource
         - Fix dspdgml_started_during_quiesce indication (should be off)
         - Add routine, GetENQForQuiesce, to get shared ENQ.
DSPRLN   - Add several fields to coordinate quiesce processing
DSPRLN00 - Added support to broadcast not_quiesce
         - When quiesce/quiclose, get global ENQ if no other DBRCs and wait for response before continuing
         - Add routine, GetENQForQuiesce, to attach a TCB, DSPRLNQ0, to request an exclusive ENQ.
DSPRLXB0 - Removed the issuance of MSGDSP1126I (moved to DSPRLX10)
         - Removed DSP1128 to prevent potential ABENDS138
DSPRLX10 - Added code to issue DSP1126 message for brlbf2qn.
         - Added code to issue DSP1128 message.
         - Do not call DSPRCLOS if DSPROPEN fails
         - Do not call DSPRCLOS QUIESCE or DSPROPEN ENDQUI if GDBRLIInOpen is set.
         - If Quiesce Acknowledgement (brlbf2qk) message has no dsns, ignore dsn list.
         - If we get a quiesce request while waiting for not_quiesced, release the shared enqueue if we hold it.
         - In brlbf2qk or brlbf2dd processing, add logic to request an exclusive ENQ if all known DBRCs have responded to a quiesce request. If the exclusive ENQ is already held when we detect all have responded, then release the ENQ.
         - Add new brlb2ge (got enqueue) support. Check if found a new DBRC. If not, we own the RECONs and process as in brlbf2qk or brlbf2dd when own RECONs.
         - If all DBRCs we knew about when started go down and we are waiting on not_quiesced, treat as implicit end quiesce and broadcast not quiesced to other DBRCs.
         - Add routine, found_new_dbrc, to issue CSLSCQRY to determine if any new DBRCs exist.
         - Add routine, GetENQForQuiesce, to attach a TCB, DSPRLNQ0, to request an exclusive ENQ.
         - Add routine ReleaseENQ to DEQ the shared ENQ or POST DSPRLNQ0 to DEQ the exclusive ENQ
DSPTRAC1 - Add support for missing brlbf2 functions

THE FOLLOWING MODULES WERE RECOMPILED:
DSPCINT0 - Recompile for new parts DSPETXR0/DSPRLNQ0
DSPEF00F - Recompile for DSPRLN change
DSPEF01F - Recompile for DSPRLN change
DSPEF03F - Recompile for DSPRLN change
DSPEF0AF - Recompile for DSPGDB change
DSPLOADR - Recompile for new modules DSPRLNQ0 & DSPETXR0
DSPRLXI0 - Recompile to pick up BRLBF2SS

THE FOLLOWING ARE THE NEW PARTS:
DSPETXR0 - Propagate ABENDs originating in subtasks as U4095. ETXR routines are specified when a task is attached. They are entered when the task terminates. If an IMS task uses this routine as an ETXR routine when attaching subtasks, then any ABENDs originating in the subtasks will result in a user 4095 ABEND of the attaching task.
DSPRLNQ0 - This module is run as a DBRC subtask to request an exclusive SYSTEMS ENQ on a new resource for quiesce processing serialization. Once obtained, it will enqueue a brlsb or an AWE, depending on the environment, to let group services know the ENQ was obtained. Finally, it will wait until posted to do the DEQ.
Temporary fix
*********
* HIPER *
*********
Comments
APAR Information
APAR number
PM53092
Reported component name
IMS V11
Reported component ID
5635A0200
Reported release
100
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2011-11-29
Closed date
2012-03-27
Last modified date
2012-04-03
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
PM59495 UK77421
Modules/Macros
DFSBRLSB DSPBDS00 DSPBGS00 DSPBGS10 DSPBGS20 DSPBGS50 DSPBIN20 DSPCINT0 DSPCRTR0 DSPDSS01 DSPDSS10 DSPEF0AF DSPEF00F DSPEF01F DSPEF03F DSPEF040 DSPETXR0 DSPLOADR DSPRLI00 DSPRLNQ0 DSPRLN00 DSPRLXB0 DSPRLXI0 DSPRLX10 DSPTRAC1 DSPURI00 HMK1100J
Fix information
Fixed component name
IMS V11
Fixed component ID
5635A0200
Applicable component levels
R100 PSY UK77421
UP12/03/29 P F203
Document Information
Modified date:
03 April 2012