Question & Answer
Question
Deadlock between two systems occurs. The deadlock is caused by the following scenario: SYSA catalog task has: Enqueue on SYSIGGV2 CAT.VOLxxx with a reserve on VOLxxx Enqueue on SYSZVVDS VOLyyy (waiting for VOLyyy) SYSB CATALOG task has: Enqueue on SYSIGGV2 CAT.
Answer
Problem
Deadlock between two systems occurs. The deadlock is caused by the following scenario:
SYSA catalog task has:
Enqueue on SYSIGGV2 CAT.VOLxxx with a reserve on VOLxxx
Enqueue on SYSZVVDS VOLyyy (waiting for VOLyyy)
SYSB CATALOG task has:
Enqueue on SYSIGGV2 CAT.VOLyyy with a reserve on VOLyyy
Enqueue on SYSZVVDS VOLxxx (waiting for VOLxxx)
To avoid an IPL the operator 'CANCELLED' the user task or issued command F CATALOG,END(id),NOREDRIVE for the CATALOG task.
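The scenario above is a classic wait-for cycle. The following is a minimal sketch, not part of any IBM tooling, that models it in Python: the dictionaries, the function name find_deadlock, and the idea of following "waiter -> volume -> reserve holder" links are illustrative assumptions based only on the SYSA/SYSB description.

```python
# Hypothetical model of the lockout described above; all names are illustrative.
from typing import Dict, List, Optional

# Which system currently holds the hardware reserve on each volume.
reserve_holder: Dict[str, str] = {"VOLxxx": "SYSA", "VOLyyy": "SYSB"}

# Which volume each system's catalog task is waiting to do I/O against.
waiting_for: Dict[str, str] = {"SYSA": "VOLyyy", "SYSB": "VOLxxx"}


def find_deadlock(start: str) -> Optional[List[str]]:
    """Follow waiter -> volume -> holder links; return the cycle if one exists."""
    path: List[str] = []
    system: Optional[str] = start
    while system is not None and system not in path:
        path.append(system)
        volume = waiting_for.get(system)
        system = reserve_holder.get(volume) if volume is not None else None
    if system is None:
        return None  # the chain ended at a system that is not waiting
    return path[path.index(system):]  # the systems that form the cycle


print(find_deadlock("SYSA"))  # ['SYSA', 'SYSB']
```

Each system in the returned cycle is waiting on a volume reserved by the next one, which is why neither catalog task can make progress on its own.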
When the operator issued F CATALOG,END(id),NOREDRIVE, or CANCEL, CAS invoked CALLRTM macro with ABEND71A to terminate the CAS service task.
RTM gets control and issues an SVC10 purge=quiesce, which causes an SVC wait in IOSPURGA. IOSPURGA does not complete until I/O has completed because of the purge=quiesce but the required device for the I/O is still reserved by the other system.
Because of this wait, the F CATALOG,END(id),NOREDRIVE command or CANCEL cannot complete. At this point the abended CAS task waits while it continues to hold hardware reserves on the devices it already owned. In effect, the lockout remains, and an IPL of one of the systems is required to resolve it.
If the user fails to convert the resource SYSIGGV2 from a hardware reserve, a lockout between systems sharing catalogs may result. Each system is waiting to perform I/O to a device reserved by a sharing system. Some jobs or catalog tasks may not be terminated via system commands and they remain in I/O waits.
Resolution
The following notes are intended to help system programmers understand the consequences of failing to convert the SYSIGGV2 reserve by use of GRS or equivalent (ISV/OEM) resource serialization software:
1) The most common lockout resulting from the failure to convert the SYSIGGV2 reserve is between major names SYSIGGV2 and SYSZVVDS.
In such a deadlock, CATALOG TASK 1, running on SYSTEM 1, holds a hardware reserve on VOLUME 1 and is in an I/O wait for I/O directed to VOLUME 2.
Meanwhile, CATALOG TASK 2, running on SYSTEM 2 and holding a reserve on VOLUME 2, is waiting to perform I/O to VOLUME 1. This type of resource contention can be minimized by ensuring that all data sets defined on the same volume as an ICF CATALOG are cataloged in that CATALOG and by reducing the number of data sets residing on the same volume as an ICF catalog that are not cataloged in that catalog.
NOTE: CATALOG MANAGEMENT LOGIC is committed to users being able to catalog data sets onto a single volume in many different catalogs, and in NO WAY affects PERFORMANCE when the SYSIGGV2 reserve is properly converted.
2) The MVS command 'MODIFY CATALOG,LIST', followed by 'MODIFY CATALOG,END(xx)', 'MODIFY CATALOG,RESTART', or 'MODIFY CATALOG,ABEND', may be useful in terminating lockouts.
The user might be able to identify a catalog task holding a reserve on a volume needed by a sharing system and, by terminating that task, the user might relieve the enqueue contention. However, catalog corruption may occur if I/O directed towards a catalog or VVDS is interrupted while in progress.
For this reason, the catalog recovery environment will not permit the termination of a catalog task if that task is in an I/O wait. Therefore, if the user attempts to terminate a task that is waiting for I/O to begin (even if the device is at present reserved by a sharing system), that task will not be terminated until the hardware (UCB) reserve is released and the I/O has completed.
Use of the 'MODIFY CATALOG,RESTART' command when one or more catalog tasks are in an I/O wait may result in the catalog address space being unable to process any catalog requests until the sharing system's reserve is released.
Deadlocks between systems may occur because of the failure to convert the SYSIGGV2 reserve. If more than 2 systems are involved, the IPL of more than one system may be required. System programmers must carefully weigh the advantages to be gained from not converting the SYSIGGV2 reserve against the cost and inconvenience of unscheduled IPLs.
See DFSMS Managing catalogs
Topic: Preventing Lockouts on Shared Volumes
The following note concerns the IDCAMS DELETE / ERASE operation and the SYSIGGV2 reserve:
NOTE: A system failure can always occur while a DELETE operation is in progress, and the consequences differ depending on which partial catalog structures the failure leaves behind. The CATALOG component group has concluded that the best order in which to delete the catalog structure is:
(1) Remove the DSCB
(2) Remove the VVR or NVRs
(3) Remove the BCS entry
ICF catalog serializes this whole process using the SYSIGGV2 RESERVE. If ERASE is specified with the delete operation, Catalog holds this RESERVE for the entire time that DADSM takes to physically 'zero' the DASD tracks. Depending on the number of tracks to zero, this ERASE operation could take several minutes to complete.
There will be 2 SYSIGGV2 reserves: the catalog and sphere.
The catalog SYSIGGV2 consists of the catalog NAME only and serializes the entire catalog.
The sphere SYSIGGV2 is the sphere NAME (DSN or CLUSTER or ALIAS) plus the catalog NAME.
This allows CATALOG (CAS) to only hold the catalog record (sphere) that CATALOG is presently processing.
The catalog SYSIGGV2 is held only when we are updating the catalog.
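To make the two scopes concrete, here is a minimal sketch in Python. The Resource tuple, the helper names, and the example catalog and data set names are hypothetical and do not reflect the actual resource-name layout; the only rule assumed is the usual enqueue rule that two requests conflict when they name the same resource and at least one of them is exclusive.

```python
# Hypothetical representation of the two SYSIGGV2 scopes; not the real rname format.
from typing import NamedTuple, Optional

QNAME = "SYSIGGV2"


class Resource(NamedTuple):
    qname: str
    catalog: str
    sphere: Optional[str] = None  # None means the catalog-level resource


def catalog_resource(catalog: str) -> Resource:
    """Catalog SYSIGGV2: catalog name only, serializes the whole catalog."""
    return Resource(QNAME, catalog)


def sphere_resource(sphere: str, catalog: str) -> Resource:
    """Sphere SYSIGGV2: sphere name plus catalog name, serializes one entry."""
    return Resource(QNAME, catalog, sphere)


def conflicts(a: Resource, mode_a: str, b: Resource, mode_b: str) -> bool:
    """Requests conflict only on the same resource, unless both are shared."""
    return a == b and not (mode_a == "SHR" and mode_b == "SHR")


# Two different spheres in the same (made-up) catalog do not block each other.
print(conflicts(sphere_resource("PAYROLL.DATA", "UCAT.EXAMPLE"), "EXC",
                sphere_resource("HR.DATA", "UCAT.EXAMPLE"), "EXC"))  # False
```

Under this model, work on unrelated spheres proceeds in parallel, and only an exclusive catalog-level request blocks other users of the same catalog.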
If this SYSIGGV2 RESERVE (SVC56) is properly converted to an ENQUEUE by GRS, then, instead of reserving the entire VOLUME, only the catalog is locked for the catalog SYSIGGV2 and only the entry is locked during the sphere SYSIGGV2. Other items on the volume will not be locked out during the DELETE process. The delete process looks like this:
(0) Issue CAS request SVC26 (0A1A)
(1) Enqueue SHR SYSIGGV2 catalog
(2) Read the catalog record
(3) Enqueue EXC SYSIGGV2 sphere
(4) Dequeue SYSIGGV2 catalog
(5) Scratch the DSCB (SVC29 (0A1D) DADSM may erase)
(6) Delete VVR or NVR (if appropriate)
(7) Enqueue EXC SYSIGGV2 catalog
(8) Delete BCS entries
(9) Dequeue SYSIGGV2 catalog
(10) Dequeue SYSIGGV2 sphere
If GRS conversion is not done for SYSIGGV2, the hardware RESERVE on the associated VOLUME (UCB) is held throughout steps 1 through 10. As a result, all systems requesting services from this VOLUME are locked out for a prolonged period of time. It is therefore highly recommended that users define SYSIGGV2 in the GRS conversion RNL so that GRS converts SYSIGGV2 'generically' and excludes the catalog SYSIGGV2 'specifically'. All sharing systems MUST be in the SAME GRS complex. If IBM did not serialize in this manner and simply dropped SYSIGGV2, the DELETE or ALTER processes would not be properly serialized throughout their respective operations, with the possible negative impact of leaving bad or partial catalog structures.
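The effect of the step sequence can be traced with a small sketch. This is a simplified Python model, not actual catalog code: the STEPS table simply transcribes the enqueue and dequeue steps listed above, and the function name steps_holding is a hypothetical helper. A step counts as holding a resource if the resource is held when the step starts or when it ends.

```python
# Simplified trace of the delete sequence above; names are illustrative only.
from typing import List, Optional, Set, Tuple

# (step number, action, SYSIGGV2 scope) transcribed from the listed steps.
STEPS: List[Tuple[int, str, str]] = [
    (0, "none", ""),        # issue CAS request
    (1, "enq", "catalog"),  # enqueue SHR catalog
    (2, "none", ""),        # read the catalog record
    (3, "enq", "sphere"),   # enqueue EXC sphere
    (4, "deq", "catalog"),  # dequeue catalog
    (5, "none", ""),        # scratch the DSCB (DADSM may erase)
    (6, "none", ""),        # delete VVR or NVR
    (7, "enq", "catalog"),  # enqueue EXC catalog
    (8, "none", ""),        # delete BCS entries
    (9, "deq", "catalog"),  # dequeue catalog
    (10, "deq", "sphere"),  # dequeue sphere
]


def steps_holding(resource: Optional[str] = None) -> List[int]:
    """Steps during which `resource` (or any SYSIGGV2 resource if None) is held."""
    held: Set[str] = set()
    result: List[int] = []
    for number, action, target in STEPS:
        before = set(held)
        if action == "enq":
            held.add(target)
        elif action == "deq":
            held.discard(target)
        active = before | held  # held at the start or at the end of the step
        if (resource is None and active) or resource in active:
            result.append(number)
    return result


print(steps_holding())           # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(steps_holding("catalog"))  # [1, 2, 3, 4, 7, 8, 9]
```

Some SYSIGGV2 resource is held from step 1 until step 10 completes, which matches the statement that an unconverted RESERVE keeps the volume reserved throughout. The catalog-level resource is free during steps 5 and 6, so with conversion other work against the same catalog is not blocked while DADSM scratches (and possibly erases) the data set.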
For more on catalog resources and sharing of catalogs, see informational APAR II14297.
Historical Number
5305619
Document Information
Modified date:
03 September 2021
UID
isg3S1000396