A fix is available
APAR status
Closed as new function.
Error description
Local fix
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* This APAR is part of the full function support for Release   *
* 5.2.1 of the TS7700 Virtualization Engine D/T3957 and        *
* provides enhanced CUIR support for an unhealthy (fenced)     *
* cluster.                                                     *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* Enhanced CUIR Support for Release 5.2.1                      *
* of the TS7700 Virtualization Engine.                         *
****************************************************************
* RECOMMENDATION:                                              *
* For full support (z/OS V2R3 and above) OA60929 (Device       *
* Services) will bring in the support from MVS Allocation      *
* OA61050.                                                     *
****************************************************************
Problem conclusion
Temporary fix
Comments
TS7700 Release 5.2.1
--------------------
This APAR was developed as part of the support for Release 5.2.1 of the TS7700 Virtualization Engine and enhances the CUIR support that was initially delivered (see APAR OA52376). The initial support enabled devices to be automatically varied offline and online when a TS7700 cluster was placed in service. This new support enables devices in an unhealthy (fenced) cluster to be automatically varied offline and online.

None of the other functions being delivered in Release 5.2.1 of the TS7700 require host support.

Resources
---------
For a discussion of the other TS7700 Release 5.2.1 enhancements, refer to:
  https://www.ibm.com/docs/en/ts7700-virtual-tape
For a detailed discussion of the CUIR support, refer to:
  https://www.ibm.com/support/pages/node/6355675

Control Unit Initiated Reconfiguration (CUIR) - Unhealthy Cluster
-----------------------------------------------------------------
With this host support, if all clusters in the grid are at the 5.2.1 release level, the TS7700 can notify the host that a distributed library (cluster) is having issues. Each supporting host system can then automatically vary the devices offline and back online (for CUIR reasons). By default, both vary notifications (offline and online) are disabled. The LIBRARY REQUEST command can be used to enable each of the automatic notifications:

- LIBRARY REQUEST,complib,CUIR,SETTING,FENCE,{ENABLE|DISABLE}
- LIBRARY REQUEST,complib,CUIR,AONLINE,FENCE,{ENABLE|DISABLE}

In addition to the FENCE keyword above, SERVICE or ALL can also be specified. SERVICE enables the initial CUIR vary notification support and ALL enables both SERVICE and UNHEALTHY cluster varies.

If device verification is needed before bringing the devices online, the online notification (AONLINE) can be left disabled and the online notification can be triggered manually through the TS7700 management interface (MI).
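The keyword combinations described above can be sketched as a small helper that assembles the command text. This is only an illustration: the function name `cuir_request` and the library name `COMPLIB1` are hypothetical, and the helper only formats a string in the syntax shown in this APAR; it does not issue the command. Treating SETTING as the offline notification is an assumption drawn from the text, which names AONLINE as the online notification.

```python
"""Format the LIBRARY REQUEST CUIR commands described in this APAR (sketch only)."""

# Keywords taken from the command syntax in the APAR text.
# Assumption: SETTING governs the offline vary notification; the text
# explicitly names AONLINE as the online notification.
NOTIFICATIONS = ("SETTING", "AONLINE")
REASONS = ("SERVICE", "FENCE", "ALL")
ACTIONS = ("ENABLE", "DISABLE")


def cuir_request(complib: str, notification: str, reason: str, action: str) -> str:
    """Return the command text for one CUIR notification setting.

    `complib` is the composite library name (hypothetical in the example below).
    """
    notification, reason, action = (
        notification.upper(), reason.upper(), action.upper()
    )
    if notification not in NOTIFICATIONS:
        raise ValueError(f"notification must be one of {NOTIFICATIONS}")
    if reason not in REASONS:
        raise ValueError(f"reason must be one of {REASONS}")
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {ACTIONS}")
    return f"LIBRARY REQUEST,{complib},CUIR,{notification},{reason},{action}"


if __name__ == "__main__":
    # Enable automatic offline varies for a fenced cluster, but leave the
    # automatic online notification disabled so devices can be verified first.
    print(cuir_request("COMPLIB1", "SETTING", "FENCE", "ENABLE"))
    print(cuir_request("COMPLIB1", "AONLINE", "FENCE", "DISABLE"))
```

Validating the keywords before formatting keeps a mistyped reason (for example UNHEALTHY instead of FENCE) from producing a command that the library would reject.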

With this support (as with the initial CUIR support), when a tape device is varied offline for service or for an unhealthy (fenced) cluster, it is varied offline for CUIR reasons. If a device is offline for CUIR reasons, the reverse notification is needed to clear the CUIR state. If the device is also offline for other reasons (path, operator, or library), it may remain offline because of those other reasons. The existing LIBRARY DISPDRV command (CBR1220I) can be used to determine the reason that a device is offline, including the CUIR reason. If the CUIR state does not clear, the existing VARY xxxx,ONLINE,RESET command can be used.

When devices are varied offline for CUIR reasons, they go pending offline. Depending on the state of the device and the state of the cluster, the DDR SWAP command can then be used to move long-running jobs to another device. For devices that are boxed, when the host receives notification to bring the devices back online, the host will attempt to bring the boxed devices back online.

The following LIBRARY REQUEST commands were added with the initial CUIR support:

- LIBRARY REQUEST,libname,LDRIVE (composite or distributed)
- LIBRARY REQUEST,distlib,LDRIVE,GROUP,index

The LIBRARY REQUEST LDRIVE commands can be used to determine the state of a CUIR notification request. Refer to the following TS7700 white paper for the LDRIVE command options and the output reported to the host:
  https://www.ibm.com/support/pages/node/6355675

The following operator command was added with the initial CUIR support:

- DEVSERV QTAPE,xxxx,QHA

The query host access (QHA) support for tape displays the systems that are online (grouped) to the specified tape device. If devices are not going offline on some systems, this command shows which systems are still online (grouped) to the specified device.
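As a rough sketch of how the query and reset commands above fit together, the helpers below format them for one device. The function names, the device number `1A40`, and the library names are hypothetical, and the four-hex-digit device check is an assumption based on the `xxxx` placeholder in the text. LIBRARY DISPDRV is left out because this text does not give its operands.

```python
"""Format the CUIR query and reset commands named in this APAR (sketch only)."""
import re
from typing import List, Optional

# Assumption: "xxxx" in the APAR text stands for a four-digit hexadecimal
# device number.
_DEVICE = re.compile(r"[0-9A-Fa-f]{4}")


def _check_device(device: str) -> str:
    if not _DEVICE.fullmatch(device):
        raise ValueError(f"expected a 4-digit hex device number, got {device!r}")
    return device.upper()


def cuir_query_commands(device: str, libname: str,
                        distlib: Optional[str] = None,
                        group_index: Optional[int] = None) -> List[str]:
    """Commands that show which systems still hold the device and the
    state of the CUIR notification request."""
    dev = _check_device(device)
    commands = [
        f"DEVSERV QTAPE,{dev},QHA",           # systems online (grouped) to the device
        f"LIBRARY REQUEST,{libname},LDRIVE",  # composite or distributed library
    ]
    if distlib is not None and group_index is not None:
        commands.append(f"LIBRARY REQUEST,{distlib},LDRIVE,GROUP,{group_index}")
    return commands


def cuir_reset_command(device: str) -> str:
    """Command to use only if the CUIR state does not clear on its own."""
    return f"VARY {_check_device(device)},ONLINE,RESET"


if __name__ == "__main__":
    for cmd in cuir_query_commands("1a40", "COMPLIB1", "DISTLIB1", 1):
        print(cmd)
    print(cuir_reset_command("1a40"))
```

The reset command is kept in its own function because the text positions it as a fallback rather than as part of routine checking.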

Because, for a period of time, only a subset of the systems may support the new unhealthy vary notification, the commands noted above will help determine whether manual varies are needed from some of the systems.

For IOS message updates related to the initial CUIR support, refer to IOS APAR (OA52379). Updates specific to this support include:

IOS279I
  (Message Text) - REQUEST REASON displayed may be SERVICE or UNHEALTHY CLUSTER
  (Explanation) - updated to explain the unhealthy cluster reason
    The library has detected issues with a cluster (distributed library) in the grid (composite library) and has fenced the cluster. The library has initiated a reconfiguration request from a device to quiesce the specified set of devices in the unhealthy cluster. The Control Unit Initiated Reconfiguration (C.U.I.R.) service has received control to perform the request. Quiescing devices means to make devices unavailable for use so that they cannot be varied online and used while the cluster is in the fenced state.

IOS280I
  (Message Text) - REQUEST REASON displayed may be SERVICE or UNHEALTHY CLUSTER
  (Explanation) - updated to explain the unhealthy cluster reason
    The library has initiated a reconfiguration request from a device to resume the specified set of devices when the library is no longer in the fenced state. The Control Unit Initiated Reconfiguration (C.U.I.R.) service has received control to perform the request. Resuming devices means to make devices available for use when the cluster (or distributed library) is no longer considered to be in the fenced (unhealthy) state. The devices may have been varied online by the system or may have been made available to be varied online.

IOS281I
  No changes to the successful message.

IOS282I
  (Message Text) - REQUEST REASON displayed may be SERVICE or UNHEALTHY CLUSTER
  (Explanation) - updated to cover both the service and the unhealthy cluster reason
    The Control Unit Initiated Reconfiguration (C.U.I.R.) support attempted to quiesce the specified devices in order to satisfy the request specified in system message IOS279I (IOS1279I) but the devices could not be quiesced. The request may have failed because the current state of the device precludes it from being quiesced. This may be the case if the device is a JES3 managed device (C.U.I.R. is not supported). In some cases, it may be necessary to manually vary the device offline.
  (Operator Response) - updated to cover both the service and the unhealthy cluster reason
    You may need to manually vary the specified devices offline, which may entail an operator initiated DDR SWAP or the cancel of a job that has an allocated device. If the request was for a JES3 managed device, this failure is expected since JES3 managed devices are not supported by this function. Otherwise, contact your IBM service representative if the failures persist.

Note: For the unhealthy cluster vary, since a healthy cluster can also report on the state of its peer, the IOS messages above (IOS279I - IOS282I) may be issued multiple times. Also note that CUIR continues to be supported only when running natively on MVS; it is not supported for an MVS guest running under VM. In addition, a CUIR request for a JES3 managed device is not supported and will result in IOS282I (IOS1282I) being issued. The JES3 managed devices will be listed as devices that could not be brought offline for CUIR reasons. Lastly, if an IPL occurs while in a CUIR state, the CUIR state is not maintained across the IPL.

Additional Search Keywords: MSGIOS279I MSGIOS280I MSGIOS281I MSGIOS282I
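
Because the IOS messages described in the comments above may be issued more than once for the same event, it can help to tally them when reviewing captured console output. The sketch below is a minimal illustration: the function names are hypothetical, the one-line summaries paraphrase the explanations in this APAR, and the sample lines are placeholders rather than real message text.

```python
"""Tally the CUIR-related IOS message IDs found in captured text (sketch only)."""
import re
from collections import Counter
from typing import Iterable, List

# Short paraphrases of the explanations given in this APAR.
CUIR_MESSAGES = {
    "IOS279I": "request received to quiesce devices",
    "IOS280I": "request received to resume devices",
    "IOS281I": "request was successful",
    "IOS282I": "devices could not be quiesced",
}

# Match the four message IDs only when they stand alone as a word, so a
# search keyword such as MSGIOS279I is not counted.
_MSG_ID = re.compile(r"\bIOS(?:279|280|281|282)I\b")


def count_cuir_messages(lines: Iterable[str]) -> Counter:
    """Count how often each CUIR message ID appears in the given lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(_MSG_ID.findall(line))
    return counts


def repeated_messages(counts: Counter) -> List[str]:
    """Message IDs seen more than once, in sorted order."""
    return sorted(msg_id for msg_id, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Placeholder lines; real message text is not reproduced here.
    sample = [
        "IOS279I <placeholder text>",
        "IOS279I <placeholder text>",
        "IOS282I <placeholder text>",
    ]
    counts = count_cuir_messages(sample)
    for msg_id in sorted(counts):
        print(msg_id, counts[msg_id], CUIR_MESSAGES[msg_id])
    print("repeated:", repeated_messages(counts))
```

A repeated IOS279I on its own does not indicate a problem, since a healthy cluster can report on its peer. An IOS282I is the case that may call for a manual vary.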
APAR Information
APAR number
OA60929
Reported component name
DEV SUPPORT TAP
Reported component ID
5695DF110
Reported release
230
Status
CLOSED UR1
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-02-18
Closed date
2021-09-16
Last modified date
2022-03-18
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
OA61050 UJ06693 UJ06694 UJ06695
Modules/Macros
IECTDSRV IECTDSR2
| SA38067630 |
Fix information
Fixed component name
DEV SUPPORT TAP
Fixed component ID
5695DF110
Applicable component levels
R230 PSY UJ06693
UP21/10/05 P F110
R240 PSY UJ06694
UP21/10/05 P F110
R250 PSY UJ06695
UP21/10/05 P F110
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
19 March 2022