A fix is available
APAR status
Closed as program error.
Error description
Two instances of Linux XFS corruption occurred on two nodes in two different s390x clusters that run in the same z/VM SSI. Both clusters had been migrated from DS8K to DS8950. Environment: OCP version 4.14, z/VM 7.2 (2302), EAV ECKD DS8950 disks, thin-provisioned ESE, with Linux exploiting HYPERPAV through the DEFINE HYPERPAVALIAS command. The base DASD are configured to the guest as cylinder 1-END minidisks. Linux is also exploiting HPF I/O in the error scenario. IBM note: The error occurs when the guest issues an initial read or write HPF I/O through a HYPERPAV alias directed to a 1-END minidisk base. All three configuration conditions must be present to hit this error: HPF, 1-END minidisks, and guest-exploited HYPERPAV alias devices. EAV and ESE happen to be present in this configuration but are not necessary to hit the issue.
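The three-condition rule in the IBM note can be written as a simple check. The following Python sketch is illustrative only: the function and parameter names are hypothetical, and it assumes the caller already knows the three facts about the guest configuration. It does not query z/VM or Linux.

```python
def exposed_to_error(uses_hpf, base_is_1_to_end_minidisk, guest_uses_hyperpav_alias):
    """Return True only when all three conditions from the IBM note hold.

    EAV and ESE are deliberately not parameters: per the note they are
    not necessary to hit the issue.
    """
    return bool(uses_hpf and base_is_1_to_end_minidisk and guest_uses_hyperpav_alias)


# Example: the reported environment (HPF, 1-END minidisks, guest HYPERPAV aliases).
print(exposed_to_error(True, True, True))
# Example: the same guest with full-pack (0-to-END) minidisks only.
print(exposed_to_error(True, False, True))
```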
Local fix
Client's circumvention info: On one of the failed nodes, the client was able to recopy the disk, and the issue did not recur on boot. One failed node was able to boot in emergency mode, so the client was able to gather an sos report, DBGINFO, and the output from a dry run of xfs_repair.
Problem summary
****************************************************************
* USERS AFFECTED: ALL z/VM users exploiting guest HYPERPAV     *
*                 with 1-to-END minidisks                      *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
****************************************************************
* RECOMMENDATION: APPLY PTF                                    *
****************************************************************
*SYMPTOM: Wrong real cylinder accessed when guest does HPF I/O to HYPERPAV 1-to-END minidisk base via virtual alias device.
When a guest issues I/O to a HYPERPAV alias, the internal control block structures are altered to represent characteristics associated with the target base. This alteration needs to be performed before any minidisk CCW or DCW translation is performed. For HPF I/O, the alteration is not done until after DCW translation is performed. This can result in DCW translation not performing the correct cylinder relocation for minidisks.
This does not appear to have caused a problem when guests could only perform I/O to 0-to-END (real-cylinders) base fullpack minidisks with HYPERPAV aliases. The introduction of 1-to-END cylinders minidisk support has uncovered this problem. Since the initial state of the internal control blocks represents a 0-to-END minidisk, the resulting error is usually an I/O that is misplaced by minus 1 cylinder. For example, instead of being directed to real cylinder 5, the I/O would attempt to read or write to real cylinder 4.
When a guest is set up to use 1-to-END base minidisks consistently for HYPERPAV I/O, the error condition will only occur on the first alias HPF I/O to a 1-to-END base, for each virtual alias device. This is because the minidisk real cylinder extents in the internal control blocks will be properly set immediately following DCW translation for that first I/O. If a guest is set up with a mix of 0-to-END and 1-to-END base virtual devices, then the error condition could recur.
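The ordering problem described for this symptom can be illustrated with a minimal Python sketch. It is a simplified model, not CP code: the class names, the `reloc_start` field, and the two `hpf_io_*` functions are hypothetical stand-ins for the internal control block update and DCW translation, and cylinder relocation is reduced to adding the minidisk's real starting cylinder.

```python
from dataclasses import dataclass


@dataclass
class BaseMinidisk:
    start_cyl: int  # real cylinder that holds the minidisk's virtual cylinder 0


@dataclass
class VirtualAlias:
    # Initial state represents a 0-to-END minidisk, as in the description.
    reloc_start: int = 0


def translate(virtual_cyl, reloc_start):
    """Simplified cylinder relocation: virtual cylinder to real cylinder."""
    return virtual_cyl + reloc_start


def hpf_io_before_fix(alias, base, virtual_cyl):
    # Translation runs first, using whatever the alias held from earlier.
    real_cyl = translate(virtual_cyl, alias.reloc_start)
    # The alteration to match the target base happens too late.
    alias.reloc_start = base.start_cyl
    return real_cyl


def hpf_io_after_fix(alias, base, virtual_cyl):
    # The alteration is done before translation, so relocation is correct.
    alias.reloc_start = base.start_cyl
    return translate(virtual_cyl, alias.reloc_start)


base = BaseMinidisk(start_cyl=1)   # a 1-to-END minidisk
alias = VirtualAlias()
first = hpf_io_before_fix(alias, base, 4)    # first I/O through this alias
second = hpf_io_before_fix(alias, base, 4)   # later I/O to the same base
fixed = hpf_io_after_fix(VirtualAlias(), base, 4)
print(first, second, fixed)
```

In this model the first I/O lands one cylinder low, while the second is correct because the extents were set after the first translation. Alternating the same alias between a 0-to-END and a 1-to-END base would leave stale extents on every switch, which corresponds to the recurring case for mixed configurations.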
*SYMPTOM: DETACH command hang possible when guest detaches a range of virtual HYPERPAV alias devices when CSE XLINK is also used.
A secondary problem was found during stress testing of the fix to the reported problem. In a CSE XLINK environment, DETACH command hangs were possible when a guest detached a range of virtual HYPERPAV alias devices. This occurred because the internal symbolic lock name associated with each virtual alias was derived from the VOLSER name associated with the real alias device linked to the virtual device. Additionally, the real alias VOLSER name changes dynamically based on the last real base device that the real alias had targeted. In an environment with heavy I/O loads going to real alias devices, this VOLSER name could change between the time a lock was obtained and the time it was released.
Since there are no safeguards for an unknown symbolic name in the lock release code, a lock can be held indefinitely. That is, the release code will just exit if the symbolic lock name cannot be found. This condition led to a guest hang when the original symbolic lock name was obtained again, usually by a subsequent virtual device in the DETACH command range. Additionally, since the alias RDEV lock is often held in this code path, the hang on the symbolic CSE lock can result in the RDEV lock being held indefinitely, resulting in a real alias device hang.
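The lock-name mismatch behind this symptom can be sketched in Python as follows. This is a toy model under stated assumptions, not the CP implementation: `SymbolicLocks`, `RealAlias`, the `detach_*` functions, and the `FIXED_NAME` placeholder are all hypothetical, and a blocked obtain is reported as `False` rather than actually hanging.

```python
class SymbolicLocks:
    """Toy lock table keyed by symbolic name (hypothetical model)."""

    def __init__(self):
        self.held = set()

    def obtain(self, name):
        # A real lock would wait here; the sketch returns False instead.
        if name in self.held:
            return False
        self.held.add(name)
        return True

    def release(self, name):
        # No safeguard: an unknown name is silently ignored.
        self.held.discard(name)


class RealAlias:
    def __init__(self, volser):
        self.volser = volser  # follows the last real base the alias targeted


FIXED_NAME = "ALIAS*"  # hypothetical placeholder for a consistent lock name


def detach_before_fix(locks, alias, retarget_to=None):
    if not locks.obtain(alias.volser):
        return False                  # would hang in the real system
    if retarget_to is not None:
        alias.volser = retarget_to    # concurrent I/O retargets the alias
    locks.release(alias.volser)       # name re-derived; may no longer match
    return True


def detach_after_fix(locks, alias, retarget_to=None):
    if not locks.obtain(FIXED_NAME):
        return False
    if retarget_to is not None:
        alias.volser = retarget_to
    locks.release(FIXED_NAME)         # same name for obtain and release
    return True


locks = SymbolicLocks()
# First device in the range: VOLSER changes inside the obtain-release window.
detach_before_fix(locks, RealAlias("BASE01"), retarget_to="BASE02")
print(sorted(locks.held))             # the original name is still held
# A later device whose real alias currently shows the original VOLSER.
print(detach_before_fix(locks, RealAlias("BASE01")))
```

With a lock name that does not depend on the alias's current target, the obtain and release always refer to the same entry, so nothing is left held after the window closes.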
Problem conclusion
SYMPTOM 1 fix - HCPFXR was modified to properly alter its internal control block structures before doing DCW translation on guest I/O directed to a virtual HYPERPAV alias. This results in the correct real cylinder translation being performed before the real I/O is initiated.
SYMPTOM 2 fix - HCPXLG was altered to use a consistent VOLSER name for all real HYPERPAV aliases. This avoids an inconsistent lock name being used between the lock obtain and release, which occurred when the real alias was directed to different target base devices within an obtain-release window.
Temporary fix
*********
* HIPER *
*********
FOR RELEASE VM/ESA CP/ESA R720 :
PREREQ: VM66715  CO-REQ: NONE  IF-REQ: NONE
FOR RELEASE VM/ESA CP/ESA R730 :
PREREQ: VM66715  CO-REQ: NONE  IF-REQ: NONE
Comments
APAR Information
APAR number
VM66776
Reported component name
VM CP CP
Reported component ID
568411202
Reported release
720
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2024-06-04
Closed date
2024-07-18
Last modified date
2025-06-10
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UM90461 UM90462
Modules/Macros
HCPFCWDS HCPFXR HCPFXTBK HCPXLG
Fix information
Fixed component name
VM CP CP
Fixed component ID
568411202
Applicable component levels
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
11 June 2025