IBM Support

VM66776: LINUX XFS FILE-SYSTEM CORRUPTION WITH ESE, HPF, AND HYPERPAV ALIAS EXPLOITATION

A fix is available

APAR status

  • Closed as program error.

Error description

  • Two instances of Linux XFS corruption occurred on two nodes in
    two different s390x clusters that run in the same VM SSI.  Both
    clusters were migrated from DS8K to DS8950.  OCP version 4.14,
    z/VM 7.2 (2302).  EAV ECKD DS8950 disks, thin-provisioned ESE,
    with Linux exploiting HYPERPAV via the DEFINE HYPERPAVALIAS
    command.  The base DASD are configured to the guest as cylinder
    1-END minidisks.  Linux is also exploiting HPF I/O in the error
    scenario.
    
    IBM note:  The error occurs when a guest issues its initial read
    or write HPF I/O through a HYPERPAV alias directed to a 1-END
    minidisk base.  All three configuration conditions must be
    present to hit this error:  HPF, 1-END minidisks, and
    guest-exploited HYPERPAV alias devices.  EAV and ESE were present
    in the reported configuration but are not necessary to hit the
    issue.
    

Local fix

  • Client's circumvention info:  On one of the failed nodes, the
    client was able to recopy the disk, and the issue did not recur
    on boot-up.  One failed node was able to boot in emergency mode,
    which allowed the client to gather an sos report, DBGINFO, and
    the output from a dry run of xfs_repair.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: ALL z/VM users exploiting guest HYPERPAV     *
    *                 with 1-to-END minidisks                      *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    ****************************************************************
    * RECOMMENDATION: APPLY PTF                                    *
    ****************************************************************
    *SYMPTOM:  Wrong real cylinder accessed when guest does HPF I/O
    to HYPERPAV 1-to-END minidisk base via virtual alias device.
    
    When a guest issues I/O to a HYPERPAV alias, the internal
    control block structures are altered to represent
    characteristics associated with the target base.  This
    alteration needs to be performed before any minidisk CCW or DCW
    translation is performed.
    
    For HPF I/O, the alteration is not done until after DCW
    translation is performed.  This can result in DCW translation
    not performing the correct cylinder relocation for minidisks.
    This doesn't appear to have caused a problem when guests could
    only perform I/O to 0-to-END (real-cylinders) base fullpack
    minidisks with HYPERPAV aliases.  The introduction of 1-to-END
    cylinders minidisk support has uncovered this problem.  Since
    the initial state of the internal control blocks represents a
    0-to-END minidisk, the resulting error is usually an I/O that is
    misplaced by minus 1 cylinder.  For example, instead of being
    directed to real cylinder 5, the I/O would attempt to read or
    write to real cylinder 4.
    
    When a guest is set up to use 1-to-END base minidisks
    consistently for HYPERPAV I/O, the error condition will only
    occur on the first alias HPF I/O to a 1-to-END base, for each
    virtual alias device.  This is because the minidisk real
    cylinder extents in the internal control blocks will be properly
    set immediately following DCW translation for that first I/O.
    If a guest is set up with a mix of 0-to-END and 1-to-END base
    virtual devices, then the error condition can recur.
    
    *SYMPTOM: DETACH command hang possible when guest detaches a
    range of virtual HYPERPAV alias devices when CSE XLINK is
    also used.
    
    A secondary problem was found during stress testing of the fix
    to the reported problem.  It was found in a CSE XLINK
    environment, that DETACH command hangs were possible when a
    guest detaches a range of virtual HYPERPAV alias devices.
    This occurred because the internal symbolic lock name associated
    with each virtual alias was derived from the VOLSER name
    associated with the real alias device linked to the virtual
    device.  Additionally, the real alias VOLSER name changes
    dynamically based on the last real base device that the real
    alias had targeted.  In an environment with heavy I/O loads
    going to real alias devices, this VOLSER name could change
    between the time a lock was obtained and then released.
    
    Since there are no safeguards for an unknown symbolic name in
    the lock release code, a lock can be held indefinitely.  That
    is, the release code will just exit if the symbolic lock name
    can't be found.  This condition led to a guest hang when the
    original symbolic lock name was requested again, usually for a
    subsequent virtual device in the DETACH command range.
    Additionally, since the alias RDEV lock is often held in this
    code path, the hang on the symbolic CSE lock can result in the
    RDEV lock being held indefinitely, resulting in a real alias
    device hang.
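
As an illustration of the first symptom, the ordering problem can be sketched in Python.  This is a simplified model, not CP code: the VirtualAlias class, its extent_start field, and both hpf_io_* functions are hypothetical names invented for this sketch, and it assumes that minidisk relocation is a simple addition of the minidisk's starting real cylinder.

```python
from dataclasses import dataclass


@dataclass
class VirtualAlias:
    # Hypothetical stand-in for the per-alias internal control blocks.
    # The initial state models a 0-to-END minidisk (no relocation).
    extent_start: int = 0


def translate(alias: VirtualAlias, guest_cyl: int) -> int:
    """Relocate a guest cylinder using the extents currently in the alias."""
    return guest_cyl + alias.extent_start


def hpf_io_before_fix(alias: VirtualAlias, base_start: int, guest_cyl: int) -> int:
    """Pre-fix ordering: translate first, then take on the base's extents."""
    real_cyl = translate(alias, guest_cyl)  # uses stale extents
    alias.extent_start = base_start         # alteration happens too late
    return real_cyl


def hpf_io_after_fix(alias: VirtualAlias, base_start: int, guest_cyl: int) -> int:
    """Post-fix ordering: take on the base's extents, then translate."""
    alias.extent_start = base_start
    return translate(alias, guest_cyl)


if __name__ == "__main__":
    one_to_end = 1  # a 1-to-END minidisk starts at real cylinder 1
    alias = VirtualAlias()
    print(hpf_io_before_fix(alias, one_to_end, 4))  # first I/O: 4 (should be 5)
    print(hpf_io_before_fix(alias, one_to_end, 4))  # later I/O: 5
    print(hpf_io_after_fix(VirtualAlias(), one_to_end, 4))  # 5
```

In this model, guest cylinder 4 on a 1-to-END minidisk should map to real cylinder 5.  The pre-fix ordering sends only the first I/O through a given alias to real cylinder 4, which mirrors the "misplaced by minus 1 cylinder" behavior described for the first symptom.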
    

Problem conclusion

  • SYMPTOM 1 fix -
    HCPFXR was modified to properly alter its internal control block
    structures before doing DCW translation on guest I/O directed to
    a virtual HYPERPAV alias.  This results in the correct real
    cylinder translation being performed before the real I/O is
    initiated.
    
    SYMPTOM 2 fix -
    HCPXLG was altered to use a consistent VOLSER name for all real
    HYPERPAV aliases.  This avoids an inconsistent lock name being
    used between the lock obtain and release, which occurred when
    the real alias was directed to different target base devices
    within an obtain/release window.
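
The lock-name mismatch behind the second symptom, and the effect of using a consistent name, can be sketched in Python.  This is a toy model under stated assumptions, not the HCPXLG implementation: SymbolicLockTable, detach_alias, the VOLSER values, and the constant name "ALIAS" are all hypothetical, and a failed obtain is reported as "hang" instead of actually waiting.

```python
class SymbolicLockTable:
    """Toy table of locks keyed by symbolic name (hypothetical)."""

    def __init__(self) -> None:
        self.held: set[str] = set()

    def obtain(self, name: str) -> bool:
        # A real caller would wait here; the sketch just reports failure.
        if name in self.held:
            return False
        self.held.add(name)
        return True

    def release(self, name: str) -> None:
        # Mirrors the described behavior: an unknown name is ignored,
        # so a lock obtained under a different name stays held.
        self.held.discard(name)


def detach_alias(locks: SymbolicLockTable, name_at_obtain: str,
                 name_at_release: str) -> str:
    """Obtain and release the symbolic lock for one virtual alias."""
    if not locks.obtain(name_at_obtain):
        return "hang"
    locks.release(name_at_release)
    return "ok"


if __name__ == "__main__":
    # Before the fix: the name follows the VOLSER of the last targeted
    # base, which can change between obtain and release under heavy I/O.
    locks = SymbolicLockTable()
    print(detach_alias(locks, "VOLA01", "VOLB02"))  # ok, but VOLA01 leaks
    print(detach_alias(locks, "VOLA01", "VOLA01"))  # hang

    # After the fix: one consistent name is used for every real alias.
    fixed = SymbolicLockTable()
    print(detach_alias(fixed, "ALIAS", "ALIAS"))    # ok
    print(detach_alias(fixed, "ALIAS", "ALIAS"))    # ok
```

The sketch shows only why a name that cannot change between obtain and release removes the leaked lock; it does not model the RDEV lock or CSE XLINK itself.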
    

Temporary fix

  • *********
    * HIPER *
    *********
    FOR RELEASE VM/ESA CP/ESA R720 :
    PREREQ: VM66715
    CO-REQ: NONE
    IF-REQ: NONE
    FOR RELEASE VM/ESA CP/ESA R730 :
    PREREQ: VM66715
    CO-REQ: NONE
    IF-REQ: NONE
    

Comments

APAR Information

  • APAR number

    VM66776

  • Reported component name

    VM CP CP

  • Reported component ID

    568411202

  • Reported release

    720

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2024-06-04

  • Closed date

    2024-07-18

  • Last modified date

    2025-06-10

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UM90461 UM90462

Modules/Macros

  • HCPFCWDS HCPFXR   HCPFXTBK HCPXLG
    

Fix information

  • Fixed component name

    VM CP CP

  • Fixed component ID

    568411202

Applicable component levels

  • R720 PSY UM90461

       UP24/07/22 P 2401 ¢

  • R730 PSY UM90462

       UP24/07/22 P 2501 ¢

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
11 June 2025