IBM Support

PH64680: AN ACCELERATOR ON IBM Z COULD GO INTO TROUBLE BECAUSE OF BAD I/O FROM DISKS (RCU_SCHED SELF-DETECTED STALL ON CPU)


APAR status

  • Closed as documentation error.

Error description

  • In very rare situations, an accelerator on IBM Z can suddenly
    stall (due to bad I/O from disks).
    If this occurs, the accelerator can be activated again after
    activating all LPARs and performing the "reset without wipe"
    action twice in the Admin UI.
    In the one situation observed so far, the SSC dump for the head
    node contained the following symptom string:
    "date/time head kernel: ... rcu: INFO: rcu_sched self-detected
    stall on CPU".
    A deep dive into the collected system data revealed a hang
    in the Network File System (NFS), as described in
    https://lore.kernel.org/all/20210802192804.GD6890@fieldses.org/T/
    A parallel analysis of the IBM Z hardware did not reveal any
    hardware-related issues.
    To avoid the observed NFS-related hang situation, the
    accelerator's development team will implement a recommended
    workaround: the system parameter 'leases-enable' will be set
    to 0.
    
    The change will be contained in Accelerator maintenance level
    7.5.13.1.
    Please note:
    The parameter 'leases-enable' cannot be changed to 0 on the
    fly.
    NFS is used only to move accelerator configuration files;
    therefore, the potential performance impact mentioned for the
    workaround does NOT apply to regular accelerator processing.
    
    Additional keywords:
    TS017852171
    DT420211
    GH/.../Customer-Cases/issues/779
    NFS self-detected stall on CPU leases-enable LINUX kernel
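
The symptom string quoted above can be searched for in exported log
text. The following is a minimal sketch, assuming the log is available
as plain text lines; the function name find_stall_lines and the sample
lines are hypothetical placeholders (the sample reuses the elided
"date/time ... " form from this APAR, not real log output).

```python
import re
from typing import Iterable, List

# Pattern built from the symptom string quoted in this APAR.
# \s+ tolerates variable whitespace between the words.
STALL_PATTERN = re.compile(
    r"rcu:\s+INFO:\s+rcu_sched\s+self-detected\s+stall\s+on\s+CPU"
)


def find_stall_lines(lines: Iterable[str]) -> List[str]:
    """Return every line that contains the RCU stall symptom string."""
    return [line.rstrip("\n") for line in lines if STALL_PATTERN.search(line)]


# Placeholder lines for illustration only; not taken from a real dump.
sample = [
    "date/time head kernel: ... some unrelated message",
    "date/time head kernel: ... rcu: INFO: rcu_sched self-detected stall on CPU",
]

matches = find_stall_lines(sample)
print(len(matches), "matching line(s)")
for line in matches:
    print(line)
```

A match only indicates that the symptom string is present; it does not
by itself confirm the NFS-related hang described in this APAR.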
    

Local fix

Problem summary

  • Problem Summary:
    Administrators maintaining the JSON configuration file may
    encounter a hang or stall situation.
    
    Users Affected:
    Administrators of the Accelerator on IBM Z who maintain the JSON
    configuration file.
    
    Problem Scenario:
    See APAR Error description.
    
    Problem Symptoms:
    See APAR Error description.
    

Problem conclusion

  • To avoid the observed NFS-related hang situation, the system
    parameter 'leases-enable' will be set to 0 with Accelerator
    maintenance level 7.5.13.1.
    Upgrade your accelerator environment(s) accordingly.
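
On a general-purpose Linux system, the current lease setting can be
read as sketched below. This is a minimal sketch under two
assumptions that this APAR does not confirm: that 'leases-enable'
corresponds to the Linux sysctl fs.leases-enable, exposed at
/proc/sys/fs/leases-enable, and that the file is readable from where
the script runs (the accelerator itself may not offer such access).
The function names are hypothetical.

```python
from pathlib import Path

# Assumption: 'leases-enable' maps to the Linux sysctl fs.leases-enable.
LEASES_ENABLE_PATH = Path("/proc/sys/fs/leases-enable")


def parse_leases_enable(raw: str) -> bool:
    """Interpret the sysctl file content: 0 means disabled, nonzero enabled."""
    return int(raw.strip()) != 0


def leases_enabled(path: Path = LEASES_ENABLE_PATH) -> bool:
    """Read the sysctl file and report whether file leases are enabled."""
    return parse_leases_enable(path.read_text())


if __name__ == "__main__":
    # Only attempt the read where the file actually exists.
    if LEASES_ENABLE_PATH.exists():
        state = "enabled" if leases_enabled() else "disabled"
        print(f"file leases are {state}")
    else:
        print(f"{LEASES_ENABLE_PATH} not found on this system")
```

The script only reads the value; it makes no attempt to change it.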
    

Temporary fix

Comments

APAR Information

  • APAR number

    PH64680

  • Reported component name

    ANYTCS ACCLTR Z

  • Reported component ID

    5697DA700

  • Reported release

    750

  • Status

    CLOSED DOC

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2025-01-02

  • Closed date

    2025-03-09

  • Last modified date

    2025-03-09

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels


Document Information

Modified date:
09 March 2025