IBM Support

IJ29555: NSD RETRY LOOP CONTINUOUS RESIGN ISSUE

APAR status

  • Closed as program error.

Error description

  • In the current GNR design, a recovery group (RG) resigns
    each time it is recovered if a vdisk with the 8+2p code
    fails its fault tolerance. Once this pattern has repeated
    often enough to reach a threshold, I/O is returned as an
    error for any disk retrying the I/O.
    

Local fix

Problem summary

  • In a mirrored disk environment, I/O is expected to
    continue on the surviving disk if a disk experiences a
    problem. When a disk is created on a recovery group
    vdisk, and the recovery group keeps resigning after each
    successful recovery because a vdisk's fault tolerance
    has been exceeded, a race condition can cause the check
    for this state to be skipped. As a result, I/O continues
    to be retried against the problem disk instead of moving
    on to the surviving disks.
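
The intended behavior described above can be sketched roughly as follows. This is a minimal illustration only, not the Spectrum Scale implementation: the class `RecoveryGroupState`, the function `write_mirrored`, the threshold value, and the retry count are all hypothetical names and numbers chosen for the example. The point it tries to show is that the "repeatedly resigning" state is read and updated under one lock, so a retrying I/O cannot skip the check and instead moves on to a surviving replica.

```python
import threading


class RecoveryGroupState:
    """Hypothetical tracker for resign-after-recovery cycles of one RG."""

    def __init__(self, resign_threshold=3):
        self._lock = threading.Lock()
        self._resign_threshold = resign_threshold  # assumed value
        self._consecutive_resigns = 0

    def record_resign_after_recovery(self):
        # Count one more recovery that was followed by a resign.
        with self._lock:
            self._consecutive_resigns += 1

    def record_stable_recovery(self):
        # A successful I/O means the pattern is broken; reset the count.
        with self._lock:
            self._consecutive_resigns = 0

    def is_resign_looping(self):
        # Read under the same lock used by the writers, so the check
        # always sees a consistent count.
        with self._lock:
            return self._consecutive_resigns >= self._resign_threshold


def write_mirrored(replicas, rg_states, data, max_retries=5):
    """Try each replica in order; abandon one whose RG keeps resigning.

    replicas  -- list of (name, write_function) pairs (hypothetical)
    rg_states -- dict mapping replica name to RecoveryGroupState
    """
    for name, do_write in replicas:
        state = rg_states[name]
        for _ in range(max_retries):
            if state.is_resign_looping():
                break  # stop retrying here, move to the next replica
            try:
                do_write(data)
                state.record_stable_recovery()
                return name
            except IOError:
                state.record_resign_after_recovery()
    raise IOError("no surviving replica accepted the I/O")
```

In this sketch, a replica whose writes keep failing is retried only until the threshold is reached, after which the loop falls through to the next replica rather than retrying indefinitely.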
    

Problem conclusion

  • Benefits of the solution:
    With the fix, I/O continues on the surviving
    disks instead of hanging on the problem disk.
    
    Work around:
    None
    
    Problem trigger:
    A race condition in the logic that determines whether a
    recovery group is repeatedly resigning after a
    successful recovery because a vdisk's fault
    tolerance has been exceeded.
    
    Symptom:
    I/O hangs, with long waiters reporting 'waiting for
    stateful NSD server error takeover (2)'
    
    Platforms affected:
    N/A
    
    Functional Area affected:
    GNR
    
    Customer Impact:
    Suggested
    
    Changed Externals:
    None
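
To check a node for the symptom listed above, one option is to scan waiter output for the quoted text. The sketch below assumes the waiter listing (for example, the output of `mmdiag --waiters`) has been captured as lines of text and that the message appears in it verbatim; the function name `find_takeover_waiters` is hypothetical, and no particular line format is assumed beyond the substring match.

```python
# Text quoted in the Symptom field of this APAR.
TAKEOVER_WAITER = "waiting for stateful NSD server error takeover"


def find_takeover_waiters(lines):
    """Return the captured waiter lines that mention the takeover wait."""
    return [line.rstrip("\n") for line in lines if TAKEOVER_WAITER in line]
```

For example, the function could be called with `sys.stdin` after piping the captured waiter listing into the script; a non-empty result suggests the node may be affected.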
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ29555

  • Reported component name

    SPEC SCALE STD

  • Reported component ID

    5737F33AP

  • Reported release

    510

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2020-12-02

  • Closed date

    2020-12-14

  • Last modified date

    2020-12-14

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SPEC SCALE STD

  • Fixed component ID

    5737F33AP

Applicable component levels

  • Product

    IBM Spectrum Scale

  • Version

    510

  • Platform

    Platform Independent

  • Business unit

    IBM Infrastructure w/TPS

  • Line of business

    Storage

Document Information

Modified date:
15 December 2020