IBM Support

IJ25146: BUG IN GNR LOG VTRACK RECOVERY


APAR status

  • Closed as program error.

Error description

  • A race condition between disks encountering errors
    and recovery groups or log groups resigning
    can trigger a bug in GNR log vtrack recovery: the
    recovery fails to scrub and repair stale data on the
    disk. This can in turn lead to data corruption
    if all good copies of
    the mirrored data are lost.
    

Local fix

  • The bug does not cause a problem
    right away, but it exposes a condition in which
    additional events can lead to a recovery failure.
    By the time the recovery failure is detected, data has
    already been corrupted. Special steps are then
    required to allow recovery to
    complete, but in the meantime a
    certain amount of data will be lost.
    

Problem summary

  • A race condition between disks encountering errors
    and recovery groups or log groups resigning
    can trigger a bug in GNR log vtrack recovery, which
    then fails to scrub and repair stale data on the
    disk.
    

Problem conclusion

  • Benefits of the solution:
    The code is fixed so that the log vtrack
    scrubbing operation properly repairs
    inconsistent data during recovery.
    Workaround: None
    Problem trigger:
    The problem is normally exposed by a race condition
    that arises when a number of disks gradually
    hit errors, causing recovery groups
    or log groups to resign.
    Symptom:
    The problem can affect either the fast-write
    log or the VCD log, and the impact is
    failure to complete recovery.
    The log will contain messages such as:
    
    2020-05-29_16:25:21.693-0700: [E] Recovery failure:
    at least 3400 sectors of the fast-write log for LG root
    of RG RG1 could not be recovered.
    2020-05-29_16:25:21.693-0700: [E] Beginning to
    resign log group root in recovery group RG1 due to
    "recovery failure", caller err 333 when "recovering
    log group worker"
    
    or
    
    2020-04-29_09:04:42.567-0400: [E] Recovery failure:
    at least 41616 sectors of the VCD log for LG LG001
    of RG rg1 could not be recovered.
    2020-04-29_09:04:42.567-0400: [E] Beginning to resign
    log group LG001 in recovery group rg1 due to
    "recovery failure", caller err 333 when "recovering log group
    worker"
    
    Platforms affected: N/A
    
    Functional Area affected: GNR
    Customer Impact: High importance
    
    Changed Externals: None
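
The symptom messages above can be searched for mechanically. The following is a minimal sketch, not an IBM-supplied tool: the function name find_recovery_failures and the regular expression are hypothetical, and the pattern is derived only from the two sample messages in this APAR, so the wording in other releases may differ. Whitespace is collapsed first so that messages wrapped across lines, as they appear here, still match.

```python
import re

# Pattern inferred from the two sample messages in this APAR (an assumption;
# other releases may word the message differently).
RECOVERY_FAILURE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}): \[E\] "
    r"Recovery failure: at least (?P<sectors>\d+) sectors of the "
    r"(?P<log_type>fast-write|VCD) log for LG (?P<log_group>\S+) "
    r"of RG (?P<recovery_group>\S+) could not be recovered"
)


def find_recovery_failures(log_text):
    """Return one dict per 'Recovery failure' message found in log_text."""
    # Collapse all runs of whitespace (including newlines) into single
    # spaces so that a message wrapped over several lines still matches.
    flat = " ".join(log_text.split())
    return [m.groupdict() for m in RECOVERY_FAILURE.finditer(flat)]


if __name__ == "__main__":
    import sys

    # Usage: python find_failures.py <path-to-log-file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for hit in find_recovery_failures(fh.read()):
            print(hit["timestamp"], hit["log_type"], hit["log_group"],
                  hit["recovery_group"], hit["sectors"])
```

Any match indicates that recovery of the named log group has already failed, so the script is useful for detection only and does not replace applying the fix.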
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ25146

  • Reported component name

    SPEC SCALE STD

  • Reported component ID

    5737F33AP

  • Reported release

    505

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2020-06-01

  • Closed date

    2020-06-01

  • Last modified date

    2020-07-17

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SPEC SCALE STD

  • Fixed component ID

    5737F33AP

Applicable component levels

  • IBM Spectrum Scale (product code STXKQY), version 505, Platform Independent

Document Information

Modified date:
18 July 2020