IBM Support

IJ53784: POTENTIAL SILENT CORRUPTION OF DATA IN IBM STORAGE SCALE 5.1.7.0 - 5.1.9.8 AND 5.2.0.0 - 5.2.2.0

APAR status

  • Closed as program error.

Error description

  • A race condition can occur when multiple threads perform a
    full-track read of the same track while disk errors are present.
    When the configuration parameter nsdRAIDClientOnlyChecksum is
    enabled, this race can cause data read from disk to be used,
    without first passing checksum validation, to reconstruct data
    that could not be read because of the disk errors.
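Why skipping checksum validation before reconstruction is dangerous can be illustrated with a deliberately simplified toy. This is a sketch under stated assumptions, not how GNR is implemented: single XOR parity stands in for GNR's erasure coding, CRC-32 (`zlib.crc32`) stands in for its checksums, the helper names are hypothetical, and the multi-thread race itself is not modeled. The sketch only shows that a corrupted survivor used without validation yields wrong rebuilt data with no error, while validating first catches it.

```python
import zlib
from functools import reduce
from typing import List, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_stripe(data_strips: List[bytes]) -> Tuple[List[bytes], bytes, List[int]]:
    """Toy single-parity stripe: data strips, XOR parity, per-strip CRC-32."""
    parity = reduce(xor_bytes, data_strips)
    checksums = [zlib.crc32(s) for s in data_strips]
    return list(data_strips), parity, checksums


def reconstruct(strips: List[bytes], parity: bytes, checksums: List[int],
                missing: int, validate: bool = True) -> bytes:
    """Rebuild strip `missing` from the parity and the surviving strips.

    With validate=True each surviving strip is checked against its stored
    checksum before use; with validate=False a corrupted survivor silently
    produces a wrong result.
    """
    survivors = [(i, s) for i, s in enumerate(strips) if i != missing]
    if validate:
        for i, s in survivors:
            if zlib.crc32(s) != checksums[i]:
                raise ValueError(f"checksum mismatch on strip {i}")
    # XOR of parity with all surviving strips yields the missing strip.
    return reduce(xor_bytes, (s for _, s in survivors), parity)


if __name__ == "__main__":
    strips, parity, sums = make_stripe([b"AAAA", b"BBBB", b"CCCC"])
    strips[2] = b"CCCX"  # a survivor is corrupted after its checksum was stored
    # Strip 1 "failed to read"; rebuild it without validating the survivors.
    print(reconstruct(strips, parity, sums, missing=1, validate=False))  # b'BBBY', no error
    try:
        reconstruct(strips, parity, sums, missing=1, validate=True)
    except ValueError as err:
        print(err)  # checksum mismatch on strip 2
```

The unvalidated path returns b'BBBY' instead of b'BBBB' without raising anything, which is the toy analogue of silent data corruption.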
    

Local fix

  • Disable client-only checksum on the server nodes by running:
    mmchconfig nsdRAIDClientOnlyChecksum=no -i -N <server nodes or nodeclass>
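Whether this workaround is relevant depends on the installed code level. The affected ranges in this APAR's title can be encoded in a small Python sketch. `is_affected_level` is a hypothetical helper, it assumes a plain four-field level string such as "5.1.9.8", and it encodes only the ranges quoted in the title; it says nothing about which level contains the fix.

```python
from __future__ import annotations

# Affected level ranges copied from this APAR's title (inclusive on both ends).
AFFECTED_RANGES = [
    ((5, 1, 7, 0), (5, 1, 9, 8)),
    ((5, 2, 0, 0), (5, 2, 2, 0)),
]


def parse_level(level: str) -> tuple[int, ...]:
    """Turn a dotted level such as "5.1.9.8" into a comparable tuple."""
    parts = tuple(int(p) for p in level.strip().split("."))
    if len(parts) != 4:
        raise ValueError(f"expected four numeric fields, got {level!r}")
    return parts


def is_affected_level(level: str) -> bool:
    """Return True if the level falls inside one of the affected ranges."""
    v = parse_level(level)
    # Tuple comparison orders levels field by field, left to right.
    return any(low <= v <= high for low, high in AFFECTED_RANGES)


if __name__ == "__main__":
    for lvl in ["5.1.6.1", "5.1.7.0", "5.1.9.8", "5.1.9.9", "5.2.2.0", "5.2.2.1"]:
        print(lvl, is_affected_level(lvl))
```

Levels outside the quoted ranges simply return False; how the installed level string is obtained is left to the caller.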
    

Problem conclusion

  • Benefits of the solution:
    The code was fixed to prevent this race condition.
    
    Workaround:
    Disable client-only checksum on the server nodes by running:
    mmchconfig nsdRAIDClientOnlyChecksum=no -i -N <server nodes or nodeclass>
    
    Problem trigger:
    The race condition occurring together with a specific kind of data
    buffer corruption by the disk drives.
    
    Symptom:
    The potential final outcome is silent data corruption. There are,
    however, intermediate signs such as "Error validating buffer
    checksum in vdisk RG001LG002VS004 vtrack 7611...", although this
    message by itself is not necessarily a sign of silent data
    corruption.
    
    Platforms affected:
    Linux only
    
    Functional area affected:
    GNR (GPFS Native RAID)
    
    Customer Impact:
    High Importance
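Because the checksum message above is the only intermediate sign this APAR names, an administrator might want to list where it has appeared. A minimal Python sketch, assuming the message text matches the example quoted under Symptom (with varying vdisk and vtrack values); `find_checksum_errors` is a hypothetical helper, not part of IBM Storage Scale, and which log file to scan is an assumption left to the caller. As the APAR notes, a match is not by itself proof of silent data corruption.

```python
import re
import sys
from typing import Iterable, Iterator, Tuple

# Built from the message quoted in this APAR; the text after the vtrack
# number is elided there, so only this prefix is matched.
CHECKSUM_MSG = re.compile(
    r"Error validating buffer checksum in vdisk (?P<vdisk>\S+) vtrack (?P<vtrack>\d+)"
)


def find_checksum_errors(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (vdisk, vtrack) for each line that contains the checksum message."""
    for line in lines:
        match = CHECKSUM_MSG.search(line)
        if match:
            yield match.group("vdisk"), int(match.group("vtrack"))


def main(argv: list) -> None:
    # The log file path is supplied by the caller (assumption: this APAR
    # does not say where the message is written); stdin is used otherwise.
    if len(argv) > 1:
        with open(argv[1], encoding="utf-8", errors="replace") as fh:
            hits = list(find_checksum_errors(fh))
    else:
        hits = list(find_checksum_errors(sys.stdin))
    for vdisk, vtrack in hits:
        print(f"vdisk {vdisk} vtrack {vtrack}")


if __name__ == "__main__":
    main(sys.argv)
```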
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ53784

  • Reported component name

    SPEC SCALE STD

  • Reported component ID

    5737F33AP

  • Reported release

    522

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2025-03-03

  • Closed date

    2025-03-03

  • Last modified date

    2025-03-03

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SPEC SCALE STD

  • Fixed component ID

    5737F33AP

Applicable component levels

  • Product code: STXKQY
  • Version: 522
  • Platform: Platform Independent (PF025)
  • Business unit: IBM Software (BU048)
  • Line of business: Storage TPS (LOB69)

Document Information

Modified date:
04 March 2025