IBM Support

IJ40280: CHECKSUM ERROR ON NETWORK I/O

APAR status

  • Closed as program error.

Error description

  • NSD checksum errors are encountered in a cluster with
    verbsRdmaSend enabled, similar to:
    
    2021-09-27_15:46:52.820+0200: [W] Encountered first checksum error on network I/O to NSD Server disk RG002
    2021-09-27_15:46:52.820+0200: [W] Encountered first checksum error on network I/O from NSD Client 10.124.70.48 (node1)
    2021-09-27_15:46:53.655+0200: [W] Encountered first checksum error on network I/O from NSD Client 10.124.70.55 (node2)
    2021-10-01_12:47:35.002+0200: [W] Encountered first checksum error on network I/O from NSD Client 10.124.70.45 (node3)
    2021-10-01_13:11:18.187+0200: [W] Encountered first checksum error on network I/O to NSD Server disk RG002
    2021-10-01_13:11:18.211+0200: [W] Encountered first checksum error on network I/O to NSD Server disk RG003
    
    There is a timing window in which the data has arrived on
    the RDMA adapter on the GPFS client, but the client
    accesses the file data in the pagepool before that data
    has been copied into the pagepool.
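
To gauge how widespread the warnings are, the messages quoted above can be tallied per NSD disk and per client node. The following is a minimal sketch, not an IBM tool: it assumes each warning occupies a single line in the log, in exactly the format shown in this APAR, and the function name summarize_checksum_warnings is hypothetical.

```python
import re
from collections import Counter

# Matches the two warning variants quoted in this APAR.
# Assumption: each warning is a single unwrapped line in the log.
CHECKSUM_WARNING = re.compile(
    r"^(?P<ts>\S+): \[W\] Encountered first checksum error on network I/O "
    r"(?:to NSD Server disk (?P<disk>\S+)"
    r"|from NSD Client (?P<ip>\S+) \((?P<node>[^)]+)\))$"
)


def summarize_checksum_warnings(lines):
    """Count checksum warnings per ("disk", name) or ("client", node)."""
    counts = Counter()
    for line in lines:
        match = CHECKSUM_WARNING.match(line.strip())
        if match is None:
            continue  # not a checksum warning in the expected format
        if match.group("disk"):
            counts[("disk", match.group("disk"))] += 1
        else:
            counts[("client", match.group("node"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2021-09-27_15:46:52.820+0200: [W] Encountered first checksum error "
        "on network I/O to NSD Server disk RG002",
        "2021-09-27_15:46:52.820+0200: [W] Encountered first checksum error "
        "on network I/O from NSD Client 10.124.70.48 (node1)",
    ]
    for key, count in sorted(summarize_checksum_warnings(sample).items()):
        print(key, count)
```

The same function can be given an open log file, since it only iterates over lines; the location of the log file depends on the installation and is not stated in this APAR.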
    

Local fix

Problem summary

  • Potential for data integrity issues on all clusters using RDMA
    

Problem conclusion

  • This problem is fixed in 5.1.2 PTF 5.
    To see all Spectrum Scale APARs and
    their respective fix solutions, refer to:
    https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
    
    Benefits of the solution: Fixed code so that the integrity issue can no longer happen.
    Workaround: Disable RDMA or set nsdCksumTraditional configuration parameter to "yes".
    Problem trigger: Race condition between the RDMA software layer and IBM Spectrum Scale when reading data.
    Symptom: Unexpected Results/Behavior
    Platforms affected: ALL Linux OS environments
    Functional Area affected: RDMA
    Customer Impact: Critical
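
The fix level and workaround above can be combined into a small check. This is a hedged sketch under several assumptions that are not stated in this APAR: that "5.1.2 PTF 5" corresponds to a four-part level string "5.1.2.5", that mmchconfig is the command used to change the named parameter, and that verbsRdma=disable is the setting meant by "Disable RDMA". The function names are hypothetical, and the commands are only printed, never run. Verify the exact syntax, and whether a daemon restart is required, against the documentation for your release.

```python
# Fix level stated in this APAR: 5.1.2 PTF 5.
# Assumption: this maps to the four-part level string "5.1.2.5".
FIXED_STREAM = (5, 1, 2)
FIXED_PTF = 5


def parse_level(text):
    """Parse a level string such as "5.1.2.3" into a tuple of four ints."""
    parts = tuple(int(part) for part in text.strip().split("."))
    if len(parts) != 4:
        raise ValueError(f"expected a four-part level, got {text!r}")
    return parts


def has_fix(level_text):
    """Return True or False for 5.1.2.x levels, None for other streams.

    This APAR only states the fix level for the 5.1.2 stream, so other
    streams are reported as unknown rather than guessed.
    """
    level = parse_level(level_text)
    if level[:3] != FIXED_STREAM:
        return None
    return level[3] >= FIXED_PTF


def workaround_commands():
    """Candidate commands for the two workarounds (assumed syntax)."""
    return [
        "mmchconfig nsdCksumTraditional=yes",  # parameter named in this APAR
        "mmchconfig verbsRdma=disable",        # assumed way to disable RDMA
    ]


if __name__ == "__main__":
    installed = "5.1.2.3"  # hypothetical installed level
    if has_fix(installed) is False:
        print("Level", installed, "predates the fix; either workaround applies:")
        for command in workaround_commands():
            print("  ", command)
```

The APAR offers the two workarounds as alternatives, so only one of the printed commands would be needed.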
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ40280

  • Reported component name

    SPEC SCALE DME

  • Reported component ID

    5737F34AP

  • Reported release

    512

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2022-05-25

  • Closed date

    2022-07-20

  • Last modified date

    2022-09-08

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SPEC SCALE DME

  • Fixed component ID

    5737F34AP

Applicable component levels


Document Information

Modified date:
08 September 2022