APAR status
Closed as program error.
Error description
NSD checksum errors are encountered in a cluster with verbsRdmaSend enabled, similar to:
2021-09-27_15:46:52.820+0200: [W] Encountered first checksum error on network I/O to NSD Server disk RG002
2021-09-27_15:46:52.820+0200: [W] Encountered first checksum error on network I/O from NSD Client 10.124.70.48 (node1)
2021-09-27_15:46:53.655+0200: [W] Encountered first checksum error on network I/O from NSD Client 10.124.70.55 (node2)
2021-10-01_12:47:35.002+0200: [W] Encountered first checksum error on network I/O from NSD Client 10.124.70.45 (node3)
2021-10-01_13:11:18.187+0200: [W] Encountered first checksum error on network I/O to NSD Server disk RG002
2021-10-01_13:11:18.211+0200: [W] Encountered first checksum error on network I/O to NSD Server disk RG003
There is a timing window in which the data has arrived on the RDMA adapter of the GPFS client, but the client accesses the file data in the pagepool before that data has been copied into the pagepool.
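To check whether a node shows the symptom described above, the log can be scanned for the warning text quoted in this APAR. The following is a minimal Python sketch, not an IBM-provided tool: the function name, the regular expression, and the sample lines are illustrative assumptions modelled on the messages shown above.

```python
import re
from collections import Counter

# Pattern modelled on the warning lines quoted in this APAR (an assumption:
# other releases may word or format the message differently).
CHECKSUM_RE = re.compile(
    r"^(?P<ts>\S+): \[W\] Encountered first checksum error on network I/O "
    r"(?P<direction>to NSD Server disk|from NSD Client) (?P<target>.+)$"
)


def find_checksum_warnings(lines):
    """Return (timestamp, direction, target) for each matching warning line."""
    hits = []
    for line in lines:
        match = CHECKSUM_RE.match(line.strip())
        if match:
            hits.append(
                (match.group("ts"), match.group("direction"), match.group("target"))
            )
    return hits


# Sample input copied from the error description; the last line is an
# unrelated, made-up entry to show that non-matching lines are skipped.
SAMPLE_LOG = [
    "2021-09-27_15:46:52.820+0200: [W] Encountered first checksum error on network I/O to NSD Server disk RG002",
    "2021-09-27_15:46:52.820+0200: [W] Encountered first checksum error on network I/O from NSD Client 10.124.70.48 (node1)",
    "2021-10-01_13:11:18.211+0200: [W] Encountered first checksum error on network I/O to NSD Server disk RG003",
    "2021-10-01_13:12:00.000+0200: [I] Some unrelated informational message",
]

if __name__ == "__main__":
    warnings = find_checksum_warnings(SAMPLE_LOG)
    per_direction = Counter(direction for _, direction, _ in warnings)
    for ts, direction, target in warnings:
        print(f"{ts}  {direction}: {target}")
    print(dict(per_direction))
```

In practice the lines would be read from the GPFS log on each node (commonly /var/adm/ras/mmfs.log.latest, though the location should be confirmed for the installation in question) instead of the embedded sample.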
Local fix
Problem summary
Potential for data integrity issues on all clusters using RDMA
Problem conclusion
This problem is fixed in 5.1.2 PTF 5.
To see all Spectrum Scale APARs and their respective fix solutions, refer to https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
Benefits of the solution: Fixed code so that the integrity issue can no longer happen.
Work around: Disable RDMA or set nsdCksumTraditional configuration parameter to "yes".
Problem trigger: Race condition between the RDMA software layer and IBM Spectrum Scale when reading data.
Symptom: Unexpected Results/Behavior
Platforms affected: ALL Linux OS environments
Functional Area affected: RDMA
Customer Impact: Critical
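The two workarounds named above can be sketched as mmchconfig invocations. The Python snippet below is a hedged illustration only: it assumes that "Disable RDMA" corresponds to setting verbsRdma=disable, and the helper names and dry-run wrapper are hypothetical. Whether a change takes effect immediately or needs a daemon restart is not stated in this APAR and should be checked against the mmchconfig documentation for the installed release.

```python
import subprocess

# Candidate workaround commands. The verbsRdma=disable mapping is an
# assumption about what "Disable RDMA" means in this APAR; the
# nsdCksumTraditional=yes setting is the parameter named in the APAR text.
WORKAROUNDS = {
    "disable_rdma": ["mmchconfig", "verbsRdma=disable"],
    "traditional_checksum": ["mmchconfig", "nsdCksumTraditional=yes"],
}


def apply_workaround(name, dry_run=True):
    """Return the command line for a workaround; run it only if dry_run is False."""
    cmd = list(WORKAROUNDS[name])
    if not dry_run:
        # Requires a node with the Spectrum Scale CLI and suitable privileges.
        subprocess.run(cmd, check=True)
    return " ".join(cmd)


if __name__ == "__main__":
    # Dry run: only print what would be executed.
    for key in WORKAROUNDS:
        print(apply_workaround(key))
```

The dry-run default is deliberate: both settings change cluster-wide behavior, so printing the command for review before running it is the safer starting point.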
Temporary fix
Comments
APAR Information
APAR number
IJ40280
Reported component name
SPEC SCALE DME
Reported component ID
5737F34AP
Reported release
512
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2022-05-25
Closed date
2022-07-20
Last modified date
2022-09-08
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE DME
Fixed component ID
5737F34AP
Applicable component levels
[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"STXKQY"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"512","Line of Business":{"code":"LOB26","label":"Storage"}}]
Document Information
Modified date:
08 September 2022