APAR status
Closed as program error.
Error description
A race condition can occur when multiple threads perform a full-track read of the same track while disk errors exist. When the configuration parameter nsdRAIDClientOnlyChecksum is enabled, this race can allow data read from disks to be used, without first passing checksum validation, to reconstruct data that could not be read because of disk errors.
Local fix
Disable client only checksum by running "mmchconfig nsdRAIDClientOnlyChecksum=no -i -N <server nodes or nodeclass>"
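The workaround command can be assembled for a specific set of servers before it is run. The following is a minimal sketch only, assuming the targets are passed to -N as a comma-separated list; the function name build_disable_command and the node names gnr_server1 and gnr_server2 are hypothetical and do not come from this APAR. The sketch prints the command and does not execute it.

```python
import shlex


def build_disable_command(targets):
    """Build the mmchconfig argv that turns off client-only checksum.

    targets: server node names or a node class name (hypothetical values,
    supplied by the caller). They are joined with commas for the -N option.
    """
    if not targets:
        raise ValueError("at least one server node or node class is required")
    return [
        "mmchconfig",
        "nsdRAIDClientOnlyChecksum=no",
        "-i",  # same flags as the command quoted in this APAR
        "-N",
        ",".join(targets),
    ]


if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(shlex.join(build_disable_command(["gnr_server1", "gnr_server2"])))
```

Printing the command first allows the node list to be reviewed before the configuration change is applied.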
Problem summary
A race condition can occur when multiple threads perform a full-track read of the same track while disk errors exist. When the configuration parameter nsdRAIDClientOnlyChecksum is enabled, this race can allow data read from disks to be used, without first passing checksum validation, to reconstruct data that could not be read because of disk errors.
Problem conclusion
Benefits of the solution: The code was fixed to prevent this race condition.
Work Around: Disable client only checksum by running "mmchconfig nsdRAIDClientOnlyChecksum=no -i -N <server nodes or nodeclass>"
Problem trigger: The race condition combined with specific data buffer corruption by disk drives.
Symptom: The potential final outcome is silent data corruption. Intermediate signs such as "Error validating buffer checksum in vdisk RG001LG002VS004 vtrack 7611..." may appear, but these messages are not necessarily a sign of silent data corruption.
Platforms affected: Linux Only
Functional Area affected: GNR
Customer Impact: High Importance
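The intermediate message quoted in the symptom can be searched for in collected log lines. The following is a minimal sketch, assuming only the message prefix shown in this APAR; the remainder of the message is truncated in the source and is not matched. The names CHECKSUM_ERROR and find_checksum_errors are hypothetical, and the caller supplies the log lines. A match indicates only that the message was logged, not that silent data corruption occurred.

```python
import re

# Matches only the prefix quoted in this APAR; the rest of the message
# is truncated in the source and is deliberately left unmatched.
CHECKSUM_ERROR = re.compile(
    r"Error validating buffer checksum in vdisk (?P<vdisk>\S+) "
    r"vtrack (?P<vtrack>\d+)"
)


def find_checksum_errors(lines):
    """Return (vdisk name, vtrack number) pairs for matching log lines."""
    hits = []
    for line in lines:
        match = CHECKSUM_ERROR.search(line)
        if match:
            hits.append((match.group("vdisk"), int(match.group("vtrack"))))
    return hits
```

Grouping the results by vdisk and vtrack can show whether the messages cluster on tracks that also had disk errors.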
Temporary fix
Comments
APAR Information
APAR number
IJ53784
Reported component name
SPEC SCALE STD
Reported component ID
5737F33AP
Reported release
522
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2025-03-03
Closed date
2025-03-03
Last modified date
2025-03-03
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE STD
Fixed component ID
5737F33AP
Applicable component levels
Document Information
Modified date:
04 March 2025