IBM Support

Potential silent corruption of data that impacts Erasure Code Edition and IBM Storage Scale Systems

Notification


Risk classification

HIPER (High Impact and/or Pervasive)

Risk categories

Data Loss

Abstract

IBM has identified an issue in IBM Storage Scale 5.1.7.0 - 5.1.9.8 (IBM Storage Scale System 6.1.8.0 - 6.1.9.5) and IBM Storage Scale 5.2.0.0 - 5.2.2.0 (IBM Storage Scale System 6.2.0.0 - 6.2.2.0) that impacts IBM Storage Scale Erasure Code Edition (IBM Storage Scale ECE) and IBM Storage Scale System. The issue is a race condition in which multiple threads perform a full-track read operation on the same track while disk errors exist. When the configuration parameter nsdRAIDClientOnlyChecksum is enabled, this race condition can allow data read from disks to be used, without checksum validation, to reconstruct data that could not be read because of disk errors.

Description

The race condition only occurs when all the following conditions are true: 

- The system is running IBM Storage Scale ECE or IBM Storage Scale System.

- The system is running one of these code levels: IBM Storage Scale 5.1.7.0 through 5.1.9.8 (IBM Storage Scale System 6.1.8.0 through 6.1.9.5) or IBM Storage Scale 5.2.0.0 through 5.2.2.0 (IBM Storage Scale System 6.2.0.0 through 6.2.2.0).

- nsdRAIDClientOnlyChecksum is enabled.
Note: In recent IBM Storage Scale System configurations, the default is to have it enabled. 

- Multiple threads are simultaneously performing full-track read operations to the same track.

- Disk errors or buffer trailer validation errors are affecting the reading of the corresponding strip data.

Under these conditions, data read from other strips might not be validated against the buffer checksum. Therefore, if that data contains silent corruption, the corruption could be propagated to other areas of the same vtrack when the unreadable data is reconstructed.

Note: Buffer trailer checksums are always checked during GNR buffer validation and are not subject to this condition. This applies to corruption that affects GNR buffer trailer or descriptor information (such as the track ID).

Users Affected: 
This issue may affect clients that run IBM Storage Scale ECE or IBM Storage Scale System with the environment configuration parameter nsdRAIDClientOnlyChecksum enabled on the following versions of IBM Storage Scale:

  • IBM Storage Scale 5.1.7.0 through 5.1.9.8 (IBM Storage Scale System 6.1.8.0 through 6.1.9.5)
  • IBM Storage Scale 5.2.0.0 through 5.2.2.0 (IBM Storage Scale System 6.2.0.0 through 6.2.2.0)
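The version ranges above can be checked programmatically. The following Python sketch is only an illustration: the function names and range tables are hypothetical helpers, not part of any IBM tooling, and it assumes that a four-part dotted version string has already been obtained by other means. A match on version alone does not mean a system is exposed, because the other conditions in this notification must also hold.

```python
# Affected ranges copied from this notification (bounds are inclusive).
AFFECTED_SCALE = [
    ((5, 1, 7, 0), (5, 1, 9, 8)),
    ((5, 2, 0, 0), (5, 2, 2, 0)),
]
AFFECTED_SCALE_SYSTEM = [
    ((6, 1, 8, 0), (6, 1, 9, 5)),
    ((6, 2, 0, 0), (6, 2, 2, 0)),
]


def parse_version(text):
    """Turn a string such as '5.1.9.8' into the tuple (5, 1, 9, 8)."""
    parts = tuple(int(p) for p in text.strip().split("."))
    if len(parts) != 4:
        raise ValueError(f"expected a four-part version, got {text!r}")
    return parts


def in_affected_range(version, ranges):
    """Return True if the version falls inside any inclusive range."""
    v = parse_version(version)
    return any(low <= v <= high for low, high in ranges)


if __name__ == "__main__":
    for candidate in ("5.1.9.8", "5.1.9.9", "5.2.1.0"):
        print(candidate, in_affected_range(candidate, AFFECTED_SCALE))
```

Tuples compare element by element, so a level such as 5.1.10.0 is correctly treated as later than 5.1.9.8.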

Problem determination: 
Because this is a race condition, it is difficult to determine definitively whether the problem has occurred. However, the race condition can occur only if entries like the following are present in the GPFS log (/var/adm/ras):

2025-02-10_20:17:09.569-0500: [E] Error validating buffer checksum in vdisk RG001LG002VS004 vtrack 7611 data segment 12 pdisk e1s17 psector 6135481200 vsector 15587503. 
2025-02-10_20:17:09.570-0500: [E] Error validating buffer checksum in vdisk RG001LG002VS004 vtrack 7611 data segment 16 pdisk e1s17 psector 6135481264 vsector 15587567. 
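Searching the log for such entries can be scripted. The Python sketch below is a minimal example, assuming that the entries follow exactly the format of the two sample lines above; the pattern and function name are hypothetical, and entries in any other format are ignored. It groups matches by vdisk and vtrack. Finding entries shows only that a precondition for the race was present, not that corruption occurred.

```python
import re
from collections import defaultdict

# Pattern derived only from the two sample entries in this notification.
CHECKSUM_ERROR = re.compile(
    r"^(?P<timestamp>\S+): \[E\] Error validating buffer checksum in "
    r"vdisk (?P<vdisk>\S+) vtrack (?P<vtrack>\d+) "
    r"data segment (?P<segment>\d+) pdisk (?P<pdisk>\S+) "
    r"psector (?P<psector>\d+) vsector (?P<vsector>\d+)\."
)


def find_checksum_errors(lines):
    """Group matching log entries by (vdisk, vtrack)."""
    grouped = defaultdict(list)
    for line in lines:
        match = CHECKSUM_ERROR.match(line.strip())
        if match:
            key = (match["vdisk"], int(match["vtrack"]))
            grouped[key].append(match.groupdict())
    return dict(grouped)


if __name__ == "__main__":
    import sys

    # Usage: python script.py <path to a log file under /var/adm/ras>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for (vdisk, vtrack), entries in find_checksum_errors(handle).items():
            print(f"{vdisk} vtrack {vtrack}: {len(entries)} entries")
```

The log file path is supplied by the caller because this notification names only the directory, not a specific file.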

Reference ID

Internal reference: D.336303

Date first published

17 March 2025


Document Information

Modified date:
30 May 2025

UID

ibm17184104