IBM Support

Potential undetected data loss in a small number of specific RAID array configurations

Flashes (Alerts)


Abstract

An issue in the RAID software can cause undetected data loss when a drive fails in a small number of Distributed RAID configurations.

Only one exposed configuration has been observed in field data: a distributed RAID 6 array with 70 drives, 3 rebuild areas, 2.2 TiB drives and a 256 KiB strip size.

This issue was introduced in v8.3.1.0 in February 2020, and has been seen 3 times to date. Systems running v8.3.0 or earlier are not exposed to this issue.

Content

The data corruption is caused when certain drives fail and the rebuild starts. Even in exposed RAID configurations, only the failure of specific drives in the array triggers the data corruption.
If a single drive fails in a distributed RAID 6 array and the system hits APAR HU02418, the system will start logging 1691 errors after the rebuild completes. Because of the design of RAID, the issue cannot be detected for single drive failures in RAID 5 or double drive failures in RAID 6: in both cases no redundancy remains against which the rebuilt data can be checked.
IBM's analysis indicates that there are eight exposed configurations that the GUI could, in theory, configure automatically. A further 1,000 exposed configurations (out of over 680,000 possible configurations) can be created but are not among the configurations the GUI recommends.
Due to the number of potential configurations, IBM added a check to the software upgrade test utility to validate whether the system has one of the exposed configurations. Customers are advised to use the "test only" feature of the GUI to run the upgrade test utility to validate whether their systems are exposed.
  • Systems with an affected configuration running v8.2.1 or earlier are prevented from upgrading to a release without the fix.
  • Systems with an affected configuration running an affected code level will see an error message indicating that the system is exposed to APAR HU02418 and linking back to this document. This error message will not prevent upgrades if the system is already running an exposed code level.
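As an illustration only, the one field-observed configuration named in the abstract can be written as a simple comparison. This is a minimal sketch, not the logic of the upgrade test utility: the `ArrayConfig` type, its field names, and the capacity tolerance are hypothetical, and it covers just that single configuration, not the other exposed configurations IBM's analysis identified. The upgrade test utility remains the way to validate a real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayConfig:
    """Hypothetical description of an array; field names are illustrative."""
    raid_level: str            # e.g. "raid6"
    distributed: bool          # True for a Distributed RAID array
    drive_count: int
    rebuild_areas: int
    drive_capacity_tib: float
    strip_size_kib: int


# The single configuration this flash reports as observed in field data.
FIELD_OBSERVED = ArrayConfig("raid6", True, 70, 3, 2.2, 256)


def matches_field_observed(cfg: ArrayConfig,
                           capacity_tolerance_tib: float = 0.05) -> bool:
    """Return True if cfg matches the one field-observed exposed configuration.

    The tolerance on drive capacity is an assumption made for this sketch,
    since the flash quotes the capacity as a rounded value.
    """
    return (
        cfg.distributed
        and cfg.raid_level.lower() == FIELD_OBSERVED.raid_level
        and cfg.drive_count == FIELD_OBSERVED.drive_count
        and cfg.rebuild_areas == FIELD_OBSERVED.rebuild_areas
        and cfg.strip_size_kib == FIELD_OBSERVED.strip_size_kib
        and abs(cfg.drive_capacity_tib - FIELD_OBSERVED.drive_capacity_tib)
        <= capacity_tolerance_tib
    )
```

A result of False from this sketch does not mean a system is safe, because the other exposed configurations are not enumerated in this document.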
This issue was introduced in v8.3.1.0 and has been resolved under APAR HU02418 in v8.3.1.6, v8.4.0.5 and v8.4.2.1.
If your system configuration is exposed to this issue and you are running one of the affected software levels, contact IBM support to receive an ifix containing the fix.
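The code levels above can be turned into a rough check of whether a given level falls in the exposed range. The following is a minimal sketch under stated assumptions, not IBM's logic: the function names are hypothetical, the fix levels are read as applying per release stream (the first three components of the version), streams newer than the last listed one are assumed to contain the fix, and any unlisted stream between v8.3.1 and v8.4.2 is treated as exposed.

```python
# Code levels taken from this flash.
INTRODUCED = (8, 3, 1, 0)
# First fixed level per release stream: stream -> fourth version component.
FIXED_IN = {(8, 3, 1): 6, (8, 4, 0): 5, (8, 4, 2): 1}
LAST_LISTED_STREAM = max(FIXED_IN)


def parse_level(level: str) -> tuple[int, int, int, int]:
    """Parse a level such as 'v8.3.1.6' into a 4-tuple, padding with zeros."""
    parts = [int(p) for p in level.lstrip("v").split(".")]
    parts += [0] * (4 - len(parts))
    return tuple(parts[:4])


def is_exposed_code_level(level: str) -> bool:
    """Return True if the level is in the exposed range under the assumptions above."""
    v = parse_level(level)
    if v < INTRODUCED:
        return False              # issue not yet introduced
    stream, fix = v[:3], v[3]
    if stream in FIXED_IN:
        return fix < FIXED_IN[stream]
    # Assumption: streams after the last listed one include the fix;
    # unlisted streams before it are conservatively treated as exposed.
    return stream < LAST_LISTED_STREAM
```

For example, `is_exposed_code_level("v8.4.0.4")` returns True and `is_exposed_code_level("v8.4.0.5")` returns False. A code level in the exposed range matters only if the array configuration is also exposed.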


Document Information

Modified date:
28 March 2023

UID

ibm16491501