IBM Support

HIPER: Potential Undetected Data Corruption Following Tier 3 Recovery or Abrupt Loss of Node From Cluster

Flashes (Alerts)


Abstract

Undetected data corruption may occur if host I/O takes place during RAID parity resynchronization. Resynchronization can be triggered by a Tier 3 recovery procedure or by the abrupt loss of a node from the cluster while a RAID array is degraded.

Content

RAID parity resynchronization occurs following a Tier 3 Recovery procedure or when a node abruptly leaves the cluster (e.g. due to forced removal or failure) while a RAID array is degraded.
If I/O to the array occurs during this resynchronization, I/O processing can conflict with parity resynchronization, causing incorrect parity values to be written to the array.
If this occurs, the standard automatic RAID background scrub will detect the inconsistency and record error code 1691 in the Event Log, indicating an inconsistency between RAID data and parity. If an event then requires array data to be reconstructed from parity (e.g. a drive failure), the reconstructed data may be corrupt, and the system will not detect this corruption.
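The effect of incorrect parity can be illustrated with a simplified model. The sketch below assumes single-parity (RAID 5 style) XOR parity over three data strips and a hypothetical ordering in which a host write lands between the resynchronization read and the parity write. It is not IBM's implementation; the function names and strip contents are invented for illustration.

```python
from functools import reduce


def xor_parity(strips):
    """Return the byte-wise XOR of equal-length strips (single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def reconstruct(surviving_strips, parity):
    """Rebuild the one missing strip from the surviving strips plus parity."""
    return xor_parity(surviving_strips + [parity])


# Three data strips in one stripe (illustrative contents).
data = [b"AAAA", b"BBBB", b"CCCC"]

# Hypothetical race: resynchronization reads the old strip 0, a host write
# then replaces strip 0, and resynchronization finally writes parity that was
# computed from the old contents. The array holds new data with stale parity.
old_strip0 = data[0]
data[0] = b"ZZZZ"  # host write lands during resynchronization
stale_parity = xor_parity([old_strip0, data[1], data[2]])

# A scrub recomputes parity from the data and compares it with what is stored.
scrub_detects_mismatch = stale_parity != xor_parity(data)

# The drive holding strip 1 fails; rebuild it from strips 0 and 2 plus parity.
rebuilt_strip1 = reconstruct([data[0], data[2]], stale_parity)

print("scrub reports mismatch:", scrub_detects_mismatch)
print("rebuilt strip matches original:", rebuilt_strip1 == b"BBBB")
```

In this model the scrub can flag the mismatch because all strips are still readable, but once a strip has to be rebuilt from the stale parity there is nothing left to compare the result against, which is why the corruption goes undetected.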
This issue is resolved by APAR SVAPAR-90438 in v8.5.0.8.
Important Instructions:
  • If a Tier 3 Recovery procedure or forced node removal is required on an exposed system, contact IBM Support for guidance before restarting any I/O activity, including replication.
  • If 1691 errors have been logged, do not remove any online drives or upgrade drive firmware.  Contact IBM Support for further guidance.
How can I determine whether I have already experienced this issue?
If either of the following statements is true, it is very unlikely that your system has experienced this issue.
  1. Your system has not experienced any offline pool or offline system events in the past.
  2. Your system has experienced an offline pool or offline system event, but in the 30 days following that event it did not experience multiple drive failures and did not log any 1691 errors.
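The two statements above can be expressed as a small check over event timestamps. This is a minimal sketch under stated assumptions: the timestamps are gathered by hand from the Event Log, "multiple drive failures" is read as two or more, and statement 2 is applied to every offline event. The function name and arguments are hypothetical, and the result is no substitute for guidance from IBM Support.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)  # "30 days following the offline event"


def unlikely_affected(offline_events, drive_failures, errors_1691):
    """Return True if statement 1 or statement 2 holds.

    Each argument is a list of datetime objects (hypothetical inputs,
    collected manually from the Event Log).
    """
    # Statement 1: no offline pool or offline system events at all.
    if not offline_events:
        return True

    # Statement 2: after every offline event, the following 30 days show
    # neither multiple drive failures (assumed: two or more) nor 1691 errors.
    for offline in offline_events:
        end = offline + WINDOW
        failures = [t for t in drive_failures if offline <= t <= end]
        errors = [t for t in errors_1691 if offline <= t <= end]
        if len(failures) >= 2 or errors:
            return False
    return True


# Example: one offline event, with a single 1691 error ten days later.
offline = [datetime(2023, 1, 1)]
print(unlikely_affected(offline, [], [datetime(2023, 1, 11)]))
```

A False result only means that neither statement applies; it does not confirm corruption. In that case the Important Instructions above apply.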

Products: IBM Storwize V7000, IBM FlashSystem 5x00, IBM FlashSystem 7x00, IBM FlashSystem 9x00, IBM Storwize V5000 (Platform Independent, All Versions)

Document Information

Modified date:
07 June 2023

UID

ibm16999657