IBM Support

Potential Undetected Data Corruption or Data Loss After Successive AC Power Outages

Flashes (Alerts)


Abstract

In rare cases, systems using FlashCore Modules (FCMs) can experience undetected data corruption or data loss if there is an AC power disturbance while the system is still performing I/O operations.

Content

Description
IBM has identified a rare combination of events that can lead to undetected data corruption or data loss when a system loses AC power or the user performs successive graceful shutdowns.
If this rare combination of events occurs, the drive may mark itself as uninitialized. The drive is then formatted without the RAID software being aware that the drive's data has been lost, so incorrect data can be returned to hosts. The system may also report NVMe drive ports offline due to errors.

Systems on FCM firmware 1_2_7 and below are vulnerable to this issue.
Systems Affected:
9846-AF7 FlashSystem 9100     9848-AF7 FlashSystem 9100
9846-AF8 FlashSystem 9150     9848-AF8 FlashSystem 9150
9846-UF7 FlashSystem 9100     9848-UF7 FlashSystem 9100
9846-UF8 FlashSystem 9150     9848-UF8 FlashSystem 9150
2077-424 V5100                2078-424 V5100
2077-AF4 V5100                2078-AF4 V5100
2076-724 V7000 Gen3
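
The exposure criteria above can be expressed as a short check. This is a minimal sketch, not an IBM tool: the function names are hypothetical, the machine type-model values are copied from the list above, and it assumes FCM firmware levels are underscore-separated integers (such as 1_2_7) that can be compared numerically.

```python
from typing import Tuple

# Machine type-models copied from the "Systems Affected" list above.
AFFECTED_MODELS = {
    "9846-AF7", "9848-AF7", "9846-AF8", "9848-AF8",
    "9846-UF7", "9848-UF7", "9846-UF8", "9848-UF8",
    "2077-424", "2078-424", "2077-AF4", "2078-AF4",
    "2076-724",
}

# Highest FCM firmware level stated to be vulnerable.
LAST_VULNERABLE_FCM = (1, 2, 7)


def parse_fcm_level(level: str) -> Tuple[int, ...]:
    """Turn a firmware string such as '1_2_7' into (1, 2, 7).

    Assumption: levels are underscore-separated integers.
    """
    return tuple(int(part) for part in level.strip().split("_"))


def is_exposed(machine_type_model: str, fcm_level: str) -> bool:
    """True if the model is listed and the FCM firmware is 1_2_7 or below."""
    return (
        machine_type_model.strip().upper() in AFFECTED_MODELS
        and parse_fcm_level(fcm_level) <= LAST_VULNERABLE_FCM
    )
```

For example, `is_exposed("9846-AF7", "1_2_7")` evaluates to true, while the same model on 1_2_9 does not.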
Mitigation
  • If you experience an AC power loss, please contact IBM Support for assistance to determine the state of your system.
  • For test environments, it is recommended to wait ten (10) seconds after replugging or reseating an FCM drive before unplugging it again, to avoid triggering this issue.

A fix for this issue is included in Drive Microcode Packages dated 191110 or later.
The latest Drive Microcode Package, along with the improvements listed in the Release Notes, can be downloaded from IBM Fix Central.

Determining the FCM firmware level:

  • Using the GUI
    • Open Pools -> Internal Storage.
    • Right-click one of the table headers (e.g. Use) and check the Firmware Level box.
    • The firmware level is now added to the table.
       
  • Alternatively, download, install and run the latest software upgrade test utility using the "Test Only" button in Settings -> Update System. The utility lists any drives with downlevel firmware.

Upgrading the FCM Firmware

  • The following URL describes how to upgrade FCM drives:
     https://www.ibm.com/support/knowledgecenter/en/STSLR9_8.2.1/com.ibm.fs9100_821.doc/fs9100_upgrade_nvme_firmware.html
    Note: The above procedure applies to all three systems: FlashSystem 9100, Storwize V5100 and Storwize V7000.
     

  • If the system is running V8.2.1.3 or earlier, and contains FCMs on 1_2_7 or below:
    • Ensure you are using Upgrade Test Utility V29.23 or later
    • Run the Upgrade Test Utility
    • Upgrade the system firmware to 8.2.1.6 or a higher 8.2.1 PTF (NOT 8.3.0)
    • Upgrade the FCMs to 1_2_9 following the guidance at the URL above.
       
  • If the system is running V8.2.1.4 or later, and contains FCMs on firmware 1_2_7 or below:
    • Upgrade the FCMs to 1_2_9 following the guidance at the URL above.
  • Each FCM upgrade will take around 5 minutes, plus additional time between each drive to allow the system to stabilize.
     
  • The firmware update is non-disruptive and can be completed concurrently with host I/O. As with all drive firmware upgrades, it is advisable to install at a time of lower I/O workload, so that host I/O performance is not affected while the drives upgrade.

  • The GUI cannot be used to upgrade multiple FCM drives at the same time; use the CLI instead. The CLI procedure for upgrading firmware on multiple FCMs differs from the one used for SAS drives and is documented in the following Knowledge Center page:
    https://www.ibm.com/support/knowledgecenter/en/STSLR9_8.2.0/com.ibm.fs9100_820.doc/fs9100_utilitydriveupgrade.html
    Note: The above procedure applies to all three systems: FlashSystem 9100, Storwize V5100 and Storwize V7000.
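
The two upgrade paths above can be summarized as a small decision helper. This is an illustrative sketch only: the function names are hypothetical, version strings are assumed to be dot-separated (system) or underscore-separated (FCM) integers, and the time estimate covers only the roughly 5 minutes per drive quoted above, not the unspecified stabilization time between drives.

```python
from typing import List, Tuple


def parse_version(text: str, sep: str) -> Tuple[int, ...]:
    """Parse '8.2.1.3', 'V8.2.1.3' or '1_2_7' into a tuple of integers."""
    return tuple(int(p) for p in text.strip().lstrip("Vv").split(sep))


def plan_fcm_upgrade(system_level: str, fcm_level: str) -> List[str]:
    """Return the ordered steps from this flash for the given levels.

    An empty list means the FCM firmware is above 1_2_7, so this flash
    prescribes no action.
    """
    if parse_version(fcm_level, "_") > (1, 2, 7):
        return []
    steps: List[str] = []
    if parse_version(system_level, ".") <= (8, 2, 1, 3):
        # V8.2.1.3 or earlier: the system must be upgraded first.
        steps += [
            "Ensure Upgrade Test Utility V29.23 or later is in use",
            "Run the Upgrade Test Utility",
            "Upgrade the system to 8.2.1.6 or a higher 8.2.1 PTF (not 8.3.0)",
        ]
    # V8.2.1.4 or later goes straight to the drive upgrade.
    steps.append("Upgrade the FCMs to 1_2_9")
    return steps


def drive_upgrade_minutes(drive_count: int) -> int:
    """Rough estimate at about 5 minutes per FCM, excluding settle time."""
    return drive_count * 5
```

Calling `plan_fcm_upgrade("8.2.1.3", "1_2_7")` returns all four steps, whereas a system already on V8.2.1.4 or later gets only the FCM upgrade step.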

Note: During an NVMe drive upgrade on a system with multiple I/O groups running code level 8.3.0.1 or earlier, or 8.2.1.10 or earlier, there is a small chance that one or more of the drives may fail to rejoin the array after the upgrade has completed.
If the lsarraymemberprogress command shows a resync progress of 0% for more than 10 minutes, contact IBM Support and mention APAR HU02114.
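
The 10-minute rule above can be applied to a series of progress readings. The sketch below is hypothetical and does not run or parse lsarraymemberprogress, because its output format is not described here; it assumes the caller has already collected (timestamp in seconds, resync percent) pairs for one array member in chronological order.

```python
from typing import Iterable, Optional, Tuple

# Threshold taken from the note above: more than 10 minutes at 0%.
STALL_LIMIT_SECONDS = 10 * 60


def resync_stalled(samples: Iterable[Tuple[float, int]]) -> bool:
    """Return True if progress stayed at 0% for more than 10 minutes.

    samples: (timestamp_seconds, progress_percent) pairs, oldest first.
    """
    zero_since: Optional[float] = None
    for timestamp, progress in samples:
        if progress == 0:
            if zero_since is None:
                zero_since = timestamp  # start of the current 0% run
            if timestamp - zero_since > STALL_LIMIT_SECONDS:
                return True
        else:
            zero_since = None  # progress moved, so reset the timer
    return False
```

A true result corresponds to the condition under which the note advises contacting IBM Support and mentioning APAR HU02114.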

Document Information

Modified date:
28 March 2023

UID

ibm11075965