IBM Support

Potential Incorrect Data Issue on Systems Running V7.7.1.7 or V7.8.1.3

Flashes (Alerts)


Abstract

Systems running V7.7.1.7 or V7.8.1.3 are exposed to a timing window issue that may result in incorrect data being written to volumes.

Content


An issue in V7.7.1.7 and V7.8.1.3 may cause incorrect data to be written to volumes. No other PTFs or major releases are affected.

Actions Required

Clients with systems running V7.7.1.7 or V7.8.1.3 must take the following actions as soon as possible:

  1. Use the detection tool to confirm whether the system is currently affected by this issue.
  2. Upgrade the system to V7.7.1.8 or V7.8.1.4 (even if the system is not currently affected).
  3. If an immediate upgrade is not possible, take the steps detailed in the Technote to avoid further issues until an upgrade can be completed.

Refer to the following Technote for instructions on how to download and use this tool:
http://www.ibm.com/support/docview.wss?uid=ssg1S1010888
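
The version check behind the actions above can be sketched as follows. This is a minimal illustration, not part of the detection tool: the function names are hypothetical, the code level is assumed to be already available as a plain string such as "V7.8.1.3", and pairing each affected level with the fixed level in the same release stream is an assumption drawn from the version numbers in this flash.

```python
from typing import Optional

# Affected and fixed levels are taken from this flash. Pairing each affected
# level with the fix in the same release stream is an assumption.
FIXED_LEVEL = {
    "7.7.1.7": "7.7.1.8",
    "7.8.1.3": "7.8.1.4",
}


def normalize(level: str) -> str:
    """Strip whitespace and an optional leading 'V' or 'v' from a level string."""
    level = level.strip()
    return level[1:] if level[:1] in ("V", "v") else level


def required_upgrade(level: str) -> Optional[str]:
    """Return the fixed level to upgrade to, or None if the level is not affected."""
    return FIXED_LEVEL.get(normalize(level))
```

For example, required_upgrade("V7.8.1.3") returns "7.8.1.4", while required_upgrade("7.8.1.4") returns None because that level already contains the fix.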


Symptoms

The following symptoms may be experienced:

• Areas of volumes which have received writes containing all-zero data may contain non-zero data.
• Reading unallocated regions of thin-provisioned or compressed volumes may return non-zero data.
• A node warmstart may occur if an affected node writes quorum data and a subsequent read from the quorum disk contains unexpected non-zero data.
• RAID arrays may log 1691 errors indicating that the parity does not match the data on disk.
• There are a number of known host or application faults that may occur; these are detailed below.
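
The first two symptoms amount to finding non-zero bytes in a region that is expected to read back as all zeros. A minimal sketch of that comparison is shown here for illustration only. It operates on a buffer that has already been read from the volume, the function name is hypothetical, and it is not a substitute for the detection tool referenced under Actions Required.

```python
from typing import Optional


def first_nonzero_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first non-zero byte, or None if all bytes are zero."""
    remainder = data.lstrip(b"\x00")  # drop the leading run of zero bytes
    if not remainder:
        return None
    return len(data) - len(remainder)
```

A buffer of 4096 zero bytes yields None. If byte 100 of the same buffer were set to a non-zero value, the function would return 100.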


Trigger

This issue is triggered only if the following conditions all occur at the same time:

1. A host has issued a write of all-zeroes that is still in progress to any volume.
2. Two non-zero writes that are not aligned on a 4KB boundary are received in the same 32KB region of the same volume.
3. An I/O quiesce (pause) occurs for the volume being written to. More details of what can trigger a quiesce can be found below.


When all three conditions occur together, they expose a timing window in which the non-zero write data can be copied into a special area of memory that is supposed to contain zeroes, leading to the symptoms listed above. Within the same cluster, some nodes may be affected while others remain unaffected.
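
The arithmetic in condition 2 can be illustrated with a short sketch. Several details here are assumptions rather than statements from this flash: 32KB regions are taken to be fixed, 32KB-aligned windows of the volume address space; a write is treated as unaligned if its start or its end falls off a 4KB boundary; both writes must be unaligned; and writes are given as hypothetical (offset, length) tuples in bytes.

```python
REGION_SIZE = 32 * 1024  # 32KB region, from the trigger description
ALIGNMENT = 4 * 1024     # 4KB boundary, from the trigger description


def is_unaligned(offset: int, length: int) -> bool:
    """Assumption: unaligned means the start or the end is off a 4KB boundary."""
    return offset % ALIGNMENT != 0 or (offset + length) % ALIGNMENT != 0


def regions_touched(offset: int, length: int) -> range:
    """Indices of the 32KB regions covered by a write of at least one byte."""
    first = offset // REGION_SIZE
    last = (offset + length - 1) // REGION_SIZE
    return range(first, last + 1)


def matches_condition_2(write_a, write_b) -> bool:
    """True if both writes are unaligned and share at least one 32KB region.

    Each write is an (offset, length) tuple in bytes on the same volume.
    """
    if not (is_unaligned(*write_a) and is_unaligned(*write_b)):
        return False
    shared = set(regions_touched(*write_a)) & set(regions_touched(*write_b))
    return bool(shared)
```

Under these assumptions, two 512-byte writes at offsets 512 and 8704 match condition 2, because both are unaligned and both fall in the first 32KB region. The same writes at offsets 512 and 33280 do not match, because the second lands in the next region. Conditions 1 and 3 concern timing and cannot be evaluated from offsets alone.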


Workaround

There is no workaround available to prevent exposure to this issue. Systems running V7.7.1.7 or V7.8.1.3 should be upgraded to a fixed level as soon as possible.



Events that can trigger the quiesce needed to cause the issue

The following events can cause an I/O quiesce. This list is not exhaustive, but covers the most common scenarios.
• Starting or stopping a FlashCopy
• Using Global Mirror with Change Volumes
• Using Metro Mirror or Global Mirror consistency protection
• Adding or removing a volume mirror
• Adding or removing HyperSwap volume copies
• Enabling or disabling cache for a volume
• Moving a volume to another I/O group
• Changing the preferred node of a volume
• A volume going offline because it is out of space
• A volume entering the "empty" state because no writes have occurred for over 80 minutes
• A volume entering the "not_empty" state because writes have occurred to a volume that was in the "empty" state


Host or application failure scenarios

The following errors are known to be observable on a host if this issue has occurred. This list is not exhaustive.

• Windows may fail to write a partition table to a new volume, and the disk will appear offline in Disk Manager. Note that this failure occurs before the filesystem is created.
• VMware may fail when attempting to add a volume to a datastore.
• Database expansions may fail with errors.
• Formatting a volume with zero data from the host may cause non-zero data to be written to the disk instead of zeros.
• Filesystem checking tools such as fsck may report errors.


Fix

APAR HU01706 in the V7.7.1.8 and V7.8.1.4 PTF releases prevents any further instances of incorrect data being written. However, the fix does not identify or correct any data that has previously been written incorrectly due to this issue.


Document Information

Modified date:
28 March 2023

UID

ssg1S1010879