IBM Support

H211769 : Specific disk replacement procedure must be followed to prevent potential data loss in a dynamic disk pool

Troubleshooting


Problem

RETAIN tip: H211769

Symptom

When an optimal disk (that is, one that has not been marked failed) in a Dynamic Disk Pool is pulled and replaced with no host I/O occurring, the controller firmware will not start a reconstruction of data to the new disk.

When host I/O resumes to a Thin Provisioned Volume (TPV), an internal failure can occur that causes the TPV to fail, with subsequent loss of data. There is also a very small possibility that the TPV stays optimal even though data has already been lost.

When host I/O resumes to a standard Redundant Array of Independent Disks (RAID) volume, data is lost without any notification.

If host I/O is in progress when the drive is pulled and replaced, then this issue will not occur.
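
The conditions above can be summarized as a small decision function. This is only an illustrative sketch of the symptom description for affected firmware levels; the function name, parameter names, and return strings are hypothetical and are not part of any IBM tool.

```python
# Hypothetical model of the conditions described in this tip.
# All names and return strings are illustrative assumptions.

def replacement_outcome(drive_failed_first, host_io_in_progress, volume_type):
    """Describe the expected outcome of a disk swap on affected firmware.

    volume_type is 'thin' for a Thin Provisioned Volume (TPV) or
    'standard' for a standard RAID volume.
    """
    if drive_failed_first or host_io_in_progress:
        # The tip describes the problem only for an optimal disk that is
        # pulled and replaced while no host I/O is occurring.
        return "issue not expected"
    if volume_type == "thin":
        return "TPV may fail with data loss; rarely stays optimal after loss"
    if volume_type == "standard":
        return "silent data loss"
    raise ValueError("unknown volume type: " + volume_type)
```

For example, replacement_outcome(False, False, "standard") returns "silent data loss", which is the case with no visible notification.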

Affected configurations

The system may be any of the following IBM servers:

  • IBM System Storage DCS3700 Storage Subsystem, type 1818, any model
  • IBM System Storage DCS3860 Storage Subsystem, type 1813, any model
  • IBM System Storage DS3512, type 1746, any model
  • IBM System Storage DS3524, type 1746, any model

This tip is not software specific.

This tip is not option specific.

The following system firmware level(s) are affected: controller firmware 7.84, 7.86


Solution

The fix for this issue is contained in controller firmware 7.84.53.00 and later releases and 7.86.39.00 and later releases.

These files are available by selecting the appropriate Product Group, type of System, Product name, Product machine type, and Operating system on IBM Support's Fix Central web page, at the following URL:

 http://www.ibm.com/support/fixcentral/
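
As a rough aid for checking an installed level against the fix levels named above, the comparison can be sketched as follows. This is a minimal sketch, not an IBM utility: the helper names are hypothetical, and it assumes the controller firmware version is written as dot-separated numeric fields (for example 7.84.53.00 or 07.84.53.00). Release lines other than 7.84 and 7.86 are reported as "not listed" rather than guessed, because this tip does not state their status.

```python
# Hypothetical helper, not an IBM tool: classifies a controller firmware
# version string against the levels named in this tip.

# First fixed build on each affected release line (from the Solution text).
FIRST_FIXED = {
    (7, 84): (7, 84, 53, 0),
    (7, 86): (7, 86, 39, 0),
}


def parse_version(text):
    """Turn '07.84.53.00' or '7.84.53.00' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def firmware_status(text):
    """Return 'affected', 'fixed', or 'not listed' for a version string."""
    version = parse_version(text)
    first_fixed = FIRST_FIXED.get(version[:2])
    if first_fixed is None:
        # This tip only names the 7.84 and 7.86 release lines.
        return "not listed"
    return "fixed" if version >= first_fixed else "affected"
```

For example, firmware_status("7.84.46.00") returns "affected", and firmware_status("7.86.39.00") returns "fixed".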

Workaround

To avoid this issue, manually fail the disk by using DS Storage Manager before replacing it. After the disk has been manually failed, its fault Light Emitting Diode (LED) should be lit, indicating that the drive is safe to remove. After pulling the failed disk, wait 60 seconds before inserting the new disk.

Document Information

Modified date:
16 September 2022

UID

ssg1S1004773