Volatile write cache detection

IBM Spectrum Scale Erasure Code Edition now has the ability to test if volatile write caching mode is enabled on the physical disks.

Many SCSI and NVMe drives support a volatile write caching mode in which a drive reports success back from write operations as soon as data has been received into the drive's internal cache memory. IBM Spectrum Scale Erasure Code Edition cannot be used with drives operating in this mode because on power failure, the cached data is lost, causing already committed data to revert to an older version. This can lead to corruption of both the RAID and file system metadata, resulting in data integrity issues. If IBM Spectrum Scale Erasure Code Edition detects a drive with volatile write caching mode enabled, it puts the pdisk into a new volatile write cache enabled (VWCE) state and drains all data from the drive. If IBM Spectrum Scale Erasure Code Edition detects a large number of drives with volatile write caching enabled, it stops service of the recovery group and waits for volatile write caching mode to be disabled on the drives.

The volatile write cache detection feature is enabled for all new IBM Spectrum Scale Erasure Code Edition installations starting from version 5.0.4. On previous installations, the feature is disabled by default and must be manually enabled in order to take advantage of the check.

Check IBM Spectrum Scale Erasure Code Edition cluster configuration for VWCE

IBM Spectrum Scale Erasure Code Edition supports volatile write cache detection starting from version 5.0.4. Clusters that were upgraded from an earlier version must enable the feature manually.

Use the following steps to check the current IBM Spectrum Scale Erasure Code Edition configuration for VWCE detection and, if necessary, to enable it:
  1. Check the current setting with this command:
    # mmdiag --config|grep nsdRAIDDiskCheckVWCE

    A value of 1 for nsdRAIDDiskCheckVWCE means that detection is enabled. A value of 0 means that it is disabled. Before you enable detection, check the volatile write cache state of all physical disks.

  2. After confirming that volatile write cache is disabled on all disks, use this command to enable detection:
    # mmchconfig nsdRAIDDiskCheckVWCE=yes -i
  3. Rediscover the disk state with this command:
    # mmvdisk rg change --recovery-group rg_name --refresh-pdisk-info
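
The check in step 1 can be wrapped in a small helper for scripted use. The following is a minimal sketch, not part of the product: the function name vwce_check_state is hypothetical, and it assumes that the matching line of mmdiag --config output ends with the value of the setting (1 or 0, as described above).

```shell
# Hypothetical helper: reads "mmdiag --config" output on stdin and reports
# whether VWCE detection is enabled.
# Assumption: the last whitespace-separated field on the matching line is
# the value of nsdRAIDDiskCheckVWCE.
vwce_check_state() {
    awk '/nsdRAIDDiskCheckVWCE/ { value = $NF; found = 1 }
         END {
             if (!found)                              print "unknown"
             else if (value == "1" || value == "yes") print "enabled"
             else if (value == "0" || value == "no")  print "disabled"
             else                                     print "unknown"
         }'
}

# Example use on a cluster node:
#   mmdiag --config | vwce_check_state
```

A result of "unknown" means that the setting was not found in the output or had an unexpected value, in which case the command output should be inspected directly.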

Creation of recovery group will fail if volatile write cache mode is enabled on disk

Before you install IBM Spectrum Scale Erasure Code Edition and create a recovery group, run the SpectrumScale_ECE_OS_READINESS tool. The tool detects disks that have volatile write cache enabled and issues warning messages. If disks have volatile write cache mode enabled, creation of the recovery group fails and error messages are written to the /var/adm/ras/mmfs.log.latest file.

Failure of disk replacement

When replacing a failed disk, check the new physical disk and disable volatile write cache mode on it. If volatile write cache mode is not disabled, the replace command fails.

Scale out IBM Spectrum Scale Erasure Code Edition by adding new node

Run the SpectrumScale_ECE_OS_READINESS tool on the new node first, and disable volatile write cache mode on each disk if needed, before adding the node to an IBM Spectrum Scale Erasure Code Edition recovery group.

What to do if volatile write cache is detected

For instructions on how to disable volatile write caching on SCSI and NVMe disks, see Hardware checklist.
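
As a quick preliminary survey before following the Hardware checklist, the cache mode that the Linux kernel reports for each block device can be read from sysfs. The following is a minimal sketch under stated assumptions: the function name list_write_back_devices is hypothetical, and it relies on the queue/write_cache attribute that the Linux block layer exposes, which contains either "write back" or "write through". This attribute reflects the kernel's view and can be overridden by an administrator, so it does not replace the drive-level checks in the Hardware checklist.

```shell
# Hypothetical helper: prints the name of every block device whose
# kernel-reported cache mode is "write back".
# Optional argument: sysfs root (defaults to /sys).
list_write_back_devices() {
    sysfs_root="${1:-/sys}"
    for f in "$sysfs_root"/block/*/queue/write_cache; do
        [ -r "$f" ] || continue              # no match or unreadable file
        mode=$(cat "$f")
        if [ "$mode" = "write back" ]; then
            dev=${f#"$sysfs_root"/block/}    # drop the leading path
            dev=${dev%%/*}                   # drop /queue/write_cache
            printf '%s\n' "$dev"
        fi
    done
}

# Example use on a storage node:
#   list_write_back_devices
```

Any device that is listed is a candidate for the disable procedure in the Hardware checklist. An empty result does not by itself prove that volatile write caching is disabled on the drives.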