Volatile write cache detection
IBM Storage Scale Erasure Code Edition now has the ability to test if volatile write caching mode is enabled on the physical disks.
Many SCSI and NVMe drives support a volatile write caching mode in which a drive reports success back from write operations as soon as data has been received into the drive's internal cache memory. IBM Storage Scale Erasure Code Edition cannot be used with drives operating in this mode because on power failure, the cached data is lost, causing already committed data to revert to an older version. This can lead to corruption of both the RAID and file system metadata, resulting in data integrity issues.
If IBM Storage Scale Erasure Code Edition detects a drive with volatile write caching mode enabled, it puts the pdisk into the volatile write cache enabled (VWCE) state and drains all data from the drive. If it detects many drives with volatile write caching enabled, it stops service of the recovery group and waits for volatile write caching mode to be disabled on the drives.
The volatile write cache detection feature is enabled for all new IBM Storage Scale Erasure Code Edition installations starting from version 5.0.4. On previous installations, the feature is disabled by default and must be manually enabled in order to take advantage of the check.
Version 5.1.6 introduces asynchronous detection of volatile write cache mode. In earlier releases, ECE checked the volatile write cache mode only when it rediscovered disk states, for example during RG initialization or RG master failover, or when the command to refresh pdisk information was run.
Check IBM Storage Scale Erasure Code Edition cluster configuration for VWCE
IBM Storage Scale Erasure Code Edition supports volatile write cache detection from version 5.0.4. Clusters that are upgraded from earlier versions must enable it manually. To check the current setting, run this command:
# mmdiag --config|grep nsdRAIDDiskCheckVWCE
If nsdRAIDDiskCheckVWCE is 1, the check is enabled. If nsdRAIDDiskCheckVWCE is 0, the check is disabled. Check the volatile write cache state of all physical disks before you enable it.
- After making sure that volatile write cache is disabled on all disks, use this command to enable it:
# mmchconfig nsdRAIDDiskCheckVWCE=yes -i
- Rediscover the disk state with this command:
# mmvdisk rg change --recovery-group rg_name --refresh-pdisk-info
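The configuration check above can be wrapped in a small helper for scripting. The following is a minimal sketch and not part of the product: the function name vwce_check_status is hypothetical, and it assumes that each line of mmdiag --config output has the form "name value", optionally preceded by whitespace or a "!" marker.

```shell
#!/bin/sh
# Hypothetical helper: reads `mmdiag --config` output on stdin and reports
# whether the nsdRAIDDiskCheckVWCE check is enabled.
# Assumption: each line looks like "[!] name value".
vwce_check_status() {
    value=$(awk '
        # Drop leading blanks and an optional "!" marker, then match the name.
        { sub(/^[ \t!]+/, "") }
        $1 == "nsdRAIDDiskCheckVWCE" { print $2; exit }
    ')
    case "$value" in
        1|yes) echo "enabled" ;;
        0|no)  echo "disabled" ;;
        *)     echo "unknown" ;;   # parameter not found or unexpected value
    esac
}

# Usage on a cluster node:
#   mmdiag --config | vwce_check_status
```

The helper reads from standard input so that it can be exercised with captured output before it is run against a live cluster.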
Creation of recovery group fails if volatile write cache mode is enabled on disk
Before you install IBM Storage Scale Erasure Code Edition and create a recovery group, run the ece_os_readiness tool. It detects volatile write cache on the disks and gives warning messages. When volatile write cache mode is enabled on disks, creation of the recovery group fails with error messages in the /var/adm/ras/mmfs.log.latest file.
Asynchronous detection of volatile write cache
IBM Storage Scale Erasure Code Edition introduces asynchronous detection of volatile write cache mode in version 5.1.6. The detection interval is controlled by the nsdRAIDDiskSMARTUpdateInterval configuration parameter, whose default value is 24 hours. While a recovery group is running, ECE rediscovers the volatile write cache mode of its disks once per interval (every 24 hours by default). ECE also rediscovers the volatile write cache mode of a pdisk when its state changes from missing to ok. If volatile write caching is detected as enabled, the pdisk is put into the VWCE state and its data is drained.
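To see whether detection has flagged any drives, a pdisk listing can be scanned for the VWCE state. The sketch below is illustrative only: count_vwce_pdisks is a hypothetical helper, and it assumes that the listing (for example, from mmvdisk pdisk list --recovery-group all --not-ok) prints one pdisk per line with a state string that contains VWCE. The exact column layout is not relied on.

```shell
#!/bin/sh
# Hypothetical helper: counts the lines on stdin that mention the VWCE state.
# Assumption: one pdisk per line, with "VWCE" appearing in its state column.
count_vwce_pdisks() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, so the exit status is masked.
    grep -c 'VWCE' || true
}

# Usage on a cluster node:
#   mmvdisk pdisk list --recovery-group all --not-ok | count_vwce_pdisks
```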
Failure of disk replacement
When you replace a failed disk, check the volatile write cache mode of the new physical disk and disable it if it is enabled. If volatile write cache mode is not disabled, the replace command fails.
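Checking a replacement SCSI drive can be scripted. The following sketch assumes that the sdparm utility is installed and that sdparm --get=WCE prints a line of the form "WCE value [...]". The function name scsi_wce_state and the device path /dev/sdX are placeholders. NVMe drives need a different tool and are not covered here. For the supported procedure to disable the cache, see Hardware checklist.

```shell
#!/bin/sh
# Hypothetical helper: reads `sdparm --get=WCE <device>` output on stdin and
# reports the state of the drive's write cache enable (WCE) bit.
# Assumption: the output contains a line such as
#   WCE           1  [cha: y, def:  1, sav:  1]
scsi_wce_state() {
    awk '
        $1 == "WCE" {
            found = 1
            print ($2 == 1 ? "enabled" : "disabled")
            exit
        }
        # Reached at end of input, or right after exit above.
        END { if (!found) print "unknown" }
    '
}

# Usage (the device path is a placeholder):
#   sdparm --get=WCE /dev/sdX | scsi_wce_state
```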
Scale out IBM Storage Scale Erasure Code Edition by adding a new node
Before you add a node to an IBM Storage Scale Erasure Code Edition recovery group, run the ece_os_readiness tool on the new node and disable volatile write cache mode on each disk where it is enabled.
What to do if volatile write cache is detected
For instructions on how to disable volatile write caching on SCSI and NVMe disks, see Hardware checklist.