Cache device failure

If the RPV client encounters I/O errors while trying to access the cache logical volume, then it cannot continue to process write requests.

Any I/O failure on the cache logical volume, even a single one, that prevents the RPV driver from doing its work or from trusting the contents of the cache requires the driver to declare that the cache has failed. The driver must then inform LVM that the cache device has failed. LVM records the failure by setting a flag that is stored with the other mirror pool attributes in the LVM metadata.

The RPV client must also tell LVM to mark all of the physical partitions on the disk as stale. Because all of the remote physical volumes in the same mirror pool share a single cache device, a cache device failure affects the entire mirror pool, so every physical partition in the mirror pool can be marked as stale.
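The failure handling described above can be pictured with a small toy model. This is only an illustrative sketch, not the RPV driver or LVM implementation: the class `MirrorPool`, its methods, the `"fresh"`/`"stale"` labels, and the `CacheFailedError` exception are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


class CacheFailedError(Exception):
    """Raised when a write is attempted after the cache was declared failed."""


@dataclass
class MirrorPool:
    # Hypothetical model: partition state keyed by (rpv_name, partition_number).
    partitions: dict = field(default_factory=dict)
    # Stands in for the flag that LVM stores with the mirror pool attributes.
    cache_failed: bool = False

    def add_rpv(self, name, partition_count):
        """Register a remote physical volume with all partitions up to date."""
        for number in range(partition_count):
            self.partitions[(name, number)] = "fresh"

    def report_cache_io_error(self):
        """A single cache I/O failure fails the cache for the whole pool."""
        self.cache_failed = True
        # The cache is shared, so every partition on every RPV becomes stale.
        for key in self.partitions:
            self.partitions[key] = "stale"

    def write(self, rpv_name, partition_number):
        """Accept a write only while the cache is still trusted."""
        if self.cache_failed:
            raise CacheFailedError("cache device failed; write not processed")
        return "queued"
```

In this model, one call to `report_cache_io_error()` sets the pool-wide flag, marks the partitions of every remote physical volume as stale, and causes later `write()` calls to be refused, which mirrors the pool-wide scope of a cache device failure.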

Recovering from a cache device failure requires you to perform a set of tasks to remove and replace the cache device. You also need to run the syncvg command to bring the stale physical partitions back up to date. In this situation, the syncvg command must perform a full synchronization across the network. To make the cache logical volume highly available and less likely to fail, you can protect it with LVM mirroring (two copies of the cache logical volume) or place it on a disk subsystem that has built-in data mirroring or RAID capabilities.
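To get a rough feel for what a full synchronization across the network can cost, the following sketch estimates the transfer time from the amount of data and the link speed. It is a back-of-the-envelope calculation only: the function name, its parameters, and the default `efficiency` factor of 0.7 are assumptions made for this example, and the result says nothing about how syncvg actually schedules its I/O.

```python
def full_sync_hours(pool_size_gib, link_mbit_per_s, efficiency=0.7):
    """Estimate the hours needed to copy pool_size_gib over the network.

    efficiency is an assumed fraction of the nominal link speed that is
    actually usable for the copy (hypothetical default, not a measured value).
    """
    if pool_size_gib < 0:
        raise ValueError("pool_size_gib must not be negative")
    if link_mbit_per_s <= 0:
        raise ValueError("link_mbit_per_s must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")

    total_bits = pool_size_gib * 1024**3 * 8
    usable_bits_per_second = link_mbit_per_s * 1_000_000 * efficiency
    return total_bits / usable_bits_per_second / 3600
```

For example, `full_sync_hours(2048, 1000)` estimates the time to resynchronize 2 TiB over a nominal 1 Gbit/s link. The estimate scales linearly with the amount of data, which is one reason to keep the cache logical volume highly available.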