Problem description
A behavioral change in vSphere 6.5 and 6.7 can cause VMware to present invalid Changed Block Tracking (CBT) data to backup applications during a backup operation. As a result, backup data can become corrupted without detection. When this issue occurs, the backup copy and all subsequent incremental backup copies are corrupted.
CBT is the VMware mechanism that enables incremental backup operations for VMware virtual machines (VMs). IBM Spectrum Protect Plus uses CBT to implement its incremental backup strategy.
When using IBM Spectrum Protect Plus to back up VMware virtual machines in vSphere 6.5 or 6.7, the invalid CBT data leads to corrupted backup copies, even though the backup is reported as successful. The corruption is generally not detectable until a restore is attempted. The symptoms of backup data corruption can vary, depending on which portions of the backup data are corrupted. These are some possible symptoms of this problem:
- No data for the VM is restored
- The restored VM is not bootable
- The VM is bootable, but operating system behavior is erratic
- Application behavior is erratic
- Data on the restored VM is corrupted
This problem can occur during backup of a given VM when all of these conditions are true:
- The VM is hosted by vSphere 6.5 or 6.7
- CBT was enabled for the VM at the time the VM was hosted by vSphere 6.5 or 6.7 (*)
- A snapshot for the VM existed at the time CBT was enabled
(*) Notes:
- IBM Spectrum Protect Plus enables CBT for a VM during the first backup of that VM if CBT was not already enabled by a vSphere administrator or another application.
- This condition is false if CBT was enabled while the VM was hosted by an earlier vSphere version, and the vSphere environment was subsequently upgraded to vSphere 6.5 or 6.7.
When all of the above conditions are true, the CBT data returned to the backup application is empty or consists of random blocks. There is no indication that the data is invalid, so the backup application uses the invalid CBT data, which leads to corrupted backup data.
vSphere versions prior to 6.5 do not have this issue because, instead of returning invalid CBT information, they return an error condition to the backup application. The backup application then creates a full backup and resets CBT, after which subsequent incremental backups are valid.
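The conditions above, together with the note about upgraded environments, can be read as a simple decision rule. The following Python sketch is illustrative only and is not part of IBM Spectrum Protect Plus or vSphere; the record type, field names, and result strings are hypothetical. It assumes you have recorded, for each VM, which vSphere version hosted it when CBT was enabled and whether a snapshot existed at that time (None when this is not known).

```python
from dataclasses import dataclass
from typing import Optional

# vSphere releases named in this document as having the behavioral change.
AFFECTED_VSPHERE = {"6.5", "6.7"}


@dataclass
class VmCbtHistory:
    """Hypothetical record of what is known about one VM."""
    name: str
    current_vsphere: str                 # version hosting the VM now, e.g. "6.7.0"
    vsphere_when_cbt_enabled: str        # version hosting the VM when CBT was enabled
    snapshot_existed_when_cbt_enabled: Optional[bool]  # None = not known


def major_minor(version: str) -> str:
    """Reduce a version such as '6.7.0' to '6.7'."""
    return ".".join(version.split(".")[:2])


def classify(vm: VmCbtHistory) -> str:
    """Apply the three conditions; an unknown snapshot history counts as a risk."""
    if major_minor(vm.current_vsphere) not in AFFECTED_VSPHERE:
        return "not affected"
    if major_minor(vm.vsphere_when_cbt_enabled) not in AFFECTED_VSPHERE:
        # CBT was enabled under an earlier vSphere version, then upgraded.
        return "not affected"
    if vm.snapshot_existed_when_cbt_enabled is False:
        return "not affected"
    # A snapshot existed, or it cannot be ruled out.
    return "potentially affected"
```

A VM is reported as "not affected" only when at least one condition is known to be false; in every other case the sketch errs on the side of caution.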
Levels affected
This problem affects all IBM Spectrum Protect Plus versions that support vSphere 6.5 and 6.7, as noted here:
- IBM Spectrum Protect Plus versions 10.1.0 - 10.1.2.247 (all versions support vSphere 6.5; vSphere 6.7 support begins with version 10.1.2)
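If you track installed product levels in an inventory, the affected range can be checked mechanically. This is a minimal Python sketch, assuming that versions are written as dotted numbers in the same form used above (for example "10.1.2.247"); the function names are hypothetical, and the range boundaries are taken directly from the list above.

```python
# Boundaries of the affected range, as stated in this document.
FIRST_AFFECTED = (10, 1, 0)
LAST_AFFECTED = (10, 1, 2, 247)


def parse_version(text: str) -> tuple:
    """Turn '10.1.2.247' into (10, 1, 2, 247) for numeric comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def is_affected_level(version: str) -> bool:
    """True if the version falls inside the affected range (inclusive)."""
    return FIRST_AFFECTED <= parse_version(version) <= LAST_AFFECTED
```

Comparing tuples of integers avoids the pitfalls of comparing version strings as text, where "10.1.10" would sort before "10.1.2".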
Immediate action
Summary
For each VM with affected or potentially affected backup copies, remove any existing snapshots, reset CBT, force the next backup to be full, then back up the VM. Full backups usually run much longer than incremental backups, so plan accordingly. After the full backup finishes, incremental backups will resume. The result is a fresh sequence of "incremental forever" backups that are not affected by this problem. See sub-section "Detail" below for further information.
Implement procedural policies that adhere to VMware's recommendation that no snapshots exist on a VM before enabling CBT on that VM.
Detail
1. Assess whether your VMware backup copies might be affected by this problem. See section "How to determine if your VM backup copies are affected" below.
2. Perform a full base backup of all VMware VMs.
3. Choose Method 1 or Method 2, based on your preference.
Method 1
Migrate the VMs from an existing SLA to a new SLA:
a. Ensure that no snapshots exist on your VMs.
c. Create a new SLA in the "Policy Overview" screen.
d. Remove each VM from the existing SLA, then assign it to the new SLA.
Note: New target volumes will be created on the vSnap server, leading to increased space utilization. However, space will be reclaimed as backups from the old SLA expire. When all retentions expire from the old SLA, the old volumes are removed.
Method 2
a. Ensure that no snapshots exist on your VMs.
c. In the Policy Options on the backup screen, enter the list of VMs, separated by semicolons, under "Force full backup of resources".
Note: This will not create new target volumes. The existing VMDKs (Virtual Machine Disks) will be overwritten, which reduces the space requirement.
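For a large number of VMs, the semicolon-separated list for Method 2 can be generated rather than typed by hand. The following Python sketch only builds the text to paste into the option; the function name is hypothetical, and it assumes that VM names are entered exactly as they appear in the inventory, with no spaces around the semicolons.

```python
def force_full_list(vm_names):
    """Join VM names into one semicolon-separated string.

    Blank entries are dropped and duplicates are removed while keeping
    the original order. A name that itself contains a semicolon cannot
    be represented in this format, so it is rejected.
    """
    cleaned = [name.strip() for name in vm_names if name.strip()]
    for name in cleaned:
        if ";" in name:
            raise ValueError(f"VM name contains a semicolon: {name!r}")
    unique = list(dict.fromkeys(cleaned))  # de-duplicate, preserve order
    return ";".join(unique)
```

For example, force_full_list(["web01", "db01", "web01"]) returns the string web01;db01.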
How to determine if your VM backup copies are affected
Unless it is certain that no snapshots existed at the time CBT was enabled, the only way to determine if a given VM backup is affected is to test a restore of the VM. Here is an outline of steps to consider for testing a VM restore:
1. Perform an instant virtualize operation of the last recovery point.
2. Start the virtual machine.
3. Run the disk check utility for the VM's operating system, such as fsck for Linux and chkdsk for Microsoft Windows, on each of the VM's disks.
4. Verify the correct operation of applications installed on the VM.
5. Verify the integrity of other user data on the VM.
Thorough restore testing of all VMs is probably not practical. If you cannot be certain that no snapshots existed at the time CBT was enabled, then consider treating all VM backups as being potentially affected.
Fix
This problem is fixed in these product levels:
- IBM Spectrum Protect Plus version 10.1.2 update 271. Current target availability is Feb 2019.
- IBM Spectrum Protect Plus version 10.1.3. Current target availability is Feb 2019.
After applying the fix, follow the steps in section "Immediate action" above, if those steps have not already been followed.