IBM Support

Backup of VMware VMs can be corrupted, and thus not restorable, if a snapshot exists when CBT is enabled

Flashes (Alerts)


Abstract

A VMware virtual machine backed up by Data Protection for VMware might be unrecoverable if a snapshot of the VM existed at the time Changed Block Tracking was enabled. This problem is documented in APAR IT26261.

Content

Problem description

A behavioral change in vSphere 6.5 and 6.7 can cause VMware to present invalid Changed Block Tracking (CBT) data to backup applications during a backup operation. As a result, backup data can be corrupted without detection. When this issue occurs, the backup copy and all subsequent incremental backup copies are corrupted.
CBT is the VMware mechanism that enables incremental backup operations for VMware virtual machines (VMs). IBM Spectrum Protect for Virtual Environments: Data Protection for VMware and Tivoli Storage Manager for Virtual Environments: Data Protection for VMware use CBT to implement an "incremental forever" backup strategy.
When using IBM Spectrum Protect for Virtual Environments: Data Protection for VMware or Tivoli Storage Manager for Virtual Environments: Data Protection for VMware to back up VMware virtual machines in vSphere 6.5 or 6.7, the invalid CBT data leads to corrupted backup copies, even though the backup is reported as successful. The corruption is generally not detectable until a restore is attempted. The symptoms of backup data corruption can vary, depending on which portions of the backup data are corrupted. These are some possible symptoms of this problem:
  • No data for the VM is restored
  • The restored VM is not bootable
  • The VM is bootable, but operating system behavior is erratic
  • Application behavior is erratic
  • Data on the restored VM is corrupted
This problem can occur during backup of a given VM when all of these conditions are true:
  • The VM is hosted by vSphere 6.5 or 6.7
  • CBT was enabled for the VM at the time the VM was hosted by vSphere 6.5 or 6.7 (*)
  • A snapshot for the VM existed at the time CBT was enabled
(*) Notes:
  • Data Protection for VMware enables CBT for a VM during the first backup of that VM if CBT was not already enabled by a vSphere administrator or another application.
  • This condition is false if CBT was enabled while the VM was hosted by an earlier vSphere version, and the vSphere environment was subsequently upgraded to vSphere 6.5 or 6.7.
When all of the above conditions are true, the CBT data returned to the backup application is empty or comprised of random blocks. There is no indication that the data is invalid, so the invalid CBT data is used by the backup application, leading to corrupted backup data.
vSphere versions prior to 6.5 do not have this issue because, instead of returning invalid CBT information, they return an error condition to the backup application. In response, the backup application creates a full backup and then resets the CBT, after which subsequent incremental backups are valid.
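The three conditions listed above can be expressed as a simple check. The following Python sketch is illustrative only: the class, field, and function names are hypothetical, and the facts about each VM must be supplied by the administrator (nothing here queries vSphere). It assumes version strings contain at least a major and minor number. Following the advice later in this flash, an unknown snapshot history (None) is treated as potentially affected.

```python
from dataclasses import dataclass
from typing import Optional

# vSphere releases named in this flash, as (major, minor) pairs.
AFFECTED_VSPHERE = {(6, 5), (6, 7)}


def major_minor(version: str) -> tuple:
    """Reduce a version string such as "6.7.0" to (6, 7)."""
    parts = version.split(".")
    return (int(parts[0]), int(parts[1]))


@dataclass
class VmCbtHistory:
    """Facts about one VM. All field names are hypothetical."""
    current_vsphere: str                   # vSphere version hosting the VM now
    vsphere_when_cbt_enabled: str          # vSphere version when CBT was enabled
    snapshot_existed_when_cbt_enabled: Optional[bool]  # None means unknown


def backups_potentially_affected(vm: VmCbtHistory) -> bool:
    """Return True when all three conditions hold or cannot be ruled out."""
    if major_minor(vm.current_vsphere) not in AFFECTED_VSPHERE:
        return False
    if major_minor(vm.vsphere_when_cbt_enabled) not in AFFECTED_VSPHERE:
        return False
    # Only a definite "no snapshot existed" rules the VM out.
    return vm.snapshot_existed_when_cbt_enabled is not False
```

For example, a VM whose CBT was enabled under vSphere 6.0 before an upgrade to 6.7 is reported as not affected, which matches the second note above.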

Levels affected

This problem affects all Data Protection for VMware versions that support vSphere 6.5 and 6.7, as noted here:
  • IBM Spectrum Protect for Virtual Environments: Data Protection for VMware versions 8.1.0.0 - 8.1.6.1 (all versions support vSphere 6.5; vSphere 6.7 support begins with version 8.1.6.0)
  • Tivoli Storage Manager for Virtual Environments: Data Protection for VMware 7.1.8.0 - 7.1.8.4 (supports vSphere 6.5 only)
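As a rough aid for checking an installed level against the ranges above, the following Python sketch compares dotted version strings. The function names are hypothetical, and the sketch assumes levels are written as purely numeric, dot-separated values such as 8.1.6.1.

```python
from typing import Tuple

# Inclusive ranges of affected levels, taken from the list above.
AFFECTED_RANGES = [
    ((8, 1, 0, 0), (8, 1, 6, 1)),  # IBM Spectrum Protect for Virtual Environments
    ((7, 1, 8, 0), (7, 1, 8, 4)),  # Tivoli Storage Manager for Virtual Environments
]


def parse_level(level: str) -> Tuple[int, ...]:
    """Convert "8.1.6.1" to (8, 1, 6, 1) so levels compare numerically."""
    return tuple(int(part) for part in level.split("."))


def is_affected_level(level: str) -> bool:
    """Return True if the level falls inside one of the affected ranges."""
    parsed = parse_level(level)
    return any(low <= parsed <= high for low, high in AFFECTED_RANGES)
```

Tuples are compared element by element, so 8.1.6.1 falls inside the first range while 8.1.7.0 does not.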

Immediate action

Summary

For each VM with affected or potentially affected backup copies, remove any existing snapshots, reset CBT to force the next backup to be full, then back up the VM. Full backups usually run much longer than incremental backups, so plan accordingly. After the full backup finishes, incremental backups will resume. The result is a fresh sequence of "incremental forever" backups that are not affected by this problem. See sub-section "Detail" below for further information.
Implement procedural policies that adhere to VMware's recommendation that no snapshots exist on a VM before enabling CBT on that VM.

Detail

1. Assess whether your VMware backup copies might be affected by this problem. See section "How to determine if your VM backup copies are affected" below.
2. Install the fixing Data Protection for VMware product level when it is available. See section "Fix" below for information about fix availability. If the fix is not yet available, or if the fix cannot be installed immediately, proceed to the "If the Data Protection for VMware fix is not installed" sub-section below.
3. Follow the steps in 3 (i) or 3 (ii) below, depending on whether the Data Protection for VMware fix is installed.
3 (i). If the Data Protection for VMware fix is installed
a. Ensure that no snapshots exist on your VMs.
b. Identify the affected VMs for which CBT will be reset during the next backup.
c. For each VM identified in step b, add the following setting to the client options file you use for running VM backups. This setting resets the VM's CBT during the next backup operation, resulting in a full backup.
include.vmresetcbt "vm_name"
You can also use wildcard characters '*' (asterisk) and '?' (question mark) to match multiple VM names.
Example: Reset CBT on all VMs with names that start with HR:
include.vmresetcbt "HR*"
Example: Reset CBT on all VMs with names that start with CORP, followed by any one character, followed by "UK", and then followed by any other characters:
include.vmresetcbt "CORP?UK*"
Note: An alternative method to reset a VM's CBT is to use VMware PowerCLI commands as described in VMware knowledge base article Enabling or disabling Changed Block Tracking (CBT) on virtual machines (1031873). After the CBT is reset, the next backup for the VM will be full.
d. Back up the affected VMs. The backup processing time will be longer than usual because this is a full backup, so plan accordingly.
e. When the backups are complete, remove the include.vmresetcbt settings from the client options file that were added in step c. This step is inapplicable if you used VMware PowerCLI commands to reset the CBT.
f. Repeat steps a-e until all affected VMs have had their CBT reset.
g. Resume normal daily incremental-forever backups.
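Before adding a wildcard pattern in step c, it can help to preview which VM names the pattern would select. The following Python sketch is not the product's matching code. It only models the two wildcards described above, with '*' matching any run of characters and '?' matching exactly one character. The function names and sample VM names are hypothetical, and the sketch assumes case-sensitive matching, which this flash does not specify.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern where only '*' and '?' are special into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any run of characters, including none
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def matching_vms(pattern: str, vm_names: list) -> list:
    """Return the VM names that the whole pattern matches, in input order."""
    regex = wildcard_to_regex(pattern)
    return [name for name in vm_names if regex.fullmatch(name)]


# Hypothetical inventory used only for illustration.
names = ["HR-payroll", "HR", "CORP1UK-web", "CORPUK-web", "Finance-HR"]
print(matching_vms("HR*", names))        # ['HR-payroll', 'HR']
print(matching_vms("CORP?UK*", names))   # ['CORP1UK-web']
```

In this model "CORPUK-web" is not selected by "CORP?UK*", because '?' must consume exactly one character between "CORP" and "UK".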
3 (ii). If the Data Protection for VMware fix is not installed
a. Ensure that no snapshots exist on your VMs.
b. Add the following setting to the client options file you use for running VM backups. This setting will reset the VM's CBT during the next backup operation, resulting in a full backup.
testflags vmbackup_cbt_reset
Note: An alternative method to reset a VM's CBT is to use VMware PowerCLI commands as described in VMware knowledge base article Enabling or disabling Changed Block Tracking (CBT) on virtual machines (1031873). After the CBT is reset, the next backup for the VM will be full.
c. Back up the affected VMs. The backup processing time will be longer than usual because this is a full backup, so plan accordingly.
d. When the backups are complete, remove the testflags vmbackup_cbt_reset setting from the client options file that was added in step b. This step is inapplicable if you used VMware PowerCLI commands to reset the CBT.
e. Resume normal daily incremental-forever backups.

How to determine if your VM backup copies are affected

Unless it is certain that no snapshots existed at the time CBT was enabled, the only way to determine if a given VM backup is affected is to test a restore of the VM. Here is an outline of steps to consider for testing a VM restore:
1. Restore the VM to an alternate location.
2. Start the virtual machine.
3. Run the disk check utility for the VM's operating system, such as fsck for Linux and chkdsk for Microsoft Windows, on each of the VM's disks.
4. Verify the correct operation of applications installed on the VM.
5. Verify the integrity of other user data on the VM.
Thorough restore testing of all VMs is probably not practical. If you cannot be certain that no snapshots existed at the time CBT was enabled, then consider treating all VM backups as being potentially affected.

Fix

This problem is fixed in these product levels:
  • IBM Spectrum Protect for Virtual Environments: Data Protection for VMware version 8.1.7.0. Current target availability is February 22, 2019.
  • Tivoli Storage Manager for Virtual Environments: Data Protection for VMware version 7.1.8.5. Current target availability is March 31, 2019.
After applying the fix, follow the steps in section "Immediate action" above, if those steps have not already been followed.
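The choice between procedures 3 (i) and 3 (ii) in section "Immediate action" depends on whether the installed level contains the fix. The following Python sketch shows one way to make that choice and produce the matching option line. The function names are hypothetical, levels are assumed to be numeric dotted strings, and the sketch assumes that any later level within the same major version also contains the fix. It only builds the text of the option; it does not edit the client options file.

```python
# First fixed level for each major version, taken from the list above.
FIX_LEVELS = {
    8: (8, 1, 7, 0),  # IBM Spectrum Protect for Virtual Environments
    7: (7, 1, 8, 5),  # Tivoli Storage Manager for Virtual Environments
}


def has_fix(level: str) -> bool:
    """Return True if the level is at or above the fixing level for its major version."""
    parsed = tuple(int(part) for part in level.split("."))
    fix = FIX_LEVELS.get(parsed[0])
    return fix is not None and parsed >= fix


def cbt_reset_option(level: str, vm_pattern: str) -> str:
    """Return the options-file line described in step 3 (i) or 3 (ii)."""
    if has_fix(level):
        # Step 3 (i): reset CBT only for VMs whose names match the pattern.
        return f'include.vmresetcbt "{vm_pattern}"'
    # Step 3 (ii): the test flag takes no VM name, so vm_pattern is unused.
    return "testflags vmbackup_cbt_reset"
```

For example, cbt_reset_option("8.1.7.0", "HR*") returns the include.vmresetcbt line for VMs whose names start with HR, while cbt_reset_option("8.1.6.1", "HR*") returns the testflags line. In both cases, remove the setting from the client options file after the full backups complete, as described in the procedures.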


Document Information

Modified date:
13 February 2019

UID

ibm10869964