Backup issues

List of backup issues in the Backup & Restore service of IBM Storage Fusion.

Failed to create snapshot content

Problem statement
Failed to create snapshot content with the following error:

Cannot find CSI PersistentVolumeSource for directory-based static volume

Resolution
To resolve the error, see https://www.ibm.com/docs/en/scalecsi/2.10?topic=snapshot-create-volumesnapshot.
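
To confirm whether the volume behind the snapshot is CSI-backed, you can inspect its PersistentVolume; this is a minimal sketch, where <pv_name> is a placeholder for the volume that the snapshot references. An empty result means that the PersistentVolume has no CSI source, which matches this error.

  oc get pv <pv_name> -o jsonpath='{.spec.csi}'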

Assign a backup policy operation fails

Problem statement
If a PolicyAssignment exists for an application on the hub and you create a PolicyAssignment for the same application on a spoke, the attempt to assign the backup policy fails. In both assignments, the application name, backup policy, and short-form cluster name are the same. The current format of the PolicyAssignment CR name is appName-backupPolicyName-shortFormClusterName, so the issue occurs when the first segment of the two cluster names is identical. In this scenario, the creation is rejected because a PolicyAssignment with that name already exists in OpenShift® Container Platform.

For example:

Hub assignment creates app1-bp1-apps:
  • Application - app1
  • BackupPolicy - bp1
  • AppCluster - apps.cluster1
Spoke assignment attempts to create app1-bp1-apps (OpenShift Container Platform rejects it):
  • Application - app1
  • BackupPolicy - bp1
  • AppCluster - apps.cluster2
Resolution
To create the PolicyAssignment for the spoke application, delete the PolicyAssignment CR for the hub application assignment and attempt spoke application assignment again.
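
A sketch of the cleanup, assuming that the CRD exposes the resource as policyassignments and that the CRs live in the IBM Storage Fusion namespace (commonly ibm-spectrum-fusion-ns); substitute your namespace and the CR name from the example above.

  oc get policyassignments -n ibm-spectrum-fusion-ns
  oc delete policyassignment app1-bp1-apps -n ibm-spectrum-fusion-ns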

Backups do not work as defined in the backup policies

Problem statement
Sometimes, backups do not run as defined in the backup policies, especially with hourly policies. For example, if you set a policy to run every two hours but it does not run at that interval, gaps appear in the backup history. A possible reason is that after a pod crash and restart, the scheduled jobs did not account for the time zone, which causes gaps in the run intervals.
Diagnosis
The following are the observed symptoms:
  • Policies with a custom every X hours at minute YY schedule: the first scheduled run of the policy starts at minute YY after X hours plus the time zone offset from UTC, instead of at minute YY after X hours.
  • Monthly and yearly policies run more frequently than scheduled.
Resolution
You can start backups manually until the next scheduled time.
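
As a hypothetical illustration of the offset: with an every 2 hours at minute 15 policy on a cluster whose local time zone is UTC+03:00, the first run starts at minute 15 after 5 hours (2 hours plus the 3-hour offset) instead of at minute 15 after 2 hours.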

Backup & Restore service deployed in IBM Cloud Satellite

Problem statement
You can encounter an error when you attempt a backup operation on the IBM Storage Fusion Backup & Restore service that is deployed in IBM Cloud® Satellite.
Diagnosis
Backup operations fail with the following log entries:

level=error msg="Error backing up item" backup=<item> error="error executing custom action (groupResource=pods, namespace=<namespace>, name=<name>): rpc error: code = Unknown desc = configmaps \"config\" not found" error.file="/remote-source/velero/app/pkg/backup/item_backupper.go:326" error.function="github.com/vmware-tanzu/velero/pkg/backup.(*itemBackupper).executeActions" logSource="/remote-source/velero/app/pkg/backup/backup.go:417" name=<name>
level=error msg="Error backing up item" backup=<item> error="error executing custom action (groupResource=replicasets.apps, namespace=<namespace>, name=<name>): rpc error: code = Unknown desc = configmaps \"config\" not found" error.file="/remote-source/velero/app/pkg/backup/item_backupper.go:326" error.function="github.com/vmware-tanzu/velero/pkg/backup.(*itemBackupper).executeActions" logSource="/remote-source/velero/app/pkg/backup/backup.go:417" name=<name>
level=error msg="Error backing up item" backup=<item> error="error executing custom action (groupResource=deployments.apps, namespace=<namespace>, name=<name>): rpc error: code = Unknown desc = configmaps \"config\" not found" error.file="/remote-source/velero/app/pkg/backup/item_backupper.go:326" error.function="github.com/vmware-tanzu/velero/pkg/backup.(*itemBackupper).executeActions" logSource="/remote-source/velero/app/pkg/backup/backup.go:417" name=<name>
Cause
An issue exists with the default openshift OADP plug-in, and it must be disabled to continue.
Resolution

Do the following steps to disable the plug-in:

  1. In the OpenShift console, go to Administration > CustomResourceDefinitions.
  2. Search for the CustomResourceDefinition DataProtectionApplication.
  3. In the Instances tab, locate the instance that is named velero.
  4. Open the YAML file in edit mode for the instance.
  5. Under the entry spec:velero:defaultPlugins, remove the line for openshift (see the verification sketch after these steps).
  6. Save the YAML file.
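
To verify the change, you can print the remaining plug-in list; this is a sketch, where <oadp_namespace> is a placeholder for the namespace that contains the velero instance. The openshift entry must no longer appear in the output.

  oc get dataprotectionapplication velero -n <oadp_namespace> -o yaml | grep -A 10 defaultPlugins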

Backup jobs are stuck in a running state for a long time and are not canceled

Resolution
Do the following steps to resolve the issue:
  1. Ensure that all jobs are finished and the queue is empty before you do any disruptive actions like node restarts.
  2. If jobs are running for a long period and do not progress, follow the steps to delete the backup or restore CR directly.
    1. Log in to IBM Storage Fusion.
    2. Go to Backup & Restore > Jobs > Queue and get the name of the job that is stuck.
    3. Run the following command to delete the backup job.
      oc delete fbackup <job_name>
    4. Run the following command to delete the restore job.
      oc delete frestore <job_name>
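
If the job name is not available from the user interface, you can list the backup and restore CRs directly; this is a sketch, assuming that the fbackup and frestore short names from the preceding commands resolve cluster-wide.

  oc get fbackup -A
  oc get frestore -A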

Policy creation

Problem statement
Sometimes, when you create a backup policy, the following error can occur:
Error: Policy daily-snapshot could not created. 
Resolution
Restart the isf-data-protection-operator-controller-manager-* pod in the IBM Storage Fusion namespace. The restart triggers the recreation of the in-place-snapshot BackupStorageLocation CR.
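
A sketch of the restart, assuming that the IBM Storage Fusion namespace is ibm-spectrum-fusion-ns (substitute yours if it differs); deleting the pod is enough because its deployment recreates it.

  oc -n ibm-spectrum-fusion-ns get pods | grep isf-data-protection-operator-controller-manager
  oc -n ibm-spectrum-fusion-ns delete pod <pod_name>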

Policy assignment from Backup & Restore service page of the OpenShift Container Platform console

Problem statement
In the Backup & Restore service page of the OpenShift Container Platform console, the backup policy assignment to an application fails with a gateway timeout error.
Resolution
Use the IBM Storage Fusion user interface to assign the backup policy instead.

Backup attempt of multiple VMs fails

Problem statement
This issue occurs when some VMs are in a migrating state. OpenShift Container Platform does not support snapshots of VMs that are in a migrating state.
Resolution
Follow these steps to resolve the issue:
  1. Run the following command to check whether any virtual machines are in a migrating state.
    oc get virtualmachineinstancemigrations -A
    Example output:
    NAMESPACE            NAME                                          PHASE         VMI
    fb-bm1-fs-1-5g-10    rhel8-lesser-wildcat-migration-8fhbo          Failed        rhel8-lesser-wildcat
    vm-centipede-bm2     centos-stream9-chilly-hawk-migration-57jyk    Failed        centos-stream9-chilly-hawk
    vm-centos9-bm1-1     centos-stream9-instant-toad-migration-bfyz6   Failed        centos-stream9-instant-toad
    vm-centos9-bm1-1     centos-stream9-instant-toad-migration-d9547   Failed        centos-stream9-instant-toad
    vm-windows10-bm2-1   kubevirt-workload-update-4dm57                Failed        win10-zealous-unicorn
    vm-windows10-bm2-1   kubevirt-workload-update-f2s5w                Failed        win10-zealous-unicorn
    vm-windows10-bm2-1   kubevirt-workload-update-gt6nj                Failed        win10-zealous-unicorn
    vm-windows10-bm2-1   kubevirt-workload-update-rjwmn                Failed        win10-zealous-unicorn
    vm-windows10-bm2-1   kubevirt-workload-update-vfxfl                TargetReady   win10-zealous-unicorn
    vm-windows10-bm2-1   kubevirt-workload-update-z2thw                Failed        win10-zealous-unicorn
    vm-windows11-bm2-1   kubevirt-workload-update-9gr6v                Failed        win11-graceful-coyote
    vm-windows11-bm2-1   kubevirt-workload-update-clbck                Failed        win11-graceful-coyote
    vm-windows11-bm2-1   kubevirt-workload-update-j6pmx                Failed        win11-graceful-coyote
    vm-windows11-bm2-1   kubevirt-workload-update-sfbbx                Pending       win11-graceful-coyote
    vm-windows11-bm2-1   kubevirt-workload-update-th5dd                Failed        win11-graceful-coyote
    vm-windows11-bm2-1   kubevirt-workload-update-zl679                Failed        win11-graceful-coyote
    vm-windows11-bm2-2   kubevirt-workload-update-7dp6g                Failed        win11-conservative-moth
    vm-windows11-bm2-2   kubevirt-workload-update-9nb9m                TargetReady   win11-conservative-moth
    vm-windows11-bm2-2   kubevirt-workload-update-cdrf5                Failed        win11-conservative-moth
    vm-windows11-bm2-2   kubevirt-workload-update-dm8fz                Failed        win11-conservative-moth
    vm-windows11-bm2-2   kubevirt-workload-update-kwr6c                Failed        win11-conservative-moth
    vm-windows11-bm2-2   kubevirt-workload-update-zt8wx                Failed        win11-conservative-moth
  2. Exclude the migrating virtual machines from the backup. Reattempt the backup after the migration is complete. To confirm completion, see the sketch after these steps.
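
To confirm that a migration completed before you reattempt the backup, you can watch the migration phase; this is a sketch, where <namespace> is a placeholder.

  oc get virtualmachineinstancemigrations -n <namespace> -w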

Backup applications table does not show the new backup times for the backed-up applications

Problem statement
The backup applications table does not show the new backup times for the backed-up applications.
Resolution
Go to the Applications and Jobs view to see the last successful backup job for a given application. For applications on the hub, the Applications table has the correct last backup time.

Backups are failing for the virtual machines

Problem statement
Backups and snapshots fail for virtual machines that are mounted with a second disk.
Resolution
  1. Run the following command to get the disk details for the virtual machines.
    oc get virtualmachine -A -o json | jq '.items[] | [{name:.metadata.name, namespace:.metadata.namespace, volumes:.spec.template.spec.volumes}] | select(.[].volumes[].dataVolume | length > 1) | {name:.[].name, namespace:.[].namespace}'
    Example output:
    {
      "name": "rhel9-absent-basilisk",
      "namespace": "vmtesting"
    }
  2. If you find virtual machines that are mounted with a second disk, follow the steps in the Red Hat solution to resolve the issue. To inspect a specific machine, see the sketch after these steps.
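
To inspect a specific virtual machine that the command flagged, you can print its volume list directly; this is a sketch, where <vm_name> and <namespace> are placeholders. More than one dataVolume entry indicates a second disk.

  oc get virtualmachine <vm_name> -n <namespace> -o jsonpath='{.spec.template.spec.volumes}'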

Known issues and limitations

  • The OpenShift Container Platform cluster can have problems and become unusable. After you recover the cluster, rejoin the connections.
  • The S3 bucket must not have an expiration policy or an archive rule; a quick check is shown after this list. For more information about this known issue, see S3 buckets must not enable expiration policies.
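
A quick way to check a bucket for lifecycle rules is to query its lifecycle configuration; this is a sketch that uses the AWS CLI against an S3-compatible endpoint, assuming your credentials are configured. The call returns a NoSuchLifecycleConfiguration error when no rules exist, which is the desired state for a backup bucket.

  aws s3api get-bucket-lifecycle-configuration --bucket <bucket_name>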