Backup issues
List of backup issues in the Backup & Restore service of IBM Storage Fusion.
Failed to create snapshot content
- Problem statement
- Failed to create snapshot content with the following error:
Cannot find CSI PersistentVolumeSource for directory-based static volume
- Resolution
- To resolve the error, see https://www.ibm.com/docs/en/scalecsi/2.10?topic=snapshot-create-volumesnapshot.
Assign a backup policy operation fails
- Problem statement
- If you have a PolicyAssignment for an application on the hub and you create a PolicyAssignment
for the same application on the spoke, then your attempt to assign a backup policy for the
application fails. In both assignments, the application, backup policy, and short-form cluster name
are the same. The current format of the PolicyAssignment CR name is
appName-backupPolicyName-shortFormClusterName. The issue happens when the first string of the cluster names is identical. In this scenario, the creation is rejected because a PolicyAssignment with that name already exists in OpenShift® Container Platform. For example:
Hub assignment creates app1-bp1-apps:
- Application - app1
- BackupPolicy - bp1
- AppCluster - apps.cluster1
Spoke assignment creates app1-bp1-apps (OpenShift Container Platform rejects it):
- Application - app1
- BackupPolicy - bp1
- AppCluster - apps.cluster2
- Resolution
- To create the PolicyAssignment for the spoke application, delete the PolicyAssignment CR for the hub application assignment, and then attempt the spoke application assignment again.
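The name collision described above can be illustrated with a minimal sketch. It assumes that the short-form cluster name is the text before the first dot of the AppCluster value, which matches the example but is not confirmed as the product's exact rule. The function name policy_assignment_name is hypothetical.

```python
def policy_assignment_name(app: str, backup_policy: str, app_cluster: str) -> str:
    """Build a name in the appName-backupPolicyName-shortFormClusterName format."""
    # Assumption: the short form is the first dot-separated part of the cluster name.
    short_cluster = app_cluster.split(".", 1)[0]
    return f"{app}-{backup_policy}-{short_cluster}"


hub_name = policy_assignment_name("app1", "bp1", "apps.cluster1")
spoke_name = policy_assignment_name("app1", "bp1", "apps.cluster2")

print(hub_name)
print(spoke_name)
print("collision:", hub_name == spoke_name)
```

Both calls produce the same name, which is why the second PolicyAssignment is rejected until the first one is deleted.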
Backups do not work as defined in the backup policies
- Problem statement
- Sometimes, backups do not run as defined in the backup policies, especially when you set hourly policies. For example, if you set a policy to run every two hours and it does not run at that interval, gaps appear in the backup history. A possible reason is that, after a pod crash and restart, scheduled jobs did not account for the time zone, which causes gaps in the run intervals.
- Diagnosis
- The following are the observed symptoms:
- Policies with custom "every X hours at minute YY" schedules: the first scheduled run occurs at minute YY after X hours plus the time zone offset from UTC, instead of at minute YY after X hours.
- Monthly and yearly policies run more frequently than scheduled.
- Resolution
- You can start backups manually until the next scheduled time.
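The first symptom in the diagnosis can be worked through as simple date arithmetic. The following sketch is only an illustration of the reported behavior, not the scheduler's implementation. The function first_run_times, the reading of "minute YY after X hours", and the sample values (a two-hour policy at minute 15 with a five-hour offset) are all assumptions.

```python
from datetime import datetime, timedelta, timezone


def first_run_times(created_utc, every_hours, minute, utc_offset_hours):
    """Return (expected, observed) first-run times for an
    'every X hours at minute YY' schedule."""
    # Expected behavior: minute YY after X hours.
    expected = (created_utc + timedelta(hours=every_hours)).replace(
        minute=minute, second=0, microsecond=0
    )
    # Reported symptom: the first run is additionally shifted by the
    # time zone offset from UTC.
    observed = expected + timedelta(hours=utc_offset_hours)
    return expected, observed


created = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
expected, observed = first_run_times(
    created, every_hours=2, minute=15, utc_offset_hours=5
)
print("expected first run:", expected.isoformat())
print("observed first run:", observed.isoformat())
print("gap in backup history:", observed - expected)
```

The difference between the two values is the window in which a manually started backup closes the gap.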
Backup & Restore service deployed in IBM Cloud Satellite
- Problem statement
- You can encounter an error when you attempt a backup operation on the IBM Storage Fusion Backup & Restore service that is deployed in IBM Cloud® Satellite.
- Diagnosis
- Backup operations fail with the following log entries:
level=error msg="Error backing up item" backup=<item> error="error executing custom action (groupResource=pods, namespace=<namespace>, name=<name>): rpc error: code = Unknown desc = configmaps \"config\" not found" error.file="/remote-source/velero/app/pkg/backup/item_backupper.go:326" error.function="github.com/vmware-tanzu/velero/pkg/backup.(*itemBackupper).executeActions" logSource="/remote-source/velero/app/pkg/backup/backup.go:417" name=<name>
level=error msg="Error backing up item" backup=<item> error="error executing custom action (groupResource=replicasets.apps, namespace=<namespace>, name=<name>): rpc error: code = Unknown desc = configmaps \"config\" not found" error.file="/remote-source/velero/app/pkg/backup/item_backupper.go:326" error.function="github.com/vmware-tanzu/velero/pkg/backup.(*itemBackupper).executeActions" logSource="/remote-source/velero/app/pkg/backup/backup.go:417" name=<name>
level=error msg="Error backing up item" backup=<item> error="error executing custom action (groupResource=deployments.apps, namespace=<namespace>, name=<name>): rpc error: code = Unknown desc = configmaps \"config\" not found" error.file="/remote-source/velero/app/pkg/backup/item_backupper.go:326" error.function="github.com/vmware-tanzu/velero/pkg/backup.(*itemBackupper).executeActions" logSource="/remote-source/velero/app/pkg/backup/backup.go:417" name=<name>
- Cause
- An issue exists with the default OADP plug-in and it must be disabled to continue.
- Resolution
- Do the following steps to disable the plug-in:
- In the OpenShift console, go to .
- Search for the CustomResourceDefinition DataProtectionApplication.
- In the Instances tab, locate the instance that is named velero.
- Open the YAML file in edit mode for the instance.
- Under the entry spec:velero:defaultPlugins, remove the line for openshift.
- Save the YAML file.
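Before you disable the plug-in, it can help to confirm that a saved backup log contains the error signature from the diagnosis. The following sketch scans log text for the configmaps "config" not found message and lists the affected items. The function name affected_items and the sample namespace and resource names are hypothetical, and the pattern is derived only from the log entries quoted in this topic.

```python
import re

# Pattern derived from the log entries quoted in the diagnosis.
# The backslashes before the quotes are optional so that both escaped
# and unescaped renderings of the message match.
ITEM_ERROR = re.compile(
    r'groupResource=(?P<resource>[^,]+), namespace=(?P<namespace>[^,]+), '
    r'name=(?P<name>[^)]+)\).*?configmaps \\?"config\\?" not found'
)


def affected_items(log_text):
    """Return (resource, namespace, name) for each matching error entry."""
    return [
        (m["resource"], m["namespace"], m["name"])
        for m in ITEM_ERROR.finditer(log_text)
    ]


# Hypothetical sample that follows the shape of the quoted entries.
sample_log = "\n".join([
    r'level=error msg="Error backing up item" error="error executing custom action '
    r'(groupResource=pods, namespace=demo-ns, name=demo-pod): rpc error: '
    r'code = Unknown desc = configmaps \"config\" not found"',
    r'level=error msg="Error backing up item" error="error executing custom action '
    r'(groupResource=deployments.apps, namespace=demo-ns, name=demo-app): rpc error: '
    r'code = Unknown desc = configmaps \"config\" not found"',
    r'level=info msg="Backup completed"',
])

for item in affected_items(sample_log):
    print(item)
```

An empty result suggests that the failure has a different cause and that disabling the plug-in is unlikely to help.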
Backup jobs are stuck in a running state for a long time and are not canceled
- Resolution
- Do the following steps to resolve the issue:
- Ensure that all jobs are finished and the queue is empty before you do any disruptive actions like node restarts.
- If jobs are running for a long period and do not progress, follow the steps to delete the backup or restore CR directly.
- Log in to IBM Storage Fusion.
- Go to and get the name of the job that is stuck.
- Run the following command to delete the backup job.
oc delete fbackup <job_name>
- Run the following command to delete the restore job.
oc delete frestore <job_name>
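If you script the cleanup, a small helper can pick the correct resource kind for the job type. This sketch only builds the two commands shown above and does not run them. The function name delete_job_command and the job name my-stuck-job are hypothetical.

```python
import shlex

# Resource kinds taken from the two commands in the steps above.
RESOURCE_BY_JOB_TYPE = {"backup": "fbackup", "restore": "frestore"}


def delete_job_command(job_type, job_name):
    """Return the oc argument list that deletes a stuck backup or restore job."""
    if job_type not in RESOURCE_BY_JOB_TYPE:
        raise ValueError(f"unknown job type: {job_type!r}")
    if not job_name:
        raise ValueError("job name must not be empty")
    return ["oc", "delete", RESOURCE_BY_JOB_TYPE[job_type], job_name]


command = delete_job_command("backup", "my-stuck-job")
# Print the command for review instead of executing it.
print(shlex.join(command))
```

Printing the command first keeps the deletion a deliberate manual step, which suits a disruptive action on a job CR.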
Policy creation
- Problem statement
- Sometimes, when you create a backup policy, the following error can occur:
Error: Policy daily-snapshot could not created.
- Resolution
- Restart the isf-data-protection-operator-controller-manager-* pod in the IBM Storage Fusion namespace. The restart triggers the recreation of the in-place-snapshot BackupStorageLocation CR.
Policy assignment from Backup & Restore service page of the OpenShift Container Platform console
- Problem statement
- In the Backup & Restore service page of the OpenShift Container Platform console, the backup policy assignment to an application fails with a gateway timeout error.
- Resolution
- Assign the backup policy from the IBM Storage Fusion user interface instead.
Backup attempt of multiple VMs fails
- Problem statement
- This issue occurs when some VMs are in a migrating state. OpenShift Container Platform does not support snapshots of VMs that are in a migrating state.
- Resolution
- Follow the steps to resolve this issue:
- Check whether the virtual machine is in a migrating state:
- Run the following command to check for migrating VMs.
oc get virtualmachineinstancemigrations -A
Example output:
NAMESPACE           NAME                                         PHASE        VMI
fb-bm1-fs-1-5g-10   rhel8-lesser-wildcat-migration-8fhbo         Failed       rhel8-lesser-wildcat
vm-centipede-bm2    centos-stream9-chilly-hawk-migration-57jyk   Failed       centos-stream9-chilly-hawk
vm-centos9-bm1-1    centos-stream9-instant-toad-migration-bfyz6  Failed       centos-stream9-instant-toad
vm-centos9-bm1-1    centos-stream9-instant-toad-migration-d9547  Failed       centos-stream9-instant-toad
vm-windows10-bm2-1  kubevirt-workload-update-4dm57               Failed       win10-zealous-unicorn
vm-windows10-bm2-1  kubevirt-workload-update-f2s5w               Failed       win10-zealous-unicorn
vm-windows10-bm2-1  kubevirt-workload-update-gt6nj               Failed       win10-zealous-unicorn
vm-windows10-bm2-1  kubevirt-workload-update-rjwmn               Failed       win10-zealous-unicorn
vm-windows10-bm2-1  kubevirt-workload-update-vfxfl               TargetReady  win10-zealous-unicorn
vm-windows10-bm2-1  kubevirt-workload-update-z2thw               Failed       win10-zealous-unicorn
vm-windows11-bm2-1  kubevirt-workload-update-9gr6v               Failed       win11-graceful-coyote
vm-windows11-bm2-1  kubevirt-workload-update-clbck               Failed       win11-graceful-coyote
vm-windows11-bm2-1  kubevirt-workload-update-j6pmx               Failed       win11-graceful-coyote
vm-windows11-bm2-1  kubevirt-workload-update-sfbbx               Pending      win11-graceful-coyote
vm-windows11-bm2-1  kubevirt-workload-update-th5dd               Failed       win11-graceful-coyote
vm-windows11-bm2-1  kubevirt-workload-update-zl679               Failed       win11-graceful-coyote
vm-windows11-bm2-2  kubevirt-workload-update-7dp6g               Failed       win11-conservative-moth
vm-windows11-bm2-2  kubevirt-workload-update-9nb9m               TargetReady  win11-conservative-moth
vm-windows11-bm2-2  kubevirt-workload-update-cdrf5               Failed       win11-conservative-moth
vm-windows11-bm2-2  kubevirt-workload-update-dm8fz               Failed       win11-conservative-moth
vm-windows11-bm2-2  kubevirt-workload-update-kwr6c               Failed       win11-conservative-moth
vm-windows11-bm2-2  kubevirt-workload-update-zt8wx               Failed       win11-conservative-moth
- Exclude the migrating virtual machine from the backup. Reattempt it after the migration is complete.
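With many migration objects, picking out the VMs to exclude by eye is error-prone. The following sketch parses the text output of the command above and lists the virtual machine instances that have at least one migration in a non-terminal phase. It assumes that Succeeded and Failed are the only terminal phases and that every other phase (for example, Pending or TargetReady) means a migration is still in progress. The function name migrating_vmis is hypothetical, and the sample rows are a subset of the example output.

```python
# Assumption: these are the only phases in which a migration is finished.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def migrating_vmis(table_text):
    """Return sorted (namespace, vmi) pairs that have an unfinished migration."""
    in_progress = set()
    for line in table_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything that is not a 4-column row.
        if len(fields) != 4 or fields[0] == "NAMESPACE":
            continue
        namespace, _migration_name, phase, vmi = fields
        if phase not in TERMINAL_PHASES:
            in_progress.add((namespace, vmi))
    return sorted(in_progress)


sample_output = """\
NAMESPACE           NAME                                  PHASE        VMI
fb-bm1-fs-1-5g-10   rhel8-lesser-wildcat-migration-8fhbo  Failed       rhel8-lesser-wildcat
vm-windows10-bm2-1  kubevirt-workload-update-vfxfl        TargetReady  win10-zealous-unicorn
vm-windows10-bm2-1  kubevirt-workload-update-z2thw        Failed       win10-zealous-unicorn
vm-windows11-bm2-1  kubevirt-workload-update-sfbbx        Pending      win11-graceful-coyote
vm-windows11-bm2-2  kubevirt-workload-update-9nb9m        TargetReady  win11-conservative-moth
"""

for namespace, vmi in migrating_vmis(sample_output):
    print(f"{namespace}/{vmi}")
```

A VM appears once in the result even when it has several unfinished migration objects, because the pairs are collected in a set.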
Backup applications table does not show the new backup times for the backed-up applications
- Problem statement
- The backup applications table does not show the new backup times for the backed-up applications.
- Resolution
- Go to the Applications and Jobs view to see the last successful backup job for a given application. For applications on the hub, the Applications table has the correct last backup time.
Backups are failing for the virtual machines
- Problem statement
- The backups and snapshots fail for virtual machines that are mounted with a second disk.
- Resolution
-
- Run the following command to get the disk details for the virtual machines.
oc get virtualmachine -A -o json | jq '.items[] | [{name:.metadata.name, namespace:.metadata.namespace, volumes:.spec.template.spec.volumes}] | select(.[].volumes[].dataVolume | length > 1) | {name :.[].name, namespace:.[].namespace}'
Example output:
{ "name": "rhel9-absent-basilisk", "namespace": "vmtesting" }
- If you find that the virtual machines are mounted with a second disk, follow the steps in the Red Hat solution to resolve the issue.
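As a cross-check, the JSON from oc get virtualmachine -A -o json can also be inspected in a script. The following sketch is not a translation of the jq filter above. It uses a simpler, assumed criterion: a virtual machine is reported when spec.template.spec.volumes contains more than one entry with a dataVolume key. The function name vms_with_multiple_data_volumes and the sample document, including the volume names and the second virtual machine, are hypothetical.

```python
import json


def vms_with_multiple_data_volumes(vm_list):
    """Return name and namespace of VMs that have more than one dataVolume volume."""
    found = []
    for vm in vm_list.get("items", []):
        pod_spec = vm.get("spec", {}).get("template", {}).get("spec", {})
        volumes = pod_spec.get("volumes") or []
        # Count only volumes that are backed by a DataVolume.
        data_volumes = [v for v in volumes if "dataVolume" in v]
        if len(data_volumes) > 1:
            metadata = vm["metadata"]
            found.append(
                {"name": metadata["name"], "namespace": metadata["namespace"]}
            )
    return found


# Hypothetical, trimmed-down stand-in for the real command output.
sample_json = """
{"items": [
  {"metadata": {"name": "rhel9-absent-basilisk", "namespace": "vmtesting"},
   "spec": {"template": {"spec": {"volumes": [
     {"name": "rootdisk", "dataVolume": {"name": "rhel9-absent-basilisk"}},
     {"name": "disk-1", "dataVolume": {"name": "rhel9-absent-basilisk-disk-1"}},
     {"name": "cloudinitdisk", "cloudInitNoCloud": {"userData": ""}}
   ]}}}},
  {"metadata": {"name": "single-disk-vm", "namespace": "vmtesting"},
   "spec": {"template": {"spec": {"volumes": [
     {"name": "rootdisk", "dataVolume": {"name": "single-disk-vm"}}
   ]}}}}
]}
"""

for vm in vms_with_multiple_data_volumes(json.loads(sample_json)):
    print(vm)
```

To use it against a cluster, save the command output to a file and pass the parsed JSON to the function.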
Known issues and limitations
- The OpenShift Container Platform cluster can have problems and become unusable. After you recover the cluster, rejoin the connections. For the steps to clean the connection and set up the connection between two clusters again, see Connection setup after OpenShift Container Platform cluster recovery.
- The S3 bucket must not have an expiration policy or an archive rule. For more information about this known issue, see S3 buckets must not enable expiration policies.