Backup issues
List of backup issues in the Backup & Restore service of IBM Fusion.
Backup size for block volumes is not correct
When the backup storage location of a policy is changed, the backup size for application backups with Ceph RBD block volumes is displayed incorrectly during the first backup after the change. Because the correct size is displayed from the next backup onward, you can ignore this issue.
Shallow copy support is not available
Shallow copy support is currently unavailable in the IBM Storage Scale CSI driver. Consequently, a new copy of the data is created when the PVC is generated from the snapshot.
Multiple VM backups failing with snapshot deadline exceeded error
- Resolution
-
- Run the following command to enable full auditing:
sudo auditctl -w /etc/shadow -p w
- Rerun the backup and run the following command to identify the file that is causing ownership or
permission issues.
sudo ausearch -m avc -ts recent
- If the ausearch command reports issues, then run the following commands to
generate a local policy to allow access.
sudo ausearch -c 'qemu-ga' --raw | audit2allow -M my-qemuga
sudo semodule -X 300 -i my-qemuga.pp
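- Optional: Confirm that the policy module is loaded. The following command is a small sketch that assumes the my-qemuga module name from the previous step:
sudo semodule -l | grep my-qemuga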
IBM Cloud Limitations with OADP DataMover
- Problem statement
- Backups of raw volumes fail for IBM Cloud with OADP 1.4.0 or lower.
- Cause
- OADP 1.3 or higher exposes volumes of the underlying host during backup or restore. The folders on the host are exposed in the Pods that are associated with the DaemonSet node-agent. The /var/lib/kubelet/{plugins,pods} folders are exposed by default. The folders required to work on IBM Cloud are /var/data/kubelet/{plugins,pods}. As a result, the backup and restore of volumeMode: block volumes fail with the following example error:
Failed transferring data [BMYBR0009](https://ibm.com/docs/SSFETU_2.9/errorcodes/BMYBR0009.html) There was an error when processing the job in the Transaction Manager service. The underlying error was: 'Data uploads watch caused an exception: DataUpload d4e7706d-7f0f-4448-b3a0-e9cdff8d33db-1 failed with message: data path backup failed: Failed to run kopia backup: unable to get local block device entry: resolveSymlink: lstat /var/data: no such file or directory'.
The ID of the individual DataUpload varies from job to job.
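To confirm which host paths the node-agent DaemonSet currently mounts, you can inspect its volumes. The following command is a minimal sketch; it assumes that the node-agent DaemonSet runs in the openshift-adp namespace, so substitute the namespace of your deployment if it differs:
oc get daemonset node-agent -n openshift-adp -o jsonpath='{.spec.template.spec.volumes[*].hostPath.path}'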
- Resolution
-
- Set the DataMover type to legacy by either the global method or the Per PolicyAssignment method. For the procedure to update the type, see Configure DataMover type for Backup and Restore.
- Apply the following workaround. Though it allows continued use of the DataMover kopia type, it has the following drawbacks: It disables further changes to the DataMover and Velero configurations, such as the ability to change resource allocations (CPU, Memory, and Ephemeral-storage) and nodeSelectors (DataMover node placement) for DataMover type kopia. Legacy is not affected by this workaround. To avoid job failures, do not make these changes while a backup or restore job is in progress.
- In the OpenShift Console, go to .
- Select openshift-adp-controller-manager and scale the number of Pods to 0.
- Go to and select node-agent.
- Select the YAML tab.
- Under the volumes section, add the additional volume host-data as shown in the following example.
Note: It exposes an additional folder on the host besides the folders mentioned in the Cause of this issue.
volumes:
  - name: host-pods
    hostPath:
      path: /var/lib/kubelet/pods
      type: ''
  - name: host-plugins
    hostPath:
      path: /var/lib/kubelet/plugins
      type: ''
  - name: host-data
    hostPath:
      path: /var/data/kubelet
      type: ''
  - name: scratch
    emptyDir: {}
  - name: certs
    emptyDir: {}
- Under volumeMounts, add the host-data volume as shown in the following example.
volumeMounts:
  - name: host-pods
    mountPath: /host_pods
    mountPropagation: HostToContainer
  - name: host-plugins
    mountPath: /var/lib/kubelet/plugins
    mountPropagation: HostToContainer
  - name: host-data
    mountPath: /var/data/kubelet
    mountPropagation: HostToContainer
  - name: scratch
    mountPath: /scratch
  - name: certs
    mountPath: /etc/ssl/certs
- Save the changes and wait a couple of minutes for the Pods to restart.
The backups and restores of PersistentVolumeClaims with volumeMode: block then succeed on Red Hat® OpenShift® on IBM Cloud.
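If you prefer the command line over the console for these steps, the following sketch shows equivalent commands. It assumes that the OADP components run in the openshift-adp namespace; substitute the namespace of your deployment if it differs.
# Scale down the OADP controller manager so that it does not revert the manual DaemonSet edit
oc scale deployment openshift-adp-controller-manager -n openshift-adp --replicas=0
# Edit the node-agent DaemonSet and add the host-data volume and volumeMount shown in the previous examples
oc edit daemonset node-agent -n openshift-adp
# Wait for the node-agent Pods to restart with the new mounts
oc rollout status daemonset/node-agent -n openshift-adp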
Failed to create snapshot content
- Problem statement
- Failed to create snapshot content with the following error:
Cannot find CSI PersistentVolumeSource for directory-based static volume
- Resolution
- To resolve the error, see https://www.ibm.com/docs/en/scalecsi/2.10?topic=snapshot-create-volumesnapshot.
Assign a backup policy operation fails
- Problem statement
- If you have a PolicyAssignment for an application on the hub and you create a PolicyAssignment
for the same application on the spoke, then your attempt to assign a backup policy for the
application fails. In both assignments, the application, backup policy, and short-form cluster name
are the same. The current format of the PolicyAssignment CR name is
appName-backupPolicyName-shortFormClusterName
. The issue happens when the first segment of the two cluster names is identical. In this scenario, the creation is rejected because a PolicyAssignment with that name already exists in OpenShift Container Platform.
For example:
The hub assignment creates app1-bp1-apps:
- Application - app1
- BackupPolicy - bp1
- AppCluster - apps.cluster1
The spoke assignment also resolves to app1-bp1-apps, and OpenShift Container Platform rejects it:
- Application - app1
- BackupPolicy - bp1
- AppCluster - apps.cluster2
- Resolution
- To create the PolicyAssignment for the spoke application, delete the PolicyAssignment CR for the hub application assignment and attempt spoke application assignment again.
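The following command-line sketch illustrates the resolution. It assumes that the PolicyAssignment CRs are exposed as the policyassignment resource in the ibm-spectrum-fusion-ns namespace, and it uses the app1-bp1-apps name from the example; adjust the resource name, namespace, and CR name to match your environment.
# List the existing PolicyAssignment CRs to find the colliding name
oc get policyassignment -n ibm-spectrum-fusion-ns
# Delete the hub assignment so that the spoke assignment can be created with the same name
oc delete policyassignment app1-bp1-apps -n ibm-spectrum-fusion-ns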
Backups do not work as defined in the backup policies
- Problem statement
- Sometimes, backups do not work as defined in the backup policies, especially when you set hourly policies. For example, if you set a policy to run every two hours and it does not run every two hours, gaps exist in the backup history. A possible reason is that, during a pod crash and restart, scheduled jobs did not account for the time zone, which caused gaps in the run intervals.
- Diagnosis
- The following are the observed symptoms:
- Policies with custom every X hours at minute YY schedules: the first scheduled run of the policy starts at minute YY after X hours plus the time zone offset from UTC, instead of at minute YY after X hours. For example, with a 2-hour policy at minute 15 on a cluster that is 5 hours behind UTC, the first run can start 7 hours later instead of 2 hours later.
- Monthly and yearly policies run more frequently than scheduled.
- Resolution
- You can start backups manually until the next scheduled time.
Backup & Restore service deployed in IBM Cloud Satellite
- Problem statement
- You can encounter an error when you attempt a backup operation with the IBM Fusion Backup & Restore service that is deployed in IBM Cloud® Satellite.
- Diagnosis
-
Backup operations fail with the following log entries:
level=error msg="Error backing up item" backup=<item> error="error executing custom action (groupResource=pods, namespace=<namespace>, name=<name>): rpc error: code = Unknown desc = configmaps \"config\" not found" error.file="/remote-source/velero/app/pkg/backup/item_backupper.go:326" error.function="github.com/vmware-tanzu/velero/pkg/backup.(*itemBackupper).executeActions" logSource="/remote-source/velero/app/pkg/backup/backup.go:417" name=<name>
level=error msg="Error backing up item" backup=<item> error="error executing custom action (groupResource=replicasets.apps, namespace=<namespace>, name=<name>): rpc error: code = Unknown desc = configmaps \"config\" not found" error.file="/remote-source/velero/app/pkg/backup/item_backupper.go:326" error.function="github.com/vmware-tanzu/velero/pkg/backup.(*itemBackupper).executeActions" logSource="/remote-source/velero/app/pkg/backup/backup.go:417" name=<name>
level=error msg="Error backing up item" backup=<item> error="error executing custom action (groupResource=deployments.apps, namespace=<namespace>, name=<name>): rpc error: code = Unknown desc = configmaps \"config\" not found" error.file="/remote-source/velero/app/pkg/backup/item_backupper.go:326" error.function="github.com/vmware-tanzu/velero/pkg/backup.(*itemBackupper).executeActions" logSource="/remote-source/velero/app/pkg/backup/backup.go:417" name=<name>
- Cause
- An issue exists with the openshift default plug-in of OADP, and it must be disabled to continue.
- Resolution
-
Do the following steps to disable the plug-in:
- In the OpenShift console, go to .
- Search for the CustomResourceDefinition DataProtectionApplication.
- In the Instances tab, locate the instance that is named velero.
- Open the YAML file in edit mode for the instance.
- Under the entry spec:velero:defaultPlugins, remove the line for openshift.
- Save the YAML file.
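As an alternative to the console, you can make the same change with oc. The following is a sketch; it assumes that the DataProtectionApplication instance velero resides in the <oadp_namespace> of your Backup & Restore deployment.
# Open the DataProtectionApplication instance for editing
oc edit dataprotectionapplication velero -n <oadp_namespace>
# Under spec.velero.defaultPlugins, remove the openshift entry, for example:
#   defaultPlugins:
#   - csi
#   - openshift    <- remove this line
#   - aws
# (the other entries shown here are only illustrative; keep whatever plug-ins your instance already lists)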
Backup jobs are stuck in a running state for a long time and are not canceled
- Resolution
- Do the following steps to resolve the issue:
- Ensure that all jobs are finished and the queue is empty before you do any disruptive actions like node restarts.
- If jobs are running for a long period and do not progress, follow the steps to delete the
backup or restore CR directly.
- Log in to IBM Fusion.
- Go to and get the name of the job that is stuck.
- Run the following command to delete the backup job:
oc delete fbackup <job_name>
- Run the following command to delete the restore job:
oc delete frestore <job_name>
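If you need to confirm the exact name of the stuck CR before you delete it, you can list the backup and restore CRs. This minimal sketch uses the same fbackup and frestore resource names as the previous commands:
oc get fbackup -A
oc get frestore -A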
Policy creation
- Problem statement
- Sometimes, when you create a backup policy, the following error can occur:
Error: Policy daily-snapshot could not created.
- Resolution
- Restart the isf-data-protection-operator-controller-manager-* pod in the IBM Fusion namespace. This restart triggers the re-creation of the in-place-snapshot BackupStorageLocation CR.
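The following command-line sketch shows the restart; it assumes that IBM Fusion is installed in the ibm-spectrum-fusion-ns namespace, so adjust the namespace if your installation differs.
# Find the controller manager Pod and delete it; the operator Deployment re-creates it automatically
oc get pods -n ibm-spectrum-fusion-ns | grep isf-data-protection-operator-controller-manager
oc delete pod <pod_name> -n ibm-spectrum-fusion-ns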
Policy assignment from Backup & Restore service page of the OpenShift Container Platform console
- Problem statement
- In the Backup & Restore service page of the OpenShift Container Platform console, the backup policy assignment to an application fails with a gateway timeout error.
- Resolution
- Use the IBM Fusion user interface instead.
Backup attempt of multiple VMs fails
- Problem statement
- This issue occurs when some VMs are in a migrating state. The OpenShift Container Platform does not support snapshots of VMs that are in a migrating state.
- Resolution
- Follow the steps to resolve this issue:
- Check whether the virtual machine is in a migrating state:
- Run the following command to check for migrating VMs:
oc get virtualmachineinstancemigrations -A
Example output:
NAMESPACE            NAME                                          PHASE         VMI
fb-bm1-fs-1-5g-10    rhel8-lesser-wildcat-migration-8fhbo          Failed        rhel8-lesser-wildcat
vm-centipede-bm2     centos-stream9-chilly-hawk-migration-57jyk    Failed        centos-stream9-chilly-hawk
vm-centos9-bm1-1     centos-stream9-instant-toad-migration-bfyz6   Failed        centos-stream9-instant-toad
vm-centos9-bm1-1     centos-stream9-instant-toad-migration-d9547   Failed        centos-stream9-instant-toad
vm-windows10-bm2-1   kubevirt-workload-update-4dm57                Failed        win10-zealous-unicorn
vm-windows10-bm2-1   kubevirt-workload-update-f2s5w                Failed        win10-zealous-unicorn
vm-windows10-bm2-1   kubevirt-workload-update-gt6nj                Failed        win10-zealous-unicorn
vm-windows10-bm2-1   kubevirt-workload-update-rjwmn                Failed        win10-zealous-unicorn
vm-windows10-bm2-1   kubevirt-workload-update-vfxfl                TargetReady   win10-zealous-unicorn
vm-windows10-bm2-1   kubevirt-workload-update-z2thw                Failed        win10-zealous-unicorn
vm-windows11-bm2-1   kubevirt-workload-update-9gr6v                Failed        win11-graceful-coyote
vm-windows11-bm2-1   kubevirt-workload-update-clbck                Failed        win11-graceful-coyote
vm-windows11-bm2-1   kubevirt-workload-update-j6pmx                Failed        win11-graceful-coyote
vm-windows11-bm2-1   kubevirt-workload-update-sfbbx                Pending       win11-graceful-coyote
vm-windows11-bm2-1   kubevirt-workload-update-th5dd                Failed        win11-graceful-coyote
vm-windows11-bm2-1   kubevirt-workload-update-zl679                Failed        win11-graceful-coyote
vm-windows11-bm2-2   kubevirt-workload-update-7dp6g                Failed        win11-conservative-moth
vm-windows11-bm2-2   kubevirt-workload-update-9nb9m                TargetReady   win11-conservative-moth
vm-windows11-bm2-2   kubevirt-workload-update-cdrf5                Failed        win11-conservative-moth
vm-windows11-bm2-2   kubevirt-workload-update-dm8fz                Failed        win11-conservative-moth
vm-windows11-bm2-2   kubevirt-workload-update-kwr6c                Failed        win11-conservative-moth
vm-windows11-bm2-2   kubevirt-workload-update-zt8wx                Failed        win11-conservative-moth
- Exclude the migrating virtual machine from the backup. Reattempt it after the migration is complete.
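To quickly identify migrations that are still in progress, and therefore the virtual machines to exclude, you can filter out the terminal phases. This small sketch assumes that grep is available on your workstation:
oc get virtualmachineinstancemigrations -A --no-headers | grep -Ev 'Succeeded|Failed'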
Backup applications table does not show the new backup times for the backed-up applications
- Problem statement
- The backup applications table does not show the new backup times for the backed-up applications.
- Resolution
- Go to the Applications and Jobs view to see the last successful backup job for a given application. For applications on the hub, the Applications table has the correct last backup time.
Backups are failing for the virtual machines
- Problem statement
- Backups and snapshots fail for virtual machines that are mounted with a second disk.
- Resolution
-
- Run the following command to get disk details for the virtual machine:
oc get virtualmachine -A -o json | jq '.items[] | [{name:.metadata.name, namespace:.metadata.namespace, volumes:.spec.template.spec.volumes}] | select(.[].volumes[].dataVolume | length > 1) | {name :.[].name, namespace:.[].namespace}'
Example output:
{ "name": "rhel9-absent-basilisk", "namespace": "vmtesting" }
- If a virtual machine is mounted with a second disk, follow the steps in the Red Hat solution to resolve the issue.
Known issues and limitations
- The OpenShift Container Platform cluster can have problems and become unusable. After you recover the cluster, rejoin the connections. For the steps to clean the connection and set up the connection between the two clusters again, see Connection setup after OpenShift Container Platform cluster recovery.
- The S3 bucket must not have an expiration policy or an archive rule. For more information about this known issue, see S3 buckets must not enable expiration policies.
- The Azure Endpoint URL must not contain the name of the bucket.
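For example, an Azure endpoint of the following form, with no bucket or container name appended, is the expected format; the storage account name is a placeholder:
https://<storage_account>.blob.core.windows.net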