Backup & Restore service installation and upgrade issues
Use this troubleshooting information to resolve installation and upgrade problems that are related to the Backup & Restore service.
Fresh installation of Backup & Restore may hang or fail
- Cause
- The isf-prereq operator deletes the OADP operator artifacts during installation, which creates a race condition.
- Symptoms
- Check the Installed Operators in the ibm-backup-restore namespace for any of the following issues:
- OADP is not installed
- An error in the OADP operator
- An error in the OADP subscription
- guardian-dm-operator or guardian-dp-operator in failed state
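The same symptoms can also be checked from the command line. The following is a minimal sketch, assuming that you are already logged in with oc; the function name show_operator_state is hypothetical and not part of the product.

```shell
# Sketch: show the operator installation state in the Backup & Restore namespace.
# Assumption: `oc` is logged in to the cluster. The function name is illustrative.
show_operator_state() {
  ns="${1:-ibm-backup-restore}"
  # Phase of each ClusterServiceVersion, for example Succeeded or Failed.
  oc get csv -n "$ns" -o custom-columns=NAME:.metadata.name,PHASE:.status.phase
  # OLM subscriptions, fully qualified to avoid clashes with other APIs.
  oc get subscriptions.operators.coreos.com -n "$ns"
}
```

Run show_operator_state without arguments to use the default namespace. A missing OADP entry or a CSV whose phase is not Succeeded corresponds to the symptoms listed above.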
- Resolution
- Important:
- This is applicable only for the fresh installation of IBM Storage Fusion Backup & Restore 2.8.1 on OpenShift® Container Platform 4.15 and earlier. Do not apply this patch if you are upgrading from an earlier version.
- Backup & Restore 2.8.1 is not supported on OpenShift Container Platform 4.16.
- Download and run uninstall-backup-restore.sh to clean the failed installation.
- Download and run br_pre_install_patch281.sh.
- Install Backup & Restore service (hub or spoke).
Note: After Backup & Restore is installed successfully, run the following oc command to patch the spectrumfusion CR:
oc -n <FusionNS> patch --type json --patch='[ { op: remove, path: /spec/configuration/services/skipOnBoardingServices } ]' spectrumfusion <SpectrumFusionCR>
- Download and run br-post-install-patch281.sh to update the supported OADP version in the cluster service version.
Restore fails on IBM Power Systems
Restore failures are observed in an IBM Power Systems environment. If you have clusters that run on IBM Power Systems, do not upgrade the Backup & Restore service to 2.8.1. Instead, contact IBM Support.
Application information not repopulated after service reinstallation
- Problem statement
- The application information is removed during Backup & Restore service uninstallation and is not repopulated in MongoDB after the service is reinstalled. Restore job validation fails with the following error message:
postJobRequest: Application Info not returned from application service for applicationId
- Resolution
- The workaround is to restart the isf-application-operator-controller-manager pod in the IBM Storage Fusion namespace.
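One way to restart the pod is to delete it so that its controller re-creates it. The following is a minimal sketch, assuming that the IBM Storage Fusion namespace is ibm-spectrum-fusion-ns and that the pod name starts with isf-application-operator-controller-manager; the function name is hypothetical.

```shell
# Sketch: restart the application operator pod by deleting it.
# Assumptions: `oc` is logged in, and the namespace defaults to
# ibm-spectrum-fusion-ns. Pass another namespace as the first argument.
restart_app_operator() {
  ns="${1:-ibm-spectrum-fusion-ns}"
  # List pods as pod/<name>, keep only the application operator pod,
  # and delete each match so that a replacement pod is scheduled.
  oc -n "$ns" get pods -o name |
    grep 'isf-application-operator-controller-manager' |
    while read -r pod; do
      oc -n "$ns" delete "$pod"
    done
}
```

After the replacement pod is running, retry the restore job.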
Backup & Restore server installation displays an Invalid Input error
- Problem statement
- The Backup & Restore service deployment fails with the following error in the IBM Storage Fusion user interface:
Install Error: Invalid Input: Both the api Server and bootstrapToken fields need to be populated
- Resolution
- Retry the installation from a private browser window.
Backup & Restore hub service in a custom namespace
- Problem statement
- In general, the Backup & Restore service is installed or upgraded in the default ibm-backup-restore namespace. To avoid issues during the installation or upgrade of the service in a custom namespace, do the resolution steps.
- Resolution
-
- Go to .
- Open the ibm-backup-restore-service CR and go to the YAML tab.
- In spec.onboarding.parameters, search for the parameter with the name namespace and change the defaultValue to the custom namespace where the service must be installed.
Example:
parameters:
  - dataType: string
    name: namespace
    defaultValue: <custom-namespace>
    userInterface: false
    required: true
    descriptionCode: BMYSRV00003
    displayNameCode: BMYSRV00004
- Store the following YAML in server-fsi.yaml, and replace <custom-namespace> with the namespace where the service must be installed.
apiVersion: service.isf.ibm.com/v1
kind: FusionServiceInstance
metadata:
  name: ibm-backup-restore-service-instance
  namespace: ibm-spectrum-fusion-ns
spec:
  parameters:
    - name: doInstall
      provided: false
      value: 'true'
    - name: namespace
      provided: false
      value: <custom-namespace>
    - name: storageClass
      provided: true
      value: lvms-lmvg
  serviceDefinition: ibm-backup-restore-service
  triggerUpdate: false
  enabled: true
  doInstall: true
- Run the following command to apply the changes:
oc apply -f server-fsi.yaml
- For the upgrade procedure, upgrade the IBM Storage Fusion operator, repeat step 1 of the resolution, and then upgrade the service.
Pods in CrashLoopBackOff state after upgrade
- Problem statement
- The Backup & Restore service health changes to unknown and two pods go into CrashLoopBackOff state.
- Resolution
- In the resource settings of the guardian-dp-operator pod in the ibm-backup-restore namespace, set the memory limit to 1000Mi.
Example:
resources:
  limits:
    cpu: 1000m
    memory: 1000Mi
  requests:
    cpu: 500m
    memory: 250Mi
storage-operator and isf-data-protection-operator-controller-manager pods crash with CrashLoopBackOff status
- Problem statement
- The storage-operator and isf-data-protection-operator-controller-manager pods crash due to an OOM error.
- Resolution
- To resolve the error, increase the memory limit:
- Log in to the Red Hat® OpenShift web console as an administrator.
- Select the project ibm-spectrum-fusion-ns.
- Go to , click IBM Storage Fusion, and go to the YAML tab.
- Change the memory limits:
isf-storage-operator:
- Search for control-plane: isf-storage-operator in the YAML file.
- In the YAML, change the memory limit for the isf-storage-operator container from 300Mi to 500Mi.
containers:
  - resources:
      limits:
        cpu: 100m
        memory: 500Mi
- Click Save.
- Wait until a new isf-storage-operator pod comes up.
isf-data-protection-operator-controller-manager:
- Search for control-plane: isf-data-protection-operator-controller-manager in the YAML file.
- In the YAML, change the memory limit for isf-storage-operator container from 500Mi to 1000Mi.
containers:
  - resources:
      limits:
        cpu: 500m
        memory: 1000Mi
- Click Save.
- Wait until a new isf-data-protection-operator-controller-manager pod comes up.
Backup & Restore service goes into unknown state
- Problem statement
- The Backup & Restore service might go into an unknown state when you upgrade IBM Storage Fusion from 2.6.x to 2.8.0.
- Resolution
- The service automatically shows as healthy in the IBM Storage Fusion user interface after you upgrade the Backup & Restore service.
MongoDB pod crashes with CrashLoopBackOff status
- Problem statement
- The MongoDB pod crashes due to an OOM error.
- Resolution
- To resolve the error, increase the memory limit from 256Mi to 512Mi. Do the following steps to change the memory limit:
- Log in to the Red Hat OpenShift web console as an administrator.
- Go to .
- Select the project ibm-backup-restore.
- Select the MongoDB pod, and go to the YAML tab.
- In the YAML, change the memory limit for MongoDB container from 256Mi to 512Mi.
- Click Save.
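Before you raise the limit, it can help to confirm that the container was stopped for running out of memory. The following is a minimal sketch, assuming that you pass the MongoDB pod name yourself; the function name last_termination_reasons is hypothetical. A value of OOMKilled in the output indicates an out-of-memory termination.

```shell
# Sketch: print the last termination reason of every container in a pod.
# Assumptions: `oc` is logged in; the pod name is supplied by the caller.
last_termination_reasons() {
  ns="$1"
  pod="$2"
  # One line per container: <container name><TAB><reason of last termination>.
  oc -n "$ns" get pod "$pod" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}
```

For example, run last_termination_reasons ibm-backup-restore <mongodb-pod-name>, where <mongodb-pod-name> is the name shown by oc get pods -n ibm-backup-restore.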
Backup & Restore stuck at 5% during upgrade
- Diagnosis
- In the OpenShift user interface, go to Installed Operators and filter on the ibm-backup-restore namespace.
- Click IBM Storage Fusion Backup and Restore Server.
- Go to Subscriptions.
- If you see the following error, then do the workaround steps to resolve the issue:
"error validating existing CRs against new CRD's schema for "guardiancopyrestores.guardian.isf.ibm.com": error validating custom resource against new schema for GuardianCopyRestore"
- Resolution
- From the command line, log in to the cluster and run the following commands to delete the guardiancopyrestore CRs and the 2.6.0 CSV:
oc -n ibm-backup-restore delete guardiancopyrestore.guardian.isf.ibm.com --all
oc -n ibm-backup-restore delete csv guardian-dm-operator.v2.6.0
- From the OpenShift Container Platform console, go to Installed Operators and filter on the ibm-backup-restore namespace.
- Click IBM Storage Fusion Backup and Restore Server.
- Go to Subscriptions.
- Find the failing installplan and delete it.
- Go to Installed Operators and go to the ibm-backup-restore namespace.
- Find IBM Storage Fusion Backup and Restore Server, click Upgrade available, and approve the install plan for IBM Storage Fusion Backup and Restore Server.
- Wait for the service upgrade to resume.
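The subscription errors and install plans that the console shows can also be inspected from the command line. The following is a minimal sketch, assuming that you are logged in with oc; the function name show_upgrade_blockers is hypothetical.

```shell
# Sketch: show subscription condition messages and install plans.
# Assumption: `oc` is logged in; the namespace defaults to ibm-backup-restore.
show_upgrade_blockers() {
  ns="${1:-ibm-backup-restore}"
  # One line per subscription: <name><TAB><condition messages>.
  oc get subscriptions.operators.coreos.com -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[*].message}{"\n"}{end}'
  # Install plans with their approval state, to help locate the failing one.
  oc get installplan -n "$ns"
}
```

A condition message that mentions guardiancopyrestores.guardian.isf.ibm.com matches the error described in the diagnosis.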
Backup & Restore service installation gets stuck after upgrade to IBM Storage Fusion 2.7.0
- Problem statement
- If the installation or upgrade of the Backup & Restore service does not reach 100% completion, then check for failed startup probes.
- Diagnosis
- Run the following command to look for any pods that are not in the READY state:
oc get pods -n ibm-backup-restore
Example output:
NAME                              READY   STATUS    RESTARTS        AGE
applicationsvc-855746ffbf-2pmz7   0/1     Running   5 (2m8s ago)    17m
...
job-manager-845dc56b8d-r5w6j      0/1     Running   5 (2m50s ago)   17m
...
If found, then describe each of the pods and look at the event section to determine whether there is a startup probe failure.
Example output that shows probe failure:
Events:
Type     Reason     Age                    From     Message
----     ------     ----                   ----     -------
...
Warning  Unhealthy  3m40s (x5 over 6m40s)  kubelet  Startup probe failed: HTTP probe failed with statuscode: 503
If one or more pods show startup probe failures, then patch the probes to provide additional start time. To determine the respective deployment names, run the following command:
oc get deployment -n ibm-backup-restore
Example output:
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
...
applicationsvc   0/1     1            1           12h
...
job-manager      0/1     1            1           12h
...
Update the DEPLOYMENT_NAMES variable in the patch script with the list of deployments that have failing startup probes, and run the script to patch the probes.
Now, check whether the pods are online. If the startup probes continue to fail, then increase the STARTUP_DELAY_SEC parameter and try again.
If it is an upgrade and the pods come online, but the progress does not reach 100% in the IBM Storage Fusion user interface even after 30 minutes, then you must update the Backup & Restore Server CR to re-trigger the upgrade.
- Resolution
- Update the Backup & Restore Server CR to re-trigger the upgrade:
- Go to OpenShift Container Platform.
- Go to the Installed Operators view with the ibm-backup-restore project or namespace selected.
- Click IBM Storage Fusion Backup & Restore Server.
- Select the Data Protection Server tab and click the instance.
- Select the YAML tab and update the following:
triggerUpgrade: false
to
triggerUpgrade: true
- Save and reload the YAML.
After some time, all the pods must be READY and the Backup & Restore service must show healthy in the IBM Storage Fusion user interface.
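The diagnosis for this issue (find pods that are not READY, then look for startup probe failures in their events) can be scripted. The following is a minimal sketch, assuming the default column layout of oc get pods; the function names not_ready_pods and startup_probe_failures are hypothetical helpers, and the sketch is not the patch script that the diagnosis refers to.

```shell
# Sketch: filter `oc get pods` output down to pods that are not READY.
# Reads the table from stdin and prints one pod name per line.
not_ready_pods() {
  awk 'NR > 1 && $3 != "Completed" {
         # READY is shown as <ready>/<total>, for example 0/1.
         split($2, ready, "/")
         if (ready[1] != ready[2]) print $1
       }'
}

# Sketch: print the not-READY pods whose events show a startup probe failure.
# Assumption: `oc` is logged in; the namespace defaults to ibm-backup-restore.
startup_probe_failures() {
  ns="${1:-ibm-backup-restore}"
  oc get pods -n "$ns" | not_ready_pods | while read -r pod; do
    if oc describe pod "$pod" -n "$ns" < /dev/null | grep -q 'Startup probe failed'; then
      echo "$pod"
    fi
  done
}
```

Each pod name that startup_probe_failures prints belongs to a deployment that is a candidate for the DEPLOYMENT_NAMES list.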
Backup & Restore service install and upgrade issue with the IBM Storage Protect Plus catalog source
- Problem statement
- The backups can fail with a FailedValidation error after you install or upgrade the IBM Storage Fusion and Backup & Restore service to 2.8.0.
validationErrors': ['an existing backup storage location wasn't specified at backup creation time and the default guardian-minio wasn't found. Error: BackupStorageLocation.velero.io "guardian-minio" not found'
It occurs when the IBM Storage Protect Plus catalog exists in the OpenShift Container Platform clusters.
- Resolution
- The workaround is to upgrade OADP, which is installed in the ibm-backup-restore namespace, to version 1.3.
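To see which OADP version is installed and whether the guardian-minio backup storage location exists, a check along the following lines can be used. It is a minimal sketch, assuming that the OADP CSV name contains the string oadp; the function name check_oadp is hypothetical.

```shell
# Sketch: show the installed OADP version and the default backup storage location.
# Assumptions: `oc` is logged in, and the OADP CSV name contains "oadp".
check_oadp() {
  ns="${1:-ibm-backup-restore}"
  # Name and version of the OADP ClusterServiceVersion.
  oc get csv -n "$ns" -o custom-columns=NAME:.metadata.name,VERSION:.spec.version |
    grep -i 'oadp' || echo "no OADP CSV found in $ns"
  # The location that the FailedValidation error reports as missing.
  oc get backupstoragelocations.velero.io guardian-minio -n "$ns"
}
```

If the second command reports that guardian-minio is not found, the cluster is in the state that the error message describes.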