Backup & Restore service installation and upgrade issues
Use this troubleshooting information to resolve install and upgrade problems that are related to Backup & Restore service.
Fresh installation of Backup & Restore may hang or fail
- Cause
- It is because the
isf-prereq
operator deletes the OADP operator artifacts during installation, which creates a race condition.
- Symptoms
- Check the Installed Operators in the
ibm-backup-restore
namespace for any of the following issues:- OADP is not installed
- An error in OADP operator
- An error in OAPD subscription
guardian-dm-operator
orguardian-dp-operator
in failed state.
- Resolution
-
Important:
- This is applicable only for the fresh installation of IBM Storage Fusion Backup & Restore 2.8.1 on OpenShift® Container Platform 4.15 and earlier. Do not apply this patch if you are upgrading from an earlier version.
- Backup & Restore 2.8.1 is not supported on OpenShift Container Platform 4.16.
- Download and run uninstall-backup-restore.sh to clean the failed installation.
- Download and run br_pre_install_patch281.sh.
- Install Backup & Restore service (hub or spoke).
Note: After the Backup & Restore is installed successfully, run the following oc command to patchspectrumfusion
CR:oc -n <FusionNS> patch --type json --patch='[ { op: remove, path: /spec/configuration/services/skipOnBoardingServices } ]' spectrumfusion <SpectrumFusionCR>
Download and run br-post-install-patch281.sh to update the supported OADP version in the cluster service version.
Application information not repopulated after service reinstallation
- Problem statement
- The application information gets removed during Backup & Restore service uninstallation. It does not get
repopulated in MongoDB after the service reinstallation. Restore jobs validation fail with the
following error
message.
postJobRequest: Application Info not returned from application service for applicationId
- Resolution
- The workaround is to restart the
isf-application-operator-controller-manager
pod in the IBM Storage Fusion namespace.
Backup & Restore server installation displays an Invalid Input error
- Problem statement
-
The Backup & Restore service deployment fails with the following error in the IBM Storage Fusion user interface:
Install Error: Invalid Input: Both the api Server and bootstrapToken fields need to be populated
- Resolution
- As a resolution, try with a private browser window.
Backup & Restore hub service in a custom namespace
- Problem statement
- In general, the Backup & Restore service is installed
or upgraded to the default
ibm-backup-restore
namespace. To avoid issues during the installation or upgrade of the service with custom namespace, do the resolution steps.
- Resolution
-
- Go to .
- Open the
ibm-backup-restore-service
CR and go to the YAML tab. - In spec.onboarding.parameters, search for parameter with name as
namespace and change the defaultValue to the
custom-namespace where the service must be installed.Example:
parameters: - dataType: string name: namespace defaultValue: <custom-namespace> userInterface: false required: true descriptionCode: BMYSRV00003 displayNameCode: BMYSRV00004
- Store the following YAML in server-fsi.yaml, and replace
<custom-namespace>
with the namespace where the service must be installed.apiVersion: service.isf.ibm.com/v1 kind: FusionServiceInstance metadata: name: ibm-backup-restore-service-instance namespace: ibm-spectrum-fusion-ns spec: parameters: - name: doInstall provided: false value: 'true' - name: namespace provided: false value: <custom-namespace> - name: storageClass provided: true value: lvms-lmvg serviceDefinition: ibm-backup-restore-service triggerUpdate: false enabled: true doInstall: true
- Run the following command to apply the changes:
oc apply -f server-fsi.yaml
- For the upgrade procedure, upgrade the IBM Storage Fusion operator, repeat step 1 of the resolution, and then upgrade the service.
Pods in Crashloopbackoff state after upgrade
- Problem statement
- The Backup & Restore service health changes to unknown and two pods go into Crashlookbackoff state.
- Resolution
-
In the resource settings of
guardian-dp-operator
pod that is inibm-backup-restore
namespace, set the value of IBM Storage Fusion operator memory limits to 1000 mi.Example:resources: limits: cpu: 1000m memory: 1000Mi requests: cpu: 500m memory: 250Mi
storage-operator and isf-data-protection-operator-controller-manager pods crash with
CrashLoopBackOff
status
- Problems statement
-
The
storage-operator
andisf-data-protection-operator-controller-manager
crashes due to OOM error.
- Resolution
- To resolve the error, increase the memory limit:
- Log in to the Red Hat® OpenShift web console as an administrator.
- Select the project
ibm-spectrum-fusion-ns
. - Go to IBM Storage Fusion, and go to the YAML tab. , click
- Change the memory limits:
isf-storage-operator
:- Search for
control-plane: isf-storage-operator
in the YAML file. - In the YAML, change the memory limit for
isf-storage-operator
container from 300Mi to 500Mi.containers: - resources: limits: cpu: 100m memory: 500Mi
- Click Save.
- Wait until a new
isf-storage-operator
pod comes up.
isf-data-protection-operator-controller-manager
:- Search for
control-plane: isf-data-protection-operator-controller-manager
in the YAML file. - In the YAML, change the memory limit for
isf-storage-operator
container from 500Mi to 1000Mi.containers: - resources: limits: cpu: 500m memory: 1000Mi
- Click Save.
- Wait until a new
isf-data-protection-operator-controller-manager
pod comes up.
- Search for
Backup & Restore service goes into unknown state
- Problem statement
- This Backup & Restore service might go into unknown state when you upgrade IBM Storage Fusion from 2.6.x to 2.8.0.
- Resolution
- It automatically shows healthy on the IBM Storage Fusion user interface after you upgrade the Backup & Restore service.
MongoDB pod crashes with CrashLoopBackOff
status
- Problem statement
- The MongoDB pod crashes due to an OOM error.
- Resolution
- To resolve the error, increase the memory limit from 256Mi to 512Mi. Do the following steps to
change the memory limit:
- Log in to the Red Hat OpenShift web console as an administrator.
- Go to .
- Select the project
ibm-backup-restore
. - Select the MongoDB pod, and go to the YAML tab.
- In the YAML, change the memory limit for MongoDB container from 256Mi to 512Mi.
- Click Save.
Backup & Restore stuck at 5% during upgrade
- Diagnosis
-
- In the OpenShift user interface, go to
the Installed Operator and filter on
ibm-backup-restore
namespace. - Click
IBM Storage Fusion Backup and Restore Server
. - Go to Subscriptions.
- If you see the following errors, then do the workaround steps to resolve the issue:
"error validating existing CRs against new CRD's schema for "guardiancopyrestores.guardian.isf.ibm.com": error validating custom resource against new schema for GuardianCopyRestore"
- In the OpenShift user interface, go to
the Installed Operator and filter on
- Resolution
-
- From command line or command prompt, log in to the cluster and run the following commands to
delete the
guardiancopyrestore
CRs and 2.6.0 cs:oc -n ibm-backup-restore delete guardiancopyrestore.guardian.isf.ibm.com --all oc -n ibm-backup-restore delete csv guardian-dm-operator.v2.6.0
- From the OpenShift Container Platform console, go to
Installed Operator and filter on
ibm-backup-restore
namespace. - Click
IBM Storage Fusion Backup and Restore Server
. - Go to Subscriptions.
- Find the failing
installplan
and delete it. - Go to Installed Operator and go to
ibm-backup-restore
namespace. - Find
IBM Storage Fusion Backup and Restore Server
and click Upgrade available and approve the Install plan forIBM Storage Fusion Backup and Restore Server
. - Wait for the service upgrade to resume.
- From command line or command prompt, log in to the cluster and run the following commands to
delete the
Backup & Restore service installation gets stuck after upgrade to IBM Storage Fusion 2.7.0
- Problem statement
- If the installation or upgrade of the Backup & Restore service does not reach 100% completion, then check for failed startup probes. Run the following
command to look for any pods that are not in the READY
state:
Example output:oc get pods -n ibm-backup-restore
NAME READY STATUS RESTARTS AGE applicationsvc-855746ffbf-2pmz7 0/1 Running 5 (2m8s ago) 17m ... job-manager-845dc56b8d-r5w6j 0/1 Running 5 (2m50s ago) 17m ...
- Go to OpenShift Container Platform.
- Go to the Installed Operators view with the selected
ibm-backup-restore
project or namespace. - Click IBM Storage Fusion Backup & Restore Server.
- Select the Data Protection Server tab and click the instance.
- Select the YAML tab and update the
following:
totriggerUpgrade: false
triggerUpgrade: true
- Save and reload the YAML.
After some time, all the pods must be READY and the Backup & Restore service must show healthy in the IBM Storage Fusion user interface.
- Diagnosis
If found, then describe each of the pods and look at the event section to determine whether there is a startup probe failure.
Example output that shows probe failure:
Events: Type Reason Age From Message ---- ------ ---- ---- ------- ... Warning Unhealthy 3m40s (x5 over 6m40s) kubelet Startup probe failed: HTTP probe failed with statuscode: 503
If one or more pods show startup probe failures, then patch the probes to provide additional start time. Update the following script and run to patch the probes. To determine the respective deployment names, update the DEPLOYMENT_NAMES variable with the list of deployments that have failing startup probes and run the script.oc get deployment -n ibm-backup-restore
Example output:
NAME READY UP-TO-DATE AVAILABLE AGE ... applicationsvc 0/1 1 1 12h ... job-manager 0/1 1 1 12h ...
If it is an upgrade and the pods come online, but the progress does not reach 100% in the IBM Storage Fusion user interface even after 30 minutes, then you must update the Backup & Restore Server CR to re-trigger the upgrade.
- Resolution
- Update the Backup & Restore Server CR to re-trigger the upgrade
Backup & Restore service install and upgrade issue with the IBM Storage Protect Plus catalog source
- Problem statement
- The backups can fail with a
FailedValidation
error after you install or upgrade the IBM Storage Fusion and Backup & Restore service to 2.8.0.validationErrors': ['an existing backup storage location wasn't specified at backup creation time and the default guardian-minio wasn't found.
Error: BackupStorageLocation.velero.io "guardian-minio" not found'
It occurs when the IBM Storage Protect Plus catalog exists in the OpenShift Container Platform clusters.
- Resolution
- The workaround is to upgrade the OADP, installed in
the
ibm-backup-restore
namespace, to version 1.3.