Troubleshooting upgrade issues
Troubleshooting IBM Fusion HCI upgrade issues.
Install strategy fails for Fusion Operator after cluster upgrade to Red Hat OpenShift Container Platform 4.15.3
- Problem statement
- IBM Fusion HCI 2.7.2 is upgraded to 2.8.0 on Red Hat® OpenShift® Container Platform 4.14.x. If Red Hat OpenShift Container Platform is then upgraded to 4.15.2 or higher in this setup, the Fusion operator status in OperatorHub fails with the following error:
install strategy failed: rolebindings.rbac.authorization.k8s.io "isf-update-operator-controller-manager-service-auth-reader" already exists
- Cause
- The error occurs because of a known Red Hat OpenShift Container Platform issue. For more information about the issue, see https://issues.redhat.com/projects/OCPBUGS/issues/OCPBUGS-32311?filter=allopenissues.
- Resolution
- Run the following command to check all the existing role bindings:
oc get csv isf-operator.v2.8.0 -n ibm-spectrum-fusion-ns -o json | jq '.status.conditions[].message'
Sample output of the command:
"all requirements found, attempting install"
"install strategy failed: rolebindings.rbac.authorization.k8s.io \"isf-application-operator-controller-manager-service-auth-reader\" already exists"
"webhooks not installed"
"all requirements found, attempting install"
"install strategy failed: rolebindings.rbac.authorization.k8s.io \"isf-application-operator-controller-manager-service-auth-reader\" already exists"
"webhooks not installed"
"all requirements found, attempting install"
"install strategy failed: rolebindings.rbac.authorization.k8s.io \"isf-application-operator-controller-manager-service-auth-reader\" already exists"
- Back up the YAML of each reported role binding:
oc get rolebinding isf-application-operator-controller-manager-service-auth-reader -n kube-system -o yaml > isf-application-operator-controller-manager-service-auth-reader_rb.yaml
- Run the following command to delete each reported role binding:
oc delete rolebinding isf-application-operator-controller-manager-service-auth-reader -n kube-system
- Repeat steps 1, 2, and 3 until the IBM Fusion HCI operator CSV reports Healthy and the Fusion operator status shows Succeeded.
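The backup and delete steps above can be scripted when several role bindings are reported. The following is a minimal sketch and not an IBM-provided script: the function names `extract_rolebindings` and `backup_and_delete_rolebindings` are hypothetical, and it assumes that the reported role bindings are in the `kube-system` namespace, as in the example commands.

```shell
#!/bin/sh
# Sketch only; the helper names are hypothetical and not part of IBM Fusion.

# Read CSV condition messages on stdin and print, once each, the name of
# every role binding that is reported as "already exists".
# Accepts both raw quotes (jq -r) and escaped quotes (plain jq output).
extract_rolebindings() {
  sed -n 's/.*rolebindings\.rbac\.authorization\.k8s\.io \\\{0,1\}"\([^"\\]*\)\\\{0,1\}" already exists.*/\1/p' | sort -u
}

# Read role binding names on stdin; save each one to <name>_rb.yaml in the
# current directory and then delete it.
# Assumption: the role bindings are in kube-system, as in the steps above.
backup_and_delete_rolebindings() {
  while read -r rb; do
    oc get rolebinding "$rb" -n kube-system -o yaml > "${rb}_rb.yaml" < /dev/null || return 1
    oc delete rolebinding "$rb" -n kube-system < /dev/null || return 1
  done
}
```

With these functions loaded, one pass over the reported role bindings might look like `oc get csv isf-operator.v2.8.0 -n ibm-spectrum-fusion-ns -o json | jq -r '.status.conditions[].message' | extract_rolebindings | backup_and_delete_rolebindings`. The backup runs before the delete, and the loop stops at the first failure so that no role binding is deleted without a saved copy.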
catalogsource isf-catalog does not get updated with status
- Problem statement
- If you upgrade from 4.14 to 4.15, the catalog source is fusion-catalog and not isf-catalog, so the isf-catalog catalog source does not get updated with a status.
- Resolution
- Log in to the OpenShift Container Platform console as a cluster administrator.
- Create a new CatalogSource by using the YAML editor.
Sample catalogsource YAML for online upgrade:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: fusion-catalog
  namespace: openshift-marketplace
spec:
  displayName: IBM Fusion Catalog
  image: 'icr.io/cpopen/isf-operator-catalog:2.8.0-linux.amd64'
  publisher: IBM
  sourceType: grpc
Sample catalogsource YAML for offline upgrade:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: fusion-catalog
  namespace: openshift-marketplace
spec:
  displayName: IBM Fusion Catalog
  image: '$TARGET_PATH/isf-operator-catalog:2.8.0-linux.amd64'
  publisher: IBM
  sourceType: grpc
- Save the YAML.
- Confirm that the CatalogSource 'fusion-catalog' is in Ready state:
- Go to .
- Change the namespace to openshift-marketplace.
- In Resources, find CatalogSource.
- From the list, select fusion-catalog. The Details tab opens by default.
- Confirm that the status is Ready in the Details page.
- Go to the Installed Operators page and make sure that you select the ibm-spectrum-fusion-ns project.
- From the Installed Operators list, select the IBM Fusion operator that is on version 2.7.2. The Details tab opens by default.
- Go to the Subscription tab and check whether the Update approval is Manual or Automatic. If it is Automatic, change the Update approval to Manual.
- In the Update approval section, click the edit icon and change the channel value to v2.0.
- Go to Actions and select Edit Subscription.
- In the YAML tab, update the value of source in the Spec section to fusion-catalog.
- Save the YAML.
- Proceed with step 8 of the Upgrading IBM Fusion HCI management software topic.
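The readiness of the new catalog source can also be checked from the command line instead of the console. The sketch below assumes that the Ready status in the console corresponds to the `status.connectionState.lastObservedState` field of the CatalogSource reporting `READY`; the function names, the attempt count, and the delay are hypothetical choices.

```shell
#!/bin/sh
# Sketch only; function names and default values are hypothetical.

# Print the last observed connection state of a CatalogSource in the
# openshift-marketplace namespace (for example, READY).
catalog_state() {
  oc get catalogsource "$1" -n openshift-marketplace \
    -o jsonpath='{.status.connectionState.lastObservedState}'
}

# Poll until the CatalogSource reports READY or the attempts run out.
# $1 = CatalogSource name, $2 = attempts (default 30), $3 = delay in seconds (default 10)
wait_for_catalog() {
  name=$1
  attempts=${2:-30}
  delay=${3:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    [ "$(catalog_state "$name")" = "READY" ] && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_catalog fusion-catalog` returns 0 as soon as the catalog source is ready and returns 1 if it is still not ready after the last attempt.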
Restore failures post upgrade
- Problem statement
- After the upgrade, you might encounter some restore failures. Check whether the job logs of the failed restore jobs contain the following error:
Invalid value: true: Privileged containers are not allowed, spec.containers[0].securityContext.capabilities.add: Invalid value: "SYS_ADMIN": capability may not be added, provider "nonroot":
- Resolution
- Contact IBM Support to resolve this known issue.
Community operator catalog is shown as missing
- Resolution
- If the Community operator catalog is shown as missing, then create it before you attempt upgrade.
Machine config roll out error
- Resolution
- If an operation causes a machine config roll out that stays stuck for a long time, check whether the node to be updated responds to ping and has an IP address after restart. If any DHCP or network issues prevent the node from getting a hostname, fix them and restart the node.
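One way to find the nodes to check is to list the nodes that are not Ready together with their internal IP addresses. The sketch below assumes the default column layout of `oc get nodes -o wide` (STATUS in column 2 and INTERNAL-IP in column 6); the function name `not_ready_nodes` is hypothetical.

```shell
#!/bin/sh
# Sketch only; the function name is hypothetical.

# Read `oc get nodes -o wide --no-headers` output on stdin and print
# "name internal-ip" for every node whose STATUS does not start with Ready.
# A cordoned but healthy node (Ready,SchedulingDisabled) is not listed.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1, $6 }'
}
```

For example, `oc get nodes -o wide --no-headers | not_ready_nodes` prints the candidates, and each reported address can then be checked with `ping`.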
On-demand backup failures post upgrade
- Problem statement
- Post upgrade, on-demand backup failures might happen for existing applications.
- Resolution
- Do the following manual steps after you upgrade to avoid this problem:
- Run the following command to display the phase status of the backup policies associated with all your applications:
oc get fpa -A
Example output:
NAMESPACE                NAME                                  PROVIDER     APPLICATION           BACKUPPOLICY      DATACONSISTENCY   PHASE             LASTBACKUPTIMESTAMP   CAPACITY
ibm-spectrum-fusion-ns   deptest2-azure-hourly-30              isf-ibmspp   deptest2              azure-hourly-30                     Assigned          66m                   <no value>
ibm-spectrum-fusion-ns   new-generic-1-azure-hourly-45         isf-ibmspp   new-generic-1         azure-hourly-45                     Assigned          21m                   <no value>
ibm-spectrum-fusion-ns   new-mongo-project-1-azure-hourly-15   isf-ibmspp   new-mongo-project-1   azure-hourly-15                     InitializeError   81m                   <no value>
ibm-spectrum-fusion-ns   new-mongo-project-azure-hourly-30     isf-ibmspp   new-mongo-project     azure-hourly-30                     Assigned          66m                   <no value>
- Verify whether your policyassignment CR corresponds to any application in the InitializeError phase. In this example, the new-mongo-project-1 application is in the InitializeError phase.
- Log in to the IBM Fusion HCI user interface.
- Go to tab.
- Unassign the backup policy that is assigned to the application in the InitializeError phase and wait for the unassignment to complete. In this example, unassign the azure-hourly-15 policy from the new-mongo-project-1 application.
- Reassign the backup policy.
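When many policy assignments exist, the rows in the InitializeError phase can be filtered out of the command output. The sketch below is one way to do that; the function name `fpa_in_initialize_error` is hypothetical. It matches the phase by value rather than by column position, because an empty DATACONSISTENCY column shifts the whitespace-separated fields.

```shell
#!/bin/sh
# Sketch only; the function name is hypothetical.

# Read `oc get fpa -A` output on stdin and print "namespace name" for every
# policy assignment whose row contains the InitializeError phase.
fpa_in_initialize_error() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i == "InitializeError") { print $1, $2; break }
    }
  }'
}
```

For example, `oc get fpa -A | fpa_in_initialize_error` lists the policy assignments to unassign and reassign in the IBM Fusion HCI user interface.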
ImagePull failure
- Resolution
- If an ImagePull failure occurs due to an intermittent network or registry issue during an upgrade, then restart the pod and retry. If the issue persists, contact IBM Support.
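Restarting the affected pods can be sketched as follows. This assumes that the pods are managed by a controller such as a Deployment or StatefulSet, so that a deleted pod is recreated and the image pull is retried; the function names are hypothetical, and the namespace is passed as an argument.

```shell
#!/bin/sh
# Sketch only; function names are hypothetical.

# Read `oc get pods --no-headers` output for a single namespace on stdin and
# print the names of pods whose STATUS shows an image pull failure.
image_pull_failures() {
  awk '$3 ~ /ErrImagePull|ImagePullBackOff/ { print $1 }'
}

# Delete the failing pods in namespace $1 so that their controller
# recreates them and the image pull is retried.
restart_image_pull_failures() {
  oc get pods -n "$1" --no-headers | image_pull_failures |
    while read -r pod; do
      oc delete pod "$pod" -n "$1" < /dev/null
    done
}
```

For example, `restart_image_pull_failures ibm-spectrum-fusion-ns` restarts only the pods in that namespace that currently report `ErrImagePull` or `ImagePullBackOff`.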
DeadlineExceeded error
- Problem statement
- IBM Cloud Paks foundational services operator ClusterServiceVersion (CSV) status shows Failed and its InstallPlan status shows Failed after the subscription gets created.
- Resolution
- If you notice that the operator installation or upgrade fails with a DeadlineExceeded error, see Operator installation or upgrade fails with DeadlineExceeded error.
Operator OOMKilled error in IBM Fusion namespace
- Problem statement
- The pods go into crash loop state with the OOMKilled error after OpenShift Container Platform upgrade.
- Resolution
- Follow steps 1 through 4 below to address the OOMKilled error related to the isf-update-operator:
- Go to the IBM Fusion clusterserviceversion object ( tab).
- Search for the deployment name of the isf-update-operator (isf-update-operator-controller-manager) in the list of deployments in the clusterserviceversion object under spec.install.spec.deployments.
- In that deployment object, search for the container named manager under spec.template.spec.containers and increase the memory limit in resources.limits.memory.
- After you change the limits in the IBM Fusion clusterserviceversion, the update operator pod restarts with the new limits.
- If the OOMKilled issue persists, follow steps 1 - 4 again.
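Before and after editing the clusterserviceversion, the current memory limit can be read from the same paths that the steps above name. The sketch below assumes that `jq` is installed; the function name `update_operator_memory_limit` is hypothetical.

```shell
#!/bin/sh
# Sketch only; the function name is hypothetical.

# Read a ClusterServiceVersion as JSON on stdin and print the memory limit of
# the manager container in the isf-update-operator-controller-manager
# deployment (spec.install.spec.deployments).
update_operator_memory_limit() {
  jq -r '.spec.install.spec.deployments[]
         | select(.name == "isf-update-operator-controller-manager")
         | .spec.template.spec.containers[]
         | select(.name == "manager")
         | .resources.limits.memory'
}
```

For example, `oc get csv isf-operator.v2.8.0 -n ibm-spectrum-fusion-ns -o json | update_operator_memory_limit` prints the limit that is currently set; the CSV name here is the one used earlier in this topic and might differ on your cluster.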