Troubleshooting upgrade issues
Troubleshooting IBM Fusion HCI System upgrade issues.
Install strategy fails for Fusion Operator after cluster upgrade to Red Hat OpenShift Container Platform 4.15.3
- Problem statement
- IBM Fusion HCI System 2.7.2 is upgraded to 2.8.0 on
Red Hat® OpenShift® Container Platform 4.14.x. If Red Hat OpenShift Container Platform is then upgraded to 4.15.2 or higher in this
setup, the Fusion operator status in OperatorHub shows a failure with the following error:
install strategy failed: rolebindings.rbac.authorization.k8s.io "isf-update-operator-controller-manager-service-auth-reader" already exists
- Cause
- The error occurs because of a known Red Hat OpenShift Container Platform issue. For more information about the issue, see https://issues.redhat.com/projects/OCPBUGS/issues/OCPBUGS-32311?filter=allopenissues.
- Resolution
-
- Run the following command to check for role bindings that are reported as already existing:
oc get csv isf-operator.v2.8.0 -n ibm-spectrum-fusion-ns -o json | jq '.status.conditions[].message'
Sample output of the command:
"all requirements found, attempting install"
"install strategy failed: rolebindings.rbac.authorization.k8s.io \"isf-application-operator-controller-manager-service-auth-reader\" already exists"
"webhooks not installed"
"all requirements found, attempting install"
"install strategy failed: rolebindings.rbac.authorization.k8s.io \"isf-application-operator-controller-manager-service-auth-reader\" already exists"
"webhooks not installed"
"all requirements found, attempting install"
"install strategy failed: rolebindings.rbac.authorization.k8s.io \"isf-application-operator-controller-manager-service-auth-reader\" already exists"
- Back up the YAML of each reported role binding:
oc get rolebinding isf-application-operator-controller-manager-service-auth-reader -n kube-system -o yaml > isf-application-operator-controller-manager-service-auth-reader_rb.yaml
- Run the following command to delete each reported role
binding:
oc delete rolebinding isf-application-operator-controller-manager-service-auth-reader -n kube-system
- Repeat steps 1, 2, and 3 until the IBM Fusion HCI System operator CSV reports Healthy and the Fusion operator status shows Succeeded.
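Repeating steps 1 to 3 depends on picking the role binding names out of the CSV status messages. As a minimal sketch, the following shell function reads those messages on standard input and prints each reported role binding name once. The function name extract_rolebindings is hypothetical and not part of the product; it assumes messages in the format of the sample output, with or without escaped quotes.

```shell
#!/bin/sh
# Hypothetical helper: print the unique role binding names that appear in
# "install strategy failed: rolebindings... already exists" messages on stdin.
# Works for both jq's default output (\"name\") and raw output ("name").
extract_rolebindings() {
  grep 'already exists' |
    sed -n 's/.*rolebindings\.rbac\.authorization\.k8s\.io \\*"\([^"\\]*\)\\*" already exists.*/\1/p' |
    sort -u
}
```

For example, the messages from step 1 could be piped into the function, and each printed name then backed up and deleted as described in steps 2 and 3:
oc get csv isf-operator.v2.8.0 -n ibm-spectrum-fusion-ns -o json | jq '.status.conditions[].message' | extract_rolebindings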
catalogsource isf-catalog does not get updated with status
- Problem statement
- If you upgrade from 4.14 to 4.15, the catalog source that is used is fusion-catalog and not isf-catalog, so the isf-catalog CatalogSource does not get updated with a status.
- Resolution
-
- Log in to the OpenShift Container Platform console as a cluster administrator.
- Create a new CatalogSource by using the YAML editor.
Sample CatalogSource YAML for online upgrade:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: fusion-catalog
  namespace: openshift-marketplace
spec:
  displayName: IBM Fusion Catalog
  image: 'icr.io/cpopen/isf-operator-catalog:2.8.0-linux.amd64'
  publisher: IBM
  sourceType: grpc
Sample CatalogSource YAML for offline upgrade:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: fusion-catalog
  namespace: openshift-marketplace
spec:
  displayName: IBM Fusion Catalog
  image: '$TARGET_PATH/isf-operator-catalog:2.8.0-linux.amd64'
  publisher: IBM
  sourceType: grpc
- Save the YAML.
- Confirm that the CatalogSource 'fusion-catalog' is in Ready state:
- Go to .
- Change namespace to openshift-marketplace.
- In Resources, find CatalogSource.
- From the list, select fusion-catalog. The Details tab opens by default.
- Confirm that the status is Ready in the Details page.
- Go to . Make sure that you select the ibm-spectrum-fusion-ns project.
- From the Installed Operators list, select the IBM Fusion operator that is on version 2.7.2. The Details tab opens by default.
- Go to the Subscription tab and check whether the Update approval is Manual or Automatic. If it is Automatic, change the Update approval to Manual.
- In the Update approval section, click the edit icon and change the channel value to v2.0.
- Go to Actions and select Edit Subscription.
- In the YAML tab, update the value of source in the spec section to fusion-catalog.
- Save the YAML.
- Proceed with step 8 of the Upgrading IBM Fusion HCI System management software topic.
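The online and offline sample CatalogSource YAMLs in this procedure differ only in the image value. As a minimal sketch, the following shell function prints the manifest for whichever catalog image is passed to it. The function name render_catalogsource is hypothetical, and applying the manifest from the CLI rather than the console YAML editor is an assumption, not the documented procedure.

```shell
#!/bin/sh
# Hypothetical helper: print a fusion-catalog CatalogSource manifest for the
# catalog image given as the first argument. All other fields are copied from
# the sample YAML in this topic.
render_catalogsource() {
  image="$1"
  cat <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: fusion-catalog
  namespace: openshift-marketplace
spec:
  displayName: IBM Fusion Catalog
  image: '${image}'
  publisher: IBM
  sourceType: grpc
EOF
}
```

For an online upgrade, the output could be reviewed first and then piped to oc apply -f - by a cluster administrator:
render_catalogsource icr.io/cpopen/isf-operator-catalog:2.8.0-linux.amd64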
Red Hat OpenShift Container Platform 4.13.24 does not progress beyond 80%
- Problem statement
-
The OpenShift Container Platform 4.13.24 upgrade does not progress beyond 80% and the OpenShift service-ca displays the following message:
Progressing: service-ca does not have available replicas
- Resolution
-
Log in to OpenShift Container Platform by using the CLI and run the following command to fix the error related to the service-ca operator and its associated pods:
oc adm policy remove-scc-from-group anyuid system:authenticated
For more information about this issue, see https://access.redhat.com/solutions/5875621.
Restore failures post upgrade
- Problem statement
- After upgrade, you might encounter some restore failures. Check whether the job logs of the
failed restore jobs contain the following error:
Invalid value: true: Privileged containers are not allowed, spec.containers[0].securityContext.capabilities.add: Invalid value: "SYS_ADMIN": capability may not be added, provider "nonroot":
- Resolution
- Contact IBM Support to resolve this known issue.
BMH shows registration error after OpenShift Container Platform upgrade 4.12.36 to 4.13.15 or higher
- Problem statement
- After OpenShift Container Platform is upgraded from 4.12.36 to 4.13.15 or higher, the BareMetalHost (BMH) shows a registration error. You can observe this error in the BMH of every node in the cluster.
- Symptom
- After an IBM Fusion HCI System 2.6.2 cluster is upgraded to OpenShift Container Platform 4.13, the existing BMH for compute nodes starts to show a registration error. To view the error, go to .
- Diagnosis
- If you describe the BMH by using the following command, you can see an error that indicates that the field slaves is not recognized:
oc -n openshift-machine-api describe bmh <bmh name>
Sample error output:
Cannot generate image: serde_yaml::Error: unknown field `slaves`, expected one of `mode`, `options`, `ports`, `port`
This error occurs because OpenShift Container Platform 4.13 uses an nmstate/nmcli spec that does not recognize the slaves keyword in the bond configuration.
- Resolution
- Run the following steps to resolve the registration error:
- For every such BMH, find the network secret. For
example:
oc -n openshift-machine-api get secret |grep network-secret
- Get the secret. For example,
compute-1-ru5 bmh
:oc -n openshift-machine-api get secret compute-1-ru5-network-secret -o json|jq '.data.nmstate' |tr -d '"' CiAgICBpbnRlcmZhY2VzOgogICAgLSBkZXNjcmlwdGlvbjogQm9uZCBjb25uZWN0aW9uIGVuc2xhdmluZyBiYXJlbWV0YWwgaW50ZXJmYWNlcwogICAgICBpcHY0OgogICAgICAgIGRoY3A6IHRydWUKICAgICAgICBlbmFibGVkOiB0cnVlCiAgICAgIGlwdjY6CiAgICAgICAgZW5hYmxlZDogZmFsc2UKICAgICAgbGluay1hZ2dyZWdhdGlvbjoKICAgICAgICBtb2RlOiA4MDIuM2FkCiAgICAgICAgb3B0aW9uczoKICAgICAgICAgIGxhY3BfcmF0ZTogIjEiCiAgICAgICAgICBtaWltb246ICIxNDAiCiAgICAgICAgICB4bWl0X2hhc2hfcG9saWN5OiAiMSIKICAgICAgICBwb3J0czoKICAgICAgICAtIGVuczFmMG5wMAogICAgICAgIC0gZW5zMWYwbnAxCiAgICAgIG5hbWU6IGJvbmQwCiAgICAgIG1hYy1hZGRyZXNzOiAiMTA6NzA6ZmQ6Yjg6ZTY6NWUiCiAgICAgIHN0YXRlOiB1cAogICAgICB0eXBlOiBib25kCiAgICAtIGRlc2NyaXB0aW9uOiBCb25kIGNvbm5lY3Rpb24gZW5zbGF2aW5nIGVuczNmMCBhbmQgZW5zM2YxCiAgICAgIGlwdjQ6CiAgICAgICAgZW5hYmxlZDogZmFsc2UKICAgICAgaXB2NjoKICAgICAgICBhZGRyZXNzOgogICAgICAgIC0gaXA6IGZlODA6OmVhZWI6ZDNmZjpmZWZkOmNmYjgKICAgICAgICAgIHByZWZpeC1sZW5ndGg6IDY0CiAgICAgICAgLSBpcDogZmQ4YzoyMTVkOjE3OGU6YzBkZTplYWViOmQzZmY6ZmVmZDpjZmI4CiAgICAgICAgICBwcmVmaXgtbGVuZ3RoOiA2NAogICAgICAgIGF1dG9jb25mOiBmYWxzZQogICAgICAgIGRoY3A6IGZhbHNlCiAgICAgICAgZW5hYmxlZDogdHJ1ZQogICAgICBsaW5rLWFnZ3JlZ2F0aW9uOgogICAgICAgIG1vZGU6IDgwMi4zYWQKICAgICAgICBvcHRpb25zOgogICAgICAgICAgbWlpbW9uOiAiMTQwIgogICAgICAgICAgbGFjcF9yYXRlOiAiMSIKICAgICAgICAgIHhtaXRfaGFzaF9wb2xpY3k6ICIxIgogICAgICAgIHBvcnRzOgogICAgICAgIC0gZW5zM2YwbnAwCiAgICAgICAgLSBlbnMzZjFucDEKICAgICAgbXR1OiA5MDAwCiAgICAgIG5hbWU6IGJvbmQxCiAgICAgIHN0YXRlOiB1cAogICAgICB0eXBlOiBib25kCiAgICAtIGRlc2NyaXB0aW9uOiBzdG9yYWdlIHZsYW4gaW50ZXJmYWNlIGNvbm5lY3Rpb24gb24gdG9wIG9mIGJvbmQxCiAgICAgIG10dTogOTAwMAogICAgICBuYW1lOiBib25kMS4zMjAxCiAgICAgIHN0YXRlOiB1cAogICAgICB0eXBlOiB2bGFuCiAgICAgIHZsYW46CiAgICAgICAgYmFzZS1pZmFjZTogYm9uZDEKICAgICAgICBpZDogMzIwMQ==
- Base64 decode the output of the previous command:
echo CiAgICBpbnRlcmZhY2VzOgogICAgLSBkZXNjcmlwdGlvbjogQm9uZCBjb25uZWN0aW9uIGVuc2xhdmluZyBiYXJlbWV0YWwgaW50ZXJmYWNlcwogICAgICBpcHY0OgogICAgICAgIGRoY3A6IHRydWUKICAgICAgICBlbmFibGVkOiB0cnVlCiAgICAgIGlwdjY6CiAgICAgICAgZW5hYmxlZDogZmFsc2UKICAgICAgbGluay1hZ2dyZWdhdGlvbjoKICAgICAgICBtb2RlOiA4MDIuM2FkCiAgICAgICAgb3B0aW9uczoKICAgICAgICAgIGxhY3BfcmF0ZTogIjEiCiAgICAgICAgICBtaWltb246ICIxNDAiCiAgICAgICAgICB4bWl0X2hhc2hfcG9saWN5OiAiMSIKICAgICAgICBwb3J0czoKICAgICAgICAtIGVuczFmMG5wMAogICAgICAgIC0gZW5zMWYwbnAxCiAgICAgIG5hbWU6IGJvbmQwCiAgICAgIG1hYy1hZGRyZXNzOiAiMTA6NzA6ZmQ6Yjg6ZTY6NWUiCiAgICAgIHN0YXRlOiB1cAogICAgICB0eXBlOiBib25kCiAgICAtIGRlc2NyaXB0aW9uOiBCb25kIGNvbm5lY3Rpb24gZW5zbGF2aW5nIGVuczNmMCBhbmQgZW5zM2YxCiAgICAgIGlwdjQ6CiAgICAgICAgZW5hYmxlZDogZmFsc2UKICAgICAgaXB2NjoKICAgICAgICBhZGRyZXNzOgogICAgICAgIC0gaXA6IGZlODA6OmVhZWI6ZDNmZjpmZWZkOmNmYjgKICAgICAgICAgIHByZWZpeC1sZW5ndGg6IDY0CiAgICAgICAgLSBpcDogZmQ4YzoyMTVkOjE3OGU6YzBkZTplYWViOmQzZmY6ZmVmZDpjZmI4CiAgICAgICAgICBwcmVmaXgtbGVuZ3RoOiA2NAogICAgICAgIGF1dG9jb25mOiBmYWxzZQogICAgICAgIGRoY3A6IGZhbHNlCiAgICAgICAgZW5hYmxlZDogdHJ1ZQogICAgICBsaW5rLWFnZ3JlZ2F0aW9uOgogICAgICAgIG1vZGU6IDgwMi4zYWQKICAgICAgICBvcHRpb25zOgogICAgICAgICAgbWlpbW9uOiAiMTQwIgogICAgICAgICAgbGFjcF9yYXRlOiAiMSIKICAgICAgICAgIHhtaXRfaGFzaF9wb2xpY3k6ICIxIgogICAgICAgIHBvcnRzOgogICAgICAgIC0gZW5zM2YwbnAwCiAgICAgICAgLSBlbnMzZjFucDEKICAgICAgbXR1OiA5MDAwCiAgICAgIG5hbWU6IGJvbmQxCiAgICAgIHN0YXRlOiB1cAogICAgICB0eXBlOiBib25kCiAgICAtIGRlc2NyaXB0aW9uOiBzdG9yYWdlIHZsYW4gaW50ZXJmYWNlIGNvbm5lY3Rpb24gb24gdG9wIG9mIGJvbmQxCiAgICAgIG10dTogOTAwMAogICAgICBuYW1lOiBib25kMS4zMjAxCiAgICAgIHN0YXRlOiB1cAogICAgICB0eXBlOiB2bGFuCiAgICAgIHZsYW46CiAgICAgICAgYmFzZS1pZmFjZTogYm9uZDEKICAgICAgICBpZDogMzIwMQ==|base64 -d interfaces: - description: Bond connection enslaving baremetal interfaces ipv4: autoconf: true dhcp: true enabled: true ipv6: enabled: false link-aggregation: mode: 802.3ad options: lacp_rate: "1" miimon: "140" xmit_hash_policy: "1" slaves: - ens1f0np0 - ens1f0np1 name: bond0 mac-address: 
"10:70:fd:b8:e6:5e" state: up type: bond - description: Bond connection enslaving ens3f0 and ens3f1 ipv4: enabled: false ipv6: address: - ip: fe80::eaeb:d3ff:fefd:cfb8 prefix-length: 64 - ip: fd8c:215d:178e:c0de:eaeb:d3ff:fefd:cfb8 prefix-length: 64 autoconf: false dhcp: false enabled: true link-aggregation: mode: 802.3ad options: miimon: "140" lacp_rate: "1" xmit_hash_policy: "1" slaves: - ens3f0np0 - ens3f1np1 mtu: 9000 name: bond1 state: up type: bond - description: storage vlan interface connection on top of bond1 mtu: 9000 name: bond1.3201 state: up type: vlan vlan: base-iface: bond1 id: 3201
- In a YAML editor, paste the output from the previous step, replace both occurrences of slaves with ports, and remove autoconf: true from the ipv4 section:
interfaces:
- description: Bond connection enslaving baremetal interfaces
  ipv4:
    dhcp: true
    enabled: true
  ipv6:
    enabled: false
  link-aggregation:
    mode: 802.3ad
    options:
      lacp_rate: "1"
      miimon: "140"
      xmit_hash_policy: "1"
    ports:
    - ens1f0np0
    - ens1f0np1
  name: bond0
  mac-address: "10:70:fd:b8:e6:5e"
  state: up
  type: bond
- description: Bond connection enslaving ens3f0 and ens3f1
  ipv4:
    enabled: false
  ipv6:
    address:
    - ip: fe80::eaeb:d3ff:fefd:cfb8
      prefix-length: 64
    - ip: fd8c:215d:178e:c0de:eaeb:d3ff:fefd:cfb8
      prefix-length: 64
    autoconf: false
    dhcp: false
    enabled: true
  link-aggregation:
    mode: 802.3ad
    options:
      miimon: "140"
      lacp_rate: "1"
      xmit_hash_policy: "1"
    ports:
    - ens3f0np0
    - ens3f1np1
  mtu: 9000
  name: bond1
  state: up
  type: bond
- description: storage vlan interface connection on top of bond1
  mtu: 9000
  name: bond1.3201
  state: up
  type: vlan
  vlan:
    base-iface: bond1
    id: 3201
- Base64 encode the content of the changed YAML file:
cat /home/garg/slaves.yaml |base64 -w0 aW50ZXJmYWNlczoKICAgIC0gZGVzY3JpcHRpb246IEJvbmQgY29ubmVjdGlvbiBlbnNsYXZpbmcgYmFyZW1ldGFsIGludGVyZmFjZXMKICAgICAgaXB2NDoKICAgICAgICBkaGNwOiB0cnVlCiAgICAgICAgZW5hYmxlZDogdHJ1ZQogICAgICBpcHY2OgogICAgICAgIGVuYWJsZWQ6IGZhbHNlCiAgICAgIGxpbmstYWdncmVnYXRpb246CiAgICAgICAgbW9kZTogODAyLjNhZAogICAgICAgIG9wdGlvbnM6CiAgICAgICAgICBsYWNwX3JhdGU6ICIxIgogICAgICAgICAgbWlpbW9uOiAiMTQwIgogICAgICAgICAgeG1pdF9oYXNoX3BvbGljeTogIjEiCiAgICAgICAgcG9ydHM6CiAgICAgICAgLSBlbnMxZjBucDAKICAgICAgICAtIGVuczFmMG5wMQogICAgICBuYW1lOiBib25kMAogICAgICBtYWMtYWRkcmVzczogIjEwOjcwOmZkOmI4OmU2OjVlIgogICAgICBzdGF0ZTogdXAKICAgICAgdHlwZTogYm9uZAogICAgLSBkZXNjcmlwdGlvbjogQm9uZCBjb25uZWN0aW9uIGVuc2xhdmluZyBlbnMzZjAgYW5kIGVuczNmMQogICAgICBpcHY0OgogICAgICAgIGVuYWJsZWQ6IGZhbHNlCiAgICAgIGlwdjY6CiAgICAgICAgYWRkcmVzczoKICAgICAgICAtIGlwOiBmZTgwOjplYWViOmQzZmY6ZmVmZDpjZmI4CiAgICAgICAgICBwcmVmaXgtbGVuZ3RoOiA2NAogICAgICAgIC0gaXA6IGZkOGM6MjE1ZDoxNzhlOmMwZGU6ZWFlYjpkM2ZmOmZlZmQ6Y2ZiOAogICAgICAgICAgcHJlZml4LWxlbmd0aDogNjQKICAgICAgICBhdXRvY29uZjogZmFsc2UKICAgICAgICBkaGNwOiBmYWxzZQogICAgICAgIGVuYWJsZWQ6IHRydWUKICAgICAgbGluay1hZ2dyZWdhdGlvbjoKICAgICAgICBtb2RlOiA4MDIuM2FkCiAgICAgICAgb3B0aW9uczoKICAgICAgICAgIG1paW1vbjogIjE0MCIKICAgICAgICAgIGxhY3BfcmF0ZTogIjEiCiAgICAgICAgICB4bWl0X2hhc2hfcG9saWN5OiAiMSIKICAgICAgICBwb3J0czoKICAgICAgICAtIGVuczNmMG5wMAogICAgICAgIC0gZW5zM2YxbnAxCiAgICAgIG10dTogOTAwMAogICAgICBuYW1lOiBib25kMQogICAgICBzdGF0ZTogdXAKICAgICAgdHlwZTogYm9uZAogICAgLSBkZXNjcmlwdGlvbjogc3RvcmFnZSB2bGFuIGludGVyZmFjZSBjb25uZWN0aW9uIG9uIHRvcCBvZiBib25kMQogICAgICBtdHU6IDkwMDAKICAgICAgbmFtZTogYm9uZDEuMzIwMQogICAgICBzdGF0ZTogdXAKICAgICAgdHlwZTogdmxhbgogICAgICB2bGFuOgogICAgICAgIGJhc2UtaWZhY2U6IGJvbmQxCiAgICAgICAgaWQ6IDMyMDEK
- Edit the network secret and replace the data.nmstate value with the base64-encoded value:
oc -n openshift-machine-api edit secret compute-1-ru5-network-secret
Example:
apiVersion: v1 data: nmstate: CiAgICBpbnRlcmZhY2VzOgogICAgLSBkZXNjcmlwdGlvbjogQm9uZCBjb25uZWN0aW9uIGVuc2xhdmluZyBiYXJlbWV0YWwgaW50ZXJmYWNlcwogICAgICBpcHY0OgogICAgICAgIGRoY3A6IHRydWUKICAgICAgICBlbmFibGVkOiB0cnVlCiAgICAgIGlwdjY6CiAgICAgICAgZW5hYmxlZDogZmFsc2UKICAgICAgbGluay1hZ2dyZWdhdGlvbjoKICAgICAgICBtb2RlOiA4MDIuM2FkCiAgICAgICAgb3B0aW9uczoKICAgICAgICAgIGxhY3BfcmF0ZTogIjEiCiAgICAgICAgICBtaWltb246ICIxNDAiCiAgICAgICAgICB4bWl0X2hhc2hfcG9saWN5OiAiMSIKICAgICAgICBwb3J0czoKICAgICAgICAtIGVuczFmMG5wMAogICAgICAgIC0gZW5zMWYwbnAxCiAgICAgIG5hbWU6IGJvbmQwCiAgICAgIG1hYy1hZGRyZXNzOiAiMTA6NzA6ZmQ6Yjg6ZTY6NWUiCiAgICAgIHN0YXRlOiB1cAogICAgICB0eXBlOiBib25kCiAgICAtIGRlc2NyaXB0aW9uOiBCb25kIGNvbm5lY3Rpb24gZW5zbGF2aW5nIGVuczNmMCBhbmQgZW5zM2YxCiAgICAgIGlwdjQ6CiAgICAgICAgZW5hYmxlZDogZmFsc2UKICAgICAgaXB2NjoKICAgICAgICBhZGRyZXNzOgogICAgICAgIC0gaXA6IGZlODA6OmVhZWI6ZDNmZjpmZWZkOmNmYjgKICAgICAgICAgIHByZWZpeC1sZW5ndGg6IDY0CiAgICAgICAgLSBpcDogZmQ4YzoyMTVkOjE3OGU6YzBkZTplYWViOmQzZmY6ZmVmZDpjZmI4CiAgICAgICAgICBwcmVmaXgtbGVuZ3RoOiA2NAogICAgICAgIGF1dG9jb25mOiBmYWxzZQogICAgICAgIGRoY3A6IGZhbHNlCiAgICAgICAgZW5hYmxlZDogdHJ1ZQogICAgICBsaW5rLWFnZ3JlZ2F0aW9uOgogICAgICAgIG1vZGU6IDgwMi4zYWQKICAgICAgICBvcHRpb25zOgogICAgICAgICAgbWlpbW9uOiAiMTQwIgogICAgICAgICAgbGFjcF9yYXRlOiAiMSIKICAgICAgICAgIHhtaXRfaGFzaF9wb2xpY3k6ICIxIgogICAgICAgIHBvcnRzOgogICAgICAgIC0gZW5zM2YwbnAwCiAgICAgICAgLSBlbnMzZjFucDEKICAgICAgbXR1OiA5MDAwCiAgICAgIG5hbWU6IGJvbmQxCiAgICAgIHN0YXRlOiB1cAogICAgICB0eXBlOiBib25kCiAgICAtIGRlc2NyaXB0aW9uOiBzdG9yYWdlIHZsYW4gaW50ZXJmYWNlIGNvbm5lY3Rpb24gb24gdG9wIG9mIGJvbmQxCiAgICAgIG10dTogOTAwMAogICAgICBuYW1lOiBib25kMS4zMjAxCiAgICAgIHN0YXRlOiB1cAogICAgICB0eXBlOiB2bGFuCiAgICAgIHZsYW46CiAgICAgICAgYmFzZS1pZmFjZTogYm9uZDEKICAgICAgICBpZDogMzIwMQ== kind: Secret metadata: creationTimestamp: "2023-11-06T05:13:06Z" labels: environment.metal3.io: baremetal name: compute-1-ru5-network-secret namespace: openshift-machine-api ownerReferences: - apiVersion: metal3.io/v1alpha1 kind: PreprovisioningImage name: compute-1-ru5 uid: 
1591fbbd-b787-4c46-b37e-53cbd62d885e - apiVersion: metal3.io/v1alpha1 kind: BareMetalHost name: compute-1-ru5 uid: 0e03f8c3-6c01-4cd9-b499-03e4f67cd753 resourceVersion: "8788604" uid: ee1af992-6c19-449b-8cb2-7315d1692498 type: Opaque
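The decode, edit, and encode steps above can be combined in one pipeline. The following is a minimal sketch under stated assumptions, not the documented procedure: the function name fix_nmstate is hypothetical, and it removes every autoconf: true line, which matches the manual edit only when that line appears solely in ipv4 sections, as in the sample. Review the decoded result before you put it into the secret.

```shell
#!/bin/sh
# Hypothetical helper: read a base64-encoded nmstate document on stdin,
# rename every "slaves:" key to "ports:", drop every "autoconf: true" line
# (assumption: it only occurs under ipv4, as in the sample), and print the
# result base64-encoded on a single line.
fix_nmstate() {
  base64 -d |
    sed -e 's/^\([[:space:]]*\)slaves:/\1ports:/' \
        -e '/^[[:space:]]*autoconf:[[:space:]]*true[[:space:]]*$/d' |
    base64 | tr -d '\n'
}
```

For example, for the compute-1-ru5 BMH, the current value could be converted as follows and the printed string then pasted into data.nmstate when you edit the secret:
oc -n openshift-machine-api get secret compute-1-ru5-network-secret -o json | jq -r '.data.nmstate' | fix_nmstate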
Community operator catalog is shown as missing
- Resolution
- If the Community operator catalog is shown as missing, then create it before you attempt upgrade.
Machine config roll out error
- Resolution
- If an operation causes a machine config rollout that gets stuck for a long time, check whether the node to be updated is pingable and has an IP address after restart. If any DHCP or network issues prevent the node from getting a hostname, fix them and restart the node.
On-demand backup failures post upgrade
- Problem statement
- Post upgrade, on-demand backup failures might happen for existing applications.
- Resolution
- Do the following manual steps after you upgrade to avoid this problem:
- Run the following command to display the phase status of the backup policies associated with all your applications:
oc get fpa -A
Example output:
NAMESPACE                NAME                                  PROVIDER     APPLICATION           BACKUPPOLICY      DATACONSISTENCY   PHASE             LASTBACKUPTIMESTAMP   CAPACITY
ibm-spectrum-fusion-ns   deptest2-azure-hourly-30              isf-ibmspp   deptest2              azure-hourly-30                     Assigned          66m                   <no value>
ibm-spectrum-fusion-ns   new-generic-1-azure-hourly-45         isf-ibmspp   new-generic-1         azure-hourly-45                     Assigned          21m                   <no value>
ibm-spectrum-fusion-ns   new-mongo-project-1-azure-hourly-15   isf-ibmspp   new-mongo-project-1   azure-hourly-15                     InitializeError   81m                   <no value>
ibm-spectrum-fusion-ns   new-mongo-project-azure-hourly-30     isf-ibmspp   new-mongo-project     azure-hourly-30                     Assigned          66m                   <no value>
- Verify whether your policyassignment CR corresponds to any application in InitializeError phase. In this example, the new-mongo-project-1 application is in InitializeError phase.
- Log in to the IBM Fusion HCI user interface.
- Go to tab.
- Unassign the backup policy that is assigned to the application in InitializeError phase and wait for its unassignment. In this example, unassign the azure-hourly-15 policy from the new-mongo-project-1 application.
- Reassign the backup policy.
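When many applications have backup policies, scanning the oc get fpa -A output by eye for InitializeError is error-prone. As a minimal sketch (the function name list_initialize_errors is hypothetical), the following shell function reads that output on standard input and prints the namespace and name of each policy assignment in InitializeError phase. It scans the whole row rather than a fixed column, on the assumption that an empty DATACONSISTENCY column shifts the whitespace-separated fields, as in the example output.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get fpa -A` output on stdin and print
# NAMESPACE/NAME for every row that contains the InitializeError phase.
# The header row (first line) is skipped.
list_initialize_errors() {
  awk 'NR > 1 {
    for (i = 1; i <= NF; i++) {
      if ($i == "InitializeError") { print $1 "/" $2; break }
    }
  }'
}
```

For example:
oc get fpa -A | list_initialize_errors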
ImagePull failure
- Resolution
- If an ImagePull failure occurs due to an intermittent network or registry issue during an upgrade, restart the pod and retry. If the issue persists, contact IBM Support.
DeadlineExceeded error
- Problem statement
- IBM Cloud Paks foundational services operator ClusterServiceVersion (CSV) status shows Failed and its InstallPlan status shows Failed after the subscription gets created.
- Resolution
- If you notice that the operator installation or upgrade fails with a DeadlineExceeded error, see Operator installation or upgrade fails with DeadlineExceeded error.
IBM Fusion operator upgrade is stuck due to Grafana operator
- Problem statement
- The IBM Fusion operator is decoupled from the Grafana operator. During upgrade, the IBM Fusion operator might get stuck due to missing Grafana images, which you can validate by using isf-subscription.
- Resolution
- Follow the steps to resolve the issue:
- If you are using Grafana, then re-mirror the community operator. For the procedure, see Mirroring images for a disconnected installation using the oc-mirror plugin.
- If you do not use Grafana, then uninstall the Grafana operator. For the procedure to uninstall, see Deleting operators from a cluster.
- After uninstalling the Grafana operator, if you still see the same error, contact IBM Support.
Update operator OOMKilled error
- Problem statement
- The pods go into crash loop state with the OOMKilled error after OpenShift Container Platform upgrade.
- Resolution
- To resolve the OOMKilled issue for the update operator, do the following steps:
- Go to the IBM Fusion clusterserviceversion object ( tab).
- Search for the deployment name of the isf-update-operator (isf-update-operator-controller-manager) in the list of deployments in the clusterserviceversion object under spec.install.spec.deployments.
- In the specified deployment object, search for the container named manager under spec.template.spec.containers and increase the memory limit in resources.limits.memory.
- After you change the limits in the IBM Fusion clusterserviceversion, the update operator pod restarts with the new limits.
- If the OOMKilled issue still persists, repeat steps 1 - 4.