Troubleshooting upgrade issues

Troubleshooting IBM Fusion HCI System upgrade issues.

Install strategy fails for Fusion Operator after cluster upgrade to Red Hat OpenShift Container Platform 4.15.3

Problem statement
The IBM Fusion HCI System 2.7.2 is upgraded to 2.8.0 with Red Hat® OpenShift® Container Platform 4.14.x. If Red Hat OpenShift Container Platform is upgraded to 4.15.2 or higher in this setup, then the Fusion operator status in OperatorHub fails with the following error:
install strategy failed: rolebindings.rbac.authorization.k8s.io "isf-update-operator-controller-manager-service-auth-reader" already exists 
Cause
The error occurs because of a known Red Hat OpenShift Container Platform issue. For more information about the issue, see https://issues.redhat.com/projects/OCPBUGS/issues/OCPBUGS-32311?filter=allopenissues.
Resolution
  1. Run the following command to check the reported role-binding conflicts in the CSV status conditions:
    oc get csv isf-operator.v2.8.0 -n ibm-spectrum-fusion-ns -o json | jq '.status.conditions[].message'
    Sample output of the command:
    "all requirements found, attempting install"
    "install strategy failed: rolebindings.rbac.authorization.k8s.io \"isf-application-operator-controller-manager-service-auth-reader\" already exists"
    "webhooks not installed"
    "all requirements found, attempting install"
    "install strategy failed: rolebindings.rbac.authorization.k8s.io \"isf-application-operator-controller-manager-service-auth-reader\" already exists"
    "webhooks not installed"
    "all requirements found, attempting install"
    "install strategy failed: rolebindings.rbac.authorization.k8s.io \"isf-application-operator-controller-manager-service-auth-reader\" already exists"
    
  2. Back up the YAML of each reported role binding. For example:
    oc get rolebinding isf-application-operator-controller-manager-service-auth-reader -n kube-system -o yaml > isf-application-operator-controller-manager-service-auth-reader_rb.yaml
    
  3. Run the following command to delete each reported role binding:
    oc delete rolebinding isf-application-operator-controller-manager-service-auth-reader -n kube-system
  4. Repeat steps 1 - 3 until the IBM Fusion HCI System operator CSV reports Healthy and the Fusion operator status shows Succeeded.
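The repeated extract-back up-delete loop in steps 1 - 3 can be sketched as a small helper. The message format is taken from the sample output above; the oc commands in the trailing comment assume a live cluster and are untested, so treat them as an illustration only.

```shell
# Extract the conflicting role-binding name from a CSV condition message of
# the form:
#   install strategy failed: rolebindings.rbac.authorization.k8s.io "NAME" already exists
msg='install strategy failed: rolebindings.rbac.authorization.k8s.io "isf-application-operator-controller-manager-service-auth-reader" already exists'
name=$(printf '%s\n' "$msg" | sed -n 's/.*rolebindings\.rbac\.authorization\.k8s\.io "\([^"]*\)".*/\1/p')
echo "$name"
# With cluster access, the backup and delete for each extracted name would be
# (assumption, untested):
#   oc get rolebinding "$name" -n kube-system -o yaml > "${name}_rb.yaml"
#   oc delete rolebinding "$name" -n kube-system
```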

catalogsource isf-catalog does not get updated with status

Problem statement
If you upgrade from OpenShift Container Platform 4.14 to 4.15, the catalog source is named fusion-catalog, not isf-catalog, so the isf-catalog CatalogSource does not get updated with a status.
Resolution
  1. Log in to the OpenShift Container Platform console as a cluster administrator.
  2. Create a new CatalogSource by using the YAML editor.
    Sample catalogsource YAML for online upgrade:
    
    apiVersion: operators.coreos.com/v1alpha1
    kind: CatalogSource
    metadata:
      name: fusion-catalog
      namespace: openshift-marketplace
    spec:
      displayName: IBM Fusion Catalog
      image: 'icr.io/cpopen/isf-operator-catalog:2.8.0-linux.amd64'
      publisher: IBM
      sourceType: grpc
    Sample catalogsource YAML for offline upgrade:
    
    apiVersion: operators.coreos.com/v1alpha1
    kind: CatalogSource
    metadata:
      name: fusion-catalog
      namespace: openshift-marketplace
    spec:
      displayName: IBM Fusion Catalog
      image: '$TARGET_PATH/isf-operator-catalog:2.8.0-linux.amd64'
      publisher: IBM
      sourceType: grpc
  3. Save the YAML.
  4. Confirm that the CatalogSource 'fusion-catalog' is in Ready state:
    1. Go to Home > Search.
    2. Change namespace to openshift-marketplace.
    3. In Resources, find CatalogSource.
    4. From the list, select fusion-catalog. The Details tab opens by default.
    5. Confirm that the status is Ready in the Details page.
  5. Go to Operators > Installed Operators and make sure that you select ibm-spectrum-fusion-ns project.
  6. From the Installed Operators list, select IBM Fusion that is on 2.7.2 version. The Details tab opens by default.
  7. Go to the Subscription tab and check whether the Update approval is Manual or Automatic. If it is Automatic, change the Update approval to Manual.
  8. In the Update channel section, click the edit icon and change the channel value to v2.0.
  9. Go to Actions and select Edit Subscription.
  10. In the YAML tab, update the value of the source in the Spec section to fusion-catalog.
  11. Save the YAML.
  12. Proceed with step 8 of the Upgrading IBM Fusion HCI System management software topic.
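The readiness check in step 4 can also be done from the CLI. The jsonpath query below is an assumption based on the standard CatalogSource status fields and is untested against a live cluster; the snippet simulates the comparison on a stubbed value so the check itself is visible:

```shell
# Assumed CLI equivalent of the console readiness check (untested):
#   state=$(oc get catalogsource fusion-catalog -n openshift-marketplace \
#     -o jsonpath='{.status.connectionState.lastObservedState}')
# A healthy catalog reports READY; the check is a plain string compare.
state="READY"
if [ "$state" = "READY" ]; then
  result="fusion-catalog is ready"
else
  result="fusion-catalog is not ready: $state"
fi
echo "$result"
```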

Red Hat OpenShift Container Platform 4.13.24 does not progress beyond 80%

Problem statement
OpenShift Container Platform 4.13.24 does not progress beyond 80% and the OpenShift service-ca displays the following message:
Progressing: service-ca does not have available replicas
Resolution
Log in to OpenShift Container Platform by using the CLI and run the following command to fix the error related to the service-ca operator and its associated pods:
oc adm policy remove-scc-from-group anyuid system:authenticated
For more information about this issue, see https://access.redhat.com/solutions/5875621.

Restore failures post upgrade

Problem statement
After upgrade, you might encounter some restore failures. Check whether the job logs of the failed restore jobs contain the following error:
Invalid value: true: Privileged containers are not allowed, spec.containers[0].securityContext.capabilities.add: Invalid value: "SYS_ADMIN": capability may not be added, provider "nonroot":
Resolution
Contact IBM Support to resolve this known issue.

BMH shows registration error after OpenShift Container Platform upgrade 4.12.36 to 4.13.15 or higher

Problem statement
After an OpenShift Container Platform upgrade from 4.12.36 to 4.13.15 or higher, BMH shows a registration error. You can observe this error in the BMH of every node in the cluster.
Symptom
After an IBM Fusion HCI System 2.6.2 cluster is upgraded to OpenShift Container Platform 4.13, the existing BMHs for compute nodes start to show a registration error. To view the error, go to the console and click Compute > Bare Metal Hosts.
Diagnosis
If you describe the BMH by using the following command, you can see an error that indicates that the slaves field is not recognized:
oc -n openshift-machine-api describe bmh <bmh name>
Sample error output:
Cannot generate image: serde_yaml::Error: unknown field `slaves`, expected one of `mode`, `options`, `ports`, `port` 

This error occurs because OpenShift Container Platform 4.13 uses an nmstate/nmcli specification that does not recognize the slaves keyword in bond configurations.

Resolution
Run the following steps to resolve the registration error:
  1. For every such BMH, find the network secret. For example:
    oc -n openshift-machine-api get secret | grep network-secret
  2. Get the secret. For example, for the compute-1-ru5 BMH:
    oc -n openshift-machine-api get secret compute-1-ru5-network-secret -o json | jq '.data.nmstate' | tr -d '"'
    CiAgICBpbnRlcmZhY2VzOgogICAgLSBkZXNjcmlwdGlvbjogQm9uZCBjb25uZWN0aW9uIGVuc2xhdmluZyBiYXJlbWV0YWwgaW50ZXJmYWNlcwogICAgICBpcHY0OgogICAgICAgIGRoY3A6IHRydWUKICAgICAgICBlbmFibGVkOiB0cnVlCiAgICAgIGlwdjY6CiAgICAgICAgZW5hYmxlZDogZmFsc2UKICAgICAgbGluay1hZ2dyZWdhdGlvbjoKICAgICAgICBtb2RlOiA4MDIuM2FkCiAgICAgICAgb3B0aW9uczoKICAgICAgICAgIGxhY3BfcmF0ZTogIjEiCiAgICAgICAgICBtaWltb246ICIxNDAiCiAgICAgICAgICB4bWl0X2hhc2hfcG9saWN5OiAiMSIKICAgICAgICBwb3J0czoKICAgICAgICAtIGVuczFmMG5wMAogICAgICAgIC0gZW5zMWYwbnAxCiAgICAgIG5hbWU6IGJvbmQwCiAgICAgIG1hYy1hZGRyZXNzOiAiMTA6NzA6ZmQ6Yjg6ZTY6NWUiCiAgICAgIHN0YXRlOiB1cAogICAgICB0eXBlOiBib25kCiAgICAtIGRlc2NyaXB0aW9uOiBCb25kIGNvbm5lY3Rpb24gZW5zbGF2aW5nIGVuczNmMCBhbmQgZW5zM2YxCiAgICAgIGlwdjQ6CiAgICAgICAgZW5hYmxlZDogZmFsc2UKICAgICAgaXB2NjoKICAgICAgICBhZGRyZXNzOgogICAgICAgIC0gaXA6IGZlODA6OmVhZWI6ZDNmZjpmZWZkOmNmYjgKICAgICAgICAgIHByZWZpeC1sZW5ndGg6IDY0CiAgICAgICAgLSBpcDogZmQ4YzoyMTVkOjE3OGU6YzBkZTplYWViOmQzZmY6ZmVmZDpjZmI4CiAgICAgICAgICBwcmVmaXgtbGVuZ3RoOiA2NAogICAgICAgIGF1dG9jb25mOiBmYWxzZQogICAgICAgIGRoY3A6IGZhbHNlCiAgICAgICAgZW5hYmxlZDogdHJ1ZQogICAgICBsaW5rLWFnZ3JlZ2F0aW9uOgogICAgICAgIG1vZGU6IDgwMi4zYWQKICAgICAgICBvcHRpb25zOgogICAgICAgICAgbWlpbW9uOiAiMTQwIgogICAgICAgICAgbGFjcF9yYXRlOiAiMSIKICAgICAgICAgIHhtaXRfaGFzaF9wb2xpY3k6ICIxIgogICAgICAgIHBvcnRzOgogICAgICAgIC0gZW5zM2YwbnAwCiAgICAgICAgLSBlbnMzZjFucDEKICAgICAgbXR1OiA5MDAwCiAgICAgIG5hbWU6IGJvbmQxCiAgICAgIHN0YXRlOiB1cAogICAgICB0eXBlOiBib25kCiAgICAtIGRlc2NyaXB0aW9uOiBzdG9yYWdlIHZsYW4gaW50ZXJmYWNlIGNvbm5lY3Rpb24gb24gdG9wIG9mIGJvbmQxCiAgICAgIG10dTogOTAwMAogICAgICBuYW1lOiBib25kMS4zMjAxCiAgICAgIHN0YXRlOiB1cAogICAgICB0eXBlOiB2bGFuCiAgICAgIHZsYW46CiAgICAgICAgYmFzZS1pZmFjZTogYm9uZDEKICAgICAgICBpZDogMzIwMQ==
  3. Base64-decode the output of the previous command:
     echo CiAgICBpbnRlcmZhY2VzOgogICAgLSBkZXNjcmlwdGlvbjogQm9uZCBjb25uZWN0aW9uIGVuc2xhdmluZyBiYXJlbWV0YWwgaW50ZXJmYWNlcwogICAgICBpcHY0OgogICAgICAgIGRoY3A6IHRydWUKICAgICAgICBlbmFibGVkOiB0cnVlCiAgICAgIGlwdjY6CiAgICAgICAgZW5hYmxlZDogZmFsc2UKICAgICAgbGluay1hZ2dyZWdhdGlvbjoKICAgICAgICBtb2RlOiA4MDIuM2FkCiAgICAgICAgb3B0aW9uczoKICAgICAgICAgIGxhY3BfcmF0ZTogIjEiCiAgICAgICAgICBtaWltb246ICIxNDAiCiAgICAgICAgICB4bWl0X2hhc2hfcG9saWN5OiAiMSIKICAgICAgICBwb3J0czoKICAgICAgICAtIGVuczFmMG5wMAogICAgICAgIC0gZW5zMWYwbnAxCiAgICAgIG5hbWU6IGJvbmQwCiAgICAgIG1hYy1hZGRyZXNzOiAiMTA6NzA6ZmQ6Yjg6ZTY6NWUiCiAgICAgIHN0YXRlOiB1cAogICAgICB0eXBlOiBib25kCiAgICAtIGRlc2NyaXB0aW9uOiBCb25kIGNvbm5lY3Rpb24gZW5zbGF2aW5nIGVuczNmMCBhbmQgZW5zM2YxCiAgICAgIGlwdjQ6CiAgICAgICAgZW5hYmxlZDogZmFsc2UKICAgICAgaXB2NjoKICAgICAgICBhZGRyZXNzOgogICAgICAgIC0gaXA6IGZlODA6OmVhZWI6ZDNmZjpmZWZkOmNmYjgKICAgICAgICAgIHByZWZpeC1sZW5ndGg6IDY0CiAgICAgICAgLSBpcDogZmQ4YzoyMTVkOjE3OGU6YzBkZTplYWViOmQzZmY6ZmVmZDpjZmI4CiAgICAgICAgICBwcmVmaXgtbGVuZ3RoOiA2NAogICAgICAgIGF1dG9jb25mOiBmYWxzZQogICAgICAgIGRoY3A6IGZhbHNlCiAgICAgICAgZW5hYmxlZDogdHJ1ZQogICAgICBsaW5rLWFnZ3JlZ2F0aW9uOgogICAgICAgIG1vZGU6IDgwMi4zYWQKICAgICAgICBvcHRpb25zOgogICAgICAgICAgbWlpbW9uOiAiMTQwIgogICAgICAgICAgbGFjcF9yYXRlOiAiMSIKICAgICAgICAgIHhtaXRfaGFzaF9wb2xpY3k6ICIxIgogICAgICAgIHBvcnRzOgogICAgICAgIC0gZW5zM2YwbnAwCiAgICAgICAgLSBlbnMzZjFucDEKICAgICAgbXR1OiA5MDAwCiAgICAgIG5hbWU6IGJvbmQxCiAgICAgIHN0YXRlOiB1cAogICAgICB0eXBlOiBib25kCiAgICAtIGRlc2NyaXB0aW9uOiBzdG9yYWdlIHZsYW4gaW50ZXJmYWNlIGNvbm5lY3Rpb24gb24gdG9wIG9mIGJvbmQxCiAgICAgIG10dTogOTAwMAogICAgICBuYW1lOiBib25kMS4zMjAxCiAgICAgIHN0YXRlOiB1cAogICAgICB0eXBlOiB2bGFuCiAgICAgIHZsYW46CiAgICAgICAgYmFzZS1pZmFjZTogYm9uZDEKICAgICAgICBpZDogMzIwMQ==|base64 -d
    
        interfaces:
        - description: Bond connection enslaving baremetal interfaces
          ipv4:
            autoconf: true
            dhcp: true
            enabled: true
          ipv6:
            enabled: false
          link-aggregation:
            mode: 802.3ad
            options:
              lacp_rate: "1"
              miimon: "140"
              xmit_hash_policy: "1"
            slaves:
            - ens1f0np0
            - ens1f0np1
          name: bond0
          mac-address: "10:70:fd:b8:e6:5e"
          state: up
          type: bond
        - description: Bond connection enslaving ens3f0 and ens3f1
          ipv4:
            enabled: false
          ipv6:
            address:
            - ip: fe80::eaeb:d3ff:fefd:cfb8
              prefix-length: 64
            - ip: fd8c:215d:178e:c0de:eaeb:d3ff:fefd:cfb8
              prefix-length: 64
            autoconf: false
            dhcp: false
            enabled: true
          link-aggregation:
            mode: 802.3ad
            options:
              miimon: "140"
              lacp_rate: "1"
              xmit_hash_policy: "1"
            slaves:
            - ens3f0np0
            - ens3f1np1
          mtu: 9000
          name: bond1
          state: up
          type: bond
        - description: storage vlan interface connection on top of bond1
          mtu: 9000
          name: bond1.3201
          state: up
          type: vlan
          vlan:
            base-iface: bond1
            id: 3201
  4. In a YAML editor, paste the output from the previous step, replace the two occurrences of slaves with ports, and remove autoconf: true from the ipv4 section:
    interfaces:
        - description: Bond connection enslaving baremetal interfaces
          ipv4:
            dhcp: true
            enabled: true
          ipv6:
            enabled: false
          link-aggregation:
            mode: 802.3ad
            options:
              lacp_rate: "1"
              miimon: "140"
              xmit_hash_policy: "1"
            ports:
            - ens1f0np0
            - ens1f0np1
          name: bond0
          mac-address: "10:70:fd:b8:e6:5e"
          state: up
          type: bond
        - description: Bond connection enslaving ens3f0 and ens3f1
          ipv4:
            enabled: false
          ipv6:
            address:
            - ip: fe80::eaeb:d3ff:fefd:cfb8
              prefix-length: 64
            - ip: fd8c:215d:178e:c0de:eaeb:d3ff:fefd:cfb8
              prefix-length: 64
            autoconf: false
            dhcp: false
            enabled: true
          link-aggregation:
            mode: 802.3ad
            options:
              miimon: "140"
              lacp_rate: "1"
              xmit_hash_policy: "1"
            ports:
            - ens3f0np0
            - ens3f1np1
          mtu: 9000
          name: bond1
          state: up
          type: bond
        - description: storage vlan interface connection on top of bond1
          mtu: 9000
          name: bond1.3201
          state: up
          type: vlan
          vlan:
            base-iface: bond1
            id: 3201
  5. Base64-encode the edited YAML file content:
    cat /home/garg/slaves.yaml | base64 -w0
    aW50ZXJmYWNlczoKICAgIC0gZGVzY3JpcHRpb246IEJvbmQgY29ubmVjdGlvbiBlbnNsYXZpbmcgYmFyZW1ldGFsIGludGVyZmFjZXMKICAgICAgaXB2NDoKICAgICAgICBkaGNwOiB0cnVlCiAgICAgICAgZW5hYmxlZDogdHJ1ZQogICAgICBpcHY2OgogICAgICAgIGVuYWJsZWQ6IGZhbHNlCiAgICAgIGxpbmstYWdncmVnYXRpb246CiAgICAgICAgbW9kZTogODAyLjNhZAogICAgICAgIG9wdGlvbnM6CiAgICAgICAgICBsYWNwX3JhdGU6ICIxIgogICAgICAgICAgbWlpbW9uOiAiMTQwIgogICAgICAgICAgeG1pdF9oYXNoX3BvbGljeTogIjEiCiAgICAgICAgcG9ydHM6CiAgICAgICAgLSBlbnMxZjBucDAKICAgICAgICAtIGVuczFmMG5wMQogICAgICBuYW1lOiBib25kMAogICAgICBtYWMtYWRkcmVzczogIjEwOjcwOmZkOmI4OmU2OjVlIgogICAgICBzdGF0ZTogdXAKICAgICAgdHlwZTogYm9uZAogICAgLSBkZXNjcmlwdGlvbjogQm9uZCBjb25uZWN0aW9uIGVuc2xhdmluZyBlbnMzZjAgYW5kIGVuczNmMQogICAgICBpcHY0OgogICAgICAgIGVuYWJsZWQ6IGZhbHNlCiAgICAgIGlwdjY6CiAgICAgICAgYWRkcmVzczoKICAgICAgICAtIGlwOiBmZTgwOjplYWViOmQzZmY6ZmVmZDpjZmI4CiAgICAgICAgICBwcmVmaXgtbGVuZ3RoOiA2NAogICAgICAgIC0gaXA6IGZkOGM6MjE1ZDoxNzhlOmMwZGU6ZWFlYjpkM2ZmOmZlZmQ6Y2ZiOAogICAgICAgICAgcHJlZml4LWxlbmd0aDogNjQKICAgICAgICBhdXRvY29uZjogZmFsc2UKICAgICAgICBkaGNwOiBmYWxzZQogICAgICAgIGVuYWJsZWQ6IHRydWUKICAgICAgbGluay1hZ2dyZWdhdGlvbjoKICAgICAgICBtb2RlOiA4MDIuM2FkCiAgICAgICAgb3B0aW9uczoKICAgICAgICAgIG1paW1vbjogIjE0MCIKICAgICAgICAgIGxhY3BfcmF0ZTogIjEiCiAgICAgICAgICB4bWl0X2hhc2hfcG9saWN5OiAiMSIKICAgICAgICBwb3J0czoKICAgICAgICAtIGVuczNmMG5wMAogICAgICAgIC0gZW5zM2YxbnAxCiAgICAgIG10dTogOTAwMAogICAgICBuYW1lOiBib25kMQogICAgICBzdGF0ZTogdXAKICAgICAgdHlwZTogYm9uZAogICAgLSBkZXNjcmlwdGlvbjogc3RvcmFnZSB2bGFuIGludGVyZmFjZSBjb25uZWN0aW9uIG9uIHRvcCBvZiBib25kMQogICAgICBtdHU6IDkwMDAKICAgICAgbmFtZTogYm9uZDEuMzIwMQogICAgICBzdGF0ZTogdXAKICAgICAgdHlwZTogdmxhbgogICAgICB2bGFuOgogICAgICAgIGJhc2UtaWZhY2U6IGJvbmQxCiAgICAgICAgaWQ6IDMyMDEK
  6. Edit the network secret and replace the data.nmstate value with the new base64-encoded value:
    oc -n openshift-machine-api  edit secret  compute-1-ru5-network-secret
    Example:
    apiVersion: v1
    data:
      nmstate: CiAgICBpbnRlcmZhY2VzOgogICAgLSBkZXNjcmlwdGlvbjogQm9uZCBjb25uZWN0aW9uIGVuc2xhdmluZyBiYXJlbWV0YWwgaW50ZXJmYWNlcwogICAgICBpcHY0OgogICAgICAgIGRoY3A6IHRydWUKICAgICAgICBlbmFibGVkOiB0cnVlCiAgICAgIGlwdjY6CiAgICAgICAgZW5hYmxlZDogZmFsc2UKICAgICAgbGluay1hZ2dyZWdhdGlvbjoKICAgICAgICBtb2RlOiA4MDIuM2FkCiAgICAgICAgb3B0aW9uczoKICAgICAgICAgIGxhY3BfcmF0ZTogIjEiCiAgICAgICAgICBtaWltb246ICIxNDAiCiAgICAgICAgICB4bWl0X2hhc2hfcG9saWN5OiAiMSIKICAgICAgICBwb3J0czoKICAgICAgICAtIGVuczFmMG5wMAogICAgICAgIC0gZW5zMWYwbnAxCiAgICAgIG5hbWU6IGJvbmQwCiAgICAgIG1hYy1hZGRyZXNzOiAiMTA6NzA6ZmQ6Yjg6ZTY6NWUiCiAgICAgIHN0YXRlOiB1cAogICAgICB0eXBlOiBib25kCiAgICAtIGRlc2NyaXB0aW9uOiBCb25kIGNvbm5lY3Rpb24gZW5zbGF2aW5nIGVuczNmMCBhbmQgZW5zM2YxCiAgICAgIGlwdjQ6CiAgICAgICAgZW5hYmxlZDogZmFsc2UKICAgICAgaXB2NjoKICAgICAgICBhZGRyZXNzOgogICAgICAgIC0gaXA6IGZlODA6OmVhZWI6ZDNmZjpmZWZkOmNmYjgKICAgICAgICAgIHByZWZpeC1sZW5ndGg6IDY0CiAgICAgICAgLSBpcDogZmQ4YzoyMTVkOjE3OGU6YzBkZTplYWViOmQzZmY6ZmVmZDpjZmI4CiAgICAgICAgICBwcmVmaXgtbGVuZ3RoOiA2NAogICAgICAgIGF1dG9jb25mOiBmYWxzZQogICAgICAgIGRoY3A6IGZhbHNlCiAgICAgICAgZW5hYmxlZDogdHJ1ZQogICAgICBsaW5rLWFnZ3JlZ2F0aW9uOgogICAgICAgIG1vZGU6IDgwMi4zYWQKICAgICAgICBvcHRpb25zOgogICAgICAgICAgbWlpbW9uOiAiMTQwIgogICAgICAgICAgbGFjcF9yYXRlOiAiMSIKICAgICAgICAgIHhtaXRfaGFzaF9wb2xpY3k6ICIxIgogICAgICAgIHBvcnRzOgogICAgICAgIC0gZW5zM2YwbnAwCiAgICAgICAgLSBlbnMzZjFucDEKICAgICAgbXR1OiA5MDAwCiAgICAgIG5hbWU6IGJvbmQxCiAgICAgIHN0YXRlOiB1cAogICAgICB0eXBlOiBib25kCiAgICAtIGRlc2NyaXB0aW9uOiBzdG9yYWdlIHZsYW4gaW50ZXJmYWNlIGNvbm5lY3Rpb24gb24gdG9wIG9mIGJvbmQxCiAgICAgIG10dTogOTAwMAogICAgICBuYW1lOiBib25kMS4zMjAxCiAgICAgIHN0YXRlOiB1cAogICAgICB0eXBlOiB2bGFuCiAgICAgIHZsYW46CiAgICAgICAgYmFzZS1pZmFjZTogYm9uZDEKICAgICAgICBpZDogMzIwMQ==
    kind: Secret
    metadata:
      creationTimestamp: "2023-11-06T05:13:06Z"
      labels:
        environment.metal3.io: baremetal
      name: compute-1-ru5-network-secret
      namespace: openshift-machine-api
      ownerReferences:
      - apiVersion: metal3.io/v1alpha1
        kind: PreprovisioningImage
        name: compute-1-ru5
        uid: 1591fbbd-b787-4c46-b37e-53cbd62d885e
      - apiVersion: metal3.io/v1alpha1
        kind: BareMetalHost
        name: compute-1-ru5
        uid: 0e03f8c3-6c01-4cd9-b499-03e4f67cd753
      resourceVersion: "8788604"
      uid: ee1af992-6c19-449b-8cb2-7315d1692498
    type: Opaque
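Steps 2 - 5 amount to a decode, edit, re-encode pipeline. The sketch below runs the same edits (replace slaves: with ports:, drop autoconf: true) on a small self-contained bond fragment rather than the real secret, so the transformation is verifiable without a cluster; feeding the result into the secret edit of step 6 is left as the manual step described above.

```shell
# Build a sample encoded fragment, then decode, fix, and re-encode it.
encoded=$(printf 'ipv4:\n  autoconf: true\n  dhcp: true\nlink-aggregation:\n  slaves:\n  - ens1f0np0\n' | base64 -w0)
fixed=$(printf '%s' "$encoded" | base64 -d \
  | sed -e 's/slaves:/ports:/' -e '/autoconf: true/d' \
  | base64 -w0)
# Show the corrected YAML: slaves -> ports, autoconf line removed.
printf '%s' "$fixed" | base64 -d
# Against a live cluster (assumption, untested), $fixed would become the new
# data.nmstate value in the <bmh-name>-network-secret edited in step 6.
```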
    

Community operator catalog is shown as missing

Resolution
If the Community operator catalog is shown as missing, create it before you attempt the upgrade.

Machine config roll out error

Resolution
If an operation causes a machine config rollout that gets stuck for a long time, check whether the node to be updated is pingable and has an IP address after restart. If any DHCP or network issues prevent the node from getting a hostname, fix them and restart the node.

On-demand backup failures post upgrade

Problem statement
After an upgrade, on-demand backups might fail for existing applications.
Resolution
Do the following manual steps after you upgrade to avoid this problem:
  1. Run the following command to display the phase status of the backup policies associated with all your applications:
    oc get fpa -A
    Example output:
    
    NAMESPACE                NAME                                  PROVIDER     APPLICATION           BACKUPPOLICY      DATACONSISTENCY   PHASE             LASTBACKUPTIMESTAMP   CAPACITY
    ibm-spectrum-fusion-ns   deptest2-azure-hourly-30              isf-ibmspp   deptest2              azure-hourly-30                     Assigned          66m                   <no value>
    ibm-spectrum-fusion-ns   new-generic-1-azure-hourly-45         isf-ibmspp   new-generic-1         azure-hourly-45                     Assigned          21m                   <no value>
    ibm-spectrum-fusion-ns   new-mongo-project-1-azure-hourly-15   isf-ibmspp   new-mongo-project-1   azure-hourly-15                     InitializeError   81m                   <no value>
    ibm-spectrum-fusion-ns   new-mongo-project-azure-hourly-30     isf-ibmspp   new-mongo-project     azure-hourly-30                     Assigned          66m                   <no value>
  2. Verify whether your policyassignment CR corresponds to any application in InitializeError phase. In this example, the new-mongo-project-1 application is in InitializeError phase.
  3. Log in to IBM Fusion HCI user interface.
  4. Go to Applications > Backups tab.
  5. Unassign the backup policy that is assigned to the application in InitializeError phase and wait for its unassignment. In this example, unassign azure-hourly-15 policy from new-mongo-project-1 application.
  6. Reassign the backup policy.
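The scan in step 2 can be scripted. With a cluster you might run something like oc get fpa -A --no-headers and filter on the PHASE column (an assumption, untested); here the filter runs on a trimmed copy of the sample output. Note that the empty DATACONSISTENCY column is dropped in this simulation, so PHASE lands in awk field 6:

```shell
# Print the NAME of every policy assignment stuck in InitializeError.
stuck=$(cat <<'EOF' | awk '$6 == "InitializeError" {print $2}'
ibm-spectrum-fusion-ns deptest2-azure-hourly-30 isf-ibmspp deptest2 azure-hourly-30 Assigned 66m
ibm-spectrum-fusion-ns new-mongo-project-1-azure-hourly-15 isf-ibmspp new-mongo-project-1 azure-hourly-15 InitializeError 81m
EOF
)
echo "$stuck"
```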

ImagePull failure

Resolution
If an ImagePull failure occurs due to intermittent network or registry issue during an upgrade, then restart the pod and retry. If the issue persists, contact IBM support.

DeadlineExceeded error

Problem statement
IBM Cloud Paks foundational services operator ClusterServiceVersion (CSV) status shows Failed and its InstallPlan status shows Failed after the subscription gets created.
Resolution
If you notice that the operator installation or upgrade fails with DeadlineExceeded error, see Operator installation or upgrade fails with DeadlineExceeded error.

IBM Fusion operator upgrade is stuck due to Grafana operator

Problem statement
The IBM Fusion operator is decoupled from the Grafana operator, and during the upgrade the IBM Fusion operator might get stuck due to missing Grafana images. You can validate this issue by checking isf-subscription.
Resolution
Follow these steps to resolve the issue:
  1. If you use Grafana, re-mirror the community operator. For the procedure, see Mirroring images for a disconnected installation using the oc-mirror plugin.
  2. If you do not use Grafana, uninstall the Grafana operator. For the procedure, see Deleting operators from a cluster.
  3. After you uninstall the Grafana operator, if you still see the same error, contact IBM support.

Update operator OOMKilled error

Problem statement
The pods go into crash loop state with the OOMKilled error after OpenShift Container Platform upgrade.
Resolution
To resolve the OOMKilled issue for the update operator, complete the following steps:
  1. Go to IBM Fusion clusterserviceversion object (Operators > Installed Operators > IBM Fusion operator > YAML tab).
  2. Search for the deployment name of the isf-update-operator (isf-update-operator-controller-manager) from the list of deployments in the clusterserviceversion object under spec.install.spec.deployments.
  3. In the specified deployment object, search for the container name manager under the spec.template.spec.containers and increase the memory limit in the resources.limits.memory.
  4. After you change the limits in the IBM Fusion clusterserviceversion, the update operator pod restarts with the new limits.
  5. If the OOMKilled issue persists, repeat steps 1 - 4.
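The console edit in steps 1 - 3 can be approximated with jq. The filter below is a hypothetical sketch run on a CSV-shaped JSON stub, not against a live cluster; with cluster access you would pipe oc get csv ... -o json through the same filter and apply the result. The deployment and container names mirror the steps above.

```shell
# Minimal CSV-shaped document with the path the steps describe:
# spec.install.spec.deployments[] -> containers[] named "manager".
cat > /tmp/csv-sample.json <<'EOF'
{"spec":{"install":{"spec":{"deployments":[{"name":"isf-update-operator-controller-manager","spec":{"template":{"spec":{"containers":[{"name":"manager","resources":{"limits":{"memory":"256Mi"}}}]}}}}]}}}}
EOF
# Raise the manager container's memory limit (the new value is an example).
patched=$(jq '(.spec.install.spec.deployments[]
  | select(.name=="isf-update-operator-controller-manager")
  | .spec.template.spec.containers[]
  | select(.name=="manager")
  | .resources.limits.memory) = "512Mi"' /tmp/csv-sample.json)
echo "$patched"
```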