IBM Storage Fusion installation and upgrade issues

List of known issues and limitations in IBM Storage Fusion software installation and upgrade.

Troubleshooting installation issues

Enterprise registry
  • ISF Catalog not found in OperatorHub
    If you cannot find ISF Catalog in OperatorHub, then do the following checks:
    • Check whether the isf-catalog pod is running in openshift-marketplace.
    • Check whether there is an ImagePullBackOff error. If the error exists, check your global pull-secret again.
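The two checks above can also be run from a terminal. The following is a minimal sketch, assuming that the oc CLI is logged in to the cluster and that the catalog pod name contains the string isf-catalog. The function names catalog_pod_status and check_isf_catalog are hypothetical helpers, not part of the product.

```shell
# Print the name and status of the isf-catalog pod(s) in openshift-marketplace.
# Assumption: the pod name contains the string "isf-catalog".
catalog_pod_status() {
  oc get pods -n openshift-marketplace --no-headers 2>/dev/null |
    awk '$1 ~ /isf-catalog/ {print $1, $3}'
}

# Report whether the catalog pod exists and whether it shows an image pull failure.
check_isf_catalog() {
  status=$(catalog_pod_status)
  if [ -z "$status" ]; then
    echo "isf-catalog pod not found in openshift-marketplace"
    return 1
  fi
  echo "$status"
  case "$status" in
    *ImagePullBackOff*|*ErrImagePull*)
      echo "Image pull failure: recheck the global pull secret"
      return 1
      ;;
  esac
  return 0
}
```

Run check_isf_catalog after the functions are defined. If it reports an image pull failure, the global pull secret can be inspected with oc get secret pull-secret -n openshift-config -o yaml.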
  • After an offline installation of IBM Storage Fusion, the user interface access may result in a “502 Bad Gateway” error.
    Cause
    This might be a DNS settings problem. The issue occurs when isf-ui-svc is mapped to a load balancer IP address because of an overly broad wildcard (*) DNS record.
    Diagnosis
    Run the following sample commands in the isf-proxy pod terminal and check whether both lookups resolve to the same address. If the addresses are different, then it is a DNS settings problem because the output must be the same.
    Sample with issue:
    sh-4.4$ nslookup isf-ui-svc.ibm-spectrum-fusion-ns.svc.cluster.local
    Server:         172.31.0.10
    Address:        172.31.0.10#53
    
    Name:   isf-ui-svc.ibm-spectrum-fusion-ns.svc.cluster.local.test.mtr.bj.cn
    Address: 10.202.80.111                ---------->check here
    
    sh-4.4$ nslookup isf-ui-svc
    Server:         172.31.0.10
    Address:        172.31.0.10#53
    
    Name:   isf-ui-svc.ibm-spectrum-fusion-ns.svc.cluster.local
    Address: 172.31.54.228               ----------->check here
    Sample without issues:
    sh-4.4$ nslookup isf-ui-svc.ibm-spectrum-fusion-ns.svc.cluster.local
    Server:         172.30.0.10
    Address:        172.30.0.10#53
    
    Name:   isf-ui-svc.ibm-spectrum-fusion-ns.svc.cluster.local
    Address: 172.30.194.120
    
    sh-4.4$ 
    sh-4.4$ nslookup isf-ui-svc
    Server:         172.30.0.10
    Address:        172.30.0.10#53
    
    Name:   isf-ui-svc.ibm-spectrum-fusion-ns.svc.cluster.local
    Address: 172.30.194.120
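The comparison in the samples above can be automated. The following is a rough sketch, assuming that nslookup and awk are both available in the shell where it runs. The function names resolved_address and compare_isf_ui_lookups are hypothetical.

```shell
# Print the first address that nslookup reports after the "Name:" line.
# The "Address:" line before "Name:" belongs to the DNS server and is skipped.
resolved_address() {
  nslookup "$1" 2>/dev/null |
    awk '$1 == "Name:" {seen = 1; next}
         seen && $1 == "Address:" {print $2; exit}'
}

# Compare the fully qualified lookup with the short-name lookup.
compare_isf_ui_lookups() {
  fqdn_addr=$(resolved_address isf-ui-svc.ibm-spectrum-fusion-ns.svc.cluster.local)
  short_addr=$(resolved_address isf-ui-svc)
  if [ -z "$fqdn_addr" ] || [ -z "$short_addr" ]; then
    echo "UNKNOWN: at least one lookup returned no address"
    return 2
  elif [ "$fqdn_addr" = "$short_addr" ]; then
    echo "OK: both names resolve to $fqdn_addr"
  else
    echo "MISMATCH: FQDN resolves to $fqdn_addr, short name to $short_addr"
    return 1
  fi
}
```

A MISMATCH result corresponds to the "Sample with issue" output, and an OK result corresponds to the "Sample without issues" output.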
    Resolution
    1. On the DNS server, check the /var/named/ocp-zonefile.db configuration file.
    2. Comment out the wildcard (*) record that points to all external DNS addresses.
      Note: Check all your hostnames to ensure they do not map to a load balancer in use.
    3. Restart DNS.
    4. Open IBM Storage Fusion UI to check whether the issue is resolved.
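To locate candidate records for step 2, the zone file can be searched for lines that begin with the wildcard character. This is only a sketch: the function name list_wildcard_records is hypothetical, and it lists records for manual review rather than changing the file.

```shell
# List wildcard records in a zone file, with line numbers, for manual review.
# Lines that are already commented out with ";" are not reported.
# $1: zone file path (defaults to the path named in step 1)
list_wildcard_records() {
  zonefile=${1:-/var/named/ocp-zonefile.db}
  grep -n '^[[:space:]]*\*' "$zonefile"
}
```

Review each reported line before you comment it out. For step 3, the restart command depends on the DNS software in use. If the server runs BIND under systemd, which is an assumption based on the /var/named path, the command would typically be systemctl restart named.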

Troubleshooting installation and upgrade issues

  • Update operator OOMKilled error
    To resolve the OOMKilled issue for the update operator, do the following resolution steps:
    Note: If the pod is in a CrashLoopBackOff state, delete the isf-update-operator-* pod. The pod comes back to running state for a couple of minutes. Do steps 1-4 in the following resolution steps when the update operator pod is in Running state.
    1. In the OpenShift® Container Platform console, go to Home > Search.
    2. Search for UpdateManager in the Resources drop-down list.
    3. In the UpdateManagers, open the version instance.
    4. Go to the YAML tab.
    5. Increase the memory limit in spec.resources.limits.memory.
    6. After a couple of minutes, check whether the IBM Storage Fusion clusterserviceversion object (Operators > Installed Operators > IBM Storage Fusion operator > YAML tab) reflects the updated limit set for the update operator:
      • Search for the deployment name of the update operator (isf-update-operator-controller-manager) from the list of deployments under spec.install.spec.deployments.
      • In the specified deployment object, search for the container name manager under spec.template.spec.containers. Also, check whether the limits.memory is equal to the one in the UpdateManager CR. If not equal, change the memory under limits.memory to the same limits value as mentioned in the UpdateManager CR in step 5.
      • Go to Workloads > Deployments > isf-update-operator-controller-manager > YAML tab and check whether the limits.memory is equal to the limit set in the previous step. If not equal, change the memory under limits.memory to the same limits value as mentioned in the previous steps.
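The final comparison in the steps above can also be made from the command line. The following is a minimal sketch under several assumptions: the oc CLI is logged in, the UpdateManager resource can be addressed as updatemanager, its instance is named version, and both objects are in the ibm-spectrum-fusion-ns namespace. The function names are hypothetical.

```shell
FUSION_NS=ibm-spectrum-fusion-ns

# Memory limit requested in the UpdateManager CR.
# Assumption: resource name "updatemanager", instance name "version".
updatemanager_memory_limit() {
  oc get updatemanager version -n "$FUSION_NS" \
    -o jsonpath='{.spec.resources.limits.memory}'
}

# Memory limit currently set on the "manager" container of the deployment.
deployment_memory_limit() {
  oc get deployment isf-update-operator-controller-manager -n "$FUSION_NS" \
    -o jsonpath='{.spec.template.spec.containers[?(@.name=="manager")].resources.limits.memory}'
}

# Compare the two values and report the result.
compare_update_operator_limits() {
  cr_limit=$(updatemanager_memory_limit)
  deploy_limit=$(deployment_memory_limit)
  if [ "$cr_limit" = "$deploy_limit" ]; then
    echo "OK: both limits are $cr_limit"
  else
    echo "MISMATCH: UpdateManager CR has $cr_limit, deployment has $deploy_limit"
    return 1
  fi
}
```

A MISMATCH result means that the deployment has not yet picked up the value from the UpdateManager CR, which is the case that the manual checks in step 6 address.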
  • x509: certificate signed by unknown authority
    The x509: certificate signed by unknown authority error can occur when you trigger a service or firmware upgrade. A sample error is as follows:
    Internal error occurred: failed calling webhook "mupdatemanager.kb.io": failed to call webhook: Post "https://isf-update-operator-controller-manager-service.ibm-spectrum-fusion-ns.svc:443/mutate-update-isf-ibm-com-v1-updatemanager?timeout=10s": x509: certificate signed by unknown authority
    Do the following resolution steps:
    1. In the OpenShift Container Platform console, go to Home > Search.
    2. From the Resources drop-down list, select MutatingWebhookConfiguration.
    3. Select the Label drop-down list and change it to Name.
    4. Search for mupdatemanager. Check whether there is more than one instance of the mupdatemanager.* webhook. If so, take a backup of the older one and delete it.
    5. Go back to Home > Search page.
    6. From the Resources drop-down list, select ValidatingWebhookConfiguration.
    7. Search for vupdatemanager. Check whether there is more than one instance of the vupdatemanager.* webhook. If so, take a backup of the older one and delete it.
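The same search and backup can be sketched with the oc CLI. This assumes that the duplicate webhook configurations can be recognized by the mupdatemanager or vupdatemanager fragment in the object name, as in the console search above. The function names are hypothetical.

```shell
# List webhook configurations of one kind whose name contains a fragment, oldest first.
# $1: MutatingWebhookConfiguration or ValidatingWebhookConfiguration
# $2: name fragment, for example mupdatemanager or vupdatemanager
list_webhooks() {
  oc get "$1" --sort-by=.metadata.creationTimestamp -o name 2>/dev/null |
    grep "$2"
}

# Return 0 when more than one matching webhook configuration exists.
has_duplicate_webhooks() {
  count=$(list_webhooks "$1" "$2" | wc -l | tr -d ' ')
  [ "$count" -gt 1 ]
}

# Save a YAML backup of one webhook configuration before it is deleted.
# $1: kind, $2: object name, $3: backup file path
backup_webhook() {
  oc get "$1" "$2" -o yaml > "$3"
}
```

For example, has_duplicate_webhooks MutatingWebhookConfiguration mupdatemanager reports whether a duplicate exists. Because the list is sorted by creation time, the first entry is the older object. After backup_webhook saves it, the object can be removed with oc delete followed by the kind and the object name.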
  • IBM Storage Fusion operator does not report correct status.

    IBM Storage Fusion operator does not report status "Succeeded" or goes from "Succeeded" to "Installing > Pending > InstallReady" status.

    Symptom
    IBM identified an issue in the IBM Storage Fusion operator where the operator status does not reach Succeeded but keeps changing between "Installing > Pending > InstallReady", or changes from "Succeeded" to "Installing > Pending > InstallReady". This behavior is observed only on Red Hat OpenShift Container Platform (OCP) version 4.12.25 or higher. It affects IBM Storage Fusion 2.5.x or 2.6.0 users who deploy IBM Storage Fusion on OpenShift Container Platform 4.12.25 or higher, or who upgrade an OpenShift Container Platform cluster that has an existing IBM Storage Fusion (2.5.x/2.6.0) deployment to 4.12.25 or higher.
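To check whether a given OpenShift Container Platform version falls in the affected range, the version numbers can be compared field by field. The following sketch uses awk for the comparison. The function names version_ge and ocp_version_affected are hypothetical.

```shell
# Return 0 when dot-separated numeric version $1 is greater than or equal to $2.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = x[i] + 0; yi = y[i] + 0   # missing fields count as 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# Return 0 when the given OCP version is 4.12.25 or higher.
ocp_version_affected() {
  version_ge "$1" 4.12.25
}
```

On a live cluster, the current version can be passed in from oc get clusterversion version -o jsonpath='{.status.desired.version}'.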
    Resolution
    If you have deployed or upgraded IBM Storage Fusion 2.5.x or 2.6.0 on OCP 4.12.24 or earlier and plan to upgrade OpenShift Container Platform, then remove the following webhook section from the IBM Storage Fusion CSV before you upgrade:
    Example:
       oc -n ibm-spectrum-fusion-ns edit csv isf-operator.v2.6.1-3729811
    Delete the following content:
    - generateName: cfusionservicedefinitionsfusionserviceinstances.kb.io
      containerPort: 443
      sideEffects: None
      deploymentName: isf-prereq-operator-controller-manager
      targetPort: 9443
      conversionCRDs:
        - fusionservicedefinitions.service.isf.ibm.com
        - fusionserviceinstances.service.isf.ibm.com
      type: ConversionWebhook
      admissionReviewVersions:
        - v1
      webhookPath: /convert
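Before and after the edit, it can help to confirm whether the CSV still contains the webhook entry. The following is a small sketch, assuming that the oc CLI is logged in and that the CSV is in the ibm-spectrum-fusion-ns namespace. The function name csv_has_conversion_webhook is hypothetical.

```shell
# Return 0 when the given CSV still contains the conversion webhook entry.
# $1: CSV name, for example isf-operator.v2.6.1-3729811
csv_has_conversion_webhook() {
  oc get csv "$1" -n ibm-spectrum-fusion-ns -o yaml 2>/dev/null |
    grep -qF 'cfusionservicedefinitionsfusionserviceinstances.kb.io'
}
```

The CSV names in the namespace can be listed with oc get csv -n ibm-spectrum-fusion-ns. After the content is deleted, the function is expected to return a nonzero status for that CSV.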
    
    Do not start the Backup & Restore service installation until you complete the preceding workaround. Otherwise, the installation gets stuck at 38%. If this happens, then do the following steps:
    1. Uninstall the service by using the script at https://www.ibm.com/support/pages/node/7014899
    2. Remove the webhook content.
    3. Reinstall the service.
    Note: This issue is resolved in version 2.6.1.