Troubleshooting installation issues

Use this information to troubleshoot IBM Fusion HCI System installation issues.

Failed to pull SCA certs error on OpenShift Container Platform console

Problem statement
The following error might appear in the output of the oc get co command or on the OpenShift® Container Platform console:
Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 500: {"code":"ACCT-MGMT-9","href":"/api/accounts_mgmt/v1/errors/9","id":"9","kind":"Error","operation_id":"316316c3-771a-4bb1-b6f2-3a52084fcbd1","reason":"400 Bad Request"}
Resolution
For the resolution, see Red Hat customer portal.
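
To find which cluster operator reports this message from the command line, you can filter the cluster operator conditions for the error text. The following is a minimal sketch, not an official procedure. The function name sca_cert_errors is hypothetical, and the sketch assumes that you are already logged in to the cluster with oc.

```shell
# Hypothetical helper: read cluster operator details on stdin and print
# only the lines that contain the SCA certificate error text.
# "|| true" keeps the exit status at zero when nothing matches.
sca_cert_errors() {
  grep -F "Failed to pull SCA certs" || true
}

# Example usage (requires an authenticated oc session):
#   oc get co -o yaml | sca_cert_errors
```

If the function prints nothing, none of the cluster operators currently report the error.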

Empty node list in Network precheck wizard

Problem statement
The Network validation wizard page (Network setup stage 1) of the IBM Fusion HCI System installation can show an empty node list while the Finish button is enabled. The InlineNotification might also show a Connection complete! status with a green checkmark, which suggests that you can proceed to the next step.

Similarly, the Network precheck wizard page (Red Hat OpenShift installation stage 2) of IBM Fusion HCI System might show an empty node list while the Next button is enabled.

Resolution
  1. If you run into this scenario, confirm that the nodes are connected before you proceed:

    Check the response of the endpoint https://<host IP address>:3000/api/v1/verifyDHCP

  2. Find the failed nodes from the response.
  3. Manually verify the configuration of the node in DHCP and DNS.

    The IP addresses of the nodes might be incorrect or unreachable.

  4. Fix the issue and restart Network setup (stage 1) installation or Red Hat OpenShift installation (stage 2).
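
The checks in the preceding steps can be sketched as shell functions. This is an illustration under stated assumptions: the function names are hypothetical, the -k flag assumes that the host presents a self-signed certificate, and the response format of the endpoint is not documented here, so the response is only printed for manual review. The getent and ping commands assume a Linux host.

```shell
# Build the verifyDHCP URL for a given host IP address.
verify_dhcp_url() {
  printf 'https://%s:3000/api/v1/verifyDHCP\n' "$1"
}

# Fetch the endpoint response and print it for manual review.
# Assumption: -k skips TLS certificate verification because the host is
# expected to use a self-signed certificate; drop it if the certificate
# is trusted.
fetch_verify_dhcp() {
  curl -sk "$(verify_dhcp_url "$1")"
}

# Check that a node host name resolves in DNS and that the node answers
# a ping. Argument: node host name.
check_node() {
  getent hosts "$1" && ping -c 3 "$1"
}

# Example usage (the address and host name are placeholders):
#   fetch_verify_dhcp 192.0.2.10
#   check_node compute-1.example.com
```

Compare the address that check_node prints with the address that is reserved for the node in DHCP. A mismatch or a failed ping points to the configuration to fix.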

Nodes added as local host or local domain to Red Hat OpenShift Container Platform cluster

Resolution
If the Red Hat® OpenShift installation fails, retry the OpenShift installation wizard. If the problem persists, contact IBM Support.

ISF node exporter pod in container creating state

Problem statement
You might find the ISF node exporter pod in the container creating state at the end of the stage 2 installation. The following example log message reports that the pod is not ready:
2022-05-16T12:06:07.538Z ERROR controller-runtime.manager.controller.node Reconciler error {"reconciler group": "storage.isf.ibm.com", "reconciler kind": "Node", "name": "isf-node-exporter", "namespace": "openshift-monitoring", "error": "pod is not in ready state isf-node-exporter-5bc7bb6587-vcr2n Skipping the installation process"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
Resolution
Ignore this message and proceed with the next steps. The error resolves on its own.
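
To confirm that the pod eventually leaves the container creating state, you can check its status from the command line. The following is a minimal sketch. The function name pod_status_by_prefix is hypothetical, and the sketch assumes that the pod runs in the openshift-monitoring namespace that the log message names.

```shell
# Hypothetical helper: read `oc get pods` output on stdin and print the
# name and STATUS column of every pod whose name starts with the prefix
# given as the first argument.
pod_status_by_prefix() {
  awk -v prefix="$1" 'index($1, prefix) == 1 { print $1, $3 }'
}

# Example usage (namespace assumed from the log message):
#   oc get pods -n openshift-monitoring | pod_status_by_prefix isf-node-exporter
```

Run the command again after a few minutes. A status of Running indicates that the error has cleared.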

ImagePull failure during an installation

Problem statement
The ImagePull failure can occur due to intermittent network or registry issues.
Resolution
If an ImagePull failure occurs during IBM Fusion HCI installation, restart the pod and retry. If the issue persists, contact IBM Support.
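
One way to find and restart the affected pods is sketched below. The function names are hypothetical. The sketch assumes that each pod is managed by a controller such as a Deployment, so that deleting the pod causes it to be re-created and the image pull to be retried.

```shell
# Hypothetical helper: read `oc get pods -A` output on stdin and print
# "namespace pod-name" for pods whose STATUS ends with ErrImagePull or
# ImagePullBackOff (this also matches the Init: variants).
image_pull_failures() {
  awk '$4 ~ /(ErrImagePull|ImagePullBackOff)$/ { print $1, $2 }'
}

# Delete a pod so that its controller re-creates it.
# Arguments: namespace, pod name.
restart_pod() {
  oc delete pod "$2" -n "$1"
}

# Example usage:
#   oc get pods -A | image_pull_failures
#   restart_pod <namespace> <pod-name>
```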

Pull a container image from the registry.connect.redhat.com

Problem statement
When you pull a container image from registry.connect.redhat.com, the request is redirected to AWS S3. If your firewall does not allow the S3 domain, the pull fails. This is a known Red Hat issue.
Diagnostic steps
To reproduce the error message, complete the following steps:
  1. Log in to the Red Hat OpenShift Container Platform control node.
  2. Use the following commands to manually pull an image from registry.connect.redhat.com:
    
    $ podman login registry.connect.redhat.com
    $ podman pull registry.connect.redhat.com/seldonio/seldon-core-operator-bundle:latest
    Note: An example image is used for illustration.
Resolution
A portion of the content on registry.connect.redhat.com is hosted in the following AWS S3 bucket: rhc4tp-prod-z8cxf-image-registry-us-east-1-evenkyleffocxqvofrk.s3.dualstack.us-east-1.amazonaws.com. Allow this domain in your firewall so that OpenShift Container Platform can access it.
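
After you update the firewall, you can check network access to both hosts from the control node. The following is a minimal sketch with hypothetical function names. It tests only whether an HTTPS connection can be made, not whether an image pull succeeds, so follow it with the podman pull from the diagnostic steps.

```shell
# Hosts named in this section that the firewall must allow.
registry_hosts() {
  printf '%s\n' \
    registry.connect.redhat.com \
    rhc4tp-prod-z8cxf-image-registry-us-east-1-evenkyleffocxqvofrk.s3.dualstack.us-east-1.amazonaws.com
}

# Try an HTTPS connection to each host and report the result.
# Any HTTP response, including an error status, counts as reachable,
# because only network-level access is tested here.
check_registry_hosts() {
  registry_hosts | while read -r host; do
    if curl -s -o /dev/null --connect-timeout 10 "https://$host/"; then
      echo "reachable: $host"
    else
      echo "blocked or unreachable: $host"
    fi
  done
}

# Example usage, run from the control node:
#   check_registry_hosts
```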

Known issues

  • During Red Hat OpenShift cluster creation, you cannot download the logs. You can monitor them on the user interface and download them after the installation.
  • If you observe the error Configmap fusion platform not found in fusion namespace in the prereq operator logs (the isf-prereq-operator-controller-manager-xxxx pod logs), ignore it. The error has no impact.