Logs and troubleshooting
Information about logs, known issues, and troubleshooting that are associated with OpenShift Container Platform accelerator V4.9.0.1.
Logs
See the OpenShift Container Platform accelerator related logs for reference.
-
How to gather OpenShift installation logs?
See the Red Hat documentation, Gathering installation logs. For example, on the primary helper node, run the following command to gather logs from the three master nodes:
openshift-install gather bootstrap \
  --master <master0 IP address> \
  --master <master1 IP address> \
  --master <master2 IP address> \
  --key /core_rsa
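When the number of master nodes differs from three, the gather command can be assembled by a small helper. The following is a minimal sketch, not part of the accelerator: the function name `gather_bootstrap_cmd` and the 192.0.2.x addresses are hypothetical, and only the `--master` and `--key` flags from the command above are used. The function prints the command instead of running it, so that it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print an "openshift-install gather bootstrap" command
# for any number of master nodes.
# Usage: gather_bootstrap_cmd <ssh_key_path> <master_ip>...
gather_bootstrap_cmd() {
  key="$1"
  shift
  cmd="openshift-install gather bootstrap"
  # Add one --master flag per remaining argument.
  for ip in "$@"; do
    cmd="$cmd --master $ip"
  done
  printf '%s --key %s\n' "$cmd" "$key"
}

# Example with placeholder addresses (assumption: replace with real master IPs).
gather_bootstrap_cmd /core_rsa 192.0.2.10 192.0.2.11 192.0.2.12
```

After checking the printed command, paste it into the shell on the primary helper node to run it.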
Known issues and troubleshooting
See the following known issues, limitations, and workarounds:
-
The OpenShift image registry does not work if the primary helper goes down.
-
How to debug an OpenShift 4.x install?
- If the bootstrap is still in progress, run the following command to log in to the bootstrap node:
ssh core@<bootstrap_ip> -i <coreos_key>
- Run the following command to look for errors in the journal:
journalctl -b -f -u bootkube.service
- Run the following commands to look for failed pods:
sudo bash
crictl ps -a
crictl logs <container_id>
- If the bootstrap is complete and the installation is still in progress, run the following command to log in to the master node:
ssh core@<master_ip> -i <coreos_key>
- Run the following command to look for errors in the journal:
journalctl -b -f -u bootkube.service
- Run the following commands to look for failed pods:
podman ps -a
podman logs <container_id>
sudo bash
crictl ps -a
crictl logs <container_id>
-
From the Helper Node, do the following steps:
-
Run the following command to check the status of the Cluster Operators:
oc get co
-
Run the following commands to check the status of the nodes, and make sure that neither their CPU nor their memory is fully allocated:
oc get nodes
oc describe node <node>
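On a large cluster, the interesting rows are the ones that are not healthy. The following is a minimal sketch of two filters for the table output of the commands above. The function names are hypothetical, and the column positions (AVAILABLE in column 3 and DEGRADED in column 5 for `oc get co`, STATUS in column 2 for `oc get nodes`) are assumptions based on the default table layout.

```shell
#!/bin/sh
# Hypothetical filter: read "oc get co" table output on stdin and print the
# operators that are not Available=True or not Degraded=False.
# A row with an empty VERSION column shifts left and is reported as well.
unhealthy_operators() {
  awk 'NR > 1 && ($3 != "True" || $5 != "False") { print $1 }'
}

# Hypothetical filter: read "oc get nodes" table output on stdin and print
# every node whose STATUS is anything other than plain "Ready".
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1, $2 }'
}

# Usage on the helper node:
#   oc get co    | unhealthy_operators
#   oc get nodes | not_ready_nodes
```

Empty output from both filters means that every operator and node passed the check. Use `oc describe node <node>` on anything that is listed to look at its CPU and memory allocation.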
-
Deployments of the cloned OpenShift Container Platform accelerator do not show on the "Managed accelerator instances" page of the new IBM Cloud Pak® System user interface.
Cause: It is a limitation in the current IBM Cloud Pak® System user interface.
Resolution: The deployments are shown in the old IBM Cloud Pak® System user interface. The kubeadmin password can be retrieved with these steps:
- Log in to the primary Helper node.
- Run the following commands:
export KUBECONFIG=/ocp-helper/artifacts/ignition_files/auth/kubeconfig
oc get secret cluster-info -o json | jq ".data.clusterinfo" | cut -d'"' -f2 | base64 -d | base64 -d
-
The online installation of an operator on OpenShift Container Platform can fail with the following error:
Error: ImagePullBackOff. This applies when you install OpenShift Container Platform 4.x by using an IBM Cloud Pak® System template and an offline IBM Cloud Platform Common Services Docker Registry. The ImagePullBackOff error message occurs when you install an OpenShift Container Platform operator from the OperatorHub of a cluster that does not have internet connectivity.
Cause: The default pull secret that was set up during the initial installation of the environment contains only the pull secret of the local offline registry. However, the installation of the operator tries to access external sites (registry.redhat.io) that require different credentials.
Resolution: See "Can I enable online connectivity for OpenShift Container Platform offline installation updates?" in the OpenShift Container Platform accelerator FAQs topic.
-
When you add a new worker node to the Red Hat OpenShift Container Platform cluster by using the IBM Cloud Pak® System add node function, sometimes the node is not added to the OpenShift cluster.
Cause: When you create a Red Hat OpenShift Container Platform 4.9 cluster and add one or more worker nodes from the IBM Cloud Pak® System user interface, one or more of those worker nodes might not be added to the cluster. This happens when the worker node does not start in time.
Resolution: To resolve this issue, manually approve CSRs for the newly added node from the Primary Helper virtual machine.
- Check for pending certificate signing requests (CSRs) with the following command:
oc get csr | grep -i Pending
- Approve a pending CSR by running the following command:
oc adm certificate approve <csr_name>
- Repeat these commands until the node is added to the cluster and no more CSRs are pending.
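The check-and-approve cycle above can be scripted. The following is a minimal sketch, assuming that `oc` is already logged in on the Primary Helper virtual machine. The function names, the default limit of 10 rounds, and the `CSR_WAIT_SECONDS` variable are hypothetical. Several rounds are usually needed because a joining node typically submits a second CSR after its first one is approved.

```shell
#!/bin/sh
# Hypothetical filter: read "oc get csr" table output on stdin and print the
# names of CSRs whose last column (CONDITION) is exactly "Pending".
pending_csrs() {
  awk '$NF == "Pending" { print $1 }'
}

# Approve pending CSRs one at a time, then check again, until none remain
# or the round limit is reached.
# Usage: approve_pending_csrs [max_rounds]
approve_pending_csrs() {
  max_rounds="${1:-10}"
  round=0
  while [ "$round" -lt "$max_rounds" ]; do
    names="$(oc get csr | pending_csrs)"
    if [ -z "$names" ]; then
      return 0            # nothing left to approve
    fi
    for name in $names; do
      oc adm certificate approve "$name"
    done
    round=$((round + 1))
    # Give the node time to submit its next request (interval is an assumption).
    sleep "${CSR_WAIT_SECONDS:-15}"
  done
  return 1                # CSRs were still pending when the limit was reached
}

# Usage on the Primary Helper virtual machine:
#   approve_pending_csrs && oc get nodes
```

Automatic approval skips the manual review of each request, so this is only reasonable when you know which nodes were just added.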
-
When you add a new worker node to the Red Hat OpenShift Container Platform cluster by using the IBM Cloud Pak® System add node function, sometimes the node is added to the OpenShift cluster as ‘localhost’.
Cause: When you create a Red Hat OpenShift Container Platform 4.9 cluster and add a worker node from the IBM Cloud Pak® System user interface, the worker node is added to the cluster after some time, but with the name localhost.
Resolution: To resolve this issue, do these steps:
- Delete the localhost node by running the following command from the Primary Helper virtual machine:
oc delete node localhost
- Restart the newly added worker node (the virtual machine that appeared as localhost).
- After the node restarts, check for any pending CSRs by running the following command:
oc get csr | grep -i Pending
- Approve a pending CSR by running the following command:
oc adm certificate approve <csr_name>
- Repeat these commands until the node is added to the cluster with the correct hostname and no more CSRs are pending.
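Before the delete step, it can help to confirm that a node named localhost is actually registered. The following is a minimal sketch of such a guard, assuming the default `oc get nodes` table layout with the node name in the first column. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical check: read "oc get nodes" table output on stdin and succeed
# only if one of the rows is a node named exactly "localhost".
has_localhost_node() {
  awk 'NR > 1 && $1 == "localhost" { found = 1 } END { exit found ? 0 : 1 }'
}

# Delete the node only when it exists. Otherwise report and change nothing.
delete_localhost_node() {
  if oc get nodes | has_localhost_node; then
    oc delete node localhost
  else
    echo "no node named localhost found; nothing to delete"
  fi
}

# Usage on the Primary Helper virtual machine:
#   delete_localhost_node
```

After the deletion, continue with the restart and CSR approval steps as described.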