Troubleshooting issues in installation
Known issues in the installation of Managed services.
- If you cannot access the Managed services user interface after installation, check your firewall settings and turn the firewall off.
- If the Managed services entry is missing from the Automate Infrastructure navigation menu, the Managed services installation might have encountered a problem. To resolve this issue:
  - Delete the Managed services installation instance and install it again.
  - After the installation succeeds and the Managed services entry appears in the Automate Infrastructure navigation menu, delete the Service library pods so that they restart.
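The pod deletion in the last step can be scripted. The following is a hypothetical sketch that assumes the Service library pods contain "service-library" in their names and run in the cp4aiops namespace used elsewhere in this document; verify the pattern against your deployment before running it.

```shell
# Delete pods whose names match the assumed pattern; their Deployment
# then recreates them, which is effectively a restart.
kubectl -n cp4aiops get pods --no-headers \
  | awk '/service-library/ {print $1}' \
  | xargs -r kubectl -n cp4aiops delete pod
```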
- In some instances, the cam-tenant-api pod does not start even after 45 minutes and the error 'Failed to get IAM access token' is displayed. Restart the pod to resolve the issue.
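You can restart the pod without looking up its generated name first. A sketch, assuming the cp4aiops namespace used elsewhere in this document:

```shell
# Find the cam-tenant-api pod by its name prefix and delete it; the
# controller creates a fresh pod, which retries the IAM token fetch.
POD=$(kubectl -n cp4aiops get pods --no-headers | awk '/^cam-tenant-api/ {print $1; exit}')
kubectl -n cp4aiops delete pod "$POD"
```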
- Sometimes, even after a successful deployment of Managed services, the cam-mongo microservice might go down unexpectedly.
Run the following command to check the pod status and events:
kubectl describe pods -n cp4aiops
If this command does not provide enough detail to understand the issue, run the kubectl logs command (optionally with --previous to read the logs of the prior container instance). For example, the command
kubectl -n cp4aiops logs cam-mongo-5c89fcccbd-r2hv4
results in the following output:
exception in initAndListen: 98 Unable to lock file: /data/db/mongod.lock Resource temporarily unavailable. Is a mongod instance already running?, terminating
Conclusion: While starting, the container inside the cam-mongo pod was unable to use the existing /data/db/mongod.lock file. As a result, the pod does not come up and you cannot access the CAM URL.
To resolve the issue, complete the following steps:
- Use the following pod creation YAML to spin up a container that mounts the affected persistent volume, cam-mongo-pv, where /data/db is present:
apiVersion: v1
kind: Pod
metadata:
  name: mongo-troubleshoot-pod
spec:
  volumes:
    - name: cam-mongo-pv
      persistentVolumeClaim:
        claimName: cam-mongo-pv
  containers:
    - name: mongo-troubleshoot
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/data/db"
          name: cam-mongo-pv
- Open a shell inside the troubleshooting pod, keeping stdin open and allocating a terminal:
kubectl -n cp4aiops exec -it mongo-troubleshoot-pod -- /bin/bash
Then run the following commands:
cd /data/db
rm mongod.lock
rm WiredTiger.lock
- Kill the pod that you created for troubleshooting:
kubectl -n cp4aiops delete pod mongo-troubleshoot-pod
- Run the following command to kill the corrupted cam-mongo pod so that it restarts:
kubectl -n cp4aiops delete pod <cam-mongo-pod-name>
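The same recovery can be run non-interactively. A sketch, under the assumption that the troubleshooting pod from the YAML above is running:

```shell
# Remove the stale lock files through the troubleshooting pod, then
# delete it along with the corrupted cam-mongo pod (found by name prefix).
kubectl -n cp4aiops exec mongo-troubleshoot-pod -- rm -f /data/db/mongod.lock /data/db/WiredTiger.lock
kubectl -n cp4aiops delete pod mongo-troubleshoot-pod
MONGO_POD=$(kubectl -n cp4aiops get pods --no-headers | awk '/^cam-mongo/ {print $1; exit}')
kubectl -n cp4aiops delete pod "$MONGO_POD"
```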
Managed services Container Debugging (kubectl)
When a container is not in a running state, run the following kubectl commands to describe the pods and persistent volumes and look for errors:
kubectl -n cp4aiops get pod
kubectl -n cp4aiops describe pod <podname>
kubectl -n cp4aiops get pv
kubectl -n cp4aiops describe pv <pvname>
Look for events or error messages when you describe pods or persistent volumes that are not in a healthy state, for example CrashLoopBackOff, Pending (for a while), or Init (for a while).
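To spot unhealthy pods at a glance, you can filter on the STATUS column. A sketch that assumes the default kubectl get pods column layout:

```shell
# Print the name and status of every pod that is neither Running nor
# Completed, surfacing states such as CrashLoopBackOff, Pending, or Init.
kubectl -n cp4aiops get pods --no-headers \
  | awk '$3 != "Running" && $3 != "Completed" {print $1, $3}'
```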
- Run the following command to ensure that the PVs are created successfully:
kubectl -n cp4aiops describe pv cam-mongo-pv
If the PVs are not set up, follow the PV setup steps before you install Managed services.
Note: PVs must be deleted and re-created every time Managed services is installed.
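PV health can also be checked with a quick script. A sketch that assumes the default kubectl get pv column layout, where STATUS is the fifth column:

```shell
# Flag any persistent volume that is not in the Bound state.
kubectl get pv --no-headers \
  | awk '$5 != "Bound" {print $1 " is " $5}'
```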
- Managed services installation fails due to an incorrect Worker node architecture value. The installation fails with the following error message:
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  71s (x2 over 71s)  default-scheduler  0/1 nodes are available: 1 node(s) didn't match node selector.
To resolve this issue, review the 'Worker node architecture' input parameter and check that the supported architecture, amd64, is selected.
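You can confirm the architecture that the worker nodes actually report. A sketch:

```shell
# List each node's reported CPU architecture and deduplicate; the only
# value printed should be amd64.
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.architecture}{"\n"}{end}' \
  | awk '{print $2}' | sort -u
```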
- Managed services reinstallation fails due to an existing cam-api-secret-gen-job left over from a prior Managed services installation, and the job hangs indefinitely with the following error:
Internal service error : rpc error: code = Unknown desc = jobs.batch "cam-api-secret-gen-job" already exists
root@csz25087:~# kubectl -n cp4aiops get pods
NAME                           READY   STATUS      RESTARTS   AGE
cam-api-secret-gen-job-n5d87   0/1     Completed   0          24m
To resolve this issue:
- Run the following command:
kubectl -n cp4aiops delete job cam-api-secret-gen-job
- Install Managed services.
- Managed services installation fails due to an existing template-crd-gen-job from a prior installation. You might see the job hang for ten minutes and then time out:
Internal service error : rpc error: code = Unknown desc = jobs.batch "template-crd-gen-job" already exists
root@csz25087:~# kubectl -n cp4aiops get pods
NAME                         READY   STATUS             RESTARTS   AGE
template-crd-gen-job-wm7mj   0/1     ImagePullBackOff   0          8m27s
To resolve this issue:
- Run the following command:
kubectl -n cp4aiops delete job template-crd-gen-job
- Install Managed services.
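Both leftover-job failures follow the same pattern, so the cleanup can be combined before a reinstall. A sketch using the two job names from this section:

```shell
# Delete any generator jobs left over from a prior installation;
# --ignore-not-found makes the command safe when a job is already gone.
for job in cam-api-secret-gen-job template-crd-gen-job; do
  kubectl -n cp4aiops delete job "$job" --ignore-not-found
done
```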
- The error 3.6.0.0 (20220113_2156) x86_64 is encountered while you access the library page after you uninstall and reinstall Managed services. To resolve this issue, increase the icpdata_addon_version value to a higher version in the cam-proxy-zen-extension configmap:
oc -n cp4aiops edit configmap cam-proxy-zen-extension
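Before editing, you can inspect the current value. A sketch; the key name icpdata_addon_version is taken from this section, and its exact location inside the configmap data may vary:

```shell
# Show the current icpdata_addon_version entry in the configmap.
oc -n cp4aiops get configmap cam-proxy-zen-extension -o yaml | grep icpdata_addon_version
```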
- A Bad Gateway error appears on some Managed services pages after you install Infrastructure Automation. To resolve this issue, restart the cam-tenant-api pod:
oc delete pod $(oc get pod | grep cam-tenant-api | awk '{print $1}')