OCP troubleshooting

See the following information to troubleshoot Red Hat OpenShift Container Platform issues.

Network discovery errors

The following errors can occur during network discovery:
  • 'cannot resolve bootstrap urls'
  • 'Sherpa service nginx gateway timeout'

When these errors occur, restart the DNS pods as follows:
  1. List the DNS pods:
     kubectl get pods --namespace=openshift-dns
  2. Delete all the DNS pods; they are recreated automatically:
     kubectl delete --all pods --namespace=openshift-dns
  3. List the pods again and verify that they are all up and running:
     kubectl get pods --namespace=openshift-dns
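To script the verification in step 3, a small filter over the `kubectl get pods` output can flag any pod that is not yet Running. This is a minimal sketch; the sample listing and pod names are made up here so the filter can be shown without a live cluster, and in practice you would pipe the live command output instead:

```shell
# Hedged sketch: flag pods whose STATUS column is not "Running".
# Sample listing (hypothetical pod names) stands in for live cluster output.
cat <<'EOF' > /tmp/dns-pods.txt
NAME                  READY   STATUS    RESTARTS   AGE
dns-default-7k2xp     2/2     Running   0          2m
dns-default-9qwlz     2/2     Pending   0          5s
EOF
# Live equivalent:
#   kubectl get pods --namespace=openshift-dns | awk 'NR>1 && $3!="Running" {print $1}'
awk 'NR>1 && $3!="Running" {print $1}' /tmp/dns-pods.txt
```

If the filter prints nothing, all pods are Running.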

Services not binding to storage (after upgrade or uninstall)

Some services fail to bind to the provisioned storage, typically leaving pods stuck in the 'Pending' state.

After you remove a previous installation of Agile Service Manager and some of its PersistentVolumeClaim (PVC) objects, any associated PersistentVolume (PV) objects are placed in the 'Released' state. They are then unavailable for binding, even if new PVCs created by a new Agile Service Manager installation have the same name and namespace. This is a deliberate safeguard that protects the data on the previous PVs.

Investigating the problem: The following example lists the 'elasticsearch' pods and their status; the 'Pending' status indicates the problem.
$ kubectl get pod -l app=elasticsearch

NAME                      READY  STATUS             RESTARTS  AGE
asm-elasticsearch-0       0/1    ContainerCreating  0         4s
asm-elasticsearch-1       0/1    Pending            0         3s
asm-elasticsearch-2       0/1    Pending            0         3s
This example examines the state of the PersistentVolumeClaims; the (truncated) result shows that some claims are stuck in the 'Pending' status.
$ kubectl get pvc -l app=elasticsearch

NAME                       STATUS    VOLUME
data-asm-elasticsearch-0   Bound     asm-data-elasticsearch-0
data-asm-elasticsearch-1   Pending
data-asm-elasticsearch-2   Pending
This example examines the PersistentVolumes; the (truncated) result shows that the status of the unbound volumes is 'Released'.
$ kubectl get pv -l app=elasticsearch

NAME                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS
asm-data-elasticsearch-0   75Gi       RWO            Retain           Bound
asm-data-elasticsearch-1   75Gi       RWO            Retain           Released
asm-data-elasticsearch-2   75Gi       RWO            Retain           Released
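The check above can be reduced to a quick scan that prints only the PVs stuck in 'Released'. A minimal sketch, using a hard-coded copy of the listing above in place of live `kubectl get pv` output:

```shell
# Hedged sketch: print only the PVs whose STATUS column is "Released".
# The sample file reproduces the listing above; live output works the same way.
cat <<'EOF' > /tmp/pv.txt
NAME                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS
asm-data-elasticsearch-0   75Gi       RWO            Retain           Bound
asm-data-elasticsearch-1   75Gi       RWO            Retain           Released
asm-data-elasticsearch-2   75Gi       RWO            Retain           Released
EOF
# Live equivalent:
#   kubectl get pv -l app=elasticsearch | awk 'NR>1 && $NF=="Released" {print $1}'
awk 'NR>1 && $NF=="Released" {print $1}' /tmp/pv.txt
```

Each printed name is a PV that needs the fix described in the next section.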
Solution: As an admin user, remove the PV.Spec.ClaimRef.UID field from the PV objects to make the PVs available again. The following (truncated) example shows a PV that is bound to a specific PVC:
apiVersion: v1
kind: PersistentVolume
spec:
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: data-asm-elasticsearch-1
    namespace: default
    resourceVersion: "81033"
    uid: 3dc73022-bb1d-11e8-997a-00000a330243
To solve the problem, edit the PV object and remove the 'uid' field, after which the PV status changes to 'Available', as shown in the following example:
$ kubectl get pv -l app=elasticsearch

NAME                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS
asm-data-elasticsearch-0   75Gi       RWO            Retain           Bound
asm-data-elasticsearch-1   75Gi       RWO            Retain           Available
asm-data-elasticsearch-2   75Gi       RWO            Retain           Available
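As an alternative to an interactive `kubectl edit`, the same change can be applied non-interactively with a JSON patch that removes the uid field from the claimRef. This is a sketch of the cluster-side command, using one of the PV names from the listing above; run it as an admin user against the live cluster, once per Released PV:

```shell
# Hedged sketch: remove spec.claimRef.uid from a Released PV so that it
# becomes Available again. Requires admin credentials on a live cluster.
kubectl patch pv asm-data-elasticsearch-1 --type json \
  -p '[{"op": "remove", "path": "/spec/claimRef/uid"}]'
```

After patching, rerun `kubectl get pv -l app=elasticsearch` to confirm the status is 'Available'.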

User interface timeout errors

To prevent or mitigate UI timeout errors, you can increase the timeout values for the following parameters, which are defined in the UI ConfigMap:
  • topologyServiceTimeout
  • searchServiceTimeout
  • layoutServiceTimeout
To change these timeout values (in seconds), edit the ConfigMap with the following command:
kubectl edit configmap {{ .Release.Name }}-asm-ui-config
When done, restart the NOI WebGUI pod.
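For a scripted change, `kubectl patch` can update a single key without opening an editor. A hedged sketch: the key name below comes from the parameter list above, but the exact layout of the ConfigMap data and the value 300 are assumptions; inspect the ConfigMap first and keep `{{ .Release.Name }}` replaced with your actual release name:

```shell
# Hedged sketch: raise topologyServiceTimeout to 300 seconds.
# Assumes the timeouts are plain keys under the ConfigMap's data section;
# verify the actual layout with:
#   kubectl get configmap {{ .Release.Name }}-asm-ui-config -o yaml
kubectl patch configmap {{ .Release.Name }}-asm-ui-config --type merge \
  -p '{"data":{"topologyServiceTimeout":"300"}}'
```

As with the interactive edit, restart the NOI WebGUI pod afterward for the change to take effect.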