OCP troubleshooting

See the following information to troubleshoot Red Hat OpenShift Container Platform issues.

Network discovery errors

The following errors can occur during network discovery:
  • 'cannot resolve bootstrap urls'
  • 'Sherpa service nginx gateway timeout'

When these errors occur, restart the DNS pods as follows:
  1. List the DNS pods:
     kubectl get pods --namespace=openshift-dns
  2. Delete all the DNS pods; they are recreated automatically:
     kubectl delete --all pods --namespace=openshift-dns
  3. List the pods again and verify that they are all up and running:
     kubectl get pods --namespace=openshift-dns
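To script the verification in step 3, a small filter over the `kubectl get pods` output can flag any pod that is not yet Running. This is a minimal sketch; the sample listing and pod names are made up here so the filter can be shown without a live cluster, and in practice you would pipe the live command output instead:

```shell
# Hedged sketch: flag pods whose STATUS column is not "Running".
# Sample listing (hypothetical pod names) stands in for live cluster output.
cat <<'EOF' > /tmp/dns-pods.txt
NAME                  READY   STATUS    RESTARTS   AGE
dns-default-7k2xp     2/2     Running   0          2m
dns-default-9qwlz     2/2     Pending   0          5s
EOF
# Live equivalent:
#   kubectl get pods --namespace=openshift-dns | awk 'NR>1 && $3!="Running" {print $1}'
awk 'NR>1 && $3!="Running" {print $1}' /tmp/dns-pods.txt
```

If the filter prints nothing, all pods are Running.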

Services not binding to storage (after upgrade or uninstall)

Some services fail to bind to the provisioned storage, typically leaving pods stuck in the 'Pending' state.

After you remove a previous installation of Agile Service Manager and some of its PersistentVolumeClaim (PVC) objects, any associated PersistentVolume (PV) objects are placed in the 'Released' state. They are then unavailable for binding, even if new PVCs created by a new Agile Service Manager installation have the same name and namespace. This is a deliberate safeguard that protects the data on the previous PVs.

Investigating the problem: The following example lists the 'elasticsearch' pods and their status; the 'Pending' status indicates the problem.
$ kubectl get pod -l app=elasticsearch

NAME                      READY  STATUS             RESTARTS  AGE
asm-elasticsearch-0       0/1    ContainerCreating  0         4s
asm-elasticsearch-1       0/1    Pending            0         3s
asm-elasticsearch-2       0/1    Pending            0         3s
This example examines the state of the PersistentVolumeClaims; the (truncated) result shows that some claims are stuck in the 'Pending' status.
$ kubectl get pvc -l app=elasticsearch

NAME                       STATUS    VOLUME
data-asm-elasticsearch-0   Bound     asm-data-elasticsearch-0
data-asm-elasticsearch-1   Pending
data-asm-elasticsearch-2   Pending
This example examines the PersistentVolumes; the (truncated) result shows that the status of the unbound volumes is 'Released'.
$ kubectl get pv -l app=elasticsearch

NAME                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS
asm-data-elasticsearch-0   75Gi       RWO            Retain           Bound
asm-data-elasticsearch-1   75Gi       RWO            Retain           Released
asm-data-elasticsearch-2   75Gi       RWO            Retain           Released
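The check above can be reduced to a quick scan that prints only the PVs stuck in 'Released'. A minimal sketch, using a hard-coded copy of the listing above in place of live `kubectl get pv` output:

```shell
# Hedged sketch: print only the PVs whose STATUS column is "Released".
# The sample file reproduces the listing above; live output works the same way.
cat <<'EOF' > /tmp/pv.txt
NAME                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS
asm-data-elasticsearch-0   75Gi       RWO            Retain           Bound
asm-data-elasticsearch-1   75Gi       RWO            Retain           Released
asm-data-elasticsearch-2   75Gi       RWO            Retain           Released
EOF
# Live equivalent:
#   kubectl get pv -l app=elasticsearch | awk 'NR>1 && $NF=="Released" {print $1}'
awk 'NR>1 && $NF=="Released" {print $1}' /tmp/pv.txt
```

Each printed name is a PV that needs the fix described in the next section.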
Solution: As an admin user, remove the PV.Spec.ClaimRef.UID field from the PV objects to make the PVs available again. The following (truncated) example shows a PV that is bound to a specific PVC:
apiVersion: v1
kind: PersistentVolume
spec:
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: data-asm-elasticsearch-1
    namespace: default
    resourceVersion: "81033"
    uid: 3dc73022-bb1d-11e8-997a-00000a330243
To solve the problem, edit the PV object and remove the 'uid' field, after which the PV status changes to 'Available', as shown in the following example:
$ kubectl get pv -l app=elasticsearch

NAME                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS
asm-data-elasticsearch-0   75Gi       RWO            Retain           Bound
asm-data-elasticsearch-1   75Gi       RWO            Retain           Available
asm-data-elasticsearch-2   75Gi       RWO            Retain           Available
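As an alternative to an interactive `kubectl edit`, the same change can be applied non-interactively with a JSON patch that removes the uid field from the claimRef. This is a sketch of the cluster-side command, using one of the PV names from the listing above; run it as an admin user against the live cluster, once per Released PV:

```shell
# Hedged sketch: remove spec.claimRef.uid from a Released PV so that it
# becomes Available again. Requires admin credentials on a live cluster.
kubectl patch pv asm-data-elasticsearch-1 --type json \
  -p '[{"op": "remove", "path": "/spec/claimRef/uid"}]'
```

After patching, rerun `kubectl get pv -l app=elasticsearch` to confirm the status is 'Available'.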

User interface timeout errors

To prevent or mitigate UI timeout errors, you can increase the timeout values for the following parameters, which are defined in the UI ConfigMap:
  • topologyServiceTimeout
  • searchServiceTimeout
  • layoutServiceTimeout
To change these timeout values (in seconds), edit the ConfigMap with the following command:
kubectl edit configmap {{ .Release.Name }}-asm-ui-config
When done, restart the NOI WebGUI pod.
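For a scripted change, `kubectl patch` can update a single key without opening an editor. A hedged sketch: the key name below comes from the parameter list above, but the exact layout of the ConfigMap data and the value 300 are assumptions; inspect the ConfigMap first and keep `{{ .Release.Name }}` replaced with your actual release name:

```shell
# Hedged sketch: raise topologyServiceTimeout to 300 seconds.
# Assumes the timeouts are plain keys under the ConfigMap's data section;
# verify the actual layout with:
#   kubectl get configmap {{ .Release.Name }}-asm-ui-config -o yaml
kubectl patch configmap {{ .Release.Name }}-asm-ui-config --type merge \
  -p '{"data":{"topologyServiceTimeout":"300"}}'
```

As with the interactive edit, restart the NOI WebGUI pod afterward for the change to take effect.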