OCP troubleshooting
See the following information to troubleshoot Red Hat OpenShift Container Platform issues.
Network discovery errors
The following errors indicate a network discovery problem:
- 'cannot resolve bootstrap urls'
- 'Sherpa service nginx gateway timeout'
When these errors occur, restart the DNS pods using the following commands:
- kubectl get pods --namespace=openshift-dns
- kubectl delete --all pods --namespace=openshift-dns
- kubectl get pods --namespace=openshift-dns
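The restart sequence can be wrapped in a small script that also waits for the replacement pods to become Ready. This is a sketch only: the kubectl wait step and the 120-second timeout are assumptions, and KUBECTL defaults to an echo preview so that nothing runs against a cluster until you set KUBECTL=kubectl.

```shell
#!/bin/sh
# Sketch: restart the OpenShift DNS pods and wait for the replacements.
# KUBECTL defaults to 'echo kubectl' so the commands are only printed;
# set KUBECTL=kubectl to run them against a live cluster.
KUBECTL="${KUBECTL:-echo kubectl}"

$KUBECTL get pods --namespace=openshift-dns
$KUBECTL delete --all pods --namespace=openshift-dns
# Wait for the recreated pods to report Ready (timeout is an assumption).
$KUBECTL wait --for=condition=Ready pods --all --namespace=openshift-dns --timeout=120s
$KUBECTL get pods --namespace=openshift-dns
```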
Services not binding to storage (after upgrade or uninstall)
Some services fail to bind to the provisioned storage, typically resulting in pods stuck in 'pending' state.
After removing a previous installation of Agile Service Manager and some of its PersistentVolumeClaim (PVC) objects, any associated PersistentVolume (PV) objects are placed in a 'Released' state. They are then unavailable for binding, even if new PVCs that are part of a new Agile Service Manager installation have the same name and namespace. This is a deliberate safety feature that safeguards the data on the previous PVs.
Investigating the problem: The following example lists the 'elasticsearch' pods and their status. The 'Pending' status indicates the problem.
$ kubectl get pod -l app=elasticsearch
NAME READY STATUS RESTARTS AGE
asm-elasticsearch-0 0/1 ContainerCreating 0 4s
asm-elasticsearch-1 0/1 Pending 0 3s
asm-elasticsearch-2 0/1 Pending 0 3s
This example examines the state of the PersistentVolumeClaims, and the (truncated) result shows that two of the claims are 'Pending'.
$ kubectl get pvc -l app=elasticsearch
NAME STATUS VOLUME
data-asm-elasticsearch-0 Bound asm-data-elasticsearch-0
data-asm-elasticsearch-1 Pending
data-asm-elasticsearch-2 Pending
This example examines the PersistentVolumes, and the (truncated) result shows that the status is 'Released'.
$ kubectl get pv -l app=elasticsearch
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS
asm-data-elasticsearch-0 75Gi RWO Retain Bound
asm-data-elasticsearch-1 75Gi RWO Retain Released
asm-data-elasticsearch-2 75Gi RWO Retain Released
Solution: As the admin user, remove the PV.Spec.ClaimRef.UID field from the PV objects to make the PVs available again. The following (truncated) example shows a PV that is bound to a specific PVC:
apiVersion: v1
kind: PersistentVolume
spec:
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: data-asm-elasticsearch-1
    namespace: default
    resourceVersion: "81033"
    uid: 3dc73022-bb1d-11e8-997a-00000a330243
To solve the problem, edit the PV object and remove the 'uid' field, after which the PV status changes to 'Available', as shown in the following example:
$ kubectl get pv -l app=elasticsearch
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS
asm-data-elasticsearch-0 75Gi RWO Retain Bound
asm-data-elasticsearch-1 75Gi RWO Retain Available
asm-data-elasticsearch-2 75Gi RWO Retain Available
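Rather than editing each PV interactively, the stale uid can also be removed with a JSON patch. The following sketch assumes the PV names from the example above; KUBECTL defaults to an echo preview, so set KUBECTL=kubectl to actually apply the patch.

```shell
#!/bin/sh
# Sketch: clear the stale claimRef UID on each Released PV so that it
# returns to 'Available'. PV names are the ones from the example above.
# KUBECTL defaults to 'echo kubectl' (preview only).
KUBECTL="${KUBECTL:-echo kubectl}"

for pv in asm-data-elasticsearch-1 asm-data-elasticsearch-2; do
  $KUBECTL patch pv "$pv" --type json \
    -p '[{"op": "remove", "path": "/spec/claimRef/uid"}]'
done
```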
User interface timeout errors
To prevent or mitigate UI timeout errors, you can increase the timeout values for the following parameters, which are defined in the ConfigMap:
- topologyServiceTimeout
- searchServiceTimeout
- layoutServiceTimeout
kubectl edit configmap {{ .Release.Name }}-asm-ui-config
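For reference, the timeout parameters inside the ConfigMap might look like the following fragment. This is illustrative only: the exact key layout depends on your release, and the millisecond values shown are assumptions, not recommended settings.

```yaml
# Illustrative only: key layout and values are assumptions.
topologyServiceTimeout: "60000"
searchServiceTimeout: "60000"
layoutServiceTimeout: "60000"
```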
When done, restart the NOI WebGUI pod.