# Troubleshooting deployment

- no endpoints available for service "ibm-spectrum-scale-webhook-service"
- core pods are stuck in init
- core, GUI, or collector pods are in ErrImagePull or ImagePullBackOff state
- core, GUI, or collector pods are not up
- core, GUI, or collector pods show container restarts
- all pods are running but the GPFS cluster is stuck in the "arbitrating" state
- remote mount file system not getting configured or mounted
## no endpoints available for service "ibm-spectrum-scale-webhook-service"

When creating or editing IBM Storage Scale container native custom resources, the validating or mutating webhooks can fail if the operator pod is unavailable. If you receive the error `no endpoints available for service "ibm-spectrum-scale-webhook-service"`, check the state of the operator pod.

Example error:

```bash
# kubectl apply -f remotecluster.yaml
remotecluster.scale.spectrum.ibm.com/remotecluster-sample unchanged
Error from server (InternalError): error when creating "remotecluster.yaml": Internal error occurred: failed calling webhook "mcluster.scale.spectrum.ibm.com": failed to call webhook: Post "https://ibm-spectrum-scale-webhook-service.ibm-spectrum-scale-operator.svc:443/mutate-scale-spectrum-ibm-com-v1beta1-cluster?timeout=10s": no endpoints available for service "ibm-spectrum-scale-webhook-service"
```

Checking the operator pod shows a STATUS of `ImagePullBackOff`:

```bash
# kubectl get pods -n ibm-spectrum-scale-operator
NAME                                                       READY   STATUS             RESTARTS         AGE
pod/ibm-spectrum-scale-controller-manager-64bb4798df-rrj4j 0/1     ImagePullBackOff   10 (4m14s ago)   34m
```

In this example, the operator is unable to pull its image because of errors in the pull credentials. To remedy the `no endpoints available` issue, resolve the operator problem first, then retry the original command.
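A check like this can be scripted. The following is a minimal sketch that filters `kubectl get pods -o json` output for pods stuck on image-pull errors; because a live cluster may not be at hand, it runs against a hypothetical sample of that JSON, trimmed to the fields the filter reads:

```shell
# Hypothetical, trimmed sample of what
# `kubectl get pods -n ibm-spectrum-scale-operator -o json` would return
# for the ImagePullBackOff example above.
cat > /tmp/operator-pods.json <<'EOF'
{
  "items": [
    {
      "metadata": { "name": "ibm-spectrum-scale-controller-manager-64bb4798df-rrj4j" },
      "status": {
        "containerStatuses": [
          { "state": { "waiting": { "reason": "ImagePullBackOff" } } }
        ]
      }
    }
  ]
}
EOF

# Print the names of pods with a container waiting on an image-pull error.
jq -r '.items[]
  | select([.status.containerStatuses[]?.state.waiting.reason]
           | any(. == "ImagePullBackOff" or . == "ErrImagePull"))
  | .metadata.name' /tmp/operator-pods.json
```

Against a live cluster, the here-doc would be replaced by the real `kubectl get pods -n ibm-spectrum-scale-operator -o json` output.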
## core pods are stuck in init

If the IBM Storage Scale container native cluster fails to create, the core pods on the worker nodes get stuck in an init container:

```bash
# kubectl get pods
NAME      READY   STATUS     RESTARTS   AGE
...
worker0   2/2     Init:1/2   0          2h
worker1   2/2     Init:1/2   0          2h
worker2   2/2     Init:1/2   0          2h
worker3   2/2     Init:1/2   0          2h
```

There is no recovery from this problem. For more information about cleanup, see Cleanup IBM Storage Scale container native and Cleanup Red Hat OpenShift nodes. For more information about redeploying, see Installing the IBM Storage Scale container native operator and cluster.
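Before cleaning up, it can help to confirm which init container the pods are stuck on, since its logs (`kubectl logs <pod> -c <init-container> -n ibm-spectrum-scale`) may explain the failure. A sketch against a hypothetical pod JSON; the init-container names here are invented for illustration and will differ on a real deployment:

```shell
# Hypothetical, trimmed sample of
# `kubectl get pod worker0 -n ibm-spectrum-scale -o json`.
# The init-container names are illustrative only.
cat > /tmp/worker0.json <<'EOF'
{
  "metadata": { "name": "worker0" },
  "status": {
    "initContainerStatuses": [
      { "name": "init-a", "ready": true },
      { "name": "init-b", "ready": false }
    ]
  }
}
EOF

# An Init:1/2 status means one of two init containers finished; print the
# one that has not, so you know whose logs to read.
jq -r '.status.initContainerStatuses[] | select(.ready | not) | .name' /tmp/worker0.json
```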
## core, GUI, or collector pods are in ErrImagePull or ImagePullBackOff state

If `kubectl get pods -n ibm-spectrum-scale` shows any pods in the `ErrImagePull` or `ImagePullBackOff` state, describe the pod to get more details and look for any errors:

```bash
kubectl describe pod <pod-name> -n ibm-spectrum-scale
```
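The cause is usually listed in the `Events` section at the end of the `kubectl describe` output. A sketch that isolates that section; since no cluster output is available here, it runs against an invented excerpt of `describe` output (the pod name and event message are illustrative):

```shell
# Invented excerpt of `kubectl describe pod <pod-name> -n ibm-spectrum-scale`
# output; the pod name and event message are illustrative.
cat > /tmp/describe.txt <<'EOF'
Name:         example-pod
Namespace:    ibm-spectrum-scale
Events:
  Type     Reason   Age   From     Message
  ----     ------   ----  ----     -------
  Warning  Failed   2m    kubelet  Failed to pull image "example/image:tag": unauthorized
EOF

# Print everything from the "Events:" header to the end of the output.
awk '/^Events:/,0' /tmp/describe.txt
```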
## core, GUI, or collector pods are not up

If the pods are not deployed in the `ibm-spectrum-scale` namespace, or the cluster is not created, examine the operator pod logs:

```bash
kubectl logs $(kubectl get pods -n ibm-spectrum-scale-operator -ojson | jq -r ".items[0].metadata.name") -n ibm-spectrum-scale-operator
```
## core, GUI, or collector pods show container restarts

Kubernetes keeps the logs of the current container and the previous container. Look at the previous container's logs for any clues by using the following command:

```bash
kubectl logs -p <scale-pod> -n ibm-spectrum-scale
```
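To find which pods and containers have restarted, and therefore have a previous container worth inspecting, the restart counts in the pod JSON can be filtered. A sketch against a hypothetical sample of `kubectl get pods -o json` output (the pod and container names are illustrative):

```shell
# Hypothetical, trimmed sample of
# `kubectl get pods -n ibm-spectrum-scale -o json`.
cat > /tmp/scale-pods.json <<'EOF'
{
  "items": [
    { "metadata": { "name": "worker0" },
      "status": { "containerStatuses": [ { "name": "gpfs", "restartCount": 3 } ] } },
    { "metadata": { "name": "worker1" },
      "status": { "containerStatuses": [ { "name": "gpfs", "restartCount": 0 } ] } }
  ]
}
EOF

# Print "pod container restarts" for every container that has restarted,
# i.e. the candidates for `kubectl logs -p`.
jq -r '.items[]
  | .metadata.name as $pod
  | .status.containerStatuses[]?
  | select(.restartCount > 0)
  | "\($pod) \(.name) \(.restartCount)"' /tmp/scale-pods.json
```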
## all pods are running but the GPFS cluster is stuck in the "arbitrating" state

If the cluster is stuck in an arbitrating state:

- Check the output of `mmlscluster`:

  ```bash
  kubectl exec $(kubectl get pods -lapp.kubernetes.io/name=core -n ibm-spectrum-scale -o json | jq -r ".items[0].metadata.name") -n ibm-spectrum-scale -- mmlscluster
  ```

- Check the GPFS logs:

  ```bash
  kubectl logs $(kubectl get pods -lapp.kubernetes.io/name=core -n ibm-spectrum-scale -o json | jq -r ".items[0].metadata.name") -n ibm-spectrum-scale -c logs | grep mmfs.log.latest
  ```
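When scanning the GPFS log, recent IBM Storage Scale releases tag entries with a severity marker such as `[E]` for errors, so error lines can be pulled out with a simple filter. A sketch against invented log lines (the messages are illustrative, not real GPFS output):

```shell
# Invented sample of mmfs.log.latest lines; the severity tags follow the
# [E]/[W]/[N] convention, but the messages themselves are illustrative.
cat > /tmp/mmfs.log <<'EOF'
2024-01-01_00:00:01 node1 mmfsd starting
2024-01-01_00:00:05 node1 [E] Unable to contact enough quorum nodes
2024-01-01_00:00:09 node1 [N] mmfsd ready
EOF

# Show only error-tagged lines; a cluster stuck arbitrating often logs
# quorum-related errors here.
grep '\[E\]' /tmp/mmfs.log
```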
## remote mount file system not getting configured or mounted

- Check the `RemoteCluster` objects and the `Filesystem` objects. The `Filesystem` controller waits until the `RemoteCluster` object is `Ready` before attempting to configure the remote mount file system. Describe the objects and check `Status` and `Events` for any reasons for failure.

  - Remote cluster:

    ```bash
    kubectl get remotecluster.scale -n ibm-spectrum-scale
    kubectl describe remotecluster.scale <name> -n ibm-spectrum-scale
    ```

  - File system:

    ```bash
    kubectl get filesystem.scale -n ibm-spectrum-scale
    kubectl describe filesystem.scale <name> -n ibm-spectrum-scale
    ```

  If the `Status` and `Events` do not show a cause, check the operator logs for any errors:

  ```bash
  kubectl logs $(kubectl get pods -n ibm-spectrum-scale-operator -ojson | jq -r ".items[0].metadata.name") -n ibm-spectrum-scale-operator
  ```

- Enter the `mmnetverify` command to verify the network between the clusters. For more information, see the mmnetverify command in the IBM Storage Scale documentation.