Troubleshooting IBM Storage Scale Container Storage Interface (CSI)
CSI pods stuck in CrashLoopBackOff (Unauthorized GET request)
The following output shows an example of the CSI pods in the CrashLoopBackOff state:
# kubectl get pods
NAME READY STATUS RESTARTS AGE
ibm-spectrum-scale-csi-9566l 1/2 CrashLoopBackOff 9 26m
ibm-spectrum-scale-csi-attacher-0 1/1 Running 0 85m
ibm-spectrum-scale-csi-klr7x 1/2 CrashLoopBackOff 9 26m
ibm-spectrum-scale-csi-operator-56955949c4-mzn7g 1/1 Running 0 90m
ibm-spectrum-scale-csi-provisioner-0 1/1 Running 0 85m
ibm-spectrum-scale-csi-xlxkl 1/2 CrashLoopBackOff 9 26m
The logs of the CSI pods might reveal what is causing the problems.
# kubectl logs ibm-spectrum-scale-csi-9566l -c ibm-spectrum-scale-csi
...
I1218 17:27:33.875884 1 http_utils.go:60] http_utils FormatURL. url: https://ibm-spectrum-scale-gui-ibm-spectrum-scale.apps.example.com:443/
I1218 17:27:33.875894 1 rest_v2.go:586] rest_v2 doHTTP. endpoint: https://ibm-spectrum-scale-gui-ibm-spectrum-scale.apps.example.com:443/scalemgmt/v2/cluster, method: GET, param: <nil>
I1218 17:27:33.875900 1 http_utils.go:74] http_utils HttpExecuteUserAuth. type: GET, url: https://ibm-spectrum-scale-gui-ibm-spectrum-scale.apps.example.com:443/scalemgmt/v2/cluster, user: csi-cnsa-gui-user
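When several driver pods are failing at once, it can help to collect their logs in one pass. The following is a minimal sketch, not part of the product tooling. The function names `crashing_pods` and `dump_crash_logs` are hypothetical, the default namespace is an assumption, and the parsing assumes the default `kubectl get pods` column layout shown above (STATUS in the third column).

```shell
#!/bin/sh
# Print the names of pods whose STATUS column is CrashLoopBackOff.
# Reads `kubectl get pods` output on stdin; assumes the default column
# layout (NAME READY STATUS RESTARTS AGE) and skips the header line.
crashing_pods() {
    awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}

# Show the last log lines of the driver container in each crashing pod.
# The default namespace is an assumption; pass your own as the first argument.
dump_crash_logs() {
    ns="${1:-ibm-spectrum-scale-csi}"
    kubectl get pods -n "$ns" | crashing_pods | while read -r pod; do
        echo "=== $pod ==="
        kubectl logs "$pod" -n "$ns" -c ibm-spectrum-scale-csi --tail=50
    done
}
```

After the functions are loaded into the current shell, running `dump_crash_logs` prints a short log excerpt for each failing pod, which can then be searched for the failing REST request.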
- Check that the csi-cnsa-gui-user GUI user was created.
# kubectl exec ibm-spectrum-scale-gui-0 -n ibm-spectrum-scale -- /usr/lpp/mmfs/gui/cli/lsuser
Defaulting container name to liberty.
Use 'kubectl describe pod/ibm-spectrum-scale-gui-0 -n ibm-spectrum-scale' to see all of the containers in this pod.
Name Long name Password status Group names Failed login attempts Target Feedback Date
ContainerOperator active ContainerOperator 0
EFSSG1000I The command completed successfully.
In this case, the csi-cnsa-gui-user GUI user was not created. To resolve the issue, enter the following command to create the GUI user:
# kubectl exec -c liberty ibm-spectrum-scale-gui-0 -n ibm-spectrum-scale -- /usr/lpp/mmfs/gui/cli/mkuser csi-cnsa-gui-user -p csi-cnsa-gui-password -g CsiAdmin
EFSSG0019I The user csi-cnsa-gui-user has been successfully created.
EFSSG1000I The command completed successfully.
- Check that the csi-remote-mount-storage-cluster-1 secret was created with the correct credentials.
# kubectl get secrets csi-remote-mount-storage-cluster-1 -n ibm-spectrum-scale-csi -ojsonpath='{.data.username}' | base64 --decode
csi-cnsa-gui-user
# kubectl get secrets csi-remote-mount-storage-cluster-1 -n ibm-spectrum-scale-csi -ojsonpath='{.data.password}' | base64 --decode
this-is-a-bad-password
In this case, the csi-remote-mount-storage-cluster-1 secret was created without the correct password. To resolve the issue, enter the following commands to delete the secret and re-create it with the correct values:
# kubectl delete secrets csi-remote-mount-storage-cluster-1 -n ibm-spectrum-scale-csi
secret "csi-remote-mount-storage-cluster-1" deleted
# kubectl create secret generic csi-remote-mount-storage-cluster-1 --from-literal=username=csi-cnsa-gui-user --from-literal=password=csi-cnsa-gui-password -n ibm-spectrum-scale-csi
secret/csi-remote-mount-storage-cluster-1 created
# kubectl label secret csi-remote-mount-storage-cluster-1 product=ibm-spectrum-scale-csi -n ibm-spectrum-scale-csi
secret/csi-remote-mount-storage-cluster-1 labeled
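The two checks in this list can be scripted so that they are easy to repeat after a fix. The following is a minimal sketch under stated assumptions: the helper names `gui_user_exists` and `secret_field` are hypothetical, and the user check assumes that the user name is the first column of the `lsuser` output, as in the example above.

```shell
#!/bin/sh
# Succeed (exit 0) if the given user name appears in the first column of
# `lsuser` output read from stdin; fail (exit 1) otherwise.
gui_user_exists() {
    awk -v user="$1" '$1 == user { found = 1 } END { exit !found }'
}

# Print one decoded field of a secret.
# $1 = secret name, $2 = namespace, $3 = key under .data (username or password)
secret_field() {
    kubectl get secret "$1" -n "$2" -o "jsonpath={.data.$3}" | base64 --decode
}

# Example usage (commented out so that loading this file has no side effects):
#   kubectl exec -c liberty ibm-spectrum-scale-gui-0 -n ibm-spectrum-scale -- \
#       /usr/lpp/mmfs/gui/cli/lsuser | gui_user_exists csi-cnsa-gui-user \
#       || echo "GUI user is missing"
#   secret_field csi-remote-mount-storage-cluster-1 ibm-spectrum-scale-csi username
```

The user name stored in the secret must match an existing GUI user, and the stored password must match the password that was set for that user. Avoid printing the decoded password on shared terminals or in logs.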
CSI CR is never created
When all the core pods are running and the IBM Storage Scale container native cluster appears to be in a good state, the CSI CR is created automatically. In some error paths the CR is not created, and as a result the driver pods are never scheduled.
# kubectl get po,csiscaleoperator -n ibm-spectrum-scale-csi
NAME READY STATUS RESTARTS AGE
pod/ibm-spectrum-scale-csi-operator-79bd756d58-ht6hf 1/1 Running 0 47h
Only the operator pod is listed, and no results are found for csiscaleoperators.
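The absence of the CR can also be detected from a script. The following is a minimal sketch. The function name `csi_cr_exists` is hypothetical, and the check relies on `kubectl get -o name` writing nothing to standard output when no matching resources exist.

```shell
#!/bin/sh
# Succeed (exit 0) if at least one csiscaleoperator resource exists in the
# given namespace. With `-o name`, kubectl prints one line per resource and
# nothing on stdout when there are none.
csi_cr_exists() {
    ns="${1:-ibm-spectrum-scale-csi}"
    [ -n "$(kubectl get csiscaleoperator -n "$ns" -o name 2>/dev/null)" ]
}
```

For example, `csi_cr_exists || echo "CSI CR has not been created yet"` reports the condition described in this section.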
- Check that all the GUI pods are up and running.
# kubectl get pods -n ibm-spectrum-scale
NAME READY STATUS RESTARTS AGE
ibm-spectrum-scale-gui-0 4/4 Running 0 3m58s
ibm-spectrum-scale-gui-1 4/4 Running 0 95s
ibm-spectrum-scale-pmcollector-0 2/2 Running 0 3m59s
worker0 2/2 Running 0 3m59s
worker1 2/2 Running 0 3m58s
worker2 2/2 Running 0 3m58s
All GUI pods must be up and running before the CSI CR is created. It can take a few minutes for all containers in a pod to enter the Running state.
- Check that the daemon status has a nonempty cluster ID.
# kubectl describe daemon -n ibm-spectrum-scale
Find the status section and validate that the Cluster ID field exists and is not empty.
Status:
  Cluster ID: 3004252500454687654
  Cluster Name: example.cluster.com
If those fields are missing, then the IBM Storage Scale container native cluster is experiencing an issue. Check the operator logs for more information.
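Both checks in this list can be expressed as small filters over the command output. The following is a minimal sketch. The function names `gui_pods_not_ready` and `cluster_id` are hypothetical, and the parsing assumes the default `kubectl get pods` columns and the `Cluster ID:` label shown in the examples above.

```shell
#!/bin/sh
# Print the name of every ibm-spectrum-scale-gui-* pod that is not fully
# ready. Reads `kubectl get pods` output on stdin. A pod is reported when
# its READY count is incomplete (for example 2/4) or its STATUS is not Running.
gui_pods_not_ready() {
    awk '$1 ~ /^ibm-spectrum-scale-gui-/ {
        split($2, ready, "/")
        if (ready[1] != ready[2] || $3 != "Running") print $1
    }'
}

# Print the Cluster ID value from `kubectl describe daemon` output on stdin.
# Prints nothing if the field is missing or empty.
cluster_id() {
    awk '$1 == "Cluster" && $2 == "ID:" && $3 != "" { print $3 }'
}

# Example usage (commented out so that loading this file has no side effects):
#   kubectl get pods -n ibm-spectrum-scale | gui_pods_not_ready
#   kubectl describe daemon -n ibm-spectrum-scale | cluster_id
```

If the first filter prints any pod names, wait for those pods to become ready before investigating further. If the second filter prints nothing, the cluster ID is missing and the operator logs are the next place to look.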