Simultaneous restart of worker nodes causes GlusterFS to fail
When you restart all the worker nodes at the same time, GlusterFS does not start.
Causes
When all the worker nodes restart at the same time, the Heketi pod does not start. The Heketi container fails to start because it cannot mount the heketidbstorage volume. The heketidbstorage volume is reported as offline because its bricks did not come back online after the unclean shutdown.
Resolving the problem
Get the GlusterFS pod information by running the following command:
kubectl -n kube-system get pod | grep gluster
Following is an example of the command output:
glusterfs-36nd0 1/1 Running 4 7d
glusterfs-3m5ql 1/1 Running 3 7d
glusterfs-tc279 1/1 Running 16 7d
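The pod names in this output are needed for the commands that follow. As a minimal sketch, they can be extracted from the listing with a small filter. The function name gluster_pod_names is illustrative only and is not part of kubectl; the sketch assumes that GlusterFS pod names begin with glusterfs-, as in the example output.

```shell
# Print the name of every GlusterFS pod found in `kubectl get pod` text
# that is read from standard input. Assumes names start with "glusterfs-".
gluster_pod_names() {
  awk '$1 ~ /^glusterfs-/ { print $1 }'
}
```

A possible use is to pipe the listing into the function:
kubectl -n kube-system get pod | gluster_pod_names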
Complete the following steps for all the GlusterFS pods:
- Log in to the GlusterFS pod:
kubectl -n kube-system exec -it <POD ID> bash
Following is an example of the command and its output:
root@BPILICPMSTR001:~/cluster# kubectl -n kube-system exec -it glusterfs-36nd0 bash
[root@bpilicpwrk001 /]#
- Check the status of the GlusterFS volume on the pod:
gluster volume status
Following is an example of the command and its output:
[root@bpilicpwrk001 /]# gluster volume status
Status of volume: heketidbstorage
Gluster process                             TCP Port  RDMA Port  Online  Pid
Brick 10.10.25.49:/var/lib/heketi/mounts/vg
_22bbf0fbb483f9c170774d83081c3420/brick_2fb
3a10c7eafb8bed375829e8aaf782a/brick         49153     0          Y       5858
Brick 10.10.25.51:/var/lib/heketi/mounts/vg
_118f22bc13626321606280ea1d79fdc3/brick_649
4a3b077c38667f07a59197efabea7/brick         49153     0          Y       5318
Brick 10.10.25.50:/var/lib/heketi/mounts/vg
_d4d4f2e86c08f571befe7fc272dc4aae/brick_dc9
416bf4d88e45ff4d0061c08ef5b19/brick         49153     0          Y       5441
Self-heal Daemon on localhost               N/A       N/A        Y       5878
Self-heal Daemon on 10.10.25.50             N/A       N/A        Y       5461
Self-heal Daemon on 10.10.25.51             N/A       N/A        Y       5338
Task Status of Volume heketidbstorage
There are no active volume tasks
[root@bpilicpwrk001 /]#
If any brick of the heketidbstorage volume is offline (the Online column shows N instead of Y), restart the bricks by running the following commands:
gluster volume stop heketidbstorage
gluster volume start heketidbstorage force
- Verify the Heketi pod status:
kubectl -n kube-system get pod | grep heketi
The output is similar to the following example:
heketi-402978595-pjnd7 1/1 Running 0 2h
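The two checks in this procedure, the Online column of the brick entries and the state of the Heketi pod, can also be evaluated by a script. The following is a rough sketch under stated assumptions, not a supported tool: the function names count_offline_bricks and heketi_ready are hypothetical, and the parsing assumes the text layouts shown in the examples above, including brick paths that wrap across several lines.

```shell
# Count the offline bricks in `gluster volume status` text read from stdin.
# A brick entry can wrap over several lines, so the status columns are taken
# from the first line of the entry that has at least four fields and a
# Y or N in the second-to-last (Online) position.
count_offline_bricks() {
  awk '
    /^Brick / { in_brick = 1 }
    in_brick && NF >= 4 && $(NF-1) ~ /^[YN]$/ {
      if ($(NF-1) == "N") offline++
      in_brick = 0
    }
    END { print offline + 0 }
  '
}

# Exit with status 0 if `kubectl get pod` text on stdin lists a Heketi pod
# that is Running with all of its containers ready (for example, 1/1).
heketi_ready() {
  awk '
    $1 ~ /^heketi-/ {
      split($2, ready, "/")
      if ($3 == "Running" && ready[1] == ready[2]) ok = 1
    }
    END { exit !ok }
  '
}
```

For example, the functions might be fed from the commands that are used in the steps:
kubectl -n kube-system exec <POD ID> -- gluster volume status heketidbstorage | count_offline_bricks
kubectl -n kube-system get pod | heketi_ready && echo "Heketi is running"
A count other than 0 from the first command suggests that the stop and force start commands are still needed on that volume.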