Simultaneous restart of worker nodes causes GlusterFS to fail

When you restart all the worker nodes at the same time, GlusterFS does not start.

Causes

When all the worker nodes restart at the same time, the Heketi pod does not start. The Heketi container fails to start because it cannot mount the heketidbstorage volume. The heketidbstorage volume is reported as offline because its bricks did not come back online after the unclean shutdown.

Resolving the problem

Get the GlusterFS pod information by running the following command:

kubectl -n kube-system get pod | grep gluster

Following is an example of the command output:

glusterfs-36nd0 1/1 Running 4 7d
glusterfs-3m5ql 1/1 Running 3 7d
glusterfs-tc279 1/1 Running 16 7d
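
Because the later steps are repeated for every GlusterFS pod, it can help to collect the pod names first. The following is a minimal sketch, not part of the product tooling: `gluster_pod_names` is a hypothetical helper name, and it assumes that the pod names start with `glusterfs-` as in the example output.

```shell
# Hypothetical helper: print the names of the GlusterFS pods found in
# `kubectl get pod` output that is read from standard input.
# Assumes the pod names start with "glusterfs-", as in the example output.
gluster_pod_names() {
  awk '$1 ~ /^glusterfs-/ { print $1 }'
}

# Possible usage on a node where kubectl is configured:
#   kubectl -n kube-system get pod | gluster_pod_names
```

The helper only filters text, so it does not change anything on the cluster.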

Complete the following steps for each GlusterFS pod. First, open a shell in the pod:

 kubectl -n kube-system exec -it <POD ID> bash

Following is an example of the command and its output:

 root@BPILICPMSTR001:~/cluster# kubectl -n kube-system exec -it glusterfs-36nd0 bash
 [root@bpilicpwrk001 /]#

Check the status of the GlusterFS volume on the pod:

 gluster volume status

Following is an example of the command and its output:

 [root@bpilicpwrk001 /]# gluster volume status
 Status of volume: heketidbstorage
 Gluster process                             TCP Port  RDMA Port  Online  Pid

 Brick 10.10.25.49:/var/lib/heketi/mounts/vg
 _22bbf0fbb483f9c170774d83081c3420/brick_2fb
 3a10c7eafb8bed375829e8aaf782a/brick         49153     0          Y       5858
 Brick 10.10.25.51:/var/lib/heketi/mounts/vg
 _118f22bc13626321606280ea1d79fdc3/brick_649
 4a3b077c38667f07a59197efabea7/brick         49153     0          Y       5318
 Brick 10.10.25.50:/var/lib/heketi/mounts/vg
 _d4d4f2e86c08f571befe7fc272dc4aae/brick_dc9
 416bf4d88e45ff4d0061c08ef5b19/brick         49153     0          Y       5441
 Self-heal Daemon on localhost               N/A       N/A        Y       5878
 Self-heal Daemon on 10.10.25.50             N/A       N/A        Y       5461
 Self-heal Daemon on 10.10.25.51             N/A       N/A        Y       5338
 Task Status of Volume heketidbstorage

 There are no active volume tasks

 [root@bpilicpwrk001 /]#
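
Long brick paths are wrapped across several lines in this output, so the Online flag of a brick is easy to miss. The following is a minimal sketch of a filter that joins each wrapped brick path and prints it with its Online flag. `brick_online_flags` is a hypothetical helper name, and the parsing assumes the column layout shown in the example (TCP Port, RDMA Port, Online, and Pid as the last four fields of each brick entry).

```shell
# Hypothetical helper: read `gluster volume status` output on standard input
# and print one line per brick in the form "<brick path> <Y|N>".
# Assumes each brick entry ends with: <TCP Port> <RDMA Port> <Online> <Pid>.
brick_online_flags() {
  awk '
    # A new brick entry starts; field 1 is the word "Brick", so skip it.
    /^[ \t]*Brick / { path = ""; collecting = 1; first = 2 }
    collecting {
      if (NF >= 4 && $(NF - 1) ~ /^[YN]$/) {
        # Last line of the entry: everything before the final four
        # fields still belongs to the brick path.
        for (i = first; i <= NF - 4; i++) path = path $i
        print path, $(NF - 1)
        collecting = 0
      } else {
        # Continuation line: the whole line is part of the wrapped path.
        for (i = first; i <= NF; i++) path = path $i
      }
      first = 1
    }
  '
}

# Possible usage inside a GlusterFS pod, listing only the offline bricks:
#   gluster volume status heketidbstorage | brick_online_flags | grep ' N$'
```

If the last command prints nothing, every brick of the volume reports Y in the Online column.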

If any brick of heketidbstorage is down (its Online column shows N instead of Y), restart the bricks by running the following commands:

 gluster volume stop heketidbstorage

 gluster volume start heketidbstorage force
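
When you run the stop command interactively, the gluster CLI asks for confirmation before it stops the volume. If you prefer to run both commands in one step, the following sketch wraps them in a function. `restart_volume` is a hypothetical name, and the sketch assumes that the gluster CLI in the pod accepts the `--mode=script` option, which suppresses the confirmation prompt.

```shell
# Hypothetical wrapper around the two commands above.
# Assumption: the gluster CLI in the pod accepts --mode=script, which
# answers the confirmation prompt of `volume stop` automatically.
restart_volume() {
  vol=${1:-heketidbstorage}
  # Start the volume again only if the stop command succeeded.
  gluster --mode=script volume stop "$vol" &&
    gluster --mode=script volume start "$vol" force
}

# Possible usage inside a GlusterFS pod:
#   restart_volume heketidbstorage
```

The start command is skipped when the stop command fails, so check the error message from gluster before you retry.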

Verify the Heketi pod status:
```
 kubectl -n kube-system get pod | grep heketi
```
The output is similar to the following example, with the Heketi pod in the Running state:
```
 heketi-402978595-pjnd7 1/1 Running 0 2h
```
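
The Heketi pod might need some time to start after the bricks come back online. The following is a minimal sketch of a check that you can run in a loop. `heketi_is_running` is a hypothetical helper name, and it assumes that the pod name starts with `heketi-` and that the columns are NAME, READY, and STATUS, as in the example output.

```shell
# Hypothetical helper: succeed when the `kubectl get pod` output on standard
# input contains a Heketi pod whose containers are all ready and whose
# status is Running. Assumes the pod name starts with "heketi-".
heketi_is_running() {
  awk '
    $1 ~ /^heketi-/ {
      split($2, ready, "/")   # READY column, for example 1/1
      if (ready[1] + 0 > 0 && ready[1] == ready[2] && $3 == "Running") ok = 1
    }
    END { exit (ok ? 0 : 1) }
  '
}

# Possible usage on a node where kubectl is configured:
#   until kubectl -n kube-system get pod | heketi_is_running; do sleep 5; done
```

The loop in the usage comment has no timeout, so interrupt it manually if the pod does not reach the Running state.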