Recovering zone-aware HA applications with RWX storage
Applications that are deployed with topologyKey: topology.kubernetes.io/zone,
have one or more replicas scheduled in each data zone, and use shared storage, that
is, a ReadWriteMany (RWX) CephFS volume, terminate themselves in the failed zone after a few
minutes. New pods are then rolled in and remain stuck in the Pending state until the zones are recovered.
An example of this type of application is detailed in the Installing Zone Aware Sample Application section.
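As a rough illustration of the deployment pattern described above, the following sketch shows a ReadWriteMany claim shared by replicas that are spread across zones. It is not the sample application from the referenced section. The resource names, labels, image, replica count, and size are hypothetical, the storage class name is an assumption about the CephFS storage class in your cluster, and topologySpreadConstraints is only one way to use the zone topologyKey (pod anti-affinity rules are another).

```yaml
# Illustrative sketch only: names, labels, image, and sizes are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data                      # hypothetical claim name
spec:
  accessModes:
    - ReadWriteMany                      # RWX: all replicas mount the same volume
  storageClassName: ocs-storagecluster-cephfs   # assumption: CephFS storage class name
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zone-aware-app                   # hypothetical
spec:
  replicas: 4
  selector:
    matchLabels:
      app: zone-aware-app
  template:
    metadata:
      labels:
        app: zone-aware-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone   # spread replicas across zones
          whenUnsatisfiable: DoNotSchedule           # pods stay Pending rather than break the spread
          labelSelector:
            matchLabels:
              app: zone-aware-app
      containers:
        - name: app
          image: registry.example.com/zone-aware-app:latest   # hypothetical image
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: shared-data
```

With a hard constraint such as DoNotSchedule, replacement pods cannot be placed in a way that violates the zone spread, which is consistent with pods remaining in the Pending state until the failed zone is recovered.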
Important: During zone recovery, if application pods go into the CrashLoopBackOff (CLBO) state
with a permission denied error while mounting the CephFS volume, restart the nodes where the pods
are scheduled. Wait for some time, and then check whether the pods are running again.