Recovering HA applications with RWX storage
Applications that use topologyKey: kubernetes.io/hostname, or that have no topology configuration at all, have no protection against all of their replicas being scheduled in the same zone.

Note: This can happen even with podAntiAffinity and topologyKey: kubernetes.io/hostname in the Pod spec, because this anti-affinity rule is host-based and not zone-based.

If this happens and all replicas are located in the zone that fails, an application that uses ReadWriteMany (RWX) storage takes 6-8 minutes to recover on the active zone. During this pause, the Red Hat OpenShift Container Platform nodes in the failed zone first become NotReady (60 seconds), and then the default pod eviction timeout expires (300 seconds).
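To check whether an existing workload falls into the case described above, you can inspect the podAntiAffinity section of its Pod spec (for a Deployment, that is spec.template.spec). The following Python sketch assumes the spec has already been loaded into a plain dict, for example from YAML, and that nodes carry the well-known topology.kubernetes.io/zone label. The helper names are hypothetical and are not part of any OpenShift or Kubernetes API. The sketch only looks at required (hard) anti-affinity terms, because preferred terms do not guarantee that replicas land in different zones.

```python
# Hypothetical helper: checks whether a Pod spec's *required* anti-affinity
# spreads replicas across zones rather than only across hosts.
HOSTNAME_KEY = "kubernetes.io/hostname"   # host-based: replicas may still share a zone
ZONE_KEY = "topology.kubernetes.io/zone"  # assumed zone label on the nodes


def required_anti_affinity_keys(pod_spec: dict) -> set:
    """Return the topologyKey of every required podAntiAffinity term."""
    anti = pod_spec.get("affinity", {}).get("podAntiAffinity", {})
    terms = anti.get("requiredDuringSchedulingIgnoredDuringExecution", [])
    return {t["topologyKey"] for t in terms if "topologyKey" in t}


def is_zone_spread(pod_spec: dict) -> bool:
    """True only if a hard anti-affinity rule is keyed on the zone label."""
    return ZONE_KEY in required_anti_affinity_keys(pod_spec)


if __name__ == "__main__":
    host_only = {
        "affinity": {
            "podAntiAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {"labelSelector": {"matchLabels": {"app": "demo"}},
                     "topologyKey": HOSTNAME_KEY}
                ]
            }
        }
    }
    print(is_zone_spread(host_only))  # False: host-based rule only
    print(is_zone_spread({}))         # False: no topology configuration
```

A spec whose only required term uses kubernetes.io/hostname returns False, which matches the host-based behavior described in the note: such a rule keeps replicas on different nodes but does not stop them from sharing the failed zone.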
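The two delays named above account for the lower end of the recovery window. A minimal Python sketch of that arithmetic, using only the 60-second and 300-second figures from this section; the constant and function names are illustrative.

```python
# Delays stated in this section (seconds).
NODE_NOT_READY_S = 60         # nodes in the failed zone become NotReady
POD_EVICTION_TIMEOUT_S = 300  # default pod eviction timeout
REPORTED_MAX_S = 8 * 60       # upper end of the 6-8 minute recovery window


def eviction_delay_s(not_ready_s: int = NODE_NOT_READY_S,
                     eviction_timeout_s: int = POD_EVICTION_TIMEOUT_S) -> int:
    """Seconds from zone failure until the stranded pods are evicted.

    The two delays run one after the other, so they add.
    """
    return not_ready_s + eviction_timeout_s


def remaining_budget_s(max_total_s: int = REPORTED_MAX_S) -> int:
    """Seconds left in the reported window after eviction."""
    return max_total_s - eviction_delay_s()


if __name__ == "__main__":
    delay = eviction_delay_s()
    print(delay, delay / 60)     # 360 6.0
    print(remaining_budget_s())  # 120
```

The sum is 360 seconds, or 6 minutes. Against the 8-minute upper bound that leaves up to 120 seconds; treating that remainder as the time to reschedule the pods and reattach the RWX volumes on the active zone is an assumption, since this section does not break it down.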