Troubleshooting resiliency issues
Resolve resiliency issues with API Connect management system backups.
pgbackrest-shared-repo resiliency problems
If the Kubernetes node hosting the pgbackrest-shared-repo pod is down, and either the PVC attached to the pod
has strict zone requirements (for example, in AWS or other clouds) or the storage class is set to
local-storage, then the pgbackrest-shared-repo pod is not rescheduled to another Kubernetes node.
As a result, there is a single point of failure and the following conditions might occur:
- Backups of the management database fail
- Disk space fills with accumulated Postgres WAL (write-ahead log) files
To catch this problem before the disk fills, monitor the disk usage. See Monitoring Postgres disk usage on VMware.
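As a rough illustration of a disk usage check, the following Python sketch reports how full a file system is and flags it when usage crosses a threshold. It is not the procedure from the linked topic: the function names, the 80% threshold, and the idea of running it wherever the Postgres data volume is mounted are all assumptions made for this example.

```python
import shutil


def disk_usage_percent(path):
    """Return the used share of the file system containing `path`, in percent."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def needs_attention(path, threshold_percent=80.0):
    """Return True when usage is at or above the (assumed) threshold."""
    return disk_usage_percent(path) >= threshold_percent


# Example: check the current directory. Replace "." with the mount point
# of the Postgres data volume in your environment (hypothetical here).
print(f"{disk_usage_percent('.'):.1f}% used")
print("needs attention:", needs_attention("."))
```

The percentage can differ slightly from what `df` reports, because some file systems reserve blocks that count toward neither used nor free space.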
When the Kubernetes node comes back up, the pod is scheduled and all the processes should resume properly.
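To confirm whether the pod has been scheduled again, you can inspect the pod phase. The sketch below is a minimal example that filters a pod list, in the JSON shape produced by `kubectl get pods -n <namespace> -o json`, for pgbackrest-shared-repo pods that are not in the Running phase. The function name and the sample pod names are hypothetical, and the phase is only a coarse signal: a pod on a node that has just gone down can still report Running for a while.

```python
def unhealthy_repo_pods(pod_list, name_fragment="pgbackrest-shared-repo"):
    """Return (name, phase) pairs for matching pods that are not Running.

    `pod_list` is a parsed Kubernetes PodList, for example the result of
    json.load() on the output of `kubectl get pods -o json`.
    """
    results = []
    for pod in pod_list.get("items", []):
        name = pod.get("metadata", {}).get("name", "")
        phase = pod.get("status", {}).get("phase", "Unknown")
        if name_fragment in name and phase != "Running":
            results.append((name, phase))
    return results


# Hypothetical sample data; real pod names depend on your deployment.
sample = {
    "items": [
        {"metadata": {"name": "mgmt-pgbackrest-shared-repo-abc"},
         "status": {"phase": "Pending"}},
        {"metadata": {"name": "mgmt-postgres-xyz"},
         "status": {"phase": "Running"}},
    ]
}

for pod_name, pod_phase in unhealthy_repo_pods(sample):
    print(f"{pod_name}: {pod_phase}")
```

An empty result means every matching pod reports the Running phase. A pod that stays in Pending while its node is down is consistent with the scheduling restriction described above.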