Troubleshooting resiliency issues on OpenShift and Cloud Pak for Integration
Resolve resiliency issues with API Connect management system backups.
pgbackrest-shared-repo resiliency problems
If the Kubernetes node hosting the pgbackrest-shared-repo pod is down, and either the PVC attached to the pod has strict zone requirements (for example, in AWS or other clouds) or the storage class is set to local-storage, then the pgbackrest-shared-repo pod is not rescheduled to another Kubernetes node.
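The placement constraint described above can be illustrated with a small Python sketch. This is a simplified model for reasoning about the failure mode, not the Kubernetes scheduler itself; the Node and Volume classes, the candidate_nodes function, and all node and zone names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    zone: str
    ready: bool


@dataclass
class Volume:
    # Hypothetical, simplified view of the storage behind a PVC.
    storage_class: str
    zone: Optional[str] = None         # zone the volume is restricted to, if any
    pinned_node: Optional[str] = None  # node the volume lives on (local storage)


def candidate_nodes(volume: Volume, nodes: List[Node]) -> List[str]:
    """Return the names of ready nodes that satisfy the volume's constraints."""
    result = []
    for node in nodes:
        if not node.ready:
            continue  # a node that is down cannot host the pod
        if volume.pinned_node is not None and node.name != volume.pinned_node:
            continue  # local storage ties the pod to one specific node
        if volume.zone is not None and node.zone != volume.zone:
            continue  # a zonal volume cannot follow the pod to another zone
        result.append(node.name)
    return result


# Example cluster: the node that hosted the pod is down.
nodes = [
    Node(name="worker-a", zone="zone-1", ready=False),
    Node(name="worker-b", zone="zone-2", ready=True),
]

zonal = Volume(storage_class="zonal-block", zone="zone-1")
local = Volume(storage_class="local-storage", pinned_node="worker-a")
unrestricted = Volume(storage_class="shared-file")
```

In this model, candidate_nodes returns an empty list for both the zonal and the local volume, which corresponds to the pod staying unscheduled until worker-a returns. Only the unrestricted volume leaves worker-b as a candidate.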
As a result, there is a single point of failure and the following conditions might occur:
- Backups of the management database fail
- Disk space fills with accumulated Postgres WAL (write-ahead log) files
To avoid this problem, monitor disk usage. See Monitoring Postgres disk usage on OpenShift or Monitoring Postgres disk usage on Cloud Pak for Integration.
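As a generic illustration of a disk usage check, the following Python sketch reports how full a file system is and flags it when a threshold is crossed. It is not the procedure from the monitoring topics named above. It assumes the script runs somewhere the volume of interest is mounted, and the 80 percent threshold and the function names are hypothetical choices.

```python
import shutil

WARN_PERCENT = 80.0  # hypothetical threshold; choose one that suits your volume size


def usage_percent(used: int, total: int) -> float:
    """Return used space as a percentage of total space."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def disk_status(path: str, warn_percent: float = WARN_PERCENT) -> str:
    """Describe how full the file system containing `path` is."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    pct = usage_percent(usage.used, usage.total)
    level = "WARNING" if pct >= warn_percent else "OK"
    return f"{level}: {path} is {pct:.1f}% full"
```

Calling disk_status with the mount path of the volume returns a one-line summary that a scheduled job could log or forward to an alerting system. A warning while backups are failing suggests that WAL files are accumulating.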
When the Kubernetes node comes back up, the pod is scheduled again and all processes should resume normally.