Monitoring Postgres disk usage
Monitor disk usage by the Postgres database in the management subsystem.
Postgres is the core database of the management subsystem, so it is important to monitor its disk usage.
The API Connect operator tracks the current disk usage of the Postgres components and regularly updates the ManagementCluster status. When one or more of the Postgres components occupy 60% of the PVC (persistent volume claim) capacity, the API Connect operator changes the status from Running to Warning.
The API Connect operator brings down Postgres if disk usage reaches 80%. In this case, you must contact IBM Support.
Note that various storage classes can be used to deploy the management subsystem, for example local-storage or ceph block.
Management Cluster disk stats
The ManagementCluster instance has a .status.postgresDataStats field where the operator reports the current disk usage of the Postgres components. For example:
kubectl get mgmt -o json | jq '.items[].status.postgresDataStats'
[
  {
    "instanceName": "<mgmt cr name>-site1-db-1",
    "podName": "",
    "pvcCapacity": 51200,
    "pvcName": "",
    "pvcType": "WAL",
    "pvcUsed": 400,
    "pvcUsedPercentage": 8
  },
  {
    "instanceName": "<mgmt cr name>-site1-db-1",
    "podName": "",
    "pvcCapacity": 184320,
    "pvcName": "",
    "pvcType": "PostgreSQL",
    "pvcUsed": 95,
    "pvcUsedPercentage": 8
  }
]
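The 60% and 80% thresholds described in this topic can be applied to the postgresDataStats output with a short script. The following Python sketch is illustrative only: it assumes that the input is a JSON array with the same fields as the sample output above, the function names classify and summarize and the labels ok, warning, and error are hypothetical, and treating a value equal to a threshold as having reached it is an assumption about how the operator compares values.

```python
import json

# Thresholds taken from this topic; the inclusive comparison is an assumption.
WARNING_THRESHOLD = 60  # percent
ERROR_THRESHOLD = 80    # percent


def classify(percentage):
    """Map a pvcUsedPercentage value to an illustrative label."""
    if percentage >= ERROR_THRESHOLD:
        return "error"
    if percentage >= WARNING_THRESHOLD:
        return "warning"
    return "ok"


def summarize(stats_json):
    """Return (instanceName, pvcType, pvcUsedPercentage, label) for each entry."""
    return [
        (
            entry["instanceName"],
            entry["pvcType"],
            entry["pvcUsedPercentage"],
            classify(entry["pvcUsedPercentage"]),
        )
        for entry in json.loads(stats_json)
    ]


# Hypothetical input in the same shape as the sample output above.
SAMPLE = """
[
  {"instanceName": "example-db-1", "pvcType": "WAL", "pvcUsedPercentage": 63},
  {"instanceName": "example-db-1", "pvcType": "PostgreSQL", "pvcUsedPercentage": 8}
]
"""

for row in summarize(SAMPLE):
    print(*row)
```

In practice, the string passed to summarize would be the array printed by the kubectl and jq command for a single ManagementCluster instance.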
Warning condition is populated at 60% usage
When one of the Postgres components reaches 60% disk usage, the operator sets the overall status of the ManagementCluster to Warning with an appropriate warning message. The operator also updates the postgresDataStats section with the current data usage of the Postgres components.
Example of warning state:
kubectl get mgmt
NAME              READY   STATUS    VERSION    RECONCILED VERSION   MESSAGE                                                          AGE
stv3-management   6/8     Warning   10.0.8.0   10.0.8.0-5363        Some services are not ready - see status condition for details   3d20h
Example list of warning conditions:
kubectl get mgmt -o yaml
...
status:
  conditions:
  - lastTransitionTime: "2022-06-02T19:38:09Z"
    message: Warning threshold=60%, Current disk usage=63%, Is wal archiving working?=false.
      Database shutdown starts at 80%. Please contact IBM Support immediately.
    reason: wal_disk_usage_more_than_warning_threshold
    status: "True"
    type: Warning
...
If you encounter this warning, contact IBM Support to fix the root cause.
Error condition is populated at 80% usage
An error condition is populated when the operator brings down Postgres because it is using 80% of the available disk space. Postgres is brought down to avoid problems, such as disk corruption, that can occur if the disk fills completely.
status:
  conditions:
  - lastTransitionTime: "2022-06-03T00:40:54Z"
    message: Error threshold=80%, Current disk usage=82%, Is wal archiving working?=false.
      Database is in shutdown mode and management services are disabled. Please contact
      IBM Support immediately.
    reason: wal_disk_usage_more_than_error_threshold
    status: "True"
    type: Error
If you encounter this error, contact IBM Support to fix the root cause.
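The Warning and Error conditions shown in this topic can also be picked out of the status programmatically, for example in an alerting script. The following Python sketch is illustrative only: it assumes that a single ManagementCluster object has already been loaded into a dictionary (for example, from JSON output of kubectl), and that inactive conditions, if present, carry a status other than "True". The function name active_conditions and the second entry in the sample data are hypothetical.

```python
def active_conditions(management_cluster, types=("Warning", "Error")):
    """Return the conditions of the given types whose status is "True"."""
    conditions = management_cluster.get("status", {}).get("conditions", [])
    return [
        cond
        for cond in conditions
        if cond.get("type") in types and cond.get("status") == "True"
    ]


# Sample data: the first condition mirrors the error example in this topic
# (message shortened); the second entry is hypothetical and shows an
# inactive condition being skipped.
sample = {
    "status": {
        "conditions": [
            {
                "lastTransitionTime": "2022-06-03T00:40:54Z",
                "message": "Error threshold=80%, Current disk usage=82%",
                "reason": "wal_disk_usage_more_than_error_threshold",
                "status": "True",
                "type": "Error",
            },
            {"type": "Warning", "status": "False", "reason": "", "message": ""},
        ]
    }
}

for cond in active_conditions(sample):
    print(cond["type"], cond["reason"], cond["message"])
```

Filtering on both type and status keeps the check independent of the order in which the operator lists conditions.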
When local-storage is used, it uses the entire disk that is allocated to the worker node. In some cases, Kubernetes itself can come under disk pressure and start evicting pods before the operator reacts to 80% usage.