Monitoring Postgres disk usage on OpenShift
In API Connect, you can monitor the disk space that is used by the Postgres database in the Management subsystem.
The Postgres database is the core database used in the Management subsystem. It is important to monitor the Postgres disk usage to avoid running out of space and causing an outage in your deployment.
The APIConnect operator tracks the current disk usage of the Postgres components, and regularly updates the ManagementCluster CR's status. When one or more of the Postgres components occupy 60% of the PVC (persistent volume claim) capacity, the APIConnect operator changes the CR's status from Running to Warning.
When one or more of the Postgres components occupy 80% of the PVC capacity, the APIConnect operator brings down Postgres. If this situation occurs, contact IBM Support.
Note: Various types of storage classes can be used to deploy the Management subsystem; for example, local-storage or ceph block. When local-storage is used, the entire disk is allocated to the worker node. In some cases, before the APIConnect operator reacts to an 80% usage condition, Kubernetes itself might face disk pressure and start evicting pods.
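The two thresholds described above can be summarized as a small shell function. This is only an illustrative sketch of the documented behavior, not the operator's implementation: the function name classify_usage is hypothetical, and treating a value of exactly 60 or exactly 80 as having reached the threshold is an assumption.

```shell
# Map a disk-usage percentage (integer) to the status this topic describes.
# The 60 and 80 thresholds come from the text; the function name is
# hypothetical and the inclusive comparison is an assumption.
classify_usage() {
  pct=$1
  if [ "$pct" -ge 80 ]; then
    echo "Error"      # operator brings down Postgres
  elif [ "$pct" -ge 60 ]; then
    echo "Warning"    # operator sets the CR status to Warning
  else
    echo "Running"
  fi
}
```

For example, classify_usage 63 prints Warning.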
Viewing the current disk usage
The ManagementCluster instance includes the .status.postgresDataStats field, where the operator displays the current disk usage of Postgres components. Run the following command to get the disk usage:
oc get mgmt -o json | jq '.items[].status.postgresDataStats'
The response looks like the following example:
[
  {
    "instanceName": "<mgmt cr name>-site1-db-1",
    "podName": "",
    "pvcCapacity": 51200,
    "pvcName": "",
    "pvcType": "WAL",
    "pvcUsed": 400,
    "pvcUsedPercentage": 8
  },
  {
    "instanceName": "<mgmt cr name>-site1-db-1",
    "podName": "",
    "pvcCapacity": 184320,
    "pvcName": "",
    "pvcType": "PostgreSQL",
    "pvcUsed": 95,
    "pvcUsedPercentage": 8
  }
]
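To pick out only the components that are approaching a threshold, the same output can be filtered on the pvcUsedPercentage field. The following is a minimal sketch that assumes jq is installed; the function name pvcs_over_threshold and its default of 60 are hypothetical choices for illustration.

```shell
# Print every Postgres PVC whose usage is at or above a threshold.
# Reads the output of `oc get mgmt -o json` on stdin.
# The function name is hypothetical; the field names are taken from the
# example response shown above.
pvcs_over_threshold() {
  threshold=${1:-60}
  jq -r --argjson t "$threshold" '
    .items[].status.postgresDataStats[]?
    | select(.pvcUsedPercentage >= $t)
    | "\(.instanceName) \(.pvcType) \(.pvcUsedPercentage)%"
  '
}

# Usage:
#   oc get mgmt -o json | pvcs_over_threshold 60
```

The function prints one line per matching PVC and prints nothing when every component is below the threshold.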
Warning condition is populated at 60% usage
When one of the Postgres components uses 60% of its allocated space, the operator changes the overall status of the ManagementCluster to Warning with an appropriate warning message. The operator also updates the postgresDataStats section with the current data usage of the Postgres components.
Important: If you see the Warning condition, contact IBM Support for help correcting the root cause.
To view the status, run the following command:
oc get mgmt
The following example response shows the warning state:
NAME              READY   STATUS    VERSION    RECONCILED VERSION   MESSAGE                                                          AGE
stv3-management   6/8     Warning   10.0.8.0   10.0.8.0-5363        Some services are not ready - see status condition for details   3d20h
To view just the status, run the following command:
oc get mgmt -o json | jq '.items[].status.phase'
In this example, the status displayed in the response is "Warning":
"Warning"
To see the list of current status conditions, run the following command:
oc get mgmt -o json | jq '.items[].status.conditions'
The following example response displays the list of status conditions. Notice that the "Warning" condition is set to "True" and the other conditions are set to "False":
status:
  conditions:
  - lastTransitionTime: "2022-06-02T19:38:09Z"
    message: Warning threshold=60%, Current disk usage=63%, Is wal archiving working?=false.
      Database shutdown starts at 80%. Please contact IBM Support immediately.
    reason: wal_disk_usage_more_than_warning_threshold
    status: "True"
    type: Warning
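When the conditions list is long, it can help to print only the conditions that are currently set to "True". The following sketch assumes jq is installed and that each condition carries the type, reason, and status fields shown in the example; the function name active_conditions is hypothetical.

```shell
# Print the type and reason of every status condition that is "True".
# Reads the output of `oc get mgmt -o json` on stdin.
# The function name is hypothetical; the field names follow the example above.
active_conditions() {
  jq -r '
    .items[].status.conditions[]?
    | select(.status == "True")
    | "\(.type): \(.reason)"
  '
}

# Usage:
#   oc get mgmt -o json | active_conditions
```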
Error condition is populated at 80% usage
When one of the Postgres components uses 80% of its allocated space, the operator changes the overall status of the ManagementCluster to Error and brings down Postgres to avoid problems that can occur if the disk becomes completely filled.
Important: If you see the Error condition, contact IBM Support for help correcting the root cause.
In this case, when you run the following command to view the status conditions, you will see that the "Error" condition is set to "True":
oc get mgmt -o json | jq '.items[].status.conditions'
For example, the following condition shows the Error status set to "True" due to disk usage:
status:
  conditions:
  - lastTransitionTime: "2022-06-03T00:40:54Z"
    message: Error threshold=80%, Current disk usage=82%, Is wal archiving working?=false.
      Database is in shutdown mode and management services are disabled. Please contact
      IBM Support immediately.
    reason: wal_disk_usage_more_than_error_threshold
    status: "True"
    type: Error