Monitoring Postgres disk usage on OpenShift
In API Connect, you can monitor the disk space that is used by the Postgres database in the Management subsystem.
The Postgres database is the core database used in the Management subsystem. It is important to monitor the Postgres disk usage to avoid running out of space and causing an outage in your deployment.
In Version 10.0.1.4-eus and later, the APIConnect operator tracks the current disk usage of the Postgres components and regularly updates the status of the ManagementCluster CR. When one or more of the Postgres components occupy 50% of the PVC (persistent volume claim) capacity, the APIConnect operator changes the CR's status from Running to Warning.
When one or more of the Postgres components reach 80% of capacity, the APIConnect operator brings down Postgres. If this situation occurs, contact IBM Support.
Note that various types of storage classes can be used to deploy the Management subsystem; for example, local-storage or ceph block. When local-storage is used, the entire disk is allocated to the worker node. In some cases, before the APIConnect operator reacts to an 80% usage condition, Kubernetes itself might face disk pressure and start evicting pods. In this situation, you might want to increase the size of the disk allocated to the worker node as explained in Recovering on OpenShift and Cloud Pak for Integration when disks are filled by the management database.
Viewing the current disk usage
The ManagementCluster instance includes the .status.postgresDataStats field, where the operator displays the current disk usage of the Postgres components. Run the following command to get the disk usage:
oc get mgmt m1 -n APIC_namespace -o json | jq .status.postgresDataStats
The response looks like the following example:
[
{
"instanceName": "m1-ed60c42d-postgres",
"podName": "m1-ed60c42d-postgres-86766f69cb-xs7t5",
"pvcCapacity": 250181844992,
"pvcName": "m1-ed60c42d-postgres",
"pvcType": "PostgreSQL",
"pvcUsed": 60895232,
"pvcUsedPercentage": 0
},
{
"instanceName": "m1-ed60c42d-postgres-backrest-shared-repo",
"podName": "m1-ed60c42d-postgres-backrest-shared-repo-859484b5f6-r7cv6",
"pvcCapacity": 250181844992,
"pvcName": "m1-ed60c42d-postgres-pgbr-repo",
"pvcType": "pgBackRest",
"pvcUsed": 16203776,
"pvcUsedPercentage": 0
},
{
"instanceName": "m1-ed60c42d-postgres",
"podName": "m1-ed60c42d-postgres-86766f69cb-xs7t5",
"pvcCapacity": 250181844992,
"pvcName": "m1-ed60c42d-postgres-wal",
"pvcType": "WAL",
"pvcUsed": 201338880,
"pvcUsedPercentage": 0
},
{
"instanceName": "m1-ed60c42d-postgres-fqbg",
"podName": "m1-ed60c42d-postgres-fqbg-84869d976c-49qgc",
"pvcCapacity": 250181844992,
"pvcName": "m1-ed60c42d-postgres-fqbg",
"pvcType": "PostgreSQL",
"pvcUsed": 59994112,
"pvcUsedPercentage": 0
},
{
"instanceName": "m1-ed60c42d-postgres-fqbg",
"podName": "m1-ed60c42d-postgres-fqbg-84869d976c-49qgc",
"pvcCapacity": 250181844992,
"pvcName": "m1-ed60c42d-postgres-fqbg-wal",
"pvcType": "WAL",
"pvcUsed": 201334784,
"pvcUsedPercentage": 0
},
{
"instanceName": "m1-ed60c42d-postgres-cimo",
"podName": "m1-ed60c42d-postgres-cimo-5769868b75-5z7ln",
"pvcCapacity": 250181844992,
"pvcName": "m1-ed60c42d-postgres-cimo",
"pvcType": "PostgreSQL",
"pvcUsed": 59994112,
"pvcUsedPercentage": 0
},
{
"instanceName": "m1-ed60c42d-postgres-cimo",
"podName": "m1-ed60c42d-postgres-cimo-5769868b75-5z7ln",
"pvcCapacity": 250181844992,
"pvcName": "m1-ed60c42d-postgres-cimo-wal",
"pvcType": "WAL",
"pvcUsed": 201334784,
"pvcUsedPercentage": 0
}
]
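In the example response, every pvcUsedPercentage value is 0 because usage is far below 1% of capacity. To see finer-grained figures, the percentage can be recomputed from pvcUsed and pvcCapacity. The following Python sketch is one way to do that; it is not part of API Connect. The function name summarize_usage and the level labels are hypothetical, the code assumes that pvcUsed and pvcCapacity are reported in the same unit, and the exact comparison the operator applies at the 50% and 80% thresholds is an assumption.

```python
# Thresholds described in this topic; the use of ">=" is an assumption.
WARNING_PERCENT = 50  # operator sets the Warning status around this usage
ERROR_PERCENT = 80    # operator brings down Postgres around this usage


def summarize_usage(stats):
    """Return (pvcName, percent_used, level) for each postgresDataStats entry.

    `stats` is the parsed JSON list from .status.postgresDataStats.
    """
    rows = []
    for entry in stats:
        # Assumes pvcUsed and pvcCapacity share the same unit.
        percent = 100.0 * entry["pvcUsed"] / entry["pvcCapacity"]
        if percent >= ERROR_PERCENT:
            level = "error"
        elif percent >= WARNING_PERCENT:
            level = "warning"
        else:
            level = "ok"
        rows.append((entry["pvcName"], round(percent, 3), level))
    return rows
```

To use the sketch, parse the command output with json.loads and pass the resulting list to summarize_usage. Each returned tuple names the PVC, its usage as a percentage rounded to three decimal places, and a label that indicates which threshold, if any, has been crossed.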
Warning condition is populated at 50% usage
When one of the Postgres components uses 50% of its allocated space, the operator changes the overall status of the ManagementCluster to Warning with an appropriate warning message. The operator also updates the postgresDataStats section with the current data usage of the Postgres components.
If the ManagementCluster is in the Warning condition, contact IBM Support for help correcting the root cause.
To view the status, run the following command:
oc get mgmt m1
The following example response shows the warning state:
NAME READY STATUS VERSION RECONCILED VERSION AGE
m1 17/17 Warning 10.0.1-eus 10.0.1.4-ifix1-44-eus 26h
To view just the status, run the following command:
oc get mgmt m1 -o json | jq .status.phase
In this example, the status displayed in the response is "Warning":
"Warning"
To see the list of current status conditions, run the following command:
oc get mgmt m1 -o json | jq .status.conditions
The following example response displays the list of status conditions. Notice that the Warning condition is set to "True" and the other conditions are set to "False":
[
{
"lastTransitionTime": "2021-09-08T19:49:51Z",
"message": "Current WAL disk usage of m1-ed60c42d-postgres is 51 percent. DATABASE SHUTDOWN starts at 80 percent utilization. Please contact IBM Support immediately",
"reason": "disk_usage_more_than_50_percent",
"status": "True",
"type": "Warning"
},
{
"lastTransitionTime": "2021-09-08T19:42:30Z",
"message": "",
"reason": "na",
"status": "False",
"type": "Ready"
},
{
"lastTransitionTime": "2021-09-08T19:41:35Z",
"message": "",
"reason": "na",
"status": "False",
"type": "Pending"
},
{
"lastTransitionTime": "2021-09-07T17:18:02Z",
"message": "",
"reason": "na",
"status": "False",
"type": "Error"
},
{
"lastTransitionTime": "2021-09-07T17:18:02Z",
"message": "",
"reason": "na",
"status": "False",
"type": "Failed"
}
]
The warning condition includes an explanation of why the condition was set to "True"; for example:
Current WAL disk usage of m1-ed60c42d-postgres is 51 percent. DATABASE SHUTDOWN starts at 80 percent utilization. Please contact IBM Support immediately
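When the conditions list is long, it can help to filter it down to the entries that are currently set. The following Python sketch shows one way to do that with the parsed output of .status.conditions. The function names active_conditions and describe are hypothetical, and the code assumes that each condition carries the type, status, reason, and message keys shown in the example response.

```python
def active_conditions(conditions):
    """Return the conditions whose status is the string "True"."""
    return [c for c in conditions if c.get("status") == "True"]


def describe(conditions):
    """Format each active condition as 'type (reason): message'."""
    return [
        f'{c["type"]} ({c.get("reason", "")}): {c.get("message", "")}'
        for c in active_conditions(conditions)
    ]
```

For the example response above, describe returns a single line for the Warning condition, including its reason disk_usage_more_than_50_percent and the full message text.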
Error condition is populated at 80% usage
When one of the Postgres components uses 80% of its allocated space, the operator changes the overall status of the ManagementCluster to Error and brings down Postgres to avoid problems that can occur if the disk becomes completely filled.
If the ManagementCluster is in the Error condition, contact IBM Support for help correcting the root cause.
In this case, when you run the following command to view the status conditions, the Error condition is set to "True":
oc get mgmt m1 -o json | jq .status.conditions
For example, the following condition shows the Error status set to "True" due to disk usage:
- lastTransitionTime: "2021-08-28T01:42:56Z"
  message: Current WAL disk usage of nihar-stac-3955b42d-site1-postgres is more
    than 70 percent, initiating DATABASE SHUTDOWN
  reason: disk_usage_more_than_70_percent
  status: "True"
  type: Error