Monitoring Postgres disk usage
Monitor the disk usage of the Postgres database in the management subsystem.
Postgres is the core database of the management subsystem, so it is important to monitor its disk usage.
In version 10.0.3.0 and later, the APIConnect operator tracks the current disk usage of the Postgres components and regularly updates the ManagementCluster status. When one or more of the Postgres components occupy 50% of the PVC (persistent volume claim) capacity, the APIConnect operator changes the status from Running to Warning.
The APIConnect operator brings down Postgres if disk usage reaches 80%. In this case, you must contact IBM Support.
Note that various types of storage classes can be used to deploy the management subsystem, for example local-storage or ceph block.
- Management Cluster disk stats
- Warning condition is populated at 50% usage
- Error condition is populated at 80% usage
Management Cluster disk stats
The ManagementCluster instance has a .status.postgresDataStats field where the operator reports the current disk usage of the Postgres components. For example:
kubectl get mgmt m1 -o json | jq .status.postgresDataStats
[
{
"instanceName": "m1-ed60c42d-postgres",
"podName": "m1-ed60c42d-postgres-86766f69cb-xs7t5",
"pvcCapacity": 250181844992,
"pvcName": "m1-ed60c42d-postgres",
"pvcType": "PostgreSQL",
"pvcUsed": 60895232,
"pvcUsedPercentage": 0
},
{
"instanceName": "m1-ed60c42d-postgres-backrest-shared-repo",
"podName": "m1-ed60c42d-postgres-backrest-shared-repo-859484b5f6-r7cv6",
"pvcCapacity": 250181844992,
"pvcName": "m1-ed60c42d-postgres-pgbr-repo",
"pvcType": "pgBackRest",
"pvcUsed": 16203776,
"pvcUsedPercentage": 0
},
{
"instanceName": "m1-ed60c42d-postgres",
"podName": "m1-ed60c42d-postgres-86766f69cb-xs7t5",
"pvcCapacity": 250181844992,
"pvcName": "m1-ed60c42d-postgres-wal",
"pvcType": "WAL",
"pvcUsed": 201338880,
"pvcUsedPercentage": 0
},
{
"instanceName": "m1-ed60c42d-postgres-fqbg",
"podName": "m1-ed60c42d-postgres-fqbg-84869d976c-49qgc",
"pvcCapacity": 250181844992,
"pvcName": "m1-ed60c42d-postgres-fqbg",
"pvcType": "PostgreSQL",
"pvcUsed": 59994112,
"pvcUsedPercentage": 0
},
{
"instanceName": "m1-ed60c42d-postgres-fqbg",
"podName": "m1-ed60c42d-postgres-fqbg-84869d976c-49qgc",
"pvcCapacity": 250181844992,
"pvcName": "m1-ed60c42d-postgres-fqbg-wal",
"pvcType": "WAL",
"pvcUsed": 201334784,
"pvcUsedPercentage": 0
},
{
"instanceName": "m1-ed60c42d-postgres-cimo",
"podName": "m1-ed60c42d-postgres-cimo-5769868b75-5z7ln",
"pvcCapacity": 250181844992,
"pvcName": "m1-ed60c42d-postgres-cimo",
"pvcType": "PostgreSQL",
"pvcUsed": 59994112,
"pvcUsedPercentage": 0
},
{
"instanceName": "m1-ed60c42d-postgres-cimo",
"podName": "m1-ed60c42d-postgres-cimo-5769868b75-5z7ln",
"pvcCapacity": 250181844992,
"pvcName": "m1-ed60c42d-postgres-cimo-wal",
"pvcType": "WAL",
"pvcUsed": 201334784,
"pvcUsedPercentage": 0
}
]
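In the example output, pvcUsedPercentage is reported as 0 for every entry even though pvcUsed is nonzero, which makes slow growth hard to spot. The percentage can be recomputed from pvcUsed and pvcCapacity to get a finer-grained value. The following is a minimal sketch, assuming jq is installed and that both fields are expressed in the same unit. The pvc_usage function name and its threshold argument are illustrative and not part of the product.

```shell
# Print each Postgres PVC as "name<TAB>type<TAB>percent used" (two decimals),
# keeping only entries at or above a threshold (default 0, which shows all).
# Reads the JSON of a ManagementCluster instance on stdin.
# pvc_usage is a hypothetical helper name, not an APIConnect command.
pvc_usage() {
  threshold="${1:-0}"
  jq -r --argjson t "$threshold" '
    .status.postgresDataStats[]
    | (.pvcUsed / .pvcCapacity * 100) as $pct
    | select($pct >= $t)
    | "\(.pvcName)\t\(.pvcType)\t\($pct * 100 | round / 100)"
  '
}

# Usage, with m1 as the instance name from the example above:
#   kubectl get mgmt m1 -o json | pvc_usage       # list every PVC
#   kubectl get mgmt m1 -o json | pvc_usage 50    # only PVCs at 50% or more
```

Passing 50 as the threshold mirrors the point at which the operator is described as raising a warning, so an empty result suggests that no component has reached that level yet.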
Warning condition is populated at 50% usage
When one of the Postgres components reaches 50% usage, the operator sets the overall status of the ManagementCluster to Warning with an appropriate warning message. The operator also updates the postgresDataStats section with the current disk usage of the Postgres components.
Example of warning state:
kubectl get mgmt m1
NAME READY STATUS VERSION RECONCILED VERSION AGE
m1 17/17 Warning 10.0.1-eus 10.0.1.4-ifix1-44-eus 26h
kubectl get mgmt m1 -o json | jq .status.phase
"Warning"
Example of list of warning conditions:
kubectl get mgmt m1 -o json | jq .status.conditions
[
{
"lastTransitionTime": "2021-09-08T19:49:51Z",
"message": "Current WAL disk usage of m1-ed60c42d-postgres is 51 percent. DATABASE SHUTDOWN starts at 80 percent utilization. Please contact IBM Support immediately",
"reason": "disk_usage_more_than_50_percent",
"status": "True",
"type": "Warning"
},
{
"lastTransitionTime": "2021-09-08T19:42:30Z",
"message": "",
"reason": "na",
"status": "False",
"type": "Ready"
},
{
"lastTransitionTime": "2021-09-08T19:41:35Z",
"message": "",
"reason": "na",
"status": "False",
"type": "Pending"
},
{
"lastTransitionTime": "2021-09-07T17:18:02Z",
"message": "",
"reason": "na",
"status": "False",
"type": "Error"
},
{
"lastTransitionTime": "2021-09-07T17:18:02Z",
"message": "",
"reason": "na",
"status": "False",
"type": "Failed"
}
]
The message of the warning condition explains why the condition was set. For example:
Current WAL disk usage of m1-ed60c42d-postgres is 51 percent. DATABASE SHUTDOWN starts at 80 percent utilization. Please contact IBM Support immediately
If you encounter the warning, contact IBM support to fix the root cause.
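The checks shown above can be wrapped in small helpers for use in a scheduled job or an alerting script. The following is a minimal sketch, assuming jq is installed. The function names phase_is_running and active_conditions are illustrative, and the field paths are taken from the example output above.

```shell
# Succeeds (exit status 0) only when .status.phase is "Running".
# Reads the JSON of a ManagementCluster instance on stdin.
# phase_is_running is a hypothetical helper name.
phase_is_running() {
  jq -e '.status.phase == "Running"' > /dev/null
}

# Print only the conditions whose status is "True", one per line,
# as "type<TAB>reason<TAB>message".
# active_conditions is a hypothetical helper name.
active_conditions() {
  jq -r '
    .status.conditions[]
    | select(.status == "True")
    | "\(.type)\t\(.reason)\t\(.message)"
  '
}

# Usage, with m1 as the instance name from the example above:
#   kubectl get mgmt m1 -o json | phase_is_running || echo "m1 is not Running" >&2
#   kubectl get mgmt m1 -o json | active_conditions
```

Filtering on status "True" hides the inactive Ready, Pending, Error, and Failed entries, so only the condition that explains the current state is shown.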
Error condition is populated at 80% usage
An error condition is populated when the operator brings down Postgres because it is using 80% of the available disk space. Postgres is brought down to avoid problems, such as disk corruption, that can occur if the disk fills completely.
- lastTransitionTime: "2021-08-28T01:42:56Z"
  message: Current WAL disk usage of nihar-stac-3955b42d-site1-postgres is more
    than 70 percent, initiating DATABASE SHUTDOWN
  reason: disk_usage_more_than_70_percent
  status: "True"
  type: Error
If you encounter the error, contact IBM support to fix the root cause.
The local-storage storage class uses the entire disk that is allocated to the worker node. In some cases, Kubernetes itself can face disk pressure before the operator reacts to 80% usage, and starts evicting pods. In these cases, you might need to increase the disk allocated to the worker node to stabilize it, by following the instructions in Recovering when disks are filled by the management database.
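To see whether Kubernetes is already reporting disk pressure on a worker node, the node conditions can be inspected. The following is a minimal sketch, assuming jq is installed and that you have permission to list nodes. It relies on the standard Kubernetes DiskPressure node condition, and the nodes_with_disk_pressure function name is illustrative.

```shell
# Print the names of nodes whose DiskPressure condition is "True".
# Reads the JSON output of "kubectl get nodes -o json" on stdin.
# nodes_with_disk_pressure is a hypothetical helper name.
nodes_with_disk_pressure() {
  jq -r '
    .items[]
    | select(any(.status.conditions[]?;
                 .type == "DiskPressure" and .status == "True"))
    | .metadata.name
  '
}

# Usage:
#   kubectl get nodes -o json | nodes_with_disk_pressure
```

An empty result means that no node currently reports disk pressure. It does not show how close a node is to its eviction threshold.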