Monitoring Postgres disk usage on VMware
Monitor disk usage by the Postgres database in the management subsystem, when deployed on VMware.
Postgres is the core database used by the management subsystem, so it is important to monitor its disk usage.
In v10.0.3.0 and later, the APIConnect operator tracks the current disk usage of the Postgres components and reports it through the apic health-check command:
- The APIConnect operator regularly updates the ManagementCluster instance status, which is reflected in the output of the apic health-check command.
- The apic health-check command reports a WARNING state when disk utilization reaches 50%.
- The apic health-check command reports an ERROR state when disk utilization reaches 70%, at which point Postgres is brought down.
See:
- Management Cluster disk stats
- Warning condition is populated at 50% usage
- Error condition is populated at 70% usage
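The two thresholds described above can be summarized in a small shell helper. This is only an illustrative sketch: the function name classify_usage and the "OK" label are hypothetical and not part of API Connect, and the exact boundary handling (for example, whether exactly 70% already triggers the error) is an assumption based on the wording "reaches".

```shell
# Hypothetical helper, not part of API Connect: maps a whole-number disk
# usage percentage to the state that apic health-check is described as
# reporting. Treating the boundaries as ">= 50" and ">= 70" is an assumption.
classify_usage() {
  pct=$1
  if [ "$pct" -ge 70 ]; then
    echo "ERROR"      # Postgres is brought down at this level
  elif [ "$pct" -ge 50 ]; then
    echo "WARNING"    # health-check reports a warning
  else
    echo "OK"         # assumed label for "no disk condition set"
  fi
}
```

For example, classify_usage 51 prints WARNING, which corresponds to the 51 percent message shown in the warning section of this topic.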
Management Cluster disk stats
The ManagementCluster instance has a .status.postgresDataStats field where the operator records the current disk usage of the Postgres components. Use kubectl to view the status.
To use kubectl, first ssh into the appliance as root.
kubectl get mgmt m1 -o json | jq .status.postgresDataStats
[
  {
    "instanceName": "m1-ed60c42d-postgres",
    "podName": "m1-ed60c42d-postgres-86766f69cb-xs7t5",
    "pvcCapacity": 250181844992,
    "pvcName": "m1-ed60c42d-postgres",
    "pvcType": "PostgreSQL",
    "pvcUsed": 60895232,
    "pvcUsedPercentage": 0
  },
  {
    "instanceName": "m1-ed60c42d-postgres-backrest-shared-repo",
    "podName": "m1-ed60c42d-postgres-backrest-shared-repo-859484b5f6-r7cv6",
    "pvcCapacity": 250181844992,
    "pvcName": "m1-ed60c42d-postgres-pgbr-repo",
    "pvcType": "pgBackRest",
    "pvcUsed": 16203776,
    "pvcUsedPercentage": 0
  },
  {
    "instanceName": "m1-ed60c42d-postgres",
    "podName": "m1-ed60c42d-postgres-86766f69cb-xs7t5",
    "pvcCapacity": 250181844992,
    "pvcName": "m1-ed60c42d-postgres-wal",
    "pvcType": "WAL",
    "pvcUsed": 201338880,
    "pvcUsedPercentage": 0
  },
  {
    "instanceName": "m1-ed60c42d-postgres-fqbg",
    "podName": "m1-ed60c42d-postgres-fqbg-84869d976c-49qgc",
    "pvcCapacity": 250181844992,
    "pvcName": "m1-ed60c42d-postgres-fqbg",
    "pvcType": "PostgreSQL",
    "pvcUsed": 59994112,
    "pvcUsedPercentage": 0
  },
  {
    "instanceName": "m1-ed60c42d-postgres-fqbg",
    "podName": "m1-ed60c42d-postgres-fqbg-84869d976c-49qgc",
    "pvcCapacity": 250181844992,
    "pvcName": "m1-ed60c42d-postgres-fqbg-wal",
    "pvcType": "WAL",
    "pvcUsed": 201334784,
    "pvcUsedPercentage": 0
  },
  {
    "instanceName": "m1-ed60c42d-postgres-cimo",
    "podName": "m1-ed60c42d-postgres-cimo-5769868b75-5z7ln",
    "pvcCapacity": 250181844992,
    "pvcName": "m1-ed60c42d-postgres-cimo",
    "pvcType": "PostgreSQL",
    "pvcUsed": 59994112,
    "pvcUsedPercentage": 0
  },
  {
    "instanceName": "m1-ed60c42d-postgres-cimo",
    "podName": "m1-ed60c42d-postgres-cimo-5769868b75-5z7ln",
    "pvcCapacity": 250181844992,
    "pvcName": "m1-ed60c42d-postgres-cimo-wal",
    "pvcType": "WAL",
    "pvcUsed": 201334784,
    "pvcUsedPercentage": 0
  }
]
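In the sample output every pvcUsedPercentage is 0 because usage is far below 1% of capacity. To see finer-grained numbers, the percentage can be recomputed from pvcUsed and pvcCapacity. The following is a minimal sketch, assuming jq and awk are available on the appliance; the function name pvc_usage_report is hypothetical, and the two-decimal rounding is a choice made here, not necessarily how the operator derives pvcUsedPercentage.

```shell
# Hypothetical helper: reads lines of "<pvcName> <pvcUsed> <pvcCapacity>"
# on stdin and prints "<pvcName> <percent used>" with two decimal places.
# Lines with a zero or missing capacity are skipped to avoid division by zero.
pvc_usage_report() {
  awk '$3 > 0 { printf "%s %.2f\n", $1, $2 * 100 / $3 }'
}

# Intended usage on the appliance (not executed here):
#   kubectl get mgmt m1 -o json \
#     | jq -r '.status.postgresDataStats[] | "\(.pvcName) \(.pvcUsed) \(.pvcCapacity)"' \
#     | pvc_usage_report
```

Applied to the m1-ed60c42d-postgres-wal entry above (201338880 of 250181844992 bytes), this reports roughly 0.08 percent, which is consistent with the 0 shown in pvcUsedPercentage.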
Warning condition is populated at 50% usage
When any of the Postgres components reaches 50% disk usage, the operator sets the overall status of the ManagementCluster to Warning with an appropriate warning message, which is included in the output of the apic health-check command.
The warning message explains why the condition was set.
To use apic, first ssh into the appliance as root.
apic health-check
FATA[0007] Cluster not in good health:
ManagementCluster is not Ready or Complete | State: 17/17 Phase:
Warning Message: Current WAL disk usage of nihar-stac-3955b42d-site1-postgres is 51 percent.
DATABASE SHUTDOWN starts at 70 percent utilization. Please contact IBM Support immediately
If you encounter the warning, contact IBM support to fix the root cause.
Error condition is populated at 70% usage
An error condition is populated when the operator brings down Postgres because it is using 70% of available disk space. Postgres is brought down to avoid problems, such as disk corruption, that can occur if the disk becomes completely filled.
To use apic, first ssh into the appliance as root.
apic health-check
FATA[0011] Cluster not in good health:
ManagementCluster is not Ready or Complete | State: 17/17 Phase:
Error Message: Current WAL disk usage of nihar-stac-3955b42d-site1-postgres is more than 70 percent,
initiating DATABASE SHUTDOWN
If you encounter the error, contact IBM support to fix the root cause.
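For unattended checks, the disk condition can be picked out of the health-check output by looking for the "Warning Message:" and "Error Message:" markers that appear in the sample outputs in this topic. This is a rough sketch only: the function name health_phase and the "None" label are hypothetical, the output format may differ between releases, and sending the output through 2>&1 assumes the FATA line may be written to standard error.

```shell
# Hypothetical helper: reads apic health-check output on stdin and prints
# "Error", "Warning", or "None" depending on which marker is present.
# The marker strings are taken from the sample outputs in this topic.
health_phase() {
  output=$(cat)
  case $output in
    *"Error Message:"*)   echo "Error" ;;
    *"Warning Message:"*) echo "Warning" ;;
    *)                    echo "None" ;;   # assumed label: no disk condition found
  esac
}

# Intended usage on the appliance (not executed here):
#   apic health-check 2>&1 | health_phase
```

A scheduled job could run this periodically and raise an alert whenever the result is anything other than None, so that IBM Support can be contacted before the 70% shutdown threshold is reached.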
On VMware, the entire disk is allocated to the worker node, so in some cases Kubernetes itself can come under disk pressure and start evicting pods before the operator reacts to 70% usage. In these cases, increase the disk allocated to the worker node to stabilize it. See Adding disk space to a VMware appliance.
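To check whether a worker node is already under disk pressure, the DiskPressure node condition can be inspected with kubectl. The following is a minimal sketch, assuming awk is available on the appliance; the function name nodes_under_disk_pressure is hypothetical, and the kubectl command is shown only as a comment because it needs a live cluster.

```shell
# Hypothetical helper: reads lines of "<node-name><TAB><DiskPressure status>"
# on stdin and prints the names of nodes whose status is "True".
nodes_under_disk_pressure() {
  awk -F '\t' '$2 == "True" { print $1 }'
}

# Intended usage on the appliance (not executed here):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}' \
#     | nodes_under_disk_pressure
```

If any node name is printed, that node is reporting disk pressure and is a candidate for the disk increase described above.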