Monitoring Postgres disk usage on Cloud Pak for Integration

In API Connect, you can monitor the disk space that is used by the Postgres database in the Management subsystem.

The Postgres database is the core database used in the Management subsystem. It is important to monitor the Postgres disk usage to avoid running out of space and causing an outage in your deployment.

The APIConnect operator tracks the current disk usage of the Postgres components, and regularly updates the ManagementCluster CR's status. When one or more of the Postgres components occupy 60% of the PVC (persistent volume claim) capacity, the APIConnect operator changes the CR's status from Running to Warning.

Important: If the Postgres disk usage reaches 80%, then the APIConnect operator brings down Postgres. If this situation occurs, contact IBM Support.
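
The two thresholds described above can be summarized in a short shell sketch. The function name classify_usage is hypothetical and is not part of the APIConnect operator. The sketch also assumes that both thresholds are inclusive (60 and above is Warning, 80 and above is Error), which the text implies but does not state precisely.

```shell
#!/bin/sh
# Sketch only: maps a disk usage percentage to the CR status described above.
# Assumption: thresholds are inclusive (>= 60 is Warning, >= 80 is Error).
classify_usage() {
  pct="$1"
  if [ "$pct" -ge 80 ]; then
    echo "Error"      # the operator brings down Postgres at this point
  elif [ "$pct" -ge 60 ]; then
    echo "Warning"    # CR status changes from Running to Warning
  else
    echo "Running"
  fi
}
```

For example, classify_usage 61 prints Warning.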

Note: Several types of storage class can be used to deploy the Management subsystem; for example, local-storage or Ceph block. When local-storage is used, the entire disk is allocated to the worker node. In that case, Kubernetes itself might detect disk pressure and start evicting pods before the APIConnect operator reacts to an 80% usage condition.

Viewing the current disk usage

The ManagementCluster instance includes the .status.postgresDataStats field, where the operator displays the current disk usage of Postgres components. Run the following command to get the disk usage:

oc get mgmt <mgmt-cr-name> -o json | jq .status.postgresDataStats

The response looks like the following example:

[
  {
    "instanceName": "<mgmt cr name>-site1-db-1",
    "podName": "",
    "pvcCapacity": 51200,
    "pvcName": "",
    "pvcType": "WAL",
    "pvcUsed": 400,
    "pvcUsedPercentage": 8
  },
  {
    "instanceName": "<mgmt cr name>-site1-db-1",
    "podName": "",
    "pvcCapacity": 184320,
    "pvcName": "",
    "pvcType": "PostgreSQL",
    "pvcUsed": 95,
    "pvcUsedPercentage": 8
  }
]
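
When there are several entries, it can help to list only the volumes that are at or above a chosen percentage. The following is a minimal sketch, not an IBM-provided tool: the function name flag_high_usage is hypothetical, and it expects tab-separated lines of instanceName, pvcType, and pvcUsedPercentage on standard input. The commented pipeline shows one way to produce that input with jq's @tsv format.

```shell
#!/bin/sh
# Sketch only: prints entries whose usage percentage is at or above a threshold.
# Input (stdin): tab-separated "instanceName<TAB>pvcType<TAB>pvcUsedPercentage".
# Argument 1: threshold in percent (defaults to 60, the Warning level).
flag_high_usage() {
  threshold="${1:-60}"
  awk -F '\t' -v t="$threshold" \
    '$3 + 0 >= t { printf "%s (%s): %s%%\n", $1, $2, $3 }'
}

# Possible usage against a live cluster (requires oc and jq):
# oc get mgmt <mgmt-cr-name> -o json \
#   | jq -r '.status.postgresDataStats[] | [.instanceName, .pvcType, .pvcUsedPercentage] | @tsv' \
#   | flag_high_usage 60
```

The function prints nothing when every volume is below the threshold, so an empty result means that no Postgres component has reached the Warning level.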

Warning condition is populated at 60% usage

When one of the Postgres components uses 60% of its allocated space, the operator changes the overall status of the ManagementCluster CR and the APIConnectCluster CR to Warning with an appropriate warning message. The operator also updates the postgresDataStats section with the current data usage of the Postgres components.

Attention: If you encounter the Warning condition, contact IBM Support for help correcting the root cause.

To view the status from the command line, run the command for the CR that you want to check:

  • APIConnectCluster CR:
    oc get apiconnect

    The following example response shows the warning status:

    NAME         READY                                                                                                                                                                    STATUS     VERSION        RECONCILED VERSION   AGE
    production   Current WAL disk usage of production-mgmt-31be1757-postgres is 60 percent. DATABASE SHUTDOWN starts at 80 percent utilization. Please contact IBM Support immediately.   Warning    10.0.8.0-eus   10.0.8.0-3394-eus    11d
  • ManagementCluster CR:
    oc get mgmt

    The following example response shows the warning status:

    NAME              READY   STATUS    VERSION        RECONCILED VERSION   AGE
    production-mgmt   18/18   Warning   10.0.8.0-eus   10.0.8.0-3394-eus    43h
    
    Current WAL disk usage of m1-ed60c42d-postgres is 61 percent. DATABASE SHUTDOWN starts at 80 percent utilization. Please contact IBM Support immediately
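
To check many ManagementCluster CRs at once, the STATUS column can be filtered in a script. The following sketch assumes the column order NAME, READY, STATUS shown in the ManagementCluster example above; the function name unhealthy_mgmt is hypothetical. It is not suitable for the oc get apiconnect output, where the READY column can contain a message with spaces.

```shell
#!/bin/sh
# Sketch only: reads `oc get mgmt --no-headers` output on stdin and prints
# "name status" for each CR whose STATUS column is anything other than Running.
# Assumption: columns are NAME, READY, STATUS, as in the example above.
unhealthy_mgmt() {
  awk 'NF >= 3 && $3 != "Running" { print $1, $3 }'
}

# Possible usage against a live cluster:
# oc get mgmt --no-headers | unhealthy_mgmt
```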

To view the status in the IBM Cloud Pak Platform UI, click the Integration instances tab. The status is displayed next to the instance name.

Figure: Warning status for an instance in the Platform UI

Click the status value (Warning in this example) to display the Conditions list, where a detailed message explains why the condition was set:

Figure: Warning condition details displayed in the Platform UI

Error condition is populated at 80% usage

When one of the Postgres components uses 80% of its allocated space, the operator changes the overall status of the ManagementCluster CR to Error and brings down Postgres to avoid the problems that can occur if the disk fills completely.

Attention: If you encounter the Error condition, contact IBM Support for help correcting the root cause.

To view the status from the command line, run the command for the CR that you want to check:

  • APIConnectCluster CR:
    oc get apiconnect

    The following example response shows the error status:

    NAME         READY                                                                                                            STATUS   VERSION        RECONCILED VERSION   AGE
    production   Current disk usage is more than 80 percent. DATABASE will be shutdown. Please contact IBM Support immediately.   Error    10.0.8.0-eus   10.0.8.0-3394-eus    11d
  • ManagementCluster CR:
    oc get mgmt

    The following example response shows the error status:

    NAME              READY   STATUS   VERSION        RECONCILED VERSION   AGE
    production-mgmt   9/9     Error    10.0.8.0       10.0.8.0-3394        11d
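
A scheduled monitoring job might translate the reported status into an exit code so that an alerting tool can act on it. The following sketch is an illustration only: the function name status_exit_code is hypothetical, and the 0/1/2/3 mapping follows a common monitoring-plugin convention rather than anything defined by API Connect.

```shell
#!/bin/sh
# Sketch only: converts a CR status string into an exit code for alerting.
# Assumption: 0 = OK, 1 = warning, 2 = critical, 3 = unknown (a common
# monitoring convention, not defined by API Connect).
status_exit_code() {
  case "$1" in
    Running) return 0 ;;
    Warning) return 1 ;;   # 60% usage reached; contact IBM Support
    Error)   return 2 ;;   # 80% usage reached; Postgres is brought down
    *)       return 3 ;;   # any other status is treated as unknown
  esac
}
```

A wrapper script could read the STATUS value for a CR, pass it to this function, and exit with the result.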

To view the status in the Platform UI, click the Integration instances tab. The status is displayed next to the instance name.

Figure: Error status for an instance in the Platform UI

Click the status value (Error in this example) to display the Conditions list, where a detailed message explains why the condition was set:

Figure: Error condition details displayed in the Platform UI