IBM Support

How to clear /var directory data that the purge action failed to delete

Troubleshooting


Problem

Running the df -h command shows that the /var directory is full. When the /var directory is full, the PostgreSQL database goes down and cannot be started again.
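
The check described above can be scripted. The following is a minimal sketch, not part of the product: parse_use_pct is a hypothetical helper name, and it assumes the POSIX output format of df -P, where the usage percentage is the fifth column of the second line.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the
# usage percentage of the first listed filesystem as a bare integer.
parse_use_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report how full the filesystem holding /var is.
if [ -d /var ]; then
    pct=$(df -P /var | parse_use_pct)
    echo "/var is ${pct}% full"
fi
```

The -P option keeps each filesystem on a single line, so the column positions stay stable even when device names are long.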

Cause

The perf service failed to create the partition tables that record the cluster data. The purge action then fails to delete old data from the database, so the data keeps growing.

Resolving The Problem

1. Log on to the Platform HPC database.
=========================
#psql -U hpcadmin -d hpcdb
=========================
(The password is hpcadmin.)

2. Run the following query to list the 20 largest tables.
=========================
#SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;
=========================

3. If the output from the previous step resembles the following listing, the problem is likely the one described in this document.
=========================
Table Name               Size Last Report  Size This Report  Growth
ci_server_snap           19 GB             20 GB             1 GB
ci_netcard_snap          18 GB             18 GB             0 GB
resource_metrics         3756 MB           5195 MB           1439 MB
ci_disk_util             3544 MB           3544 MB           0 MB
hpc_hw_resource_metrics  2803 MB           2957 MB           154 MB
......
=========================

4. Check the partition status of the problematic tables with the following SQL command. The problematic tables are missing the partition for the current date.
=========================
#select * from pg_tables where tablename like 'ci_server_snap%';
=========================

5. Back up the current HPC database. Run this command from the operating system shell, not from the psql prompt.
=========================
#pg_dump -U hpcadmin hpcdb > hpcbackup
=========================

6. Delete the old data from the database.
=========================
#delete from ci_server_snap where time_stamp < '%';
=========================
(Replace % with the appropriate cutoff time. Rows older than that time are deleted.)
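
One way to fill in the cutoff time is to generate the statement from the shell. This is a sketch under stated assumptions: build_delete_sql is a hypothetical helper, the 30-day retention period is only an example, and the timestamp format is assumed to be accepted by the time_stamp column. Review the printed statement before you run it in psql.

```shell
# Hypothetical helper: print a DELETE statement for one table.
# Arguments: table name, cutoff timestamp (format is an assumption).
build_delete_sql() {
    printf "DELETE FROM %s WHERE time_stamp < '%s';\n" "$1" "$2"
}

# Example retention of 30 days (an assumption, adjust as needed).
# The -d option requires GNU date.
cutoff=$(date -d '30 days ago' '+%Y-%m-%d 00:00:00')
build_delete_sql ci_server_snap "$cutoff"
```

The script only prints the statement, so nothing is deleted until the statement is pasted into the psql session.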

7. Create the new partitions.
=========================
#select init_create_partitions();
=========================

8. Free the disk space.
=========================
#vacuum full;
=========================
(Before you run this step, make sure that some free disk space remains under the /var directory. The vacuum operation needs working space to reclaim the disk space.)
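
The free-space requirement can be checked before the vacuum is started. VACUUM FULL writes a new copy of each table before it releases the old one, so it needs roughly as much free space as the largest table still holds after the delete. The sketch below is illustrative only: enough_space is a hypothetical helper, and the 20 GB requirement is an assumed figure taken from the largest table in the example listing.

```shell
# Hypothetical helper: succeed if the available space covers the need.
# Arguments: available KB, required KB.
enough_space() {
    [ "$1" -ge "$2" ]
}

# Available KB on the filesystem holding /var (column 4 of `df -P`).
avail_kb=$(df -P /var 2>/dev/null | awk 'NR == 2 { print $4 }')
# Assumed requirement: 20 GB expressed in KB.
need_kb=$((20 * 1024 * 1024))

if enough_space "${avail_kb:-0}" "$need_kb"; then
    echo "enough free space for VACUUM FULL"
else
    echo "free more space before running VACUUM FULL"
fi
```

If the check fails, deleting more old rows does not help on its own, because deleted rows stay on disk until the vacuum completes.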

Applies to: Platform Cluster Manager, Version 3.2, Standard Edition (Linux); IBM Spectrum Cluster Foundation

Document Information

Modified date:
16 September 2018

UID

isg3T1021385