Enabling file pruning

Enable file pruning in Query History Monitoring and Management (QHMM) to manage storage capacity. You can configure the maximum size of the QHMM storage bucket and a threshold percentage. When usage reaches the threshold, either during a file upload or when the cleanup scheduler runs (every 24 hours by default), older data is deleted.
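To make the threshold concrete, the following minimal sketch computes the usage level at which the threshold is reached. It assumes the default values listed in the procedure (10240 MB and 80%) and reads the threshold as a percentage of the maximum bucket size. It is only an arithmetic illustration, not part of the product.

```shell
#!/bin/sh
# Assumed values: the documented defaults for the two settings.
QHMM_BUCKET_MAX_USAGE_LIMIT=10240   # maximum bucket capacity in MB
QHMM_RECORD_PRUNE_THRESHOLD=80      # threshold as a percentage of capacity

# Usage level in MB at which the threshold is reached (integer arithmetic).
threshold_mb=$(( QHMM_BUCKET_MAX_USAGE_LIMIT * QHMM_RECORD_PRUNE_THRESHOLD / 100 ))
echo "Threshold reached at ${threshold_mb} MB of ${QHMM_BUCKET_MAX_USAGE_LIMIT} MB"
```

With the defaults, the threshold works out to 8192 MB.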

watsonx.data on IBM Software Hub

Procedure

  1. Log in to the Red Hat OpenShift cluster by using one of the following options:
    1. Run the following command to log in to the cluster by providing a username and password:
      ibm-lakehouse-manage login-to-ocp \
      --user=${OCP_USERNAME} \
      --password=${OCP_PASSWORD} \
      --server=${OCP_URL}
    2. Run the following command to log in to the cluster by providing a token:
      ibm-lakehouse-manage login-to-ocp \
      --server=${OCP_URL} \
      --token=${OCP_TOKEN}
  2. Set the project by using the following command:
    oc project <PROJECT_CPD_INST_OPERANDS>
  3. Run the following command to list the config maps related to Presto:
    oc get cm | grep presto-config-cm
  4. Configure the following environment variables in the presto-config-cm config map file to enable file pruning for QHMM.
    • QHMM_BUCKET_MAX_USAGE_LIMIT: Maximum capacity of the storage in MB (default: 10240 MB).
    • QHMM_RECORD_PRUNE_FREQUENCY_HRS: Frequency in hours at which the scheduler runs to prune data (default: 24 hours).
    • QHMM_RECORD_EXPIRY_DAYS: Record expiry time in days for deleting records from COS (default: 30 days).
    • QHMM_RECORD_PRUNE_THRESHOLD: Percentage of the maximum capacity at which QHMM triggers pruning or issues a warning to the user (default: 80%).
    • ENABLE_QHMM_PRUNE: Whether QHMM pruning is enabled (default: false).
  5. Run the following command to open the config map for editing. Replace <engine-id> with the ID of the engine for which you want to enable file pruning.
    oc edit cm ibm-lh-lakehouse-<engine-id>-presto-config-cm
  6. Ensure that the value of the ENABLE_QHMM_PRUNE flag is set to true, and then save the config map.
  7. Delete the Presto coordinator or worker pod to apply the updated environment configuration.
    oc delete pod <presto Coordinator/worker pod name>
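
After the pod restarts, it can be useful to confirm that the pruning settings are present in the config map. The following is a minimal sketch and not part of the documented procedure. The helper name show_qhmm_settings is hypothetical, and the sketch assumes that you are already logged in with the project set (steps 1 and 2) and that the settings appear as plain text in the config map YAML.

```shell
# Hypothetical helper: print the QHMM-related lines of a config map.
# Assumes an active oc session and that the project is already selected.
show_qhmm_settings() {
  cm_name="$1"
  # Dump the config map as YAML and keep only lines that mention QHMM.
  oc get cm "$cm_name" -o yaml | grep 'QHMM'
}

# Example call (replace <engine-id> with your engine ID):
# show_qhmm_settings "ibm-lh-lakehouse-<engine-id>-presto-config-cm"
```

If ENABLE_QHMM_PRUNE does not appear in the output, or is not set to true, reopen the config map and check the value before you delete the pod again.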