Configuring the CLU cleanup scheduler

Conversational Language Understanding (CLU) is a collection of services that supports both training and inference within the watsonx Assistant system. When an assistant is created, CLU logs a corresponding entry in its database. If that entry is not removed when the assistant is deleted, it remains behind as a zombie (orphaned) record. The CLU cleanup scheduler identifies and cleans up these records.

Permissions you need for these tasks:
You must be an administrator of the Red Hat® OpenShift® project to manage the cluster.

Updating the environment variables

The CLU cleanup scheduler is enabled by default, with the necessary environment variables preconfigured within the script. To increase the number of zombie records targeted for deletion, or to resolve any operational issues, run the following commands:
  1. Export your assistant namespace.
    export PROJECT_CPD_INST_OPERANDS=<namespace where Assistant is installed>
  2. Export the instance.
    export INSTANCE=$(oc get wa -n "${PROJECT_CPD_INST_OPERANDS}" | grep -v NAME | awk '{print $1}')
  3. Set up the cron schedule.
    Important: Set the time in UTC, and schedule the cleanup during off-peak hours.
    You can define the cron schedule to suit your requirements. For the allowed values and special characters in CLU_CLEANUP_CRON_SCHEDULE, refer to Cron expressions.
    export CLU_CLEANUP_CRON_SCHEDULE="0 0 23 * * ?"
  4. To specify the daily limit for zombie workspace deletion, use:
    export NUM_OF_WORKSPACES_TO_DELETE=600 # set your preferred number
  5. To delete more zombie workspaces on weekends, set a higher deletion count. For example, double the NUM_OF_WORKSPACES_TO_DELETE value:
    export NUM_OF_WORKSPACES_TO_DELETE_HIGH_RATE=1200
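
Before you apply these values, it can help to check them in the same shell. The following is a minimal sketch, not part of the product: validate_clu_settings is a hypothetical helper name, and the six-field check is an assumption based on the example schedule, which appears to use the layout seconds, minutes, hours, day of month, month, day of week (so "0 0 23 * * ?" would run daily at 23:00 UTC).

```shell
# Hypothetical pre-flight check for the variables exported in steps 3 to 5.
validate_clu_settings() {
  status=0
  # Both deletion limits must contain digits only.
  for name in NUM_OF_WORKSPACES_TO_DELETE NUM_OF_WORKSPACES_TO_DELETE_HIGH_RATE; do
    eval "value=\${$name:-}"
    case "$value" in
      ''|*[!0-9]*)
        echo "$name must be a whole number, got '$value'"
        status=1
        ;;
    esac
  done
  # Assumption: the schedule has six space-separated fields, like the example.
  set -f    # stop "*" and "?" in the schedule from expanding to file names
  set -- ${CLU_CLEANUP_CRON_SCHEDULE:-}
  set +f
  if [ "$#" -ne 6 ]; then
    echo "CLU_CLEANUP_CRON_SCHEDULE should have 6 fields, got $#"
    status=1
  fi
  return "$status"
}
```

Run validate_clu_settings after step 5. A non-zero exit status means that at least one value needs attention before you continue.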

Scaling the CLU scheduler

Use the following script to update the environment variables and memory settings for the automated cleanup of zombie workspaces:

cat <<EOF | oc apply -f -
apiVersion: assistant.watson.ibm.com/v1
kind: TemporaryPatch
metadata:
  name: ${INSTANCE}-store-admin-clu-cleanup-env-vars
  namespace: ${PROJECT_CPD_INST_OPERANDS}
spec:
  apiVersion: assistant.watson.ibm.com/v1
  kind: WatsonAssistantStore
  name: ${INSTANCE}
  patchType: patchStrategicMerge
  patch:
    store-admin:
      deployment:
        spec:
          template:
            spec:
              containers:
              - name: store-admin
                env:
                - name: CLU_CLEAN_UP
                  value: "true"
                - name: NUM_OF_WORKSPACES_TO_DELETE
                  value: "${NUM_OF_WORKSPACES_TO_DELETE}"
                - name: NUM_OF_WORKSPACES_TO_DELETE_HIGH_RATE
                  value: "${NUM_OF_WORKSPACES_TO_DELETE_HIGH_RATE}"
                - name: CLU_CLEAN_UP_CRON_SCHEDULE
                  value: "${CLU_CLEANUP_CRON_SCHEDULE}"
                - name: JAVA_MAXHEAP_SIZE
                  value: "2300m"
                resources:
                  limits:
                    memory: 3Gi
                  requests:
                    memory: 3Gi
EOF
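
After the script runs, you might want to confirm that the patch object exists and see whether the store-admin pods have rolled out again. The following is a sketch under stated assumptions: check_clu_patch is a hypothetical helper, it assumes that INSTANCE and PROJECT_CPD_INST_OPERANDS are still exported, and matching pods on the string "store-admin" is a guess based on the container name in the patch.

```shell
# Hypothetical helper: inspect the result of applying the TemporaryPatch.
check_clu_patch() {
  patch_name="${INSTANCE}-store-admin-clu-cleanup-env-vars"
  # Show the TemporaryPatch object created by the script above.
  oc get temporarypatch "${patch_name}" -n "${PROJECT_CPD_INST_OPERANDS}" || return 1
  # List pods whose name contains "store-admin" (an assumed naming pattern).
  oc get pods -n "${PROJECT_CPD_INST_OPERANDS}" | grep store-admin
}
```

If the first command reports that the resource is not found, the patch was not created, so recheck the exported variables and apply the script again.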

Resolving the issues seen in the logs

If the store-admin service logs still show issues after you change the above values, increase JAVA_MAXHEAP_SIZE together with the limits: memory and requests: memory values. As a best practice, raise each of them by approximately 25%.
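
As a worked example of the 25% guideline, the following sketch computes the next values from the defaults in the script. raise_by_quarter is a hypothetical helper, and reading the "m" suffix of JAVA_MAXHEAP_SIZE as megabytes is an assumption. 3Gi is equal to 3072Mi.

```shell
# Hypothetical helper: raise a whole number by 25%, rounding up.
raise_by_quarter() {
  echo $(( ($1 * 125 + 99) / 100 ))
}

raise_by_quarter 2300   # JAVA_MAXHEAP_SIZE: 2300m becomes 2875m
raise_by_quarter 3072   # memory: 3Gi (3072Mi) becomes 3840Mi
```

With these results, you would set JAVA_MAXHEAP_SIZE to "2875m" and both memory values to 3840Mi, which keeps the heap size below the container memory limit as in the original settings.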