Configuring a local staging directory

You can create a local staging directory to control where temporary data is stored and to optimize the use of resources during data operations.

The local staging directory (hive.s3.staging-directory) specifies temporary local storage for Hive. It is primarily used to prevent worker pods from crashing when their storage fills up during large operations (such as CTAS operations) in the Query workspace. During such operations, the intermediate result set is stored in hive.s3.staging-directory before it is written to or read from S3. By default, hive.s3.staging-directory is set to java.io.tmpdir, which is /tmp mounted on /filesystem in the worker pods. During large operations, this directory can fill up, which might lead to a pod restart. It is therefore important to configure the local staging directory on appropriately sized storage.

Procedure

  1. Create a PV. Use the localStorageProvisioner and complete the following steps.
    Note: This is an optional step for setting a storage class for the local staging directory.
    1. Create a yaml file.
      vi pv1.yaml
    2. Copy the following content to the YAML file and save the file.
      Note: Set the values for name, storage, storageClassName, path, and nodeAffinity according to your requirements. For example, the name presto-staging-storage-pv1 in the following configuration is customizable. The path must be the same in all the PVs.
      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: presto-staging-storage-pv1
      spec:
        capacity:
          storage: 100Gi
        volumeMode: Filesystem
        accessModes:
        - ReadWriteOnce
        persistentVolumeReclaimPolicy: Delete
        storageClassName: staging-storage
        local:
          path: /dev/stagingStorage
        nodeAffinity:
          required:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - worker0.o1-380113.cp.fyre.ibm.com
                - worker1.o1-380113.cp.fyre.ibm.com
                - worker2.o1-380113.cp.fyre.ibm.com
      
    3. To get the entries for the values field under nodeAffinity, run the following command. Use the names of the worker nodes that have disk space available to mount the staging directory of the Presto (Java) pods.
      oc get nodes
    4. Access the debugging session on the chosen worker node and create the necessary directory structure.
      oc debug node/<name of node> -- chroot /host mkdir -p <path used under local, /dev/stagingStorage>

      Repeat steps 3 and 4 for all the selected nodes.

    5. Run the following command to apply the PV configurations.
      oc apply -f pv1.yaml
    6. Provision more PVs based on the t-shirt sizing. If there are three Presto (Java) pods, create pv2.yaml and pv3.yaml. For PV2, use the name presto-staging-storage-pv2. For PV3, use the name presto-staging-storage-pv3.
    7. Apply the additional PV configurations. Pass each file with its own -f flag.
      oc apply -f pv2.yaml -f pv3.yaml
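
The additional PV files differ from pv1.yaml only in metadata.name, so they can be generated instead of edited by hand. The following is a minimal sketch, not part of the product tooling: render_pv is a hypothetical helper, and the capacity, storage class, path, and hostnames are copied from the example above and must be replaced with your own values.

```shell
#!/bin/sh
# Sketch: generate pv2.yaml and pv3.yaml with the same settings as the
# pv1.yaml example, changing only the PV name.
set -eu

# Hypothetical helper: print a PV manifest whose name ends in the index $1.
# All other values are assumptions taken from the pv1.yaml example.
render_pv() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: presto-staging-storage-pv${1}
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: staging-storage
  local:
    path: /dev/stagingStorage
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker0.o1-380113.cp.fyre.ibm.com
          - worker1.o1-380113.cp.fyre.ibm.com
          - worker2.o1-380113.cp.fyre.ibm.com
EOF
}

# Write one file per additional PV; pv1.yaml is left untouched.
for i in 2 3; do
  render_pv "$i" > "pv${i}.yaml"
  echo "wrote pv${i}.yaml"
done

# Review the generated files, then apply them:
# oc apply -f pv2.yaml -f pv3.yaml
```

Extend the loop range if the t-shirt sizing calls for more than three Presto (Java) pods.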
  2. Set up local staging directory.
    1. Set the namespace in the console.
      oc project ${PROJECT_CPD_INST_OPERANDS}
    2. Determine which Presto (Java) engine you want to update.
      oc get wxdengine -o custom-columns='DISPLAY NAME:spec.engineDisplayName,ENGINE ID:metadata.labels.engineName'
      
      Example:
      oc get wxdengine -o custom-columns='DISPLAY NAME:spec.engineDisplayName,ENGINE ID:metadata.labels.engineName'
        DISPLAY NAME    ENGINE ID
        Presto (Java)   presto750
        presto-01       presto-01
      
    3. Add the local staging directory configuration under the spec section of the engine configuration.
      hive_s3_staging_directory_enabled: true
      s3StagingStorageClass: staging-storage
      s3StagingStorageSize: 2Gi

      Use the oc patch command to add the properties to the spec:

      oc patch wxdengine/lakehouse-presto-01 \
        --type=merge \
        -n cpd-instance \
        -p '{ "spec": { "hive_s3_staging_directory_enabled": "true", "s3StagingStorageClass": "staging-storage", "s3StagingStorageSize": "2Gi" } }'

      To enable the local staging directory, set hive_s3_staging_directory_enabled to true.

      The values shown for s3StagingStorageClass and s3StagingStorageSize are the defaults.

      Ensure that the storageClassName in the PV definition matches the value of s3StagingStorageClass.
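
Because the storage class must agree in two places, it can help to build the patch payload from a single variable and to check it against the PV files before patching. The following is a minimal sketch under stated assumptions: pv_storage_class and build_patch are hypothetical helpers, the PV files are assumed to be named pv*.yaml in the current directory, and the engine name in the final comment is a placeholder.

```shell
#!/bin/sh
# Sketch: verify the storage class against local PV manifests, then print the
# merge-patch payload for the engine spec.
set -eu

STORAGE_CLASS="staging-storage"   # must equal storageClassName in the PVs
STORAGE_SIZE="2Gi"

# Hypothetical helper: read the storageClassName value from a PV manifest.
pv_storage_class() {
  awk '$1 == "storageClassName:" { print $2; exit }' "$1"
}

# Hypothetical helper: print the merge-patch payload for class $1 and size $2.
build_patch() {
  printf '{"spec": {"hive_s3_staging_directory_enabled": "true", "s3StagingStorageClass": "%s", "s3StagingStorageSize": "%s"}}\n' "$1" "$2"
}

# Check every PV manifest found in the current directory, if any exist.
for f in pv*.yaml; do
  [ -f "$f" ] || continue
  if [ "$(pv_storage_class "$f")" != "$STORAGE_CLASS" ]; then
    echo "storageClassName in $f does not match $STORAGE_CLASS" >&2
    exit 1
  fi
done

PATCH=$(build_patch "$STORAGE_CLASS" "$STORAGE_SIZE")
echo "$PATCH"

# To apply the payload (replace <engine_name> with your engine):
# oc patch wxdengine/<engine_name> --type=merge -n "${PROJECT_CPD_INST_OPERANDS}" -p "$PATCH"
```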

  3. Remove the local staging directory.
    1. Determine the Presto (Java) instance from which you want to remove the local staging directory.
      oc get wxdengine -o custom-columns='DISPLAY NAME:spec.engineDisplayName,ENGINE ID:metadata.labels.engineName'
      Example:
      oc get wxdengine -o custom-columns='DISPLAY NAME:spec.engineDisplayName,ENGINE ID:metadata.labels.engineName'
        DISPLAY NAME    ENGINE ID
        Presto (Java)   presto750
        presto-01       presto-01
    2. Remove the local staging directory configurations.
      oc patch wxdengine/<engine_name> --type='json' -p='[{"op": "remove", "path": "/spec/hive_s3_staging_directory_enabled"}, {"op": "remove", "path": "/spec/s3StagingStorageSize"}, {"op": "remove", "path": "/spec/s3StagingStorageClass"}]'
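
A JSON Patch remove operation fails if its target path does not exist, so the whole patch is rejected when one of the three properties was never set on the engine. The following minimal sketch builds the remove payload from an explicit list of spec keys, so that only the keys that are present need to be passed. build_remove_patch is a hypothetical helper, not part of the oc CLI.

```shell
#!/bin/sh
# Sketch: build a JSON Patch array with one "remove" operation per spec key.
set -eu

# Hypothetical helper: each argument is a key directly under /spec.
build_remove_patch() {
  sep=""
  printf '['
  for key in "$@"; do
    printf '%s{"op": "remove", "path": "/spec/%s"}' "$sep" "$key"
    sep=", "
  done
  printf ']\n'
}

# Pass only the keys that are currently set on the engine.
PATCH=$(build_remove_patch \
  hive_s3_staging_directory_enabled \
  s3StagingStorageSize \
  s3StagingStorageClass)
echo "$PATCH"

# To apply the payload (replace <engine_name> with your engine):
# oc patch wxdengine/<engine_name> --type='json' -p="$PATCH"
```

With all three keys passed, the payload is identical to the one in the remove command above.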