You can create a local staging directory to control where temporary data is stored. With
a local staging directory, you can optimize the use of resources during data
operations.
The local staging directory (hive.s3.staging-directory) is primarily used to prevent pod crashes
caused by storage filling up when you run large operations (such as CTAS operations) in the Query
workspace. The property specifies temporary local storage for Hive. During such large operations,
the intermediate result set is stored in hive.s3.staging-directory before it is written to or read
from S3. By default, hive.s3.staging-directory is set to java.io.tmpdir, that is, /tmp
mounted on /filesystem in the worker pods. During large operations, this location can fill up,
which might lead to a pod restart. Therefore, it is important to configure the local staging
directory on appropriate storage.
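To judge whether the default location is at risk, it can help to check how full the filesystem behind /tmp is. The following is a minimal sketch, assuming a POSIX shell with df and awk available where it runs (for example, inside a worker pod). The function names tmp_usage_percent and report_tmp_usage and the 80 percent threshold are illustrative choices, not product defaults.

```shell
#!/bin/sh
# Print the used-space percentage (digits only) of the filesystem
# that holds the given directory. Defaults to /tmp.
tmp_usage_percent() {
    # df -P prints one POSIX-format line per filesystem; field 5 is the capacity, e.g. "42%".
    df -P "${1:-/tmp}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage reaches the threshold (second argument, default 80).
report_tmp_usage() {
    used="$(tmp_usage_percent "$1")"
    limit="${2:-80}"
    case "$used" in
        ''|*[!0-9]*) echo "Could not determine usage for ${1:-/tmp}"; return 2 ;;
    esac
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: ${1:-/tmp} is ${used}% full"
        return 1
    fi
    echo "${1:-/tmp} is ${used}% full"
}

report_tmp_usage /tmp 80 || true
```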
Procedure
- Create a PV. Use the localStorageProvisioner and complete the following steps.
Note: This is an optional step for setting a storage class for the local staging directory.
- Create a YAML file.
- Copy the following content to the YAML file and save the file.
Note: The values for name, storage, storageClassName, path, and nodeAffinity depend on
your requirements. For example, in name: presto-staging-storage-pv1 in the following
configuration, the value presto-staging-storage-pv1 is customizable. The path must be the
same in all the PVs.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: presto-staging-storage-pv1
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: staging-storage
  local:
    path: /dev/stagingStorage
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker0.o1-380113.cp.fyre.ibm.com
          - worker1.o1-380113.cp.fyre.ibm.com
          - worker2.o1-380113.cp.fyre.ibm.com
- To get the entries for the values field under nodeAffinity, run the following command. Use the
names of the worker nodes that have disk space available to mount the staging directory of the
Presto (Java) pods.
- Access a debugging session on the chosen worker node and create the necessary directory structure.
oc debug node/<name of node> -- chroot /host mkdir -p <path used under local, for example /dev/stagingStorage>
Repeat steps c and d for each of the selected nodes.
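Repeating the directory creation by hand for every node is error-prone, so the step can be wrapped in a loop. This is a sketch under the assumption that oc is logged in with permission to start node debug sessions. The function name create_staging_dirs is hypothetical, and the example invocation reuses the sample path and node names from the PV definition.

```shell
#!/bin/sh
# Create the staging path on every listed worker node.
# Usage: create_staging_dirs <host-path> <node> [<node> ...]
create_staging_dirs() {
    path="$1"
    shift
    for node in "$@"; do
        echo "Creating ${path} on ${node}"
        # Same command as in the procedure, run once per node; stop at the first failure.
        oc debug "node/${node}" -- chroot /host mkdir -p "$path" || return 1
    done
}

# Example (sample values from this procedure):
# create_staging_dirs /dev/stagingStorage \
#     worker0.o1-380113.cp.fyre.ibm.com worker1.o1-380113.cp.fyre.ibm.com
```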
- Run the following command to apply the PV configurations.
- Provision more PVs based on the t-shirt sizing. If there are three Presto (Java) pods, create
pv2.yaml and pv3.yaml. For PV2, use the name
presto-staging-storage-pv2. For PV3, use the name
presto-staging-storage-pv3.
- Create two or more PVs based on the requirements.
oc apply -f pv2.yaml -f pv3.yaml
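Because the additional PVs differ only in their name, the manifests can be generated from one template instead of being copied by hand. The following is a minimal sketch, assuming the same storage size, storage class, and path as the sample PV. The function name render_pv and the way node names are passed as arguments are illustrative.

```shell
#!/bin/sh
# Write a PersistentVolume manifest to stdout.
# Usage: render_pv <index> <node> [<node> ...]
render_pv() {
    index="$1"
    shift
    cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: presto-staging-storage-pv${index}
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: staging-storage
  local:
    path: /dev/stagingStorage
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
EOF
    # One list entry per node name passed as an argument.
    for node in "$@"; do
        printf '          - %s\n' "$node"
    done
}

# Example: generate pv2.yaml and pv3.yaml for two additional pods.
# for i in 2 3; do render_pv "$i" worker0.o1-380113.cp.fyre.ibm.com > "pv${i}.yaml"; done
```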
- Set up the local staging directory.
- Set the namespace in the console.
oc project ${PROJECT_CPD_INST_OPERANDS}
- Determine which Presto (Java) engine you want to update.
oc get wxdengine -o custom-columns='DISPLAY NAME:spec.engineDisplayName,ENGINE ID:metadata.labels.engineName'
Example:
oc get wxdengine -o custom-columns='DISPLAY NAME:spec.engineDisplayName,ENGINE ID:metadata.labels.engineName'
DISPLAY NAME    ENGINE ID
Presto (Java)   presto750
presto-01       presto-01
- Add the local staging directory configuration under the spec section of the engine configuration.
hive_s3_staging_directory_enabled: true
s3StagingStorageClass: staging-storage
s3StagingStorageSize: 2Gi
Use the oc patch command to add the properties to the spec:
oc patch wxdengine/lakehouse-presto-01 \
--type=merge \
-n cpd-instance \
-p '{ "spec": { "hive_s3_staging_directory_enabled": "true", "s3StagingStorageClass": "staging-storage", "s3StagingStorageSize": "2Gi" } }'
To enable a local staging directory, set hive_s3_staging_directory_enabled to true.
The default values for s3StagingStorageClass and s3StagingStorageSize are as shown in the
preceding configuration.
Ensure that the storageClassName of the PVs matches the value of s3StagingStorageClass.
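A mismatch between the PV storage class and the engine setting is easy to miss, so the two values can be compared after patching. The sketch below assumes that oc is logged in to the right project and that the custom resource exposes the field exactly as it is set in the patch above. The function name check_staging_class is hypothetical.

```shell
#!/bin/sh
# Compare spec.storageClassName of a PV with spec.s3StagingStorageClass of an engine.
# Usage: check_staging_class <pv-name> <wxdengine-name>
check_staging_class() {
    pv_class="$(oc get pv "$1" -o jsonpath='{.spec.storageClassName}')" || return 2
    engine_class="$(oc get wxdengine "$2" -o jsonpath='{.spec.s3StagingStorageClass}')" || return 2
    if [ "$pv_class" = "$engine_class" ]; then
        echo "OK: both use storage class '${pv_class}'"
    else
        echo "MISMATCH: PV uses '${pv_class}', engine uses '${engine_class}'"
        return 1
    fi
}

# Example, using the sample names from this procedure:
# check_staging_class presto-staging-storage-pv1 lakehouse-presto-01
```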
- Remove the local staging directory.
- Determine the Presto (Java) instance from which you want to remove the local staging directory.
oc get wxdengine -o custom-columns='DISPLAY NAME:spec.engineDisplayName,ENGINE ID:metadata.labels.engineName'
Example:
oc get wxdengine -o custom-columns='DISPLAY NAME:spec.engineDisplayName,ENGINE ID:metadata.labels.engineName'
DISPLAY NAME    ENGINE ID
Presto (Java)   presto750
presto-01       presto-01
- Remove the local staging directory configurations.
oc patch wxdengine/<engine_name> --type='json' -p='[{"op": "remove", "path": "/spec/hive_s3_staging_directory_enabled"}, {"op": "remove", "path": "/spec/s3StagingStorageSize"}, {"op": "remove", "path": "/spec/s3StagingStorageClass"}]'
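A JSON Patch remove operation fails when its target path does not exist, so the command above reports an error if one of the three properties was never set. One way around this, sketched here under the assumption that unset properties come back as empty strings from a jsonpath query, is to build the patch only from the properties that are present. The function name remove_staging_config is hypothetical.

```shell
#!/bin/sh
# Remove only those staging properties that are currently set on the engine.
# Usage: remove_staging_config <wxdengine-name>
remove_staging_config() {
    engine="$1"
    ops=""
    for key in hive_s3_staging_directory_enabled s3StagingStorageSize s3StagingStorageClass; do
        value="$(oc get wxdengine "$engine" -o jsonpath="{.spec.${key}}")" || return 2
        if [ -n "$value" ]; then
            # Append one remove operation, comma-separated after the first.
            ops="${ops:+${ops}, }{\"op\": \"remove\", \"path\": \"/spec/${key}\"}"
        fi
    done
    if [ -z "$ops" ]; then
        echo "Nothing to remove on ${engine}"
        return 0
    fi
    oc patch "wxdengine/${engine}" --type=json -p "[${ops}]"
}

# Example:
# remove_staging_config <engine_name>
```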