Configuring ephemeral storage for runtime definitions
Ephemeral storage is node-local temporary storage (the container writable layer and emptyDir volumes) that runtime pods use for scratch data, unpacked models, staging, and logs. Insufficient ephemeral storage can cause pod evictions or job failures. Platform administrators can centrally configure runtime-specific ephemeral storage requests and limits for more predictable scheduling and safer capacity planning.
Before you begin
- You must have instance admin access to the IBM® Software Hub cluster.
- You must have access to the WmlBase CR namespace (typically, the watsonx.ai™ project namespace).
- Verify that the watsonx.ai operator is installed and healthy.
About this task
You configure ephemeral storage for each runtime type in the spec.runtimeresources field of the WmlBase custom resource (CR). Each runtime key contains an ephemeral entry with request and limit values.
- Request: The amount of ephemeral storage that is guaranteed for a pod. It is used for scheduling decisions.
- Limit: The maximum ephemeral storage that the pod can use. Exceeding the limit can trigger termination or eviction.
The values must be in Kubernetes quantity formats (for example, 8Gi, 12Gi, 10240Mi).
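As a rough pre-check before editing the CR, a value can be tested against a simplified pattern. The sketch below is an illustration only: it accepts whole numbers with an optional binary suffix (Ki, Mi, Gi, Ti, Pi, Ei) and deliberately rejects everything else, even though the full Kubernetes quantity grammar also allows decimal suffixes and exponents. The function name is_binary_quantity is hypothetical.

```shell
# Hypothetical helper: succeeds (exit status 0) if the argument is a whole
# number with an optional binary suffix (Ki, Mi, Gi, Ti, Pi, Ei).
# This covers only a simplified subset of the Kubernetes quantity grammar.
is_binary_quantity() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+([KMGTPE]i)?$'
}

# Try the sample values from the text plus one malformed value.
for value in 8Gi 12Gi 10240Mi 8GB; do
  if is_binary_quantity "$value"; then
    echo "$value: accepted"
  else
    echo "$value: rejected"
  fi
done
```

The last value is rejected because GB is not a Kubernetes quantity suffix.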
Setting appropriate ephemeral storage requests and limits:
- Helps prevent runtime pods from being evicted under disk pressure
- Improves utilization by right-sizing storage per workload
You can control:
- Per-runtime requests and limits of ephemeral storage
- Cluster-wide behavior through a single CR update
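The shape of a change to a single runtime key can be sketched as a JSON merge patch against spec.runtimeresources. This is an illustration under stated assumptions: the ephemeral, request, and limit nesting is taken from the example CR in this topic, the function name build_ephemeral_patch is hypothetical, and the values are emitted as quoted strings on the assumption that quantity-style values such as 8Gi are used.

```shell
# Hypothetical helper: print a JSON merge patch that sets the ephemeral
# storage request and limit for one runtime key under spec.runtimeresources.
# Usage: build_ephemeral_patch <runtime_key> <request> <limit>
# Assumption: values are written as quoted strings (for example "8Gi").
build_ephemeral_patch() {
  printf '{"spec":{"runtimeresources":{"%s":{"ephemeral":{"request":"%s","limit":"%s"}}}}}\n' \
    "$1" "$2" "$3"
}

# Print the patch for the runtime_251 key with sample values.
build_ephemeral_patch runtime_251 8Gi 12Gi
```

Because a merge patch only touches the keys that it names, the other runtime entries in the CR would be left unchanged when such a patch is applied.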
Runtime resource types and their respective runtime definition JSON files
autoai:
- auto_ai.kb-server.json
- auto_ai.ts-server.json
autorag:
- auto_rag-server.json
autoai_runtime:
- autoai-ts_rt25.1-py3.12-server.json
- autoai-kb_rt25.1-py3.12-server.json
genai:
- genai-A25-py3.12-server.json
pmml:
- pmml-3.0_4.3-multi-server.json
runtime_251:
- runtime-25.1-py3.12-server.json
- runtime-25.1-py3.12-cuda-server.json
- runtime-25.1-py3.12-multi-server.json
- runtime-25.1-r4.4-server.json
spark_mllib:
- spark-mllib_3.4-multi-server.json
- spark-mllib_3.5-multi-server.json
spss_modeler:
- spss-modeler_online-server.json
- spss-modeler_batch-server.json
tensorflow_rt251:
- tensorflow_rt25.1-py3.12-server.json
- tensorflow_rt25.1-py3.12-dist-server.json
- tensorflow_rt25.1-py3.12-edt-server.json
training_job:
- training-job-server.json
- training-job-go-server.json
- training-job-with-restart-server.json
wml_hpo_job:
- wml-hpo-job-server.json
wml_rshiny:
- wml-rshiny-server.json
- wml-rshiny-rstudio-25.1-r4.4-server.json
Procedure
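A minimal sketch of one way to back up the CR and apply a change with the oc CLI. It assumes that the CR is named wml-cr as in the example in this topic, that the resource can be addressed as wmlbase, and that the backup is written to the wmlbase-backup.yaml file that the rollback instructions restore from. The function names and the PROJECT_NS variable are hypothetical, and the default namespace is only the one shown in the example.

```shell
# Assumption: namespace that contains the WmlBase CR; replace with your own.
PROJECT_NS="${PROJECT_NS:-ins-540}"

# Save the current CR so that it can later be restored with `oc apply -f`.
backup_wmlbase() {
  oc get wmlbase wml-cr -n "$PROJECT_NS" -o yaml > wmlbase-backup.yaml
}

# Apply a JSON merge patch (passed as the first argument) to the CR.
patch_wmlbase() {
  oc patch wmlbase wml-cr -n "$PROJECT_NS" --type merge -p "$1"
}

# Example invocation (not run here; requires a logged-in oc session):
#   backup_wmlbase && patch_wmlbase \
#     '{"spec":{"runtimeresources":{"genai":{"ephemeral":{"request":"8Gi","limit":"12Gi"}}}}}'
```

Taking the backup first and patching only if it succeeds keeps a known-good copy available for the rollback steps.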
Rolling back the changes
If you need to revert the ephemeral storage configuration changes:
- Run this command:
  oc apply -f wmlbase-backup.yaml
- Confirm that the operator reconciles and the pod restarts successfully.
- Verify that the previous request and limit values are restored in the pods.
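One way to inspect the values that the pods actually carry is to print the ephemeral-storage requests and limits from each pod specification. The following is a sketch under assumptions: the function name show_ephemeral_storage is hypothetical, and it lists every pod in the given namespace because no runtime-specific pod label is stated here.

```shell
# Hypothetical helper: for each pod in a namespace, print the pod name,
# then the ephemeral-storage requests and limits of its containers,
# separated by tabs.
# Usage: show_ephemeral_storage <namespace>
show_ephemeral_storage() {
  oc get pods -n "$1" -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources.requests.ephemeral-storage}{"\t"}{.spec.containers[*].resources.limits.ephemeral-storage}{"\n"}{end}'
}

# Example invocation (not run here; requires a logged-in oc session):
#   show_ephemeral_storage ins-540
```

Pods whose containers set no ephemeral-storage values show empty columns.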
Example
Example: contents of a WmlBase CR YAML file after patching
apiVersion: wml.cpd.ibm.com/v1beta1
kind: WmlBase
metadata:
  annotations:
    meta.helm.sh/release-name: wml
    meta.helm.sh/release-namespace: ins-540
  creationTimestamp: "2026-04-15T10:08:29Z"
  finalizers:
  - wml.cpd.ibm.com/finalizer
  generation: 3
  labels:
    app.kubernetes.io/managed-by: Helm
    component-id: wml
  name: wml-cr
  namespace: ins-540
  resourceVersion: "43732421"
  uid: 769d0179-6a34-4eef-aa76-585ce5c3e10f
spec:
  blockStorageClass: managed-nfs-storage
  docker_registry_namespace_cpd: cp/cpd
  docker_registry_prefix: cp.stg.icr.io
  fileStorageClass: managed-nfs-storage
  ignoreForMaintenance: false
  imagePullSecret: ibm-entitlement-key
  runtimeresources:
    autoai:
      ephemeral:
        request: 50
        limit: 550
    autorag:
      ephemeral:
        request: 50
        limit: 1000
    autoai_runtime:
      ephemeral:
        request: 100
        limit: 2000
    genai:
      ephemeral:
        request: 100
        limit: 2000
    pmml:
      ephemeral:
        request: 100
        limit: 2000
    pytorch_onnx:
      ephemeral:
        request: 100
        limit: 2000
    runtime_251:
      ephemeral:
        request: 100
        limit: 2000
    spark_mllib:
      ephemeral:
        request: 100
        limit: 2000
    spss_modeler:
      ephemeral:
        request: 100
        limit: 2000
    tensorflow_rt251:
      ephemeral:
        request: 100
        limit: 2000
    training_job:
      ephemeral:
        request: 50
        limit: 550
    wml_hpo_job:
      ephemeral:
        request: 50
        limit: 550
    wml_rshiny:
      ephemeral:
        request: 100
        limit: 2000
  license:
    accept: true
    license: Enterprise
  non_olm_deploy: true
  version: 5.4.0
status:
  buildNumber: 67
  conditions:
  - ansibleResult:
      changed: 35
      completion: "2026-04-24T08:23:42.49423+00:00"
      failures: 0
      ok: 452
      skipped: 210
    lastTransitionTime: "2026-04-15T10:08:36Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
What to do next
After you configure ephemeral storage, monitor your runtime pods to confirm that they are not being evicted because of disk pressure. To verify the configuration, check the pod specifications and confirm that the ephemeral storage requests and limits match your settings.
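As one possible way to monitor for evictions, the pods whose status reason is Evicted can be listed with a JSONPath filter. This is a sketch only: the function name list_evicted_pods is hypothetical, and it checks all pods in the given namespace instead of only runtime pods.

```shell
# Hypothetical helper: print the names of pods in a namespace whose
# status reason is "Evicted", one name per line.
# Usage: list_evicted_pods <namespace>
list_evicted_pods() {
  oc get pods -n "$1" -o jsonpath='{range .items[?(@.status.reason=="Evicted")]}{.metadata.name}{"\n"}{end}'
}

# Example invocation (not run here; requires a logged-in oc session):
#   list_evicted_pods ins-540
```

For any pod that is listed, oc describe pod shows the eviction message, which typically names the resource that was under pressure.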