Configuring ephemeral storage for runtime definitions

Ephemeral storage is node-local temporary storage (the container writable layer and emptyDir volumes) that runtime pods use for scratch data, unpacked models, staging, and logs. Insufficient ephemeral storage can cause pod evictions or job failures. Platform admins can centrally configure runtime-specific ephemeral storage requests and limits, which makes scheduling more predictable and capacity planning safer.

Before you begin

You must have instance admin access to the IBM® Software Hub cluster.
You must have access to the WmlBase CR namespace (typically, the watsonx.ai™ project namespace).
The watsonx.ai operator must be installed and healthy.

About this task

You can configure ephemeral storage for each runtime type in the WmlBase CR, in the spec.runtimeresources field. Each runtime key has request and limit values for ephemeral storage.

Request: The amount of ephemeral storage that is guaranteed for a pod. It is used for scheduling decisions.
Limit: The maximum ephemeral storage that the pod can use. Exceeding the limit can trigger termination or eviction.

The values must be in Kubernetes quantity formats (for example, 8Gi, 12Gi, 10240Mi).
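A quick pre-check can catch malformed values before you edit the CR. The following sketch is a simplified pattern check, not the full Kubernetes quantity grammar: it accepts a plain or decimal number with an optional binary (Ki, Mi, Gi, and so on) or decimal (k, M, G, and so on) suffix. The function name is_k8s_quantity is hypothetical and is not part of any product tooling.

```shell
# Hypothetical helper: returns 0 if the argument looks like a storage quantity.
# Simplified check; exponent notation (for example, 1e3) is not covered.
is_k8s_quantity() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+(\.[0-9]+)?(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E)?$'
}

# Check a few candidate values; 8GB is rejected because "GB" is not a valid suffix.
for value in 8Gi 10240Mi 8GB; do
  if is_k8s_quantity "$value"; then
    echo "$value: ok"
  else
    echo "$value: not a recognized quantity"
  fi
done
```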

Setting appropriate ephemeral storage requests and limits:

Helps prevent runtime pods from being evicted under disk pressure
Improves utilization by right-sizing storage per workload

You can control:

Per-runtime requests and limits of ephemeral storage
Cluster-wide behavior through a single CR update

Note: Values that are set in the CR are not reverted during an upgrade.

Runtime resource types and their respective runtime definition JSON files

autoai:

auto_ai.kb-server.json
auto_ai.ts-server.json

autorag:

auto_rag-server.json

autoai_runtime:

autoai-ts_rt25.1-py3.12-server.json
autoai-kb_rt25.1-py3.12-server.json

genai:

genai-A25-py3.12-server.json

pmml:

pmml-3.0_4.3-multi-server.json

runtime_251:

runtime-25.1-py3.12-server.json
runtime-25.1-py3.12-cuda-server.json
runtime-25.1-py3.12-multi-server.json
runtime-25.1-r4.4-server.json

spark_mllib:

spark-mllib_3.4-multi-server.json
spark-mllib_3.5-multi-server.json

spss_modeler:

spss-modeler_online-server.json
spss-modeler_batch-server.json

tensorflow_rt251:

tensorflow_rt25.1-py3.12-server.json
tensorflow_rt25.1-py3.12-dist-server.json
tensorflow_rt25.1-py3.12-edt-server.json

training_job:

training-job-server.json
training-job-go-server.json
training-job-with-restart-server.json

wml_hpo_job:

wml-hpo-job-server.json

wml_rshiny:

wml-rshiny-server.json
wml-rshiny-rstudio-25.1-r4.4-server.json

Procedure

Back up the current CR:

```
oc get wmlbase -n <your-namespace> -o yaml > wmlbase-backup.yaml
```

Edit the WmlBase CR:
```
oc edit wmlbase -n <your-namespace>
```
Add the spec.runtimeresources.<runtime>.ephemeral block, or update the existing block, with the desired request and limit values.
Apply the changes and wait for the operator to reconcile the resource.
Observe the rollout and confirm that no node-level DiskPressure events appear while workloads run.
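As an alternative to interactive editing, the edit and check steps might be scripted roughly as follows. This is a sketch under assumptions: the CR name wml-cr comes from the example CR in this topic and might differ in your environment, NAMESPACE is a placeholder variable, the genai values in the sample call are illustrative, and the helper names ephemeral_patch and apply_and_check are hypothetical. Confirm the value format that your release accepts before you apply a patch.

```shell
# Placeholders: set these for your environment.
NAMESPACE="${NAMESPACE:-your-namespace}"
CR_NAME="${CR_NAME:-wml-cr}"   # name used in the example CR; yours may differ

# Hypothetical helper: prints a JSON merge patch for one runtime key.
# $1 = runtime key, $2 = request, $3 = limit
ephemeral_patch() {
  printf '{"spec":{"runtimeresources":{"%s":{"ephemeral":{"request":"%s","limit":"%s"}}}}}\n' "$1" "$2" "$3"
}

# Hypothetical wrapper: applies the patch, then prints the DiskPressure
# condition status for every node (False means no disk pressure).
apply_and_check() {
  oc patch wmlbase "$CR_NAME" -n "$NAMESPACE" --type merge -p "$(ephemeral_patch "$@")"
  oc get nodes -o custom-columns='NODE:.metadata.name,DISK_PRESSURE:.status.conditions[?(@.type=="DiskPressure")].status'
}

# Sample call (requires a logged-in oc session):
# apply_and_check genai 8Gi 12Gi
```

A merge patch changes only the keys that it names, so other runtime entries in spec.runtimeresources are left as they are.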

Rolling back the changes

If you need to revert the ephemeral storage configuration changes:

Run this command:
```
oc apply -f wmlbase-backup.yaml
```
Confirm that the operator reconciles the resource and that the pods restart successfully.
Verify that the pods use the previous request and limit values.

Example

Contents of an example WmlBase YAML file after patching:

apiVersion: wml.cpd.ibm.com/v1beta1
kind: WmlBase
metadata:
  annotations:
    meta.helm.sh/release-name: wml
    meta.helm.sh/release-namespace: ins-540
  creationTimestamp: "2026-04-15T10:08:29Z"
  finalizers:
  - wml.cpd.ibm.com/finalizer
  generation: 3
  labels:
    app.kubernetes.io/managed-by: Helm
    component-id: wml
  name: wml-cr
  namespace: ins-540
  resourceVersion: "43732421"
  uid: 769d0179-6a34-4eef-aa76-585ce5c3e10f
spec:
  blockStorageClass: managed-nfs-storage
  docker_registry_namespace_cpd: cp/cpd
  docker_registry_prefix: cp.stg.icr.io
  fileStorageClass: managed-nfs-storage
  ignoreForMaintenance: false
  imagePullSecret: ibm-entitlement-key
  runtimeresources:
    autoai:
      ephemeral:
        request: 50
        limit: 550

    autorag:
      ephemeral:
        request: 50
        limit: 1000

    autoai_runtime:
      ephemeral:
        request: 100
        limit: 2000

    genai:
      ephemeral:
        request: 100
        limit: 2000

    pmml:
      ephemeral:
        request: 100
        limit: 2000

    pytorch_onnx:
      ephemeral:
        request: 100
        limit: 2000

    runtime_251:
      ephemeral:
        request: 100
        limit: 2000

    spark_mllib:
      ephemeral:
        request: 100
        limit: 2000

    spss_modeler:
      ephemeral:
        request: 100
        limit: 2000

    tensorflow_rt251:
      ephemeral:
        request: 100
        limit: 2000

    training_job:
      ephemeral:
        request: 50
        limit: 550

    wml_hpo_job:
      ephemeral:
        request: 50
        limit: 550

    wml_rshiny:
      ephemeral:
        request: 100
        limit: 2000
  license:
    accept: true
    license: Enterprise
  non_olm_deploy: true
  version: 5.4.0
status:
  buildNumber: 67
  conditions:
  - ansibleResult:
      changed: 35
      completion: "2026-04-24T08:23:42.49423+00:00"
      failures: 0
      ok: 452
      skipped: 210
    lastTransitionTime: "2026-04-15T10:08:36Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running

What to do next

After you configure ephemeral storage, monitor your runtime pods to ensure that they are not evicted because of disk pressure. To verify the configuration, check the pod specifications and confirm that the ephemeral storage requests and limits match your settings.
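The checks described above could look like the following sketch. It assumes a logged-in oc session and a NAMESPACE placeholder variable that points at the namespace where the runtime pods run; the helper names show_ephemeral_storage and show_evictions are hypothetical.

```shell
# Placeholder: set this for your environment.
NAMESPACE="${NAMESPACE:-your-namespace}"

# Hypothetical helper: lists each pod with the ephemeral-storage request
# and limit of its containers (<none> means that no value is set).
show_ephemeral_storage() {
  oc get pods -n "$NAMESPACE" -o custom-columns='POD:.metadata.name,REQUEST:.spec.containers[*].resources.requests.ephemeral-storage,LIMIT:.spec.containers[*].resources.limits.ephemeral-storage'
}

# Hypothetical helper: lists eviction events that are recorded in the namespace.
show_evictions() {
  oc get events -n "$NAMESPACE" --field-selector reason=Evicted
}

# Sample calls:
# show_ephemeral_storage
# show_evictions
```

An empty result from show_evictions means that no eviction events are currently recorded for the namespace. Events expire after a retention period, so it does not rule out earlier evictions.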