Troubleshooting
Problem
In Tiny Milvus deployments used by IBM watsonx Orchestrate On‑Prem, the Milvus standalone pod can repeatedly enter a CrashLoopBackOff state because the embedded etcd backend reaches its maximum supported database size. Once this condition occurs, Milvus cannot start or operate normally, leading to service disruption.
Symptom
- Milvus standalone pod shows 0/1 Ready and enters CrashLoopBackOff
- Deployment may report ProgressDeadlineExceeded
- Milvus pod logs show: panic: etcdserver: mvcc: database space exceeded
- etcd pod logs show: alarm:NOSPACE and serving /health false due to an alarm
Cause
- The etcd backend quota is fixed at quota-backend-bytes = 2147483648 bytes (~2 GiB)
- This ~2 GB quota is a hard etcd backend limit for Tiny Milvus
- The limit cannot be increased through supported configuration, PVC expansion, or tuning
- As metadata grows, etcd eventually raises a NOSPACE alarm
- Once the alarm is raised, Milvus cannot complete required metadata operations at startup and exits, causing CrashLoopBackOff
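For reference, the quota in question is etcd's standard quota-backend-bytes setting. A sketch of how the 2 GiB value appears at the etcd level, shown for context only (in Tiny Milvus this value is fixed and is not a supported tuning knob):

```shell
# Standard etcd option (context only; not a supported Tiny Milvus tuning knob).
etcd --quota-backend-bytes=2147483648
# Equivalent environment-variable form on the etcd container:
#   ETCD_QUOTA_BACKEND_BYTES=2147483648
```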
Environment
- IBM watsonx Orchestrate / IBM Lakehouse (On‑Prem)
- Tiny Milvus (standalone deployment)
- Red Hat OpenShift
- CPD instance operands namespace
- Affected pods:
- ibm-lh-lakehouse-wo-milvus-standalone
- ibm-lh-lakehouse-wo-milvus-etcd-0
Diagnosing The Problem
Access the etcd pod and inspect the etcd status and alarms:
oc rsh ibm-lh-lakehouse-wo-milvus-etcd-0
ETCDCTL_API=3 etcdctl endpoint status --write-out=table
ETCDCTL_API=3 etcdctl endpoint status --write-out=json
ETCDCTL_API=3 etcdctl alarm list
- etcd database size close to or at the ~2 GB limit
- Presence of alarm:NOSPACE
- Milvus standalone pod failing during startup with database space exceeded
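The JSON form of the status output can be parsed directly to pull out the two fields needed later: the revision (for compaction) and dbSize (for the quota check). A minimal sketch, using a sample payload in the shape etcd v3's etcdctl emits; the values below are illustrative, not from a live system:

```shell
# Sample of the JSON that 'etcdctl endpoint status --write-out=json' emits
# (etcd v3.x shape; values are illustrative).
STATUS_JSON='[{"Endpoint":"127.0.0.1:2379","Status":{"header":{"revision":123456},"dbSize":2013265920}}]'

# Extract the revision (needed for 'etcdctl compact') and the backend size.
REVISION=$(printf '%s' "$STATUS_JSON" | python3 -c 'import sys, json; print(json.load(sys.stdin)[0]["Status"]["header"]["revision"])')
DB_SIZE=$(printf '%s' "$STATUS_JSON" | python3 -c 'import sys, json; print(json.load(sys.stdin)[0]["Status"]["dbSize"])')
echo "revision=$REVISION dbSize=$DB_SIZE bytes"
```

On a live system, pipe the real etcdctl output instead of $STATUS_JSON. This assumes python3 is available in the pod; jq works equally well if present.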
Resolving The Problem
1. Increase ETCD and Standalone Milvus Resources
Ensure sufficient CPU, memory, and ephemeral storage are allocated to support normal operation and background maintenance tasks.
Recommended Resource Settings
Ephemeral Storage: 3 GB
Memory: 3 GB
These allocations ensure enough space and memory for etcd compaction and defragmentation, preventing NOSPACE alarms and write failures.
Apply RSI Patch (Stability Headroom Only) to:
- Standalone Milvus pod
- etcd pod (Milvus dependency)
Note: RSI patch steps are shared in the last section of this technote
2. Verify ETCD Status
Access the ETCD Pod
oc rsh ibm-lh-lakehouse-wo-milvus-etcd-0
Check ETCD Endpoint Status
ETCDCTL_API=3 etcdctl endpoint status --write-out=table
ETCDCTL_API=3 etcdctl endpoint status --write-out=json
Verify:
ETCD state is Running
Note the revision number from the output (required for compaction)
3. Compact ETCD
Compaction removes old revisions and reduces logical database size.
ETCDCTL_API=3 etcdctl --dial-timeout=30s --command-timeout=60s compact <REVISION_NUMBER>
Replace <REVISION_NUMBER> with the value recorded in the previous step.
4. Defragment ETCD
Defragmentation reclaims physical disk space after compaction.
ETCDCTL_API=3 etcdctl --dial-timeout=30s --command-timeout=60s defrag
If the command fails, retry with increased timeout:
ETCDCTL_API=3 etcdctl --dial-timeout=30s --command-timeout=120s defrag
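The "retry once with a longer timeout" pattern used in steps 3 and 4 can be wrapped in a small helper. A sketch only; run_with_retry is an illustrative name, not part of the product tooling:

```shell
# Illustrative helper (not product tooling): run a command with a timeout,
# and on failure retry once with a longer timeout, mirroring steps 3-4.
run_with_retry() {
  t1=$1; t2=$2; shift 2
  if ! "$@" --command-timeout="${t1}s"; then
    echo "first attempt failed, retrying with ${t2}s command timeout" >&2
    "$@" --command-timeout="${t2}s"
  fi
}

# Usage on a live system (inside the etcd pod):
# run_with_retry 60 120 env ETCDCTL_API=3 etcdctl --dial-timeout=30s defrag
```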
5. Verify ETCD Health After Maintenance
ETCDCTL_API=3 etcdctl endpoint health
ETCDCTL_API=3 etcdctl endpoint status --write-out=table
ETCDCTL_API=3 etcdctl endpoint status --write-out=json
6. Disarm ETCD NOSPACE Alarm
If a NOSPACE alarm was triggered earlier, disarm it to restore write operations.
ETCDCTL_API=3 etcdctl alarm disarm
7. Restart Milvus Pods
Restart Milvus to ensure it reconnects cleanly to the recovered etcd state.
oc delete pod ibm-lh-lakehouse-xxxxxxx
Replace xxxxxxx with the actual Milvus pod suffix.
Important:
This procedure restores etcd operability, but it does not increase the fixed ~2 GB Tiny Milvus etcd backend limit. Because that limit cannot be expanded, recurrence can only be prevented through strict ingestion control and, preferably, by moving to external knowledge sources supported by watsonx Orchestrate. Stop uploads when usage approaches ~90% of the limit.
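The ~90% guidance can be checked mechanically against the fixed quota. A minimal sketch, assuming dbSize has been read from etcdctl endpoint status --write-out=json (the sample value below is illustrative):

```shell
# Fixed Tiny Milvus etcd backend quota (quota-backend-bytes).
QUOTA=2147483648
# Illustrative dbSize; on a live system read it from
# 'etcdctl endpoint status --write-out=json'.
DB_SIZE=1932735283

PCT=$(awk -v s="$DB_SIZE" -v q="$QUOTA" 'BEGIN { printf "%.0f", s * 100 / q }')
echo "etcd backend usage: ${PCT}% of quota"
if [ "$PCT" -ge 90 ]; then
  echo "WARNING: usage at or above 90% - stop uploads and run compaction/defragmentation"
fi
```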
Reaching out to Support
The 2 GB limit for the etcd quota-backend-bytes setting is expected to accommodate a large number of collections and vectors.
A script to record the usage metrics is attached. Run this script and capture its output before reaching out to support.
get_milvus_usage_stats.py__3.txt
Run the command below against a wo-conversation-controller pod:
cat get_milvus_usage_stats.py | oc exec -i wo-conversation-controller-xxxxxxxxxx-xxxxx -- python3 - --output-format csv > milvus_stats.csv
Recommendation:
Use External Knowledge Sources. Tiny Milvus is intended for experimentation and early‑stage usage, not sustained production growth. Within its fixed 2 GB limit, compaction and defragmentation are the only supported ways to reclaim space.
Document Location
Worldwide
Product Synonym
Watson Orchestrate
Document Information
Modified date:
15 May 2026
UID
ibm17271869