Monitoring
Prometheus pod crashes - OOMKilled
Symptoms
The Prometheus pod keeps crashing. When you run the following command to get the pod details, the output shows that the Prometheus container terminated with the reason OOMKilled.
kubectl describe po prometheus-monitoring-prometheus-0 -n kube-system
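To read the termination reason directly instead of scanning the full describe output, a jsonpath query along the following lines might help. This is a minimal sketch: the function name last_termination_reason is hypothetical, and the defaults assume the pod name and namespace from the command above.

```shell
# Hypothetical helper (not part of any product tooling): print each
# container name together with the reason its previous instance
# terminated, for example OOMKilled.
# Arguments (both optional): pod name, namespace.
last_termination_reason() {
  kubectl get pod "${1:-prometheus-monitoring-prometheus-0}" \
    -n "${2:-kube-system}" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}
```

Calling last_termination_reason with no arguments queries the default pod. The query reads lastState, which describes the previous container instance; while a container is itself in a terminated state and has not yet restarted, the reason is reported under state.terminated instead.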
Causes
Prometheus is handling a high workload. The volume of metrics data requires more memory than is currently available to the Prometheus container, so the container is terminated when it runs out of memory.
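To check how close the container runs to its limit, the configured memory limit can be compared with current usage. The following is a sketch under stated assumptions: both function names are hypothetical, the defaults reuse the pod name and namespace from the Symptoms section, and kubectl top works only if a metrics API (for example, metrics-server) is available in the cluster and the pod stays up long enough to be sampled.

```shell
# Hypothetical helper: print each container name and its configured
# memory limit (empty if no limit is set).
memory_limits() {
  kubectl get pod "${1:-prometheus-monitoring-prometheus-0}" \
    -n "${2:-kube-system}" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.limits.memory}{"\n"}{end}'
}

# Hypothetical helper: print current CPU and memory usage per container.
# Assumption: a metrics API such as metrics-server is installed.
memory_usage() {
  kubectl top pod "${1:-prometheus-monitoring-prometheus-0}" \
    -n "${2:-kube-system}" --containers
}
```

If the usage reported by memory_usage climbs toward the value reported by memory_limits shortly before each restart, that is consistent with the cause described above.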
Resolving the problem
The following options can resolve this problem.
- You can reduce the existing Prometheus workload. Reduce the scrape frequency by increasing the value of scrape_interval in the Prometheus configuration.
- You can increase the memory limits on the Prometheus container.
- If your Prometheus container continues to crash and no error messages appear in the logs, too much data might remain in the /prometheus/prometheus-db/data/wal path inside your Prometheus container. To work around this problem, delete the /wal folder from the volume.
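Note: Deleting the /wal folder can lead to data loss, because the write-ahead log (WAL) holds recent samples that are not yet written to persistent blocks.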
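The first two options can be sketched with kubectl as follows. Treat this as an illustration, not a definitive procedure. The function names are hypothetical. The StatefulSet name prometheus-monitoring-prometheus is only inferred from the pod name, and the container name prometheus is an assumption; verify both in your cluster. If a Helm chart or an operator manages the workload, a direct change might be reverted, and the limit should be changed in that tool's configuration instead.

```shell
# Option 1 (sketch): locate where scrape_interval is set by searching
# every ConfigMap in the namespace. Prints matching lines with line numbers.
find_scrape_interval() {
  kubectl get configmap -n "${1:-kube-system}" -o yaml | grep -n 'scrape_interval'
}

# Option 2 (sketch): raise the memory limit of the Prometheus container.
# Arguments: new limit (required), StatefulSet name, namespace.
# Assumptions: the StatefulSet name is inferred from the pod name and
# the container is named "prometheus".
raise_memory_limit() {
  if [ -z "$1" ]; then
    echo "usage: raise_memory_limit <limit> [statefulset] [namespace]" >&2
    return 1
  fi
  kubectl set resources statefulset "${2:-prometheus-monitoring-prometheus}" \
    -n "${3:-kube-system}" -c prometheus --limits="memory=$1"
}
```

For example, raise_memory_limit 4Gi requests a 4Gi limit; the value is illustrative and depends on the memory available on your nodes. For the first option, a larger scrape_interval means fewer samples: changing an interval from 15s to 1m ingests roughly a quarter as many samples for the affected targets.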