IBM Support

OOM Killer

Troubleshooting


Problem

What is an OOM Killer?

When the system is running out of memory, the kernel selects a process to kill and does so by sending a signal (either a SIGTERM followed by a SIGKILL, or a SIGKILL directly). SIGKILL is the hard OOM Killer signal addressed in this article. Note that in the case of the OOM Killer (as opposed to a soft OOM, when the JVM itself runs out of memory) no heap dump is generated.

 

How to identify that an OOM Killer took place?

If the DSE process stops running without logging any clues in the system.log/debug.log files as to why, check the system error messages (generally in /var/log/messages*) for any signs of the OOM Killer. The lines in question look like this:

Day 01 01:23:45 hostname kernel: Out of memory: Kill process 1234 (java) score 789 or sacrifice child
Day 01 01:23:45 hostname kernel: Killed process 1234, UID 567, (java) total-vm:30122500kB, anon-rss:28457764kB, file-rss:171432kB


A score of 789 means the process was using roughly 78.9% of the memory when it was killed.
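The kill message reports memory in kB. As a small sketch, the anon-rss figure (the process's resident, non-file-backed memory) can be converted to gigabytes with awk; the log line below is the hypothetical example from above:

```shell
#!/bin/sh
# Hypothetical OOM Killer log line (taken from the example above)
line='Day 01 01:23:45 hostname kernel: Killed process 1234, UID 567, (java) total-vm:30122500kB, anon-rss:28457764kB, file-rss:171432kB'

# Extract the anon-rss value (in kB) and convert it to GB
echo "$line" | awk 'match($0, /anon-rss:[0-9]+kB/) {
    kb = substr($0, RSTART + 9, RLENGTH - 11)   # strip "anon-rss:" and "kB"
    printf "anon-rss: %.1f GB\n", kb / 1024 / 1024
}'
```

Here the process held roughly 27 GB of resident memory when it was killed, consistent with the score in the first line.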


The most common command to list the latest OOM Killers is:

grep -i 'killed process' /var/log/messages

 

How to configure the OOM Killer?

The database nodes can be configured so the system exhibits certain behaviors when the OOM Killer fires.


Automatically rebooting the system after an OOM Killer event may be preferable to waiting for an administrator's intervention. A reboot restores the database, provided it is configured to start automatically after a reboot. Configuring alerts so that an administrator is automatically notified when a DSE node goes down is strongly recommended. Finding the root cause of the OOM Killer event is important, as it can happen again.


The following settings cause the system to panic and reboot in an out-of-memory condition. The sysctl commands apply the settings in real time, and appending them to /etc/sysctl.conf makes them survive reboots. The X for kernel.panic is the number of seconds to wait before the system reboots; adjust this setting to meet the needs of your environment.

sysctl vm.panic_on_oom=1
sysctl kernel.panic=X
echo "vm.panic_on_oom=1" >> /etc/sysctl.conf
echo "kernel.panic=X" >> /etc/sysctl.conf


One can also tune the system so that a certain process is more or less likely to be killed by the OOM Killer. If multiple processes are running on the system (for example DSE and OpsCenter), you may want one of them to be killed before the other.


To make a process less likely to be killed first, run:

echo -15 > /proc/<pid>/oom_adj


To make a process more likely to be killed first, run:

echo 10 > /proc/<pid>/oom_adj


Replace <pid> with the ID of the process you want to affect.
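The two echo commands above can be combined into a small sketch. Note that `oom_adj` (range -17 to 15) is deprecated on newer kernels in favour of `oom_score_adj` (range -1000 to 1000); the values (-500/-15) and the choice of the `java` process name below are illustrative assumptions:

```shell
#!/bin/sh
# Sketch: make every java process less likely to be picked by the OOM Killer.
# The -500 / -15 values are illustrative; tune them for your environment.
for pid in $(pgrep java); do
    if [ -e "/proc/$pid/oom_score_adj" ]; then
        echo -500 > "/proc/$pid/oom_score_adj" 2>/dev/null || true  # newer interface, -1000..1000
    else
        echo -15 > "/proc/$pid/oom_adj" 2>/dev/null || true         # legacy interface, -17..15
    fi
done
```

Lowering a process's score requires root; raising it does not.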

 

Common causes of OOM Killer are:

1. The heap size is too large (https://docs.datastax.com/en/dse/6.8/docs/managing/operations/tune-jvm…)


2. The node doesn't have enough memory (https://docs.datastax.com/en/dseplanning/docs/dse-capacity-planning.htm…)


3. Improper kernel settings (https://docs.datastax.com/en/landing_page/doc/landing_page/recommendedS…), in particular the swap and Java huge pages settings.


4. Memory fragmentation, generally caused by either a Java bug or a product misconfiguration (such as using the native memory allocator, malloc, instead of jemalloc; jemalloc results in less memory fragmentation than malloc).


5. Other processes running on the machine are using more memory.


6. An operating system memory leak bug. Known examples on supported DSE platforms:

https://bugs.centos.org/view.php?id=14303
https://bugzilla.redhat.com/show_bug.cgi?id=692251


7. Improper manual setting of the MaxDirectMemory (https://docs.datastax.com/en/dse/6.8/docs/managing/operations/change-he…)

8. A possible product issue (for existing product bugs, check the release notes).


 

Workload specific causes

DSE allocates several memory objects off-heap: the file cache (chunk cache), the Bloom filters and compression offset maps, and the row cache.


Analytics/Spark workload:
The JVM running DSE will only allocate memory up to the specified heap size, and Spark will use 70% of what is left for its own processing.

As a general statement, for OOM errors on a Spark executor, things that sometimes help are increasing job parallelism, increasing spark.storage.memoryFraction, or simply adding RAM.
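As an illustration only (the values below are placeholders, not recommendations, and spark.storage.memoryFraction applies to Spark's legacy, pre-1.6 memory manager), such tuning would go into spark-defaults.conf or be passed per job with --conf:

```
spark.default.parallelism      200
spark.storage.memoryFraction   0.7
spark.executor.memory          8g
```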



Search/Solr workload:
DSE Search makes its own off-heap allocations to maintain the indexes. For instance, the filter cache and the bitsets are stored off-heap.

Check for hard commits on the Solr cores: `grep "Executing hard commit on index" system.log`. If they are present, it is worth looking into the filter cache statistics: the filter cache is usually off-heap by default, so find out what its usage is. One can record filter cache stats as outlined here:

https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/mgmt…

One can get the filter cache sizes via JMX:

nodetool sjk mx -b "solr/wiki.solr:type=dseFilterCache,id=com.datastax.bdp.search.solr.FilterCacheMBean" -f usedBytes -mg
solr/wiki.solr:type=dseFilterCache,id=com.datastax.bdp.search.solr.FilterCacheMBean
0


Replace `wiki.solr` with the name of your Solr core.

 

Troubleshooting an OOM Killer

Follow these common troubleshooting steps for OOM Killers:


1. Check `nodetool tablestats` for off-heap usage, to find out whether a particular table uses a large amount of memory.


2. Check `nodetool info` for the amount of off-heap memory allocated on the node; several gigabytes of off-heap objects is not unusual. For example:

Heap Memory (MB) : 15938.48 / 31744.00
Off Heap Memory (MB) : 14337.81


3. Check the messages logs for any pauses or memory-pressure events (controlled by /sys/fs/cgroup/memory/.../memory.pressure_level; see https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt) prior to the OOM Killer.

4. Check the score of the DSE process. For example:

Day 01 01:23:45 hostname kernel: Out of memory: Kill process 1234 (java) score 789 or sacrifice child
Day 01 01:23:45 hostname kernel: Killed process 1234, UID 567, (java) total-vm:30122500kB, anon-rss:28457764kB, file-rss:171432kB


A score of 789 means the process was using roughly 78.9% of the RAM when it was killed. The higher the score, the more likely the issue is related to capacity planning; the lower the score, the more likely it is caused by memory fragmentation or a sub-optimal kernel setting. (For more details, see the common causes of the OOM Killer above.)

5. Check whether large partitions were being compacted prior to the OOM Killer, as they have to be loaded into memory, which can push up JVM heap usage.


6. Check the system.log for large GC pauses (over 500ms) prior to the OOM Killer; such pauses can indicate an overloaded system and require further troubleshooting. For example:

2020-01-00 01:23:34,001 GCInspector.java:282 - G1 Old Generation GC in 25206ms. G1 Old Gen: 4955501376 -> 43558558312; 
2020-01-00 01:23:46,001 GCInspector.java:282 - G1 Old Generation GC in 1511ms. G1 Old Gen: 48438337984 -> 47990282352; 
2020-01-00 01:23:57,001 GCInspector.java:282 - G1 Old Generation GC in 7980ms. G1 Old Gen: 48447153264 -> 48138959480;
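A small sketch for filtering such GCInspector lines down to just the long pauses; the helper name and 500 ms threshold are assumptions, and it reads log lines from stdin:

```shell
#!/bin/sh
# Sketch: print the duration (in ms) of GC pauses longer than 500 ms,
# reading GCInspector log lines from stdin, e.g.: gc_pauses < system.log
gc_pauses() {
    awk 'match($0, /GC in [0-9]+ms/) {
        ms = substr($0, RSTART + 6, RLENGTH - 8)   # strip "GC in " and "ms"
        if (ms + 0 > 500) print ms
    }'
}
```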

7. Check whether process memory usage is growing over time by monitoring `free -m` and/or individual process sizes over a 24/48 hour period (for example with a command like `ps -C sssd_nss -o size,pid,user,args`).
You can include the DSE process too:

ps -C java -o size,pid,user,args

The kernel also dumps a per-process memory table along with the OOM Killer message; if that dump is saved to a file (here named `oom`), it can be summarized as follows (rss is reported in 4 kB pages):

cat oom | cut -d\] -f2,3 | sort -k4 -nr | awk '{rrss=$4*4096; total+=rrss; print $0, "Real RSS = " rrss} END {print "TOTAL: " total}'
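The periodic `free -m`/`ps` sampling described in this step can be wrapped in a small script; the function name, file paths, and defaults below are illustrative assumptions:

```shell
#!/bin/sh
# Sketch: append system and DSE process memory snapshots to a file
# at a fixed interval. Arguments: samples, interval (seconds), output file.
mem_watch() {
    samples=$1; interval=$2; out=$3
    i=0
    while [ "$i" -lt "$samples" ]; do
        date >> "$out"
        free -m >> "$out" 2>/dev/null || true                           # system-wide view
        ps -C java -o size,pid,user,args >> "$out" 2>/dev/null || true  # DSE process
        sleep "$interval"
        i=$((i + 1))
    done
}

# Example: one sample per hour for 48 hours, run in the background:
#   mem_watch 48 3600 /tmp/mem-watch.out &
```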

 

8. You can monitor total memory usage on the node for all processes (ps, top, or dstat) and the DSE process over time. If you identify that the DSE process is growing, generate a heap dump (https://support.datastax.com/s/article/Generating-and-Analyzing-Heap-Du…) for further investigation. For example, you can add the following command to cron on all nodes to run every fifteen minutes:

date >> /tmp/ps-dse-NODE-IP_ADDRESS.out; ps -eo pid,cmd,%mem,%cpu --sort=-%mem | head >> /tmp/ps-dse-NODE-IP_ADDRESS.out


The output will look something like this:

Thu Jan 01 00:00:00 EST 2020
 PID CMD %MEM %CPU
18416 /usr/lib/chromium-browser/c 5.9 2.3
 6781 /usr/lib/firefox/firefox -c 5.6 9.6
32759 /usr/local/idea/jre64/bin/j 5.2 4.3
 7235 /usr/lib/firefox/firefox -c 4.7 36.7
29528 /usr/lib/slack/slack --type 1.9 8.3


Each run adds only ~470 bytes, or about 45 kB per day. When you set up the cron job, make sure you use >> to redirect the output of each command to the file, instead of >. A single > would overwrite the file each time, while >> appends to the existing file, which is what we want so we can capture a historical record.
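Assuming a 15-minute schedule, the corresponding crontab entry would look like this (the output file name, as above, is a placeholder to replace per node):

```
*/15 * * * * date >> /tmp/ps-dse-NODE-IP_ADDRESS.out; ps -eo pid,cmd,%mem,%cpu --sort=-%mem | head >> /tmp/ps-dse-NODE-IP_ADDRESS.out
```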
Additionally, one can use OpsCenter to monitor the Memory Usage bar graphs, which indicate System, Heap, and In-Memory usage (https://docs.datastax.com/en/opscenter/6.7/opsc/online_help/opscViewInM…).


9. A number of other tools are available for monitoring memory and system performance when investigating issues of this nature. Tools such as sar (System Activity Reporter) and dtrace (dynamic tracing) are quite useful for collecting specific data about system performance over time. For even more visibility, dtrace probes can even be triggered on OOM conditions, firing when the kernel kills a process due to an OOM condition.


10. If you use the Oracle JDK, you can set up Oracle Java Mission Control's Flight Recorder to record JVM activity while the problem is occurring. The resulting recording (.jfr file) can then be analyzed. Details about Flight Recorder can be found here: http://www.oracle.com/technetwork/java/javaseproducts/mission-control/j…

You will need a JDK installed on the machine running Mission Control (this does not need to be an Apache Cassandra node; it could be your workstation).

The website has links detailing how to run it, but a summary of the steps from an Apache Cassandra perspective is:

As well as enabling JMX, you need to alter the `/jre_install_location/lib/management/jmxremote.access` file to enable Flight Recorder to run:

1 - Add the following to the previously configured cassandra user so it looks like this:
 cassandra readwrite \
 create com.sun.management.*,com.oracle.jrockit.* \
 unregister
2 - Add the following lines into the cassandra-env.sh
# Enable flight recorder
JVM_OPTS="$JVM_OPTS -XX:+UnlockCommercialFeatures -XX:+FlightRecorder"
3 - restart DSE
4 - Open Java Mission Control; in a terminal, type
<java home>/jmc
5 - In the left-hand pane, right-click and choose "Add new connection"
6 - Enter the IP, port, and the same user/password you configured for JMX, then click Finish
7 - Once the new connection shows up in the left-hand pane, you can click it to start JMX monitoring or a flight recording


As an FYI, Oracle cites the performance overhead at about 1%.


Last Reviewed Date: 2023/11/14

Document Location

Worldwide


Historical Number

ka06R000000Hc9BQAS

Document Information

Modified date:
30 January 2026

UID

ibm17258877