IBM Support

Watson Studio Local - fluentd process killed by OOM killer

Troubleshooting


Problem

The fluentd process in the pods gets killed by the Linux OOM killer. The problem cannot be resolved by increasing the amount of RAM on the worker nodes, and when the OOM occurs the entire node crashes.

Below is a sample of the dmesg messages that can be found:

[Thu Jul 16 06:30:54 2020] Memory cgroup out of memory: Kill process 34835 (fluentd) score 1000 or sacrifice child
[Thu Jul 16 06:30:54 2020] Killed process 34835 (fluentd) total-vm:103616kB, anon-rss:52992kB, file-rss:10304kB, shmem-rss:0kB
[Thu Jul 16 06:30:54 2020] oom_reaper: reaped process 34835 (fluentd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
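
To confirm that the fluentd container is hitting its cgroup memory limit rather than exhausting node memory, compare its current usage against its configured limit. The commands below are a minimal sketch; they assume the fluentd DaemonSet runs in the sysibm-adm namespace (as shown in the resolution below) and that cluster metrics are available for kubectl top:

# Show the configured memory requests/limits on the fluentd DaemonSet
kubectl -n sysibm-adm describe ds fluentd-es-ds

# Show current memory usage of the fluentd pods (requires metrics to be available)
kubectl -n sysibm-adm top pods | grep fluentd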

Resolving The Problem

Adding more memory to the worker nodes will not fix this. The OOM message indicates that the fluentd container breached its cgroup memory limit, so the best approach is to increase the memory limit on the fluentd DaemonSet:
kubectl edit ds -n sysibm-adm fluentd-es-ds 

And change:

      - name: fluentd-elasticsearch
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          privileged: true
          procMount: Default

to:

        resources:
          limits:
            memory: 400Mi
By default the limit is set to 200Mi. Change it to 400Mi and see if that stops the OOM; if not, change it to 800Mi and try again.
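
As an alternative to editing the DaemonSet interactively, the same change can be applied with kubectl patch. This is a minimal sketch that assumes the container is named fluentd-elasticsearch, as in the snippet above; adjust the memory value (400Mi or 800Mi) as needed:

kubectl -n sysibm-adm patch ds fluentd-es-ds --type=strategic \
  -p '{"spec":{"template":{"spec":{"containers":[{"name":"fluentd-elasticsearch","resources":{"limits":{"memory":"400Mi"}}}]}}}}'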
After making the above change, the fluentd pods need to be restarted.
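
Updating the DaemonSet pod template normally triggers a rolling restart on its own; if the fluentd pods do not restart automatically, they can be deleted so the DaemonSet controller recreates them with the new limit. A minimal sketch that avoids assuming a specific pod label:

# Delete the existing fluentd pods; the DaemonSet recreates them with the new memory limit
kubectl -n sysibm-adm get pods -o name | grep fluentd | xargs kubectl -n sysibm-adm delete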

Document Location

Worldwide

[{"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSHGWL","label":"IBM Watson Studio Local"},"ARM Category":[{"code":"a8m0z000000bmvTAAQ","label":"Admin->Node admin"}],"ARM Case Number":"TS003883207","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Version(s)","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Document Information

Modified date:
29 July 2020

UID

ibm16254295