Troubleshooting
Problem
In Cloud Pak for Security (CP4S), a pod reports an OOMKilled status and restarts repeatedly until it is marked with a CrashLoopBackOff status.
Symptom
The pod keeps cycling between a "Running" and a "CrashLoopBackOff" state. The pod is OOMKilled (out-of-memory killed) before it enters the CrashLoopBackOff state.
A review of the failed pod's details and logs might show OOMKilled in one of the following ways:
oc get pods -w | grep -i OOMKilled
storage-ingestion-pipeline-AA-BBBBBBBB-CCCCC 0/1 OOMKilled 2 (45s ago) 2m23s
oc logs -f storage-ingestion-pipeline-AA-BBBBBBBB-CCCCC
{"label":"c.i.s.q.c.s.r.InMemoryCache","level":"error","ibm_datetime":"<DATE>","thread_name":"vert.x-eventloop-thread-0","message":"Schema id not found for schema","error":{"stack":""}}
{"label":"o.a.k.c.c.ConsumerConfig","level":"warn","ibm_datetime":"<DATE>","thread_name":"vert.x-eventloop-thread-5","message":"The configuration 'apicurio.registry.as-confluent' was supplied but isn't a known config."}
{"label":"o.a.k.c.u.AppInfoParser","level":"warn","ibm_datetime":"<DATE>","thread_name":"vert.x-eventloop-thread-5","message":"Error registering AppInfo mbean"}
oc describe pod tis-AAA-BBBBBBBBBB-CCCCC
State: Running
Started: <DATE>
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: <DATE>
Finished: <DATE>
cp4s-01 cp4s-telemetry-operator-AAAAAAAAAA-BBBBB 0/1 CreateContainerError 621 4d20h
State: Waiting
Reason: CreateContainerError
Last State: Terminated
Reason: OOMKilled
Warning FailedCreatePodSandBox 5m47s (x2018 over 5h19m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = signal: killed
Cause
This OOMKilled error can be caused by a number of issues, such as:
- The pod's configuration is incorrect
- The pod is missing dependencies, such as secrets
- The pod is consuming more memory than its resource limits allow
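To confirm that an out-of-memory kill is the trigger, you can query the pod's last terminated state directly from the CLI. This is a sketch using the pod name from the symptom above; substitute your own pod name and project.

```shell
# Show the reason the container last terminated; prints "OOMKilled"
# when the kernel killed the process for exceeding its memory limit.
oc get pod storage-ingestion-pipeline-AA-BBBBBBBB-CCCCC \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

# Exit code 137 (128 + SIGKILL) is also consistent with an OOM kill.
oc get pod storage-ingestion-pipeline-AA-BBBBBBBB-CCCCC \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
```

These commands require access to a running cluster and a pod that has already restarted at least once; on a first-run pod, `lastState` is empty.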
Resolving The Problem
Note: The following instructions are for Cloud Pak for Security (CP4S) apps and not SOAR Cases, core services, third-party apps, or custom apps.
- Verify that the pod is scheduled with enough resources.
- Open the Red Hat OpenShift web user interface (UI).
- From the Developer view, select your Project.
- Under Inventory, select Pods.
- Select the pod that is receiving the OOMKilled status.
- Select the Metrics tab.
- From the graphs, verify whether the pod is using more resources than it is configured for.
- If the pod is overcommitted, increase the pod's memory setting by selecting the Details tab.
- Under the Owner section, select the replica set.
- Under the Owner section, select the deployment config.
- Select the YAML tab.
- Search for the resource value that you need to increase.
For example, to increase the memory request to 1 gigabyte (1000Mi) and the memory limit to 2 gigabytes (2000Mi):
containers:
  - resources:
      limits:
        cpu: '10'
        memory: 2000Mi
      requests:
        cpu: 250m
        memory: 1000Mi
- Select Save.
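If you prefer the CLI to the web console, the same change can be made with `oc set resources`. This is a sketch; the deployment name `storage-ingestion-pipeline` and the project name `cp4s` are assumptions, so substitute the names from your environment.

```shell
# Raise the memory request to 1000Mi and the limit to 2000Mi on the
# deployment's container spec (deployment and namespace are examples).
oc set resources deployment/storage-ingestion-pipeline \
  --requests=memory=1000Mi --limits=memory=2000Mi -n cp4s

# Confirm the new values took effect on the pod template.
oc get deployment/storage-ingestion-pipeline -n cp4s \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'
```

Changing resources triggers a rolling restart of the pods owned by the deployment.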
- If the pod is not overcommitted on its configured resources, check that the node is not overcommitted.
- Select the Details tab.
- Under the Node section, select the node.
- Verify that no resource error messages are displayed, such as:
"This node's CPU resources are overcommitted. The total CPU resource limit of all pods exceeds the node's total capacity. Pod performance will be throttled under high load."
"This node's memory resources are overcommitted. The total memory resource limit of all pods exceeds the node's total capacity. Pods will be terminated under high load."
- If the node is overcommitted, reschedule the pod to a node with more available resources. If you need assistance with scheduling, work with your Red Hat OpenShift administrator.
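The node-level check can also be done from the CLI. This sketch uses standard `oc` commands; `<node-name>` is a placeholder for one of your worker nodes.

```shell
# Show current CPU and memory usage per node (requires cluster metrics).
oc adm top nodes

# Show the total requests and limits allocated on one node versus its
# capacity; values near or above 100% indicate overcommitment.
oc describe node <node-name> | grep -A 8 "Allocated resources"
```

Comparing the "Allocated resources" totals against the node's capacity shows whether pods scheduled there can be terminated under memory pressure.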
If pods continue to show an OOMKilled status for supported out-of-the-box functionality or supported apps, contact IBM Support.
Document Location
Worldwide
Document Information
Modified date:
11 January 2023
UID
ibm16854285