Pods crash due to memory limit
Pods crash when the memory that is allotted to their Kubernetes resources is insufficient.
If the pods crash during IBM® Cloud Private installation, the installation fails. Sometimes, the pods might crash after the installation is complete.
Symptoms
Pods crash with the OOMKilled error.
Causes
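Kubernetes resources such as DaemonSets, Deployments, and StatefulSets are defined with memory limits for their containers. In some environments, these limits are too low for the actual workload. When a container tries to use more memory than its limit allows, it is terminated with the OOMKilled error and the pod crashes.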
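Memory limits such as 512Mi are Kubernetes quantity strings. As a rough illustration of the cause, the following Python sketch converts a limit to bytes and compares it with an observed usage figure. The helper names `memory_to_bytes` and `exceeds_limit` are hypothetical, only the common binary (Ki, Mi, Gi, Ti) and decimal (k, M, G, T) suffixes are handled, and the usage figure is assumed to come from your own monitoring.

```python
from decimal import Decimal
import re

# Multipliers for the memory suffixes this sketch understands (an assumption:
# exponent forms and the milli suffix "m" are deliberately not supported).
# Binary suffixes (Ki, Mi, ...) are powers of 2; decimal ones (k, M, ...) powers of 10.
_MULTIPLIERS = {
    "": 1,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}

# A number (optionally with a fractional part) followed by an optional suffix.
_QUANTITY = re.compile(r"(\d+(?:\.\d+)?)([A-Za-z]*)")


def memory_to_bytes(quantity: str) -> int:
    """Convert a Kubernetes memory quantity such as '512Mi' to bytes.

    Hypothetical helper: raises ValueError for forms it does not handle,
    such as '1e9' or '1000m'.
    """
    match = _QUANTITY.fullmatch(quantity.strip())
    if match is None or match.group(2) not in _MULTIPLIERS:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    number, suffix = match.groups()
    return int(Decimal(number) * _MULTIPLIERS[suffix])


def exceeds_limit(usage_bytes: int, limit: str) -> bool:
    """Return True if usage is above the limit, roughly the condition under
    which the container would be OOMKilled."""
    return usage_bytes > memory_to_bytes(limit)


print(memory_to_bytes("512Mi"))             # 536870912
print(exceeds_limit(600 * 2**20, "512Mi"))  # True
```

Comparing byte counts rather than strings avoids mistakes such as treating 1G (10^9 bytes) and 1Gi (2^30 bytes) as the same limit.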
Resolving the problem
If the pods crash during installation, complete these steps:
- Identify the components that are failing. If the installer uses all of its retries for a component and then starts retrying again, the component might have encountered the OOMKilled error. The installation status might look similar to the following:
TASK [waitfor : Waiting for MariaDB to start] **********************************
FAILED - RETRYING: Waiting for MariaDB to start (100 retries left).
FAILED - RETRYING: Waiting for MariaDB to start (99 retries left).
FAILED - RETRYING: Waiting for MariaDB to start (98 retries left).
:
FAILED - RETRYING: Waiting for MariaDB to start (2 retries left).
FAILED - RETRYING: Waiting for MariaDB to start (1 retries left).
- Update the memory limit for these components in the config.yaml file. For example, to raise the memory limit of the MariaDB component, increase the memory value under limits in the following section of the config.yaml file:
mariadb:
  mariadb:
    resources:
      limits:
        cpu: 1000m
        memory: 512Mi
      requests:
        cpu: 500m
        memory: 128Mi
- Retry the installation.
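Reading a long installer log by eye is error-prone. The following Python sketch scans installer output for the `FAILED - RETRYING: Waiting for <component> to start (N retries left)` lines shown in the installation status example and reports components whose retry countdown went back up, which is the "retries ran out and started again" pattern described in the first step. The function name `restarted_components` is hypothetical, and the sketch assumes the log keeps exactly that line format.

```python
import re

# Matches installer status lines such as:
#   FAILED - RETRYING: Waiting for MariaDB to start (99 retries left).
_RETRY_LINE = re.compile(
    r"FAILED - RETRYING: Waiting for (?P<component>.+?) to start "
    r"\((?P<left>\d+) retries left\)"
)


def restarted_components(log_text: str) -> set[str]:
    """Return components whose retry countdown started over.

    A countdown that goes back up (for example, from 1 to 100) means the
    installer used up its retries for that component and began a new cycle.
    """
    last_left: dict[str, int] = {}
    restarted: set[str] = set()
    for match in _RETRY_LINE.finditer(log_text):
        component = match.group("component")
        left = int(match.group("left"))
        previous = last_left.get(component)
        if previous is not None and left > previous:
            restarted.add(component)
        last_left[component] = left
    return restarted


# Hypothetical excerpt: the last line shows MariaDB starting a new retry cycle.
sample_log = """\
TASK [waitfor : Waiting for MariaDB to start] *****
FAILED - RETRYING: Waiting for MariaDB to start (2 retries left).
FAILED - RETRYING: Waiting for MariaDB to start (1 retries left).
FAILED - RETRYING: Waiting for MariaDB to start (100 retries left).
"""
print(restarted_components(sample_log))  # {'MariaDB'}
```

A component in the result is only a candidate for a higher memory limit in config.yaml; a restarted retry cycle can have other causes, so confirm the OOMKilled error before changing the limit.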

If the pods crash after installation, complete these steps:
- Ensure that the kubectl CLI is set up. See Accessing your cluster from the kubectl CLI.
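- To identify the pods that are crashing, run the following command. Pods that keep crashing typically show the CrashLoopBackOff status, which this filter matches.
kubectl get pods -n kube-system | grep -i crash
To see the error message, run the following command and check the Last State section of each container:
kubectl describe pod <pod_name> -n kube-system
The same information appears in the containerStatuses section of the pod's YAML, which you can view by running kubectl get pod <pod_name> -n kube-system -o yaml. Following is an example output:
containerStatuses:
- containerID: docker://<CONTAINER_ID>
  image: <IMAGE_NAME>
  imageID: docker-pullable://<IMAGE_NAME_SHA>
  lastState:
    terminated:
      containerID: docker://<CONTAINER_ID>
      exitCode: 137
      finishedAt: 2018-08-28T03:20:28Z
      reason: OOMKilled
      startedAt: 2018-08-28T03:15:03Z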
- Edit the Kubernetes resource that manages the crashing pod:
kubectl edit <resource_type> <name> -n kube-system
Where <resource_type> is daemonset, deployment, or statefulset. Following is an example command:
kubectl edit statefulset mariadb -n kube-system
- Increase the memory limit of the resource by editing the resources > limits > memory parameter in the container specification:
resources:
  limits:
    cpu: "1"
    memory: 512Mi
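As an alternative to filtering on the pod status with grep, the termination reason can be read directly from the pod objects. The following Python sketch assumes that you saved the output of `kubectl get pods -n kube-system -o json` and loaded it with `json.load`; it lists every container whose current or last termination reason is OOMKilled. The function name `oomkilled_containers` and the sample pod name `mariadb-0` are hypothetical.

```python
import json


def oomkilled_containers(pod_list: dict) -> list[tuple[str, str, int]]:
    """Return (pod, container, exit_code) for each container that was OOMKilled.

    `pod_list` is assumed to be the parsed output of
    `kubectl get pods -n <namespace> -o json`.
    """
    results = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            # Check the current state first, then the previous termination.
            for key in ("state", "lastState"):
                terminated = status.get(key, {}).get("terminated")
                if terminated and terminated.get("reason") == "OOMKilled":
                    results.append((pod_name, status["name"], terminated.get("exitCode")))
                    break  # report each container only once
    return results


# Hypothetical sample shaped like the containerStatuses example output.
sample = json.loads("""
{"items": [{"metadata": {"name": "mariadb-0"},
            "status": {"containerStatuses": [{
                "name": "mariadb",
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}]}}]}
""")
print(oomkilled_containers(sample))  # [('mariadb-0', 'mariadb', 137)]

# With a live cluster (file name is an assumption):
#   kubectl get pods -n kube-system -o json > pods.json
#   with open("pods.json") as f:
#       print(oomkilled_containers(json.load(f)))
```

Each reported pod still has to be traced back to the DaemonSet, Deployment, or StatefulSet that manages it before you edit the memory limit.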