Troubleshooting
Problem
In API Connect OVA deployments, Kubernetes may fail to start because the container runtime (containerd) returns a ResourceExhausted error. This error occurs when containerd attempts to list an excessively large number of containers, producing a gRPC message that exceeds the maximum allowed size.
The error message observed is:
rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (16835417 vs. 16777216)
This condition typically arises when thousands of paused containers accumulate on the portal VM, overwhelming the container runtime.
Diagnosing The Problem
- Access the portal VM:
  ssh <portal-node>
  sudo -i
- Run crictl pods:
  crictl pods
  Example:
  root@subinvm1:~# crictl pods
  E0908 17:58:44.529175 22655 remote_runtime.go:277] "ListPodSandbox with filter from runtime service failed" err="rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (16835417 vs. 16777216)" filter="&PodSandboxFilter{Id:,State:nil,LabelSelector:map[string]string{},}"
  FATA[0000] listing pod sandboxes: rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (16835417 vs. 16777216)
  If the command fails with a ResourceExhausted error, proceed to the next step.
- Check the kubelet logs:
  journalctl -u kubelet.service --since today
  Example:
  root@subinvm1:~# journalctl -u kubelet.service --since today
  E0904 14:47:48.127461 3664 remote_runtime.go:277] "ListPodSandbox with filter from runtime service failed" err="rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (16835417 vs. 16777216)" filter="nil"
- List containers sorted by image:
  ctr -n k8s.io containers ls | awk 'NR==1{print; next} {print $0 | "sort -k2"}'
  Example output:
  CONTAINER                                                           IMAGE                        RUNTIME
  0005065400712b61b6aba3f1827f535b643dc3ce85a540e98329a9c4acda7c1d    registry.k8s.io/pause:3.6    io.containerd.runc.v2
  You may observe tens of thousands of containers using the image registry.k8s.io/pause:3.6.
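To gauge the scale of the problem, you can count the pause containers directly. The following sketch demonstrates the counting pipeline on a small sample of ctr output (the container IDs here are hypothetical); on the portal VM you would pipe the live output of ctr -n k8s.io containers ls instead.

```shell
# Sample of `ctr -n k8s.io containers ls` output (hypothetical IDs),
# standing in for the live listing on the portal VM.
sample='CONTAINER    IMAGE    RUNTIME
aaa111    registry.k8s.io/pause:3.6    io.containerd.runc.v2
bbb222    docker.io/library/busybox:latest    io.containerd.runc.v2
ccc333    registry.k8s.io/pause:3.6    io.containerd.runc.v2'

# On a live node:
#   ctr -n k8s.io containers ls | grep -c "registry.k8s.io/pause:3.6"
printf '%s\n' "$sample" | grep -c "registry.k8s.io/pause:3.6"
# -> 2
```

A count in the tens of thousands on an affected system is consistent with the ResourceExhausted error described above.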
Resolving The Problem
Ensure that you have current backups before you proceed. For more information, see the API Connect backup documentation.
Step 1: Clean Up Excess Paused Containers
Create a cleanup script to remove the paused containers:
#!/bin/bash
# List all containers and filter those with image registry.k8s.io/pause:3.6
ctr -n k8s.io containers ls | grep "registry.k8s.io/pause:3.6" | awk '{print $1}' | while read -r CONTAINER_ID; do
    echo "Processing container: $CONTAINER_ID" | tee -a deletion.log

    # Try to kill the task (if running)
    if ctr -n k8s.io tasks kill "$CONTAINER_ID" 2>>deletion.log; then
        echo "Task killed for container $CONTAINER_ID" | tee -a deletion.log
    else
        echo "No running task to kill for container $CONTAINER_ID" | tee -a deletion.log
    fi

    # Try to delete the task (if it exists)
    if ctr -n k8s.io tasks delete "$CONTAINER_ID" 2>>deletion.log; then
        echo "Task deleted for container $CONTAINER_ID" | tee -a deletion.log
    else
        echo "No task to delete for container $CONTAINER_ID" | tee -a deletion.log
    fi

    # Delete the container
    if ctr -n k8s.io containers delete "$CONTAINER_ID" 2>>deletion.log; then
        echo "Container $CONTAINER_ID deleted successfully" | tee -a deletion.log
    else
        echo "Failed to delete container $CONTAINER_ID" | tee -a deletion.log
    fi
done
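Before running the destructive script, it can be useful to preview which containers it would touch. The sketch below runs the same selection pipeline in dry-run mode, printing the IDs instead of deleting them; the sample listing (hypothetical IDs) stands in for the live output of ctr -n k8s.io containers ls on the portal VM.

```shell
# Sample of `ctr -n k8s.io containers ls` output (hypothetical IDs).
list='aaa111    registry.k8s.io/pause:3.6    io.containerd.runc.v2
bbb222    docker.io/library/busybox:latest    io.containerd.runc.v2
ccc333    registry.k8s.io/pause:3.6    io.containerd.runc.v2'

# Same selection logic as the cleanup script, but only prints the IDs.
printf '%s\n' "$list" | grep "registry.k8s.io/pause:3.6" | awk '{print $1}' |
while read -r CONTAINER_ID; do
    echo "Would delete: $CONTAINER_ID"
done
# -> Would delete: aaa111
# -> Would delete: ccc333
```

If the dry-run output lists only pause:3.6 sandbox containers, the cleanup script can be run with the same pipeline feeding the delete commands.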
Step 2: Execute the Script
- Copy the script to the portal VM.
- Set executable permission:
  chmod +x delete_pause_containers_with_logging.sh
- Run the script:
  ./delete_pause_containers_with_logging.sh
- Save the output by copying the terminal log into a text file for reference.
Step 3: Reboot the System
reboot
Step 4: Verify System Health
sudo -i
apic status
kubectl get pods -A
Confirm that Kubernetes components are running and pods are in a healthy state.
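A simple way to confirm pod health is to flag any pod whose STATUS is neither Running nor Completed. The sketch below applies that check to a small sample of kubectl get pods -A output (the pod names are hypothetical); on the portal VM you would pipe the live command output instead.

```shell
# Sample of `kubectl get pods -A` output (hypothetical pod names),
# standing in for the live listing after the reboot.
pods='NAMESPACE     NAME          READY   STATUS      RESTARTS   AGE
kube-system   coredns-abc   1/1     Running     0          5m
default       job-xyz       0/1     Completed   0          3m'

# Print any pod not in Running or Completed state; exit non-zero if found.
# On a live node: kubectl get pods -A | awk '...'
printf '%s\n' "$pods" |
awk 'NR>1 && $4!="Running" && $4!="Completed" {print $2; bad=1} END{exit bad}' \
  && echo "All pods healthy"
# -> All pods healthy
```

Any pod names printed before the summary line indicate components that still need attention.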
Document Location
Worldwide
Document Information
Modified date:
11 September 2025
UID
ibm17244712