- Pending: The pod has been accepted by the cluster but is not yet running, usually because it is waiting to be scheduled or for resources (e.g., images, volumes) to be provisioned.
  - Implications:
    - Scheduling Issues: Too few nodes, insufficient CPU or memory, or node taints that prevent scheduling.
    - Image Pull Failures: Network issues, registry downtime, or missing image pull secrets.
    - Storage Delays: PersistentVolumeClaim (PVC) binding issues due to misconfigured storage classes or insufficient storage capacity.
    - Cluster Health: Scheduler or API server issues.
  - Actions:
    - Check the pod’s events for issues like image pull errors or scheduling failures:
      oc get pods -n <namespace>
      oc describe pod <pod-name> -n <namespace>
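The Pending checks above can be scripted. The following is a minimal sketch, assuming a logged-in oc session; the function names list_pending_pods and pod_events are hypothetical helpers introduced here, not part of oc.

```shell
# List the names of pods in a namespace whose phase is Pending.
list_pending_pods() {
  local namespace="$1"
  oc get pods -n "$namespace" \
    --field-selector=status.phase=Pending \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}

# Show the events recorded for one pod, sorted by last occurrence.
pod_events() {
  local namespace="$1" pod="$2"
  oc get events -n "$namespace" \
    --field-selector "involvedObject.name=${pod}" \
    --sort-by=.lastTimestamp
}

# Usage:
#   list_pending_pods <namespace>
#   pod_events <namespace> <pod-name>
```

Scheduling failures, image pull errors, and volume problems all appear in the event list, so it is usually the quickest way to tell which of the implications applies.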
- Running: The pod is bound to a node and all containers have been created; at least one container is running, starting, or restarting.
  - Implications:
    - The pod is operational, but application-level issues (e.g., failing readiness probes) may still exist. Monitor for resource contention and node health issues.
  - Actions:
    - Verify that logging and monitoring systems capture pod logs and metrics:
      oc get pods -n <namespace>
      oc logs <pod-name> -n <namespace>
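A pod can be Running while some of its containers are not ready. The sketch below, assuming a logged-in oc session, prints the readiness and restart count of each container; container_readiness and not_ready_containers are hypothetical helper names.

```shell
# Print one line per container: "<name> ready=<true|false> restarts=<count>".
container_readiness() {
  local namespace="$1" pod="$2"
  oc get pod "$pod" -n "$namespace" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{" ready="}{.ready}{" restarts="}{.restartCount}{"\n"}{end}'
}

# Read container_readiness output on stdin and print the names of
# containers that are not ready.
not_ready_containers() {
  awk '$2 == "ready=false" { print $1 }'
}

# Usage:
#   container_readiness <namespace> <pod-name> | not_ready_containers
```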
- Succeeded: All containers in the pod have terminated successfully (exit code 0) and will not be restarted.
  - Actions:
    - Check the job’s output or logs to confirm that the task completed.
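One way to confirm that every container exited with code 0 is to read the exit codes from the pod status. This is a sketch under the assumption of a logged-in oc session; container_exit_codes and all_exited_zero are hypothetical helper names.

```shell
# Print the exit code of each terminated container in a pod, one per line.
container_exit_codes() {
  local namespace="$1" pod="$2"
  oc get pod "$pod" -n "$namespace" \
    -o jsonpath='{range .status.containerStatuses[*]}{.state.terminated.exitCode}{"\n"}{end}'
}

# Read exit codes on stdin. Succeeds only if at least one code was read
# and every code is 0.
all_exited_zero() {
  awk 'NF { seen = 1; if ($1 != 0) bad = 1 } END { exit !(seen && !bad) }'
}

# Usage:
#   container_exit_codes <namespace> <pod-name> | all_exited_zero && echo "all containers succeeded"
#   oc logs <pod-name> -n <namespace>
```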
- Failed: All containers in the pod have terminated, and at least one exited with a non-zero exit code.
  - Implications:
    - Resource Limits: OOMKilled (exit code 137) when a container exceeds its memory limit. CPU limits throttle a container rather than kill it, but heavy throttling can cause timeouts or failed probes.
    - Node Issues: Disk pressure, network failures, or node crashes may cause failures.
    - Configuration Errors: Missing secrets, config maps, or storage volumes can trigger failures.
    - Image Issues: Invalid or inaccessible container images.
  - Actions:
    - Use oc describe pod <pod-name> or oc logs <pod-name> to investigate failure reasons.
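The termination reason and exit code can be read directly from the pod status. The sketch below assumes a logged-in oc session; termination_summary and describe_exit_code are hypothetical helper names. The interpretation of codes above 128 follows the common shell convention of 128 plus the signal number, which is why 137 corresponds to signal 9 (SIGKILL).

```shell
# Print "<container> <reason> <exit-code>" for each terminated container.
termination_summary() {
  local namespace="$1" pod="$2"
  oc get pod "$pod" -n "$namespace" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{" "}{.state.terminated.reason}{" "}{.state.terminated.exitCode}{"\n"}{end}'
}

# Interpret a container exit code. Codes above 128 conventionally mean the
# process was killed by signal (code - 128).
describe_exit_code() {
  local code="$1"
  if [ "$code" -eq 0 ]; then
    echo "success"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "application error (exit code $code)"
  fi
}

# Usage:
#   termination_summary <namespace> <pod-name>
#   describe_exit_code 137
```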
- Unknown: The pod’s state cannot be determined, typically due to communication failures between the node and the control plane.
  - Implications:
    - Node Failure: The node may be down, unreachable, or experiencing kubelet issues.
    - Network Issues: Cluster networking problems that prevent status updates.
    - Control Plane Issues: API server or etcd failures.
  - Actions:
    - Check node status:
      oc get nodes
    - Restart kubelet on the affected node:
      systemctl restart kubelet
    - Investigate network connectivity and check control plane components:
      oc get pods -n openshift-kube-apiserver
      oc get clusteroperators
    - Evict pods to reschedule:
      oc delete pod <pod-name> --force
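To narrow down which node is affected, the node list can be filtered by its STATUS column. This is a minimal sketch, assuming a logged-in oc session; unhealthy_nodes and pods_on_node are hypothetical helper names.

```shell
# Read `oc get nodes --no-headers` output on stdin and print the names of
# nodes whose STATUS column does not start with "Ready" (for example NotReady).
unhealthy_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# List the pods scheduled on one node, across all namespaces.
pods_on_node() {
  local node="$1"
  oc get pods --all-namespaces -o wide \
    --field-selector "spec.nodeName=${node}"
}

# Usage:
#   oc get nodes --no-headers | unhealthy_nodes
#   pods_on_node <node-name>
```

A node shown as Ready,SchedulingDisabled is cordoned but healthy, so the filter deliberately leaves it out.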
- CrashLoopBackOff (a container status reason, not a phase): A container is repeatedly crashing, and Kubernetes waits for an increasing back-off delay before each restart.
  - Implications:
    - Resource Constraints: Memory or CPU limits causing crashes.
    - Node Issues: Disk or network problems affecting stability.
    - Configuration Issues: Missing or incorrect secrets, config maps, or environment variables.
  - Actions:
    - Check logs:
      oc logs <pod-name> --previous
    - Review resource usage:
      oc adm top pod <pod-name>
    - Validate node health:
      oc describe node <node-name>
    - Ensure that the required secrets and config maps exist:
      oc get secret <secret-name> or oc get configmap <configmap-name>
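The waiting period between restarts grows with each crash. The sketch below illustrates this, assuming the default kubelet behavior of a 10-second initial delay that doubles on each restart and is capped at 300 seconds; clusters configured differently will not match these numbers. backoff_delay and previous_logs are hypothetical helper names.

```shell
# Delay in seconds before restart number N (1-based), assuming the default
# policy: start at 10s, double each time, cap at 300s.
backoff_delay() {
  local restart="$1" delay=10 i=1
  while [ "$i" -lt "$restart" ] && [ "$delay" -lt 300 ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt 300 ]; then
    delay=300
  fi
  echo "$delay"
}

# Show the last lines logged by the previous (crashed) container instance.
previous_logs() {
  local namespace="$1" pod="$2"
  oc logs "$pod" -n "$namespace" --previous --tail=50
}

# Usage:
#   backoff_delay 3
#   previous_logs <namespace> <pod-name>
```

The logs of the current container instance are often empty during the back-off wait, which is why the --previous flag matters here.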
- Completed (similar to Succeeded): The pod has finished successfully, and all containers exited with code 0.
  - Actions:
    - Review logs or output to confirm job success.
- ContainerCreating: The pod’s containers are being created, which often involves pulling images or mounting volumes.
  - Implications:
    - Image Pull Delays: Network latency or registry issues.
    - Storage Issues: Slow volume provisioning or misconfigured storage classes.
  - Actions:
    - Check events:
      oc describe pod <pod-name>
    - Verify registry connectivity and validate the storage provisioner:
      oc get storageclass
      oc describe pvc
    - Optimize image size or use a local registry.
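When volume mounting is the suspected cause, it helps to list the claims that are not yet bound. This is a minimal sketch, assuming a logged-in oc session; unbound_pvcs is a hypothetical helper name.

```shell
# Read `oc get pvc --no-headers` output on stdin and print "<name> <status>"
# for every claim whose STATUS column is not Bound.
unbound_pvcs() {
  awk '$2 != "Bound" { print $1, $2 }'
}

# Usage:
#   oc get pvc -n <namespace> --no-headers | unbound_pvcs
#   oc describe pvc <pvc-name> -n <namespace>
```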
- Error: The pod encountered an error during creation or execution, often due to misconfiguration.
  - Implications:
    - Configuration Issues: Invalid image, missing resources, or insufficient permissions.
    - Cluster Issues: Problems with underlying infrastructure components.
  - Actions:
    - Review events:
      oc describe pod <pod-name>
    - Check RBAC permissions:
      oc describe sa <service-account-name>
    - Validate infrastructure resources.
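Beyond describing the service account, its effective permissions can be tested with oc auth can-i and impersonation. The sketch below assumes a logged-in oc session whose user is allowed to impersonate service accounts; service_account_user and sa_can_i are hypothetical helper names.

```shell
# Build the user name that identifies a service account for impersonation.
service_account_user() {
  local namespace="$1" sa="$2"
  echo "system:serviceaccount:${namespace}:${sa}"
}

# Ask the API server whether the service account may perform an action.
# oc prints "yes" or "no".
sa_can_i() {
  local namespace="$1" sa="$2" verb="$3" resource="$4"
  oc auth can-i "$verb" "$resource" -n "$namespace" \
    --as="$(service_account_user "$namespace" "$sa")"
}

# Usage:
#   sa_can_i <namespace> <service-account-name> get secrets
```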
- Terminating: The pod is being deleted, whether manually, through scaling, or by eviction.
  - Implications:
    - Deletion Delays: Finalizers, graceful termination, or network issues.
    - Node Issues: Eviction due to node maintenance or resource pressure.
  - Actions:
    - Check finalizers:
      oc get pod <pod-name> -o yaml
    - Force deletion:
      oc delete pod <pod-name> --force
    - Review node events:
      oc describe node <node-name>
    - Ensure proper node drainage:
      oc adm drain <node-name>
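The finalizer check and the forced deletion can be wrapped in small functions. This is a sketch, assuming a logged-in oc session; pod_deletion_state, force_delete_pod, and drain_node are hypothetical helper names. The --grace-period=0 and --ignore-daemonsets flags are additions not shown in the commands above.

```shell
# Print the deletion timestamp and finalizers of a pod. A pod that has a
# deletion timestamp and a non-empty finalizer list is waiting on those
# finalizers.
pod_deletion_state() {
  local namespace="$1" pod="$2"
  oc get pod "$pod" -n "$namespace" \
    -o jsonpath='{.metadata.deletionTimestamp}{" "}{.metadata.finalizers}{"\n"}'
}

# Force-delete a pod without waiting for graceful termination.
force_delete_pod() {
  local namespace="$1" pod="$2"
  oc delete pod "$pod" -n "$namespace" --grace-period=0 --force
}

# Drain a node, leaving DaemonSet-managed pods in place.
drain_node() {
  oc adm drain "$1" --ignore-daemonsets
}

# Usage:
#   pod_deletion_state <namespace> <pod-name>
#   force_delete_pod <namespace> <pod-name>
#   drain_node <node-name>
```

Forced deletion removes the pod object from the API without confirming that its containers have stopped on the node, so checking finalizers and node health first is the safer order.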
Document Information
Modified date: 20 June 2025
UID: ibm17236474