Pods fail with CrashLoopBackOff

Pods remain in CrashLoopBackOff status and do not recover.

Symptoms

The pod fails with a message similar to the following:

unexpected error watching template /etc/nginx/template/nginx.tmpl: no space left on device

Causes

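The node has run out of a file system or kernel resource. Exhausted disk space, exhausted inodes, and an exhausted inotify watch limit are all reported as "no space left on device".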

Resolving the problem

  1. Determine whether the problem is a file system issue.
    • Run the following command to check whether any file system is out of disk space: df -h
    • Run the following command to check whether any file system is out of inodes: df -i
    • Run the following command and look for entries marked (deleted), which indicate a deleted file that a process still holds open, so its space is not yet released: lsof
  2. Check inotify usage. You might have reached the limit on the total number of inotify watches. Run lsof | grep inotify | wc -l to see how many inotify file descriptors are open, and run sysctl fs.inotify.max_user_watches to check the current limit. If the limit has been reached, increase fs.inotify.max_user_watches and restart the affected pods. For example:
    # sysctl fs.inotify.max_user_watches=524288
    fs.inotify.max_user_watches = 524288
    # kubectl delete pod nginx-ingress-lb-amd64-6j9zm -n kube-system
    pod "nginx-ingress-lb-amd64-6j9zm" deleted
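
The checks in steps 1 and 2 can be gathered into one script to run on the affected node. The following is a minimal sketch, not a supported tool: the THRESHOLD variable and the function names flag_full and count_inotify_watches are illustrative assumptions. It assumes a Linux node with /proc mounted, and mount points that contain no spaces. Where lsof shows open inotify file descriptors, this sketch instead counts individual watches by reading the "inotify wd:" lines in /proc/<pid>/fdinfo, because fs.inotify.max_user_watches limits watches.

```shell
#!/bin/sh
# Sketch only: THRESHOLD and the function names are illustrative assumptions.
THRESHOLD="${THRESHOLD:-90}"

# Read `df -P` style output on stdin and print "<mount point> <use%>" for
# every file system whose usage is at or above the percentage given as $1.
# Rows without a numeric percentage (for example "-") are skipped.
flag_full() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use ~ /^[0-9]+$/ && use + 0 >= limit + 0) print $6, $5
    }'
}

# Count inotify watches by counting the "inotify wd:" lines that the kernel
# exposes in /proc/<pid>/fdinfo/<fd>. Run as root to see every process;
# otherwise only the caller's own processes are readable.
count_inotify_watches() {
    for f in /proc/[0-9]*/fdinfo/*; do
        [ -r "$f" ] && cat "$f" 2>/dev/null
    done | grep -c '^inotify'
}

echo "File systems at or above ${THRESHOLD}% disk usage:"
df -P | flag_full "$THRESHOLD"

echo "File systems at or above ${THRESHOLD}% inode usage:"
df -P -i | flag_full "$THRESHOLD"

echo "inotify watches in use: $(count_inotify_watches)"
echo "inotify watch limit:    $(cat /proc/sys/fs/inotify/max_user_watches 2>/dev/null)"
```

The watch limit applies per user, so the total printed here is an upper bound on what any single user consumes. If the count is close to the limit, raising fs.inotify.max_user_watches as shown in step 2 is the likely fix. Note that a value set with sysctl on the command line does not survive a reboot unless it is also written to the node's sysctl configuration.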