Troubleshooting storage limit errors

If data exceeds the ephemeral storage limit (3 GiB), cpctl utility commands can fail. Learn how to diagnose and resolve this issue.

About this task

cpctl commands can fail with the error "Error: command terminated with exit code 137" if Guardium® Insights files exceed the ephemeral storage limit. Exit code 137 means that the process was killed with SIGKILL. The ephemeral storage contains various internal files, such as MustGather files, core dumps, and slons.

Take the following steps to diagnose and resolve this issue.

Procedure

  1. To diagnose the issue, get the name of the cp-serviceability pod, for example:
    $ oc get pods | grep sysqa-cp-serviceability
    sysqa-cp-serviceability-d8f8f94cd-jj9mm                0/1     Evicted     0          5d17h
    Note: As in this example, the pod might be in the Evicted state.
  2. Check the events for that pod, for example:
    oc describe pod sysqa-cp-serviceability-d8f8f94cd-jj9mm
  3. The Events section is at the end of the oc describe pod output. Check it for issues that are related to ephemeral storage, for example:
    Events:
      Type     Reason   Age   From     Message
      ----     ------   ----  ----     -------
      Warning  Evicted  46m   kubelet  Pod ephemeral local storage usage exceeds the total limit of containers 3Gi.
      Normal   Killing  46m   kubelet  Stopping container sysqa-cp-serviceability

    If you see the message Pod ephemeral local storage usage exceeds the total limit of containers 3Gi, the ephemeral storage is too full to hold the files that the cpctl command produces.

    If the cpctl command fails without this message, contact IBM Support.
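The diagnosis steps above can be scripted. The following is a minimal Bash sketch, not part of cpctl or the product: the function names find_serviceability_pod and has_ephemeral_eviction are hypothetical, and the sketch assumes the oc get pods and oc describe pod output formats shown in the examples. The helpers read oc output on standard input so that they can be tried without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that automate the diagnosis steps.
# They read oc output on stdin, so they can be tested without a cluster.

# Print the name of the first pod whose name contains "cp-serviceability".
# Input: the output of `oc get pods` (NAME is the first column).
find_serviceability_pod() {
  awk '$1 ~ /cp-serviceability/ { print $1; exit }'
}

# Succeed (exit status 0) if `oc describe pod` output reports that the pod
# exceeded its ephemeral storage limit.
has_ephemeral_eviction() {
  grep -q 'ephemeral local storage usage exceeds'
}

# Example usage (requires oc and a logged-in session):
#   pod="$(oc get pods | find_serviceability_pod)"
#   if oc describe pod "$pod" | has_ephemeral_eviction; then
#     echo "$pod exceeded its ephemeral storage limit"
#   fi
```

The sketch assumes a single Guardium Insights instance in the current project; with several instances, filter on the instance-specific prefix (such as sysqa) instead.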

What to do next

The storage might have filled up from running multiple cpctl commands. As a first step, clean up the old files by replacing the pod:

  1. Delete the pod by using the oc delete pod command, for example:
    oc delete pod sysqa-cp-serviceability-d8f8f94cd-jj9mm
    
  2. Wait while the pod is deleted and a new pod with empty ephemeral storage is created.
  3. Run the following command to connect cpctl to the new pod.
    ./cpctl load
  4. After the new pod is running, run the cpctl command again.

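The cleanup steps above can be combined into one Bash function. This is a sketch under assumptions, not a documented procedure: the function names running_serviceability_pod and recreate_serviceability_pod are hypothetical, the sketch assumes a single cp-serviceability pod in the current project and that cpctl is in the current directory, and it polls oc get pods until a replacement pod reports Running.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of the first cp-serviceability pod
# whose STATUS column (third column of `oc get pods`) is "Running".
running_serviceability_pod() {
  awk '$1 ~ /cp-serviceability/ && $3 == "Running" { print $1; exit }'
}

# Hypothetical wrapper for the cleanup steps. Requires oc and ./cpctl.
#   $1: name of the old (for example, Evicted) cp-serviceability pod
recreate_serviceability_pod() {
  local old_pod="$1" new_pod="" attempt
  # Step 1: delete the old pod; its controller creates a new, empty pod.
  oc delete pod "$old_pod" || return 1
  # Step 2: poll for up to about 5 minutes until a different pod is Running.
  for attempt in $(seq 1 30); do
    new_pod="$(oc get pods | running_serviceability_pod)"
    if [ -n "$new_pod" ] && [ "$new_pod" != "$old_pod" ]; then
      break
    fi
    new_pod=""
    sleep 10
  done
  if [ -z "$new_pod" ]; then
    echo "no running cp-serviceability pod found" >&2
    return 1
  fi
  # Step 3: connect cpctl to the new pod.
  ./cpctl load
}
```

For example, run recreate_serviceability_pod sysqa-cp-serviceability-d8f8f94cd-jj9mm, and then run the failing cpctl command again. The 5-minute timeout is an arbitrary choice; adjust it for your cluster.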
If you still get an error, the files from a single command might exceed the limit.

  1. To increase the limit, edit the Guardium Insights custom resource by running the following command:
    oc edit guardiuminsights
    
  2. In the spec section, add a cp-serviceability section:
    cp-serviceability:
      resources:
        limits:
          ephemeral-storage: xGi
        requests:
          ephemeral-storage: xGi
    

    where x is the new storage limit in GiB. Increase the limit in increments and retest the cpctl command after each increase: first to 5Gi, then to 10Gi, and then to 20Gi.

  3. Wait until the change propagates to the cp-serviceability pod. Run the following command and check the ephemeral-storage values under Limits and Requests to confirm that the new pod uses the new limit.
    oc describe pod <cp-serviceability pod>
  4. Run the following command to connect cpctl to the new pod with the larger limit.
    ./cpctl load
  5. After the new pod is running, run the cpctl command again.
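As an alternative to editing the custom resource interactively, the same change could be applied non-interactively with oc patch and a JSON merge patch. The following Bash sketch is an assumption-based example, not a documented procedure: the function name serviceability_storage_patch is hypothetical, the sketch assumes that cp-serviceability sits directly under spec exactly as in the YAML in step 2, and INSTANCE is a placeholder for the name of your Guardium Insights resource.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a JSON merge patch that sets both the
# ephemeral-storage limit and request of cp-serviceability to $1 (e.g. 5Gi).
# Assumes the spec.cp-serviceability layout shown in step 2.
serviceability_storage_patch() {
  local size="$1"
  printf '{"spec":{"cp-serviceability":{"resources":{"limits":{"ephemeral-storage":"%s"},"requests":{"ephemeral-storage":"%s"}}}}}\n' "$size" "$size"
}

# Example usage (requires oc; replace INSTANCE with your resource name):
#   oc patch guardiuminsights INSTANCE --type merge \
#     -p "$(serviceability_storage_patch 5Gi)"
# After the new pod is running, confirm the limit, reconnect, and retest;
# repeat with 10Gi and then 20Gi if the command still fails:
#   oc describe pod <cp-serviceability pod> | grep ephemeral-storage
#   ./cpctl load
```

A merge patch is used because custom resources do not support strategic merge patches; setting the request and the limit to the same value mirrors the YAML in step 2.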

If the problem persists with a 20Gi ephemeral storage limit, contact IBM Support.