IBM Support

MustGather: Performance, hang, or high CPU issues with WebSphere Application Server on Linux on Containers



If you are experiencing performance, hang, or high CPU issues with WebSphere Application Server on Linux on Containers (e.g. OpenShift or Kubernetes), this MustGather helps you collect the data needed to diagnose and resolve the issue.

Resolving The Problem

This tool gathers diagnostics without requiring any tool installation or container restarts. It does this by using worker node debug pods to gather diagnostics on the worker node(s) rather than within the container(s).
  1. Note: This tool requires that you are logged in as a user with the cluster-admin superuser privilege.
  2. Ensure that you have the oc command on your PATH and that you are logged into your cluster.
  3. Determine the name of the deployment or pod that you want to gather diagnostics for and the namespace it's in.
  4. Download a helper script:
    1. macOS or Linux:
    2. Windows: containerdiag.bat
  5. On macOS or Linux, make the script executable:
    chmod +x
  6. On macOS, remove the download quarantine:
    xattr -d
  7. Start the diagnostics:
    1. For WebSphere Liberty or OpenLiberty, use the script and replace $DEPLOYMENT with the deployment name:
      ./ -d $DEPLOYMENT
      (containerdiag.bat on Windows)
    2. For WebSphere Application Server traditional, use the script and replace $DEPLOYMENT with the deployment name:
      ./ -d $DEPLOYMENT
      (containerdiag.bat on Windows)
  8. After the command completes on each pod, the output will instruct you how to download the diagnostics in a new terminal window. For example:
      Files are ready for download. Download with the following command in another window:
      oc cp worker1-debug:/tmp/ --namespace=ffzhc74l4c
      After the download is complete, type OK and press ENTER:
  9. After the download is complete, go back to the original terminal window and type "OK" and press enter. The script will continue iterating over the other pods or finish if there are no more pods.
  10. Upload the containerdiag*.tar.gz file(s) to the support case.
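
The prerequisites in steps 1 to 3 can be checked from the shell before starting the diagnostics. The following is a minimal sketch and not part of the tool: the function names require_cmd and preflight are hypothetical, and it assumes a POSIX shell and the standard oc whoami, oc auth can-i, and oc get subcommands. Defining the functions runs nothing against the cluster; only calling preflight does.

```shell
#!/bin/sh
# Hypothetical pre-flight helper; not part of the MustGather scripts.

# require_cmd NAME: succeed only if NAME can be found on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# preflight NAMESPACE: verify the prerequisites, then list the
# deployments and pods in NAMESPACE so that you can pick a target.
preflight() {
  require_cmd oc || { echo "oc was not found on PATH" >&2; return 1; }
  oc whoami >/dev/null 2>&1 || { echo "not logged in; run 'oc login' first" >&2; return 1; }
  # Prints "yes" when the current user may perform any action on any
  # resource in every namespace, which is what cluster-admin grants.
  oc auth can-i '*' '*' --all-namespaces
  oc get deployments --namespace="$1"
  oc get pods --namespace="$1"
}
```

For example, after sourcing the snippet, preflight my-namespace (where my-namespace is a placeholder) would print the privilege check result followed by the candidate deployments and pods.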

  1. cannot gather a Liberty server dump on vanilla Kubernetes clusters (i.e. not OpenShift) due to a permissions difference between oc debug node and kubectl debug node. If the cluster is OpenShift, ensure you're using oc instead of kubectl.
  2. and can only gather files (e.g. logs, configuration, javacores, etc.) from the container's ephemeral filesystem and cannot gather files from mounted persistent volumes. You may separately access such files through the underlying filesystem instead (e.g. NFS). You may vote on the feature request to support this if it affects you.
  1. By default, the script uses the image which is downloaded from the Red Hat registry into your cluster's container registry and executed. Therefore, the first time you run this, it may spend a long time after the output "To use host binaries, run `chroot /host`" because it is probably downloading the image.
  2. If your cluster does not have internet connectivity to, you may download the image locally, push the image to your cluster's container registry, and then use the -i option to use your cluster's image (see an example of how this may be done); for example: -i image-registry.openshift-image-registry.svc/ibm/containerdiag -d $DEPLOYMENT
  3. If you have any concerns using the image, you may build the image yourself using the source Containerfile and then use the -i option to use your custom image.
  4. To target a specific pod, use -p $POD instead of -d $DEPLOYMENT; for example: -p $POD
  5. By default, your "current" namespace is used. You may override this with -n $NAMESPACE; for example: -d $DEPLOYMENT -n $NAMESPACE
  6. The containerdiag image is built for Linux on the Intel 64-bit (x86_64), POWER 64-bit (ppc64le), IBM z 64-bit (s390x), and ARMv8 64-bit (arm64/aarch64) platforms.
  7. The and scripts support the following options (in seconds) to control the underlying (and a -c option to specify the javacore directory):
    -c Path to javacores (default /output/javacore*)
    -f Configuration directory (default /config)
    -l Logs directory (default /logs)
    For example, to change the SCRIPT_SPAN to 60 seconds: -d $DEPLOYMENT -s 60
  8. This tool also has other capabilities such as:
    1. Run tcpdump for a number of seconds: -d $DEPLOYMENT -q -0 $SECONDS
    2. Run the Linux perf native stack sampler for a number of seconds: -d $DEPLOYMENT -q -d $SECONDS
    3. Note that -q is needed in the above examples because otherwise each pod name would be passed as an argument which would cause the commands to fail.
    4. Any other command available in the underlying image.
  9. You may see some output repeated, such as the "Files are ready for download" prompt. This is expected. oc debug may time out after 1 minute of no input or output; if you haven't downloaded the files by then, the pod is deleted and the files are lost. The script therefore periodically prints some output so that the pod is not destroyed until you've completed the download.
  10. The debug pod automatically completes once the download is complete, so no further cleanup is required. You may find *debug* pods in the pod list with a status of "Completed" and a "Ready" column value of 0/1. The latter means that 0 containers are running for that pod. This is expected and a general feature of Kubernetes to keep completed pods around in case an administrator wants to look at their logs. They should be automatically deleted by Kubernetes garbage collection when disk usage exceeds certain thresholds, although you may delete them manually if you'd like.
  11. When downloading containerdiag*.tar.gz, you may receive an "EOF" error. Ensure you are running a version of kubectl or oc that matches your cluster version. Try again or try with --retries=-1. If the problem persists, open a support case with your vendor.
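
If you prefer to delete the leftover debug pods described in note 10 manually, you first need to find them. The following is a minimal sketch and not part of the tool: the function name completed_debug_pods is hypothetical, and it assumes the default column layout of oc get pods --no-headers (NAME, READY, STATUS, and so on). It only filters text, so it is safe to try before deleting anything.

```shell
#!/bin/sh
# Hypothetical helper; not part of the MustGather scripts.

# completed_debug_pods: read `oc get pods --no-headers` output on stdin
# and print the names of pods whose name contains "debug", whose READY
# column is 0/1, and whose STATUS column is Completed.
completed_debug_pods() {
  awk '$1 ~ /debug/ && $2 == "0/1" && $3 == "Completed" { print $1 }'
}

# Example (review the list before deleting anything):
#   oc get pods --no-headers --namespace="$NAMESPACE" | completed_debug_pods
#   oc delete pod "$POD_NAME" --namespace="$NAMESPACE"
```

Matching on all three columns keeps the filter from selecting a debug pod that is still running and may still hold files you have not downloaded.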
Version History
  1. April 26, 2023: Handle containerd (e.g. AKS/EKS 1.25) and add -l (logs) and -f (config) directory override options for and
  2. March 1, 2023: Do not assume replicas are active/ready. Diagnostics are still useful (and particularly so) if pods are not in those states.
  3. February 27, 2023: Fix Windows batch script handling of arguments with an asterisk in them
  4. November 8, 2022: Add -c option to and (and new to allow overriding the location of javacores in the container
  5. June 27, 2022: First version
This tool is provided as is without any warranty or support. Please report any issues through the GitHub repository and we'll try to resolve any issues as time permits.


Document Information

Modified date:
20 May 2024