IBM Cloud Pak for AIOps MustGather tool

Learn how to use the IBM Cloud Pak® for AIOps MustGather tool to gather information that is needed and relevant to a problem before you open a case with IBM Support.

Important: This MustGather tool gathers data for IBM Cloud Pak for AIOps deployments. For gathering data for Infrastructure Automation, see Infrastructure Automation data collection for IBM Support cases.

The IBM Cloud Pak for AIOps MustGather tool is a wrapper for running oc commands. With this tool, you can complete data collection tasks and view analytics on the gathered data. After you run the tool to gather data, upload the output data file that is in your /tmp directory (or in another directory if you used the -o option) to the IBM Support website.
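As a minimal sketch, the following wrapper sends the output to a dedicated directory with the -o option so that the file to upload is easy to find. The OUT_DIR and DRY_RUN variables and the collect_general function are illustrative names and are not part of the tool. The -cypDf flags are the expanded form of the general alias that is described later in this topic. By default the sketch only prints the command that it would run.

```shell
# Sketch only: OUT_DIR, DRY_RUN, and collect_general are hypothetical names.
OUT_DIR="${OUT_DIR:-${TMPDIR:-/tmp}/mustgather-out}"
DRY_RUN="${DRY_RUN:-1}"

collect_general() {
    mkdir -p "$OUT_DIR" || return 1
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command instead of running it.
        echo "waiops-mustgather.sh -cypDf -o $OUT_DIR"
    else
        # Run the collection, then list the directory so you can pick the file to upload.
        waiops-mustgather.sh -cypDf -o "$OUT_DIR" && ls -lt "$OUT_DIR"
    fi
}

collect_general
```

Set DRY_RUN=0 to run the collection for real, assuming waiops-mustgather.sh is on your PATH.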

The tool generates a report that shows critical data about the status and health of the cluster and installation. This report gives you an overview of your cluster to help you spot and solve problems. The report can provide basic cluster analysis to identify cluster node resource utilization, warnings, and problematic resources. The report can also provide incident analysis, for example to identify whether a pod is in the CrashLoopBackOff state.

The tool supports data collection for the following types of data:

Table: Data type collection mode

Data type | Collection mode
oc get and oc describe (or kubectl) commands | This data is collected through the primary data collection modes.
Pod logs | This data is collected through the primary data collection modes.
Application logs within pods | This data collection is supported by the CPFILES secondary data collection mode.
Output of commands that run through pods | This data collection is supported by the MANUALCOLLECT secondary data collection mode.
Output of commands that run on client machines (custom script) | This data collection is supported by the CMDEXEC secondary data collection mode.
Logs or output of commands that run on individual cluster nodes, such as journal, systemctl, iptables | This data collection is supported by the CLUSTERNODES secondary data collection mode.

Command syntax

waiops-mustgather.sh -[a|b|c|l] [-k resourname] [-z modules] [-d] [-L numlines] [-C cmdexec-file:cmdexec-env] [-E cmdexec-envvar] [-T timeout-value] [-m manual-collection-param] [-n manual-collection-envvar] [-j objname] [-w manual-collection-exec-timeout] [-e extra-namespaces:extra-resources] [-x product-version/path-to-config-file] [-A compliance-param] [-P plugins] [-Z plugins-envvar] [-Q plugins-timeout] [-F cpfiles-param] [-K] [-V view-only-module] [-y] [-J] [-p] [-t] [-g] [-f] [-R] [-S] [-s] [-u node-admin-username] [-i path-to-sshkey] [-o output-directory] [-G filename-regex:pattern] [-W primary-mode-var] [-N] [-I] [-X] [-Y] [-U] [-M] [-D] [-v] [-h]

Running the tool

Primary use cases

Installation related problems

To gather data for installation related problems, run the following command:

waiops-mustgather.sh -O install

The alias -O install is expanded to -cypDfd, which uses the primary comprehensive (-c) mode and the clusternode (-d) mode to gather data for installation related problems.

To gather data for general non-installation related problems, run the following command:

waiops-mustgather.sh -O general

The alias -O general is expanded to -cypDf that is used to gather data for common problems.

Overall operation related problems

To gather data for overall IBM Cloud Pak for AIOps operation related problems, run the following command:

waiops-mustgather.sh -O aimgr

To gather raw data for IBM Cloud Pak for AIOps from its various data sources, such as Kafka, Flink, ElasticSearch and Postgres, run the following command:

waiops-mustgather.sh -O aimgr-data

To learn more about what aliases are available, use the command waiops-mustgather.sh -O .listalias

Important: The ALIAS mode (-O xxx) is used to expand an alias into a predefined command. If there is a need to change or add any option, you need to revert to the conventional way of running the MustGather tool by providing all the options or flags manually. For example, if a secret YAML is needed, you need to use -cypDfS (plus any other options or flags).
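The following sketch illustrates the point above: it rebuilds an alias as explicit flags so that an extra flag can be appended. The expand_alias and build_command functions are hypothetical helpers, not part of the tool, and they cover only the two expansions that are stated in this topic (install and general). The sketch only prints the resulting command.

```shell
# Sketch only: expand_alias and build_command are hypothetical helpers.
# Only the two expansions documented in this topic are included.
expand_alias() {
    case "$1" in
        install) echo "-cypDfd" ;;
        general) echo "-cypDf" ;;
        *) echo "unknown alias: $1" >&2; return 1 ;;
    esac
}

# Append extra single-letter flags (for example S for secret YAML) to the expanded form.
build_command() {
    flags=$(expand_alias "$1") || return 1
    echo "waiops-mustgather.sh ${flags}$2"
}

build_command general S   # prints: waiops-mustgather.sh -cypDfS
```

For other aliases, check the real expansion with the .listalias command before you rebuild the command by hand.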

Log anomaly detection related problems

To gather data for log anomaly related problems, run the following command:

waiops-mustgather.sh -O lad

Metric anomaly detection related problems

To gather data for metric anomaly related problems, run the following command:

waiops-mustgather.sh -O mad

Lifecycle related problems

To gather data for Lifecycle related problems, run the following command:

waiops-mustgather.sh -O lifecycle

Datalayer related problems

To gather data for Datalayer related problems, run the following command:

waiops-mustgather.sh -O datalayer

ChatOps related problems

To gather data for ChatOps related problems, run the following command:

waiops-mustgather.sh -O chatops

Healthcheck related problems

Use the healthcheck option to gather data to check whether your installation was successful or to analyze how IBM Cloud Pak for AIOps is running.

For more information about using this option, see Running the MustGather Healthcheck.

Additional use cases

Run a custom script to complete corrective actions

If you want to complete a corrective action, such as to delete a pod to restart it, you can write a script and run the following command to run the script:

waiops-mustgather.sh -C <script>
waiops-mustgather.sh -R -C <script>

Where <script> is the script that contains the corrective action to run.

The preceding commands run only your script. To collect data before the script runs, add any of the primary modes to the command.

Note: If needed, you can run the secondary data collection modes MANUALCOLLECT, EXTRA, and CPFILES from within your custom script.

The following examples show how to trigger the MANUALCOLLECT, EXTRA, and CPFILES modes:

  • To make sure the corrective actions are successful, you can trigger MANUALCOLLECT to collect some data within the script:

    echo "ALL##NS4PROD=aimanager##pod##teams######" >> $CMD_EXEC_RESULT_MCCFG_FILE
    
  • You can also trigger EXTRA mode to collect namespace data within the script:

    echo "openshift-insights" > $CMD_EXEC_RESULT_EXTRANS_FILE
    
  • If you need to copy some files from some pods to make sure your script ran successfully:

    echo "NS4PROD=aimanager##^iaf-system-kafka####/opt/kafka/config/log4j.properties" >> $CMD_EXEC_RESULT_CPFILES_FILE
    

For an example custom script and some of the variables and functions that you can include within it, see Example: custom script.
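As a minimal sketch, a custom script that combines a corrective action with the three triggers above might look like the following. The CMD_EXEC_RESULT_* variables come from this topic. The POD and NS values are hypothetical placeholders, and the fallback to temporary files is an assumption that is added only so that the sketch can be tried outside the tool.

```shell
# Sketch of a custom script for the -C option.
# Fallbacks to temporary files are for standalone testing only (assumption).
: "${CMD_EXEC_RESULT_MCCFG_FILE:=$(mktemp)}"
: "${CMD_EXEC_RESULT_EXTRANS_FILE:=$(mktemp)}"
: "${CMD_EXEC_RESULT_CPFILES_FILE:=$(mktemp)}"

# Corrective action: delete a pod so that it restarts.
# POD and NS are hypothetical placeholders; replace them with real values.
POD="${POD:-example-pod}"
NS="${NS:-example-namespace}"
if command -v oc >/dev/null 2>&1; then
    oc delete pod "$POD" -n "$NS" || echo "corrective action failed" >&2
else
    echo "oc not found; skipping: oc delete pod $POD -n $NS"
fi

# Trigger MANUALCOLLECT to collect data after the corrective action.
echo "ALL##NS4PROD=aimanager##pod##teams######" >> "$CMD_EXEC_RESULT_MCCFG_FILE"

# Trigger EXTRA mode to collect data for an additional namespace.
echo "openshift-insights" > "$CMD_EXEC_RESULT_EXTRANS_FILE"

# Trigger CPFILES to copy a file from matching pods.
echo "NS4PROD=aimanager##^iaf-system-kafka####/opt/kafka/config/log4j.properties" >> "$CMD_EXEC_RESULT_CPFILES_FILE"
```

When the tool runs the script, it is expected to set the three variables itself, so the fallbacks are not used.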

Run baseline data collection

To run a baseline data collection, use any of the primary data collection modes, such as -cypDf, for example:

waiops-mustgather.sh -cypf -C <script> -e <namespace> -F 'NS4PROD=<prodname>,<podname-regex>,<container-name>,<filename>'

With the preceding command, you can run a baseline data collection that is followed by running a corrective action through a custom script. You can also check the result of the corrective action by using the -e option where <namespace> is the namespace where the custom script runs. You can also opt to copy files from the restarted pod through the -F option.

Notes:

  • In the preceding command, only -c is a primary data collection mode. The other options are supported, but should be used together with a primary mode.
  • If you want to run another round of data collection after your custom script runs, use a secondary data collection mode (other than clusternodes or cmdexec).

Pod specific data collection

If you have to run commands on a specific pod, but you do not know the full name of the pod, such as when you know only the UUID portion, you can use the manualcollect mode:

waiops-mustgather.sh -R -m ps-out:<configfile>

Example (<configfile>):

@ TAG##NAMESPACE##OBJTYPE##OBJNAME##CONTAINER##EXECPREP##EXECCMD
ps-out##NS4PROD=aimanager##pod##^cp4waiops-postgres-keeper######hostname;ps -ef

The pod name (the OBJNAME field) is a regular expression. The tool determines the full name of the pod automatically.

[OUTPUT - found in file 4-MANUALCOLLECT/<namespace>/<podname>.bash.exec]
cp4waiops-postgres-keeper-0
UID          PID    PPID  C STIME TTY          TIME CMD
stolon         1       0  1 Nov09 ?        02:09:11 stolon-keeper --data-dir /stolon-data
stolon        29       1  0 Nov09 ?        00:00:00 [create-template] <defunct>
stolon        54       1  0 Nov09 ?        00:15:06 postgres -D /stolon-data/postgres -c unix_socket_directories=/tmp
stolon        59      54  0 Nov09 ?        00:00:09 postgres: checkpointer
stolon        60      54  0 Nov09 ?        00:00:10 postgres: background writer
stolon        61      54  0 Nov09 ?        00:01:00 postgres: walwriter
stolon        62      54  0 Nov09 ?        00:00:19 postgres: autovacuum launcher
stolon        63      54  0 Nov09 ?        00:02:47 postgres: stats collector
stolon        64      54  0 Nov09 ?        00:00:00 postgres: logical replication launcher
stolon   2905619      54  0 05:39 ?        00:00:00 postgres: cp4aiops cp4aiops 10.254.12.46(50562) idle
stolon   2906589      54  0 05:43 ?        00:00:00 postgres: cp4aiops cp4aiops 10.254.12.46(53384) idle
stolon   2918996       0  0 06:34 ?        00:00:00 bash -c hostname;ps -ef
stolon   2919006 2918996  0 06:34 ?        00:00:00 ps -ef
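The following sketch writes the example configuration file shown above and checks its shape before the tool is run. The header line names seven fields that are separated by ##, so the check reports any line that does not split into seven fields. The CONFIG_FILE path and the check_fields function are hypothetical names. The sketch prints the manualcollect command rather than running it.

```shell
# Sketch only: CONFIG_FILE and check_fields are hypothetical names.
CONFIG_FILE="${CONFIG_FILE:-${TMPDIR:-/tmp}/ps-out.cfg}"

# Write the example configuration from this topic.
cat > "$CONFIG_FILE" <<'EOF'
@ TAG##NAMESPACE##OBJTYPE##OBJNAME##CONTAINER##EXECPREP##EXECCMD
ps-out##NS4PROD=aimanager##pod##^cp4waiops-postgres-keeper######hostname;ps -ef
EOF

# Report every line that does not have seven ##-separated fields.
check_fields() {
    awk -F'##' 'NF != 7 { printf "line %d: %d fields\n", NR, NF; bad = 1 } END { exit bad }' "$1"
}

# Print the command only when the file passes the check.
check_fields "$CONFIG_FILE" && echo "waiops-mustgather.sh -R -m ps-out:$CONFIG_FILE"
```

Empty fields, such as CONTAINER and EXECPREP in the example, still count because consecutive ## separators delimit empty values.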

Product specific data collection

If you need to collect product-specific data, you can run the plugins secondary collection mode:

waiops-mustgather.sh -P aimanager

Installation verification

If you want to check only whether an installation is successful, run the following command:

waiops-mustgather.sh -x X -DR

The preceding command scans the product namespaces for missing resources or objects based on the configuration file that is specified in the <MUSTGATHER-DIR>/missingobj/config/missingobj-waiops-<prodver>.cfg file.
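Before you run the verification, it can help to see which configuration files are present. The following sketch lists the files that match the path pattern above. The list_missingobj_configs function and the MUSTGATHER_DIR variable (standing in for <MUSTGATHER-DIR>) are hypothetical names and are not part of the tool.

```shell
# Sketch only: list_missingobj_configs and MUSTGATHER_DIR are hypothetical names.
list_missingobj_configs() {
    for cfg in "$1"/missingobj/config/missingobj-waiops-*.cfg; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$cfg" ] && basename "$cfg"
    done
    return 0
}

# Defaults to the current directory when MUSTGATHER_DIR is not set.
list_missingobj_configs "${MUSTGATHER_DIR:-.}"
```

An empty result suggests that the directory is wrong or that no configuration file is available for your product version.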

Copying pod files and directories

If you need to copy files from a pod, you can run a command similar to the following command:

waiops-mustgather.sh -F 'NS4PROD=aimanager##aimanager-aio-log-anomaly-detector-####/opt/ai4it/*.log*'

The preceding command copies files and directories that match the filename wildcard of /opt/ai4it/*.log* from pods that match regex aimanager-aio-log-anomaly-detector- in the auto-resolved IBM Cloud Pak for AIOps namespace.
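As a sketch, the -F value can be assembled from its parts to avoid miscounting the ## separators. The field order (product, pod name regex, container name, file pattern) is inferred from the examples in this topic, and the cpfiles_param function is a hypothetical helper. The sketch prints the command rather than running it.

```shell
# Sketch only: cpfiles_param is a hypothetical helper.
# Arguments: product, pod name regex, container name (can be empty), file pattern.
cpfiles_param() {
    printf 'NS4PROD=%s##%s##%s##%s\n' "$1" "$2" "$3" "$4"
}

# Quote the file pattern so that the local shell does not expand the wildcard.
PARAM=$(cpfiles_param aimanager 'aimanager-aio-log-anomaly-detector-' '' '/opt/ai4it/*.log*')

echo "waiops-mustgather.sh -F '$PARAM'"
```

Use straight single quotes around the value on the command line. Typographic quotation marks are not recognized by the shell.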

Complex command

If needed, you can run a more complex command:

waiops-mustgather.sh -aypf -e ibm-common-services,cs-control,openshift-operators -P aimanager -G 'ALL:error|fail|exception|login' -C /tmp/test.sh -E 'TEST1=uname -a##TEST2=date' -m cqlsh:manualcollect.csv -n 'SQL="describe tables"' -F spark
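To make the preceding command easier to read, the following sketch builds the same argument list one option at a time. Each comment uses the placeholder name for that option from the Command syntax section. The sketch only prints the arguments, one per line, and the ARG_COUNT variable is an illustrative name. It overwrites the positional parameters of the current shell, so try it in a separate script.

```shell
# Sketch only: builds the argument list of the complex command and prints it.
set -- -aypf                                                        # primary mode -a with the -y, -p, and -f flags
set -- "$@" -e ibm-common-services,cs-control,openshift-operators   # extra-namespaces
set -- "$@" -P aimanager                                            # plugins
set -- "$@" -G 'ALL:error|fail|exception|login'                     # filename-regex:pattern
set -- "$@" -C /tmp/test.sh                                         # cmdexec-file
set -- "$@" -E 'TEST1=uname -a##TEST2=date'                         # cmdexec-envvar
set -- "$@" -m cqlsh:manualcollect.csv                              # manual-collection-param
set -- "$@" -n 'SQL="describe tables"'                              # manual-collection-envvar
set -- "$@" -F spark                                                # cpfiles-param

ARG_COUNT=$#
printf '%s\n' waiops-mustgather.sh "$@"
```

To run the command instead of printing it, replace the last line with waiops-mustgather.sh "$@".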

Miscellaneous use cases

Scale down the MustGather data collection

If the cluster that you are working on is slow or throttles a lot, you can scale down the MustGather through the selective primary collection mode:

waiops-mustgather.sh -Xy -k deployments,sts,pods,jobs,pvc

The preceding command disables custom data collection (-X), enables YAML collection (-y) and runs the MustGather tool with the selective primary collection mode (-k) to collect data on resources such as deployments, statefulsets, pods, jobs, and pvc.

Command help

If you need help with running the MustGather tool, run the tool with the help option to view more details about the command:

waiops-mustgather.sh -h