IBM Cloud Pak for AIOps MustGather tool
Learn how to use the IBM Cloud Pak® for AIOps MustGather tool to gather information that is needed and relevant to a problem before you open a case with IBM Support.
Important: This MustGather tool gathers data for IBM Cloud Pak for AIOps deployments. For gathering data for Infrastructure Automation, see Infrastructure Automation data collection for IBM Support cases.
The IBM Cloud Pak for AIOps MustGather tool is a wrapper for running oc commands. With this tool, you can complete data collection tasks and view analytics on the gathered data. After you run the tool to gather data, upload the output data file that is under your /tmp directory (or another directory if you used the -o option) to the IBM Support website.
The tool generates a report that shows critical data about the status and health of the cluster and installation. This report gives you an overview of your cluster to help you spot and solve problems. The report can provide basic cluster analysis to identify cluster node resource utilization, warnings, and problematic resources. The report can also provide incident analysis, for example to identify whether a given pod is in the CrashLoopBackOff state.
The tool supports data collection for the following types of data:
Data type | Collection mode |
---|---|
oc get and oc describe (or kubectl) commands | This data is collected through the primary data collection modes. |
Pod logs | This data is collected through the primary data collection modes. |
Application logs within pods | This data collection is supported by the CPFILES secondary data collection mode. |
Command output for commands that run through pods | This data collection is supported by the MANUALCOLLECT secondary data collection mode. |
Output of commands that run on client machines (custom script) | This data collection is supported by the CMDEXEC secondary data collection mode. |
Logs or output of commands that run on individual cluster nodes, such as journal, systemctl, iptables | This data collection is supported by the CLUSTERNODES secondary data collection mode. |
Command syntax
waiops-mustgather.sh -[a|b|c|l] [-k resourname] [-z modules] [-d] [-L numlines] [-C cmdexec-file:cmdexec-env] [-E cmdexec-envvar] [-T timeout-value] [-m manual-collection-param] [-n manual-collection-envvar] [-j objname] [-w manual-collection-exec-timeout] [-e extra-namespaces:extra-resources] [-x product-version/path-to-config-file] [-A compliance-param] [-P plugins] [-Z plugins-envvar] [-Q plugins-timeout] [-F cpfiles-param] [-K] [-V view-only-module] [-y] [-J] [-p] [-t] [-g] [-f] [-R] [-S] [-s] [-u node-admin-username] [-i path-to-sshkey] [-o output-directory] [-G filename-regex:pattern] [-W primary-mode-var] [-N] [-I] [-X] [-Y] [-U] [-M] [-D] [-v] [-h]
- For details about the command options, see Data collection modes for the IBM Cloud Pak for AIOps MustGather tool.
- For details about running a custom script with the tool, see Run a custom script to complete corrective actions.
Running the tool
Primary use cases
- Installation related problems
- Application related problems
- IBM Cloud Pak for AIOps related problems
- Log anomaly detection related problems
- Metric anomaly detection related problems
- Lifecycle related problems
- Datalayer related problems
- ChatOps related problems
- Healthcheck related problems
Installation related problems
To gather data for installation related problems, run the following command:
waiops-mustgather.sh -O install
The alias -O install is expanded to -cypDfd, which uses the primary comprehensive (-c) mode and the clusternode (-d) mode to gather data for installation related problems.
To gather data for general non-installation related problems, run the following command:
waiops-mustgather.sh -O general
The alias -O general is expanded to -cypDf, which is used to gather data for common problems.
Overall operation related problems
To gather data for overall IBM Cloud Pak for AIOps operation related problems, run the following command:
waiops-mustgather.sh -O aimgr
To gather raw data for IBM Cloud Pak for AIOps from its various data sources, such as Kafka, Flink, ElasticSearch and Postgres, run the following command:
waiops-mustgather.sh -O aimgr-data
To learn more about what aliases are available, use the command waiops-mustgather.sh -O .listalias
Important: The ALIAS mode (-O xxx) expands an alias into a predefined command. If you need to change or add any option, revert to the conventional way of running the MustGather tool by providing all the options or flags manually. For example, if a secret YAML is needed, use -cypDfS (plus any other options or flags).
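As a rough illustration of this note, the following sketch maps the two aliases whose expansions are documented on this page (install and general) to their flag strings and appends any extra flags, so that the full command can be printed and then run manually. The function name expand_alias is hypothetical and is not part of the MustGather tool. Expansions for the other aliases are not listed here, so the sketch rejects them rather than guessing.

```shell
#!/bin/sh
# Hypothetical helper (not part of waiops-mustgather.sh): print the flag
# string that an alias expands to, with optional extra flags appended.
# Only the expansions documented on this page are included.
expand_alias() {
  alias_name=$1
  extra_flags=${2:-}
  case "$alias_name" in
    install) base="-cypDfd" ;;
    general) base="-cypDf" ;;
    *)
      echo "expansion for alias '$alias_name' is not documented here" >&2
      return 1
      ;;
  esac
  printf '%s%s\n' "$base" "$extra_flags"
}

# Print (do not run) the manual equivalent of "-O general" plus secret YAML (-S).
echo "waiops-mustgather.sh $(expand_alias general S)"
```

Printing the command instead of running it keeps the sketch safe to try on a machine where the tool is not installed.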
Log anomaly detection related problems
To gather data for log anomaly related problems, run the following command:
waiops-mustgather.sh -O lad
Metric anomaly detection related problems
To gather data for metric anomaly related problems, run the following command:
waiops-mustgather.sh -O mad
Lifecycle related problems
To gather data for Lifecycle related problems, run the following command:
waiops-mustgather.sh -O lifecycle
Datalayer related problems
To gather data for Datalayer related problems, run the following command:
waiops-mustgather.sh -O datalayer
ChatOps related problems
To gather data for ChatOps related problems, run the following command:
waiops-mustgather.sh -O chatops
Healthcheck related problems
Use the healthcheck option to gather data to check whether your installation was successful or to analyze how IBM Cloud Pak for AIOps is running.
For more information about using this option, see Running the MustGather Healthcheck.
Additional use cases
- Run script to complete corrective actions
- Run baseline data collection
- Pod specific data collection
- Product specific data collection
- Installation verification
- Copying pod files and directories
- Complex command
Run a custom script to complete corrective actions
If you want to complete a corrective action, such as to delete a pod to restart it, you can write a script and run the following command to run the script:
waiops-mustgather.sh -C <script>
waiops-mustgather.sh -R -C <script>
Where <script> is the script that contains the corrective action to run.
The preceding command runs only your script. To perform data collection before running the script, you can opt for any of the primary modes.
Notes: If needed, you can run the secondary data collection modes MANUALCOLLECT, EXTRA, and CPFILES within your custom script.
The following examples show how to trigger the MANUALCOLLECT, EXTRA, and CPFILES modes:
- To make sure that the corrective actions are successful, you can trigger MANUALCOLLECT to collect some data within the script:
  echo "ALL##NS4PROD=aimanager##pod##teams######" >> $CMD_EXEC_RESULT_MCCFG_FILE
- You can also trigger EXTRA mode to collect namespace data within the script:
  echo "openshift-insights" > $CMD_EXEC_RESULT_EXTRANS_FILE
- If you need to copy some files from some pods to make sure that your script ran successfully:
  echo "NS4PROD=aimanager##^iaf-system-kafka####/opt/kafka/config/log4j.properties" >> $CMD_EXEC_RESULT_CPFILES_FILE
For an example custom script and some of the variables and functions that you can include within it, see Example: custom script.
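The three triggers can be combined in one custom script. The following is a minimal sketch, not the official example script. The CMD_EXEC_RESULT_* variables are the ones used in the preceding examples; the fallback to files in a temporary directory is an assumption added here so that the script can also be tried outside the tool, and the corrective action itself is left as a placeholder.

```shell
#!/bin/sh
# Sketch of a custom script for the -C option.
# Assumption: when the tool does not provide the CMD_EXEC_RESULT_* variables,
# fall back to scratch files so that the script still runs standalone.
scratch_dir=$(mktemp -d)
: "${CMD_EXEC_RESULT_MCCFG_FILE:=$scratch_dir/mccfg}"
: "${CMD_EXEC_RESULT_EXTRANS_FILE:=$scratch_dir/extrans}"
: "${CMD_EXEC_RESULT_CPFILES_FILE:=$scratch_dir/cpfiles}"

# Placeholder for the corrective action. Replace the echo with the real
# action, for example an oc command that deletes the pod to restart it.
echo "corrective action would run here"

# Ask MANUALCOLLECT to collect pod data after the action.
echo "ALL##NS4PROD=aimanager##pod##teams######" >> "$CMD_EXEC_RESULT_MCCFG_FILE"

# Ask EXTRA mode to collect data from an additional namespace.
echo "openshift-insights" > "$CMD_EXEC_RESULT_EXTRANS_FILE"

# Ask CPFILES to copy a file from the matching pods.
echo "NS4PROD=aimanager##^iaf-system-kafka####/opt/kafka/config/log4j.properties" >> "$CMD_EXEC_RESULT_CPFILES_FILE"
```

Quoting the variable expansions guards against result file paths that contain spaces.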
Run baseline data collection
To run a baseline data collection, use any of the primary data collection modes, such as -cypDf, for example:
waiops-mustgather.sh -cypf -C <script> -e <namespace> -F 'NS4PROD=<prodname>,<podname-regex>,<container-name>,<filename>'
With the preceding command, you can run a baseline data collection that is followed by running a corrective action through a custom script. You can also check the result of the corrective action by using the -e option where <namespace> is the namespace where the custom script runs. You can also opt to copy files from the restarted pod through the -F option.
Notes:
- Only -c is the primary data collection mode. Other options are supported, but should be used together.
- If you want to run another round of data collection after your custom script runs, use a secondary data collection mode (other than clusternodes or cmdexec).
Pod specific data collection
If you have to run commands on a specific pod, but you do not know the full name of the pod, such as when you know only the UUID portion, you can use the manualcollect mode:
waiops-mustgather.sh -R -m ps-out:[configfile]
Example (<configfile>):
@ TAG##NAMESPACE##OBJTYPE##OBJNAME##CONTAINER##EXECPREP##EXECCMD
ps-out##NS4PROD=aimanager##pod##^cp4waiops-postgres-keeper######hostname;ps -ef
The podname is a regex. The tool determines the full name of the pod automatically.
[OUTPUT - found in file 4-MANUALCOLLECT/<namespace>/<podname>.bash.exec]
cp4waiops-postgres-keeper-0
UID PID PPID C STIME TTY TIME CMD
stolon 1 0 1 Nov09 ? 02:09:11 stolon-keeper --data-dir /stolon-data
stolon 29 1 0 Nov09 ? 00:00:00 [create-template] <defunct>
stolon 54 1 0 Nov09 ? 00:15:06 postgres -D /stolon-data/postgres -c unix_socket_directories=/tmp
stolon 59 54 0 Nov09 ? 00:00:09 postgres: checkpointer
stolon 60 54 0 Nov09 ? 00:00:10 postgres: background writer
stolon 61 54 0 Nov09 ? 00:01:00 postgres: walwriter
stolon 62 54 0 Nov09 ? 00:00:19 postgres: autovacuum launcher
stolon 63 54 0 Nov09 ? 00:02:47 postgres: stats collector
stolon 64 54 0 Nov09 ? 00:00:00 postgres: logical replication launcher
stolon 2905619 54 0 05:39 ? 00:00:00 postgres: cp4aiops cp4aiops 10.254.12.46(50562) idle
stolon 2906589 54 0 05:43 ? 00:00:00 postgres: cp4aiops cp4aiops 10.254.12.46(53384) idle
stolon 2918996 0 0 06:34 ? 00:00:00 bash -c hostname;ps -ef
stolon 2919006 2918996 0 06:34 ? 00:00:00 ps -ef
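The configuration entry in this example is seven fields joined by ##, in the order given by the header line (TAG, NAMESPACE, OBJTYPE, OBJNAME, CONTAINER, EXECPREP, EXECCMD). The following sketch assembles such an entry from its fields, which can help avoid miscounting the # characters when fields are left empty. The helper name mc_line is hypothetical and is not provided by the tool.

```shell
#!/bin/sh
# Hypothetical helper: join the seven MANUALCOLLECT fields with "##".
# Field order follows the header line of the example config file:
# TAG, NAMESPACE, OBJTYPE, OBJNAME, CONTAINER, EXECPREP, EXECCMD
mc_line() {
  if [ "$#" -ne 7 ]; then
    echo "usage: mc_line TAG NAMESPACE OBJTYPE OBJNAME CONTAINER EXECPREP EXECCMD" >&2
    return 1
  fi
  printf '%s##%s##%s##%s##%s##%s##%s\n' "$@"
}

# Rebuild the example entry; empty strings leave CONTAINER and EXECPREP blank.
mc_line ps-out NS4PROD=aimanager pod '^cp4waiops-postgres-keeper' '' '' 'hostname;ps -ef'
```

The output can be appended to a config file and passed to the -m option as shown earlier in this section.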
Product specific data collection
If you need to collect product-specific data, you can run the plugins secondary collection mode:
waiops-mustgather.sh -P aimanager
Installation verification
If you just want to check whether an installation is successful:
waiops-mustgather.sh -x X -DR
The preceding command scans the product namespaces for missing resources or objects based on the configuration in the <MUSTGATHER-DIR>/missingobj/config/missingobj-waiops-<prodver>.cfg file.
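Before you run the verification, it can be useful to confirm that the configuration file for your product version exists. The following sketch builds the path from the pattern given above and tests for the file. The function names are hypothetical, and the directory and version values that you pass are your own; nothing here is provided by the tool.

```shell
#!/bin/sh
# Hypothetical helpers: build and check the missingobj configuration path,
# using the <MUSTGATHER-DIR>/missingobj/config/missingobj-waiops-<prodver>.cfg
# pattern from this section.
missingobj_cfg() {
  mustgather_dir=$1
  prodver=$2
  printf '%s/missingobj/config/missingobj-waiops-%s.cfg\n' "$mustgather_dir" "$prodver"
}

check_missingobj_cfg() {
  cfg=$(missingobj_cfg "$1" "$2")
  if [ -f "$cfg" ]; then
    echo "found: $cfg"
  else
    echo "missing: $cfg" >&2
    return 1
  fi
}

# Show the path pattern with the placeholders left unresolved.
missingobj_cfg '<MUSTGATHER-DIR>' '<prodver>'
```

Call check_missingobj_cfg with the directory where the tool is unpacked and your product version to see whether a matching file is present.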
Copying pod files and directories
If you need to copy files from a pod, you can run a command similar to the following command:
waiops-mustgather.sh -F 'NS4PROD=aimanager##aimanager-aio-log-anomaly-detector-####/opt/ai4it/*.log*'
The preceding command copies files and directories that match the filename wildcard of /opt/ai4it/*.log* from pods that match the regex aimanager-aio-log-anomaly-detector- in the auto-resolved IBM Cloud Pak for AIOps namespace.
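To make the structure of the -F parameter easier to see, the following sketch splits a parameter on ## and labels each part. The field names (namespace, pod regex, container, path) are inferred from the -F template in the baseline data collection example and are an assumption, not names defined by the tool. The function cpfiles_fields is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split a CPFILES parameter on "##" and label the parts.
# Assumed field order: namespace, pod-name regex, container name, file path.
cpfiles_fields() {
  printf '%s\n' "$1" | awk -F'##' '{
    print "namespace=" $1
    print "pod_regex=" $2
    print "container=" $3
    print "path=" $4
  }'
}

# Single quotes keep the local shell from expanding the *.log* wildcard.
cpfiles_fields 'NS4PROD=aimanager##aimanager-aio-log-anomaly-detector-####/opt/ai4it/*.log*'
```

In this example the container field is empty, which is why four # characters appear in a row before the path.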
Complex command
If needed, you can run a more complex command:
waiops-mustgather.sh -aypf -e ibm-common-services,cs-control,openshift-operators -P aimanager -G 'ALL:error|fail|exception|login' -C /tmp/test.sh -E 'TEST1=uname -a##TEST2=date' -m cqlsh:manualcollect.csv -n 'SQL="describe tables"' -F spark
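A long command such as this one is easier to review when each option sits on its own line. The following sketch holds the same arguments in a function and prints them one per line; the comments name each option by using the placeholder names from the Command syntax section. The function complex_args is hypothetical, and the sketch only prints the arguments without running the tool.

```shell
#!/bin/sh
# The complex command, one option per line. Option meanings, taken from the
# placeholder names in the Command syntax section:
#   -aypf  primary mode -a combined with the -y, -p, and -f flags
#   -e     extra-namespaces:extra-resources
#   -P     plugins
#   -G     filename-regex:pattern
#   -C     cmdexec-file:cmdexec-env
#   -E     cmdexec-envvar
#   -m     manual-collection-param
#   -n     manual-collection-envvar
#   -F     cpfiles-param
complex_args() {
  set -- \
    -aypf \
    -e ibm-common-services,cs-control,openshift-operators \
    -P aimanager \
    -G 'ALL:error|fail|exception|login' \
    -C /tmp/test.sh \
    -E 'TEST1=uname -a##TEST2=date' \
    -m cqlsh:manualcollect.csv \
    -n 'SQL="describe tables"' \
    -F spark
  # One argument per line, so that the quoting boundaries are visible.
  printf '%s\n' "$@"
}

complex_args
```

Values that contain spaces, pipes, or double quotation marks are single-quoted so that each one reaches the tool as a single argument.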
Miscellaneous use cases
Scale down the MustGather data collection
If the cluster that you are working on is slow or throttles a lot, you can scale down the MustGather through the selective primary collection mode:
waiops-mustgather.sh -Xy -k deployments,sts,pods,jobs,pvc
The preceding command disables custom data collection (-X), enables YAML collection (-y) and runs the MustGather tool with the selective primary collection mode (-k) to collect data on resources such as deployments, statefulsets, pods, jobs, and pvc.
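If the resource list for -k is assembled by a script, it needs to be joined with commas and no spaces, as in the command above. The following sketch shows one way to do that. The helper name join_resources is hypothetical, and the command is printed rather than run.

```shell
#!/bin/sh
# Hypothetical helper: join resource names with commas for the -k option.
join_resources() {
  # Inside the subshell, "$*" joins the arguments with the first IFS character.
  ( IFS=,; printf '%s\n' "$*" )
}

# Print the scaled-down command for the resource list used in this section.
echo "waiops-mustgather.sh -Xy -k $(join_resources deployments sts pods jobs pvc)"
```

Running the join in a subshell keeps the IFS change from leaking into the rest of the script.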
Command help
If you need help with running the MustGather tool, use the help option to view more details about the command:
waiops-mustgather.sh -h