Advanced tips for using the IBM Cloud Pak for AIOps MustGather tool
Learn more about advanced techniques that you can use with the IBM Cloud Pak® for AIOps MustGather tool to gather information for opening a case with IBM® Support.
Tip: Use manual collection mode to run commands
If you need to run oc exec commands on pods, you can use the manual collection (MANUALCOLLECT) mode. This mode can be configured to complete specific tasks, such as "get all Elasticsearch indexes".
The MustGather tool includes some default task configurations. For more information, review the manualcollect/<version>/mc.cfg file within the MustGather tool installation package.
Example command:
# ./waiops-mustgather.sh -R -m aimgr-indices:mc.cfg
Example output:
===================================================
[MANUALCOLLECT] OUTPUT OF EXECCMD
===================================================
[CFGLINE = lifecycle-policy##cp4waiops##pod##^aiops-ir-lifecycle-policy-registry-svc-######curl -u$(cat $SYSTEMAUTH_BINDING_DIR/username):$(cat $SYSTEMAUTH_BINDING_DIR/password) http://localhost:$PORT/policyregistry.ibm-netcool-prod.aiops.io/v1alpha/system/cfd95b7e-3bc7-4006-a4a8-a73a79c71255]
[FILE = /tmp/waiops-manualcollect-07112022-190154/4-MANUALCOLLECT/TAG=lifecycle-policy/cp4waiops/aiops-ir-lifecycle-policy-registry-svc-7587b88cdb-rk594.bash.exec]
[POD = aiops-ir-lifecycle-policy-registry-svc-7587b88cdb-rk594]
Defaulted container "aiops-ir-lifecycle-policy-registry-svc" out of: aiops-ir-lifecycle-policy-registry-svc, policyregistry-checkdb (init)
[
{
"tenantid": "cfd95b7e-3bc7-4006-a4a8-a73a79c71255",
"policyid": "80301820-406a-11ed-bf66-4dfc9e76db68",
"configuration": "{\"executionPriority\":80,\"state\":\"enabled\",\"spec\":{\"triggers\":[{\"entityId\":\"alert\",\"triggerId\":\"aiops.ibm.com/trigger/alert-pre-update\",\"arguments\":{\"condition\":\"{{#if (exists alert.sender.name)}} {{ alert.sender.name }} == \\\"Log Anomaly\\\" and not {{ isEmpty alert.insights }} {{else}} false {{/if}}\"}},{\"entityId\":\"alert\",\"triggerId\":\"aiops.ibm.com/trigger/alert-pre-create\",\"arguments\":{\"condition\":\"{{#if (exists alert.sender.name)}} {{ alert.sender.name }} == \\\"Log Anomaly\\\" and not {{ isEmpty alert.insights }} {{else}} false {{/if}}\"}}],\"actions\":[{\"actionId\":\"aiops.ibm.com/action/internal/partition\",\"arguments\":{\"partitionKey\":{\"$variable\":\"alert.id\"},\"actions\":[{\"actionId\":\"aiops.ibm.com/action/internal/array/foreach\",\"arguments\":{\"array\":{\"$variable\":\"alert.insights\"},\"condition\":\"{{element.type}} == \\\"aiops.ibm.com/insight-type/lad/templates\\\"\",\"foreach\":[{\"actionId\":\"aiops.ibm.com/action/internal/insights/aggregate\",\"arguments\":{\"aggregationKey\":{\"$variable\":\"alert.id\"},\"insightDetails\":{\"$variable\":\"element.details\"},\"insightType\":{\"$variable\":\"element.type\"},\"insightId\":{\"$variable\":\"element.id\"}},\"output\":\"alert.insights.[@id=insight-lad]\"}]}}]}}]},\"hash\":\"7d1f1b4fbb433259aaf0bf2ffc3305de06e41d98\",\"revision\":\"ee305a657825c73944447ec2ffdedd6d6519530d\",\"hotfields\":[]}",
"entityid": "[\"alert\",\"alert\"]",
"metadata": "{\"labels\":{\"ibm.com/is-default\":\"true\"},\"name\":\"Aggregate log anomaly detection count vectors\",\"description\":\"Aggregates the log message count vectors output by log anomaly events, across the event occurences contributing to a given alert.\\n\",\"createdBy\":{\"id\":\"system\",\"type\":\"system\"},\"lastUpdatedBy\":{\"id\":\"system\",\"type\":\"system\",\"changeDetails\":\"Created\"},\"lastUpdatedTimestamp\":\"2022-09-30T02:49:27.969Z\",\"creationTimestamp\":\"2022-09-30T02:49:27.969Z\"}",
"triggerid": "[\"aiops.ibm.com/trigger/alert-pre-update\",\"aiops.ibm.com/trigger/alert-pre-create\"]"
},
...
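The CFGLINE value in the output above shows that a manual collection task is a single line whose fields are separated by ##. As a rough sketch for inspecting such a line before you add it to a configuration file, the following shell function prints each field on its own numbered row. The function name print_cfg_fields is hypothetical and is not part of the MustGather tool, and the sketch does not assign a meaning to any field position; check the mc.cfg file in the package for that.

```shell
# Hypothetical helper (not part of the MustGather tool): print the
# "##"-separated fields of one configuration line, one per row.
print_cfg_fields() {
    printf '%s\n' "$1" |
        awk -F'##' '{ for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i }'
}

# Single quotes keep the line literal, so nothing in it is expanded by the shell.
print_cfg_fields 'ALL##NS4PROD=aimanager##pod##teams####'
```

Empty fields are printed as [], which makes it easier to spot a missing or extra ## separator.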
Tip: Use CPFILES to copy files from pods
If you need to copy specific files from pods, you can use the CPFILES option.
For example, the following configuration uses the MustGather tool to pull the /conf/redis/redis.conf file from all pods that match the regular expression ^c-example-redis-m-. This example uses the following parameters:
- NS4PROD= is a special keyword that instructs the tool to auto-resolve the namespace for the product aimanager.
- The keyword and the regex are used instead of hardcoded values so that the configuration can be used across different environments without reconfiguration.
Example configuration (from the file cpfiles/cpfiles-waiops-320.cfg in the package):
test##NS4PROD=aimanager##^c-example-redis-m-####/conf/redis/redis.conf
Example output:
# pwd
/tmp/waiops-cpfiles-01042022-173635/8-CPFILES/waiops32
# ls -l c-example-redis-m-0/
total 56
-rw-r--r-- 1 root root 51281 Apr 1 17:36 '^conf^redis^redis.conf'
-rw-r--r-- 1 root root 78 Apr 1 17:36 '^conf^redis^redis.conf.ls'
# ls -l c-example-redis-m-1
total 56
-rw-r--r-- 1 root root 51306 Apr 1 17:36 '^conf^redis^redis.conf'
-rw-r--r-- 1 root root 78 Apr 1 17:36 '^conf^redis^redis.conf.ls'
# ls -l c-example-redis-m-2
total 56
-rw-r--r-- 1 root root 51306 Apr 1 17:36 '^conf^redis^redis.conf'
-rw-r--r-- 1 root root 78 Apr 1 17:36 '^conf^redis^redis.conf.ls'
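In the listing above, the copied file /conf/redis/redis.conf appears as '^conf^redis^redis.conf', which suggests that the tool flattens the path inside the pod by replacing each / with ^. Assuming that naming pattern holds, the following sketch converts between the two forms so that you can locate a copied file in the output directory. Both function names are hypothetical and are not part of the MustGather tool.

```shell
# Hypothetical helpers; the "/" <-> "^" mapping is inferred from the
# example listing, not taken from the tool's documentation.
to_collected_name() {
    printf '%s\n' "$1" | tr '/' '^'
}

to_pod_path() {
    printf '%s\n' "$1" | tr '^' '/'
}

to_collected_name /conf/redis/redis.conf
to_pod_path '^conf^redis^redis.conf'
```

The reverse mapping is ambiguous for any file whose original name already contains a ^ character, so treat it as a convenience only.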
Tip: Run custom scripts with the MustGather tool
You can run your own script with the MustGather tool, for example, to complete corrective actions before the tool collects data. After the custom script finishes, the tool proceeds with the data collection.
- Custom script variable
- Custom script functions
- Example Command
- Example script
- Example script functions
- Example script results
Custom script variable
The following table lists some of the variables that you can include within a custom script:
Variable | Description |
---|---|
$OC_CMD | The system command used to collect data. You can use either an oc or kubectl command. |
$OC_WHOAMI | The current user. |
$AWK | The AWK used in the tool, which can be either awk or gawk (preferred). |
$OLDIFS | The original IFS. |
$PRODUCT_VERSION | The product version detected during runtime (on site). |
$MUSTGATHER_DIRNAME | The directory name where the MustGather tool is installed. |
$OUTDIR | The directory where the output data is written. |
$PROD_NAMESPACES_REFFILE | The product-to-namespace mapping reference file (Do not change). |
$PROD_NAMESPACES_UNIQUE_REFFILE | The unique product namespaces reference file (Do not change). |
$PROD_NAMESPACES_INVALID_STR | A constant string value that represents the failed getNamespaceToProdMapping() function. |
$CMD_EXEC_OUTPUT_DIR | The directory where the output of CMDEXEC is written to. |
$CMD_EXEC_PRODUCT_NSFILE | A copy of the original product namespaces reference file that needs to be used by the inherited getNamespaceToProdMapping() function. |
$CMD_EXEC_RESULT_MCCFG_FILE | A file that the custom script can write to for running the MANUALCOLLECT secondary mode. |
$CMD_EXEC_RESULT_EXTRANS_FILE | A file that the custom script can write to for running the EXTRA (namespace) secondary mode. |
$CMD_EXEC_RESULT_CPFILES_FILE | A file that the custom script can write to for running the CPFILES secondary mode. |
Custom script functions
The following table lists some of the functions that you can include within a custom script:
Functions | Description |
---|---|
getProdToNamespacesMapping() | A function to retrieve the product namespaces for a provided product name. Syntax: getProdToNamespacesMapping <product-name> $CMD_EXEC_PRODUCT_NSFILE Returns: - Product namespaces of the given product name or PROD_NAMESPACES_INVALID_STR. - Return status 0 for a successful execution and 1 for a failed execution. Product names: - aimanager = IBM Cloud Pak for AIOps - noihybrid = IBM Netcool Operations Insight (hybrid) - ics = IBM Cloud Pak foundational services |
getNamespaceToProdMapping() | A function to retrieve the product name for a provided product namespace. Syntax: getNamespaceToProdMapping <product-namespace> $CMD_EXEC_PRODUCT_NSFILE Returns: Product name of the given product namespace. |
getResourceInstances() | A function to retrieve a list of resource instances. Syntax: getResourceInstances <resource type> <resource name regex> <resource namespace> Where <resource namespace> is optional and defaults to --all-namespaces. Returns: - A list of resources that match the provided details, or a blank string if the command fails or no resource is detected. - Return status 0 for a successful execution and 1 for a failed execution. |
getUniqueProdNamespaces() | A function to retrieve a list of unique product namespaces. Syntax: getUniqueProdNamespaces $CMD_EXEC_PRODUCT_NSFILE Returns: A list of unique product namespaces, or a blank string if the function fails. |
checkUserPriv() | A function to check whether the current user has the privilege to complete a specific action (verb) on a specific resource in a specific namespace. Syntax: checkUserPriv <verb> <resource type> <resource namespace> Returns: 0 to indicate that the user has the required privilege and 1 to indicate that the user does not have the required privilege. <verb> reference: Kubernetes API Concepts. |
isPodOk() | A function to determine whether a pod is in RUNNING state and all its containers are up. Syntax: isPodOk <namespace> <podname> Returns: 0 for an OK pod and 1 for a pod that is not OK. |
isPodLineOk() | A function to determine whether a pod is in RUNNING state and all its containers are up. Syntax: isPodLineOk <'oc get --no-headers' output of the pod> Returns: 0 for an OK pod and 1 for a pod that is not OK. |
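Because getProdToNamespacesMapping reports failure through both its return status and the PROD_NAMESPACES_INVALID_STR value, a custom script can check both before it uses the result. The following sketch shows one way to do that. So that it can run outside the tool, it defines stand-in versions of the function and the variables; the stand-in body, the namespace cp4waiops returned for aimanager, the INVALID placeholder value, and the wrapper name resolve_namespace are all assumptions for illustration only. Inside a real custom script, remove the stand-ins and rely on what the tool provides.

```shell
# --- Stand-ins so the sketch runs outside the MustGather tool (assumptions) ---
PROD_NAMESPACES_INVALID_STR="INVALID"   # placeholder, not the real constant
CMD_EXEC_PRODUCT_NSFILE="/dev/null"     # the tool sets this at run time
getProdToNamespacesMapping() {
    # Fake lookup: only "aimanager" resolves, to an example namespace.
    if [ "$1" = "aimanager" ]; then
        echo "cp4waiops"
        return 0
    fi
    echo "$PROD_NAMESPACES_INVALID_STR"
    return 1
}
# --- End of stand-ins ---

# Hypothetical wrapper: resolve a product name and reject both failure signals.
resolve_namespace() {
    ns=$(getProdToNamespacesMapping "$1" "$CMD_EXEC_PRODUCT_NSFILE") || return 1
    [ -n "$ns" ] && [ "$ns" != "$PROD_NAMESPACES_INVALID_STR" ] || return 1
    echo "$ns"
}

if NS=$(resolve_namespace aimanager); then
    echo "NAMESPACE=$NS"
else
    echo "Unable to resolve namespace" >&2
fi
```

Checking the return status first keeps the failure path explicit, and the string comparison guards against a caller that ignores the status.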
Example Command
waiops-mustgather.sh -R -C /tmp/elite.sh##/tmp/env.in -Y
The previous command uses an environment variable file, /tmp/env.in, to pass variables into the script. The file contains the following lines:
POD=teams
CMD=date > /tmp/date.out
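A quoted here-document is one convenient way to create such a file, because the shell then writes the lines literally and does not act on the > in the CMD value. The following sketch writes to a temporary path instead of /tmp/env.in; ENV_FILE is a hypothetical variable name used only in this example.

```shell
# Write the variable file literally; the quoted 'EOF' disables expansion
# and redirection inside the here-document.
ENV_FILE=$(mktemp)
cat > "$ENV_FILE" <<'EOF'
POD=teams
CMD=date > /tmp/date.out
EOF

# Show what was written.
cat "$ENV_FILE"
```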
Example script
The following code shows an example custom script:
# Resolve the NOI namespace by using the function that the tool provides.
NOI_NAMESPACE=$(getProdToNamespacesMapping noi $CMD_EXEC_PRODUCT_NSFILE)
if [ ${#NOI_NAMESPACE} -gt 0 ]
then
    echo "[$(date)] Detected namespace = [$NOI_NAMESPACE]"
    IMPACTGUI_POD=$(getResourceInstances pod impactgui $NOI_NAMESPACE)
    if [ ${#IMPACTGUI_POD} -gt 0 ]
    then
        echo "[$(date)] Detected IMPACT GUI pod = [$IMPACTGUI_POD]"
        # Corrective action: delete the pod, then wait until it is running again.
        oc delete pod -n $NOI_NAMESPACE $IMPACTGUI_POD
        WAITFORPOD=0
        while [ $WAITFORPOD -eq 0 ]
        do
            if isPodOk "$NOI_NAMESPACE" "$IMPACTGUI_POD"
            then
                # Collect the logs in the pod and extract the name of the log file.
                OUTFILE=$($OC_CMD exec -n $NOI_NAMESPACE $IMPACTGUI_POD -- /opt/IBM/tivoli/impact/bin/nci_collect_logs | grep 'Netcool/Impact log file is' | $AWK '{ print $7 }')
                if [ ${#OUTFILE} -gt 0 ]
                then
                    # Request a MANUALCOLLECT task and a CPFILES copy for the tool to run afterward.
                    echo "[$(date)] Creating COLLECTLOG timestamp file..."
                    echo "ALL##$NOI_NAMESPACE##pod##$(getResourceInstances pod impactgui $NOI_NAMESPACE)####touch /tmp/collectlog" > $CMD_EXEC_RESULT_MCCFG_FILE
                    echo "[$(date)] Detected IMPACT GUI logfile = [$OUTFILE]"
                    echo "$NOI_NAMESPACE##$IMPACTGUI_POD####$OUTFILE" > $CMD_EXEC_RESULT_CPFILES_FILE
                    WAITFORPOD=1
                else
                    echo "[$(date)] Unable to locate IMPACT GUI logfiles!"
                    exit 1
                fi
            else
                echo "[$(date)] IMPACT GUI pod is not running!"
                sleep 5
            fi
        done
    else
        echo "[$(date)] Unable to locate IMPACT GUI pod!"
        exit 1
    fi
else
    echo "[$(date)] Unable to determine NOI namespace!"
    exit 1
fi
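The wait loop in the example script retries every 5 seconds with no upper bound, so it never returns if the pod does not recover. One possible variation, sketched below, gives up after a fixed number of attempts. The function wait_for_pod and its parameters are hypothetical. The isPodOk function is provided by the tool at run time, so a stand-in that succeeds on the third call is defined here only to make the sketch runnable on its own.

```shell
# Stand-in for the tool-provided isPodOk (assumption for this sketch):
# reports "not OK" twice, then "OK" from the third call onward.
ISPODOK_CALLS=0
isPodOk() {
    ISPODOK_CALLS=$((ISPODOK_CALLS + 1))
    [ "$ISPODOK_CALLS" -ge 3 ]
}

# Hypothetical helper: poll isPodOk until it succeeds or attempts run out.
# Usage: wait_for_pod <namespace> <podname> <max-attempts> <sleep-seconds>
wait_for_pod() {
    attempt=1
    while [ "$attempt" -le "$3" ]; do
        if isPodOk "$1" "$2"; then
            return 0
        fi
        echo "[$(date)] Pod $2 is not ready (attempt $attempt of $3)"
        sleep "$4"
        attempt=$((attempt + 1))
    done
    return 1
}

# Example values only; a real script would pass its own namespace and pod name.
if wait_for_pod example-namespace example-pod 5 0; then
    echo "Pod is ready"
else
    echo "Timed out waiting for pod"
fi
```

In a real custom script, a non-zero return from wait_for_pod could be handled the same way as the other failure branches, with a message and exit 1.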
Example script functions
When you create your custom script, you can use the following functions and modes:
- The function getProdToNamespacesMapping can be used in any CMDEXEC script to obtain product namespaces:
  NS=$(getProdToNamespacesMapping aimanager $CMD_EXEC_PRODUCT_NSFILE)
  echo "NAMESPACE=$NS"
- The function getResourceInstances can be used in any CMDEXEC script to obtain resource instances based on a provided regular expression, such as to complete a corrective action:
  for MYPOD in $(getResourceInstances pod $POD $NS)
  do
      MYCMD="oc exec $MYPOD -n $NS -- bash -c '$CMD'"
      echo $MYCMD
      eval $MYCMD
  done
- The MANUALCOLLECT mode can be used to collect data within the script, such as to ensure that the corrective actions are successful:
  echo "ALL##NS4PROD=aimanager##pod##teams####" >> $CMD_EXEC_RESULT_MCCFG_FILE
- The EXTRA mode can be used to collect namespace data within the script:
  echo "openshift-insights" > $CMD_EXEC_RESULT_EXTRANS_FILE
- The CPFILES option can be used to copy files from pods to verify that your script ran successfully:
  echo "NS4PROD=aimanager##^iaf-system-kafka####/opt/kafka/config/log4j.properties" >> $CMD_EXEC_RESULT_CPFILES_FILE
Example script results
- When the command date > /tmp/date.out runs in the pod, the file /tmp/date.out is created. You can verify this by running the following command:
  oc exec -it aimanager-aio-chatops-teams-integrator-545bc7d4d-9w8fp -- bash -c 'ls -l /tmp; cat /tmp/date.out'
  Output:
  Defaulted container "chatops-teams-integrator" out of: chatops-teams-integrator, controller-is-ready (init)
  total 12
  -rw-r--r--. 1 default root 29 Apr 2 01:00 date.out
  -rwx------. 1 root root 701 Sep 14 2021 ks-script-vzv5jj6c
  -rwx------. 1 root root 291 Sep 14 2021 ks-script-y5xeen0d
  Sat Apr 2 01:00:25 UTC 2022
- Through MANUALCOLLECT, the data of the pod that matched the regex teams is collected. You can verify this by running the following commands:
  - Print the working directory:
    # pwd
    Output:
    /tmp/waiops-cmdexec-01042022-175950/4-MANUALCOLLECT/TAG=ALL/waiops32
  - List the directory files:
    # ls -l
    Output:
    total 4
    -rw-r--r-- 1 root root 180 Apr 1 18:00 oc_pod-aimanager-aio-chatops-teams-integrator-545bc7d4d-9w8fp.out
    drwxr-xr-x 2 root root 164 Apr 1 18:00 pod
  - List the pods files:
    total 12
    -rw-r--r-- 1 root root 1551 Apr 1 18:00 aimanager-aio-chatops-teams-integrator-545bc7d4d-9w8fp~chatops-teams-integrator.log
    -rw-r--r-- 1 root root 7315 Apr 1 18:00 aimanager-aio-chatops-teams-integrator-545bc7d4d-9w8fp.desc
- Through the EXTRA mode, data about the namespace openshift-insights is collected. You can verify this by running the following commands:
  - Print the working directory:
    # pwd
    Output:
    /tmp/waiops-cmdexec-01042022-175950/5-EXTRA/openshift-insights
  - List the directory files:
    # ls -l
    Output:
    total 28
    drwxr-xr-x 2 root root 132 Apr 1 18:00 configmaps
    drwxr-xr-x 2 root root 36 Apr 1 18:00 deployment.apps
    -rw-r--r-- 1 root root 1738 Apr 1 18:00 oc_all.out
    -rw-r--r-- 1 root root 194 Apr 1 18:00 oc_configmaps.out
    -rw-r--r-- 1 root root 328 Apr 1 18:00 oc_pod_scc.out
    -rw-r--r-- 1 root root 52 Apr 1 18:00 oc_pvc.out
    -rw-r--r-- 1 root root 1427 Apr 1 18:00 oc_secrets.out
    -rw-r--r-- 1 root root 155 Apr 1 18:00 oc_serviceaccounts.out
    drwxr-xr-x 2 root root 117 Apr 1 18:00 pod
    drwxr-xr-x 2 root root 35 Apr 1 18:00 pvc
    drwxr-xr-x 2 root root 86 Apr 1 18:00 replicaset.apps
    drwxr-xr-x 2 root root 4096 Apr 1 18:00 secrets
    drwxr-xr-x 2 root root 26 Apr 1 18:00 service
    drwxr-xr-x 2 root root 107 Apr 1 18:00 serviceaccounts
- Through CPFILES, the file /opt/kafka/config/log4j.properties is copied from the pods that match the regular expression ^iaf-system-kafka. You can verify this by running the following commands:
  - Print the working directory:
    # pwd
    Output:
    /tmp/waiops-cmdexec-01042022-175950/8-CPFILES/waiops32/iaf-system-kafka-0
  - List the directory files:
    # ls -l
    Output:
    total 12
    -rw-r--r-- 1 root root 4674 Apr 1 18:00 '^opt^kafka^config^log4j.properties'
    -rw-r--r-- 1 root root 77 Apr 1 18:00 '^opt^kafka^config^log4j.properties.ls'
  - View the first lines of the file to verify the file contents:
    # head ^opt^kafka^config^log4j.properties
    Output:
    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements. See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # The ASF licenses this file to You under the Apache License, Version 2.0
    # (the "License"); you may not use this file except in compliance with
    # the License. You may obtain a copy of the License at
    #
    # http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software