Advanced tips for using the IBM Cloud Pak for AIOps MustGather tool

Learn more about advanced techniques that you can use with the IBM Cloud Pak® for AIOps MustGather tool to gather information for opening a case with IBM® Support.

Tip: Use manual collection mode to run commands

If you need to run oc exec commands on pods, you can use the manual collection (MANUALCOLLECT) mode. This mode can be configured to complete specific tasks, such as getting all Elasticsearch indexes.

The MustGather tool includes some default task configurations. For more information, review the manualcollect/<version>/mc.cfg file in the MustGather tool installation package.

Example command:

# ./waiops-mustgather.sh -R -m aimgr-indices:mc.cfg

Example output:

===================================================
[MANUALCOLLECT] OUTPUT OF EXECCMD
===================================================

[CFGLINE = lifecycle-policy##cp4waiops##pod##^aiops-ir-lifecycle-policy-registry-svc-######curl -u$(cat $SYSTEMAUTH_BINDING_DIR/username):$(cat $SYSTEMAUTH_BINDING_DIR/password) http://localhost:$PORT/policyregistry.ibm-netcool-prod.aiops.io/v1alpha/system/cfd95b7e-3bc7-4006-a4a8-a73a79c71255]

[FILE = /tmp/waiops-manualcollect-07112022-190154/4-MANUALCOLLECT/TAG=lifecycle-policy/cp4waiops/aiops-ir-lifecycle-policy-registry-svc-7587b88cdb-rk594.bash.exec]

[POD = aiops-ir-lifecycle-policy-registry-svc-7587b88cdb-rk594]

Defaulted container "aiops-ir-lifecycle-policy-registry-svc" out of: aiops-ir-lifecycle-policy-registry-svc, policyregistry-checkdb (init)

[

  {

    "tenantid": "cfd95b7e-3bc7-4006-a4a8-a73a79c71255",

    "policyid": "80301820-406a-11ed-bf66-4dfc9e76db68",

    "configuration": "{\"executionPriority\":80,\"state\":\"enabled\",\"spec\":{\"triggers\":[{\"entityId\":\"alert\",\"triggerId\":\"aiops.ibm.com/trigger/alert-pre-update\",\"arguments\":{\"condition\":\"{{#if (exists alert.sender.name)}} {{ alert.sender.name }} == \\\"Log Anomaly\\\" and not {{ isEmpty alert.insights }} {{else}} false {{/if}}\"}},{\"entityId\":\"alert\",\"triggerId\":\"aiops.ibm.com/trigger/alert-pre-create\",\"arguments\":{\"condition\":\"{{#if (exists alert.sender.name)}} {{ alert.sender.name }} == \\\"Log Anomaly\\\" and not {{ isEmpty alert.insights }} {{else}} false {{/if}}\"}}],\"actions\":[{\"actionId\":\"aiops.ibm.com/action/internal/partition\",\"arguments\":{\"partitionKey\":{\"$variable\":\"alert.id\"},\"actions\":[{\"actionId\":\"aiops.ibm.com/action/internal/array/foreach\",\"arguments\":{\"array\":{\"$variable\":\"alert.insights\"},\"condition\":\"{{element.type}} == \\\"aiops.ibm.com/insight-type/lad/templates\\\"\",\"foreach\":[{\"actionId\":\"aiops.ibm.com/action/internal/insights/aggregate\",\"arguments\":{\"aggregationKey\":{\"$variable\":\"alert.id\"},\"insightDetails\":{\"$variable\":\"element.details\"},\"insightType\":{\"$variable\":\"element.type\"},\"insightId\":{\"$variable\":\"element.id\"}},\"output\":\"alert.insights.[@id=insight-lad]\"}]}}]}}]},\"hash\":\"7d1f1b4fbb433259aaf0bf2ffc3305de06e41d98\",\"revision\":\"ee305a657825c73944447ec2ffdedd6d6519530d\",\"hotfields\":[]}",

    "entityid": "[\"alert\",\"alert\"]",

    "metadata": "{\"labels\":{\"ibm.com/is-default\":\"true\"},\"name\":\"Aggregate log anomaly detection count vectors\",\"description\":\"Aggregates the log message count vectors output by log anomaly events, across the event occurences contributing to a given alert.\\n\",\"createdBy\":{\"id\":\"system\",\"type\":\"system\"},\"lastUpdatedBy\":{\"id\":\"system\",\"type\":\"system\",\"changeDetails\":\"Created\"},\"lastUpdatedTimestamp\":\"2022-09-30T02:49:27.969Z\",\"creationTimestamp\":\"2022-09-30T02:49:27.969Z\"}",

    "triggerid": "[\"aiops.ibm.com/trigger/alert-pre-update\",\"aiops.ibm.com/trigger/alert-pre-create\"]"
  },
...
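
The CFGLINE value in the output shows that a manual collection entry is a single line whose fields are separated by ##. The following sketch splits such a line into numbered fields so that you can inspect an entry before you add it to a configuration file. The function name split_cfg_line is hypothetical, the field meanings in the comments are inferred from the examples in this topic rather than from a documented specification, and date is a placeholder for the longer curl command shown in the output.

```shell
# Split a '##'-delimited configuration line into numbered fields.
# awk treats the two-character separator as a pattern, so consecutive
# separators produce empty fields.
split_cfg_line() {
  printf '%s\n' "$1" |
    awk -F'##' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

# Hypothetical entry: tag, namespace, resource type, name regex, two empty
# fields, and a command ('date' is a placeholder).
split_cfg_line 'lifecycle-policy##cp4waiops##pod##^aiops-ir-lifecycle-policy-registry-svc-######date'
```

Printing the fields by position makes it easier to spot a missing or extra ## separator, which would otherwise shift every later field.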

Tip: Use CPFILES to copy files from pods

If you need to copy specific files from pods, you can use the CPFILES option.

For example, the following example uses the MustGather tool to pull the /conf/redis/redis.conf file from all pods whose names match the regular expression ^c-example-redis-m-. The example uses the following parameters:

  • NS4PROD= is a special keyword that instructs the tool to automatically resolve the namespace for the named product, in this case aimanager.
  • Using the keyword and a regular expression, rather than hardcoded namespace and pod names, allows the same configuration to be used across different environments without changes.

Example command:

[file cpfiles/cpfiles-waiops-320.cfg in the package]
test##NS4PROD=aimanager##^c-example-redis-m-####/conf/redis/redis.conf

Example output:

# pwd
/tmp/waiops-cpfiles-01042022-173635/8-CPFILES/waiops32
# ls -l c-example-redis-m-0/
total 56
-rw-r--r-- 1 root root 51281 Apr  1 17:36 '^conf^redis^redis.conf'
-rw-r--r-- 1 root root    78 Apr  1 17:36 '^conf^redis^redis.conf.ls'
# ls -l c-example-redis-m-1
total 56
-rw-r--r-- 1 root root 51306 Apr  1 17:36 '^conf^redis^redis.conf'
-rw-r--r-- 1 root root    78 Apr  1 17:36 '^conf^redis^redis.conf.ls'
# ls -l c-example-redis-m-2
total 56
-rw-r--r-- 1 root root 51306 Apr  1 17:36 '^conf^redis^redis.conf'
-rw-r--r-- 1 root root    78 Apr  1 17:36 '^conf^redis^redis.conf.ls'
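
In the preceding output, each copied file is stored in a directory named after the pod, and the slashes of the original path appear as carets (^) in the file name. The following sketch reproduces that naming so that you can predict where a copied file lands. The function name flatten_pod_path is hypothetical, and the naming rule is inferred from this example output only.

```shell
# Map a path inside a pod to the file name seen in the CPFILES output,
# assuming every '/' is replaced by '^' (inferred from the listing above).
flatten_pod_path() {
  printf '%s\n' "$1" | tr '/' '^'
}

flatten_pod_path /conf/redis/redis.conf
```

Quote the resulting name when you pass it to other commands, as the ls output does.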

Tip: Run custom scripts with the MustGather tool

You can run your own script with the MustGather tool, for example to complete corrective actions before the tool collects data. After the custom script finishes, the tool proceeds with the data collection.

Custom script variables

The following table lists some of the variables that you can include within a custom script:

Custom script variables
Variable Description
$OC_CMD The system command used to collect data. You can use either an oc or kubectl command.
$OC_WHOAMI The current user.
$AWK The AWK used in the tool, which can be either awk or gawk (preferred).
$OLDIFS The original IFS.
$PRODUCT_VERSION The product version detected during runtime (on site).
$MUSTGATHER_DIRNAME The directory name where the MustGather tool is installed.
$OUTDIR The directory where the output data is written.
$PROD_NAMESPACES_REFFILE The product-to-namespace mapping reference file (Do not change).
$PROD_NAMESPACES_UNIQUE_REFFILE The unique product namespaces reference file (Do not change).
$PROD_NAMESPACES_INVALID_STR A constant string value that represents the failed getNamespaceToProdMapping() function.
$CMD_EXEC_OUTPUT_DIR The directory where the output of CMDEXEC is written.
$CMD_EXEC_PRODUCT_NSFILE A copy of the original product namespaces reference file that needs to be used by the inherited getNamespaceToProdMapping() function.
$CMD_EXEC_RESULT_MCCFG_FILE A file that the custom script can write to for running the MANUALCOLLECT secondary mode.
$CMD_EXEC_RESULT_EXTRANS_FILE A file that the custom script can write to for running the EXTRA (namespace) secondary mode.
$CMD_EXEC_RESULT_CPFILES_FILE A file that the custom script can write to for running the CPFILES secondary mode.
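
A custom script that relies on these variables can check that they are set before it does any work. The following sketch shows one way to do that. The function name require_vars and the fallback values are hypothetical and exist only so that the sketch runs outside the tool; inside the tool the variables are expected to be set already.

```shell
# Report which of the named variables are unset or empty.
# Returns 0 when all are set and 1 otherwise.
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Standalone demonstration with hypothetical fallback values.
OC_CMD=${OC_CMD:-oc}
OUTDIR=${OUTDIR:-/tmp/example-outdir}

if require_vars OC_CMD OUTDIR; then
  echo "All required variables are set"
fi
```

Failing early with a clear message is usually easier to diagnose than a later command that runs with an empty namespace or output path.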

Custom script functions

The following table lists some of the functions that you can include within a custom script:

Custom script functions
Functions Description
getProdToNamespacesMapping() A function to retrieve the namespaces of a product from a provided product name.
Syntax: getProdToNamespacesMapping <product-name> $CMD_EXEC_PRODUCT_NSFILE
Returns:
- Product namespaces of the given product name, or PROD_NAMESPACES_INVALID_STR if the lookup fails.
- Return status 0 for successful execution and 1 for a failed execution.

Product names:
- aimanager = IBM Cloud Pak for AIOps
- noihybrid = IBM Netcool Operations Insight (hybrid)
- ics = IBM Cloud Pak foundational services
getNamespaceToProdMapping() A function to retrieve the product name from a provided product namespace.
Syntax: getNamespaceToProdMapping <product-namespace> $CMD_EXEC_PRODUCT_NSFILE
Returns: Product name of the given product namespace.
getResourceInstances() A function to retrieve a list of resource instances.
Syntax: getResourceInstances <resource type> <resource name regex> <resource namespace>
Where <resource namespace> is optional and defaults to --all-namespaces.
Returns:
- A list of resources that match the provided details or a blank string if the command fails or no resource is detected.
- Return status 0 for successful execution and 1 for a failed execution.
getUniqueProdNamespaces() A function to retrieve a list of unique product namespaces.
Syntax: getUniqueProdNamespaces $CMD_EXEC_PRODUCT_NSFILE
Returns: A list of unique product namespaces, or a blank string if the function fails.
checkUserPriv() A function to check whether the current user has the privilege to complete a specific action (verb) on a specific resource type in a specific namespace.
Syntax: checkUserPriv <verb> <resource type> <resource namespace>
Returns: 0 to indicate that the user has the required privilege and 1 to indicate that the user does not have the required privilege.
<verb> reference: Kubernetes API Concepts.
isPodOk() A function to determine whether a pod is in RUNNING state and all its containers are up.
Syntax: isPodOk <namespace> <podname>
Returns: 0 for an OK pod and 1 for a pod that is not OK.
isPodLineOk() A function to determine, from a line of oc get output, whether a pod is in RUNNING state and all its containers are up.
Syntax: isPodLineOk <'oc get --no-headers' output of the pod>
Returns: 0 for an OK pod and 1 for a pod that is not OK.
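
To illustrate the kind of check that isPodLineOk describes, the following sketch evaluates one line in the default oc get pods --no-headers column layout (NAME, READY, STATUS, and so on). It is not the tool's implementation: the function name pod_line_ok is hypothetical, and the rule used here (STATUS is Running and the two READY counts are equal and greater than zero) is an assumption about what "all its containers are up" means.

```shell
# Return 0 when a pod line reports STATUS "Running" and READY "n/n" with n > 0.
pod_line_ok() {
  printf '%s\n' "$1" | awk '
    {
      split($2, ready, "/")   # READY column, for example 1/1
      ok = ($3 == "Running" && ready[1] == ready[2] && ready[1] > 0)
      exit (ok ? 0 : 1)
    }'
}

# Hypothetical sample line; inside a custom script the line would come from
# the output of the oc get command.
if pod_line_ok 'example-pod-0   2/2   Running   0   3d'; then
  echo "pod is OK"
else
  echo "pod is not OK"
fi
```

Inside a custom script, prefer the provided isPodOk and isPodLineOk functions, which reflect the tool's own definition of a healthy pod.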

Example command

waiops-mustgather.sh -R -C /tmp/elite.sh##/tmp/env.in -Y

The previous command uses the environment variable file /tmp/env.in to pass variables into the script. The file contains the following lines:

POD=teams
CMD=date > /tmp/date.out
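
Each line in the environment file is a NAME=value pair, and a value can contain spaces and shell operators, as the CMD line shows. The following sketch shows one way to read such a file line by line without letting the shell interpret the values. How the MustGather tool itself loads the file is not described in this topic, so treat this only as an illustration; the function name print_env_pairs is hypothetical and the temporary file stands in for /tmp/env.in.

```shell
# Print the name and value of each NAME=value line in a file.
# Each line is split at the first '=' only, so the rest of the line,
# including spaces and '>', stays in the value.
print_env_pairs() {
  while IFS='=' read -r name value; do
    [ -n "$name" ] || continue
    echo "name=[$name] value=[$value]"
  done < "$1"
}

# Create a stand-in for /tmp/env.in in a temporary file.
ENV_FILE=$(mktemp)
cat > "$ENV_FILE" <<'EOF'
POD=teams
CMD=date > /tmp/date.out
EOF

print_env_pairs "$ENV_FILE"
rm -f "$ENV_FILE"
```

Sourcing a file like this directly in a shell would run CMD=date > /tmp/date.out as a redirected assignment, which is why the sketch reads the file as data instead.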

Example script

The following code shows an example custom script:

NOI_NAMESPACE=$(getProdToNamespacesMapping noi $CMD_EXEC_PRODUCT_NSFILE)

if [ ${#NOI_NAMESPACE} -gt 0 ]
then
  echo "[$(date)] Detected namespace = [$NOI_NAMESPACE]"

  IMPACTGUI_POD=$(getResourceInstances pod impactgui $NOI_NAMESPACE)

  if [ ${#IMPACTGUI_POD} -gt 0 ]
  then
    echo "[$(date)] Detected IMPACT GUI pod = [$IMPACTGUI_POD]"
    $OC_CMD delete pod -n $NOI_NAMESPACE $IMPACTGUI_POD

    WAITFORPOD=0

    while [ $WAITFORPOD -eq 0 ]
    do
      if isPodOk "$NOI_NAMESPACE" "$IMPACTGUI_POD"
      then
        OUTFILE=$($OC_CMD exec -n $NOI_NAMESPACE $IMPACTGUI_POD -- /opt/IBM/tivoli/impact/bin/nci_collect_logs | grep 'Netcool/Impact log file is' | $AWK '{ print $7 }')

        if [ ${#OUTFILE} -gt 0 ]
        then
          echo "[$(date)] Creating COLLECTLOG timestamp file..."
          echo "ALL##$NOI_NAMESPACE##pod##$(getResourceInstances pod impactgui $NOI_NAMESPACE)####touch /tmp/collectlog" > $CMD_EXEC_RESULT_MCCFG_FILE

          echo "[$(date)] Detected IMPACT GUI logfile = [$OUTFILE]"
          echo "$NOI_NAMESPACE##$IMPACTGUI_POD####$OUTFILE" > $CMD_EXEC_RESULT_CPFILES_FILE

          WAITFORPOD=1
        else
          echo "[$(date)] Unable to locate IMPACT GUI logfiles!"
          exit 1
        fi
      else
        echo "[$(date)] IMPACT GUI pod is not running!"
        sleep 5
      fi
    done
  else
    echo "[$(date)] Unable to locate IMPACT GUI pod!"
    exit 1
  fi
else
  echo "[$(date)] Unable to determine NOI namespace!"
  exit 1
fi
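
The while loop in the example script keeps polling until the pod is OK, so it does not end if the pod never recovers. The following sketch shows a bounded alternative that gives up after a fixed number of attempts. The names wait_until and fake_is_pod_ok are hypothetical; fake_is_pod_ok is a stand-in for isPodOk so that the sketch runs outside the tool.

```shell
# Run a check command repeatedly until it succeeds or the attempts run out.
# Usage: wait_until <max-attempts> <sleep-seconds> <command> [args...]
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  count=0
  while [ "$count" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    count=$((count + 1))
    sleep "$delay"
  done
  return 1
}

# Stand-in for isPodOk: reports "not OK" twice and then "OK".
CALLS=0
fake_is_pod_ok() {
  CALLS=$((CALLS + 1))
  [ "$CALLS" -ge 3 ]
}

if wait_until 5 0 fake_is_pod_ok example-namespace example-pod; then
  echo "Pod became ready after $CALLS checks"
else
  echo "Timed out waiting for pod"
fi
```

In a custom script, a call such as wait_until 60 5 isPodOk "$NOI_NAMESPACE" "$IMPACTGUI_POD" would poll every 5 seconds for up to about 5 minutes; those limits are arbitrary examples.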

Example script functions

When you are creating your custom script, you can use the following functions and secondary modes:

  • The function getProdToNamespacesMapping can be used in any CMDEXEC script to obtain product namespaces:

    NS=$(getProdToNamespacesMapping aimanager $CMD_EXEC_PRODUCT_NSFILE)
    echo "NAMESPACE=$NS"
    
  • The function getResourceInstances can be used in any CMDEXEC script to obtain resource instances that match a provided regular expression, for example to complete a corrective action:

    for MYPOD in $(getResourceInstances pod $POD $NS)
    do
      MYCMD="oc exec $MYPOD -n $NS -- bash -c '$CMD'"
      echo $MYCMD
      eval $MYCMD
    done
    
  • The MANUALCOLLECT mode can be used to collect data within the script, such as to ensure that the corrective actions are successful:

    echo "ALL##NS4PROD=aimanager##pod##teams####" >> $CMD_EXEC_RESULT_MCCFG_FILE
    
  • The EXTRA mode can be used to collect namespace data within the script:

    echo "openshift-insights" > $CMD_EXEC_RESULT_EXTRANS_FILE
    
  • The CPFILES option can be used to copy files from pods to verify that your script ran successfully:

    echo "NS4PROD=aimanager##^iaf-system-kafka####/opt/kafka/config/log4j.properties" >> $CMD_EXEC_RESULT_CPFILES_FILE
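
The function table states that getProdToNamespacesMapping returns status 1 and PROD_NAMESPACES_INVALID_STR when it fails, so a script can check both before it writes an entry to a result file. The following sketch shows that pattern in a form that also runs outside the tool. Everything marked as a stand-in is hypothetical: the stub function, the INVALID value, the example-namespace value, and the temporary result file. The helper name resolve_namespace is also hypothetical.

```shell
# Stand-ins so that the sketch runs outside the tool. All values are hypothetical.
PROD_NAMESPACES_INVALID_STR=${PROD_NAMESPACES_INVALID_STR:-INVALID}
CMD_EXEC_PRODUCT_NSFILE=${CMD_EXEC_PRODUCT_NSFILE:-/dev/null}
CMD_EXEC_RESULT_CPFILES_FILE=${CMD_EXEC_RESULT_CPFILES_FILE:-$(mktemp)}

# Define a stub only when the real function is not available.
if ! command -v getProdToNamespacesMapping >/dev/null 2>&1; then
  getProdToNamespacesMapping() {
    if [ "$1" = "aimanager" ]; then
      echo "example-namespace"
    else
      echo "$PROD_NAMESPACES_INVALID_STR"
      return 1
    fi
  }
fi

# Print the namespace for a product, or return 1 when the lookup fails,
# returns nothing, or returns the invalid marker.
resolve_namespace() {
  ns=$(getProdToNamespacesMapping "$1" "$CMD_EXEC_PRODUCT_NSFILE") || return 1
  [ -n "$ns" ] && [ "$ns" != "$PROD_NAMESPACES_INVALID_STR" ] || return 1
  echo "$ns"
}

if NS=$(resolve_namespace aimanager); then
  echo "$NS##^iaf-system-kafka####/opt/kafka/config/log4j.properties" >> "$CMD_EXEC_RESULT_CPFILES_FILE"
  echo "Queued CPFILES entry for namespace $NS"
else
  echo "Unable to determine namespace" >&2
fi
```

Checking the return status as well as the value avoids writing an entry with an empty or invalid namespace into the result file.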
    

Example script results

  • The command date > /tmp/date.out runs in the pod. You can verify this by running the following command:

    oc exec -it aimanager-aio-chatops-teams-integrator-545bc7d4d-9w8fp -- bash -c 'ls -l /tmp; cat /tmp/date.out'
    

    Output:

    Defaulted container "chatops-teams-integrator" out of: chatops-teams-integrator, controller-is-ready (init)
    total 12
    -rw-r--r--. 1 default root  29 Apr  2 01:00 date.out
    -rwx------. 1 root    root 701 Sep 14  2021 ks-script-vzv5jj6c
    -rwx------. 1 root    root 291 Sep 14  2021 ks-script-y5xeen0d
    Sat Apr  2 01:00:25 UTC 2022
    
  • Through MANUALCOLLECT, the data of the pod that matched the regex teams is collected. You can verify this by running the following commands:

    1. Print the working directory:

      # pwd
      

      Output:

      /tmp/waiops-cmdexec-01042022-175950/4-MANUALCOLLECT/TAG=ALL/waiops32
      
    2. List the directory files:

      # ls -l
      

      Output:

      total 4
      -rw-r--r-- 1 root root 180 Apr  1 18:00 oc_pod-aimanager-aio-chatops-teams-integrator-545bc7d4d-9w8fp.out
      drwxr-xr-x 2 root root 164 Apr  1 18:00 pod
      
    3. List the files in the pod directory:

      total 12
      -rw-r--r-- 1 root root 1551 Apr  1 18:00 aimanager-aio-chatops-teams-integrator-545bc7d4d-9w8fp~chatops-teams-integrator.log
      -rw-r--r-- 1 root root 7315 Apr  1 18:00 aimanager-aio-chatops-teams-integrator-545bc7d4d-9w8fp.desc
      
  • Through the EXTRA mode, data about the namespace openshift-insights is collected. You can verify this by running the following commands:

    1. Print the working directory:

      # pwd
      

      Output:

      /tmp/waiops-cmdexec-01042022-175950/5-EXTRA/openshift-insights
      
    2. List the directory files:

      # ls -l
      

      Output:

      total 28
      drwxr-xr-x 2 root root  132 Apr  1 18:00 configmaps
      drwxr-xr-x 2 root root   36 Apr  1 18:00 deployment.apps
      -rw-r--r-- 1 root root 1738 Apr  1 18:00 oc_all.out
      -rw-r--r-- 1 root root  194 Apr  1 18:00 oc_configmaps.out
      -rw-r--r-- 1 root root  328 Apr  1 18:00 oc_pod_scc.out
      -rw-r--r-- 1 root root   52 Apr  1 18:00 oc_pvc.out
      -rw-r--r-- 1 root root 1427 Apr  1 18:00 oc_secrets.out
      -rw-r--r-- 1 root root  155 Apr  1 18:00 oc_serviceaccounts.out
      drwxr-xr-x 2 root root  117 Apr  1 18:00 pod
      drwxr-xr-x 2 root root   35 Apr  1 18:00 pvc
      drwxr-xr-x 2 root root   86 Apr  1 18:00 replicaset.apps
      drwxr-xr-x 2 root root 4096 Apr  1 18:00 secrets
      drwxr-xr-x 2 root root   26 Apr  1 18:00 service
      drwxr-xr-x 2 root root  107 Apr  1 18:00 serviceaccounts
      
  • Through CPFILES, the file /opt/kafka/config/log4j.properties is copied from the pods that match the regular expression ^iaf-system-kafka. You can verify this by running the following commands:

    1. Print the working directory:

      # pwd
      

      Output:

      /tmp/waiops-cmdexec-01042022-175950/8-CPFILES/waiops32/iaf-system-kafka-0
      
    2. List the directory files:

      # ls -l
      

      Output:

      total 12
      -rw-r--r-- 1 root root 4674 Apr  1 18:00 '^opt^kafka^config^log4j.properties'
      -rw-r--r-- 1 root root   77 Apr  1 18:00 '^opt^kafka^config^log4j.properties.ls'
      
    3. View the first lines of the file to verify the file contents:

      # head ^opt^kafka^config^log4j.properties
      

      Output:

      # Licensed to the Apache Software Foundation (ASF) under one or more
      # contributor license agreements.  See the NOTICE file distributed with
      # this work for additional information regarding copyright ownership.
      # The ASF licenses this file to You under the Apache License, Version 2.0
      # (the "License"); you may not use this file except in compliance with
      # the License.  You may obtain a copy of the License at
      #
      #    http://www.apache.org/licenses/LICENSE-2.0
      #
      # Unless required by applicable law or agreed to in writing, software