IBM Support

PowerHA: How To Use QMGTOOLS MustGather Data Capture Tool (for High Availability Only)

Troubleshooting


Problem

This document demonstrates how to use QMGTOOLS properly for High Availability.

Resolving The Problem

This document demonstrates how to use QMGTOOLS to properly collect data for High Availability.
*****************************************************************************************************************************************************

READ THIS FIRST

Important Note: The MustGather Data Capture tool is provided as-is. The iGSC HAS team assists users who have difficulty using the tool for HA data capture. It is recommended to keep current on build versions as they become available!  Use the QMGTOOLS/CMPVER command, or GO MG menu option 13, to check for a current build.  Also, use a profile such as QSECOFR when collecting the data to avoid conflicts caused by lack of authority.
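As a quick sketch of the version check mentioned above, the following can be run from a command line (CMPVER is shown here without parameters, assuming the defaults are acceptable on the system in question):

ADDLIBLE LIB(QMGTOOLS)
QMGTOOLS/CMPVER

If the installed build is older than the latest available build, reinstall QMGTOOLS before collecting any data.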


The MustGather Data Capture tool was created by IBM Support to help automatically collect the many pieces of debug data that are critical to resolving complex High Availability problems more quickly. For instructions on how to obtain and install QMGTOOLS, refer to the document MustGather: How To Obtain and Install QMGTOOLS, and keep the tool current.

What does the tool collect? It collects close to everything needed to help determine root cause. There are a few exceptions; however, the tool evolves and items are added as needed. When there are pieces that the tool cannot collect automatically, Support will ask for those specific pieces separately.

Important Note 1: Currently, there are no special requirements to use this tool. There are no PTFs required for HA data collection.

Important Note 2: If the latest build of QMGTOOLS is not currently installed, the following screen captures may not be accurate; therefore, the latest QMGTOOLS should be installed before going any further.  See the instructions above on how to obtain QMGTOOLS.

****************************************************************************************************************************************************

With the QMGTOOLS library successfully restored, the data capture can begin. There are a few things to note.  If the user profile used to collect data resides on all nodes in the environment with the same password and has access to all objects, data collection will work much better.  If this user profile is also able to FTP between all nodes in the environment, the data collection tool needs to be installed on only a single node and can capture data from all nodes.  If the user profile is unable to FTP between all nodes, the tool must be installed on each system and data collection must be run on each system individually.  It is recommended to use a user profile with high authority, such as QSECOFR, to avoid authority failures during data collection.  Also ensure that QSECOFR is able to access System Service Tools (SST), and is not expired or disabled.
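Before starting a multi-node collection, the FTP requirement above can be verified manually from a command line (NODE2 is a hypothetical remote node name; substitute an actual cluster node, and sign on with the same profile that will be used for the collection):

FTP RMTSYS(NODE2)

If the sign-on to the remote node fails, the multi-node collection will not be able to retrieve data from that node, and the tool must instead be run on each node individually.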
***Preferred Method*** Collecting data for a single node:
There is an easy shortcut that can be used without going through menu options by using the following command on each node in a cluster.
QMGTOOLS/SYSSNAP OUTPUT(*IFS) COLLECTDFT(Y) LICLOGS(Y) PALS(Y) QHST(Y/Y) SRVDOCS(Y) COLHADB2(Y) DAYSPRV(2)                                              
*Note - This command assumes that data is being collected shortly after an incident; the "DAYSPRV" parameter can be adjusted if more days of data are needed.
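For example, to collect five days of data instead of the default two, the same command can be run with only DAYSPRV adjusted:

QMGTOOLS/SYSSNAP OUTPUT(*IFS) COLLECTDFT(Y) LICLOGS(Y) PALS(Y) QHST(Y/Y) SRVDOCS(Y) COLHADB2(Y) DAYSPRV(5)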
If using the QMGTOOLS menu to collect for a single node:
To start using the tool in all cases, issue the ADDLIBLE LIB(QMGTOOLS) command, followed by GO MG. This will bring the user to the main MustGather Data Capture menu.
1.  ADDLIBLE LIB(QMGTOOLS)
2.  GO MG
MustGather Data Capture menu
3.  From the main MG menu, select option 2 -HA (High Availability) data collection
Menu option 2 for High Availability
4.  Select option 3 - Dump cluster data on local node only
Menu option 3 to dump local node only
5.  On the DMPCLUINF screen, leave the defaults.  Support will instruct if the system snapshot is not needed.
DMPCLUINF screen
Data collection starts as soon as the user presses Enter because this is a local data capture only; there is no need to enter user profiles and passwords. Notice the status indicator at the bottom of the screen shows that collection is "Initializing." This text field changes as the collection progresses.
Status indicator of collection at bottom of DMPCLUINF screen
Once data is collected, a message like the one below will be seen:
Screen showing zip file created and its location
Pressing Enter takes the user back to the collection menu and posts the completion message at the bottom of the screen. The .zip file resides in the /tmp/HA_DATA directory specified on the DMPCLUINF screen previously.
Back at menu with zip location
***Not Preferred Method*** Collecting data for all nodes:
*Note -  Only use this method if specifically asked by support or development.
*Note - There is not a single command option available to perform this task, so it has to be performed from the menu.
*Note - Because this option results in multiple .zip files inside a single .zip file, it is not preferred.
Similar to the single node capture, start out first by adding the QMGTOOLS into the library list using the ADDLIBLE command and then enter into the QMGTOOLS menu by entering GO MG.
1.  Select option 2 - HA (High Availability) data collection
Menu option 2 for High Availability
2.  Select option 1 - Collect and retrieve cluster data from multiple nodes
Menu option 1 to collect for all nodes
3.  On the DMPCLU screen, leave the defaults.  The option to 'Dump cluster trace only' defaults to N and Support will advise when it should be changed to Y. Collect system snapshot defaults to Y.  Again, Support can advise when to change the default values.
DMPCLU screen
The following screen provides the option of entering the user profile and password for the different nodes in the cluster; however, if a common profile exists, pressing F6 for Options allows a single user profile and password to be used for access to all nodes during the data collection. Once the user profile, password, and password confirmation are populated, press F1, followed by F1 again, and the data collection will begin.
4.  Press F6
Press F6 for more options
5.  Type Y and add a user ID and password. Confirm the password and press F1:
*Note - The default collection uses standard FTP; there are now options to support SSL FTP.
Advanced options
6.  Press F1 again to start the data collection:
Pressing F1 a second time starts collection
Screen will show collection starting:
Collection begins with submitting jobs
Once submitted, the screen will look similar to the screen capture below. Note that the status is blank and F5 refreshes the display:
F5 to Refresh
Notice the collection for the nodes is on step 1 of 15. Pressing F5 periodically will progressively update the status.
F5 to refresh showing collection status changes
7.  Once the data collection is done, pressing F5 will show a status of Done, and a message at the bottom of the screen will give the location of the data (/tmp/HA_DATA). Press F1 to finish up and compress the data into a .zip file.
Press F1 to finish up
Upon completion of the data collection, all data will be wrapped up into a zip file in the /tmp/HA_DATA directory. The .zip file will be called /tmp/CLUDOCS.zip. It will look similar to the following screen capture:
Screen showing data saved and location
8.  Pressing Enter brings the user back to the collection menu, and a message at the bottom shows the name of the .zip file and the directory it resides in:
Screen showing zip file created and its location
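Before opening a Case, the output directory can be browsed from a command line to confirm the .zip file was created (a standard IFS listing; the path matches the /tmp/HA_DATA location shown above):

WRKLNK OBJ('/tmp/HA_DATA/*')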
After all of the data has been collected, a Case needs to be opened and the data can be uploaded to IBM.
Additional Information

There are additional options listed for the HA data capture:

 
5. Cluster Debug Tool (Internal IBM only) - To be used by Support representatives only; it requires additional licensing that most users will not have on their systems.
7. Dump SST macros XSM/Cluster - Typically used by Support when only certain AA macros are needed.
8. Collect SBG (Solution Based GUI) data - Typically needed only after general HA data has been collected and IBM Support has requested this collection.
9. Node Status Trap

This is used to start a LIC trace while nodes are active; it monitors for when a node becomes partitioned and captures the LIC trace for analysis. It needs to be started on each node independently. Refer to the link below for more information:

QMGTOOLS: Node Status Trap (NODESTSTRP) In QMGTOOLS

11. Additional XSM tools - Used by Support only.
12. Compare HA PTFs from IBM public FTP site - If the user's system can connect to the public FTP site, this option compares all PTFs on the user's system to the most current, complete Recommended Fixes PTF listing, then returns a report with the missing PTFs along with the complete listing.
13. QHASTOOLS help menu - Displays this help menu.
14. Collect admin domain debug data

Refer to this link for additional information:

QMGTOOLS: Administrative Domain Debug Data


 


        Historical Number

        670936263

        Document Information

        Modified date:
        24 January 2023

        UID

        nas8N1010374