IBM Support

Collecting communications trace data for IBM i PowerHA Cluster partition issues

Troubleshooting


Problem

In an IBM i cluster environment, a node can transition to Partition status. The cause is difficult to determine from cluster logging data alone. A communications trace taken with the TRCCNN command is needed to attempt to capture the underlying communications issue.

Resolving The Problem

To gather TRCCNN data for a node transitioning to Partition status, complete the following steps:
  1. Review the active or prior QCSTCTL joblogs on the nodes in the cluster to identify which messages are being posted. The messages that are typically seen are CPFBB20 and CPFBB22.
  2. If CPFBB20 and/or CPFBB22 are the messages posted in the QCSTCTL joblogs, and all nodes are currently in Active status in the cluster, issue the following command on every node to start the trace:
    TRCCNN SET(*ON) TRCTYPE(*IP) TRCTBL(COMMISSUE) SIZE(2000 *MB) WCHMSG((CPFBB20) (CPFBB22)) WCHMSGQ((*JOBLOG)) WCHJOB((QSYS/QCSTCTL))
  3. Leave the trace running. It stops when CPFBB20 or CPFBB22 is posted to the QCSTCTL joblog. One of these messages should appear in the joblog on all nodes at around the same time, which should stop the TRCCNN trace on each node.
  4. To find the resulting spooled file on each node, identify the time of the CPFBB20 or CPFBB22 message in the QCSTCTL joblog and review the history log at that time. In the history log, a message like the following will be seen:
    Watch session QSCCNN0052 ended. Reason code: X'02'
    Job 802998/QUSER/QSCWCHPS ended on 01/19/24 at 12:26:00; .027 seconds used;
  5. Run WRKJOB for the job named in that message (802998/QUSER/QSCWCHPS in the example above) and display its joblog. It will contain a message like this:
    CPF3485    Completion              00   01/19/24  12:25:59.891696  QSPCPYF      QSYS        076C     QSCCNN      QSYS
                                         From user . . . . . . . . . :   USER
                                         Message . . . . :   1352 records copied to file QASCTRCC in QTEMP.
                                         Cause . . . . . :   1352 records were copied from spooled file QPCSMPRT number
                                           1 job 806929/USER/QPRTJOB created on system XXXXXXX on 01/19/24 12:25:59
                                           to physical file QASCTRCC in QTEMP member QASCTRCC.
  6. Look at the spooled files for the QPRTJOB named in that CPF3485 message (in the example above, 806929/USER/QPRTJOB).
  7. Upload the spooled files found on each node to the support case.
  8. As soon as possible after the event, gather and upload a SYSSNAP from each node in the cluster to the support case:
    a) Ensure QMGTOOLS is installed and up to date:
    https://www.ibm.com/support/pages/mustgather-how-obtain-and-install-qmgtools-and-keep-it-current
    b) Collect a system snapshot and upload the resulting .zip file to the support case (this may take several minutes to finish):
    QMGTOOLS/SYSSNAP OUTPUT(*IFS) LICLOGS(Y) COLHADB2(Y)
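
When the history log and joblog text from several nodes has been saved to a workstation, the job lookups in steps 4 through 6 can be sketched as a small text-parsing helper. This is a minimal illustration and not an IBM-supplied tool: the function names are hypothetical, and the patterns assume the message wording shown in the examples above and six-digit job numbers.

```python
import re

# Qualified IBM i job name in the form number/user/name,
# for example 806929/USER/QPRTJOB. Assumes a six-digit job number.
_JOB = r"(\d{6})/([^/\s]+)/([^/\s]+)"


def _flatten(text):
    """Collapse line wraps and runs of spaces so wrapped messages match."""
    return " ".join(text.split())


def watch_job_from_history(text):
    """Return (number, user, name) from a 'Job ... ended on' history log
    entry, or None if no such entry is present (hypothetical helper)."""
    match = re.search(r"\bJob " + _JOB + r" ended on", _flatten(text))
    return match.groups() if match else None


def print_job_from_cpf3485(text):
    """Return (number, user, name) of the job that owns the spooled file
    named in a CPF3485 message, or None (hypothetical helper)."""
    match = re.search(r"\bjob " + _JOB + r" created on", _flatten(text))
    return match.groups() if match else None


if __name__ == "__main__":
    cpf3485 = """Cause . . . . . :   1352 records were copied from spooled file QPCSMPRT number
      1 job 806929/USER/QPRTJOB created on system XXXXXXX on 01/19/24 12:25:59
      to physical file QASCTRCC in QTEMP member QASCTRCC."""
    job = print_job_from_cpf3485(cpf3485)
    print("/".join(job))  # prints 806929/USER/QPRTJOB
```

The returned qualified job name is the one whose spooled files are reviewed in step 6. Both helpers return None when the expected wording is absent, so a missing match can be checked manually rather than silently misread.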

Document Location

Worldwide

Document Information

Modified date:
03 April 2024

UID

ibm17145798