IBM Support

Troubleshooting clstat command issues

Troubleshooting


Problem

The clstat command (/usr/es/sbin/cluster/clstat) is an SNMP-based tool for monitoring cluster status in PowerHA SystemMirror for AIX. The command fails if any of the components it depends on is misconfigured, and the error message usually gives few clues about the cause. This technote provides a step-by-step procedure for troubleshooting common clstat problems.
Note: After finishing each step, wait a couple of minutes and try clstat again. Once it succeeds, you do not need to go through the remaining steps.

Symptom

Common symptoms include clstat returning an error or producing continuously unstable output.
For example, the following error is often seen:
# /usr/es/sbin/cluster/clstat
Failed retrieving cluster information.
There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.
Refer to the PowerHA SystemMirror Administration Guide for more information.

Cause

The usual cause is a failed or incorrectly configured component in the PowerHA cluster or the AIX operating system. The components that can be involved are shown in the following topology:
Figure: The topology of clstat-related components

Environment

All versions of PowerHA SystemMirror for AIX.

Resolving The Problem

► Step 1: Check whether the cluster service is running.
Run the following command to check whether the cluster manager (clstrmgrES) is in the ST_STABLE state:
# lssrc -ls clstrmgrES | grep state
Current state: ST_STABLE
If not, start the cluster service.
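The Step 1 check can also be scripted. The following is a minimal sketch, not part of the product: the function names clstrmgr_state and cluster_is_stable are hypothetical, and the sketch assumes that the lssrc output contains a "Current state:" line in the form shown above.

```shell
# Hypothetical helpers (not shipped with PowerHA).
# Reads `lssrc -ls clstrmgrES` output on stdin and prints the value
# of the "Current state:" line.
clstrmgr_state() {
    awk -F': *' '/^Current state:/ { sub(/[ \t]+$/, "", $2); print $2; exit }'
}

# Returns 0 only when the state read from stdin is ST_STABLE.
cluster_is_stable() {
    [ "$(clstrmgr_state)" = "ST_STABLE" ]
}

# Usage on a cluster node:
#   lssrc -ls clstrmgrES | cluster_is_stable || echo "cluster manager is not in ST_STABLE"
```

Because the helpers read from standard input, you can also run them against saved lssrc output.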
► Step 2: Check whether snmpd and clinfoES are active.
2.1. Run the following command to see whether snmpd is active:
# lssrc -s snmpd
Subsystem         Group            PID          Status
 snmpd            tcpip            4456602      active
If not, run the following command to start it:
# startsrc -s snmpd
2.2. Run the following command to see whether clinfoES is active:
# lssrc -s clinfoES
Subsystem         Group            PID          Status
 clinfoES         cluster          17170492     active
If not, run the following command to start it:
# startsrc -s clinfoES
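Both checks in Step 2 read the Status column of the lssrc output, so they can be handled by one helper. The following is a sketch only: subsys_status is a hypothetical name, and the sketch assumes that the status is always the last field on the subsystem line, which also covers lines that have no PID.

```shell
# Hypothetical helper (not shipped with AIX).
# Reads `lssrc -s <subsystem>` output on stdin and prints the Status
# column for the subsystem named in $1. The status is taken from the
# last field, so a line without a PID is handled as well.
subsys_status() {
    awk -v name="$1" '$1 == name { print $NF; exit }'
}

# Usage on a cluster node:
#   for s in snmpd clinfoES; do
#       [ "$(lssrc -s "$s" | subsys_status "$s")" = "active" ] || startsrc -s "$s"
#   done
```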
► Step 3: Check IPv6 related configuration.
3.1. If IPv6 is not used, comment out the IPv6 line in the /usr/es/sbin/cluster/etc/clhosts file:
#::1     # PowerHA SystemMirror
And then restart clinfoES:
# stopsrc -s clinfoES;sleep 2;startsrc -s clinfoES
3.2. If IPv6 is used, add the following line to the /etc/snmpdv3.conf file:
COMMUNITY public    public     noAuthNoPriv ::  0
Note: If a different community name (other than public) is used, substitute that name for the word public.
Then restart snmpd and refresh clstrmgrES:
# stopsrc -s snmpd;sleep 2;startsrc -s snmpd;refresh -s clstrmgrES
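To see quickly whether step 3.1 still needs to be done, you can test whether the clhosts file contains an uncommented ::1 entry. The following is a minimal sketch, and the function name clhosts_has_ipv6_loopback is hypothetical.

```shell
# Hypothetical helper (not shipped with PowerHA).
# Returns 0 if the file named in $1 contains an uncommented ::1 entry,
# that is, a line that starts with ::1 after optional leading blanks.
clhosts_has_ipv6_loopback() {
    grep -Eq '^[[:space:]]*::1([[:space:]]|$)' "$1"
}

# Usage on a cluster node:
#   clhosts_has_ipv6_loopback /usr/es/sbin/cluster/etc/clhosts &&
#       echo "::1 is still active in clhosts"
```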
► Step 4: Check /etc/snmpdv3.conf file.
4.1. Add the following two lines to /etc/snmpdv3.conf if they are not already present:
VACM_VIEW       defaultView     1.3.6.1.4.1.2.3.1.2.1.5 - included -
smux     1.3.6.1.4.1.2.3.1.2.1.5      clsmuxpd_password
Note: 1.3.6.1.4.1.2.3.1.2.1.5 can be replaced by risc6000clsmuxpd.

4.2. Remove the following line from /etc/snmpdv3.conf if it exists, unless an ATM network interface is configured:
smux      1.3.6.1.4.1.2.3.1.2.3.1.1    muxatmd_password

4.3. Restart snmpd and refresh clstrmgrES:
# stopsrc -s snmpd;sleep 2;startsrc -s snmpd;refresh -s clstrmgrES
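The checks in steps 4.1 and 4.2 can be sketched as one read-only function that reports what it finds and changes nothing. This is an illustration under stated assumptions: check_snmpdv3_conf and CLSMUXPD_OID are hypothetical names, the field positions are taken from the example lines above, and the keywords are compared case-insensitively only as a convenience of this sketch.

```shell
# Hypothetical helper (not shipped with AIX). It only reads the file.
CLSMUXPD_OID='1.3.6.1.4.1.2.3.1.2.1.5'

# Reports which Step 4 entries are missing from, or unexpectedly
# present in, the snmpdv3.conf-style file named in $1.
# Prints nothing when the file matches steps 4.1 and 4.2.
check_snmpdv3_conf() {
    awk -v oid="$CLSMUXPD_OID" '
        /^[ \t]*#/ { next }                      # skip comment lines
        toupper($1) == "VACM_VIEW" && $2 == "defaultView" &&
            ($3 == oid || $3 == "risc6000clsmuxpd") && $5 == "included" { view = 1 }
        toupper($1) == "SMUX" && ($2 == oid || $2 == "risc6000clsmuxpd") { smux = 1 }
        toupper($1) == "SMUX" && $3 == "muxatmd_password" { atm = 1 }
        END {
            if (!view) print "missing: VACM_VIEW defaultView entry for clsmuxpd"
            if (!smux) print "missing: smux entry for clsmuxpd"
            if (atm)   print "present: muxatmd smux entry (remove unless ATM is configured)"
        }
    ' "$1"
}

# Usage on a cluster node:
#   check_snmpdv3_conf /etc/snmpdv3.conf
```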
► Step 5: Check whether port 199 is used by smux.

5.1. Run the following netstat command to see which socket is listening on port 199:
# netstat -Aan | grep 199 | grep LISTEN
f1000e000f1a7bb8 tcp        0      0  *.199                 *.*                   LISTEN

5.2. Use the following rmsock command to find the owner of the socket:
# rmsock f1000e000f1a7bb8 tcpcb
The socket 0xf1000e000f1a7808 is being held by proccess 17170506 (snmpdv3ne).
Note: Use the first field in the netstat output (which is the memory address of the socket) as the first argument and use tcpcb as the second argument to the rmsock command.
The socket should normally be held by snmpdv3ne. If it is not, another process is using port 199, which causes SNMP problems. Check with the owner of that process and kill the process if possible.
5.3. After killing the other process that was using port 199, restart snmpd and refresh clstrmgrES:
# stopsrc -s snmpd;sleep 2;startsrc -s snmpd;refresh -s clstrmgrES
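A plain grep for 199 can also match other ports or addresses that contain those digits. The following sketch shows a stricter filter for step 5.1 and a small parser for the rmsock message in step 5.2. The function names listen_pcb and socket_owner are hypothetical, and the column layout and message text are assumed to be as in the examples above.

```shell
# Hypothetical helpers (not shipped with AIX).
# Reads `netstat -Aan` output on stdin and prints the first field (the
# address to pass to rmsock) of each socket in LISTEN state whose local
# address ends in ".<port>". The port defaults to 199.
listen_pcb() {
    awk -v port="${1:-199}" '$NF == "LISTEN" && $5 ~ ("\\." port "$") { print $1 }'
}

# Reads the rmsock "is being held by" message on stdin and prints the
# process name given in parentheses.
socket_owner() {
    sed -n 's/.*held by proc[a-z]* [0-9][0-9]* (\(.*\))\..*/\1/p'
}

# Usage on a cluster node:
#   for pcb in $(netstat -Aan | listen_pcb 199); do
#       rmsock "$pcb" tcpcb | socket_owner
#   done
```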
► Step 6: Check whether snmpd is listening on the smux port and whether the cluster manager is connected.
6.1. Run the following netstat command to list active sockets that use the smux port:
# netstat -Aa | grep smux
f1000e0010a743b8 tcp        0      0  *.smux                *.*                   LISTEN
f1000e000f1a4bb8 tcp4       0      0  loopback.smux         loopback.33941        ESTABLISHED
f1000e000f1b23b8 tcp4       0      0  loopback.33941        loopback.smux         ESTABLISHED
6.2. Use the following rmsock commands to find the owners of the smux sockets in LISTEN and ESTABLISHED state:
# rmsock f1000e0010a743b8 tcpcb
The socket 0xf1000e0010a74008 is being held by proccess 17170500 (snmpdv3ne).
# rmsock f1000e000f1b23b8 tcpcb
The socket 0xf1000e000f1b2008 is being held by proccess 11993168 (clstrmgr).
Note: Use the first field in the netstat output (which is the memory address of the socket) as the first argument and use tcpcb as the second argument to the rmsock commands.
6.3. You should see a socket held by snmpdv3ne in the LISTEN state and a socket held by clstrmgr in the ESTABLISHED state. If not, restart snmpd and refresh clstrmgrES:
# stopsrc -s snmpd;sleep 2;startsrc -s snmpd;refresh -s clstrmgrES
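When several smux sockets are listed, it can help to reduce the netstat output to the address and state of each one before you run rmsock. The following is a minimal sketch; smux_sockets is a hypothetical name, and the column layout is assumed to match the example in step 6.1.

```shell
# Hypothetical helper (not shipped with AIX).
# Reads `netstat -Aa` output on stdin and prints "<address> <state>"
# for each socket whose local or foreign address ends in ".smux".
smux_sockets() {
    awk '($5 ~ /\.smux$/ || $6 ~ /\.smux$/) { print $1, $NF }'
}

# Usage on a cluster node:
#   netstat -Aa | smux_sockets | while read pcb state; do
#       printf '%s: ' "$state"
#       rmsock "$pcb" tcpcb
#   done
```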
 
► Step 7: If the problem persists, enable SNMP debug logging and check the debug log for clues.

7.1. Enable SNMP debug logging by changing the following line in /etc/snmpdv3.conf:
logging        size=100000                    level=0
to
logging        size=5000000                   level=4
7.2. Restart snmpd for the change to take effect:
# stopsrc -s snmpd;sleep 2;startsrc -s snmpd

7.3. The debug log is usually stored in /usr/tmp/snmpdv3.log (the location can be changed in /etc/snmpdv3.conf). The debug log is also collected when you gather snap -e or clsnap -L data. Check the log for clues, or contact IBM Support for further help.
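If you prefer to preview the change from step 7.1 before editing the file, the substitution can be sketched with sed. The function name with_debug_logging is hypothetical. The sketch writes the modified configuration to standard output and leaves the original file untouched.

```shell
# Hypothetical helper (not shipped with AIX).
# Prints the file named in $1 with the logging size and level values
# from step 7.1 applied. The file itself is not modified.
with_debug_logging() {
    sed -e '/^logging[[:space:]]/s/size=[0-9]*/size=5000000/' \
        -e '/^logging[[:space:]]/s/level=[0-9]*/level=4/' "$1"
}

# Usage on a cluster node (review the result before replacing the file):
#   with_debug_logging /etc/snmpdv3.conf > /tmp/snmpdv3.conf.debug
#   diff /etc/snmpdv3.conf /tmp/snmpdv3.conf.debug
```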
Team: PowerHA SystemMirror & AIX CommApps
Feedback:
aix_feedback@wwpdl.vnet.ibm.com

Document Location

Worldwide


Document Information

Modified date:
19 May 2019

UID

ibm10884716