Troubleshooting clstat command issues

Troubleshooting

Problem

The clstat command (/usr/es/sbin/cluster/clstat) is an SNMP-based tool for monitoring cluster status in PowerHA SystemMirror for AIX. However, the command fails easily when any of the related configuration is wrong, and its error message rarely points to the cause. This technote provides a step-by-step procedure for troubleshooting common clstat problems.

Note: After you finish each step, wait a couple of minutes and run clstat again. Once it succeeds, you do not need to go through the remaining steps.

Symptom

Common problems include an error returned by the command and continuously unstable output.

For example, the following error is often seen:

# /usr/es/sbin/cluster/clstat
Failed retrieving cluster information.

There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.

Refer to the PowerHA SystemMirror Administration Guide for more information.

Cause

The usual cause is a failed or incorrectly configured component in the PowerHA cluster or the AIX operating system. The components involved are shown in the following topology:

Figure: The topology of clstat-related components

Environment

All versions of PowerHA SystemMirror for AIX.

Resolving The Problem

► Step 1: Check whether the cluster service is running.

Run the following command to check whether the cluster manager (clstrmgrES) is in the ST_STABLE state:

# lssrc -ls clstrmgrES | grep state
Current state: ST_STABLE

If not, start the cluster service.
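The check in step 1 can be scripted. The following is a minimal sketch, assuming the output of lssrc -ls clstrmgrES contains a line of the form "Current state: ST_STABLE" as shown above. The function names cluster_state and is_cluster_stable are illustrative and are not part of PowerHA or AIX.

```shell
# cluster_state: read `lssrc -ls clstrmgrES` output on stdin and print the
# value that follows "Current state:" (for example ST_STABLE).
cluster_state() {
    awk '/^Current state:/ { print $3; exit }'
}

# is_cluster_stable: succeed only when the reported state is ST_STABLE.
is_cluster_stable() {
    [ "$(cluster_state)" = "ST_STABLE" ]
}

# Example use on a cluster node (not executed here):
#   lssrc -ls clstrmgrES | is_cluster_stable || echo "cluster manager is not stable"
```

Comparing the whole state value avoids a false match that a plain grep for the word "state" could produce on other lines of the output.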

► Step 2: Check whether snmpd and clinfoES are active.

2.1. Run the following command to see whether snmpd is active:

# lssrc -s snmpd
Subsystem         Group            PID          Status
 snmpd            tcpip            4456602      active

If not, run the following command to start it:

# startsrc -s snmpd

2.2. Run the following command to see whether clinfoES is active:

# lssrc -s clinfoES
Subsystem         Group            PID          Status
 clinfoES         cluster          17170492     active

If not, run the following command to start it:

# startsrc -s clinfoES
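Both checks in step 2 read the same lssrc table, so one helper can cover them. The sketch below assumes the table layout shown above, with the status in the last column. The function name subsys_status is hypothetical.

```shell
# subsys_status NAME: read `lssrc -s NAME` output on stdin and print the
# last column (the Status field) of the row whose first column is NAME.
# Using the last column keeps the result correct when the PID column is empty.
subsys_status() {
    awk -v name="$1" '$1 == name { print $NF; exit }'
}

# Example use on a cluster node (not executed here):
#   for s in snmpd clinfoES; do
#       [ "$(lssrc -s "$s" | subsys_status "$s")" = "active" ] || startsrc -s "$s"
#   done
```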

► Step 3: Check the IPv6-related configuration.

3.1. If IPv6 is not used, comment out the IPv6 line in the /usr/es/sbin/cluster/etc/clhosts file:

#::1 # PowerHA SystemMirror

Then restart clinfoES:

# stopsrc -s clinfoES;sleep 2;startsrc -s clinfoES

3.2. If IPv6 is used, add the following line to the /etc/snmpdv3.conf file:

COMMUNITY public public noAuthNoPriv :: 0

Note: If a community other than public is used, substitute the name of that community for the word public.

Then restart snmpd and refresh clstrmgrES:

# stopsrc -s snmpd;sleep 2;startsrc -s snmpd;refresh -s clstrmgrES
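Before editing either file in step 3, it can help to see what is currently configured. The following sketch assumes the line formats shown above: an entry beginning with ::1 in clhosts, and a COMMUNITY line whose fifth field is the network address. The function names are illustrative only.

```shell
# clhosts_ipv6_active FILE: succeed if FILE has an uncommented ::1 entry.
clhosts_ipv6_active() {
    grep -Eq '^[[:space:]]*::1([[:space:]]|$)' "$1"
}

# snmp_ipv6_community FILE [COMMUNITY]: succeed if FILE has an uncommented
# COMMUNITY line for COMMUNITY (default: public) whose address field is "::".
snmp_ipv6_community() {
    awk -v c="${2:-public}" '
        $1 == "COMMUNITY" && $2 == c && $5 == "::" { found = 1 }
        END { exit !found }
    ' "$1"
}

# Example use on a cluster node (not executed here):
#   clhosts_ipv6_active /usr/es/sbin/cluster/etc/clhosts && echo "::1 entry is active"
#   snmp_ipv6_community /etc/snmpdv3.conf || echo "IPv6 COMMUNITY line is missing"
```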

► Step 4: Check the /etc/snmpdv3.conf file.

4.1. Add the following two lines to /etc/snmpdv3.conf if they do not exist:

VACM_VIEW defaultView 1.3.6.1.4.1.2.3.1.2.1.5 - included -
smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password

Note: 1.3.6.1.4.1.2.3.1.2.1.5 can be replaced by risc6000clsmuxpd.

4.2. Remove the following line from /etc/snmpdv3.conf if it exists (unless an ATM network interface is configured):

smux 1.3.6.1.4.1.2.3.1.2.3.1.1 muxatmd_password

4.3. Restart snmpd and refresh clstrmgrES:

# stopsrc -s snmpd;sleep 2;startsrc -s snmpd;refresh -s clstrmgrES
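The checks in steps 4.1 and 4.2 can be sketched as two small helpers. They assume the entries appear as whitespace-separated fields exactly as shown above and accept either the numeric OID or the risc6000clsmuxpd name. The function names missing_clsmuxpd_entries and has_muxatmd_entry are hypothetical.

```shell
# missing_clsmuxpd_entries FILE: print which of the two clsmuxpd entries from
# step 4.1 are absent from FILE; print nothing when both are present.
missing_clsmuxpd_entries() {
    awk '
        # Either the numeric OID or the risc6000clsmuxpd name is accepted.
        function is_cl(oid) {
            return oid == "1.3.6.1.4.1.2.3.1.2.1.5" || oid == "risc6000clsmuxpd"
        }
        $1 == "VACM_VIEW" && $2 == "defaultView" && is_cl($3) { view = 1 }
        $1 == "smux" && is_cl($2) { smux = 1 }
        END {
            if (!view) print "VACM_VIEW"
            if (!smux) print "smux"
        }
    ' "$1"
}

# has_muxatmd_entry FILE: succeed if the ATM smux line from step 4.2 is present.
has_muxatmd_entry() {
    awk '$1 == "smux" && $2 == "1.3.6.1.4.1.2.3.1.2.3.1.1" { f = 1 } END { exit !f }' "$1"
}

# Example use on a cluster node (not executed here):
#   missing_clsmuxpd_entries /etc/snmpdv3.conf
#   has_muxatmd_entry /etc/snmpdv3.conf && echo "muxatmd line is present"
```

Commented-out entries are ignored because their first field then begins with the comment character rather than with VACM_VIEW or smux.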

► Step 5: Check whether port 199 is used by smux.

5.1. Run the following netstat command to see which socket is listening on port 199:

# netstat -Aan | grep 199 | grep LISTEN
f1000e000f1a7bb8 tcp        0      0  *.199                 *.*                   LISTEN

5.2. Use the following rmsock command to find the owner of the socket:

# rmsock f1000e000f1a7bb8 tcpcb
The socket 0xf1000e000f1a7808 is being held by proccess 17170506 (snmpdv3ne).

Note: Use the first field in the netstat output (which is the memory address of the socket) as the first argument and use tcpcb as the second argument to the rmsock command.

Normally the socket is held by snmpdv3ne. If it is held by another process, that process is using port 199 and will cause SNMP problems. Check with the owner of that process and kill the process if possible.

5.3. After killing the other process that was using port 199, restart snmpd and refresh clstrmgrES:

# stopsrc -s snmpd;sleep 2;startsrc -s snmpd;refresh -s clstrmgrES
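Steps 5.1 and 5.2 can be chained so that the socket address does not have to be copied by hand. This is a sketch under the assumption that netstat -Aan prints the local address in the fifth column and that rmsock reports the holder in trailing parentheses, as in the output above. The function names listen_pcb and socket_owner are illustrative.

```shell
# listen_pcb PORT: read `netstat -Aan` output on stdin and print the address
# in the first column of each socket listening on exactly PORT.
listen_pcb() {
    awk -v port="$1" '
        $NF == "LISTEN" {
            # The local address is the 5th column, for example "*.199";
            # the port is whatever follows the last dot.
            n = split($5, part, /\./)
            if (part[n] == port) print $1
        }
    '
}

# socket_owner: read rmsock output on stdin and print the process name
# found between the parentheses, for example "snmpdv3ne".
socket_owner() {
    awk -F'[()]' 'NF >= 3 { print $2; exit }'
}

# Example use on a cluster node (not executed here):
#   pcb=$(netstat -Aan | listen_pcb 199)
#   rmsock "$pcb" tcpcb | socket_owner
```

Matching the port exactly is a design choice: a plain grep for 199 also matches other ports or addresses that merely contain those digits.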

► Step 6: Check whether snmpd is listening on the smux port and whether the cluster manager is connected.

6.1. Run the following netstat command to list active sockets that use the smux port:

# netstat -Aa | grep smux
f1000e0010a743b8 tcp        0      0  *.smux                *.*                   LISTEN
f1000e000f1a4bb8 tcp4       0      0  loopback.smux         loopback.33941        ESTABLISHED
f1000e000f1b23b8 tcp4       0      0  loopback.33941        loopback.smux         ESTABLISHED

6.2. Use the following rmsock commands to find the owners of the smux sockets in LISTEN and ESTABLISHED state:

# rmsock f1000e0010a743b8 tcpcb
The socket 0xf1000e0010a74008 is being held by proccess 17170500 (snmpdv3ne).
# rmsock f1000e000f1b23b8 tcpcb
The socket 0xf1000e000f1b2008 is being held by proccess 11993168 (clstrmgr).

Note: Use the first field in the netstat output (which is the memory address of the socket) as the first argument and use tcpcb as the second argument to the rmsock commands.

6.3. You should see one socket in the LISTEN state held by snmpdv3ne and one socket in the ESTABLISHED state held by clstrmgr. If not, restart snmpd and refresh clstrmgrES:

# stopsrc -s snmpd;sleep 2;startsrc -s snmpd;refresh -s clstrmgrES
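The expectation in step 6.3 can be expressed as a check over "STATE OWNER" pairs. The sketch below assumes the netstat -Aa column layout shown in step 6.1 and that the owner names have already been obtained with rmsock. The function names smux_sockets and smux_owners_ok are hypothetical.

```shell
# smux_sockets: read `netstat -Aa` output on stdin and print "STATE ADDRESS"
# for every socket whose local or foreign address ends in ".smux".
smux_sockets() {
    awk '$5 ~ /\.smux$/ || $6 ~ /\.smux$/ { print $NF, $1 }'
}

# smux_owners_ok: read "STATE OWNER" pairs on stdin; succeed only if
# snmpdv3ne holds a LISTEN socket and clstrmgr holds an ESTABLISHED one.
smux_owners_ok() {
    awk '
        $1 == "LISTEN" && $2 == "snmpdv3ne" { l = 1 }
        $1 == "ESTABLISHED" && $2 == "clstrmgr" { e = 1 }
        END { exit !(l && e) }
    '
}

# Example use on a cluster node (not executed here):
#   netstat -Aa | smux_sockets
#   printf 'LISTEN snmpdv3ne\nESTABLISHED clstrmgr\n' | smux_owners_ok && echo "smux looks healthy"
```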

► Step 7: If the problem persists, enable SNMP debug logging and check the debug log for clues.

7.1. Enable SNMP debug logging by changing the following line in /etc/snmpdv3.conf:

logging size=100000 level=0

to:

logging size=5000000 level=4

7.2. Restart snmpd for the change to take effect:

# stopsrc -s snmpd;sleep 2;startsrc -s snmpd

7.3. The debug log is usually stored in /usr/tmp/snmpdv3.log (the log location can be changed in /etc/snmpdv3.conf). The debug log is also collected when you gather snap -e or clsnap -L data. Check the log for clues, or contact IBM Support for further help.
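To confirm which logging level is currently configured, the level= value can be read back from the configuration file. This is a minimal sketch, assuming the logging line uses whitespace-separated key=value settings as shown in step 7.1. The function name snmp_log_level is illustrative.

```shell
# snmp_log_level FILE: print the level= value from the first uncommented
# "logging" line in FILE that carries a level= setting.
snmp_log_level() {
    awk '$1 == "logging" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^level=/) { sub(/^level=/, "", $i); print $i; exit }
    }' "$1"
}

# Example use on a cluster node (not executed here):
#   [ "$(snmp_log_level /etc/snmpdv3.conf)" = "4" ] && echo "debug logging is configured"
```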

Team: PowerHA SystemMirror & AIX CommApps
Feedback: aix_feedback@wwpdl.vnet.ibm.com

Related Information

IBM Knowledge Center: Monitoring clusters with clstat

IBM Knowledge Center: Troubleshooting SNMP-based status commands

PowerHA SystemMirror cluster data collection

Open a support case through IBM Support Community

Open a support case through telephone support

Document Location

Worldwide
