Troubleshooting
Problem
The clstat command (/usr/es/sbin/cluster/clstat) is an SNMP-based tool for monitoring cluster status in PowerHA SystemMirror for AIX. However, the command can easily fail if any of the related configuration is incorrect, and the error message usually gives few useful clues. This technote provides a step-by-step procedure for troubleshooting common clstat problems.
Note: After finishing each step, wait a couple of minutes and try clstat again. Once it succeeds, you might not need to go through the remaining steps.
Symptom
Common problems include a returned error, output that keeps changing and never stabilizes, and so on.
For example, the following error is often seen:
# /usr/es/sbin/cluster/clstat
Failed retrieving cluster information.

There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.

Refer to the PowerHA SystemMirror Administration Guide for more information.
Cause
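The problem is usually caused by a failed or incorrectly configured component in the PowerHA cluster or the AIX operating system. The components checked in the steps below are the cluster manager (clstrmgrES), clinfoES, and snmpd, including the smux connection between snmpd and the cluster manager.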

Environment
All versions of PowerHA SystemMirror for AIX.
Resolving The Problem
► Step 1: Check whether the cluster service is running.
Run the following command to see whether the cluster manager (clstrmgrES) is in the ST_STABLE state:
# lssrc -ls clstrmgrES | grep state
Current state: ST_STABLE
If not, start the cluster service.
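The check in Step 1 can be scripted. The following is a minimal sketch, not an official tool: the function names cm_state and cm_is_stable are made up for this example, and it assumes the lssrc output contains a line of the form "Current state: ST_STABLE" as shown above.

```shell
#!/bin/sh
# Print the cluster manager state found in `lssrc -ls clstrmgrES` output on stdin.
# (cm_state and cm_is_stable are hypothetical helper names.)
cm_state() {
    # The line of interest looks like: "Current state: ST_STABLE"
    sed -n 's/^Current state:[[:space:]]*\([^[:space:]]*\).*/\1/p' | head -n 1
}

# Return 0 only when the state read from stdin is ST_STABLE.
cm_is_stable() {
    [ "$(cm_state)" = "ST_STABLE" ]
}

# Run the live check only where lssrc exists (that is, on AIX).
if command -v lssrc >/dev/null 2>&1; then
    if lssrc -ls clstrmgrES | cm_is_stable; then
        echo "cluster manager is in ST_STABLE"
    else
        echo "cluster manager is not in ST_STABLE; check the cluster service"
    fi
fi
```

Because the helpers read from stdin, they can also be run against saved lssrc output.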
► Step 2: Check whether snmpd and clinfoES are active.
2.1. Run the following command to see whether snmpd is active:
# lssrc -s snmpd
Subsystem         Group            PID          Status
 snmpd            tcpip            4456602      active
If not, run the following command to start it:
# startsrc -s snmpd
2.2. Run the following command to see whether clinfoES is active:
# lssrc -s clinfoES
Subsystem         Group            PID          Status
 clinfoES         cluster          17170492     active
If not, run the following command to start it:
# startsrc -s clinfoES
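Both checks in Step 2 read the Status column of the lssrc output. A small sketch of that check follows; src_status is a hypothetical helper name, and the sketch assumes the status is always the last field of the subsystem's row (an inoperative subsystem has an empty PID column).

```shell
#!/bin/sh
# Print the Status column for subsystem NAME from `lssrc -s NAME` output on stdin.
# The status is taken as the last field, because the PID column can be empty.
# (src_status is a hypothetical helper name.)
src_status() {
    awk -v name="$1" '$1 == name { print $NF }'
}

# Run the live check only where lssrc exists (that is, on AIX).
if command -v lssrc >/dev/null 2>&1; then
    for s in snmpd clinfoES; do
        st=$(lssrc -s "$s" | src_status "$s")
        if [ "$st" != "active" ]; then
            echo "$s is ${st:-unknown}; start it with: startsrc -s $s"
        fi
    done
fi
```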
► Step 3: Check IPv6 related configuration.
3.1. If IPv6 is not used, comment out the IPv6 line in the /usr/es/sbin/cluster/etc/clhosts file:
#::1 # PowerHA SystemMirror
And then restart clinfoES:
# stopsrc -s clinfoES;sleep 2;startsrc -s clinfoES
3.2. If IPv6 is used, add the following line to the /etc/snmpdv3.conf file:
COMMUNITY public public noAuthNoPriv :: 0
Note: If a different community (other than public) is used, substitute the name of that community for the word public.
And then restart snmpd and refresh clstrmgrES:
# stopsrc -s snmpd;sleep 2;startsrc -s snmpd;refresh -s clstrmgrES
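For step 3.1, a quick way to tell whether the clhosts file still has an active IPv6 loopback entry is to look for a line that starts with ::1 and is not commented out. The sketch below assumes that layout; clhosts_has_ipv6 is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if FILE has an uncommented line that begins with the IPv6 loopback ::1.
# (clhosts_has_ipv6 is a hypothetical helper name.)
clhosts_has_ipv6() {
    grep -Eq '^[[:space:]]*::1([[:space:]]|$)' "$1"
}

# Run the live check only where the clhosts file exists.
f=/usr/es/sbin/cluster/etc/clhosts
if [ -r "$f" ]; then
    if clhosts_has_ipv6 "$f"; then
        echo "$f has an active ::1 entry; comment it out if IPv6 is not used"
    fi
fi
```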
► Step 4: Check /etc/snmpdv3.conf file.
4.1. Add the following two lines into /etc/snmpdv3.conf if they don't exist:
VACM_VIEW defaultView 1.3.6.1.4.1.2.3.1.2.1.5 - included -
smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password
Note: 1.3.6.1.4.1.2.3.1.2.1.5 can be replaced by risc6000clsmuxpd.
4.2. Remove the following line from /etc/snmpdv3.conf if it exists (unless an ATM network interface is configured):
smux 1.3.6.1.4.1.2.3.1.2.3.1.1 muxatmd_password
4.3. Restart snmpd and refresh clstrmgrES:
# stopsrc -s snmpd;sleep 2;startsrc -s snmpd;refresh -s clstrmgrES
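The presence of the lines from steps 4.1 and 4.2 can be checked before editing the file. The following sketch compares the leading whitespace-separated fields of each non-comment line; conf_has is a hypothetical helper name, and the sketch assumes comment lines start with #.

```shell
#!/bin/sh
# Succeed if FILE contains a non-comment line whose leading fields equal the
# given words, for example: conf_has FILE smux 1.3.6.1.4.1.2.3.1.2.1.5
# (conf_has is a hypothetical helper name.)
conf_has() {
    file=$1; shift
    awk -v want="$*" '
        BEGIN { n = split(want, w, " ") }
        /^[[:space:]]*#/ { next }          # skip comment lines
        {
            ok = (NF >= n)
            # Append "" to force string comparison of each field.
            for (i = 1; ok && i <= n; i++) if (($i "") != (w[i] "")) ok = 0
            if (ok) { found = 1; exit }
        }
        END { exit !found }
    ' "$file"
}

# Run the live check only where the snmpd configuration file exists.
conf=/etc/snmpdv3.conf
oid=1.3.6.1.4.1.2.3.1.2.1.5
if [ -r "$conf" ]; then
    if ! conf_has "$conf" VACM_VIEW defaultView "$oid" &&
       ! conf_has "$conf" VACM_VIEW defaultView risc6000clsmuxpd; then
        echo "missing: VACM_VIEW defaultView $oid - included -"
    fi
    if ! conf_has "$conf" smux "$oid" && ! conf_has "$conf" smux risc6000clsmuxpd; then
        echo "missing: smux $oid clsmuxpd_password"
    fi
    if conf_has "$conf" smux 1.3.6.1.4.1.2.3.1.2.3.1.1; then
        echo "found muxatmd smux line; remove it unless ATM is configured"
    fi
fi
```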
► Step 5: Check whether port 199 is used by smux.
5.1. Run the following netstat command to find the socket that is listening on port 199:
# netstat -Aan | grep 199 | grep LISTEN
f1000e000f1a7bb8 tcp        0      0  *.199                 *.*                   LISTEN
5.2. Use the following rmsock command to find the owner of the socket:
# rmsock f1000e000f1a7bb8 tcpcb
The socket 0xf1000e000f1a7808 is being held by proccess 17170506 (snmpdv3ne).
Note: Use the first field in the netstat output (which is the memory address of the socket) as the first argument and use tcpcb as the second argument to the rmsock command.
The socket should normally be held by snmpdv3ne. If it is not, another process is using port 199, which causes SNMP problems. In that case, check with the owner of that process and kill the process if possible.
5.3. After killing the other process that was using port 199, restart snmpd and refresh clstrmgrES:
# stopsrc -s snmpd;sleep 2;startsrc -s snmpd;refresh -s clstrmgrES
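Steps 5.1 and 5.2 can be chained so that the socket address from netstat is passed straight to rmsock. This is a sketch under the assumption that the output formats match the samples above; listen_sockets and socket_owner are hypothetical helper names. Matching the port at the end of the local address field avoids false hits (for example, port 1990) that a plain grep 199 would allow.

```shell
#!/bin/sh
# Print the socket address (first field) of each LISTEN line whose local
# address ends in .PORT, reading `netstat -Aan` output on stdin.
# (listen_sockets and socket_owner are hypothetical helper names.)
listen_sockets() {
    awk -v port="$1" '$NF == "LISTEN" && $5 ~ ("\\." port "$") { print $1 }'
}

# Print the process name in parentheses from rmsock output on stdin.
socket_owner() {
    sed -n 's/.*(\([^)]*\)).*/\1/p'
}

# Run the live check only where rmsock exists (that is, on AIX).
if command -v rmsock >/dev/null 2>&1; then
    for addr in $(netstat -Aan | listen_sockets 199); do
        owner=$(rmsock "$addr" tcpcb | socket_owner)
        if [ "$owner" != "snmpdv3ne" ]; then
            echo "port 199 is held by ${owner:-unknown}, not snmpdv3ne"
        fi
    done
fi
```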
► Step 6: Check whether snmpd is listening on the smux port and whether the cluster manager is connected to it.
6.1. Run the following netstat command to list active sockets that use the smux port:
# netstat -Aa | grep smux
f1000e0010a743b8 tcp        0      0  *.smux                *.*                   LISTEN
f1000e000f1a4bb8 tcp4       0      0  loopback.smux         loopback.33941        ESTABLISHED
f1000e000f1b23b8 tcp4       0      0  loopback.33941        loopback.smux         ESTABLISHED
6.2. Use the following rmsock commands to find the owners of the smux sockets in LISTEN and ESTABLISHED state:
# rmsock f1000e0010a743b8 tcpcb
The socket 0xf1000e0010a74008 is being held by proccess 17170500 (snmpdv3ne).
# rmsock f1000e000f1b23b8 tcpcb
The socket 0xf1000e000f1b2008 is being held by proccess 11993168 (clstrmgr).
Note: Use the first field in the netstat output (which is the memory address of the socket) as the first argument and use tcpcb as the second argument to the rmsock commands.
6.3. You should see one socket in LISTEN state held by snmpdv3ne and one socket in ESTABLISHED state held by clstrmgr. If not, restart snmpd and refresh clstrmgrES:
# stopsrc -s snmpd;sleep 2;startsrc -s snmpd;refresh -s clstrmgrES
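To pick the right sockets for step 6.2, it helps to separate the listening smux socket from the client side of each established smux connection (the socket whose foreign address is the smux port). The sketch below assumes the netstat -Aa layout shown in step 6.1; smux_listeners and smux_clients are hypothetical helper names.

```shell
#!/bin/sh
# Both helpers read `netstat -Aa` output on stdin and print socket addresses.
# (smux_listeners and smux_clients are hypothetical helper names.)

# Sockets listening on the smux port (expected owner: snmpdv3ne).
smux_listeners() {
    awk '$NF == "LISTEN" && $5 ~ /\.smux$/ { print $1 }'
}

# Client ends of established smux connections, that is, sockets whose foreign
# address is the smux port (expected owner: clstrmgr).
smux_clients() {
    awk '$NF == "ESTABLISHED" && $6 ~ /\.smux$/ { print $1 }'
}

# Run the live check only where rmsock exists (that is, on AIX).
if command -v rmsock >/dev/null 2>&1; then
    ns=$(netstat -Aa)
    for a in $(printf '%s\n' "$ns" | smux_listeners); do
        rmsock "$a" tcpcb      # should report snmpdv3ne
    done
    for a in $(printf '%s\n' "$ns" | smux_clients); do
        rmsock "$a" tcpcb      # one of these should report clstrmgr
    done
fi
```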
► Step 7: If the problem persists, enable snmpd debug logging and check the debug log for clues.
7.1. Enable snmpd debug logging by changing the following line in /etc/snmpdv3.conf:
logging size=100000 level=0
to
logging size=5000000 level=4
7.2. Restart snmpd for the change to take effect:
# stopsrc -s snmpd;sleep 2;startsrc -s snmpd
7.3. The debug log is usually stored in /usr/tmp/snmpdv3.log (the log location can be changed in /etc/snmpdv3.conf). The debug log is also collected when you collect snap -e or clsnap -L data. Check the log for clues, or contact IBM Support for further help.
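The edit in step 7.1 can also be made with sed, which makes it easy to switch back to the original values after debugging. This is only a sketch: set_snmpd_logging is a hypothetical helper name, and it assumes the size= and level= settings are on one logging line, as shown in step 7.1. It works as a filter, so the configuration file is changed only if you redirect the output yourself.

```shell
#!/bin/sh
# Read snmpdv3.conf text on stdin and write it to stdout with the
# "logging size=... level=..." line set to the given SIZE and LEVEL.
# All other lines pass through unchanged.
# (set_snmpd_logging is a hypothetical helper name.)
set_snmpd_logging() {
    case "$1" in ''|*[!0-9]*) echo "usage: set_snmpd_logging SIZE LEVEL" >&2; return 2 ;; esac
    case "$2" in ''|*[!0-9]*) echo "usage: set_snmpd_logging SIZE LEVEL" >&2; return 2 ;; esac
    sed "s/^\([[:space:]]*logging[[:space:]][[:space:]]*\)size=[0-9]*[[:space:]][[:space:]]*level=[0-9]*/\1size=$1 level=$2/"
}

# Example (not run here): keep a backup, then enable debug logging.
#   cp /etc/snmpdv3.conf /etc/snmpdv3.conf.bak
#   set_snmpd_logging 5000000 4 < /etc/snmpdv3.conf.bak > /etc/snmpdv3.conf
#   stopsrc -s snmpd; sleep 2; startsrc -s snmpd
```

Running the same filter with 100000 and 0 restores the values shown in step 7.1.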
Team: PowerHA SystemMirror & AIX CommApps
Feedback: aix_feedback@wwpdl.vnet.ibm.com
Document Location
Worldwide
Document Information
Modified date:
19 May 2019
UID
ibm10884716