A fix is available
APAR status
Closed as program error.
Error description
**************************************************************
* USERS AFFECTED:
* Systems running the AIX 6100-09 Technology Level
* with bos.cluster.rte below the 6.1.9.101 level.
**************************************************************
* PROBLEM DESCRIPTION:
* After reboot of one node, the CAA cluster state
* may be inconsistent in a cluster using multicast
* communication mode, if there is an issue with
* multicast communication, but unicast communication
* is working.
*
* 'lscluster -m' of node1:
* ------------------------
* Calling node query for all nodes...
* Node query number of nodes examined: 2
*
* Node name: node1
* Cluster shorthand id for node: 1
* ...
* State of node: UP NODE_LOCAL
* ...
* Node name: node2
* Cluster shorthand id for node: 2
* ...
* State of node: DOWN
* ...
*
* 'lscluster -m' of node2:
* ------------------------
* Calling node query for all nodes...
* Node query number of nodes examined: 2
*
* Node name: node1
* Cluster shorthand id for node: 1
* ...
* State of node: UP
* ...
* Node name: node2
* Cluster shorthand id for node: 2
* ...
* State of node: UP NODE_LOCAL
* ...
*
* In the above example node2 was the last node, which
* has been rebooted.
*
* syslog.caa of node1 looks like:
* -------------------------------
* ...
* <timestamp> node1 caa:info unix: kcluster_lock.c
* count_active_nodes 200 num_nodes_active 2
* *up_node_cnt 1 db_node_cnt 1
* <timestamp> node1 caa:err|error unix:
* kcluster_clusterwide.c
* kcluster_clusterwide 841 clusterwide query
* node timeout: cmd = 0x20, from node id = 2
* ...
* <timestamp> node1 caa:err|error unix:
* kcluster_clusterwide.c
* kcluster_clusterwide 841 clusterwide query
* node timeout: cmd = 0x20, from node id = 2
* ...
*
* syslog.caa of node2 looks like:
* -------------------------------
* ...
* <timestamp> node2 caa:info unix: kcluster_syscalls.c
* _xcluster_create 2614
* Clusterwide locking services are starting.
* ...
* <timestamp> node2 caa:info unix: kcluster_lock.c
* count_active_nodes 200 num_nodes_active 2
* *up_node_cnt 0 db_node_cnt 1
* <timestamp> node2 caa:info unix: kcluster_lock.c
* wait_on_node_bringup 255 All nodes are active.
* ...
* <timestamp> node2 caa:info unix: kcluster_lock.c
* count_active_nodes 200 num_nodes_active 2
* *up_node_cnt 0 db_node_cnt 1
* <timestamp> node2 caa:info unix: kcluster_lock.c
* xcluster_lock 607 xcluster_lock: lock
* 2 acquired, num_nodes_active: 2
* <timestamp> node2 caa:info unix: kcluster_lock.c
* xcluster_lock 608 xcluster_lock: nodes
* which responded: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
* ...
* <timestamp> node2 caa:info cluster[2490836]: caa_config.c
* cl_th_sock 5317 258 Node node1
* is DOWN, and we are not trying to JOIN it or STOP it.
* Skipping.
* ...
**************************************************************
* RECOMMENDATION:
* Install APAR IV82627.
**************************************************************
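The symptom in the 'lscluster -m' listings is that the two nodes disagree about the state of node2. As a rough illustration only, the following Python sketch compares such listings and reports the nodes whose state differs between views. It relies only on the 'Node name:' and 'State of node:' lines shown in this APAR; the function names and the sample strings are hypothetical, and real output contains more fields than are handled here.

```python
import re

# Abbreviated sample views, built from the lines quoted in this APAR.
NODE1_VIEW = """Node name: node1
Cluster shorthand id for node: 1
State of node: UP NODE_LOCAL
Node name: node2
Cluster shorthand id for node: 2
State of node: DOWN
"""

NODE2_VIEW = """Node name: node1
Cluster shorthand id for node: 1
State of node: UP
Node name: node2
Cluster shorthand id for node: 2
State of node: UP NODE_LOCAL
"""


def parse_node_states(lscluster_output):
    """Map each node name to its state, ignoring the NODE_LOCAL flag."""
    states = {}
    current = None
    for line in lscluster_output.splitlines():
        name = re.match(r"\s*Node name:\s*(\S+)", line)
        if name:
            current = name.group(1)
            continue
        state = re.match(r"\s*State of node:\s*(.+)", line)
        if state and current is not None:
            tokens = state.group(1).split()
            # NODE_LOCAL only marks the reporting node; it is not a state.
            states[current] = " ".join(t for t in tokens if t != "NODE_LOCAL")
    return states


def inconsistent_nodes(views):
    """Return the sorted names of nodes whose state differs between views."""
    all_nodes = sorted({node for view in views.values() for node in view})
    return [
        node for node in all_nodes
        if len({view.get(node) for view in views.values()}) > 1
    ]


views = {
    "node1": parse_node_states(NODE1_VIEW),
    "node2": parse_node_states(NODE2_VIEW),
}
print(inconsistent_nodes(views))  # prints ['node2']
```

With the sample views, node1 is UP in both listings, while node2 is DOWN in one and UP in the other, so only node2 is reported.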
Local fix
Use unicast communication mode.
Problem summary
**************************************************************
* USERS AFFECTED:
* Systems running the AIX 6100-09 Technology Level
* with bos.cluster.rte below the 6.1.9.101 level.
**************************************************************
* PROBLEM DESCRIPTION:
* After reboot of one node, the CAA cluster state
* may be inconsistent in a cluster using multicast
* communication mode, if there is an issue with
* multicast communication, but unicast communication
* is working.
*
* 'lscluster -m' of node1:
* ------------------------
* Calling node query for all nodes...
* Node query number of nodes examined: 2
*
* Node name: node1
* Cluster shorthand id for node: 1
* ...
* State of node: UP NODE_LOCAL
* ...
* Node name: node2
* Cluster shorthand id for node: 2
Problem conclusion
If it is known that a certain number of nodes is heartbeating to the repository, do not attempt to acquire clusterwide locks until the number of nodes gossiping is equal to it.
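The rule above can be read as a simple gate on lock acquisition. The following Python sketch is an illustration of that reading only: it is not the CAA kernel code, and the function and parameter names are hypothetical. It also assumes that the gate does not apply when no heartbeating count is known, which the text implies but does not state outright.

```python
from typing import Optional


def may_acquire_clusterwide_lock(repo_heartbeat_count: Optional[int],
                                 gossip_count: int) -> bool:
    """Illustrative gate for clusterwide lock acquisition.

    repo_heartbeat_count: number of nodes known to be heartbeating to the
        repository, or None when that number is not known.
    gossip_count: number of nodes currently seen gossiping.
    """
    if repo_heartbeat_count is None:
        # Assumption: with no known count, the gate does not apply.
        return True
    # Wait until the gossiping count equals the heartbeating count.
    return gossip_count == repo_heartbeat_count


# Hypothetical two-node cluster: both nodes heartbeat to the repository,
# but at first only one node is seen gossiping.
blocked = may_acquire_clusterwide_lock(2, 1)
allowed = may_acquire_clusterwide_lock(2, 2)
```

In this sketch the first call is refused because only one of the two heartbeating nodes is gossiping, and the second is allowed once the counts match.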
Temporary fix
*********
* HIPER *
*********
Comments
APAR Information
APAR number
IV82627
Reported component name
AIX 610 STD EDI
Reported component ID
5765G6200
Reported release
610
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Submitted date
2016-03-11
Closed date
2016-03-11
Last modified date
2016-11-09
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
AIX 610 STD EDI
Fixed component ID
5765G6200
Applicable component levels
R610 PSY U868845
UP16/10/25 I 1000
PTF to Fileset Mapping
U868845 bos.cluster.rte 6.1.9.200
U869156 bos.cluster.rte 6.1.9.101
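To check whether a system falls under "bos.cluster.rte below the 6.1.9.101 level", the fileset level has to be compared field by field as numbers, not as a string. The Python sketch below shows one way to do that. The function names are hypothetical, and the installed level is assumed to have been obtained separately (for example from 'lslpp -L bos.cluster.rte'). Levels outside 6.1.9.x are treated as out of scope, because this APAR only addresses the 6100-09 Technology Level.

```python
FIXED_LEVEL = "6.1.9.101"  # lowest fixed bos.cluster.rte level in this APAR


def parse_level(level):
    """Turn a dotted fileset level such as '6.1.9.45' into a tuple of ints."""
    return tuple(int(part) for part in level.strip().split("."))


def is_affected(installed_level, fixed_level=FIXED_LEVEL):
    """Return True if the level is in the 6.1.9.x range and below the fix."""
    installed = parse_level(installed_level)
    fixed = parse_level(fixed_level)
    # Assumption: only levels in the same 6.1.9.x range are in scope.
    if installed[:3] != fixed[:3]:
        return False
    return installed < fixed


for level in ("6.1.9.45", "6.1.9.100", "6.1.9.101", "6.1.9.200"):
    print(level, is_affected(level))
```

Numeric comparison matters here: as plain strings, "6.1.9.45" would sort after "6.1.9.101" and be misjudged as already fixed.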
Document Information
Modified date:
17 December 2021