Troubleshooting
Problem
A JGroups cluster is not load balancing, or a node will not rejoin the cluster, when TCP mode is used for cluster communications
Symptom
A node will not process BPs after a shutdown and startup. The entire cluster must be restarted before the node can successfully rejoin the cluster. This occurs when TCP, as opposed to UDP, is used for cluster communications.
The GIS cluster is set to use JGroups node-to-node communications rather than IP multicasting, but the nodes do not appear to be load balancing.
The configuration in the jgroups_cluster.properties file is key for node-to-node communications to work properly and for the nodes to distribute load.
The Distribution Threshold is set to 2%, but the load is not being shared. The node in question does show as active. queueWatcher reveals that more than 1500 BPs are in queue 4 on Node 2, while Node 1 has nothing waiting in queue 4 at the same time.
queueWatcher results:
Cluster Node Information for: node2
NodeInfoNotificationBus toString()
Sent:0 NotSent:0: Received:0
ClusterID:00000001- 54- 30 -76 5-8 78 58 5a 46 32 46 41 32 53 32 50 72 t0vxxxzf2fa2s2pr
00000011 79 58 54 4c 68 58 34 2b 32 5a 45 3d 0d 0a yxtlhx4 2ze.
NodeName:node2
listenerPort:9056
VMID:node2:9056
addr:gotsth91/172.26.36.114
suspect:false
BPExec:true
nodeRole:
load 0:2147483647
load 1:0
load 2:0
load 3:0
load 4:1580
load 5:0
load 6:0
load 7:0
load 8:0
load 9:0
-NodeInfo Array-
Cluster Node Information for: node1
NodeInfoNotificationBus toString()
Sent:0 NotSent:0: Received:0
ClusterID:00000001- 54- 30 -76 5-8 78 58 5a 46 32 46 41 32 53 32 50 72 t0vxxxzf2fa2s2pr
00000011 79 58 54 4c 68 58 34 2b 32 5a 45 3d 0d 0a yxtlhx4 2ze.
NodeName:node1
listenerPort:9056
VMID:node1:9056
addr:gotsth90/172.26.44.90
suspect:false
BPExec:true
nodeRole:
load 0:2147483647
load 1:0
load 2:0
load 3:0
load 4:0
load 5:0
load 6:0
load 7:0
load 8:0
load 9:0
-NodeInfo Array-
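The imbalance in the queueWatcher results can also be checked mechanically. The following is a minimal Python sketch, not part of the product: it assumes the queueWatcher text has been saved with one `NodeName:` line and one `load N:V` line per entry, and the function names, the `SAMPLE` excerpt, and the `min_gap` threshold are hypothetical choices made for illustration.

```python
import re

# Hypothetical excerpt, reduced to the lines the parser needs.
SAMPLE = """\
NodeName:node2
load 0:2147483647
load 4:1580
NodeName:node1
load 0:2147483647
load 4:0
"""

def parse_queue_loads(text):
    """Return {node_name: {queue_number: depth}} from queueWatcher-style text."""
    loads = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        name = re.match(r"NodeName:(\S+)$", line)
        if name:
            current = name.group(1)
            loads[current] = {}
            continue
        load = re.match(r"load\s+(\d+):(\d+)$", line)
        if load and current is not None:
            loads[current][int(load.group(1))] = int(load.group(2))
    return loads

def unbalanced_queues(loads, min_gap=100):
    """List (queue, busiest_node, idlest_node, gap) where depths differ by at least min_gap."""
    findings = []
    queues = sorted({q for per_node in loads.values() for q in per_node})
    for q in queues:
        depths = {node: per_node.get(q, 0) for node, per_node in loads.items()}
        busiest = max(depths, key=depths.get)
        idlest = min(depths, key=depths.get)
        gap = depths[busiest] - depths[idlest]
        if gap >= min_gap:
            findings.append((q, busiest, idlest, gap))
    return findings

if __name__ == "__main__":
    for q, busy, idle, gap in unbalanced_queues(parse_queue_loads(SAMPLE)):
        print(f"queue {q}: {busy} holds {gap} more BPs than {idle}")
```

With the values shown above, only queue 4 is reported: node2 holds 1580 BPs while node1 holds none. Queue 0 is not flagged because both nodes report the same value.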
In noapp.log, the following message is repeated over and over:
[2008-09-25 13:01:24.356] ALL 000000000000 GLOBAL_SCOPE Send waiting for cluster configuration to complete
[2008-09-25 13:01:24.666] ALL 000000000000 GLOBAL_SCOPE Send waiting for cluster configuration to complete
[2008-09-25 13:01:24.976] ALL 000000000000 GLOBAL_SCOPE Send waiting for cluster configuration to complete
[2008-09-25 13:01:25.286] ALL 000000000000 GLOBAL_SCOPE Send waiting for cluster configuration to complete
[2008-09-25 13:01:25.596] ALL 000000000000 GLOBAL_SCOPE Send waiting for cluster configuration to comp
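To confirm that a node is stuck in this state, the repeated message can be counted and timed. The sketch below is an illustration only: it assumes each log entry sits on a single line beginning with a bracketed timestamp, and the helper name `waiting_stamps` and the `SAMPLE` list are hypothetical.

```python
import re
from datetime import datetime

MESSAGE = "Send waiting for cluster configuration to complete"
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]")

def waiting_stamps(lines):
    """Return the timestamps of every line that carries the waiting message."""
    stamps = []
    for line in lines:
        match = STAMP.match(line)
        if match and MESSAGE in line:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return stamps

# Hypothetical input: the first four complete entries from the excerpt above.
SAMPLE = [
    "[2008-09-25 13:01:24.356] ALL 000000000000 GLOBAL_SCOPE " + MESSAGE,
    "[2008-09-25 13:01:24.666] ALL 000000000000 GLOBAL_SCOPE " + MESSAGE,
    "[2008-09-25 13:01:24.976] ALL 000000000000 GLOBAL_SCOPE " + MESSAGE,
    "[2008-09-25 13:01:25.286] ALL 000000000000 GLOBAL_SCOPE " + MESSAGE,
]

if __name__ == "__main__":
    stamps = waiting_stamps(SAMPLE)
    span = (stamps[-1] - stamps[0]).total_seconds()
    print(f"{len(stamps)} occurrences over {span:.3f} seconds")
```

In the excerpt the entries are 310 ms apart, so a steadily growing count at that interval suggests the node is still waiting for cluster configuration to complete rather than processing work.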
Historical Number
NFX2954
Document Information
Modified date:
11 February 2020
UID
swg21558406