IBM Support

JGroups cluster is not load balancing, or a node will not rejoin the cluster, when using TCP mode for cluster communications

Troubleshooting


Problem

JGroups cluster is not load balancing, or a node will not rejoin the cluster, when using TCP mode for cluster communications

Symptom

A node will not process BPs after a shutdown and startup. The entire cluster must be restarted before the node can successfully rejoin it. This occurs when TCP, rather than UDP, is used for cluster communications.

The GIS cluster is configured to use JGroups node-to-node communications rather than IP multicasting, but the nodes do not appear to be load balancing.

The configuration in the jgroups_cluster.properties file is key for node-to-node communications to work properly and for the nodes to distribute load.

The Distribution Threshold is set to 2%, but the load is not being shared. The node in question shows as active. queueWatcher shows over 1500 BPs in queue 4 on node2, while node1 has nothing waiting in queue 4 at the same time.

queueWatcher results:

Cluster Node Information for: node2

NodeInfoNotificationBus toString()

Sent:0 NotSent:0: Received:0

ClusterID:00000001- 54- 30 -76 5-8 78 58 5a 46 32 46 41 32 53 32 50 72 t0vxxxzf2fa2s2pr 00000011 79 58 54 4c 68 58 34 2b 32 5a 45 3d 0d 0a yxtlhx4 2ze.

NodeName:node2

listenerPort:9056

VMID:node2:9056

addr:gotsth91/172.26.36.114

suspect:false

BPExec:true

nodeRole:

load 0:2147483647

load 1:0

load 2:0

load 3:0

load 4:1580

load 5:0

load 6:0

load 7:0

load 8:0

load 9:0


-NodeInfo Array-

Cluster Node Information for: node1

NodeInfoNotificationBus toString()

Sent:0 NotSent:0: Received:0

ClusterID:00000001- 54- 30 -76 5-8 78 58 5a 46 32 46 41 32 53 32 50 72 t0vxxxzf2fa2s2pr 00000011 79 58 54 4c 68 58 34 2b 32 5a 45 3d 0d 0a yxtlhx4 2ze.

NodeName:node1

listenerPort:9056

VMID:node1:9056

addr:gotsth90/172.26.44.90

suspect:false

BPExec:true

nodeRole:

load 0:2147483647

load 1:0

load 2:0

load 3:0

load 4:0

load 5:0

load 6:0

load 7:0

load 8:0

load 9:0


-NodeInfo Array-
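
The imbalance in the output above can also be checked mechanically. The following is a minimal Python sketch, not part of the product, which assumes the queueWatcher output has been saved as plain text in the format shown above. The function names `parse_queue_loads` and `unbalanced_queues` and the `min_depth` cutoff of 100 are illustrative assumptions.

```python
import re


def parse_queue_loads(text):
    """Return {node_name: {queue_number: depth}} from queueWatcher-style text."""
    loads = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"Cluster Node Information for:\s*(\S+)", line)
        if header:
            current = header.group(1)
            loads[current] = {}
            continue
        entry = re.match(r"load (\d+):(\d+)$", line)
        if entry and current is not None:
            loads[current][int(entry.group(1))] = int(entry.group(2))
    return loads


def unbalanced_queues(loads, min_depth=100):
    """List (queue, busy_node, depth, idle_nodes) where one node has a
    backlog of at least min_depth while another node reports 0."""
    findings = []
    queues = sorted({q for per_node in loads.values() for q in per_node})
    for q in queues:
        depths = {node: per_node.get(q, 0) for node, per_node in loads.items()}
        idle = sorted(node for node, depth in depths.items() if depth == 0)
        for node, depth in sorted(depths.items()):
            if depth >= min_depth and idle:
                findings.append((q, node, depth, idle))
    return findings


# Abbreviated copy of the values shown above (assumed saved-text format).
sample = """Cluster Node Information for: node2
load 0:2147483647
load 4:1580
Cluster Node Information for: node1
load 0:2147483647
load 4:0
"""

print(unbalanced_queues(parse_queue_loads(sample)))
# prints [(4, 'node2', 1580, ['node1'])]
```

Queue 0 is not reported because both nodes show the same value there; only queue 4 has a backlog on one node and nothing on the other.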

Error Message

In Noapp.log, the following message is repeated over and over:

[2008-09-25 13:01:24.356] ALL 000000000000 GLOBAL_SCOPE Send waiting for cluster configuration to complete
[2008-09-25 13:01:24.666] ALL 000000000000 GLOBAL_SCOPE Send waiting for cluster configuration to complete
[2008-09-25 13:01:24.976] ALL 000000000000 GLOBAL_SCOPE Send waiting for cluster configuration to complete
[2008-09-25 13:01:25.286] ALL 000000000000 GLOBAL_SCOPE Send waiting for cluster configuration to complete
[2008-09-25 13:01:25.596] ALL 000000000000 GLOBAL_SCOPE Send waiting for cluster configuration to comp
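
To gauge how long a node has been stuck in this state, the repeated message can be counted and timed. The sketch below is a hedged illustration in Python, not a product utility; it assumes log lines begin with a bracketed timestamp in the format shown above, and the name `summarize_waits` is hypothetical.

```python
import re
from datetime import datetime

WAIT_MSG = "Send waiting for cluster configuration to complete"
# Assumed timestamp layout, taken from the excerpt above.
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]")


def summarize_waits(lines):
    """Return (count, seconds between first and last occurrence).

    The second element is None when the message never appears."""
    stamps = []
    for line in lines:
        if WAIT_MSG not in line:
            continue
        match = STAMP.match(line)
        if match:
            stamps.append(
                datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            )
    if not stamps:
        return 0, None
    return len(stamps), (stamps[-1] - stamps[0]).total_seconds()


# The four complete lines from the excerpt above.
sample_lines = [
    "[2008-09-25 13:01:24.356] ALL 000000000000 GLOBAL_SCOPE " + WAIT_MSG,
    "[2008-09-25 13:01:24.666] ALL 000000000000 GLOBAL_SCOPE " + WAIT_MSG,
    "[2008-09-25 13:01:24.976] ALL 000000000000 GLOBAL_SCOPE " + WAIT_MSG,
    "[2008-09-25 13:01:25.286] ALL 000000000000 GLOBAL_SCOPE " + WAIT_MSG,
]

count, span = summarize_waits(sample_lines)
print(f"{count} occurrences over {span:.2f} s")
# prints 4 occurrences over 0.93 s
```

To run it against a real file, pass an open file handle (for example `summarize_waits(open(path))`, where `path` points to the log).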

Product: IBM Sterling B2B Integrator (SS3JSW)
Business Unit: IBM Software (BU048)
Component: Not Applicable
Platform: Platform Independent (PF025)
Version: All
Line of Business: Automation Platform (LOB77)

Historical Number

NFX2954

Document Information

Modified date:
11 February 2020

UID

swg21558406