Troubleshooting
Problem
A Nirvana cluster fails sporadically. When the cluster becomes unstable, realms start to delay inter-realm communications, and the automatically generated thread dumps show threads in RUNNABLE, WAITING, or BLOCKED state. All clients are disconnected.
First, realms started to delay inter-realm communications:
Delaying inter realm communications to server
Realms then started to disconnect clients because the cluster had failed:
Disconnecting client \ due to cluster failure and client registered interest
Then some realms went offline:
Cluster> Changing state from nSlaveState to OfflineState
The cluster was then able to re-form:
Cluster> Found existing Master in cluster as server, setting local state to that of cluster
It then failed again while trying to recover from the master. A thread dump taken at that point shows the Scheduler Worker Pool threads waiting on the same monitor:
"Scheduler Worker Pool:9" daemon prio=5 tid=0xd0 waiting on com.pcbsys.foundation.yb@608ce0b2 WAITING
    at java.lang.Object.wait () (Native Method)
    at java.lang.Object.wait (Object.java:503)
    at com.pcbsys.foundation.xh.c ()
    at com.pcbsys.foundation.xh.run ()
    at com.pcbsys.foundation.uh.k ()
    at com.pcbsys.foundation.ci.run ()
"Scheduler Worker Pool:8" daemon prio=5 tid=0xcf waiting on com.pcbsys.foundation.yb@608ce0b2 WAITING
    at java.lang.Object.wait () (Native Method)
    at java.lang.Object.wait (Object.java:503)
    at com.pcbsys.foundation.xh.c ()
    at com.pcbsys.foundation.xh.run ()
    at com.pcbsys.foundation.uh.k ()
    at com.pcbsys.foundation.ci.run ()
"Scheduler Worker Pool:7" daemon prio=5 tid=0xce waiting on com.pcbsys.foundation.yb@608ce0b2 WAITING
    at java.lang.Object.wait () (Native Method)
    at java.lang.Object.wait (Object.java:503)
    at com.pcbsys.foundation.xh.c ()
    at com.pcbsys.foundation.xh.run ()
    at com.pcbsys.foundation.uh.k ()
    at com.pcbsys.foundation.ci.run ()
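All three worker threads in the dump above report the same monitor, com.pcbsys.foundation.yb@608ce0b2. In a full dump with hundreds of threads, grouping the waiting threads by monitor makes that pattern easier to spot. The following is a minimal Python sketch, not an IBM-supplied tool. It assumes every thread header line follows the format shown above ("<thread name>" ... waiting on <monitor> <STATE>), and the function name group_waiters is hypothetical.

```python
import re
import sys
from collections import defaultdict

# Assumed header format, taken from the dump above:
#   "<thread name>" daemon prio=5 tid=0x.. waiting on <monitor> <STATE>
# Stack-frame lines ("at ...") and threads that are not waiting do not match.
HEADER = re.compile(r'^"(?P<name>[^"]+)".*?\bwaiting on (?P<monitor>\S+)')


def group_waiters(dump_text: str) -> dict[str, list[str]]:
    """Map each monitor to the names of the threads waiting on it, in dump order."""
    waiters: dict[str, list[str]] = defaultdict(list)
    for line in dump_text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            waiters[match.group("monitor")].append(match.group("name"))
    return dict(waiters)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python group_waiters.py <thread dump file>
    with open(sys.argv[1], encoding="utf-8") as dump_file:
        grouped = group_waiters(dump_file.read())
    # Monitors with the most waiting threads are printed first.
    for monitor, names in sorted(grouped.items(), key=lambda item: len(item[1]), reverse=True):
        print(f"{len(names):4d}  {monitor}")
        for name in names:
            print(f"      {name}")
```

Run against a dump like the one above, the script would list com.pcbsys.foundation.yb@608ce0b2 with the three Scheduler Worker Pool threads beneath it. A monitor with many waiters is a starting point for finding the thread that holds it, not proof of a deadlock on its own.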
Document Location: Worldwide
Document Information
Modified date: 20 March 2025
UID: ibm17210531