APAR status
Closed as fixed if next.
Error description
In a Big SQL cluster with HA and automatic failover enabled, if the entire primary head node temporarily disconnects from the rest of the cluster, interrupting both the ZooKeeper and Big SQL HA connections, the standby may initiate a takeover by force. If the primary head node is still online when it reconnects, the result can be a split-brain scenario. This vulnerability can be mitigated by introducing a small delay before the standby initiates a takeover by force, to ensure that the primary head node is actually down.
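The delay-before-takeover idea described above can be illustrated with a minimal sketch. This is not the Big SQL implementation: the function name `should_force_takeover`, the `primary_reachable` probe, and the `grace_seconds` and `poll_interval` values are all hypothetical, chosen only to show the general pattern of re-checking the primary throughout a grace period before forcing a takeover.

```python
import time
from typing import Callable


def should_force_takeover(
    primary_reachable: Callable[[], bool],
    grace_seconds: float = 10.0,
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True only if the primary stays unreachable for the whole grace period.

    All names and defaults here are illustrative assumptions, not Big SQL settings.
    """
    deadline = clock() + grace_seconds
    while True:
        # If the primary answers at any point, abandon the forced takeover.
        if primary_reachable():
            return False
        remaining = deadline - clock()
        if remaining <= 0:
            # The primary was unreachable on every check during the grace period.
            return True
        # Wait before probing again, without overshooting the deadline.
        sleep(min(poll_interval, remaining))
```

A caller on the standby would pass its own health probe as `primary_reachable` and proceed with the takeover only when the function returns `True`. A longer grace period lowers the chance of split brain after a brief network interruption, at the cost of a slower failover when the primary has really failed.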
Local fix
A split-brain situation is an undesirable but not unexpected possibility when the standby head node takes over while the primary is disconnected. If a cluster is unreliable enough to trigger frequent takeovers, the best way to avoid split-brain situations is to disable failover automation. The downside is that, after an actual failure, the head node stays down until manual intervention occurs.
Problem summary
Please see problem description.
Problem conclusion
Temporary fix
Comments
APAR Information
APAR number
PH14925
Reported component name
IBM BIG SQL
Reported component ID
5737E7400
Reported release
600
Status
CLOSED FIN
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2019-07-29
Closed date
2020-09-09
Last modified date
2020-09-09
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Applicable component levels
Document Information
Modified date:
10 September 2020