Analytics split-brain
A correctly running API Connect analytics Elasticsearch cluster has a single master, but sometimes, for example after a network disruption, a management cluster ends up with multiple masters. This condition is called analytics split-brain. When there are multiple masters, each one maintains different log information.
Identifying analytics split-brain
- Check whether the analytics data reporting rate decreased on the master server
- In an analytics split-brain condition, analytics data that is normally indexed on a single Elasticsearch cluster is indexed across multiple clusters. One of the first indicators of the condition is therefore a significant decrease in the rate at which analytics data is indexed on what was originally the only cluster.
- Check whether you received an email notification from API Connect
- Approximately 15 minutes after the network disruption is resolved and the analytics split-brain condition begins, API Connect sends an email notification to the cloud administrator, cloud owner, and topology administrator of the affected API Connect management node. The email contains the following information:
- The management node where the split-brain condition was detected, including the restart URL for each server that needs to be restarted.
- The time when the condition was first detected.
- A link to this topic, which provides instructions for resolving the issue.
After the initial notification email, two reminder emails are sent out per day until the condition is resolved.
- Invoke the REST API or run the CLI command to view details about the nodes
- You can identify analytics split-brain by invoking the GET _cat/nodes REST API on each of the servers to get detailed statistics about each node.
- You can identify analytics split-brain by entering the stat show analytics nodes command in the command-line interface. This command also returns details about the nodes.
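The REST API check above can be scripted. The following Python sketch queries _cat/nodes on each server with the h=ip,master column selection and records which node each server's view marks with * as the elected master; if the servers disagree, more than one master is reported. It is a minimal sketch, assuming that the Elasticsearch HTTP endpoint is reachable on port 9200 without authentication, which may not match your API Connect deployment. The helper names (fetch_cat_nodes, elected_master, distinct_masters) are hypothetical.

```python
import urllib.request
from typing import Dict, Optional, Set


def fetch_cat_nodes(host: str, port: int = 9200, timeout: float = 10.0) -> str:
    """Fetch _cat/nodes from one server, limited to the ip and master columns.

    Assumption: Elasticsearch answers plain HTTP on host:port with no auth.
    """
    url = f"http://{host}:{port}/_cat/nodes?h=ip,master"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def elected_master(cat_output: str) -> Optional[str]:
    """Return the IP marked '*' (elected master) in one server's view, if any."""
    for line in cat_output.splitlines():
        fields = line.split()
        # Each line is "<ip> <master flag>"; '*' marks the elected master.
        if len(fields) == 2 and fields[1] == "*":
            return fields[0]
    return None


def distinct_masters(views: Dict[str, str]) -> Set[str]:
    """Collect the elected masters seen across {server: _cat/nodes output}.

    A healthy cluster yields one master; more than one suggests split-brain.
    """
    masters = set()
    for output in views.values():
        master = elected_master(output)
        if master is not None:
            masters.add(master)
    return masters
```

To use it, build views by calling fetch_cat_nodes for every server in your management cluster, then pass views to distinct_masters; a result with more than one IP address means that different servers see different masters.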
Resolving the analytics split-brain condition
During the analytics split-brain condition, unique analytics data is sent to different Elasticsearch masters, and you cannot fully merge that data during recovery. As a result, you lose the data that was written during the split-brain state to every master except the one that you select to continue using as the master. The data on the selected master becomes the basis for all future analytics data. The faster you resolve the analytics split-brain condition, the less analytics data is lost.
To resolve the condition, it is important to determine which nodes to restart. The number of analytics Elasticsearch cluster members must match the number of management cluster members, and restarting the fewest nodes generally minimizes analytics data loss. Avoid restarting the primary master, which is the master that you select to keep. If you need to restart multiple nodes, restart them in quick succession, one after another, starting with the secondary master. See the sample notification email later in this topic for an example.
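The ordering rule in the previous paragraph (keep the primary master, restart the secondary master first, then restart the remaining affected nodes in quick succession) can be sketched in Python. This is a minimal sketch, assuming that you already know which master each server reports, for example from the notification email, and that you have already chosen which master to keep. This topic does not prescribe an algorithm for that choice, so keep_master is an input. The function name restart_order and its data layout are hypothetical.

```python
from typing import Dict, List


def restart_order(reported_master: Dict[str, str], keep_master: str) -> List[str]:
    """Return the servers to restart, secondary masters first.

    reported_master maps each server IP to the master IP that server reports.
    keep_master is the primary master you chose to keep; it is never restarted.
    """
    if keep_master not in reported_master:
        raise ValueError(f"{keep_master} is not a known server")
    # Servers that report a master other than the one being kept.
    stray = [ip for ip, master in reported_master.items()
             if master != keep_master and ip != keep_master]
    # Secondary masters report themselves as master; restart them first.
    secondary = sorted(ip for ip in stray if reported_master[ip] == ip)
    # The remaining servers can restart in any order; sorting only makes
    # the output deterministic.
    others = sorted(ip for ip in stray if reported_master[ip] != ip)
    return secondary + others
```

With the masters that each server reports in the sample notification email later in this topic, and 9.20.153.95 kept as the primary master, restart_order returns 9.20.153.94 first, followed by 9.20.153.96 and 9.20.153.98, which matches the sample resolution steps.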
- Restart the management server
- An analytics split-brain condition is automatically resolved when you restart the management node to fix another issue that the network disruption caused, such as cloud dissociation. In that case, no additional action is required.
- Use the REST API to restart the Elasticsearch server
- If you do not want to restart your management server, you can restart only the Elasticsearch server by sending a POST request to the Elasticsearch REST API restart URL that the notification email lists for each affected server.
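A POST request to the restart URL from the notification email can be sketched with the Python standard library. This is a minimal sketch, assuming that the relative path in the email (/v1/analytics_ops/es_restart?ip=...) is resolved against your management server's base URL. The base URL, any authentication headers, and TLS settings are assumptions that you must supply, and the helper names build_es_restart_request and restart_in_sequence are hypothetical.

```python
import urllib.parse
import urllib.request
from typing import Dict, Iterable


def build_es_restart_request(base_url: str, ip: str) -> urllib.request.Request:
    """Build the POST request for one server's Elasticsearch restart URL.

    base_url is a hypothetical management server address, for example
    "https://mgmt.example.com"; the path comes from the notification email.
    """
    query = urllib.parse.urlencode({"ip": ip})
    url = f"{base_url.rstrip('/')}/v1/analytics_ops/es_restart?{query}"
    # The email specifies a POST request; send it with an empty body.
    return urllib.request.Request(url, data=b"", method="POST")


def restart_in_sequence(base_url: str, ips: Iterable[str],
                        timeout: float = 30.0) -> Dict[str, int]:
    """Send the restart requests one right after another, in the given order.

    Returns {ip: HTTP status}. urlopen raises urllib.error.HTTPError for an
    error response, which stops the sequence. No authentication is added here.
    """
    statuses = {}
    for ip in ips:
        req = build_es_restart_request(base_url, ip)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            statuses[ip] = resp.status
    return statuses
```

For the sample notification email in this topic, you would pass the IP addresses 9.20.153.94, 9.20.153.96, and 9.20.153.98, in that order, so that the secondary master restarts first and the other nodes follow immediately.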
Sample analytics split-brain notification email
Manager Server: 9.20.153.94
Master: 9.20.153.94
Nodes in the cluster: 9.20.153.94 9.20.153.96
Elasticsearch rest API restart URL (use POST request): /v1/analytics_ops/es_restart?ip=9.20.153.94
Manager Server: 9.20.153.95
Master: 9.20.153.95
Nodes in the cluster: 9.20.153.95 9.20.153.97 9.20.153.98
Elasticsearch rest API restart URL (use POST request): /v1/analytics_ops/es_restart?ip=9.20.153.95
Manager Server: 9.20.153.96
Master: 9.20.153.94
Nodes in the cluster: 9.20.153.94 9.20.153.96
Elasticsearch rest API restart URL (use POST request): /v1/analytics_ops/es_restart?ip=9.20.153.96
Manager Server: 9.20.153.97
Master: 9.20.153.95
Nodes in the cluster: 9.20.153.95 9.20.153.97 9.20.153.98
Elasticsearch rest API restart URL (use POST request): /v1/analytics_ops/es_restart?ip=9.20.153.97
Manager Server: 9.20.153.98
Master: 9.20.153.94
Nodes in the cluster: 9.20.153.95 9.20.153.97 9.20.153.98
Elasticsearch REST API restart URL (use POST request): /v1/analytics_ops/es_restart?ip=9.20.153.98
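In this sample, the servers report two different masters, 9.20.153.94 and 9.20.153.95. To resolve the condition, complete the following steps:
- Restart 9.20.153.94. This resolves the dual-master issue. After it restarts, it is no longer identified as a master node.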
- Restart the following nodes in any order:
- 9.20.153.96
- 9.20.153.98
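When a notification email lists many servers, it can help to parse the per-server details and group the servers by the master that each one reports. The following Python sketch assumes the line layout shown in the sample email above and reads each server's IP address from the ip= parameter of its restart URL, so it does not depend on the Manager Server lines. The helper names parse_notification and group_by_master are hypothetical. More than one group indicates a split-brain condition. Grouping by reported master is only one view: in the sample, 9.20.153.98 reports 9.20.153.94 as its master but lists the nodes of the 9.20.153.95 cluster, so review both fields before you decide which servers to restart.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlparse


def parse_notification(text: str) -> Dict[str, dict]:
    """Map each server IP to the master, cluster nodes, and restart URL it reports."""
    records: Dict[str, dict] = {}
    current: dict = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Master:"):
            # A "Master:" line starts a new per-server record.
            current = {"master": line.split(":", 1)[1].strip()}
        elif line.startswith("Nodes in the cluster:"):
            current["nodes"] = line.split(":", 1)[1].split()
        elif "es_restart" in line:
            # The URL follows the first colon, e.g. "(use POST request): /v1/...".
            url = line.split(":", 1)[1].strip()
            ip = parse_qs(urlparse(url).query)["ip"][0]
            current["restart_url"] = url
            records[ip] = current
    return records


def group_by_master(records: Dict[str, dict]) -> Dict[str, List[str]]:
    """Group server IPs by the master each one reports."""
    groups: Dict[str, List[str]] = {}
    for ip, record in records.items():
        groups.setdefault(record["master"], []).append(ip)
    return {master: sorted(ips) for master, ips in groups.items()}
```

Applied to the sample email, group_by_master returns two groups, one for 9.20.153.94 and one for 9.20.153.95, which confirms the dual-master condition that the resolution steps address.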