The JASocket package makes it easy to create distributed, scalable software. JAConfig, in turn, makes it easy to manage in production.
The Config DB
JAConfig provides a fully replicated non-transactional, eventually consistent, key/value pair database for maintaining both configuration data and operator passwords. The database also provides change notifications, so servers can react to configuration changes. Every node in the cluster has a copy of this database both on disk and in memory, ensuring that the database is fully robust and supports fast queries. And there is neither a separate log file nor any need for a recovery mechanism--on startup, if the database is not valid its contents are discarded.
The underlying assumption of the database is that changes are infrequent, and that the system clocks of all the nodes in the cluster all have roughly the same time. Key/value pairs in the database always carry the timestamp of when the last change was made. Changes are propagated across all nodes in the cluster and shared when a node connects to another node. For each key, the change with the latest timestamp is retained.
The database is peer-based. So there is no single point of failure. And because there is no master copy, there are no warm or hot backups and fallover time is effectively 0. On the flip side, status information is completely out of scope, as frequent updates will break the underlying assumptions.
A cluster can be split into 2 or more smaller clusters by something as simple as a loose cable. If these smaller clusters act independently, inconsistent results can occur. This is managed by knowing the total number of host computers in the cluster and only allowing some activities to occur the the number of hosts currently connected to a given cluster is equal to or grater than (totalNumberOfHosts / 2) + 1. Clusters connected to this number of hosts have what is called a quorum and as there can not be two sub-clusters with a quorum of hosts, at most only one sub-cluster will be active.
Note that we are talking about a quorum of hosts rather than a quorum of nodes. Multiple nodes can run on each host and indeed there may be a large host which runs many of the nodes in a cluster. So if the quorum was based on nodes, it is possible that all the nodes of the quorum are running on the same host, which creates a single point of failure.
Cluster and Host Server Managers
At this time there are two types of server managers, the Cluster Manager and the Host Manager. These managers are started (and monitored) by a kingmaker server, which runs in every node. The kingmaker servers are responsible for having one cluster manager running in the cluster and one host manager running on every host.
The cluster manager uses the data in the config database to start and monitor a number of cluster servers, where each type of cluster server has only a single instance running somewhere in the cluster. The cluster manager and all the cluster servers stop running when the node is not a part of the active cluster (a cluster with a quorum of hosts).
Similarly, the host managers use the data in the config database to start and monitor a number of host servers, where each type of host server has only a single instance running on each host. Unlike the cluster manager and servers, the host manager and servers are unaffected by quorum considerations.
The cluster and host managers use a server named ranker to determine which node to use when starting a server. A simple ranker is provided which provides a list of nodes ordered by the number of servers running on each node. Alternative ranker implementations can be used as they are developed.