Installing Apache Cassandra and Apache ZooKeeper
Apache Cassandra is a highly scalable and fault-tolerant database. Apache ZooKeeper is a centralized service that provides distributed synchronization.
Apache Cassandra
In the Global Mailbox system, Cassandra is used for metadata replication. It stores all system data except for the payload.
Typically, data centers are physically organized in racks of servers. Racks provide physical separation of computing hardware. Cassandra is a rack-aware database. It ensures that two copies of the same data are not written within the same rack, which provides higher fault tolerance if an entire rack goes down.
Cassandra uses seed nodes as contact points through which clients can connect to Cassandra to query for Cassandra configuration information and installation topology. There must be one seed node in each data center. At least one seed node must be started before non-seed nodes.
Based on your requirements of high availability and replication, you can install and configure multiple Cassandra nodes. You can install Cassandra by using the Installation Manager user interface mode or the silent installation mode.
At installation time, you must define all the data centers, racks, and Cassandra nodes required in your topology. For each Cassandra server in your topology, you must specify which data center and which rack the server is in. Data center names and rack names are arbitrary. Your administrators might have already named the racks and data centers. If not, choose an arbitrary name. Ensure that the physical relationship between racks and servers is maintained. You must complete installing all the Global Mailbox prerequisites in all data centers before installing or upgrading the first Global Mailbox and Sterling B2B Integrator nodes.
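The planning step above can be checked on paper before you run the installer. The following is a minimal sketch in Python, not part of the product: the `CassandraNode` class, the `validate_topology` function, and the host, data center, and rack names are all hypothetical. It only verifies that every node names a data center and a rack, and that each data center has a seed node.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CassandraNode:
    host: str          # hypothetical host name
    data_center: str   # arbitrary data center name
    rack: str          # arbitrary rack name
    is_seed: bool = False


def validate_topology(nodes):
    """Return a list of problems found in the planned topology (empty if none)."""
    problems = []
    seeds_per_dc = defaultdict(int)
    for node in nodes:
        if not node.data_center or not node.rack:
            problems.append(f"{node.host}: data center and rack are both required")
            continue
        # Count seed nodes per data center; non-seed nodes add zero.
        seeds_per_dc[node.data_center] += int(node.is_seed)
    for dc, seed_count in sorted(seeds_per_dc.items()):
        if seed_count == 0:
            problems.append(f"{dc}: no seed node defined")
    return problems


# Illustrative plan: DC2 has no seed node, so one problem is reported.
planned = [
    CassandraNode("cass1.example.com", "DC1", "RACK1", is_seed=True),
    CassandraNode("cass2.example.com", "DC1", "RACK2"),
    CassandraNode("cass3.example.com", "DC2", "RACK1"),
]
print(validate_topology(planned))
```

A plan that passes this check still has to match the physical layout: the rack recorded for each server should be the rack that the server actually sits in.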
A Cassandra node is configured based on the information that you provide during installation. Additionally, internode_compression is set to none, and the data, commitlog, saved_cache, and log directories are configured as subdirectories of the directory where Cassandra is installed.
Apache ZooKeeper
In the Global Mailbox system, ZooKeeper is used to create a barrier that halts the message upload process until the message is replicated in another data center.
ZooKeeper elects a server node to act as a leader. Client nodes (Global Mailbox) interact only with this leader node. However, coordination itself (the creation of a barrier) is delegated to a follower node. If a leader loses its connection, a new leader node is elected. Clients then connect to the newly elected leader.
Based on your requirements of high availability and replication, you can install and configure multiple ZooKeeper nodes. A ZooKeeper cluster is known as an ensemble. ZooKeeper requires a majority of the nodes in an ensemble to be functional to provide the required services. Therefore, it is suggested to have an odd number of ZooKeeper nodes in your Global Mailbox topology. For example, if your cluster has four nodes, ZooKeeper can handle the failure of only a single node, because if two nodes fail, the remaining two nodes do not form a majority. A four-node cluster therefore tolerates no more failures than a three-node cluster. However, if your cluster has five nodes, ZooKeeper can handle the failure of two nodes, because the remaining three nodes form a majority. You can install ZooKeeper by using the Installation Manager user interface mode or the silent installation mode.
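The majority rule above reduces to simple arithmetic. The following short Python sketch (the function name `tolerated_failures` is illustrative, not a ZooKeeper API) computes how many node failures an ensemble of a given size can survive while a majority remains.

```python
def tolerated_failures(ensemble_size):
    """Number of nodes that can fail while a strict majority stays up."""
    majority = ensemble_size // 2 + 1   # smallest group larger than half
    return ensemble_size - majority


for size in (3, 4, 5):
    print(size, "nodes tolerate", tolerated_failures(size), "failure(s)")
```

Three and four nodes both tolerate one failure, while five nodes tolerate two, which is why adding a node to reach an even ensemble size does not improve availability.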
You must define all nodes of the ZooKeeper ensemble, across all data centers, when installing a node. This information is required so that every node in the ensemble is aware of each of the other ZooKeeper nodes. You must complete installing all of your Global Mailbox prerequisite nodes in all data centers before installing or upgrading the first Global Mailbox and Sterling B2B Integrator nodes.
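To illustrate why every node needs the full ensemble list, the following Python sketch builds server entries in the standard ZooKeeper configuration style (server.N=host:port:port). It is an assumption-laden illustration only: the host names are hypothetical, the ports 2888 and 3888 are the values conventionally used in ZooKeeper examples, and the installer, not this script, produces the actual configuration.

```python
def ensemble_entries(hosts, peer_port=2888, election_port=3888):
    """Build one server entry per ensemble member, numbered from 1."""
    return [
        f"server.{server_id}={host}:{peer_port}:{election_port}"
        for server_id, host in enumerate(hosts, start=1)
    ]


# Hypothetical ensemble spanning two data centers.
hosts = ["zk1.dc1.example.com", "zk2.dc1.example.com", "zk1.dc2.example.com"]
for entry in ensemble_entries(hosts):
    print(entry)
```

The same list, in the same order, applies to every node in the ensemble, regardless of which data center the node is in.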
Cassandra and ZooKeeper can be installed together or separately. It is suggested to install them together on the same machine.