Functionality achieved by replication
Global Mailbox achieves availability, performance, consistency, and durability through its replication feature. Replication covers two kinds of data:
- Metadata about the file, for example, file size and file name.
- The actual content of the file, known as the payload.
Metadata and payload are replicated by different mechanisms:
- Metadata replication
  - Apache Cassandra is used to store and maintain message metadata.
- Payload replication
  - If the payload size is less than a set threshold, Cassandra is used to manage and replicate the payload. If the payload size is more than the set threshold, the replication server is used to manage and replicate the payload. Using Cassandra for small files improves performance because it removes the replication server and the shared disk from the replication flow, which reduces overhead.
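The threshold rule for payload replication can be sketched as follows. This is an illustrative model only, not Global Mailbox code: the function name, the return labels, and the 100 KB default are hypothetical, and because the text does not say what happens when the payload size equals the threshold, the sketch assumes that case goes to the replication server.

```python
# Hypothetical default; the real threshold is whatever is configured in Global Mailbox.
DEFAULT_THRESHOLD_BYTES = 100 * 1024


def choose_payload_store(payload_size: int,
                         threshold: int = DEFAULT_THRESHOLD_BYTES) -> str:
    """Return the component that would manage and replicate a payload."""
    if payload_size < 0:
        raise ValueError("payload_size must be non-negative")
    # Small payloads stay in Cassandra, which skips the replication server
    # and the shared disk.
    if payload_size < threshold:
        return "cassandra"
    # Larger payloads (and, by assumption, payloads exactly at the threshold)
    # go through the replication server.
    return "replication_server"
```

For example, `choose_payload_store(2048)` selects Cassandra, while a multi-gigabyte payload is routed to the replication server.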
A distributed system such as Global Mailbox is expected to be highly available, to perform well, to maintain data consistency, and to be durable. However, these goals involve trade-offs that you must weigh against your business requirements before you configure each one.
Availability
Availability refers to the proportion of time during which a particular service or system in Global Mailbox is operating at functional capacity. For instance, the payload replication service is considered available only when users can successfully upload new payloads to Global Mailbox. The metadata replication service, on the other hand, is available only when a sufficient number of Cassandra instances are online to service read and write requests.
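Both ideas in this definition can be expressed as a minimal sketch. The function names are hypothetical, and the sketch assumes that "a sufficient number" of Cassandra instances means a quorum of replicas (more than half of the replication factor); the number actually required depends on the consistency level in use.

```python
def availability(uptime_seconds: float, total_seconds: float) -> float:
    """Proportion of an observation window during which a service was usable."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return uptime_seconds / total_seconds


def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def metadata_service_available(online_replicas: int,
                               replication_factor: int) -> bool:
    # Assumption: requests need a quorum of replicas to be online.
    return online_replicas >= quorum(replication_factor)
```

With a replication factor of 3, the quorum is 2, so under this assumption the metadata service stays available when one replica is offline but not when two are.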
Performance
Performance is measured in two ways:
- The response time of individual operations.
- The aggregate throughput of operations at some level of granularity, for example, within a user session, across all concurrent sessions within a data center, or across all concurrent global sessions.
Minimizing response time and maximizing throughput are both desirable performance goals, but they are independent of each other. Some operations have high response times but scale well across concurrent sessions, that is, they have high throughput. Conversely, low-throughput operations might complete quickly in the context of an individual session.
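The difference between the two measures can be illustrated with a small calculation. The sketch assumes a steady state in which throughput equals the number of concurrent sessions divided by the response time of each operation (Little's law); the operation figures are invented for illustration and are not Global Mailbox measurements.

```python
def throughput(concurrent_sessions: int, response_time_s: float) -> float:
    """Operations completed per second in steady state (Little's law)."""
    if response_time_s <= 0:
        raise ValueError("response_time_s must be positive")
    return concurrent_sessions / response_time_s


# Hypothetical slow operation that scales across 100 concurrent sessions.
slow_but_scalable = throughput(concurrent_sessions=100, response_time_s=2.0)

# Hypothetical fast operation that can only run in one session at a time.
fast_but_serial = throughput(concurrent_sessions=1, response_time_s=0.5)
```

Here the slow operation reaches 50 operations per second in aggregate, while the fast one reaches only 2, even though each individual call returns four times sooner.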
Consistency
Consistency defines the congruency of visible states within a computer system. A distributed system that comprises multiple nodes can provide strong consistency only if every component observes the same state in the same order. Weak consistency, on the other hand, does not provide this guarantee. Apache Cassandra supports a consistency model known as eventual consistency. An eventually consistent system might return an outdated answer to a query. However, after a sufficient period of time passes during which no component failures occur, all components in an eventually consistent system respond to a query with the same answer.
Although an eventually consistent system can offer better performance and availability, configuring eventual consistency in Cassandra is not recommended for Global Mailbox because it can lead to data integrity issues.
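The stale read that eventual consistency permits can be shown with a toy model. This is a deliberately simplified sketch, not how Cassandra is implemented: the `Replica` class and `synchronize` function are hypothetical, and conflicts are resolved by keeping the value with the latest timestamp.

```python
class Replica:
    """In-memory stand-in for one node; maps key -> (timestamp, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        # Keep only the most recent write for each key.
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def synchronize(replicas):
    """Bring every replica to the newest known value for every key."""
    merged = {}
    for replica in replicas:
        for key, (timestamp, value) in replica.data.items():
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    for replica in replicas:
        replica.data = dict(merged)


node_a, node_b = Replica(), Replica()
node_a.write("msg-1", "delivered", timestamp=1)
synchronize([node_a, node_b])

node_a.write("msg-1", "extracted", timestamp=2)
stale = node_b.read("msg-1")   # node_b has not yet seen the second write
synchronize([node_a, node_b])
fresh = node_b.read("msg-1")   # both nodes now agree
```

Between the second write and the next synchronization, `node_b` still answers with the older value. For mailbox metadata, that window is the kind of inconsistency the recommendation above is meant to avoid.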
Durability
Durability is the guarantee that any operation that completed successfully is not lost, rolled back, or changed due to a component failure. Durability is often compromised in distributed systems to improve performance and availability.
Configuration options
Payload replication and metadata replication provide different levels of availability, performance, consistency, and durability. All four properties can be configured for payload replication. For metadata replication, however, only availability, performance, and durability can be configured. You must retain the default consistency level for metadata replication.