Cassandra Disk and Memory Usage - Adding Memory Resources
- Cassandra kernel file system cache
- Cassandra JVMs perform large amounts of disk I/O, for example, writing new data, compacting existing SSTables, and reading data for queries. Cassandra relies on the kernel file system cache to optimize reads: recently and frequently used files are kept in memory. At scale, it is impractical to keep all of the hundreds of GBs of metric data Cassandra stores in memory, so disk reads cannot be eliminated completely. By default, Cassandra containers are given 16 GB of RAM, set in their Kubernetes resource requests and limits. After the JVM, approximately 6 GB remains for the file cache. This is enough to satisfy many of the most common reads from memory, for example, recent metric data or frequently accessed tables such as topology or events. Operations like SSTable compaction, which merges the many immutable files Cassandra stores data in into a smaller number of larger files, can also generally be completed from memory.
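- The memory split described above can be sketched as simple arithmetic. The 16 GB container limit comes from this document; the roughly 10 GB JVM footprint is an assumption derived from the statement that approximately 6 GB remains for the file cache, so check your actual heap settings and limits.

```shell
# Illustrative memory budget for the default Cassandra container.
container_limit_gb=16   # default Kubernetes request/limit per this doc
jvm_footprint_gb=10     # assumed heap + off-heap + JVM overhead
file_cache_gb=$((container_limit_gb - jvm_footprint_gb))

# Roughly 6 GB is left for the kernel file system cache.
echo "approx file cache available: ${file_cache_gb} GB"
```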
- Metric summarization and baselines
- Metric summarization and baselines in particular can be heavy on disk reads. For more information on metric summarization, see Configuring summarization.
- Cassandra I/O
- Cassandra I/O is unique compared to many traditional databases. SSTable compaction generates large amounts of reads and writes, but they are generally sequential I/O because compaction reads and writes straight through files. This makes it well suited to traditional hard disk drive arrays, which perform well on sequential operations. Queries, however, can be much more random in their I/O pattern, for example, searching the headers of SSTables and pulling specific rows of data from the tables. For this reason, the Cassandra community recommends tuning down the read ahead setting, which reduces wasted I/O from reading data that is never used in queries. For more information, see Optimizing disk performance for Cassandra. As a result, the disks that support Cassandra must both sustain large amounts of reads and writes and handle random I/O well. This is why we do not recommend network-based storage solutions: their performance and bandwidth are generally insufficient.
- Increase the Cassandra memory request and limit
- In container environments like Docker and Kubernetes, the kernel file system cache is limited to the memory available to the container. For example, if you run Cassandra on a system with 64 GB of RAM but the Cassandra container memory limit is 16 GB, the additional 48 GB of RAM is unavailable to Cassandra for optimizing disk I/O. To take advantage of the additional RAM on the system, increase the Cassandra memory request and limit.
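- For reference, a minimal sketch of what the change looks like in the container spec. The container name and the 32Gi figure are placeholders, not product defaults; choose a size appropriate for the RAM actually available on your nodes.

```yaml
# Illustrative fragment of a Cassandra container spec.
containers:
  - name: cassandra        # placeholder container name
    resources:
      requests:
        memory: 32Gi       # raised from the 16Gi default (example value)
      limits:
        memory: 32Gi       # keep the limit equal to the request
```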