Performing major compactions in Cassandra

Transaction data must be compacted regularly, or transaction speed decreases. Routinely perform major compactions in Cassandra to maintain Global Mailbox performance.

Before you begin

  • Locate nodetool, a binary bundled with Cassandra.
  • Verify that JAVA_HOME is set to the location of IBM JDK 8.
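
The two prerequisites above can be checked with a short script before you start. The following is a minimal sketch, not part of the product: the function name check_prerequisites and its parameters are hypothetical, it assumes nodetool is expected on the PATH, and it confirms only that JAVA_HOME points to an existing directory, not that the directory contains IBM JDK 8.

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of problems found; an empty list means both checks passed.

    `env` and `which` are injectable only to make the function easy to test;
    by default the real environment and PATH lookup are used.
    """
    env = os.environ if env is None else env
    problems = []

    # Assumption: nodetool should be resolvable on PATH. If it is not, it may
    # still exist in the bin directory of the Cassandra installation.
    if which("nodetool") is None:
        problems.append("nodetool not found on PATH")

    # This only checks that JAVA_HOME is set and is a directory. It does not
    # verify the JDK vendor or version.
    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isdir(java_home):
        problems.append(f"JAVA_HOME does not point to a directory: {java_home}")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

If the script prints nothing, both checks passed. Confirm the JDK version separately, for example by inspecting the Java installation that JAVA_HOME points to.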

About this task

When possible, rely on minor compactions instead of major compactions to address performance concerns. Minor compactions are triggered automatically when you perform a flush. Schedule major compactions often enough to keep operations optimized, but not so often that compactions overlap. Base the schedule for your Cassandra instances on your business requirements and transaction characteristics.

To perform a major compaction:

Procedure

  1. Monitor the average transaction time to gauge how often to perform major compactions.
  2. Run nodetool with the following command: nodetool --host <hostname> compact
    If you omit the --host option, nodetool connects to the local Cassandra instance.
  3. Run this command against each Cassandra node individually.
    Important: Perform only one compaction at a time. Running multiple compactions simultaneously causes compaction failures, which in turn makes compaction take longer overall.
    Tip: Compacting frequently to keep the number of tombstones low, or at zero, does not result in the fastest compaction time. A major compaction consolidates all existing SSTables into a single SSTable. During compaction, disk space usage spikes temporarily because the old and new SSTables coexist, and disk I/O can be considerable.
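
The procedure above, running the same nodetool command against each node one at a time, can be sketched as a small script. This is an illustration under stated assumptions rather than a supported tool: the function name compact_nodes and its parameters are hypothetical, and the sketch assumes that nodetool compact returns only after the compaction on that node has finished, so waiting for each command to exit keeps compactions from overlapping.

```python
import subprocess


def compact_nodes(hosts, nodetool="nodetool", run=subprocess.run):
    """Run a major compaction on each host in turn, never in parallel.

    Returns the list of hosts that were compacted successfully. Stops at the
    first failure so the problem can be investigated before continuing.
    `run` is injectable only so the function can be tested without Cassandra.
    """
    completed = []
    for host in hosts:
        # Same command as in step 2 of the procedure.
        cmd = [nodetool, "--host", host, "compact"]

        # run() blocks until nodetool exits, so the next node is not started
        # until this command has returned.
        result = run(cmd)

        if result.returncode != 0:
            raise RuntimeError(
                f"compaction failed on {host} (exit code {result.returncode}); "
                f"completed so far: {completed}"
            )
        completed.append(host)
    return completed
```

For example, compact_nodes(["cass-node1", "cass-node2"]) would compact two nodes in sequence; the host names are placeholders for your own Cassandra nodes. Stopping at the first failure is a design choice for this sketch, made so that a failed compaction is not followed immediately by more disk-intensive work on other nodes.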