Troubleshooting
Problem
nodetool garbagecollect is a tool that cleans unneeded data out of sstables. It works with LeveledCompactionStrategy and SizeTieredCompactionStrategy.
In Apache Cassandra, when data is updated or deleted, the existing records on disk are not modified immediately. Instead, new data or "tombstones" (deletion markers) are written to new files on disk. When reading data, Apache Cassandra reads through the relevant files on disk (and data in memory) and returns results based on the most recent writes. Over time, compaction can clean up out-of-date or logically deleted data.
nodetool garbagecollect uses single-sstable compactions to do the same: clean up out-of-date records (old versions of records that have been updated) or logically deleted data. For each sstable, it will write a new sstable with unneeded data cleaned out, if possible.
By default, garbagecollect cleans out rows or partitions that have been deleted or updated with newer data. To also clean up deleted or updated cell values, specify the '-g CELL' option. (This is more I/O- and CPU-intensive.)
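As a minimal sketch of the two granularities, the helper below builds a garbagecollect command for a single table. The function name gc_table, the DRY_RUN switch, and the names my_keyspace and my_table are hypothetical placeholders introduced for illustration; only 'nodetool garbagecollect' and the '-g' option come from the tool itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of nodetool): run, or with DRY_RUN=1 only
# print, a garbagecollect command for one table at a chosen granularity.
gc_table() {
    keyspace=$1
    table=$2
    granularity=${3:-ROW}   # ROW matches the default behaviour described above

    # Accept only the two granularities that garbagecollect supports.
    case $granularity in
        ROW|CELL) ;;
        *) echo "granularity must be ROW or CELL" >&2; return 1 ;;
    esac

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo nodetool garbagecollect -g "$granularity" -- "$keyspace" "$table"
    else
        nodetool garbagecollect -g "$granularity" -- "$keyspace" "$table"
    fi
}
```

For example, with DRY_RUN=1 set, 'gc_table my_keyspace my_table CELL' prints the cell-level command without touching the node, which can be a convenient way to review the command before running it on a busy system.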
Also by default, garbagecollect currently processes two sstables at a time. This can be adjusted using the '-j' or '--jobs' option. Increasing this value speeds up the operation but requires more system resources.
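The following sketch shows one way the '--jobs' option might be passed. The helper name gc_with_jobs, the DRY_RUN switch, and the keyspace and table names are assumptions made for illustration, and the job count of 4 used in the example is arbitrary rather than a recommendation.

```shell
#!/bin/sh
# Hypothetical helper (not part of nodetool): run, or with DRY_RUN=1 only
# print, a garbagecollect command with an explicit number of parallel jobs.
gc_with_jobs() {
    job_count=$1
    keyspace=$2
    table=$3

    # Reject anything that is not a non-negative integer.
    case $job_count in
        ''|*[!0-9]*) echo "jobs must be a non-negative integer" >&2; return 1 ;;
    esac

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo nodetool garbagecollect --jobs "$job_count" -- "$keyspace" "$table"
    else
        nodetool garbagecollect --jobs "$job_count" -- "$keyspace" "$table"
    fi
}
```

A call such as 'gc_with_jobs 4 my_keyspace my_table' would request four sstables at a time. Because each additional job competes for disk and CPU, it may be safer to raise the value gradually while watching node load.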
In some cases, garbagecollect can be a useful first step in cleaning out unneeded tombstones. For more details on this, and cleaning tombstones in general, see "cleaning tombstones".
Last Reviewed Date: 2023/12/20
Document Location
Worldwide
Historical Number
ka0Ui0000000Jc9IAE
Document Information
Modified date:
30 January 2026
UID
ibm17258676