IBM Support

Running out of space in an Apache Cassandra node

Troubleshooting


Problem

Running out of space in an Apache Cassandra node


Summary

This article describes several approaches for when a node or cluster becomes unresponsive because disk capacity is exhausted, or when disk usage rises steeply, often due to large snapshots with broken hard links. In general, the solutions here help to clear disk space and to unblock compaction if it has stalled.


Applies to

All DSE/Apache Cassandra versions


Symptoms

a) Full use of disk capacity:
Caused by: java.io.IOException: No space left on device

 

b) Disk usage increased due to large snapshots, with compaction not running:

$ nodetool <options> cfstats -- <keyspace>.<table> -H
Read Count: 0
  Read Latency: NaN ms.
  Write Count: 20050
  Write Latency: 0.08548014962593516 ms.
  Pending Flushes: 0
    Table: nhanes
    SSTable count: 1
    Space used (live): 13.75 MB
    Space used (total): 13.75 MB
    Space used by snapshots (total): 0 bytes
    ...


Cause

Snapshots are created manually, or automatically when incremental backups are enabled, to keep a copy of SSTables at a certain point in time via hard links. Over time these hard links can become unmanageable, so manual intervention is required to move or remove them. Apache Cassandra reclaims space through its internal compaction mechanism: because SSTables are immutable, compaction reorganizes them by writing new, smaller SSTables (for example by eliminating tombstones), which temporarily requires considerable disk space. In general, compaction strategies need roughly 50% free disk space to operate "safely".
When compaction deletes the original SSTables, any snapshot hard links to them become the only remaining reference ("broken" links), so that data continues to occupy disk space. In turn, compaction cannot run if there is not enough free space on disk, for example when large snapshots made up of many such broken links fill it up.
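The hard-link mechanics described above can be demonstrated with plain coreutils. This is a minimal sketch using throwaway temporary paths, not real Cassandra directories:

```shell
# Simulate how a snapshot hard link keeps SSTable data on disk even
# after compaction removes the original file. Paths are illustrative.
workdir=$(mktemp -d)
mkdir -p "$workdir/data" "$workdir/snapshots/snap1"

# An "SSTable" written by Cassandra
printf 'sstable payload' > "$workdir/data/ba-1-bti-Data.db"

# Taking a snapshot creates a hard link, not a copy
ln "$workdir/data/ba-1-bti-Data.db" "$workdir/snapshots/snap1/ba-1-bti-Data.db"
stat -c '%h' "$workdir/data/ba-1-bti-Data.db"   # link count is now 2

# "Compaction" removes the live SSTable...
rm "$workdir/data/ba-1-bti-Data.db"

# ...but the snapshot link still pins the data on disk
cat "$workdir/snapshots/snap1/ba-1-bti-Data.db"
rm -rf "$workdir"
```

Until the snapshot link is removed, the space used by the deleted SSTable is never returned to the filesystem.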


Solution(s)

1) Ghost snapshots
A particular case arises, typically when using OpsCenter: after a snapshot is created and the underlying SSTable is later compacted away, the snapshot no longer refers to any live SSTable, creating what is known as a "ghost snapshot".
 

a) Find (ghost) snapshots

$ for i in `sudo find /media/cassandra -xdev -type f -size +30G -exec ls -i {} ';' | grep -i snap | awk '{print $1}' | sort -u`; do sudo find /media/cassandra -inum $i -exec ls -ltrh {} ';'; done

/etc/dse/cassandra/data/keyspace/table_name-UUI/snapshots/opscenter_033ecb0c-2021-02-24-04-00-00-UTC/ba-41181-bti-Data.db
/etc/dse/cassandra/data/keyspace/table_name-UUI/snapshots/opscenter_033ecb0c-2021-02-23-04-00-00-UTC/ba-41181-bti-Data.db
/etc/dse/cassandra/data/keyspace/table_name-UUI/ba-41181-bti-Data.db
...
/etc/dse/cassandra/data/keyspace/table_name-UUI/snapshots/opscenter_033ecb0c-2021-02-20-04-00-01-UTC/ba-39693-bti-Data.db
/etc/dse/cassandra/data/keyspace/table_name-UUI/snapshots/opscenter_033ecb0c-2021-02-21-04-00-01-UTC/ba-39693-bti-Data.db
/etc/dse/cassandra/data/keyspace/table_name-UUI/snapshots/opscenter_033ecb0c-2021-02-22-04-00-00-UTC/ba-39693-bti-Data.db
/etc/dse/cassandra/data/keyspace/table_name-UUI/snapshots/opscenter_033ecb0c-2021-02-20-04-00-01-UTC/ba-22855-bti-Data.db
/etc/dse/cassandra/data/keyspace/table_name-UUI/snapshots/opscenter_033ecb0c-2021-02-21-04-00-01-UTC/ba-22855-bti-Data.db

Note:
+30G: this option restricts the search to files over 30 GB; tune it to your environment and/or compaction strategy

 

b) Identify (ghost) snapshots
From the output above, the first three results are not ghost snapshots: the live SSTable (the path without a snapshots directory) still exists alongside the snapshot entries that link to it.

/etc/dse/cassandra/data/keyspace/table_name-UUI/snapshots/opscenter_033ecb0c-2021-02-24-04-00-00-UTC/ba-41181-bti-Data.db
/etc/dse/cassandra/data/keyspace/table_name-UUI/snapshots/opscenter_033ecb0c-2021-02-23-04-00-00-UTC/ba-41181-bti-Data.db
/etc/dse/cassandra/data/keyspace/table_name-UUI/ba-41181-bti-Data.db

On the other hand, ghost snapshots are those for which no live SSTable remains.

/etc/dse/cassandra/data/keyspace/table_name-UUI/snapshots/opscenter_033ecb0c-2021-02-20-04-00-01-UTC/ba-39693-bti-Data.db
/etc/dse/cassandra/data/keyspace/table_name-UUI/snapshots/opscenter_033ecb0c-2021-02-21-04-00-01-UTC/ba-39693-bti-Data.db
/etc/dse/cassandra/data/keyspace/table_name-UUI/snapshots/opscenter_033ecb0c-2021-02-22-04-00-00-UTC/ba-39693-bti-Data.db
/etc/dse/cassandra/data/keyspace/table_name-UUI/snapshots/opscenter_033ecb0c-2021-02-20-04-00-01-UTC/ba-22855-bti-Data.db
/etc/dse/cassandra/data/keyspace/table_name-UUI/snapshots/opscenter_033ecb0c-2021-02-21-04-00-01-UTC/ba-22855-bti-Data.db
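The same ghost condition can also be checked per file rather than per inode: a snapshot Data.db file whose name no longer exists in the live table directory has lost its SSTable. A minimal sketch, assuming a POSIX shell (the data_dir path is an example; adjust it to your data directory):

```shell
# Flag snapshot Data.db files whose SSTable no longer exists in the
# live table directory -- the "ghost snapshot" condition above.
data_dir=/var/lib/cassandra/data   # example path, adjust as needed
find "$data_dir" -path '*/snapshots/*-Data.db' -type f | while read -r snap; do
  table_dir=${snap%%/snapshots/*}          # strip snapshots/<name>/<file>
  live="$table_dir/$(basename "$snap")"    # expected live SSTable path
  [ -e "$live" ] || echo "ghost: $snap"
done
```

This name-based check approximates the inode-based one-liner above, since a live SSTable and its snapshot hard link share the same file name.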

 

2) Reduce disk space used

a) Run the nodetool command below for each of the (ghost) snapshots identified
$ nodetool clearsnapshot -t <snapshot_name>
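The per-snapshot cleanup can be scripted from the paths found in step 1. This sketch only prints the commands (a dry run) so they can be reviewed before anything is deleted; `ghosts.txt` is a hypothetical file holding one ghost-snapshot path per line:

```shell
# Derive snapshot names from ghost-snapshot paths and print the
# matching nodetool clearsnapshot commands. Pipe the output to sh
# to actually execute them after review.
while read -r snap; do
  name=${snap#*/snapshots/}   # drop everything up to snapshots/
  name=${name%%/*}            # keep only the snapshot directory name
  echo "nodetool clearsnapshot -t $name"
done < ghosts.txt | sort -u   # one command per snapshot name
```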

b) Reduce data if you can, e.g. by deleting unneeded rows (which writes tombstones) and lowering gc_grace_seconds so those tombstones can be purged sooner
cqlsh> ALTER TABLE <keyspace>.<table> WITH gc_grace_seconds = 3600;

Note:
Special considerations apply when running the above command (for example, a node that is down for longer than gc_grace_seconds can resurrect deleted data when it rejoins), so use it as a last resort. Before using it, do the following:
 

  • Evaluate severity of Tombstones

$ sstablemetadata <sstable filenames>

SSTable max local deletion time: 2147483647
Compression ratio: -1.0
Estimated droppable tombstones: 0.0
SSTable Level: 0
Repaired at: 0
ReplayPosition(segmentId=1445871179392, position=18397674)
Estimated tombstone drop times:
...
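The droppable-tombstone ratio can be extracted from this output and compared against a threshold to pick cleanup candidates. A minimal sketch with a sample line inlined for illustration (the 0.2 threshold is an assumption; tune it for your workload):

```shell
# Extract the "Estimated droppable tombstones" ratio from
# sstablemetadata output and flag SSTables above a threshold.
threshold=0.2
sample='Estimated droppable tombstones: 0.35'   # inlined sample line
printf '%s\n' "$sample" |
awk -v t="$threshold" '/Estimated droppable tombstones/ {
  if ($NF + 0 > t) print "cleanup candidate, droppable ratio " $NF
}'
```

In practice, pipe the real `sstablemetadata` output through the awk filter instead of the inlined sample.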

 

  • Clean unneeded (deleted) data out of SSTables (works only with the LCS and STCS compaction strategies)

$ nodetool <options> garbagecollect <keyspace_name> <table_name>
 

3) Add capacity

a) Add nodes to the cluster, then run nodetool cleanup on the pre-existing nodes to remove data that has moved to the new nodes
b) Increase the disk size of the node(s)

Document Location

Worldwide


Historical Number

ka06R000001MXl4QAG

Document Information

Modified date:
30 January 2026

UID

ibm17259050