Restoring a Cassandra node

Apache Cassandra replicates data across nodes, so a complete loss of data is rare. However, data can still become corrupted, and a hardware crash can destroy the data on a node. Therefore, take regular backups by creating snapshots of all Cassandra nodes. You can use these snapshots to roll the Cassandra nodes back to a known good state.
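One way to take these snapshots is the nodetool snapshot command, which acts on the local node only, so it must be run on every node. The following is a minimal sketch, assuming that nodetool is on the PATH. The take_snapshot function, the NODETOOL override variable, and the dated tag format are hypothetical conveniences for illustration, not part of the product.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): take a tagged snapshot on this node.
# Assumption: nodetool is on the PATH; set NODETOOL to use a different path.
take_snapshot() {
  local tag="$1"; shift   # snapshot name; it becomes the {snapshot_dir} name
  # Remaining arguments are keyspace names; with none, all keyspaces are snapshotted.
  # Cassandra writes the files under
  #   data/{keyspace}/tablename-UUID/snapshots/<tag>/
  "${NODETOOL:-nodetool}" snapshot -t "$tag" "$@"
}

# Example (run on every node), with a hypothetical keyspace name:
# take_snapshot "backup_$(date +%Y%m%d_%H%M%S)" my_keyspace
```

A dated tag makes it easy to identify the snapshot that you want to restore later.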

About this task

You can restore your Cassandra cluster to an earlier state at a specific point in time by using the snapshot feature of Cassandra.

Attention: It is best practice to restore failed or corrupted nodes by synchronizing data from the remaining nodes. However, if data synchronization is not possible, or if you want to restore nodes to a known earlier state, you can restore the nodes from a snapshot. When you restore from a snapshot, you might lose any data that was written to the cluster after the snapshot was taken.

Procedure

To restore a Cassandra node, complete the following steps:

  1. Stop the Cassandra node.
  2. Clear the commit log by removing all the files in the commit log directory, for example: rm /<installation directory>/apache-cassandra/commitlog/*.
  3. Remove the database (*.db) files of each table in the keyspaces that you want to restore, for example: rm /<installation directory>/apache-cassandra/data/{keyspace}/tablename-UUID/*.db.
  4. For each table, copy the contents of the snapshot directory that you want to restore (typically the latest) into the data directory of that table, for example: cp -praf /<installation directory>/apache-cassandra/data/{keyspace}/tablename-UUID/snapshots/{snapshot_dir}/* /<installation directory>/apache-cassandra/data/{keyspace}/tablename-UUID.
  5. Restart the node.
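Steps 2 through 4 can be scripted. The following bash function is a minimal sketch, assuming the directory layout used in the procedure: the commit log in <installation directory>/apache-cassandra/commitlog and the tables in <installation directory>/apache-cassandra/data/{keyspace}/tablename-UUID. The function name restore_keyspace_from_snapshot and its arguments are hypothetical. Run it only while the node is stopped (step 1), and start the node afterward (step 5).

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper) of steps 2-4. Run only while Cassandra is stopped.
# Arguments: <installation directory>/apache-cassandra, keyspace name, snapshot tag.
restore_keyspace_from_snapshot() {
  local cassandra_home="$1" keyspace="$2" tag="$3"
  local table_dir found=0

  # Safety check: delete nothing unless at least one table has this snapshot.
  for table_dir in "$cassandra_home/data/$keyspace"/*/; do
    if [ -d "${table_dir}snapshots/$tag" ]; then found=1; fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no snapshot '$tag' found for keyspace '$keyspace'" >&2
    return 1
  fi

  # Step 2: clear the commit log so that newer writes are not replayed at startup.
  rm -f "$cassandra_home"/commitlog/*

  for table_dir in "$cassandra_home/data/$keyspace"/*/; do
    # Tables without this snapshot (for example, created later) are left untouched.
    [ -d "${table_dir}snapshots/$tag" ] || continue
    # Step 3: remove the current database (*.db) files of the table.
    rm -f "$table_dir"*.db
    # Step 4: copy the snapshot files back; cp -a preserves mode and timestamps
    # (and ownership when run as root), similar to cp -praf in the procedure.
    cp -a "${table_dir}snapshots/$tag"/. "$table_dir"
  done
}
```

For example, restore_keyspace_from_snapshot /opt/example/apache-cassandra my_keyspace backup_20240101 restores one keyspace (the path and names are hypothetical); call it once for each keyspace that you want to restore. The existence check runs first so that a mistyped tag does not clear the commit log or delete table files.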