Restoring a Cassandra node
Data in Apache Cassandra is replicated across nodes, so a complete loss of data is rare. However, data can still become corrupted, and a hardware crash can cause a node to lose its data. Therefore, take regular backups by creating snapshots of all Cassandra nodes. You can use these snapshots to roll back the Cassandra nodes to a known good state.
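A snapshot is taken per node with nodetool, so a regular backup means running it on every node. The following is a minimal sketch that takes a tagged snapshot of one keyspace. The function name take_snapshot, the NODETOOL override, and the backup-<UTC timestamp> tag format are illustrative assumptions, not part of this product; only nodetool snapshot -t <tag> <keyspace> is the standard Cassandra command.

```shell
#!/usr/bin/env bash
# Sketch: take a tagged snapshot of one keyspace on the local node.
set -euo pipefail

# Assumption: nodetool is on PATH; set NODETOOL to use a different binary.
NODETOOL=${NODETOOL:-nodetool}

# Hypothetical helper: snapshots <keyspace> and prints the tag it used.
take_snapshot() {
  local keyspace=$1
  local tag
  # Hypothetical naming scheme: "backup-" plus a UTC timestamp.
  tag="backup-$(date -u +%Y%m%dT%H%M%SZ)"
  # Send nodetool's own output to stderr so stdout carries only the tag.
  "$NODETOOL" snapshot -t "$tag" "$keyspace" >&2
  echo "$tag"
}
```

For example, tag=$(take_snapshot my_keyspace) on each node stores the files under data/my_keyspace/<table>-<UUID>/snapshots/$tag/, which is the snapshot directory that the restore procedure copies from. Using the same tag on all nodes makes it easier to restore the cluster to one consistent point in time.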
About this task
You can restore your Cassandra cluster to its state at a specific earlier point in time by using the Cassandra snapshot feature.
Attention: As a best practice, restore failed or corrupted nodes by synchronizing data from the remaining nodes. However, if data synchronization is not possible, or if you want to restore nodes to a known earlier state, you can restore the nodes from a snapshot. When you restore from a snapshot, any data that was added to the cluster after the snapshot was taken might be lost.
Procedure
To restore a Cassandra node, complete the following steps:
- Stop the Cassandra node.
- Clear the commit log by removing all the files in the commit log directory, for example: rm /<installation directory>/apache-cassandra/commitlog/*
- Remove the database (*.db) files from each table directory of the keyspaces that you want to restore, for example: rm /<installation directory>/apache-cassandra/data/{keyspace}/tablename-UUID/*.db
- Copy the contents of the latest snapshot directory of each table to that table's data directory, for example: cp -praf /<installation directory>/apache-cassandra/data/{keyspace}/tablename-UUID/snapshots/{snapshot_dir}/* /<installation directory>/apache-cassandra/data/{keyspace}/tablename-UUID
- Restart the node.
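Steps 2 to 4 repeat the same file operations for every table directory of a keyspace, so they can be scripted. The following is a minimal sketch, assuming the directory layout from the example paths in the procedure (commitlog/ and data/<keyspace>/<table>-<UUID>/snapshots/<snapshot>/ under the Cassandra directory). Your installation might place these elsewhere through the commitlog_directory and data_file_directories settings in cassandra.yaml. The function name restore_keyspace and its arguments are hypothetical. The script does not stop or restart the node; do that yourself before and after you run it.

```shell
#!/usr/bin/env bash
# Sketch of steps 2-4 for ONE keyspace. Stop the node before calling it
# and restart the node afterwards; this script does neither.
set -euo pipefail

# Assumed layout, matching the example paths in the procedure:
#   <cassandra_dir>/commitlog/
#   <cassandra_dir>/data/<keyspace>/<table>-<UUID>/snapshots/<snapshot>/
restore_keyspace() {
  local cassandra_dir=$1 keyspace=$2 snapshot=$3
  local table_dir snap_dir

  # Step 2: remove every file in the commit log directory.
  if [ -d "$cassandra_dir/commitlog" ]; then
    find "$cassandra_dir/commitlog" -maxdepth 1 -type f -delete
  fi

  for table_dir in "$cassandra_dir/data/$keyspace"/*/; do
    table_dir=${table_dir%/}
    snap_dir="$table_dir/snapshots/$snapshot"
    # Skip tables (or an unmatched glob) that have no copy in this snapshot.
    [ -d "$snap_dir" ] || continue

    # Step 3: delete the table's current *.db files; snapshots/ is untouched.
    find "$table_dir" -maxdepth 1 -type f -name '*.db' -delete

    # Step 4: copy the snapshot contents back, preserving file attributes.
    cp -a "$snap_dir"/. "$table_dir"/
  done
}
```

For example, restore_keyspace "/<installation directory>/apache-cassandra" "<keyspace>" "<snapshot_dir>" restores one keyspace; call it once for each keyspace that you want to restore. The function takes the snapshot name explicitly rather than guessing which one is the latest. It also skips any table that is missing from the snapshot, so that table's current data files are not deleted.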