Troubleshooting
Problem
Cloning
The origin DC nodes' token ranges, DC name, schema, and user data will be transferred to the respective nodes of a target cluster.
This article applies to origin and target DCs of equal size (one target node per origin node). These instructions are based on the Restoring a snapshot into a new cluster and Restoring from a Snapshot articles.
Scope
- Apache Cassandra (All versions)
- DataStax Enterprise (All versions)
Initial Setup
- Install the software on the target hosts and complete the initial configuration, but do not start the software.
- To get started with DSE, you can follow the Initializing a Cluster document.
- For Apache Cassandra, start here.
- If the target nodes are in the same subnet as the original hosts, their IP addresses must not match those of the original hosts.
- Copy the token ranges from each node of the original cluster.
- One approach is to use the following application to collect the node topology, including the ranges: https://github.com/brendancicchi/cassandra-dse-helper-scripts/tree/master/copy-cluster-topology-info
- A second approach is to use the command below, which prints the node's tokens as a single comma-separated line:
nodetool ring | grep -w $(hostname -i) | awk '{if (NR > 1) printf ", "; printf $NF}'
-3556133768331742023, -5146138320845640661, -7357022569604768134, -958965020118584651, 1915711180939940766, 3851384939123314782, 6363153785365930833, 8198677122720708643
- A third approach is to query the tokens from the system.local table:
cqlsh -e "select tokens from system.local;" | tail -n 3 | head -n 1 | tr -d "{}'"
-3556133768331742023, -5146138320845640661, -7357022569604768134, -958965020118584651, 1915711180939940766, 3851384939123314782, 6363153785365930833, 8198677122720708643
- Configure the cassandra.yaml file on each target node so that it contains the list of token ranges from the corresponding origin node. The list is added to the initial_token option.
- For example:
initial_token: -3556133768331742023, -5146138320845640661, -7357022569604768134, -958965020118584651, 1915711180939940766, 3851384939123314782, 6363153785365930833, 8198677122720708643
- On the target nodes, the seeds list in the cassandra.yaml file should only contain target node IPs.
- For advice on seed assignment, review the article on setting seed nodes.
- If the origin cluster uses racks, copy the rack assignments to the respective nodes in the target cluster. This is configured in cassandra-rackdc.yaml.
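The token capture and cassandra.yaml edit above can be combined into a small helper. This is a sketch only: the function name is made up, and the file path and token values in the usage example are placeholders, not values from this article.

```shell
#!/bin/sh
# set_initial_token <cassandra.yaml> <comma-separated tokens>
# Sketch: write the captured token list into the target node's
# cassandra.yaml, replacing any existing initial_token line.
set_initial_token() {
  conf=$1
  tokens=$2
  if grep -q '^initial_token:' "$conf"; then
    # Tokens contain only digits, minus signs, commas, and spaces,
    # so they splice safely into a sed replacement.
    # (GNU sed shown; on BSD/macOS use: sed -i '')
    sed -i "s/^initial_token:.*/initial_token: $tokens/" "$conf"
  else
    printf 'initial_token: %s\n' "$tokens" >> "$conf"
  fi
}

# Example (placeholder path and tokens):
# set_initial_token /etc/cassandra/cassandra.yaml \
#   "-3556133768331742023, -5146138320845640661"
```

Running it a second time replaces the existing line instead of appending a duplicate, so the step is safe to repeat.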
Next, begin copying data over from the origin nodes. This article assumes that the Cassandra data directory is /var/lib/cassandra/data; adjust the paths as necessary.
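To verify that assumption on a given node, the configured data directories can be read out of cassandra.yaml. The helper below is a sketch; the function name and the /etc/cassandra path in the example are illustrative, not from this article.

```shell
#!/bin/sh
# data_dirs <cassandra.yaml>
# Sketch: print each entry under data_file_directories so the
# /var/lib/cassandra/data assumption can be checked before copying.
data_dirs() {
  awk '
    /^data_file_directories:/ { grab = 1; next }
    grab && /^[[:space:]]*-/ {           # list entries under the key
      sub(/^[[:space:]]*-[[:space:]]*/, "")
      print
      next
    }
    grab { exit }                        # next top-level key: stop
  ' "$1"
}

# Example (placeholder path):
# data_dirs /etc/cassandra/cassandra.yaml
```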
- First, the schema needs to be copied. Locate the system_schema keyspace directory on each origin node and copy it to the corresponding target node:
- From origin node: /var/lib/cassandra/data/system_schema
- To target node: /var/lib/cassandra/data/system_schema
- Next, since each table directory name uses a UUID as a suffix, you will need to recreate the same folder structure present in the origin nodes on the target nodes. To do so, you may use the following commands:
- On a single origin host, create a list of the Cassandra data directory subdirectories:
- find /var/lib/cassandra/data -type d -print | grep -v "snapshots\|backups" > output.txt
- Copy the output.txt file to the target nodes
- On the target host, run the following to recreate the folder structure
- while read -r line; do mkdir -p "$line"; done < output.txt
- Third, if the origin node is online, the live sstables cannot be copied safely because compaction may remove or rewrite them mid-copy. To work around this, create a snapshot of all tables in all user-created keyspaces:
- nodetool snapshot -t snapshot_name keyspace_name
- If the host is offline, the live sstables may be copied directly instead of the snapshots.
- Copy the data from the snapshots in the origin hosts to the live sstable directories in the target hosts
- From origin host directory: /var/lib/cassandra/data/keyspace_name/table-UUID/snapshots/snapshot_name
- To target host directory: /var/lib/cassandra/data/keyspace_name/table-UUID
- After all the required data has been copied, the target cluster can be brought online. Start the seed nodes first, then start each remaining node about 1 minute after the previous one has come online.
- Use nodetool status to keep track of node states.
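The snapshot-to-live-directory copy above can be sketched as a single loop on an origin node. Everything here is illustrative: the function name is made up, the snapshot tag is an example, and a plain cp into a destination root stands in for the rsync or scp transfer you would use to reach the target host.

```shell
#!/bin/sh
# copy_snapshots <data_dir> <tag> <dest_root>
# Sketch: for every snapshot directory tagged <tag> under <data_dir>,
# copy its sstables into the matching live table directory under
# <dest_root>, recreating the keyspace/table-UUID layout as it goes.
copy_snapshots() {
  data_dir=$1
  tag=$2
  dest_root=$3
  find "$data_dir" -type d -path "*/snapshots/$tag" |
  while read -r snap; do
    # .../keyspace/table-UUID/snapshots/<tag> -> keyspace/table-UUID
    rel=${snap#"$data_dir"/}
    dest=$dest_root/${rel%/snapshots/$tag}
    mkdir -p "$dest"
    cp -a "$snap/." "$dest/"
  done
}

# Example (placeholder tag; in practice dest_root would be a mount of,
# or staging area for, the target node's data directory):
# copy_snapshots /var/lib/cassandra/data clone /mnt/target-node-1/data
```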
Post-migration tasks
- Once all nodes in the target cluster are online, repair them to ensure that data is in sync.
- If using Racks, run a full repair of all nodes in a single rack.
- nodetool repair -full
- If not using Racks, run a primary range repair of all nodes.
- nodetool repair -pr
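The repair pass can be scripted so that nodes are repaired one at a time rather than all at once. The sketch below takes the repair flags as arguments; the host names are placeholders to be replaced with the target cluster's addresses.

```shell
#!/bin/sh
# repair_all <nodetool repair args...>
# Sketch: run nodetool repair against each node in turn, stopping on
# the first failure so a problem is noticed before the next repair.
repair_all() {
  for node in node1 node2 node3; do
    echo "Repairing $node..."
    nodetool -h "$node" repair "$@" || return 1
  done
}

# Example: primary-range repair across the (placeholder) node list:
# repair_all -pr
```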
Final synchronization
If the target cluster needs to be fully consistent with the origin cluster, there are two tools available to make this happen:
- The Cassandra Data Migrator
- This tool copies data from one online cluster to another and can be used to fill in missing records.
- Link
- The Zero Downtime Migration Tool
- This can be used to split writes between the Origin and Target clusters. Clients will need to connect to this application instead of the Origin database.
- Link
Document Location
Worldwide
Historical Number
ka0Ui0000002iTBIAY
Document Information
Modified date:
30 January 2026
UID
ibm17258451