IBM Support

How to clone a cluster

Troubleshooting


Problem

Cloning a cluster creates a copy of an existing (origin) cluster in a new (target) cluster.

The origin DC nodes' token ranges, DC name, schema, and user data will be transferred to the respective nodes in a target cluster.

This article is intended to be used on DCs of equal size. These instructions are based on the Restoring a snapshot into a new cluster and Restoring from a Snapshot articles. 

Scope

  • Apache Cassandra (All versions)
  • DataStax Enterprise (All versions)

Initial Setup

  1. Install the software on the target hosts and complete the initial configuration, but do not start the software.
    1. To get started with DSE, you can follow the Initializing a Cluster document.
    2. For Apache Cassandra, start here.
    3. The IP addresses of the target nodes should not match the addresses of the original hosts if they're in the same subnet.
  2. Copy the token ranges from each node of the original cluster.
    1. One approach is to use the following application to collect the node topology, including the ranges: https://github.com/brendancicchi/cassandra-dse-helper-scripts/tree/master/copy-cluster-topology-info
    2. A second approach is to use the command nodetool ring | grep -w ip_address_of_node | awk '{print $NF ","}' | xargs

nodetool ring | grep -w $(hostname -i) | awk '{if(NR>1) printf ","; printf $NF}' | xargs

-3556133768331742023, -5146138320845640661, -7357022569604768134, -958965020118584651, 1915711180939940766, 3851384939123314782, 6363153785365930833, 8198677122720708643

    3. A third approach is to query the tokens from the system.local table using cqlsh:

cqlsh -e "select tokens from system.local;" | tail -n-3 | head -n1 | tr -d "'" | tr -d '{' | tr -d '}'

'-3556133768331742023', '-5146138320845640661', '-7357022569604768134', '-958965020118584651', '1915711180939940766', '3851384939123314782', '6363153785365930833', '8198677122720708643'
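The quoted list returned by the cqlsh approach can be turned into a ready-made cassandra.yaml line with a small amount of text processing. The sketch below is illustrative: the TOKENS variable holds a two-token sample; on a real node you would capture the full cqlsh output instead.

```shell
# Hypothetical helper: strip the quotes from the cqlsh token list and
# prefix it with the cassandra.yaml option name.
TOKENS="'-3556133768331742023', '1915711180939940766'"   # sample of two tokens
echo "initial_token: $(echo "$TOKENS" | tr -d \')" > initial_token.txt
cat initial_token.txt
```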

  3. Configure the cassandra.yaml file on the target nodes so that each contains the list of the token ranges from the corresponding original node. This is added to the initial_token option.
    1. For example:

initial_token: -3556133768331742023, -5146138320845640661, -7357022569604768134, -958965020118584651, 1915711180939940766, 3851384939123314782, 6363153785365930833, 8198677122720708643

  4. On the target nodes, the seeds list in the cassandra.yaml file should contain only target node IPs.
    1. For advice on seed assignment, review the article on setting Seed nodes.
  5. If the origin cluster uses racks, copy the rack assignments to the respective nodes in the target cluster. This is configured in cassandra-rackdc.yaml.
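For illustration, a target node's assignment might look like the fragment below. The DC and rack names are placeholders; the values must mirror those of the corresponding origin node exactly.

```
# cassandra-rackdc.yaml on a target node (DC1/RACK1 are illustrative values)
dc=DC1
rack=RACK1
```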
Data Copy

Next, you will copy data over from the origin nodes. This article assumes that the Cassandra data directory is located at /var/lib/cassandra/data. Adjust as necessary.

  1. First, the schema needs to be copied. Locate the system_schema keyspace directory in the origin nodes and copy it to the corresponding target nodes.
    1. From origin node: /var/lib/cassandra/data/system_schema
    2. To target node: /var/lib/cassandra/data/system_schema
  2. Next, since each table directory name uses a UUID as a suffix, you will need to recreate the same folder structure present in the origin nodes on the target nodes. To do so, you may use the following commands:
    1. On a single origin host, create a list of the Cassandra data directory subdirectories:
      1. find /var/lib/cassandra/data -type d -print | grep -v "snapshots\|backups" > output.txt
    2. Copy the output.txt file to the target nodes
    3. On the target host, run the following to recreate the folder structure:
      1. while read -r line; do mkdir -p "$line"; done < output.txt
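The capture-and-recreate steps above can be exercised end to end as a local dry run. The /tmp paths and table directory name below are illustrative stand-ins for a real Cassandra data directory:

```shell
# Build a throwaway tree that mimics a Cassandra data directory (illustrative).
mkdir -p /tmp/origin-data/ks1/users-abc123/snapshots/s1
# Step 2.1: list data subdirectories, excluding snapshots and backups.
find /tmp/origin-data -type d | grep -v "snapshots\|backups" > output.txt
# Step 2.3 (on a real migration, run this on the target host after
# copying output.txt there):
while read -r dir; do mkdir -p "$dir"; done < output.txt
cat output.txt
```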
  3. Third, if the origin node is online, live sstables should not be copied directly, because they can change or be removed by compaction during the copy. To work around this, create a snapshot of all tables in all user-created keyspaces:
    1. nodetool snapshot -t snapshot_name keyspace_name
    2. If the host is offline, then the live sstables rather than the snapshots may be copied.
  4. Copy the data from the snapshots in the origin hosts to the live sstable directories in the target hosts
    1. From origin host directory: /var/lib/cassandra/data/keyspace_name/table-UUID/snapshots/snapshot_name
    2. To target host directory: /var/lib/cassandra/data/keyspace_name/table-UUID
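Step 4 can be illustrated locally with throwaway paths. In practice the copy crosses hosts (for example, via rsync or scp), and the directory and file names below are placeholders:

```shell
# Illustrative snapshot directory on an origin node (placeholder names).
SNAP_DIR=/tmp/origin-data/ks1/users-abc123/snapshots/clone1
# Live sstable directory on the matching target node (placeholder path).
TABLE_DIR=/tmp/target-data/ks1/users-abc123
mkdir -p "$SNAP_DIR" "$TABLE_DIR"
touch "$SNAP_DIR/nb-1-big-Data.db"
# Copy the snapshot contents into the live table directory.
cp -a "$SNAP_DIR/." "$TABLE_DIR/"
ls "$TABLE_DIR"
```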
  5. After all the required data has been copied, the target cluster can be brought online. The seed nodes will need to be started first. Start the next node 1 minute after the previous one has come online.
    1. Use nodetool status to keep track of node states.

Post-migration tasks

  1. Once all nodes in the target cluster are online, repair them to ensure that data is in sync.
    1. If using Racks, run a full repair of all nodes in a single rack.
      1. nodetool repair -full
    2. If not using Racks, run a primary range repair of all nodes.
      1. nodetool repair -pr

Final synchronization

If the target cluster needs to be fully consistent with the origin cluster, there are two tools available to make this happen:

  1. The Cassandra Data Migrator
    1. This tool copies data from one online cluster to another and can be used to fill in missing records.
    2. Link
  2. The Zero Downtime Migration Tool
    1. This can be used to split writes between the origin and target clusters. Clients will need to connect to this application instead of the origin database.
    2. Link

    Document Location

    Worldwide


    Historical Number

    ka0Ui0000002iTBIAY

    Document Information

    Modified date:
    30 January 2026

    UID

    ibm17258451