Troubleshooting
Problem
Summary
When using Apache-cassandra-data-migrator, the schema mapping must be configured correctly in the sparkConf.properties file, otherwise the migration fails.
Applies to
Apache-cassandra-data-migrator up to 3.2.3
Symptoms
The typical error when the mapping is not set up correctly:
ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 10000) java.lang.ArrayIndexOutOfBoundsException: 3
Cause
The mapping required in the sparkConf.properties file is not correctly set up.
In order to assess the mapping configuration, we require the following:
- Full sparkConf.properties file
- Table schema (source and target)
Solution
Take the following table schema as an example, and assume it is identical in the source and target databases:
CREATE TABLE cycling.cyclist_name (
    id uuid PRIMARY KEY,
    firstname text,
    lastname text
);
Our mapping in the sparkConf.properties file should match this:
# comma-separated-partition-key,comma-separated-clustering-key,comma-separated-other-columns
spark.query.origin id,firstname,lastname
# comma-separated-partition-key
spark.query.origin.partitionKey id
# comma-separated-partition-key,comma-separated-clustering-key
spark.query.target.id id
# comma separated numeric data-type mapping (e.g. 'text' will map to '0') for all columns listed in "spark.query.origin"
spark.query.types 9,0,0
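The mapping above can be sanity-checked before a migration is started. The following is a minimal sketch, not part of the tool: the helper names `parse_properties` and `check_mapping` are hypothetical, and the parser assumes each property is written as a key, whitespace, then a value, as in the example. It only checks that spark.query.types has one entry per column in spark.query.origin and that the partition key columns appear in spark.query.origin.

```python
def parse_properties(text):
    """Parse 'key value' lines, skipping blank lines and '#' comments.

    Assumption: key and value are separated by whitespace, as in the example.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def split_csv(value):
    """Split a comma-separated value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_mapping(props):
    """Return a list of human-readable problems found in the mapping."""
    problems = []
    columns = split_csv(props.get("spark.query.origin", ""))
    types = split_csv(props.get("spark.query.types", ""))
    if len(columns) != len(types):
        problems.append(
            f"spark.query.origin lists {len(columns)} columns but "
            f"spark.query.types lists {len(types)} types"
        )
    for col in split_csv(props.get("spark.query.origin.partitionKey", "")):
        if col not in columns:
            problems.append(
                f"partition key column '{col}' is missing from spark.query.origin"
            )
    return problems


# The mapping from the example above.
EXAMPLE = """
spark.query.origin id,firstname,lastname
spark.query.origin.partitionKey id
spark.query.target.id id
spark.query.types 9,0,0
"""

if __name__ == "__main__":
    issues = check_mapping(parse_properties(EXAMPLE))
    print(issues if issues else "mapping looks consistent")
```

An empty result only means these two checks passed; it does not prove that the type codes themselves are correct.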
More info about values for the spark.query.types parameter: cassandra-data-migrator/sparkConf.properties
#############################################################################################################
# Following are the supported data types and their corresponding [Cassandra data-types] mapping
#  0: ascii, text, varchar
#  1: int
#  2: bigint, counter
#  3: double
#  4: timestamp
#  5: map (separate type by %) - Example: 5%1%0 for map<int, text>
#  6: list (separate type by %) - Example: 6%0 for list<text>
#  7: blob
#  8: set (separate type by %) - Example: 8%0 for set<text>
#  9: uuid, timeuuid
# 10: boolean
# 11: tuple
# 12: float
# 13: tinyint
# 14: decimal
# 15: date
# 16: UDT [any user-defined-type created using 'CREATE TYPE']
# 17: varint
# 18: time
# 19: smallint
# Note: Ignore "Frozen" while mapping Collections (Map/List/Set) - Example: 5%1%0 for frozen<map<int, text>>
#############################################################################################################
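To reduce the risk of a typing mistake, the spark.query.types value can be derived from the CQL column types using the table above. The sketch below is illustrative only: `type_code` and `types_property` are hypothetical helpers, not functions of the tool. It covers scalar types and non-nested map, list, and set columns; tuples (11) and UDTs (16) are deliberately left out because the table does not say how their element types are written.

```python
# Codes copied from the mapping table above.
SCALAR_CODES = {
    "ascii": 0, "text": 0, "varchar": 0,
    "int": 1,
    "bigint": 2, "counter": 2,
    "double": 3,
    "timestamp": 4,
    "blob": 7,
    "uuid": 9, "timeuuid": 9,
    "boolean": 10,
    "float": 12,
    "tinyint": 13,
    "decimal": 14,
    "date": 15,
    "varint": 17,
    "time": 18,
    "smallint": 19,
}
COLLECTION_CODES = {"map": 5, "list": 6, "set": 8}


def type_code(cql_type):
    """Return the spark.query.types code for one CQL type, e.g. '5%1%0'."""
    t = cql_type.strip().lower()
    # "Frozen" is ignored when mapping collections, as the note above says.
    if t.startswith("frozen<") and t.endswith(">"):
        t = t[len("frozen<"):-1].strip()
    if t in SCALAR_CODES:
        return str(SCALAR_CODES[t])
    name, sep, rest = t.partition("<")
    if sep and rest.endswith(">") and name in COLLECTION_CODES:
        inner = [part.strip() for part in rest[:-1].split(",")]
        if all(part in SCALAR_CODES for part in inner):
            codes = [COLLECTION_CODES[name]] + [SCALAR_CODES[p] for p in inner]
            return "%".join(str(code) for code in codes)
    raise ValueError(f"type not handled by this sketch: {cql_type!r}")


def types_property(cql_types):
    """Build the comma-separated spark.query.types value for a column list."""
    return ",".join(type_code(t) for t in cql_types)


if __name__ == "__main__":
    # Column types of cycling.cyclist_name, in spark.query.origin order.
    print(types_property(["uuid", "text", "text"]))
```

The types must be passed in the same order as the columns in spark.query.origin.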
The following parameters must also be adjusted to match the schema:
spark.query.ttl.cols
spark.query.writetime.cols
First, number the columns listed in spark.query.origin, starting from zero. In the example above:
id ---> 0
firstname ---> 1
lastname ---> 2
Columns that are part of the primary key (partition key(s) plus clustering column(s)) do not have a TTL. The table in the example above has a single partition key column, id (index 0), so it must be left out of these parameters and only the remaining columns are listed:
spark.query.ttl.cols 1,2
spark.query.writetime.cols 1,2
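The index rule described above (number the spark.query.origin columns from zero, then drop every primary key column) can be sketched as follows. The function name `ttl_writetime_cols` is hypothetical and exists only for this illustration.

```python
def ttl_writetime_cols(origin_columns, primary_key_columns):
    """Return the zero-based indices of the non-primary-key columns.

    origin_columns: columns in the order used in spark.query.origin.
    primary_key_columns: partition key(s) plus clustering column(s).
    """
    primary_key = set(primary_key_columns)
    return ",".join(
        str(index)
        for index, column in enumerate(origin_columns)
        if column not in primary_key
    )


if __name__ == "__main__":
    cols = ttl_writetime_cols(["id", "firstname", "lastname"], ["id"])
    print(f"spark.query.ttl.cols {cols}")
    print(f"spark.query.writetime.cols {cols}")
```

The same value is used for both parameters in the example, which is why the sketch computes it once.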
Also, set the following properties correctly in order for the migration into Astra DB to happen properly:
spark.target.scb file:///tmp/scb/target.zip
spark.target.username *****************
spark.target.password *****************
Finally, the tool is designed for migrating millions or billions of records; if the volume is low, use DSBulk instead. If you do use the tool for smaller tables, reduce the default values of the following parameters, otherwise the migration will be slow for the small amount of data being transferred:
spark.numSplits
spark.batchSize
As a general rule, use 1 split for every 10K rows. For example, for a table with 37K rows, a numSplits value of 4 is ideal. The default numSplits is 10K because the tool is expected to be used for 1 billion or more rows.
batchSize (default 10) is related to the schema, not to the data volume. Its purpose is to use batch writes while avoiding multi-partition writes. When the primary key and the partition key are the same, batchSize should always be 1 to avoid multi-partition writes.
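The sizing rules above amount to a small calculation. The sketch below is only an illustration of that arithmetic: the function names are hypothetical, and returning the default batchSize of 10 when the primary key has clustering columns is an assumed starting point, not a recommendation from the tool.

```python
import math


def suggested_num_splits(row_count, rows_per_split=10_000):
    """One split per 10K rows, rounded up, with a minimum of 1."""
    return max(1, math.ceil(row_count / rows_per_split))


def suggested_batch_size(partition_key, primary_key, default=10):
    """Return 1 when the primary key equals the partition key.

    Otherwise return the default (assumption: keep the default of 10
    as a starting point).
    """
    return 1 if set(partition_key) == set(primary_key) else default


if __name__ == "__main__":
    # 37K rows, as in the example above.
    print("spark.numSplits", suggested_num_splits(37_000))
    # cycling.cyclist_name: the primary key is just the partition key 'id'.
    print("spark.batchSize", suggested_batch_size(["id"], ["id"]))
```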
For spark.origin.host, specify a single contact point.
Additional Resources
Clearer mapping information was added in 3.2.3: https://github.com/datastax/cassandra-data-migrator/blob/3.2.3/src/reso…
Working with Apache cassandra-data-migrator. How to execute it and increase --driver-memory and --executor-memory for bigger tables: cassandra-data-migrator
Downloading Secure Connect Bundle: Working with secure connect bundle
Creating an application token. Ensure the token is created with the proper role (e.g. a Database Administrator role): Managing your Astra DB organisation
Document Location
Worldwide
Historical Number
ka0Ui0000000LndIAE
Document Information
Modified date:
30 January 2026
UID
ibm17258644