Troubleshooting
Problem
Summary
'Client request timeout' when running count(*) with DataStax Astra DB
Applies to
- DataStax Astra
Symptoms
When running a SELECT COUNT(*) on a table in Astra DB it may timeout.
Cause
Running COUNT(*) on a large table on Apache Cassandra-based clusters is an expensive operation as it will become a scatter-gather operation requiring several nodes and sstables. This will likely take longer than the default client request timeout value.
Workaround
Use dsbulk [1] to perform the COUNT operation:
https://www.datastax.com/blog/datastax-bulk-loader-counting
The following is an example of how to use dsbulk with Astra DB:
1) Create a table in Astra DB using NoSQLBench [2]:
./nb run driver=cql workload=cql-keyvalue tags=phase:schema-astra username=<username> password=<password> keyspace=my_ks secureconnectbundle=<path>/secure-connect-mydatabase.zip
2) Load the table with 1 million rows:
./nb run driver=cql workload=cql-keyvalue tags=phase:rampup cycles=1000k username=<username> password=<password> keyspace=my_ks secureconnectbundle=<path>/secure-connect-mydatabase.zip threads=auto cql-keyvalue: 18.65%/Running (details: min=0 cycle=186480 max=1000000) cql-keyvalue: 37.53%/Running (details: min=0 cycle=375309 max=1000000) cql-keyvalue: 56.37%/Running (details: min=0 cycle=563673 max=1000000) cql-keyvalue: 76.38%/Running (details: min=0 cycle=763760 max=1000000) cql-keyvalue: 93.69%/Running (details: min=0 cycle=936917 max=1000000) 12/5/22, 3:54:23 PM ============================================================
3) Try to run a COUNT on the table to find the number of rows:
<username>@cqlsh> select count(*) from my_ks.keyvalue;
OperationTimedOut: errors={'<hostname-<region>.db.astra.datastax.com:29042:<db>': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=<hostname>-<region>.db.astra.datastax.com:29042:<db>
This will probably fail with a timeout error.
4) Run the count operation using dsbulk:
~/dsbulk/dsbulk-1.10.0/bin/dsbulk count -k my_ks -t keyvalue -b "<path>/secure-connect-mydatabase.zip" -u <username> -p <password> --log.checkpoint.enabled false --driver.advanced.auth-provider.class PlainTextAuthProvider Username and password provided but auth provider not specified, inferring PlainTextAuthProvider A cloud secure connect bundle was provided: ignoring all explicit contact points. Operation directory: /home/username/dsbulk/dsbulk-1.10.0/bin/logs/COUNT_20221205-160302-268653 total | failed | rows/s | p50ms | p99ms | p999ms 1,000,000 | 0 | 382,526 | 115.57 | 207.62 | 228.59 Operation COUNT_20221205-160302-268653 completed successfully in 2 seconds. 1000000
This completes successfully and returns the correct number of rows.
For very large tables you may need to throttle the dsbulk connection. This can be done using the following parameters:
--driver.advanced.heartbeat.timeout 60000 --driver.advanced.connection.init-query-timeout 600000 --dsbulk.batch.maxBatchStatements 5 --dsbulk.batch.mode PARTITION_KEY --dsbulk.executor.continuousPaging.enabled false --dsbulk.schema.splits 10000C --datastax-java-driver.basic.request.page-size 1000 --dsbulk.engine.maxConcurrentQueries 12
Related Information
Why is COUNT() bad in Apache Cassandra?
https://dba.stackexchange.com/questions/314567/why-is-count-bad-in-cass…
[1] dsbulk:
https://github.com/datastax/dsbulk
[2] NoSQLBench:
https://docs.nosqlbench.io/
Last Reviewed Date: 2024/02/01
Document Location
Worldwide
Historical Number
ka0Ui0000000kIzIAI
Was this topic helpful?
Document Information
Modified date:
30 January 2026
UID
ibm17258520