Troubleshooting
Problem
Overview
This article applies to scenarios where:
- You submitted a large query and received the error shown below
- You are experiencing a workload increase
- You recently upgraded your DSE software and are experiencing difficulties
- Jobs are being aborted with the error shown below
Error
ERROR [Native-Transport-Requests-10] 2021-06-17 17:13:10,909 RowResponseHandler.java:217 - Row retrieval response timeout of 10000, missing responses from nodes: [10.31.83.50] … Caused by: java.lang.RuntimeException: Row retrieval response timeout of 10000, missing responses from nodes: [10.202.245.150]
Analysis
What are timeouts?
Timeouts occur when an operation or job in your system takes too long to complete or respond. Common causes include a query that returns too many rows, an increase in workload on the system, or configuration settings that are not tuned for the amount of work being performed.
What does this error mean?
Receiving this error most likely means that the operation being performed is more than the system can handle. Nodes can become overloaded by the workload and go unresponsive after a certain amount of time, which can eventually result in a large number of dropped mutations and high pending compactions.
Solution
Solution #1:
The row retrieval response timeout is controlled by the cql_solr_query_row_timeout setting in the dse.yaml file. The default value is 10000 milliseconds (10 seconds). You can experiment with this value by increasing it, for example by doubling the default, to prevent timeouts from occurring.
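A minimal sketch of the change in dse.yaml (the value 20000 is an example, not a recommendation; tune it to your workload):

```yaml
# dse.yaml
# Maximum time, in milliseconds, to wait for each row to be read from the
# underlying table during a CQL Solr query. Default is 10000 (10 seconds);
# this example doubles it.
cql_solr_query_row_timeout: 20000
```

Restart the node after editing dse.yaml for the new value to take effect.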
Solution #2:
There are various reasons a timeout can occur. Another thing to consider is the number of pending compactions. Although this might not show up in the immediate error, it can appear in the system logs and date back to well before the timeouts started.
The number of concurrent compactors can be reconfigured and tested at runtime by running nodetool setconcurrentcompactors <new_value>. By default, the compactor count starts at 2; you can experiment with higher values, which can help prevent future Solr timeouts from occurring.
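A hedged example of checking pending compactions and raising the compactor count at runtime (the value 4 is illustrative only):

```shell
# Check the number of pending compaction tasks on this node
nodetool compactionstats

# Raise the number of concurrent compactors (here from the default to 4)
nodetool setconcurrentcompactors 4

# Verify the new setting took effect
nodetool getconcurrentcompactors
```

Note that nodetool changes apply only until the node restarts; to make the change permanent, set the concurrent_compactors parameter in cassandra.yaml as well.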
Solution #3:
If the cause of the timeout is a specific large query that has been taking abnormally long, you may also want to change your client configuration. For example, if you are performing a DSBulk unload using a Solr query, raise the driver.basic.request.timeout string in the DSBulk configuration file from its default of "5 minutes" to a higher value.
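A hedged sketch of passing the same setting on the command line instead of the configuration file (the keyspace, table, query, and output path are placeholders; the exact flag name may vary by DSBulk version, so verify it against your version's documentation):

```shell
# Unload via a Solr query with a longer driver request timeout (example: 10 minutes)
dsbulk unload \
  -query "SELECT * FROM my_ks.my_table WHERE solr_query = '{\"q\":\"*:*\"}'" \
  --driver.basic.request.timeout "10 minutes" \
  -url /tmp/unload_output
```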
This value is how long the DataStax driver waits for a request to complete. The DataStax Bulk Loader initializes it at 5 minutes because it is optimized for throughput rather than low per-request latency; increasing the value gives long-running Solr queries more time to complete instead of timing out.
Finally, check the request_timeout_in_ms parameter in your cassandra.yaml file. It defaults to 10000 (10 seconds); experimenting with a higher value can also help avoid these timeouts.
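The corresponding cassandra.yaml fragment might look like the following (the value 20000 is illustrative; a rolling restart is required for the change to take effect):

```yaml
# cassandra.yaml
# Default coordinator-side timeout for requests, in milliseconds.
# Default is 10000 (10 seconds); this example doubles it.
request_timeout_in_ms: 20000
```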
Document Location
Worldwide
Historical Number
ka06R000000EtZ7QAK
Document Information
Modified date:
30 January 2026
UID
ibm17259074