Troubleshooting

Problem

Error inserting into system_distributed.parent_repair_history

Sample error:

ERROR [Repair-Task:1] 2019-06-21 06:40:44,895 SystemDistributedKeyspace.java:406 - Error executing query INSERT INTO system_distributed.parent_repair_history (parent_id, keyspace_name, columnfamily_names, requested_ranges, started_at, options) VALUES (11111111-0000-0000-0000-888888888888, 'system_auth', { 'roles','role_permissions','role_members' }, { '(1607483561684771030,1656713833712314075]' }, toTimestamp(now()), { 'trace': 'false','forceRepair': 'false','hosts': '','parallelism': 'parallel','dataCenters': '','previewKind': 'NONE','incremental': 'false','pullRepair': 'false','primaryRange': 'false','jobThreads': '1' })

This error typically produces a stack trace similar to the following (the exact stack trace varies by DSE version):

org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.AbstractWriteHandler$1.lambda$subscribeActual$0(AbstractWriteHandler.java:158)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.cassandra.service.AbstractWriteHandler$TimeoutAction.accept(AbstractWriteHandler.java:221)
at org.apache.cassandra.service.AbstractWriteHandler$TimeoutAction.accept(AbstractWriteHandler.java:216)
at org.apache.cassandra.concurrent.TPCTimeoutTask.run(TPCTimeoutTask.java:43)
at org.apache.cassandra.concurrent.TPCHashedWheelTimer.lambda$onTimeout$0(TPCHashedWheelTimer.java:43)
at org.apache.cassandra.utils.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:498)
at org.apache.cassandra.utils.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:573)
at org.apache.cassandra.utils.HashedWheelTimer$Worker.run(HashedWheelTimer.java:329)
at org.apache.cassandra.concurrent.TPCRunnable.run(TPCRunnable.java:68)
at org.apache.cassandra.concurrent.EpollTPCEventLoopGroup$SingleCoreEventLoop.process(EpollTPCEventLoopGroup.java:920)
at org.apache.cassandra.concurrent.EpollTPCEventLoopGroup$SingleCoreEventLoop.processTasks(EpollTPCEventLoopGroup.java:892)
at org.apache.cassandra.concurrent.EpollTPCEventLoopGroup$SingleCoreEventLoop.runScheduledTasks(EpollTPCEventLoopGroup.java:980)
at org.apache.cassandra.concurrent.EpollTPCEventLoopGroup$SingleCoreEventLoop.processEvents(EpollTPCEventLoopGroup.java:774)
at org.apache.cassandra.concurrent.EpollTPCEventLoopGroup$SingleCoreEventLoop.run(EpollTPCEventLoopGroup.java:441)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
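When the error recurs, it can help to know which keyspaces and tables the failed repair sessions covered. The following is a minimal Python sketch that assumes the log format of the sample error above. The regular expression, the script name repair_failures.py, and the functions parse_failed_repair_insert and count_failures_by_keyspace are hypothetical helpers, not part of DSE or nodetool, and the pattern may need adjusting for other DSE versions.

```python
import re
from collections import Counter

# Hypothetical pattern, written against the sample error line in this article.
# It captures the parent_id, keyspace_name and columnfamily_names values of a
# failed INSERT into system_distributed.parent_repair_history.
_FAILED_INSERT = re.compile(
    r"INSERT INTO system_distributed\.parent_repair_history .*?VALUES \("
    r"(?P<parent_id>[0-9a-f-]{36}), '(?P<keyspace>[^']*)', \{ (?P<tables>[^}]*) \}"
)


def parse_failed_repair_insert(line):
    """Return (parent_id, keyspace, [tables]) for a matching log line, else None."""
    match = _FAILED_INSERT.search(line)
    if match is None:
        return None
    tables = [t.strip().strip("'") for t in match.group("tables").split(",")]
    return match.group("parent_id"), match.group("keyspace"), tables


def count_failures_by_keyspace(lines):
    """Count failed parent_repair_history inserts per keyspace."""
    counts = Counter()
    for line in lines:
        parsed = parse_failed_repair_insert(line)
        if parsed is not None:
            counts[parsed[1]] += 1
    return dict(counts)


if __name__ == "__main__":
    import sys

    # Usage: python3 repair_failures.py <path to system.log>
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:
            for keyspace, n in count_failures_by_keyspace(log).items():
                print(f"{keyspace}: {n} failed insert(s)")
```

Only the parent_repair_history INSERT is matched; writes to repair_history use a different column list and would need their own pattern.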

What does this error message mean?

This error is generated by running repair tasks. Repair tasks record the status of each repair session in the system_distributed.parent_repair_history and system_distributed.repair_history tables, and they write this session information to both tables at consistency level ONE (CL=ONE).

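The error indicates that an insert or update issued by the repair task against one of these two tables failed because the consistency level could not be met. With CL=ONE, at least one replica must acknowledge the write; "received only 0 responses" in the stack trace means that no replica acknowledged it before the write timed out.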
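The arithmetic behind "the consistency level could not be met" can be sketched in a few lines. This is a simplified, single-datacenter Python model: the function names required_acks and write_meets_cl are hypothetical, and the LOCAL_* and EACH_QUORUM levels are deliberately left out. It only illustrates why a CL=ONE write that receives 0 responses fails.

```python
def required_acks(consistency_level, replication_factor):
    """Replica acknowledgements a write needs at the given level (single-DC model)."""
    level = consistency_level.upper()
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {consistency_level}")


def write_meets_cl(consistency_level, replication_factor, acks_received):
    """True if enough replicas acknowledged the write before the timeout."""
    return acks_received >= required_acks(consistency_level, replication_factor)


# The repair history writes use CL=ONE. With a replication factor of 3 (an
# example value), one acknowledgement is enough, but the stack trace reports
# "received only 0 responses", so the write fails with a timeout.
print(write_meets_cl("ONE", 3, 1))  # True
print(write_meets_cl("ONE", 3, 0))  # False
```

Because CL=ONE is the weakest level shown here, a failure at ONE means no replica responded at all, which is why the article points at overloaded or unreachable nodes rather than at the consistency setting itself.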

Why does this error occur?

The error typically occurs for one of the following reasons:

Overloaded nodes
Communication problems between nodes caused by network issues

When nodes become unresponsive under load or because of communication problems, inserts and updates against these two tables fail, since no replica can acknowledge the write in time and the consistency level cannot be met.

How do you fix this error?

This error generally indicates that nodes in the cluster are unresponsive. Users may also notice that their own queries are slow or failing.

Overloaded nodes

Examine system.log around the time the error started for signs that nodes in the cluster were overloaded, such as dropped messages or long GC pauses.

For example:

INFO  [ScheduledTasks:1] 2020-05-23 14:09:20,509  MessagingService.java:1273 - READ messages were dropped in last 5000 ms: 2300 internal and 136 cross node. Mean internal dropped latency: 5430 ms and Mean cross-node dropped latency: 5960 ms

WARN  [Service Thread] 2020-05-23 14:09:15,508  GCInspector.java:282 - G1 Young Generation GC in 5170ms.  G1 Eden Space: 18035507200 -> 0; G1 Old Gen: 12280584520 -> 26468408336; G1 Survivor Space: 1132462080 -> 662700032;
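Scanning a large system.log by eye is tedious, so the check can be scripted. A minimal Python sketch, assuming log lines shaped like the two examples above: the patterns, the function find_overload_signs, and its 1000 ms gc_threshold_ms cut-off are illustrative choices, not DSE settings or tooling.

```python
import re

# Patterns written against the two sample log lines in this article; other
# DSE/Cassandra versions may word these messages differently.
DROPPED = re.compile(
    r"(?P<verb>\w+) messages were dropped in last \d+ ms: "
    r"(?P<internal>\d+) internal and (?P<cross>\d+) cross node"
)
GC_PAUSE = re.compile(r"GCInspector\.java:\d+ - (?P<collector>.+?) GC in (?P<ms>\d+)ms")


def find_overload_signs(lines, gc_threshold_ms=1000):
    """Return (kind, detail, value) tuples for dropped messages and long GC pauses.

    gc_threshold_ms is an arbitrary illustrative cut-off, not a DSE setting.
    """
    findings = []
    for line in lines:
        dropped = DROPPED.search(line)
        if dropped:
            # Total dropped = internal + cross-node drops in the reporting window.
            total = int(dropped["internal"]) + int(dropped["cross"])
            findings.append(("dropped", dropped["verb"], total))
            continue
        gc = GC_PAUSE.search(line)
        if gc and int(gc["ms"]) >= gc_threshold_ms:
            findings.append(("gc_pause", gc["collector"], int(gc["ms"])))
    return findings
```

Run against the two example lines, this reports 2436 dropped READ messages and a 5170 ms G1 Young Generation pause.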

If the nodes in the cluster are overloaded, throttle the workload, review the access patterns (for example, look for expensive queries), or add resources or nodes so the cluster can handle its workload.

Network issues

Run nodetool status on every node and check whether any node is reported in DN (Down/Normal) status.

For example:

--  Address        Load       Tokens       Owns    Host ID                               Rack
DN  10.100.100.100  4.38 GiB   64           ?       fdfc950d-6381-4c43-9bfc-ec567b06f360  rack1
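With many nodes, the nodetool status output collected from each node can be checked programmatically. A minimal Python sketch, assuming the column layout shown above (status code first, address second); the function name down_nodes is hypothetical.

```python
from typing import List

# nodetool status prints a two-letter code per node: U/D (Up/Down) followed by
# N/L/J/M (Normal/Leaving/Joining/Moving).
_DOWN_CODES = {"DN", "DL", "DJ", "DM"}


def down_nodes(status_output: str) -> List[str]:
    """Return the addresses of nodes whose status code starts with D (Down)."""
    down = []
    for line in status_output.splitlines():
        fields = line.split()
        # Header lines ("Datacenter:", "--  Address ...") never start with a Down code.
        if len(fields) >= 2 and fields[0] in _DOWN_CODES:
            down.append(fields[1])
    return down
```

Passing it the text captured from each node makes it easy to see whether all nodes agree on which peer is down, which helps distinguish a dead node from a partial network partition.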

Examine system.log or debug.log for gossip problems, for example:

INFO  [GossipTasks:1] 2020-01-04 03:55:49,320  Gossiper.java:1205 - InetAddress /10.100.100.101 is now DOWN

DEBUG [InternalResponseStage:13] 2020-07-02 05:24:15,203  Gossiper.java:1213 - Failed to receive echo reply from /10.100.100.101
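To see which peers gossip repeatedly flags, the two messages above can be tallied per address. A minimal Python sketch, assuming the message wording shown in these examples and IPv4 addresses only; the function name gossip_problem_peers is hypothetical.

```python
import re
from collections import Counter

# Patterns based on the two sample gossip messages above (IPv4 addresses only).
_GOSSIP_PATTERNS = (
    re.compile(r"InetAddress /(?P<ip>[\d.]+) is now DOWN"),
    re.compile(r"Failed to receive echo reply from /(?P<ip>[\d.]+)"),
)


def gossip_problem_peers(lines):
    """Count gossip DOWN / missing echo-reply messages per peer address."""
    counts = Counter()
    for line in lines:
        for pattern in _GOSSIP_PATTERNS:
            match = pattern.search(line)
            if match:
                counts[match["ip"]] += 1
                break  # count each log line at most once
    return dict(counts)
```

A peer that appears in the logs of several different nodes is a good first candidate for the connectivity tests that follow.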

If a network issue is suspected, run the following tests between the nodes to verify connectivity:

ping

ping <ip-address of the down node>

telnet

telnet <ip-address of the down node> 7000

telnet <ip-address of the down node> 7001

(port 7001 is used when node-to-node encryption is enabled)

If any of these commands fails, further investigation at the network layer is required.
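Where telnet is not installed, the same TCP checks on ports 7000 and 7001 can be made with a short script. A minimal Python sketch using only the standard library socket module; the script name check_ports.py and the functions port_reachable and check_peer are hypothetical, and it does not replace ping, since ICMP needs raw-socket privileges.

```python
import socket
import sys


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure, ...
        return False


def check_peer(host, ports=(7000, 7001)):
    """Check the internode ports used in the telnet tests above."""
    return {port: port_reachable(host, port) for port in ports}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_ports.py <ip-address of the down node>
    peer = sys.argv[1]
    for port, ok in check_peer(peer).items():
        print(f"{peer}:{port} {'reachable' if ok else 'NOT reachable'}")
```

As with telnet, port 7001 is only expected to answer when node-to-node encryption is enabled, so a failure on 7001 alone is not necessarily a problem.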

Last Modified Date: December 4, 2023

Document Location

Worldwide

Historical Number

ka0Ui0000000H0rIAE

Document Information

Modified date:
30 January 2026

UID

ibm17258830
