IBM Support

ERROR: org.apache.cassandra.io.FSReadError: java.io.IOException: Broken pipe

Troubleshooting


Problem

Symptom

A stream failure can occur while a node is being decommissioned, while it is being replaced with a new node, or, in some cases, after bootstrapping. The log contains entries like these:

INFO  [StreamReceiveTask:5341] 2020-07-31 05:56:48,620  StreamResultFuture.java:180 - [Stream #44008ac0-d234-11ea-b48c-e94aabceab9f] Session with /10.192.170.115 is complete
WARN  [StreamReceiveTask:5341] 2020-07-31 05:56:48,627  StreamResultFuture.java:207 - [Stream #44008ac0-d234-11ea-b48c-e94aabceab9f] Stream failed
ERROR [main] 2020-07-31 05:56:48,628  CassandraDaemon.java:583 - Exception encountered during startup
java.lang.RuntimeException: Error during boostrap: Stream failed

The Apache Cassandra logs also contain broken pipe errors:

ERROR [STREAM-OUT-/10.192.148.41] 2020-07-30 09:31:56,582  StreamSession.java:515 - [Stream #44008ac0-d234-11ea-b48c-e94aabceab9f] Streaming error occurred
java.io.IOException: Broken pipe
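
When the logs are large, it can help to group the failure lines by stream ID so that the "Stream failed" warning and the matching "Streaming error occurred" entry can be read together. The following is a minimal sketch that assumes the log line layout shown above; the function name `failed_stream_lines` and the list of failure markers are illustrative and not part of Cassandra.

```python
import re
from collections import defaultdict

# Matches the "[Stream #<uuid>]" tag that appears in the log lines above.
STREAM_ID = re.compile(r"\[Stream #(?P<id>[0-9a-f-]{36})\]")

# Assumption: these two messages are the failure markers of interest.
FAILURE_MARKERS = ("Stream failed", "Streaming error occurred")


def failed_stream_lines(lines):
    """Group log lines that report a streaming failure by stream ID."""
    hits = defaultdict(list)
    for line in lines:
        match = STREAM_ID.search(line)
        if match and any(marker in line for marker in FAILURE_MARKERS):
            hits[match.group("id")].append(line.strip())
    return dict(hits)


SAMPLE = [
    "INFO  [StreamReceiveTask:5341] 2020-07-31 05:56:48,620  StreamResultFuture.java:180 - "
    "[Stream #44008ac0-d234-11ea-b48c-e94aabceab9f] Session with /10.192.170.115 is complete",
    "WARN  [StreamReceiveTask:5341] 2020-07-31 05:56:48,627  StreamResultFuture.java:207 - "
    "[Stream #44008ac0-d234-11ea-b48c-e94aabceab9f] Stream failed",
    "ERROR [STREAM-OUT-/10.192.148.41] 2020-07-30 09:31:56,582  StreamSession.java:515 - "
    "[Stream #44008ac0-d234-11ea-b48c-e94aabceab9f] Streaming error occurred",
]

if __name__ == "__main__":
    for stream_id, entries in failed_stream_lines(SAMPLE).items():
        print(stream_id)
        for entry in entries:
            print("  ", entry)
```

In practice, the lines would come from the log files of each node involved in the stream rather than from the inline sample.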

Analysis

Stream failures can have several causes:
  • Network failures
  • Overloaded or under-provisioned nodes
  • Running repairs
  • Long GC pauses
  • SSTable corruption

The Broken pipe exception suggests that the connection between the nodes was lost during streaming, which places this error in the network failures category.

A pipe connects two processes as a stream. One process holds the read end of the pipe, and the other holds the write end. Data written to the pipe is stored in a buffer until the other process retrieves it. If either end disconnects during a read or write, the pipe is broken, and the streaming failure is reported as a Broken pipe exception. The same error is raised when a node writes to a network connection that the peer has already closed.
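
The mechanism can be reproduced outside Cassandra. The following is a minimal sketch, assuming a POSIX system, that closes the read end of a local pipe and then writes to the write end; the function name `write_after_reader_closed` is illustrative. It only demonstrates the operating system behavior behind the error and does not involve Cassandra streaming itself.

```python
import os


def write_after_reader_closed() -> bool:
    """Return True if writing to a pipe with no reader raises BrokenPipeError."""
    read_fd, write_fd = os.pipe()
    # Simulate the receiving side going away before the data is consumed.
    os.close(read_fd)
    try:
        os.write(write_fd, b"sstable chunk")
    except BrokenPipeError:
        # Python ignores SIGPIPE by default, so EPIPE surfaces as an exception.
        return True
    finally:
        os.close(write_fd)
    return False


if __name__ == "__main__":
    print("broken pipe raised:", write_after_reader_closed())
```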

Network outages, including outages caused by traffic congestion, can cause Broken pipe errors. In most Apache Cassandra (C*) and DSE use cases, broken pipes occur when an outage happens while a node is being replaced, bootstrapped, or rebuilt.

The best course of action is to identify these outages first, take steps to prevent them from recurring, and then retry the intended node operations.
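
One simple way to start looking for an outage is to confirm that each peer accepts TCP connections on the inter-node storage port before retrying. The following is a rough sketch under stated assumptions: port 7000 is the Cassandra default for `storage_port` and may differ in a given cluster, and the helper names `can_connect` and `unreachable_peers` are illustrative. A successful connection shows only that the port is reachable at that moment; it does not rule out intermittent drops or congestion.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def unreachable_peers(peers, port: int = 7000, timeout: float = 3.0):
    """Return the peers that did not accept a connection on the given port."""
    # Assumption: 7000 is the default storage_port; adjust for the cluster.
    return [host for host in peers if not can_connect(host, port, timeout)]
```

For example, `unreachable_peers(["10.192.170.115", "10.192.148.41"])` would check the two peers that appear in the log excerpts above.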

Solution

If the network disconnect is temporary or caused by congested traffic, retry the intended operation, such as decommissioning or replacing a node, either immediately or during off-peak hours. If retries still fail while the network is up and running, perform a rolling restart of the cluster and retry.

If an outage occurred while a node was bootstrapping and that earlier outage is causing the current broken pipe connectivity issue, re-bootstrap the node to ensure that the nodes are fully connected.

Document Location

Worldwide


Historical Number

ka0Ui0000000M5NIAU

Document Information

Modified date:
30 January 2026

UID

ibm17258639