Troubleshooting
Problem
Suppose the bootstrap list contains three brokers, for example:
server1:9092,server2:9092,server3:9092
If the first broker is unreachable and the client sends its request to that broker, the request times out. The pipeline does not retry against the next available broker; instead, it fails with the timeout error shown under Symptom.
Ideally, the client or pipeline would try every broker in the list before reporting a failure.
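To find out which brokers in the bootstrap list are unreachable from the SDC host, a quick TCP probe can help. The following is a minimal diagnostic sketch, not part of SDC or Kafka; the function names `parse_bootstrap` and `probe` and the 3-second timeout are assumptions chosen for illustration.

```python
import socket


def parse_bootstrap(servers: str) -> list[tuple[str, int]]:
    """Split 'host:port,host:port' into (host, port) pairs."""
    pairs = []
    for entry in servers.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        host, _, port = entry.rpartition(":")
        pairs.append((host, int(port)))
    return pairs


def probe(servers: str, timeout_s: float = 3.0) -> dict[str, bool]:
    """Attempt a TCP connect to every broker; True means the port accepted."""
    results = {}
    for host, port in parse_bootstrap(servers):
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                results[f"{host}:{port}"] = True
        except OSError:
            # Covers DNS failures, refused connections, and timeouts.
            results[f"{host}:{port}"] = False
    return results
```

Calling `probe("server1:9092,server2:9092,server3:9092")` from the SDC host returns one entry per broker. It only checks that the TCP port accepts connections, not that the broker is healthy.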
This is a known issue in Kafka client library versions 2.6 and below, and it is external to SDC.
Solution
There are two solutions to this problem:
Solution 1 (Recommended)
a. Upgrade the Kafka client library in the stage to 2.7 or above and tune the socket timeouts accordingly. Version 2.7 introduced two new configuration properties that let the client control the socket connection timeout:
socket.connection.setup.timeout.ms
socket.connection.setup.timeout.max.ms
b. To upgrade the Kafka library to 2.7 or above, open the Kafka consumer origin or Kafka producer stage, go to Configuration > General, and choose a Stage Library of version 2.7 or above.
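As an illustration of tuning the two socket timeout properties, the sketch below builds them as name/value pairs and renders them in properties-file form. In the Kafka client, the first property is the initial time to wait for a connection to be established and the second is the upper bound that the wait grows toward after repeated failures. The values 5000 and 15000 are illustrative assumptions, not recommendations, and the helper `to_properties` is hypothetical. The sketch also assumes your stage accepts additional Kafka client properties as name/value pairs.

```python
# Illustrative values only (milliseconds); tune them for your network.
socket_timeouts = {
    "socket.connection.setup.timeout.ms": 5000,
    "socket.connection.setup.timeout.max.ms": 15000,
}


def to_properties(config: dict) -> str:
    """Render name/value pairs as 'name=value' lines."""
    initial = config["socket.connection.setup.timeout.ms"]
    maximum = config["socket.connection.setup.timeout.max.ms"]
    if initial > maximum:
        raise ValueError("initial timeout should not exceed the maximum")
    return "\n".join(f"{name}={value}" for name, value in config.items())


print(to_properties(socket_timeouts))
```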
Solution 2
On the server that runs SDC, decrease the TCP SYN retry value in /proc/sys/net/ipv4/tcp_syn_retries to 3 (the default is 6).
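To see why lowering this value shortens the wait, the sketch below estimates how long a connection attempt to an unreachable host lasts before the kernel gives up. It assumes Linux's usual behavior of a 1-second initial retransmission timeout that doubles after each unanswered SYN. The function names are hypothetical.

```python
from pathlib import Path

INITIAL_RTO_S = 1  # assumption: 1-second initial SYN retransmission timeout


def syn_give_up_seconds(retries: int, initial_rto_s: int = INITIAL_RTO_S) -> int:
    """Total wait: the original SYN plus `retries` retransmissions,
    each waiting twice as long as the one before it."""
    return sum(initial_rto_s * 2**i for i in range(retries + 1))


def current_syn_retries(path: str = "/proc/sys/net/ipv4/tcp_syn_retries"):
    """Return the configured value, or None if the file is absent."""
    p = Path(path)
    return int(p.read_text().strip()) if p.exists() else None


print(syn_give_up_seconds(6))  # 127
print(syn_give_up_seconds(3))  # 15
```

Under these assumptions, the default of 6 retries means roughly two minutes per unreachable broker, while 3 retries brings that down to about 15 seconds.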
Symptom
This applies only if your pipeline uses Kafka client library 2.6 or below.
In Kafka consumer or Kafka producer stages that use client library 2.6 or below, if some Kafka brokers are unavailable because of a network issue, an unreachable server, or a disaster recovery scenario, you might observe an error such as the following:
com.streamsets.pipeline.api.StageException: KAFKA_41 - Could not get partition count for topic 'topicName1' : org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata.
Resolving The Problem
For more details about this error, see KIP-601.
Document Location
Worldwide
Document Information
Modified date:
15 March 2025
UID
ibm17186224

