IBM Support

KAFKA_41 - Timeout expired while fetching topic metadata

Troubleshooting


Problem

Suppose the bootstrap list contains three brokers and the first one is not reachable. When the client sends its request to that first broker, the request times out. The pipeline does not retry against the next available broker; instead, it fails with the "Timeout expired while fetching topic metadata" error.

Ideally, the client/pipeline would try every broker in the list, such as the one below, before reporting a failure.

server1:9092,server2:9092,server3:9092

This behavior is a known issue in Kafka client library versions 2.6 and below. The library is external to StreamSets Data Collector (SDC).
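The traversal described above can be sketched in a few lines of Python. This is only an illustration of the ideal behavior, not how the Kafka client or SDC is implemented. The helper names (`parse_bootstrap`, `tcp_connect`, `first_reachable`) and the 3-second timeout are hypothetical choices made for this example.

```python
import socket
from typing import Callable, List, Optional, Tuple

Broker = Tuple[str, int]


def parse_bootstrap(servers: str) -> List[Broker]:
    """Split 'host1:9092,host2:9092' into (host, port) pairs."""
    brokers = []
    for entry in servers.split(","):
        host, _, port = entry.strip().rpartition(":")
        brokers.append((host, int(port)))
    return brokers


def tcp_connect(broker: Broker, timeout_s: float) -> bool:
    """Return True if a TCP connection to the broker opens within timeout_s."""
    try:
        with socket.create_connection(broker, timeout=timeout_s):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution failures.
        return False


def first_reachable(
    servers: str,
    timeout_s: float = 3.0,  # illustrative value, not a Kafka default
    connect: Callable[[Broker, float], bool] = tcp_connect,
) -> Optional[Broker]:
    """Try each broker in order; report failure only after all have failed."""
    for broker in parse_bootstrap(servers):
        if connect(broker, timeout_s):
            return broker
    return None
```

Calling `first_reachable("server1:9092,server2:9092,server3:9092")` returns the first broker that accepts a TCP connection, or `None` if none does. It can also serve as a quick reachability check from the SDC host when diagnosing this error.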

Solution

There are two solutions to this problem:

  • Solution 1 (Recommended)

    1. Upgrade the Kafka client library used by the stage to 2.7 or above and tune the socket timeouts accordingly. Kafka 2.7 introduced two configurations that let the client control the socket connection setup timeout:

socket.connection.setup.timeout.ms

socket.connection.setup.timeout.max.ms

    2. To upgrade the Kafka library, open the Kafka Consumer origin or Kafka Producer stage, go to Configuration > General, and select a Stage Library for Kafka 2.7 or above.

  • Solution 2

    1. On the server where SDC runs, decrease the TCP SYN retry value in /proc/sys/net/ipv4/tcp_syn_retries to 3 (the default is 6).
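Both solutions shorten how long a connection attempt to an unreachable broker blocks before the client can move on. The arithmetic can be sketched as follows. The sketch assumes a 1-second initial SYN timeout that doubles on every retransmission (typical for Linux, but kernel dependent), and it assumes the Kafka connection setup timeout doubles per consecutive failure up to the configured maximum, ignoring the randomization the client applies. The function names and the 10000/30000 ms values are illustrative, not recommendations.

```python
def syn_give_up_seconds(syn_retries: int, initial_rto_s: float = 1.0) -> float:
    """Seconds until a TCP connect fails when no SYN is ever answered.

    The initial SYN plus `syn_retries` retransmissions each wait twice as
    long as the previous one (assumed 1 s initial timeout).
    """
    return sum(initial_rto_s * 2 ** attempt for attempt in range(syn_retries + 1))


def setup_timeouts_ms(initial_ms: int, max_ms: int, failures: int) -> list:
    """Connection setup timeout used for each consecutive failed attempt.

    Assumes doubling per failure, capped at max_ms; jitter is ignored.
    """
    return [min(initial_ms * 2 ** n, max_ms) for n in range(failures)]


# Solution 2: kernel-level SYN retries on the SDC host.
print(syn_give_up_seconds(6))  # 127.0 (tcp_syn_retries = 6)
print(syn_give_up_seconds(3))  # 15.0  (tcp_syn_retries = 3)

# Solution 1: client-side limits, in the order
# socket.connection.setup.timeout.ms, socket.connection.setup.timeout.max.ms.
print(setup_timeouts_ms(10000, 30000, 4))  # [10000, 20000, 30000, 30000]
```

Under these assumptions, lowering tcp_syn_retries from 6 to 3 cuts the wait on an unreachable broker from roughly 127 seconds to roughly 15 seconds for every outgoing TCP connection on the host. The Kafka 2.7 settings bound the wait for the Kafka client only.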

Symptom

This applies only if your pipeline uses Kafka client library 2.6 or below.

In Kafka Consumer or Kafka Producer stages, if some Kafka brokers are unavailable, for example because of a network issue, an unreachable server, or a disaster recovery scenario, you might see an error like the following:

com.streamsets.pipeline.api.StageException: KAFKA_41 - Could not get partition count for topic 'topicName1' : org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata.

Resolving The Problem

More details about this issue can be found in KIP-601 (Configurable socket connection timeout).

Document Location

Worldwide

Document Information

Modified date:
15 March 2025

UID

ibm17186224