Setting up the Kafka server

Before you use the Kafka connector, the Kafka server must be configured either as a standalone server or in a cluster environment.

The Kafka connector works with the Apache distribution of Kafka at version 0.10.0.0. It works with both offline and streaming messages. The Kafka connector expects Apache ZooKeeper to be configured as the synchronization service.

Kafka can be set up in any of the following three modes. The Information Server engine user, such as dsadm or isadmin, must have the permissions and privileges to access the machine where the Kafka server is running.
  • Single node single broker cluster
  • Single node multi broker cluster
  • Multiple node multiple broker cluster
The following table describes these modes in detail:
Table 1. Kafka server setup modes
Modes Description
Single node single broker cluster In this configuration, one or more instances of ZooKeeper and only one instance of Kafka run on a host that is accessible from the machine where the Information Server engine tier is installed. This setup is essentially a Kafka cluster of size 1.

To use the Kafka connector with this type of configuration, you need the hostname of the machine where the Kafka server is running and the port number on which the Kafka server listens. To connect to a Kafka server, the bootstrap.servers configuration property is required. It must contain the hostname:port of the Kafka server to connect to.

In this configuration, the value of the bootstrap.servers property is a single hostname:port.

Single node multi broker cluster In this configuration, one or more instances of ZooKeeper and more than one Kafka broker run on the same machine (node). Kafka brokers are uniquely identified by the broker.id property. For the Kafka connector to establish a connection to the Kafka server, provide the hostname along with the list of port numbers. For example, the value of the bootstrap.servers property would be hostname:9092, hostname:9093 (that is, all the ports on the same server where the Kafka service is running).

With this configuration, messages that are read from topics are not lost even if one of the ports goes down, which provides fault tolerance to a certain extent.

Multiple node multiple broker cluster In this configuration, the Kafka server is installed on multiple hosts (nodes), and multiple brokers can run on each node. The connection to the Kafka server requires the hostname:port values for all the nodes where Kafka is running.

For example, the bootstrap.servers property value should be a list of host/port pairs that are used for establishing the initial connection to the Kafka cluster. The client makes use of all servers, irrespective of which servers are specified for bootstrapping.

Because these servers are used only for the initial connection, to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers. You may want more than one entry, however, in case a server is experiencing downtime.
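The bootstrap.servers values for the three modes above can be sketched as follows. This is an illustrative Python sketch only, not part of the Kafka connector. The helper names bootstrap_servers and parse_bootstrap_servers and the host names kafkahost, kafkahost1, and kafkahost2 are hypothetical, and port 9093 on the second node is an assumed example.

```python
def bootstrap_servers(brokers):
    """Join (host, port) pairs into a comma-separated bootstrap.servers value."""
    entries = []
    for host, port in brokers:
        port = int(port)
        if not host or not 0 < port < 65536:
            raise ValueError(f"invalid broker address: {host!r}:{port!r}")
        entries.append(f"{host}:{port}")
    if not entries:
        raise ValueError("at least one broker address is required")
    return ",".join(entries)


def parse_bootstrap_servers(value):
    """Split a bootstrap.servers value back into (host, port) pairs."""
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()  # tolerate "host:9092, host:9093"
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected hostname:port, got {entry!r}")
        pairs.append((host, int(port)))
    return pairs


# Single node single broker cluster: one hostname:port (hypothetical host name).
single = bootstrap_servers([("kafkahost", 9092)])

# Single node multi broker cluster: same host, one port per broker.
multi_broker = bootstrap_servers([("kafkahost", 9092), ("kafkahost", 9093)])

# Multiple node multiple broker cluster: several hosts, possibly several ports each.
multi_node = bootstrap_servers(
    [("kafkahost1", 9092), ("kafkahost2", 9092), ("kafkahost2", 9093)]
)
```

The resulting string is what would be supplied as the bootstrap.servers property value. The parser is included only to show how such a value breaks down into host and port pairs.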

Note: For each of the above configurations, you can run multiple ZooKeeper services.
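Because the engine tier machine must be able to reach the Kafka hosts, and the bootstrap list is used only for the initial connection, it can be useful to check which bootstrap entries accept a TCP connection before configuring the connector. The following Python sketch is one possible way to do that under stated assumptions: the function names tcp_probe and reachable_servers are hypothetical, and a successful TCP connection shows only that the port is open, not that the broker is healthy.

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def reachable_servers(bootstrap_value, probe=tcp_probe):
    """Return the hostname:port entries of a bootstrap.servers value that the probe accepts.

    Malformed entries are skipped rather than reported.
    """
    reachable = []
    for entry in bootstrap_value.split(","):
        host, _, port = entry.strip().rpartition(":")
        if host and port.isdigit() and probe(host, int(port)):
            reachable.append(f"{host}:{port}")
    return reachable
```

For example, running reachable_servers("kafkahost1:9092,kafkahost2:9092") from the engine tier machine (the host names are placeholders) returns the subset of entries that accepted a connection. An empty list suggests a network or permission problem rather than a connector problem.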

When the Kafka connector is configured as a source stage, messages are read from Kafka, which is equivalent to implementing the Kafka connector as a Consumer object. When the Kafka connector is configured as a target stage, messages are written to Kafka, which is equivalent to implementing the Kafka connector as a Producer object. The Kafka connector is available in the location //Stagetypes//Parallel//Real Time//Kafka connector

Note: The current version of the Kafka connector does not support multiple input links or multiple output links. The supported data types are Varchar, Char, and Longvarchar.