Apache Kafka cluster and components
Apache Kafka is a high-throughput distributed messaging system that you can use to facilitate scalable data collection.
Apache Kafka is bundled with Log Analysis in the <HOME>/IBM/LogAnalysis/kafka directory.
An installation of Apache Kafka consists of a number of brokers that run on individual servers that are coordinated by an instance of Apache ZooKeeper. You can start by creating a single broker and you can add more as you scale your data collection architecture.
In the scalable data collection architecture, the Receiver cluster writes data to Apache Kafka topics and partitions, based on the data sources. The Sender cluster reads data from Apache Kafka, does some processing and sends the data to Log Analysis.
Apache ZooKeeper
Apache Kafka uses Apache ZooKeeper to maintain and coordinate the Apache Kafka brokers.
A version of Apache ZooKeeper is bundled with Apache Kafka.
Topics, partitions, and consumer groups
The basic objects in Apache Kafka are topics, partitions, and consumer groups.
Topics are divided into partitions. Partitions are distributed across all the Apache Kafka brokers.
Create one partition for every two physical processors on the server where the broker is installed. For example, if the server has eight processors, create four partitions in the Apache Kafka broker. You specify the number of partitions in your Apache Kafka configuration. For more information, see Configuring Apache Kafka brokers.
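For example, on a server with eight processors, you can set the broker's default partition count in the server.properties file. (This is a sketch only; num.partitions is the standard Apache Kafka broker property that controls how many partitions an automatically created topic receives.)
num.partitions=4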
You do not need to manually create topics or consumer groups. You only need to specify the correct values in the configuration for the LFA, Sender, and Receiver clusters. The appropriate topics and partitions are created for you.
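The mapping from a data source to a partition relies on Apache Kafka's keyed partitioning: a constant message key always selects the same partition. The following sketch is illustrative only and is not part of the product; it uses CRC32 where Kafka's Java client uses murmur2, and the data source name is a hypothetical example.

```python
import zlib

def partition_for_key(message_key: str, num_partitions: int) -> int:
    """Choose a partition from a message key, in the style of Kafka's
    default partitioner: hash the key, then take the remainder modulo
    the number of partitions in the topic."""
    key_hash = zlib.crc32(message_key.encode("utf-8"))
    return key_hash % num_partitions

# Every event that carries the same message key (for example, the name of
# one physical data source) lands in the same partition, which preserves
# per-data-source ordering.
partition = partition_for_key("WindowsOS_host1.example.com", 4)
```

Because the key is hashed rather than assigned round-robin, adding more events never scatters one data source across partitions.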
In the Receiver configuration, you configure Logstash to receive data from the LFAs and send it to the Apache Kafka brokers. The configuration maps the logical data source attributes that are specified in the LFA configuration to the topic_id and message_key parameters in Apache Kafka. This configuration ensures that data from each physical data source is mapped to a partition in Apache Kafka. For more information, see Configuring the Receiver cluster for single line logs.
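As a sketch only, a Receiver output section might resemble the following. The option names depend on the version of the logstash-output-kafka plugin that is deployed, and the broker address and field reference here are assumptions, not values from the product configuration.
output {
  kafka {
    broker_list => "kafkabroker1.example.com:17991"
    topic_id => "%{datasource}"
    message_key => "%{datasource}"
  }
}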
In the Sender configuration, you configure Logstash to read data from a specific topic in a consumer group. This configuration is based on the group_id and topic_id that you specify. The topic_id is the same as the name of the logical data source. For more information, see Configuring the Sender cluster for single line logs.
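As a sketch only, a Sender input section might resemble the following. The option names depend on the version of the logstash-input-kafka plugin that is deployed, and the ZooKeeper address, group name, and topic name here are assumptions.
input {
  kafka {
    zk_connect => "zookeeperhost.example.com:12345"
    group_id => "la_sender_group"
    topic_id => "datasource_name"
  }
}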
Apache Kafka brokers
The configuration parameters for each Apache Kafka server are specified in the <kafka_install_dir>/kafka_version_number/config/server.properties file, where version_number is the Kafka version number. For the Kafka version numbers that are supported by Log Analysis Version 1.3.8 and its fix packs, see Other supported software. For example:
broker.id=1
port=17991
log.dirs=/tmp/kafka-logs-server_0
zookeeper.connect=example.com:12345
You can find a sample configuration file in the <HOME>/IBM/LogAnalysis/kafka/test-configs directory.
If you want to implement high availability in a production environment, the Apache Kafka cluster must consist of multiple servers. You can also use these servers to configure replication and the retention period. However, when you add new brokers to the cluster, existing topics are not automatically redistributed across the new brokers. For more information about how to rebalance topics across brokers, see https://kafka.apache.org/081/ops.html.
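As a sketch (the procedure is described in the Kafka 0.8.1 operations documentation; the file name and topic name here are assumptions), you list the topics to move in a JSON file and pass that file to the bin/kafka-reassign-partitions.sh tool, first with the --generate option to propose a new assignment and then with the --execute option to apply it:
topics-to-move.json:
{"topics": [{"topic": "datasource_name"}], "version": 1}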