Cluster components

To process incoming raw events from Apache Kafka, Apache Flink relies on job submitters, a job manager, and task managers. The events are processed by Flink processing jobs, which read, transform, and write the events.

The Apache Flink cluster consists of job submitters, a job manager, and task managers. These components are each exposed as separate Kubernetes artifacts.

Job submitters

Jobs submitters are Kubernetes jobs, which prepare and submit the Flink jobs to the job manager with the appropriate arguments for execution within the Flink cluster. For example, my-release-bai-bpmn is the job submitter for BPMN event processing.

Job manager

There is only one job manager. The job manager coordinates the distributed execution of the jobs. It schedules the job tasks onto the task slots that are made available by the task managers, and coordinates checkpoints and recoveries on failure. The job manager is exposed as a Kubernetes deployment so that it is automatically redeployed if it fails. Apache Zookeeper is used to ensure that no data is lost when the new instance of the job manager is deployed.

Task managers

The task managers execute job tasks through task slots. Each task manager has one task slot. Task managers are managed with a Kubernetes StatefulSet and are automatically redeployed in case of failure.

Processing jobs are split into parallel instances that are allocated for execution to the available task slots. The number of parallel instances is called parallelism. The higher the parallelism, the more data the system can process concurrently. Depending on parallelism, a job can use several task slots, 1 per unit of parallelism, and the task slots can be from different task managers. A job can be executed by more than one task manager.

For more information about job execution through task slots, see the Task Slots and Resources External link opens a new window or tab page of the Flink documentation.