Retrieving job execution logs

Flink jobs ingest and process events before storing results in HDFS, Kafka, and Elasticsearch. If no results appear in one of those destinations, you need to be able to investigate how the job was executed by looking at the logs.

Before you begin

Create an OpenShift route or an Ingress to the Flink web user interface and access this interface as instructed in Postdeployment tasks.

About this task

The Flink web user interface displays a number of metrics about the jobs it processes. Here are a few examples of helpful indicators:
  • At what rate the events are being processed by the processing jobs
  • The lag between this throughput and the number of events in the Kafka bus
  • The existence of back pressure in the system
  • Which task manager is executing which job
Warning: Opening access to the Flink or Elasticsearch interface might introduce a security vulnerability if no protection is set up at the ingress level. Therefore, enable access only for debugging purposes.
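
Some of these indicators can also be read from the command line, because Flink serves a REST API on the same endpoint as the web user interface. The following is a minimal sketch, not a documented procedure for this product. It assumes that the route or Ingress you created is reachable without further authentication. FLINK_URL and the function names are hypothetical and chosen for illustration only.

```shell
# Hypothetical base URL of the Flink route or Ingress; replace it with your own.
FLINK_URL="${FLINK_URL:-http://localhost:8081}"

# Send a GET request to a Flink REST path and print the JSON response.
flink_get() {
  curl -sS "${FLINK_URL}$1"
}

# Overview of all jobs, with their identifiers, names, and states.
list_jobs() {
  flink_get /jobs/overview
}

# Task managers that are registered with the cluster.
list_taskmanagers() {
  flink_get /taskmanagers
}
```

For example, run list_jobs to find the identifier of a running job before you look at the task manager logs. The same security warning applies to this endpoint.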

The job submitters contain the logs for the submission of each job to the cluster. These logs include the parameters that were used for the job. Because the actual job is executed on the task managers, you need to check the task manager logs to monitor job execution.

Procedure

  1. Click Running Jobs, and then click the job you want to monitor.
  2. View the list of the task managers that are executing the job.

     New in 19.0.3  Click an item in the bottom table, and then on the Subtasks tab, click a subtask.

Results

You can see the logs of the corresponding task managers in various ways.
  • On OpenShift or another certified Kubernetes platform, open the web console of your Kubernetes cluster, for example the OpenShift dashboard.
  • You can also run the logs command. The <id> placeholder is the numerical identifier of a task manager that is executing the job.
    • On OpenShift:
      oc logs --since=3h <my-release>-bai-flink-taskmanager-<id>
    • On other certified Kubernetes platforms:
      kubectl logs --since=3h <my-release>-bai-flink-taskmanager-<id>
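
If you check several task managers in turn, the commands above can be wrapped in small helper functions. The following is a minimal sketch under stated assumptions: RELEASE, SINCE, KUBE_CLI, and the function names are hypothetical, and the pod name pattern is the one shown in the commands above.

```shell
# Hypothetical settings; adjust them to your deployment.
RELEASE="${RELEASE:-my-release}"   # name of your release
SINCE="${SINCE:-3h}"               # how far back to read the logs
KUBE_CLI="${KUBE_CLI:-kubectl}"    # set to "oc" on OpenShift

# Build the task manager pod name from a release name and a numerical identifier.
taskmanager_pod() {
  printf '%s-bai-flink-taskmanager-%s\n' "$1" "$2"
}

# Print the recent logs of the task manager with the given numerical identifier.
taskmanager_logs() {
  "$KUBE_CLI" logs --since="$SINCE" "$(taskmanager_pod "$RELEASE" "$1")"
}
```

For example, taskmanager_logs 0 | grep -i exception narrows the output of the first task manager to lines that mention an exception.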