Monitoring Spark jobs through the Spark user interface
The Spark user interface (UI) helps you monitor various aspects of running a Spark application.
For example, you can check which stage is running, how many tasks are included in a stage, why certain stages take longer to run, whether there is a straggler task in a stage, and whether all the executors in the application are being used optimally. Additionally, you can analyze the memory, CPU, and disk consumption of all drivers and executors, among many other metrics. For more details on what you can monitor in the Spark UI, see Spark web interfaces.
IBM Analytics Engine powered by Apache Spark exposes the Spark UI for all running Spark jobs. Note that the Spark UI is not accessible for completed jobs. If you need to investigate the run of a completed Spark application, you can check the Spark events on the Spark history server. For details, see Accessing and customizing the Spark history server.
You can also view the list of Spark jobs and the status of a particular Spark job from the Cloud Pak for Data web client. For more information on how to view Spark applications, see Managing Analytics Engine powered by Apache Spark instances.
You can get the endpoint to the Spark UI of a running Spark job by using one of the following Analytics Engine powered by Apache Spark APIs:
- Get the status of a given Spark job
- List all active Spark jobs
For instructions on generating an authorization token, see Generating an API authorization token.
For example, when you use the API to get the status of a given Spark job, the response includes the endpoint to the Spark UI:
curl -k -X GET <V4_JOBS_API_ENDPOINT>/<job_id> -H "Authorization: ZenApiKey ${MY_TOKEN}"
Example of a response with the endpoint to the Spark UI:
{
  "application_id": "<application_id>",
  "state": "RUNNING",
  "start_time": "Monday, 07 June 2021 14:46:23.237+0000",
  "spark_application_id": "app-20210607144623-0000",
  "spark_ui": "<V4_JOBS_API_ENDPOINT>/<job_id>/spark_ui/"
}
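If you are scripting against the jobs API rather than calling it with curl, you can retrieve the same response and pull the Spark UI endpoint out of it programmatically. The following is a minimal sketch only, not part of the product API documentation: it uses the Python requests library and the same placeholders as the curl example above (<V4_JOBS_API_ENDPOINT>, the job ID, and a ZenApiKey authorization token); substitute the values for your instance.

import requests

# Placeholders: replace with the jobs API endpoint, job ID, and token for your instance.
V4_JOBS_API_ENDPOINT = "https://<cpd-host>/<v4-jobs-api-path>"  # same placeholder as the curl example
JOB_ID = "<job_id>"
MY_TOKEN = "<token>"  # see Generating an API authorization token

# Get the status of a given Spark job (equivalent to the curl example above).
response = requests.get(
    f"{V4_JOBS_API_ENDPOINT}/{JOB_ID}",
    headers={"Authorization": f"ZenApiKey {MY_TOKEN}"},
    verify=False,  # corresponds to -k in the curl example; use a proper CA bundle in production
)
response.raise_for_status()
job = response.json()

# The spark_ui endpoint is only returned while the job is running.
print("State:", job.get("state"))
print("Spark UI:", job.get("spark_ui"))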
For details on the Analytics Engine powered by Apache Spark APIs you can use to retrieve the Spark UI endpoint of a running job, see Submitting Spark jobs via API.
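Similarly, if you want to enumerate every running job and collect its Spark UI endpoint, you can call the list API from a script. The sketch below is illustrative only: it assumes that the list of active Spark jobs is returned by a GET on the jobs API base endpoint and that each entry carries the same fields as the single-job response shown above; check Submitting Spark jobs via API for the exact path and response schema of your release.

import requests

V4_JOBS_API_ENDPOINT = "https://<cpd-host>/<v4-jobs-api-path>"  # placeholder, as above
MY_TOKEN = "<token>"

# Assumption: listing all active Spark jobs is a GET on the jobs API base endpoint.
response = requests.get(
    V4_JOBS_API_ENDPOINT,
    headers={"Authorization": f"ZenApiKey {MY_TOKEN}"},
    verify=False,
)
response.raise_for_status()

# Assumption: the response is a list of job objects shaped like the single-job example.
for job in response.json():
    if job.get("state") == "RUNNING":
        print(job.get("application_id"), job.get("spark_ui"))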
Parent topic: Apache Spark