As an administrator, monitor all applications that are submitted to an instance group.
Before you begin
You must be a cluster or cluster (read only) administrator, consumer or consumer (read only)
administrator, or have the Spark Applications View All permission to monitor all applications that
are submitted to an instance group.
About this task
As a cluster or consumer administrator, you can monitor all applications that are submitted to
an instance group in the cluster from
the Instance Groups page. Application developers can monitor the applications that they submitted to
an instance group from the My Notebooks (or My
Notebooks & Applications) page (see Monitoring your applications). To monitor GPU usage for
applications, you must enable
GPUs.
Procedure
-
From the cluster management console, click .
-
In the Instance Group List tab, click the instance group to monitor.
-
Click the Applications tab.
-
Click Submitted Applications to view a list of applications that were
submitted to the instance group.
Use this tab to monitor progress during the application lifecycle by checking application state
in the State column. Any application in the
Failed, Waiting, or
Error state might require attention.
If an instance group is stopped, the list does not include applications that were submitted to it. However, if the batch master restarts on the same host, or if it is enabled for high availability, those applications reappear after the instance group recovers.
-
Click an application to drill down to its details: you can monitor how the application's
executors are performing, what errors occurred, and how those errors correlate to application
performance and resource usage.
-
Click the Overview tab to download driver logs by clicking the download
icon. You can also check the error messages to look for any issues that need further
investigation. If there are driver or executor error messages to view, click the
Error messages link to go to the Drivers and Executors tab and download the
driver and executor logs.
If
you have the required permissions, you can view Spark driver and Spark executor logs for an
application also through the RESTful APIs.
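As a sketch, retrieving a driver log through the RESTful APIs might look like the following. The host, port, and endpoint path used here are illustrative placeholders, not the documented API; check the product's RESTful API reference for the actual endpoints and authentication scheme.

```python
import base64
import urllib.request

# Placeholder cluster management console URL; substitute your own host and port.
CONSOLE_URL = "https://console.example.com:8443"


def driver_log_url(base_url: str, app_id: str) -> str:
    """Build a hypothetical driver-log URL for a Spark application.

    The path segments below are assumptions for illustration only;
    the real paths are listed in the RESTful API reference.
    """
    return f"{base_url}/sparkapplications/{app_id}/driverlog"


def fetch_driver_log(base_url: str, app_id: str, user: str, password: str) -> str:
    """Download the driver log text, assuming HTTP basic authentication."""
    req = urllib.request.Request(driver_log_url(base_url, app_id))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```

A caller with the required permissions would invoke `fetch_driver_log(CONSOLE_URL, "app-20240101-0001", user, password)` and save or inspect the returned text.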
Note: When a task is lost during application submission in client mode, the executor log does not
record the lost task, so the number of failed tasks reported in the executor log is
incorrect.
You
can also use charts to view application performance based on running and completed tasks and
application resource usage based on cores and memory: the Running Tasks and Completed Tasks charts
show data every minute; if an application completes within a minute, the charts do not show data.
The Cores Used and Memory Used charts show data every 30 seconds; if an application completes within
30 seconds, these charts do not show data.
Tip: By default, metrics for the Running
Tasks and Completed Tasks charts are retrieved from Spark gauges that are written every minute. To
change how frequently the metrics are written,
tune the default Spark metrics sink.
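For example, Apache Spark's metrics system is configured through conf/metrics.properties. A sketch that writes CSV-sink metrics every 30 seconds instead of every minute is shown below; verify the sink name and class configured in your deployment before changing them.

```properties
# conf/metrics.properties -- illustrative sketch, not a deployment default.
# Write metrics from the CSV sink every 30 seconds.
*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
*.sink.csv.period=30
*.sink.csv.unit=seconds
```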
If there are data connectors
configured for the instance group, you
can also view data connectors that are
used by the Spark application from this page. To check the status of data connectors from this page, see Checking data connector status.
-
Click the Drivers and Executors tab to check driver and executor
performance, resource usage, and activity related to the resource orchestrator, and to download
logs for the drivers and executors.
-
Click the Performance tab for a graphical view of the application's
running and completed tasks over a specific time range, as well as the task durations.
-
Click the Resource Usage tab for a graphical view of resource usage for
the application. To view data for the application within a specific duration, select
Custom time period from the drop-down menu, enter the duration, and click
Update Charts.