Monitoring all applications as an administrator

As an administrator, monitor all applications that are submitted to an instance group.

Before you begin

To monitor all applications that are submitted to an instance group, you must be a cluster administrator, a cluster (read only) administrator, a consumer administrator, or a consumer (read only) administrator, or you must have the Spark Applications View All permission.

About this task

As a cluster or consumer administrator, you can monitor all applications that are submitted to an instance group in the cluster from the Instance Groups page. Application developers can monitor the applications that they submitted to an instance group from the My Notebooks (or My Notebooks & Applications) page (see Monitoring your applications). To monitor GPU usage for applications, you must enable GPUs.

Procedure

  1. From the cluster management console, click Workload > Instance Groups.
  2. In the Instance Group List tab, click the instance group to monitor.
  3. Click the Applications tab.
  4. Click Submitted Applications to view a list of applications that were submitted to the instance group.

    Use this tab to monitor progress during the application lifecycle by checking application state in the State column. Any application in the Failed, Waiting, or Error state might require attention.

    If an instance group is stopped, your list does not include applications that were submitted to that instance group. However, if the batch master restarts on the same host or if it is enabled for high availability, those applications show up after the instance group recovers.

  5. Click an application to drill down to its details. You can monitor how the application's executors are performing, what errors occurred, and how those errors correlate to application performance and resource usage.
    1. Click the Overview tab to download driver logs by clicking the download icon. You can also check the error messages for issues that need further investigation. If there are driver or executor error messages to view, click the Error messages link to go to the Drivers and Executors tab and download the driver and executor logs. If you have the required permissions, you can also view Spark driver and Spark executor logs for an application through the RESTful APIs.
    Note: When a task is lost during application submission in client mode, the executor log does not record the lost task, so the number of failed tasks that is reported in the executor log is incorrect.
    You can also use charts to view application performance based on running and completed tasks, and application resource usage based on cores and memory. The Running Tasks and Completed Tasks charts show data every minute; if an application completes within a minute, these charts do not show data. The Cores Used and Memory Used charts show data every 30 seconds; if an application completes within 30 seconds, these charts do not show data.
    Tip: By default, metrics for the Running Tasks and Completed Tasks charts are retrieved from Spark gauges that are written every minute. To change how often the metrics are written, tune the default Spark metrics sink.

    If there are data connectors configured for the instance group, you can also view data connectors that are used by the Spark application from this page. To check the status of data connectors from this page, see Checking data connector status.

    2. Click the Drivers and Executors tab to check driver and executor performance, resource usage, and activity that is related to the resource orchestrator, and to download logs for the drivers and executors.

      For applications in the Error state, you can view this information from the Instance Groups page. See Debugging Spark applications.

    3. Click the Performance tab for a graphical view of the application's running and completed tasks over a specific time range, as well as the task durations.
    4. Click the Resource Usage tab for a graphical view of resource usage for the application. To view data for the application within a specific duration, select Custom time period from the drop-down menu, enter the duration, and click Update Charts.
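
Example

The checks described in steps 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration only, not part of the product: the record layout (a dictionary with "name" and "state" keys) and the function names needs_attention and charts_with_data are hypothetical. The state names and chart intervals come from the procedure above. Treating a run time exactly equal to the interval as "no data" is an assumption.

```python
# Hypothetical sketch: the record layout and function names are assumptions
# made for illustration; they are not part of any product API.

# States that step 4 says might require attention
ATTENTION_STATES = {"Failed", "Waiting", "Error"}

# Chart sampling intervals, in seconds, as described in step 5
CHART_INTERVALS = {
    "Running Tasks": 60,
    "Completed Tasks": 60,
    "Cores Used": 30,
    "Memory Used": 30,
}


def needs_attention(applications):
    """Return the applications whose state might require attention."""
    return [app for app in applications if app["state"] in ATTENTION_STATES]


def charts_with_data(runtime_seconds):
    """Return, in alphabetical order, the charts that can show data
    for an application that ran for the given number of seconds."""
    # Assumption: an application that completes within one sampling
    # interval (run time <= interval) produces no data point.
    return sorted(
        name
        for name, interval in CHART_INTERVALS.items()
        if runtime_seconds > interval
    )


apps = [
    {"name": "etl-job", "state": "Running"},
    {"name": "report", "state": "Failed"},
]
print(needs_attention(apps))   # [{'name': 'report', 'state': 'Failed'}]
print(charts_with_data(45))    # ['Cores Used', 'Memory Used']
```

In this sketch, an application that runs for 45 seconds appears in the Cores Used and Memory Used charts but not in the Running Tasks and Completed Tasks charts, which matches the behavior described for the Overview tab.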