Submit and monitor a Spark batch application

In this lesson, you submit and monitor a Spark batch application.

About this task

A Spark application is submitted either by the spark-submit command (known as a Spark batch application) or from a notebook. It includes a driver program and executors, and runs various parallel operations in the cluster. Spark batch applications, by default, run as the consumer execution user for the drivers and executors.

You can submit Spark batch applications to run immediately on a Spark instance group by using the spark-submit command from the My Applications & Notebooks page or the Spark Instance Groups page. You can also submit Spark batch applications from the CLI by running the spark-submit command in the Spark deployment directory (from either inside or outside the cluster), or by using the ascd Spark application RESTful APIs. The spark-submit syntax takes this form (a complete example follows this list):
--class main-class application-jar [application-arguments]
  • --class main-class is the fully qualified name of the class that contains the main method for a Java or Scala application. For SparkPi, the main class is org.apache.spark.examples.SparkPi.
  • application-jar is the .jar file that contains your application and all its dependencies. For SparkPi, this value might be deployment_dir/spark-2.0.1-hadoop-2.7/examples/jars/spark-examples_2.11-2.0.1.jar.
  • (Optional) application-arguments are any arguments that must be passed to the main method of your main class.
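Putting these pieces together, a complete spark-submit invocation for SparkPi might look like the following sketch. The deployment directory, the master URL, and the final argument (the number of partitions that SparkPi divides its work into) are placeholders that depend on your environment:

  # Run from the bin directory of the Spark deployment; 100 is the partition count
  cd deployment_dir/spark-2.0.1-hadoop-2.7/bin
  ./spark-submit --class org.apache.spark.examples.SparkPi \
    --master spark://<master_host>:<master_port> \
    ../examples/jars/spark-examples_2.11-2.0.1.jar 100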

You can use the open source spark-submit command on the CLI from the directory where Spark was deployed, as shown in the sketch above. For this tutorial, you use the cluster management console.

You can also submit batch applications by using RESTful APIs. The cluster management console converts the spark-submit command to a REST command that you can use with cURL or other tools and scripting languages to submit the batch application.
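As a rough sketch, a cURL submission might look like the following. The endpoint path, credentials, and JSON body shown here are illustrative placeholders rather than the exact ascd API contract; use the REST command that the cluster management console generates for your cluster to get the correct values:

  # Placeholder endpoint and payload; substitute the REST command generated by the console
  curl -k -u <user>:<password> -X POST \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json' \
    -d '{"args": "--class org.apache.spark.examples.SparkPi <deployment_dir>/examples/jars/spark-examples_2.11-2.0.1.jar"}' \
    'https://<management_host>:<port>/<ascd_spark_application_endpoint>'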

In this lesson, you submit SparkPi, a sample application that is packaged with Spark and estimates the value of Pi (it scatters random points in a unit square and derives Pi from the fraction that falls inside the inscribed circle).

This lesson uses the following concepts:
  • finished state: Indicates that application execution is complete.
  • Spark master: An application option that specifies the master URL of the Spark instance group to which the batch application is submitted (for example, spark://<master_host>:<master_port>). If required, click Change master to specify the Spark master URL where the batch application should be submitted.
  • running state: Indicates that application execution is in progress.
  • Spark application: An application started by using either the spark-submit command or a notebook. A Spark application has a driver program and runs various parallel operations in the cluster.
  • Spark batch application: An application started by using the spark-submit command.

To submit a new Spark batch application:

Procedure

  1. Click the Overview tab.
  2. Click Submit Application.
  3. Click Load sample command.
  4. Click Submit.
    The SparkPi batch application is submitted to the sample Spark instance group. Notice that the state changes from running to finished.
    Tip: Click Refresh in the cluster management console to update the application state. You can also enable automatic refresh for applications.

    Once the application has completed, it is in the finished state.

To monitor the application:

  1. Click the SparkPi application.
    1. In the Overview tab, look for any highlighted issues. If any issues occur, details are displayed to help you quickly debug. See Debugging Spark applications.

      From this tab, you can download the driver logs by clicking the download icon. You can also check the error messages for any issues that need further investigation. If there are driver or executor error messages to view, click the Error messages link to go to the Drivers and Executors tab, where you can download the driver and executor logs.

    2. Click the Drivers and Executors tab to check the driver and executor logs, resource usage, and activity that is related to the driver and executor running processes.
      From here, you can download the logs. For each executor, the number of failed tasks and the number of errors in the standard error logs are displayed.
    3. Click the Performance tab to check the application's running and completed tasks over a specific time range, and the task durations.
    4. Click the Resource Usage tab for a graphical view of resource usage for the application. To view data for the application within a specific duration, select Custom time period from the menu, enter the duration, and click Update Charts.

Results

You have submitted and monitored a new Spark batch application.

Summary

In this lesson, you learned how to submit and monitor a Spark batch application.

In the next lesson, you schedule a Spark batch application.