Submitting Spark batch applications to an instance group

Submit Spark batch applications to run immediately on an instance group.

Before you begin

  • You must be a cluster or consumer administrator, consumer user, or have the Spark Applications Submit permission to submit Spark batch applications to an instance group.
  • The instance group to which you submit Spark batch applications must be in the Started state, or the batch master URL must be available. The batch master URL is available when services associated with the instance group (such as notebooks) are started but the instance group itself is not started.
Note: By default, Spark batch applications run as the consumer execution user for both the driver and executors. To run Spark batch applications as the submission user instead, select Enable impersonation to have Spark applications run as the submission user in the instance group configuration.

About this task

Follow this task to submit a Spark batch application immediately using the cluster management console. To schedule a Spark batch application to run at a particular time or to run periodically at specific intervals, see Scheduling Spark batch application submission to an instance group.

Based on your permissions, you can submit Spark batch applications to instance groups from the My Applications & Notebooks page and the Instance Groups page. For a list of permissions that are required to view instance groups and submit Spark batch applications from the Instance Groups page, see Permission list.

The following parameters cannot be overridden if they are defined in the spark-submit command or in REST requests (such as those submitted with cURL); their values are always taken from the application's instance group configuration (see the sketch after this list):
  • spark.authenticate
  • spark.shuffle.service.port
  • spark.ego.app.schedule.policy / SPARK_EGO_APP_SCHEDULE_POLICY
  • spark.ego.auth.mode / SPARK_EGO_AUTH_MODE
  • spark.ego.driver.container.type / SPARK_EGO_DRIVER_CONTAINER_TYPE
  • spark.ego.executor.container.type / SPARK_EGO_EXECUTOR_CONTAINER_TYPE
  • spark.ego.gpu.slots.per.task / SPARK_EGO_GPU_SLOTS_PER_TASK
  • spark.ego.impersonation / SPARK_EGO_IMPERSONATION
  • spark.ego.logservice.port / SPARK_EGO_LOGSERVICE_PORT
  • spark.ego.slots.per.task / SPARK_EGO_SLOTS_PER_TASK
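
For example, in a spark-submit command like the following sketch, the value passed for spark.shuffle.service.port is ignored and the value from the instance group configuration is used. The master URL is a placeholder, and the jar path repeats the SparkPi example used later in this topic:

    # The spark.shuffle.service.port value below is ignored; the instance
    # group's configured value is used instead
    spark-submit --master spark://master-host:port \
      --conf spark.shuffle.service.port=7338 \
      --class org.apache.spark.examples.SparkPi \
      deployment_dir/spark-2.1.1-hadoop-2.7/examples/jars/spark-examples_2.11-2.1.1.jar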

Procedure

  1. From the cluster management console, go to My Applications & Notebooks.

    To submit Spark batch applications from the Instance Groups page instead, click the instance group to which you want to submit the Spark batch application, then click the Applications tab.

  2. Click Run Application.
  3. For Other options, specify the spark-submit command options:
    1. Check the --master option, which specifies the master URL of the instance group to which the Spark batch application is submitted. If required, click Change master to specify a different master URL where the Spark batch application must run.
    2. Enter other options for the spark-submit command in the text box. Your spark-submit syntax could be:
      --class main-class application-jar [application-arguments]
      • --class main-class is the fully qualified name of the class that contains the main method for a Java or Scala application. For SparkPi, the main class is org.apache.spark.examples.SparkPi.
      • application-jar is the .jar file that contains your application and all its dependencies. For SparkPi, this value could be deployment_dir/spark-2.1.1-hadoop-2.7/examples/jars/spark-examples_2.11-2.1.1.jar.
      • (Optional) application-arguments are any arguments that must be passed to the main method of your main class (see the sketch that follows).
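      For example, to run SparkPi with 1000 slices (an illustrative value for application-arguments, which SparkPi uses as its number of partitions), the options could be:

        --class org.apache.spark.examples.SparkPi deployment_dir/spark-2.1.1-hadoop-2.7/examples/jars/spark-examples_2.11-2.1.1.jar 1000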
      Tip: Use these tips to submit your Spark batch application:
      • To submit a sample application, click Load sample command. The system loads the command options to submit SparkPi, a sample application that is packaged with Spark and computes the value of Pi.
      • To submit Spark batch applications by using RESTful APIs, click Convert to REST command. The system converts the command to a REST command that you can use with cURL or other tools and scripting languages to submit the Spark batch application.
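      The exact REST command that the console generates is specific to your cluster. The following cURL sketch shows only the general shape; the host, port, endpoint path, credentials, and request body are placeholders, not documented values, so use the command that Convert to REST command produces:

        # All values below are placeholders; copy the real command from the console
        curl -k -u username:password -X POST \
          -H 'Content-Type: application/json' \
          -d '{"args": "--class org.apache.spark.examples.SparkPi application.jar"}' \
          'https://master-host:port/endpoint-from-generated-command'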
  4. Select the Enable data connectors check box to enable data connectors for the Spark application:
    1. Select the data connectors that you want to enable for the Spark application.
    2. From the drop-down list, select the data connector that specifies the fs.defaultFS parameter in the Hadoop configuration file.
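    The fs.defaultFS parameter identifies the default file system for Hadoop (for example, an HDFS namenode URI such as hdfs://namenode-host:8020). If you are unsure which data connector supplies it, one way to check, assuming access to a host where the Hadoop client configuration is installed at a typical path, is:

      # Print the default file system from the Hadoop configuration
      # (the configuration path is illustrative)
      grep -A 1 'fs.defaultFS' /etc/hadoop/conf/core-site.xml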
  5. Click Submit.

Results

The Spark batch application is submitted to the specified instance group.

What to do next

Monitor the Spark batch application that is associated with the instance group. See Monitoring Spark applications.