Submit Spark batch applications to run on an instance group immediately.
Before you begin
- You must be a cluster or consumer administrator, a consumer user, or have the Spark Applications Submit permission to submit Spark batch applications to an instance group.
- The instance group to which you submit Spark batch applications must be in the Started state, or the batch master URL must be available. The batch master URL is available when services associated with the instance group (such as notebooks) are started but the instance group itself is not.
Note: By default, Spark batch applications run as the consumer execution user for both the driver and the executors. To run Spark batch applications as the submission user instead, select Enable impersonation to have Spark applications run as the submission user in the instance group's configuration.
About this task
Follow this task to submit a Spark batch application immediately using the cluster management
console. To schedule a Spark batch application to run at a particular time or to run periodically at
specific intervals, see Scheduling Spark batch application submission to an instance group.
Based on your permissions, you can submit Spark batch applications to instance groups from the My
Applications & Notebooks page and the Instance Groups page. For a list of permissions that are required to view instance groups and submit Spark batch
applications from the Instance Groups page, see Permission list.
The following parameters cannot be overridden, whether they are defined in the spark-submit command or in REST requests (such as those submitted with cURL). Their values are always taken from the application's instance group configuration, as illustrated after this list:
- spark.authenticate
- spark.shuffle.service.port
- spark.ego.app.schedule.policy / SPARK_EGO_APP_SCHEDULE_POLICY
- spark.ego.auth.mode / SPARK_EGO_AUTH_MODE
- spark.ego.driver.container.type / SPARK_EGO_DRIVER_CONTAINER_TYPE
- spark.ego.executor.container.type / SPARK_EGO_EXECUTOR_CONTAINER_TYPE
- spark.ego.gpu.slots.per.task / SPARK_EGO_GPU_SLOTS_PER_TASK
- spark.ego.impersonation / SPARK_EGO_IMPERSONATION
- spark.ego.logservice.port / SPARK_EGO_LOGSERVICE_PORT
- spark.ego.slots.per.task / SPARK_EGO_SLOTS_PER_TASK
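For example, even if one of these parameters is set explicitly in the submission options, the setting has no effect. A minimal sketch, using placeholder values:

  # The --conf value below is ignored; the shuffle service port defined in
  # the instance group configuration is used instead.
  --conf spark.shuffle.service.port=7447 \
    --class org.apache.spark.examples.SparkPi \
    deployment_dir/spark-2.1.1-hadoop-2.7/examples/jars/spark-examples_2.11-2.1.1.jar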
Procedure
- From the cluster management console, go to My Applications & Notebooks.
To submit Spark batch applications from the Instance Groups page instead, click the instance group to which you want to submit the Spark batch application, then click the Applications tab.
- Click Run Application.
- For Other options, write the spark-submit command:
- Check the --master option, which specifies the primary URL of the instance group to which the Spark batch application is submitted. If required, click Change master to specify the primary URL where the Spark batch application must run.
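A Spark master URL typically has the form spark://host:port; the exact batch master URL for your instance group is displayed in the console. For example, with placeholder host and port values:

  --master spark://mycluster-host:7077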
- Enter other options for the spark-submit command in the text box. Your spark-submit syntax could be (a complete example follows the option descriptions below):
--class main-class application-jar [application-arguments]
- --class main-class is the fully qualified name of the class that contains the main method for a Java or Scala application. For SparkPi, the main class is org.apache.spark.examples.SparkPi.
- application-jar is the .jar file that contains your application and all its dependencies. For SparkPi, this value could be deployment_dir/spark-2.1.1-hadoop-2.7/examples/jars/spark-examples_2.11-2.1.1.jar.
- (Optional) application-arguments are any arguments that must be passed to the main method of your main class.
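For example, to submit the SparkPi sample, the text box might contain the following options. The deployment path is the placeholder shown above, and the final argument (the number of partitions that SparkPi uses to estimate Pi) is optional:

  --class org.apache.spark.examples.SparkPi \
    deployment_dir/spark-2.1.1-hadoop-2.7/examples/jars/spark-examples_2.11-2.1.1.jar 1000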
Tip: Use these tips to submit your Spark batch application:
- To submit a sample application, click Load sample command. The system loads the command options to submit SparkPi, a sample application that is packaged with Spark and computes the value of Pi.
- To submit Spark batch applications by using RESTful APIs, click Convert to REST command. The system converts the command to a REST command that you can use with cURL or other tools and scripting languages to submit the Spark batch application; an illustrative sketch follows these tips.
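The exact REST command is generated for you by the console. Purely as a hedged illustration, a generated command might resemble the following shape; the endpoint path, credentials, and payload here are placeholders rather than the documented API, so always use the command that Convert to REST command produces:

  # Hypothetical sketch only: the endpoint, credentials, and payload are
  # placeholders; use the command generated by Convert to REST command.
  curl -k -u <username>:<password> -X POST \
    -H "Accept: application/json" \
    -d "<generated-submission-payload>" \
    "https://<host>:<port>/<generated-batch-submission-endpoint>"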
- Select the Enable data connectors check box to enable data connectors for the Spark application:
- Select the data connectors that you want to enable for the Spark application.
- From the drop-down menu, select the data connector that specifies the fs.defaultFS parameter in the Hadoop configuration file (an example entry is shown after this step).
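The fs.defaultFS parameter is set in the Hadoop core-site.xml file and identifies the default file system URI. A typical entry looks like the following; the host name and port are placeholders for your environment:

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode-host:8020</value>
  </property>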
- Click Submit.
Results
The Spark batch application is submitted to the specified instance group.
What to do next
Monitor the Spark batch application that is associated with the instance group. See Monitoring Spark applications.