Submit Spark batch applications to run on an instance group according to a schedule. You can
schedule the Spark batch application to run just once at a specified time, or to run periodically at
a specified time or interval.
Before you begin
- You must be a cluster or consumer administrator, consumer user, or have the Spark Applications
Submit permission to create Spark batch application schedules for an instance group.
- The instance group to which you submit Spark batch applications must be in the
Started state, or its batch master URL must be available. The batch master URL
is available when services associated with the instance group (such as notebooks)
are started even though the instance group itself is not.
Note: When you submit Spark batch applications to an instance group using the
spark-submit command, the Spark batch applications, by default, run as the
consumer execution user for the driver and executor. To run Spark batch applications as the OS user,
edit the instance group configuration
and set SPARK_EGO_IMPERSONATION to true.
When SPARK_EGO_IMPERSONATION=true and
Enable authentication and authorization for the submission user is selected
for the instance group (enabling PAM
authentication), applications run as the user who created the schedule.
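As a sketch of the setting described in this note (the exact configuration page varies by release; SPARK_EGO_IMPERSONATION is the only value taken from this documentation, and the surrounding file context is assumed):

```
# Instance group Spark configuration (environment settings):
# run Spark batch applications as the submitting OS user rather than
# the consumer execution user.
SPARK_EGO_IMPERSONATION=true
```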
About this task
Follow this task to schedule a Spark batch application to run at a particular time or to
run periodically at specific intervals. Based on your permissions, you can schedule Spark batch
applications from the My Notebooks (or My
Notebooks & Applications)
page and the Instance Groups page. For a list of permissions required to view instance groups and create Spark batch
application schedules from the Instance Groups page, see Permission list.
Procedure
-
From the cluster management console, go to , and
click the Application Schedules tab.
If you are on the My Notebooks page, first select the
Show Applications
checkbox to display Spark application information, then select the appropriate applications tab.
To schedule a Spark batch application from the Instance Groups page, go to
Instance Groups, click the instance group for which you want to create the Spark batch
application schedule, and then click .
-
Click Next.
-
In the Schedule Spark Application dialog, enter the
spark-submit command, much as you would when submitting applications from the
spark-submit command line.
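For illustration, a command of the kind you might enter in the dialog; the application class, jar path, and resource settings below are placeholders, not values from this documentation:

```
spark-submit \
  --class com.example.MyBatchApp \
  --deploy-mode cluster \
  --executor-memory 2G \
  --num-executors 4 \
  /opt/apps/my-batch-app.jar arg1 arg2
```

You do not need to supply a --master URL here; the system determines the Spark master from the instance group that you select.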
-
Select the instance group to which
you want to submit the Spark batch application. The system dynamically chooses a Spark master from those available in the
instance group at submission time.
-
Enter other options for the spark-submit command in the text box.
Tip: Use these tips to help submit your Spark batch application:
- To submit Spark batch applications using RESTful APIs, click Convert to REST
command. The system converts the command to a REST command that you can use with cURL or
other tools and scripting languages to submit the Spark batch application.
- To submit a sample application, click Load sample command. The system
loads the command options to submit SparkPi, a sample application that is packaged with Spark and
computes the value of Pi.
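The SparkPi sample mentioned in the tip ships with Spark itself; submitted from the command line it looks roughly like this (the examples jar file name varies by Spark and Scala version, so the path below is indicative only):

```
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.1.jar 100
```

The trailing argument (100) is the number of partitions that SparkPi uses to estimate the value of Pi.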
-
Select the Enable data connectors check box to enable
data connectors for the Spark batch
application:
-
Select the data connectors that
you want to enable for the Spark batch application.
-
From the drop-down menu, select the data connector that
specifies the fs.defaultFS parameter in the Hadoop configuration
file.
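For reference, fs.defaultFS is the Hadoop property that names the default file system; in a Hadoop core-site.xml it looks like this (the name node host and port below are placeholders):

```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>
```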
-
Click Create Schedule.
Results
The Spark batch application is scheduled for submission to the instance group and will run at the specified
time. If the instance group for the
Spark batch application is restarted, only those Spark batch applications scheduled to run in the
future are triggered. For Spark batch applications scheduled to run at specified intervals (for
example, every two hours), if the start time has passed, the Spark batch application is triggered
based on the startup time of the batch master.
What to do next
If the Spark batch application is submitted to the instance group, monitor the Spark batch
application. See Monitoring Spark applications.
If the Spark batch application is scheduled for submission, manage the schedule as required: you
can modify the schedule, pause and resume the Spark batch application submission, or remove the
schedule altogether. See Managing Spark batch application schedules.