Scheduling Spark batch application submission to an instance group

Submit Spark batch applications to run on an instance group according to a schedule. You can schedule a Spark batch application to run once at a specified time, or to run periodically at a specified time or interval.

Before you begin

  • You must be a cluster or consumer administrator, consumer user, or have the Spark Applications Submit permission to create Spark batch application schedules for an instance group.
  • The instance group to which you submit Spark batch applications must be in the Started state, or its batch master URL must be available. The batch master URL can be available when services associated with the instance group (such as notebooks) are started even though the instance group itself is not.
Note: When you submit Spark batch applications to an instance group using the spark-submit command, the Spark batch applications, by default, run as the consumer execution user for the driver and executor. To run Spark batch applications as the OS user, edit the instance group configuration and set SPARK_EGO_IMPERSONATION to true. When SPARK_EGO_IMPERSONATION=true and Enable authentication and authorization for the submission user is selected for the instance group (enabling PAM authentication), applications run as the user who created the schedule.
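As a rough sketch, impersonation is controlled by an environment variable in the instance group's Spark configuration; where exactly you edit it depends on your cluster setup, but the setting itself looks like:

```
# Instance group Spark configuration (environment variable):
# run the driver and executors as the OS user rather than
# the consumer execution user
SPARK_EGO_IMPERSONATION=true
```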

About this task

Follow this task to schedule a Spark batch application to run at a particular time or to run periodically at specific intervals.

Based on your permissions, you can schedule Spark batch applications from the My Notebooks (or My Notebooks & Applications) page and the Instance Groups page. For a list of permissions required to view instance groups and create Spark batch application schedules from the Instance Groups page, see Permission list.

Procedure

  1. From the cluster management console, go to Workload > My Notebooks & Applications, and click the Application Schedules tab.

    If you are on the My Notebooks page, first select the Show Applications checkbox to display Spark application information, then select the appropriate applications tab.

    To schedule a Spark batch application from the Instance Groups page, go to the Instance Groups page and click the instance group for which you want to create the Spark batch application schedule. Then, click Applications > Application schedules.

  2. Click Next.
  3. In the Schedule Spark Application dialog, write the spark-submit command, much as you would to submit applications from the spark-submit command line.
    1. Select the instance group to which you want to submit the Spark batch application. The system dynamically determines a Spark master based on those available in the instance group at the time.
    2. Enter other options for the spark-submit command in the text box.
      Tip: Use these tips to help submit your Spark batch application:
      • To submit Spark batch applications using RESTful APIs, click Convert to REST command. The system converts the command to a REST command that you can use with cURL or other tools and scripting languages to submit the Spark batch application.
      • To submit a sample application, click Load sample command. The system loads the command options to submit SparkPi, a sample application that is packaged with Spark and computes the value of Pi.
  4. Select the Enable data connectors check box to enable data connectors for the Spark batch application:
    1. Select the data connectors that you want to enable for the Spark batch application.
    2. From the drop-down menu, select the data connector that specifies the fs.defaultFS parameter in the Hadoop configuration file.
  5. Click Create Schedule.
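The options entered in step 3 are the same ones you would pass to spark-submit on the command line. As a minimal sketch, the following assembles the option string for the SparkPi sample mentioned above; the jar path is illustrative and depends on your Spark installation, and the helper function is hypothetical, not part of the product:

```python
# Hypothetical helper: build the option string entered in the
# Schedule Spark Application dialog (same form as a spark-submit
# command line, minus the "spark-submit" executable itself).
def build_submit_options(main_class, jar, *app_args):
    parts = ["--class", main_class, jar, *app_args]
    return " ".join(parts)

# SparkPi is packaged with Spark and computes an approximation of Pi.
# "100" is the sample's sole argument (the number of partitions);
# the jar path below is an example and varies by installation.
opts = build_submit_options(
    "org.apache.spark.examples.SparkPi",
    "./examples/jars/spark-examples.jar",
    "100",
)
print(opts)
```

Clicking Convert to REST command produces the equivalent REST form of whatever options you enter here, which you can then use with cURL or other tooling.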

Results

The Spark batch application is scheduled for submission to the instance group and will run at the specified time.

If the instance group for the Spark batch application is restarted, only those Spark batch applications scheduled to run in the future are triggered. For Spark batch applications scheduled to run at specified intervals (for example, every two hours), if the start time has passed, the Spark batch application is triggered based on the startup time of the batch master.
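One plausible reading of that retrigger behavior can be sketched as follows; this is an illustration of the described timing, not the product's actual scheduling algorithm, and the function name is hypothetical:

```python
from datetime import datetime, timedelta

def next_trigger(start, interval, master_startup):
    """Sketch of when an interval schedule next fires after the
    batch master restarts. A start time still in the future is
    honored as-is; a start time already in the past is re-anchored
    to the batch master's startup time (assumption, per the
    description above)."""
    if start > master_startup:
        return start
    return master_startup + interval

# Example: a schedule that started at 08:00 on Jan 1, running every
# 2 hours, where the batch master restarts at 09:30 on Jan 2.
start = datetime(2024, 1, 1, 8, 0)
interval = timedelta(hours=2)
restart = datetime(2024, 1, 2, 9, 30)
print(next_trigger(start, interval, restart))
```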

What to do next

Once the Spark batch application is submitted to the instance group, monitor it. See Monitoring Spark applications.

If the Spark batch application is scheduled for submission, manage the schedule as required: you can modify the schedule, pause and resume the Spark batch application submission, or remove the schedule altogether. See Managing Spark batch application schedules.