Technical Blog Post
Abstract
Scheduling Spark batch applications
Body
With IBM Spectrum Conductor with Spark v2.1.0, you can now schedule Spark batch applications. Scheduling enables you to submit Spark batch applications to a Spark instance group to run at a specified time, at a repeating interval, or both. You can specify a simple repeating interval (for example, every x hours) or a complex schedule written as a cron expression (for example, every Monday and Tuesday at 7 am).
Scheduling is especially convenient in the financial domain, where users run the same jobs on a recurring basis. A user sets up the schedule for a routine job just once, and the job is then submitted automatically according to that schedule. By scheduling Spark batch applications, you save time as well as cluster resources.
You can set up scheduling from either the RESTful APIs or the web-based cluster management console. Once you schedule a batch application, you can manage its entire lifecycle: view details of scheduled batch applications, monitor results, pause and resume schedules, and view the history of results.
This blog provides a quick look at how you can set up and manage schedules for batch applications. For complete information on managing schedules, refer to the IBM Knowledge Center.
Schedule a batch application
Schedule a Spark batch application to run on a Spark instance group either from the cluster management console or from the REST API. You must have created a Spark instance group to which you can submit batch applications.
From the management console
To schedule a batch application from the management console, go to Spark > Applications & Notebooks > Run or Schedule a batch application, enter the Spark application command and click Schedule.
There are two ways to create a schedule.
- Pick a submission time and interval: In this case, set the submission time and the repeat interval, and choose whether the time is interpreted in the timezone of the master host or of the client running the management console.
- Write a cron expression: In this case, enter a cron expression and set the submission time. Ensure that the cron expression you specify is supported by Quartz, an open-source framework. IBM Spectrum Conductor with Spark embeds Quartz version 1.6.5. See http://www.quartz-scheduler.org/ for detailed information on cron expressions.
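For quick reference, a Quartz cron expression consists of six required fields plus an optional seventh field, in this order: seconds, minutes, hours, day of month, month, day of week, and (optionally) year. For example, the expression 0 0 7 ? * MON,TUE * (every Monday and Tuesday at 7 am, used again later in this blog) breaks down as follows:
- Seconds: 0
- Minutes: 0
- Hours: 7 (7 am)
- Day of month: ? (no specific value)
- Month: * (every month)
- Day of week: MON,TUE
- Year: * (every year; this field is optional)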
Note that you can set a submission time in the past; the schedule remains valid, and the first submission is calculated forward from that time. For example, if the current time is 1 pm June 27 and the schedule specifies a submission time of 10 am June 27 with a repeat interval of 2 hours, the batch application is scheduled to run first at 2 pm June 27 (rather than at 12 pm, which is already in the past). For more information, see Scheduling batch application submission to a Spark instance group.
From the REST API
To schedule a batch application from the REST API, use /platform/rest/conductor/v1/instances/{id}/schedules, where the id attribute represents the UUID of the target Spark instance group.
Scheduling batch applications from the REST API involves the following parameters:
- name: Scheduled batch application name.
- command: Spark batch command.
- repeatinterval (optional): Repeat interval for the schedule. Enter a positive number followed by h/H to represent hours, or d/D to represent days. For example, to schedule batch application submission every two days, enter 2d.
- scheduleexpression (optional): Cron expression for the schedule. For example, to submit a batch application every Monday and Tuesday at 7 am, enter 0 0 7 ? * MON,TUE *
- startdate (optional): Start date for the schedule.
- timezone (optional): Timezone used to interpret startdate; this can be the timezone of the master host or of the client running the cluster management console.
Here is a sketch of what the request might look like. It assumes the schedule attributes are sent as a JSON body with a POST request; the host, credentials, instance group UUID, and application command below are placeholders (see the REST API Reference for the exact request format):
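curl -u Admin:Admin -k -X POST -H 'Accept: application/json' -H 'Content-Type: application/json' -d '{"name": "schedule4", "command": "--class org.apache.spark.examples.SparkPi /opt/spark/examples/jars/spark-examples.jar", "scheduleexpression": "0 0 7 ? * MON,TUE *"}' https://9.111.254.189:8643/platform/rest/conductor/v1/instances/b828ebff-1182-4eb0-8341-c4d805836a09/schedules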
For more information, see the REST API Reference.
Pause/resume a scheduled batch application
Once a schedule is created, you can pause or resume your scheduled batch applications. Pausing scheduled batch applications that were created by other users requires additional permissions (see the section on permissions for managing schedules later in this blog). You can always resume your own scheduled batch applications, but even with all permissions assigned, you cannot resume another user’s scheduled batch applications.
From the management console
To pause or resume a scheduled batch application from the management console, go to the Scheduled Applications tab on the Applications & Notebooks page, or the Applications tab for a Spark instance group on the Spark Instance Groups page. Select the scheduled batch applications you want to pause or resume and, depending on their state, click Pause or Resume.
For more information, see Pausing a scheduled batch application and Resuming a scheduled batch application.
From the REST API
To pause a scheduled batch application from the REST API, use /platform/rest/conductor/v1/instances/{id}/schedules/{name}/pause.
Here is an example:
curl -u Admin:Admin -k -H 'Accept: application/json' -X PUT https://9.111.254.189:8643/platform/rest/conductor/v1/instances/b828ebff-1182-4eb0-8341-c4d805836a09/schedules/schedule4/pause
To resume a scheduled batch application from the REST API, use /platform/rest/conductor/v1/instances/{id}/schedules/{name}/resume.
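Here is an example, assuming the same PUT method as the pause request above (the host, instance group UUID, and schedule name are placeholders):
curl -u Admin:Admin -k -H 'Accept: application/json' -X PUT https://9.111.254.189:8643/platform/rest/conductor/v1/instances/b828ebff-1182-4eb0-8341-c4d805836a09/schedules/schedule4/resume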
View list of scheduled batch applications
You can view all scheduled batch applications belonging to a Spark instance group.
From the management console
To view scheduled batch applications from the management console, go to the Scheduled Applications tab on the Applications & Notebooks page, or the Applications tab for a Spark instance group on the Spark Instance Groups page.
From the REST API
To view a list of scheduled batch applications from the REST API, use /platform/rest/conductor/v1/instances/{id}/schedules.
To view details about a schedule, add the scheduled batch application name to the URL as follows: /platform/rest/conductor/v1/instances/{id}/schedules/{name}.
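For example, assuming these are simple GET requests (the host, instance group UUID, and schedule name below are the same placeholders used in the earlier examples):
curl -u Admin:Admin -k -H 'Accept: application/json' -X GET https://9.111.254.189:8643/platform/rest/conductor/v1/instances/b828ebff-1182-4eb0-8341-c4d805836a09/schedules
curl -u Admin:Admin -k -H 'Accept: application/json' -X GET https://9.111.254.189:8643/platform/rest/conductor/v1/instances/b828ebff-1182-4eb0-8341-c4d805836a09/schedules/schedule4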
View history of results for a scheduled batch application
You can view detailed information about the batch applications that a schedule has submitted to a Spark instance group.
From the management console
To view the history of batch applications submitted to a Spark instance group from the management console, go to the Scheduled Applications tab on the Applications & Notebooks page, or the Applications tab for a Spark instance group on the Spark Instance Groups page. Click the date/time link in the Last Submitted column. You can also download logs to see the results of this batch application.
For more information, see Viewing history of a scheduled batch application.
From the REST API
To view the history of batch applications submitted by a schedule to a Spark instance group from the REST API, use /platform/rest/conductor/v1/instances/{id}/schedules/{name}/applications.
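Here is an example, again assuming a GET request (the host, instance group UUID, and schedule name are placeholders):
curl -u Admin:Admin -k -H 'Accept: application/json' -X GET https://9.111.254.189:8643/platform/rest/conductor/v1/instances/b828ebff-1182-4eb0-8341-c4d805836a09/schedules/schedule4/applications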
Permissions for managing schedules
As a consumer user (or as a user assigned the SPARK_INSTANCEGROUP_VIEW permission), you can create batch application schedules for Spark instance groups to which you have access and manage the schedules yourself.
If you are an administrator or have the following permissions assigned to your user account, you can manage batch applications scheduled by other users:
- If your user account is assigned the EGO_CONSUMER_VIEW permission, you can view schedules created by other users for the Spark instance groups to which they have access.
- If your user account is assigned the SPARK_INSTANCEGROUP_CONTROL permission, you can manage schedules created by other users for the Spark instance groups to which they have access. Note, though, that you cannot resume another user’s scheduled batch application.
Scheduled batch application states
A scheduled batch application can have any of the following states: Active, Paused, or Finished. The Finished state only occurs for a batch application that is scheduled to run just once.
Occasionally, you might also see scheduled batch applications in the Expired Credentials or Unauthorized User state. A scheduled batch application goes into the Expired Credentials state when the ascd cluster service is down for more than 7 hours and user tokens expire; once ascd is back up, you must take steps to reactivate these schedules. A scheduled batch application goes into the Unauthorized User state when a user changes their password after creating the schedule; when this occurs, you must also take steps to reactivate the scheduled batch application.
Typically, once the user resumes schedules in these states, the scheduled batch applications return to the Active state. For more information, see Batch application schedule states.
Let us know!
We would love to hear your feedback – let us know what you think about this new feature. If you’ve got questions, get in touch with us in our forum!
To try out IBM Spectrum Conductor with Spark, download an evaluation version from our Service Management Connect page.