Databricks Job Launcher
The Databricks Job Launcher executor starts a Databricks job each time it receives an event. You can run jobs based on notebooks or JARs. For information about supported versions, see Supported Systems and Versions.
Use the executor to start a Databricks job as part of an event stream. You can use the executor in any logical way, such as running Databricks jobs after the Hadoop FS, MapR FS, or Amazon S3 destination closes files.
Note that the Databricks Job Launcher executor starts a job in an external system. It does not monitor the job or wait for it to complete. The executor becomes available for additional processing as soon as it successfully submits a job.
Before you use the executor, complete the necessary prerequisite tasks.
When you configure the executor, you specify the cluster base URL, job type, job ID, and user credentials. You can optionally configure job parameters and security such as an HTTP proxy and SSL/TLS details.
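Under the hood, the executor submits a run request to the Databricks REST API using the job ID you configure. The sketch below builds an equivalent call by hand, assuming the Jobs API 2.1 run-now endpoint and token authentication; the base URL, token, and parameter names are illustrative placeholders, not stage properties.

```python
# Minimal sketch of a Databricks "run-now" request, assuming Jobs API 2.1.
# CLUSTER_BASE_URL, TOKEN, and the parameter names are placeholders.
import json
from urllib import request


def build_run_now_request(base_url, token, job_id, notebook_params=None):
    """Build the HTTP request that starts one run of an existing job."""
    payload = {"job_id": job_id}
    if notebook_params:
        # For notebook-based jobs, parameters are passed as key/value pairs.
        payload["notebook_params"] = notebook_params
    return request.Request(
        url=f"{base_url}/api/2.1/jobs/run-now",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending the request returns a JSON body containing the run_id of the
# started run, which is what the executor records in its event output.
```

Because the endpoint only submits the run, the caller gets a response as soon as Databricks accepts the request, which matches the executor's fire-and-forget behavior described above.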
You can configure the executor to generate events for another event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Prerequisites
- Create the job.
The Databricks Job Launcher executor can start jobs based on notebooks or JARs.
- Optionally configure the job to allow concurrent runs.
By default, Databricks does not allow running multiple instances of a job at the same time. With the default, if the Databricks Job Launcher executor receives multiple events in quick succession, it starts multiple instances of the job, but Databricks queues those instances and runs them one by one.
To enable parallel processing, in Databricks, configure the job to allow concurrent runs. You can configure the maximum number of concurrent runs through the Databricks API with the max_concurrent_runs parameter, or through the UI using the Jobs > Advanced menu and the Maximum Concurrent Runs property.
- Save the job and note the job ID.
When you submit the job, Databricks generates a job ID. Use the job ID when you configure the Databricks Job Launcher executor.
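The concurrency setting from the steps above can also be supplied when the job is created through the API. The sketch below builds a hypothetical jobs/create payload, assuming the Jobs API 2.1 request format; the job name, notebook path, and task key are illustrative only.

```python
# Sketch of a Jobs API payload that allows concurrent runs, assuming the
# Databricks Jobs API 2.1 format (POST {base_url}/api/2.1/jobs/create).
# The notebook path and task key below are illustrative placeholders.
import json


def build_create_job_payload(name, notebook_path, max_concurrent_runs=1):
    """Return a jobs/create payload; setting max_concurrent_runs above 1
    lets the Databricks Job Launcher executor trigger parallel runs."""
    return {
        "name": name,
        "max_concurrent_runs": max_concurrent_runs,
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
    }


payload = build_create_job_payload(
    "close-files-job", "/Jobs/close_files", max_concurrent_runs=4
)
print(json.dumps(payload, indent=2))
```

The job ID returned by jobs/create is the value you enter in the stage configuration.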
Event Generation
The Databricks Job Launcher executor can generate events that you can use in an event stream. When you enable event generation, the executor generates events each time it starts a Databricks job. Use the events in any logical way. For example:
- With the Email executor to send a custom email after receiving an event.
For an example, see Sending Email During Pipeline Processing.
- With a destination to store event information.
For an example, see Preserving an Audit Trail of Events.
Since the executor events include the run ID for each started job, you might generate events to keep a log of the run IDs.
For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Event Records
Event records generated by the Databricks Job Launcher executor have the following event-related record header attributes. Record header attributes are stored as String values:
Record Header Attribute | Description
---|---
sdc.event.type | Event type.
sdc.event.version | Integer that indicates the version of the event record type.
sdc.event.creation_timestamp | Epoch timestamp when the stage created the event.
Event records generated by the executor include the following field:

Event Field Name | Description
---|---
app_id | Run ID of the Databricks job.
Monitoring
Data Collector does not monitor Databricks jobs. Use your regular cluster monitor application to view the status of jobs.
Jobs started by the Databricks Job Launcher executor appear under the job ID specified in the stage. The job ID is the same for all instances of the job. To find the run ID for a particular instance, check the Data Collector log.
The executor also writes the run ID of the job to the event record. To keep a record of all run IDs, enable event generation for the stage.
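Although Data Collector does not monitor the run, you can check a run's status yourself using the run ID. The sketch below builds such a request, assuming the Jobs API 2.1 runs/get endpoint and token authentication; the base URL and token are placeholders.

```python
# Sketch of a status check for a single run, assuming the Databricks
# Jobs API 2.1 runs/get endpoint. Base URL and token are placeholders.
from urllib import request


def build_runs_get_request(base_url, token, run_id):
    """Build a GET request for the state of one run. The run_id comes
    from the executor's event record or the Data Collector log."""
    return request.Request(
        url=f"{base_url}/api/2.1/jobs/runs/get?run_id={run_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# The JSON response includes a "state" object with life_cycle_state
# (for example, RUNNING or TERMINATED) and, once the run finishes,
# a result_state such as SUCCESS.
```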
Configuring a Databricks Job Launcher Executor
Configure a Databricks Job Launcher executor to start a Databricks job each time the executor receives an event record.