Start Jobs
The Start Jobs processor starts one or more Control Hub jobs in parallel upon receiving a record. The processor can also start job instances from a job template.
The Start Jobs processor is an orchestration stage that you use in orchestration pipelines. Orchestration stages perform tasks, such as scheduling and starting pipelines and Control Hub jobs, that you can use to create an orchestrated workflow across StreamSets. For example, an orchestration pipeline can use the Cron Scheduler origin to generate a record every weekday at 9 AM that triggers the Start Jobs processor, which starts a set of Control Hub jobs.
After performing its task, the Start Jobs processor updates the orchestration record, adding details about the jobs that it started. Then, it passes the record downstream. You can pass the record to an orchestration stage to trigger another task. Or, you can pass it to a non-orchestration stage to perform other processing.
When you configure the Start Jobs processor, you specify the Control Hub URL, and the jobs or job template to start. You can also specify runtime parameters for each job or job instance.
You can configure the processor to reset the origins in the jobs when possible, and to run the jobs in the background. When running jobs in the background, the processor immediately updates and passes the input record downstream instead of waiting for the jobs to finish.
You also configure the credentials used to run the job. You can optionally configure properties used to maintain the HTTP connection, to establish an HTTP proxy, and to enable SSL/TLS.
You can also use a connection to configure the processor.
Job Execution and Data Flow
- Run jobs in the foreground
- By default, the processor starts jobs that run in the foreground. When the jobs run in the foreground, the processor updates and passes the orchestration record downstream after all the started jobs complete.
- Run jobs in the background
- You can configure the processor to start jobs that run in the background. When jobs run in the background, the processor updates and passes the orchestration record downstream immediately after starting the jobs.
Generated Record
The Start Jobs processor updates the orchestration record that it receives with information about the jobs that it starts.
Field Name | Description |
---|---|
<unique task name> | List Map field within the orchestratorTasks field of the record. Contains subfields with details about the task and the jobs that it started. |
<job ID> | List Map field within the jobResults field that provides details about each started job. |
For example, the following preview shows information provided by a Start Jobs processor with the start load job task name:
Note that the job status and colors indicate that the jobs are running at the time that the processor creates the record. There is no finishedSuccessfully field because the jobs have not yet completed.
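As an illustrative sketch, not actual processor output, the relevant fragment of the updated record might resemble the following, using the orchestratorTasks, jobResults, and jobStatus names described above, the start load job task name, and a placeholder job ID:

```json
{
  "orchestratorTasks": {
    "start load job": {
      "jobResults": {
        "<job ID>": {
          "jobStatus": "ACTIVE"
        }
      }
    }
  }
}
```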
For an example of a full orchestration record, see ../Orchestration_Pipelines/OrchestrationPipelines_Title.html#concept_x43_wlc_zlb__section_qtk_mlq_zlb.
Suffix for Job Instance Names
For job instances created or started from a job template, Control Hub appends a suffix to uniquely name each job instance.
The suffix is added to the job template name after a hyphen, as follows:
<job template name> - <suffix>
- Counter
- Control Hub appends a number to the job template name. For example, job instances created from the Web Log Collection Job are named as follows:
Web Log Collection Job - 1
Web Log Collection Job - 2
- Timestamp
- Control Hub appends a timestamp, indicating when the job instance started, to the job template name. For example, job instances created from the Web Log Collection Job are named as follows:
Web Log Collection Job - 2021-10-22
Web Log Collection Job - 2021-10-23
- Parameter Value
- Control Hub appends the value of the specified parameter to the job template name. For example, job instances created from the Web Log Collection Job are named as follows:
Web Log Collection Job - /server1/logs
Web Log Collection Job - /server2/logs
Runtime Parameters for Jobs
When you configure the Start Jobs processor to start job instances from templates, you must specify the runtime parameters for each job instance that you want the processor to start. You can also specify runtime parameters when you configure the processor to start jobs.
You can use functions from the StreamSets expression language to define parameter values.
When you configure runtime parameters in the Start Jobs processor, you must enter the runtime parameters as a JSON object, specifying the parameter names and values as key-value pairs. The parameter names must match runtime parameters defined for the pipeline that the job runs.
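For example, when starting a job whose pipeline defines FileDir and ErrorDir runtime parameters (parameter names used here for illustration), you might enter the following JSON object:

```json
{
  "FileDir": "/server1/logs",
  "ErrorDir": "/server1/errors"
}
```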
The format that you use differs depending on whether you are specifying parameters for a job or job instance:
- Format for jobs
- When configuring runtime parameters for a job, you specify one JSON object with all of the parameters that you want to define.
- Format for job instances
- When configuring runtime parameters for a job template, you specify one JSON object for each job instance that you want the processor to run.
[
  {
    "FileDir": "/server1/logs",
    "ErrorDir": "/server1/errors"
  },
  {
    "FileDir": "/server2/logs",
    "ErrorDir": "/server2/errors"
  }
]
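When you start many job instances from a template, writing the parameter array by hand can be tedious. As a sketch, assuming the FileDir and ErrorDir parameter names from the example above and hypothetical server names, you might generate the JSON array programmatically and paste the output into the processor configuration:

```python
import json

# One runtime-parameter object per job instance to start.
# Server names and parameter names are illustrative.
servers = ["server1", "server2"]
params = [
    {"FileDir": f"/{s}/logs", "ErrorDir": f"/{s}/errors"}
    for s in servers
]

print(json.dumps(params, indent=2))
```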
Configuring a Start Jobs Processor
Configure a Start Jobs processor to start one or more Control Hub jobs upon receiving a record. The Start Jobs processor is an orchestration stage that you use in orchestration pipelines.