Wait for Jobs
The Wait for Jobs processor waits for one or more jobs or job instances to complete.
The Wait for Jobs processor is an orchestration stage that you use in orchestration pipelines. Orchestration stages perform tasks, such as scheduling and starting pipelines and Control Hub jobs, that you can use to create an orchestrated workflow across StreamSets.
Use this processor when you want to wait for jobs that were started upstream to complete before performing other orchestration tasks. For example, you might use this processor to wait for jobs started by a Start Jobs origin to complete before starting additional related jobs with a Start Jobs processor.
The Wait for Jobs processor checks the status of all jobs listed in the incoming orchestration records. When all of the jobs complete, the processor updates the job status details in the record and passes a single orchestration record downstream.
When you configure the Wait for Jobs processor, you specify the Control Hub URL where the jobs or job instances run, and you specify how long to wait between job-status checks. You also configure the Control Hub credentials used to monitor jobs. You can optionally configure properties used to maintain the HTTP connection, to establish an HTTP proxy, and to enable SSL/TLS.
You can also use a connection to configure the processor.
Stage Processing and Pipeline Implementation
Use the Wait for Jobs processor downstream from a Start Jobs origin or Start Jobs processor that starts jobs that run in the background. When running jobs in the background, a Start Jobs stage passes its orchestration record downstream immediately after starting jobs, rather than waiting for them to complete.
When a Wait for Jobs processor receives an orchestration record, it uses the job IDs listed in the record to check the status of those jobs at the Control Hub URL specified in the stage. After all of the jobs complete, the processor updates the job status information in the orchestration record and passes the record downstream.
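The status-check loop described above can be pictured with a minimal Python sketch. It is not the processor's implementation: `get_status` is a hypothetical stand-in for a Control Hub status lookup, and treating `INACTIVE` and `INACTIVE_ERROR` as the "complete" statuses is an assumption made only for illustration. The `interval_seconds` parameter plays the role of the configured wait between job-status checks.

```python
import time
from typing import Callable, Dict, Iterable

# Assumption for this sketch only: statuses that count as "complete".
TERMINAL_STATUSES = {"INACTIVE", "INACTIVE_ERROR"}


def wait_for_jobs(
    job_ids: Iterable[str],
    get_status: Callable[[str], str],
    interval_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll every job until all reach a terminal status; return final statuses.

    get_status is a hypothetical callable standing in for a Control Hub lookup.
    """
    pending = set(job_ids)
    final: Dict[str, str] = {}
    while pending:
        # Check the status of each job that has not completed yet.
        for job_id in sorted(pending):
            status = get_status(job_id)
            if status in TERMINAL_STATUSES:
                final[job_id] = status
        pending.difference_update(final)
        if pending:
            # Wait between job-status checks, as configured on the stage.
            sleep(interval_seconds)
    return final
```

Passing `sleep` in as a parameter keeps the loop easy to exercise without real delays, for example by recording the requested waits in a list.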
If you pass orchestration records from multiple stages to the processor, the processor waits until all jobs associated with those records are complete, then passes a single merged orchestration record downstream.
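To picture the merge, here is a minimal sketch that models each orchestration record as a Python dict keyed by unique task name. The layout, including the `jobs` key under each task, is a hypothetical simplification rather than the actual StreamSets record structure, and the `merge_orchestration_records` and `all_job_ids` helpers are likewise illustrative.

```python
from typing import Any, Dict, List


def merge_orchestration_records(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine the task entries of several records into a single record."""
    merged: Dict[str, Any] = {}
    for record in records:
        for task_name, task_result in record.items():
            # Task names are expected to be unique, so a collision signals a problem.
            if task_name in merged:
                raise ValueError(f"duplicate task name: {task_name}")
            merged[task_name] = task_result
    return merged


def all_job_ids(record: Dict[str, Any]) -> List[str]:
    """Collect the IDs of every job listed under any task in the record."""
    return sorted(
        job_id
        for task_result in record.values()
        for job_id in task_result.get("jobs", {})
    )
```

In this sketch, the list returned by `all_job_ids` for the merged record is the full set of jobs a status-check loop would need to wait on before emitting the single merged record.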
You don't always need a Wait for Jobs processor. For example, instead of using a Wait for Jobs processor immediately after a Start Jobs origin that starts a job in the background, you can simply configure the origin to run the job in the foreground. Then, the Start Jobs origin passes its orchestration record downstream after the job completes, with no need for a Wait for Jobs processor.
In contrast, say you want three jobs and a pipeline to start when you start your orchestration pipeline. You also want them all to complete before starting an additional set of jobs. To do this, you create the following pipeline:
You configure a Start Jobs origin to start the three jobs in the background. The origin passes an orchestration record to a Start Pipelines processor as soon as the jobs start, which enables the jobs and the pipeline to run concurrently.
You configure the Start Pipelines processor to run its pipeline in the foreground, so the processor passes the updated orchestration record downstream only after the pipeline completes. That takes care of the pipeline, but the jobs may still be running. To ensure that the jobs complete before starting the next set of jobs, you add the Wait for Jobs processor.
When the processor receives the orchestration record from the Start Pipelines processor, it notes the IDs of the jobs that were started by the Start Jobs origin, and waits for them to complete. After all of the jobs complete, the Wait for Jobs processor updates job status information in the orchestration record and passes the record to the Start Jobs processor, which starts the additional set of jobs.
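The timing in this example can be mimicked with Python threads, purely as an analogy: background jobs are submitted to a thread pool and not awaited, the foreground pipeline is an ordinary blocking call, and `concurrent.futures.wait` plays the part of the Wait for Jobs processor. The function names and sleep durations are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import List


def run_job(name: str, seconds: float, log: List[str]) -> None:
    """Stand-in for a Control Hub job: do some work, then record completion."""
    time.sleep(seconds)
    log.append(f"job {name} done")


def run_pipeline(seconds: float, log: List[str]) -> None:
    """Stand-in for a pipeline run in the foreground."""
    time.sleep(seconds)
    log.append("pipeline done")


def orchestrate(log: List[str]) -> None:
    with ThreadPoolExecutor(max_workers=3) as pool:
        # Start Jobs origin (background): start three jobs and move on at once.
        futures = [pool.submit(run_job, name, 0.05, log) for name in ("A", "B", "C")]
        log.append("jobs started")
        # Start Pipelines processor (foreground): block until the pipeline ends.
        run_pipeline(0.01, log)
        # Wait for Jobs processor: block until every background job completes.
        wait(futures)
        log.append("all jobs complete")
        # Start Jobs processor: only now start the additional set of jobs.
        log.append("next jobs started")
```

Because the pipeline call blocks while the jobs run in other threads, the jobs and the pipeline overlap, and the final step starts only after both the pipeline and all three jobs have finished.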
Generated Record
When the Wait for Jobs processor completes its task, it updates the job status and related information in the orchestration record before passing the record downstream.
Field Name | Description |
---|---|
<unique task name>/success | Boolean field that indicates whether all jobs completed successfully. The processor adds this field. |
<job ID>/jobStatus | Status of the job. For more information, see Job Status. The processor updates this field. |
<job ID>/jobStatusColor | Status color of the job. For more information, see Job Status. The processor updates this field. |
<job ID>/errorMessage | Error message associated with the job. The processor updates this field as needed. |
<job ID>/finishedSuccessfully | Boolean field that indicates whether a job completed successfully. Contains the following field: The processor adds these fields. |
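The following sketch shows how the fields in the table relate to each other, treating the record as a flat Python dict whose keys are the field paths exactly as written in the table. The `apply_job_results` helper and the shape of its `results` argument are hypothetical, the sketch leaves out job metrics, and the status and color strings used with it are illustrative assumptions. In this sketch, `<unique task name>/success` is true only when every job's `finishedSuccessfully` value is true.

```python
from typing import Any, Dict


def apply_job_results(
    record: Dict[str, Any], task_name: str, results: Dict[str, Dict[str, Any]]
) -> Dict[str, Any]:
    """Return a copy of record with per-job status fields and the task success flag.

    results maps each job ID to a dict with hypothetical keys
    "status", "color", "finished_ok", and optionally "error".
    """
    updated = dict(record)
    for job_id, result in results.items():
        updated[f"{job_id}/jobStatus"] = result["status"]
        updated[f"{job_id}/jobStatusColor"] = result["color"]
        updated[f"{job_id}/finishedSuccessfully"] = result["finished_ok"]
        # errorMessage is only written when there is something to report.
        if result.get("error"):
            updated[f"{job_id}/errorMessage"] = result["error"]
    # success summarizes the per-job outcomes.
    updated[f"{task_name}/success"] = all(r["finished_ok"] for r in results.values())
    return updated
```

Returning a copy instead of mutating the input keeps the incoming record available for side-by-side comparison with the updated one.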
For example, the following preview shows the fields that a Wait for Jobs processor adds and updates in comparison to the incoming record:
Notice how the processor updated the jobStatus and jobStatusColor fields, and added the finishedSuccessfully, jobMetrics, and success fields. All of the changes indicate that the job completed successfully.
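The same comparison can be made programmatically. This minimal sketch, treating records as flat dicts keyed by field path, reports which fields were added and which were updated. The `diff_records` name, the task name, and the sample values (such as the `INACTIVE` status and `GRAY` color) are illustrative assumptions, not values taken from the preview.

```python
from typing import Any, Dict, List, Tuple


def diff_records(
    before: Dict[str, Any], after: Dict[str, Any]
) -> Tuple[List[str], List[str]]:
    """Return (added, updated) field paths when comparing two flat records."""
    added = sorted(path for path in after if path not in before)
    updated = sorted(
        path for path in after if path in before and after[path] != before[path]
    )
    return added, updated


incoming = {"job1/jobStatus": "ACTIVE", "job1/jobStatusColor": "GREEN"}
outgoing = {
    "job1/jobStatus": "INACTIVE",
    "job1/jobStatusColor": "GRAY",
    "job1/finishedSuccessfully": True,
    "job1/jobMetrics": {},
    "wait_for_jobs_task/success": True,
}
added, updated = diff_records(incoming, outgoing)
print("added:", added)
print("updated:", updated)
```

With these sample records, the added list holds the finishedSuccessfully, jobMetrics, and success paths, and the updated list holds the jobStatus and jobStatusColor paths, mirroring the changes described above.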
Configuring a Wait for Jobs Processor
Configure a Wait for Jobs processor to wait for Control Hub jobs to complete before passing an orchestration record downstream. The Wait for Jobs processor is an orchestration stage that you use in orchestration pipelines.