Managing Jobs
After publishing pipelines, create a job for each pipeline that you want to run. You can create a job in the following ways:
- From a pipeline
- When you create a job from a pipeline, you configure all of the job details. You can create a single job instance at a time.
- From a job template
- When you create a job from a job template, the job definition is already determined, and you configure pipeline parameter values only. You can create multiple job instances at the same time. The job instances have the same job definition, but different runtime parameter values.
When you start a job, Control Hub sends an instance of the pipeline to an execution engine that is assigned all labels added to the job.
When a job is active, you can synchronize or stop the job.
When a job is inactive, you can reset the origin for the job, edit the job, or delete the job.
When a job is active or inactive, you can edit the latest pipeline version, upgrade the job to use the latest pipeline version, or schedule the job to start, stop, or upgrade on a regular basis.
Creating a Job from a Pipeline
When you create a job from a pipeline, you configure all of the job details.
Use one of the following methods to start creating a job:
- In the Navigation panel, go to the Job Instances view, and then click the Create a Job Instance icon.
- Or, select a published pipeline, and then click the Create Job icon.
Define the Job
Define the job essentials, including the job name, and optionally a description and tags to identify similar jobs.
Select the Pipeline
Select the published pipeline that you want to add to the job and then run.
Configure the Job
Configure the job to determine how engines run the pipeline.
Define the Runtime Parameters
If the selected pipeline uses runtime parameters, define the parameter values to use when the job starts.
Review and Start the Job
You've successfully finished creating the job.
- Exit - Saves the job and exits the wizard, displaying the new job in the Job Instances view.
- Start & Monitor Job - Starts the job and displays the job in the canvas so that you can monitor the progress.
Creating a Job from a Job Template
When you create a job from a job template, the job definition is already determined, and you configure pipeline parameter values only. You can create and start multiple job instances at the same time. The job instances have the same job definition, but different runtime parameter values.
Use one of the following methods to start creating job instances:
- In the Navigation panel, go to the Job Instances view, and then click the Create a Job Instance icon.
- Or, select a job template, click the More icon, and then click Create Instances.
Define the Job
Define the job essentials, including the job name, and optionally a description and tags to identify similar jobs.
Select the Job Template
Select the job template that you want to use to create the job instances, and then optionally configure advanced options.
Define the Runtime Parameters
If the pipeline included in the job template uses runtime parameters, define the parameter values to use when the job instances start.
Review and Start the Job
You've successfully finished creating and starting the job instances.
Click Exit, and then monitor the instances from the Job Instances view or from the parent job template details.
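The fan-out that a job template provides can be sketched as a toy Python model. The field names here are invented for illustration and are not a Control Hub API: each instance copies the template's shared definition and differs only in its runtime parameter values.

```python
# Illustrative model only: one shared job definition from a template,
# fanned out into instances with different runtime parameter values.
import copy

template = {"pipeline": "Kafka-to-S3", "labels": ["Test"]}   # assumed names
parameter_sets = [{"TOPIC": "orders"}, {"TOPIC": "payments"}]

instances = []
for i, params in enumerate(parameter_sets, start=1):
    instance = copy.deepcopy(template)         # same job definition...
    instance["name"] = f"Kafka-to-S3-{i}"
    instance["runtime_parameters"] = params    # ...different parameter values
    instances.append(instance)

print([(j["name"], j["runtime_parameters"]["TOPIC"]) for j in instances])
```

Every instance shares the template's pipeline and labels, so changing the template later affects all attached instances at once.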
Starting Jobs
When you start a job in the Job Instances view, you start a single job instance. Control Hub sends an instance of the pipeline to an engine assigned all labels added to the job.
Before sending an instance of a pipeline to an engine, Control Hub verifies that the engine does not exceed its resource thresholds.
- In the Navigation panel, go to the Job Instances view.
- Hover over the inactive job, and then click the Start Job icon.
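The label and threshold check that Control Hub performs before sending a pipeline instance can be modeled with a short sketch. This is a simplified illustration with assumed threshold values, not Control Hub's implementation: an engine qualifies only if it carries every label added to the job and has not exceeded its resource thresholds.

```python
# Simplified model (not Control Hub internals) of engine selection at job start.
from dataclasses import dataclass, field

@dataclass
class Engine:
    name: str
    labels: set = field(default_factory=set)
    cpu_pct: float = 0.0      # current CPU load, percent (assumed metric)
    memory_pct: float = 0.0   # current memory use, percent (assumed metric)

def eligible_engines(engines, job_labels, cpu_threshold=80.0, mem_threshold=100.0):
    """Return engines that carry all job labels and are under both thresholds."""
    return [
        e for e in engines
        if job_labels <= e.labels          # engine has every label on the job
        and e.cpu_pct < cpu_threshold
        and e.memory_pct < mem_threshold
    ]

engines = [
    Engine("dc-1", {"Test", "WestCoast"}, cpu_pct=35),
    Engine("dc-2", {"Test"}, cpu_pct=95),   # over the CPU threshold
    Engine("dc-3", {"Prod"}, cpu_pct=10),   # missing the Test label
]
print([e.name for e in eligible_engines(engines, {"Test"})])  # ['dc-1']
```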
Synchronizing Jobs
Synchronize a job when you've changed the labels assigned to Data Collectors and the job is actively running on those engines. Or, synchronize a job to trigger a restart of a non-running pipeline that has encountered an error.
- Stops the job so that all running pipeline instances are stopped, and then waits until each Data Collector sends the last-saved offset back to Control Hub. Control Hub maintains the last-saved offsets for all pipeline instances in the job.
- Reassigns the pipeline instances to Data Collectors as follows, sending the last-saved offset for each pipeline instance to a Data Collector:
- Assigns pipeline instances to additional Data Collectors that match the same labels as the job and that have not exceeded any resource thresholds.
- Does not assign pipeline instances to Data Collectors that no longer match the same labels as the job.
- Reassigns pipeline instances on the same Data Collector that matches the same labels as the job and that has not exceeded any resource thresholds. For example, a pipeline might have stopped running after encountering an error or after being deleted from that Data Collector.
- Starts the job, which restarts the pipeline instances from the last-saved offsets so that processing can continue from where the pipelines last stopped.
For example, let’s say a job is active on three Data Collectors with label Test. If you remove label Test from one of the Data Collectors, synchronize the active job so that the pipeline stops running on that Data Collector. Or, let's say that one of the three pipelines running for the job has encountered an error and has stopped running. If you synchronize the active job, Control Hub triggers a restart of the pipeline on that same Data Collector.
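The reassignment step above can be sketched as a toy model. The function and field names are invented for illustration, not Control Hub code: after stopping all instances and collecting their offsets, synchronization hands each instance to an engine that still matches the job labels.

```python
# Toy model of synchronization's reassignment step: instances move only to
# engines that still carry the job's labels; offsets travel with them.
def synchronize(job_labels, assignments, engines):
    """assignments: {instance_id: engine_name}; engines: {name: label_set}.
    Returns the new instance-to-engine assignments after synchronization."""
    matching = [name for name, labels in engines.items() if job_labels <= labels]
    new_assignments = {}
    for i, instance in enumerate(sorted(assignments)):
        # Each instance restarts from its last-saved offset on a matching engine.
        new_assignments[instance] = matching[i % len(matching)]
    return new_assignments

# Label Test was removed from dc-3, so its instance moves to a matching engine.
engines = {"dc-1": {"Test"}, "dc-2": {"Test"}, "dc-3": set()}
old = {"p-1": "dc-1", "p-2": "dc-2", "p-3": "dc-3"}
print(synchronize({"Test"}, old, engines))
```

In the example from the text, the instance that ran on the Data Collector that lost the Test label is reassigned to one that still has it.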

Job Offsets
Just as Data Collector and Transformer engines maintain the last-saved offset for some origins when you stop a pipeline, Control Hub maintains the last-saved offset for the same origins when you stop a job.
Let's look at how Control Hub maintains the offset for Data Collector pipelines. Control Hub maintains the offset for Transformer pipelines the same way:
- When you start a job, Control Hub can run a remote pipeline instance on each Data Collector assigned all labels assigned to the job. As a Data Collector runs a pipeline instance, it periodically sends the latest offset to Control Hub. If a Data Collector becomes disconnected from Control Hub, the Data Collector maintains the offset. It updates Control Hub with the latest offset as soon as it reconnects to Control Hub.
- When you stop a job, Control Hub instructs all Data Collectors running pipelines for the job to stop the pipelines. The Data Collectors send the last-saved offsets back to Control Hub. Control Hub maintains the last-saved offsets for all pipeline instances in that job.
- When you restart the job, Control Hub sends the last-saved offset for each pipeline instance to a Data Collector so that processing can continue from where the pipeline last stopped. Control Hub determines the Data Collector to use on restart based on whether failover is enabled for the job:
- Failover is disabled - Control Hub sends the offset to the same Data Collector that originally ran the pipeline instance. In other words, Control Hub associates each pipeline instance with the same Data Collector.
- Failover is enabled - Control Hub sends the offset to a different Data Collector with matching labels.
You can view the last-saved offset sent by each execution engine in the job History view.
If you want the execution engines to process all available data instead of processing data from the last-saved offset, simply reset the origin for the job before restarting the job. When you reset the origin for a job, you also reset the job metrics.
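The restart behavior described above, which depends on whether failover is enabled, can be captured in a small sketch. The names here are assumptions for illustration, not Control Hub internals.

```python
# Minimal sketch of which engine receives an instance's last-saved offset
# on restart, depending on the job's failover setting.
def engine_for_restart(last_engine, available, failover_enabled):
    """Pick the engine that receives the pipeline instance's last-saved offset."""
    if not failover_enabled:
        # Without failover, the instance stays pinned to its original engine,
        # even if that engine is currently unavailable.
        return last_engine
    # With failover, a different engine with matching labels can take over.
    candidates = [e for e in available if e != last_engine] or available
    return candidates[0]

# dc-1 went down; dc-2 still matches the job labels.
print(engine_for_restart("dc-1", ["dc-2"], failover_enabled=True))   # dc-2
print(engine_for_restart("dc-1", ["dc-2"], failover_enabled=False))  # dc-1
```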
Origins that Maintain Offsets
Control Hub maintains the last-saved offset for the same origins as execution engines. Execution engines maintain offsets for some origins only.
Data Collector Origins
Data Collector maintains offsets for the following origins:
- Amazon S3
- Aurora PostgreSQL CDC Client
- Azure Blob Storage
- Azure Data Lake Storage Gen2
- Azure Data Lake Storage Gen2 (Legacy)
- Directory
- Elasticsearch
- File Tail
- Google Cloud Storage
- Groovy Scripting
- Hadoop FS Standalone
- HTTP Client
- JavaScript Scripting
- JDBC Multitable Consumer
- JDBC Query Consumer
- Jython Scripting
- Kinesis Consumer
- MapR DB JSON
- MapR FS Standalone
- MongoDB
- MongoDB Atlas
- MongoDB Atlas CDC
- MongoDB Oplog
- MySQL Binary Log
- Oracle CDC
- Oracle CDC Client
- Oracle Multitable Consumer
- PostgreSQL CDC Client
- Salesforce
- Salesforce Bulk API 2.0
- SAP HANA Query Consumer
- SFTP/FTP/FTPS Client
- SQL Server CDC Client
- SQL Server Change Tracking
Transformer Origins
Transformer maintains offsets for all origins that can be included in both batch and streaming pipelines, as long as the origin has the Skip Offset Tracking property cleared. Transformer also maintains offsets for the following origins that can be included only in batch pipelines:
- Delta Lake
- Kudu
- Whole Directory
Resetting the Origin for Jobs
Reset the origin when you want the execution engines running the pipeline to process all available data instead of processing data from the last-saved offset.
You can reset the origin for all inactive jobs. When you reset an origin that maintains the offset, you reset both the origin and the metrics for the job. When you reset an origin that does not maintain the offset, you reset only the metrics for the job.
To reset origins from the Job Instances view, select jobs in the list, click the More icon, and then click Reset Origin.
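The two outcomes described above can be sketched in a few lines. This is a simplified model with an invented, partial origin list, not Control Hub behavior verbatim: the offset is cleared only when the origin maintains one, while job metrics are reset in every case.

```python
# Simplified model of resetting the origin for an inactive job.
OFFSET_ORIGINS = {"Amazon S3", "Directory", "JDBC Query Consumer"}  # sample subset

def reset_origin(job):
    if job["origin"] in OFFSET_ORIGINS:
        job["offset"] = None   # origin reset: reprocess all available data
    job["metrics"] = {}        # metrics are reset in every case
    return job

job = {"origin": "Directory", "offset": "file-42:1024", "metrics": {"records": 9000}}
print(reset_origin(job))  # offset cleared and metrics emptied
```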
Uploading an Initial Offset File
In most situations, you do not need to upload an initial offset file for a job. Control Hub maintains the last-saved offset when you stop a job and handles the offset when engines become unresponsive.
However, in some situations, you might want to upload an offset file to ensure that data duplication does not occur.
For example, a Data Collector engine running a pipeline loses its connection to Control Hub. The engine continues running the pipeline, storing the last-saved offset in data files on the engine machine. However, before the engine reconnects to Control Hub to report the last-saved offset, the engine unexpectedly shuts down. To restart the processing from the last-saved offset maintained by the engine, you can upload the offset file stored on the Data Collector machine.
You can upload an offset file only when the following requirements are met:
- The job is inactive.
- The job runs a single pipeline instance.
Editing the Latest Pipeline Version
While viewing an inactive job or monitoring an active job, you can access the latest version of the pipeline to edit the pipeline.
When you view or monitor a job, Control Hub displays a read-only view of the pipeline in the pipeline canvas. To edit the latest version of the pipeline, click the Edit Job icon next to the job name, and then click Edit Latest Version of Pipeline.
Control Hub creates a new draft of the latest version of the pipeline, and opens the draft in edit mode in the pipeline canvas.
When you edit a pipeline from a job, the job is not automatically updated to use the newly edited version. You must upgrade the job to use the latest published pipeline version. When working with job templates, you upgrade the job template to use the latest version.
Upgrading to the Latest Pipeline Version
You can upgrade a job created from a pipeline or a detached job instance created from a job template to use the latest published pipeline version. To upgrade an attached job instance created from a job template, you must upgrade the job template.
When a job includes a pipeline that has a later published version, Control Hub notifies you by displaying the New Pipeline Version icon next to the job. Click the icon to upgrade the job to use the latest pipeline version. Or, select jobs in the Job Instances view, click the More icon, and then click Use Latest Pipeline Version.
When you upgrade to the latest pipeline version, the tasks that Control Hub completes depend on whether the job is inactive or active:
- Inactive job
- When you upgrade an inactive job, Control Hub updates the job to use the latest pipeline version.
- Active job
- When you upgrade an active job, Control Hub stops the job, updates the job to use the latest pipeline version, and then restarts the job. During the process, Control Hub displays a temporary Upgrading status for the job.
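The two upgrade paths above can be sketched as a state model. The status and event names here are illustrative assumptions, not Control Hub's API.

```python
# Hedged sketch of the upgrade flow: an inactive job is simply updated,
# while an active job passes through an Upgrading status and is restarted.
def upgrade_job(job, latest_version):
    events = []
    if job["status"] == "ACTIVE":
        events += ["UPGRADING", "STOP"]       # temporary Upgrading status, then stop
        job["pipeline_version"] = latest_version
        events.append("START")                # restart on the new version
    else:
        job["pipeline_version"] = latest_version
        events.append("UPDATED")              # inactive job: update only
    return events

job = {"status": "ACTIVE", "pipeline_version": "v2"}
events = upgrade_job(job, "v3")
print(events)  # ['UPGRADING', 'STOP', 'START']
```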
Stopping Jobs
Stop a job when you want to stop processing data for the pipeline included in the job.
When stopping a job, Control Hub waits for the pipeline to gracefully complete all tasks for the in-progress batch. In some situations, this can take several minutes.
For example, if a scripting processor includes code with a timed wait, Control Hub waits for the scripting processor to complete its task. Then, Control Hub waits for the rest of the pipeline to complete all tasks before stopping the pipeline.
When you stop a job that includes an origin that can be reset, Control Hub maintains the last-saved offset for the job. For more information, see Job Offsets.
Forcing a Job to Stop
When a job remains in a Deactivating state, you can force Control Hub to stop the job immediately.
Scheduling Jobs
You can use the Control Hub scheduler to schedule a job to start, stop, or upgrade to the latest pipeline version on a regular basis.
Alternatively, you can create a sequence to schedule a collection of jobs to run in sequenced order. A sequence can include jobs that run on different types of IBM StreamSets engines.
Editing Jobs
You can edit inactive jobs to change the job definition. You can edit jobs created from a pipeline or detached job instances created from a job template. You cannot edit attached job instances created from a job template.
Edit inactive jobs from the Job Instances view. Hover over the inactive job, and click the Edit icon. You can edit the following job properties:
- Description
- Pipeline version - You can select a different pipeline version to run.
For example, after you start a job, you realize that the developer forgot to enable a metric rule for the pipeline, so you stop the job. You inform your developer, who edits the pipeline rules in the pipeline canvas and republishes the pipeline as another version. You edit the inactive job to select that latest published version of the pipeline, and then start the job again.
Important: If you edit the job so that it contains a new pipeline version with a different origin, you must reset the origin before restarting the job.
- Execution Engine Labels - You can assign and remove labels from the job to change the group of execution engines that run the pipeline.
- Job Tags - You can assign and remove tags from the job to identify the job in a different way.
- Statistics Refresh Interval - You can change the milliseconds to wait before Control Hub refreshes the statistics when you monitor the job.
- Number of Instances - You can change the number of pipeline instances run for Data Collector jobs.
- Pipeline Force Stop Timeout - You can change the number of milliseconds to wait before Control Hub forces remote pipeline instances to stop.
- Runtime Parameters - You can change the values used for the runtime parameters defined in the pipeline.
- Enable or disable failover - You can enable or disable pipeline failover for jobs. Control Hub manages pipeline failover differently based on the engine type.
Duplicating Jobs
Duplicate a job to create one or more exact copies of an existing job. You can then change the configuration and runtime parameters of the copies.
You duplicate jobs from the Job Instances view in Control Hub.
Deleting Jobs
You can delete inactive jobs. Control Hub automatically deletes inactive job instances older than 365 days that have never been run. You cannot delete an inactive job that is included in a sequence.
- In the Navigation panel, go to the Job Instances view.
- Select jobs in the list, and then click the Delete icon.