Creating and managing jobs in a project
You create jobs to run operational assets or files, such as Data Refinery flows, SPSS Modeler flows, Notebooks, and scripts, in a project.
When you create a job, you define the properties for the job, such as the name, definition, environment runtime, schedule, and notification specifications, on different pages. You can run a job immediately or wait for the job to run at the next scheduled interval.
Each time a job is started, a job run is created, which you can monitor and use to compare with the job run history of previous runs. You can view detailed information about each job run, job state changes, and job failures in the job run log.
How a job is created depends on the operational asset or file and the type of project you are working in.
| Operational asset/file | Type of project | Create job in tool | Create job from the Assets page | Create job automatically | More information |
|---|---|---|---|---|---|
| Data Refinery flow | All types | ✓ | ✓ | | Creating jobs in Data Refinery |
| SPSS Modeler flow | All types | ✓ | ✓ | | Creating jobs in SPSS Modeler |
| DataStage flow | Default Git integration and Deprecated Git integration | | | | |
| DataStage flow | Empty and From file | ✓ | ✓ | ✓ | Creating jobs in DataStage |
| Notebook created in the Notebook editor | All types | ✓ | ✓ | | Creating jobs in the Notebook editor |
| Files and scripts created in JupyterLab and RStudio | Default Git integration | | ✓ | | Creating code-based jobs |
| Files and scripts created in JupyterLab and RStudio | Deprecated Git integration | ✓ | ✓ | | Creating jobs in projects with deprecated Git integration |
| Metadata import | Empty and From file | | | ✓ | Creating a metadata import job |
| Metadata enrichment | Empty | | | ✓ | Creating a metadata enrichment job |
| Masking flow | All types | | ✓ | | Creating a masking flow job from the Assets page |
| Data quality rule | Empty | ✓ | ✓ | ✓ | Creating jobs for running data quality rules |
| Pipelines | Empty and From file | ✓ | | | Creating jobs for Pipelines |
Creating jobs automatically
Some jobs are created automatically at the time the asset is created in a project. These jobs are listed on the Jobs page of the project. You can view the job run details, change job settings, run the job manually, and delete the job from the Jobs page. Note that you can't edit the job settings for metadata import or metadata enrichment jobs from the Jobs page. You can only do this from the project's Assets page.
In Cloud Pak for Data, jobs are created for:
- DataStage flows. See Creating jobs in DataStage.
- DataStage flows for data quality rules. See Creating jobs for running data quality rules.
- Metadata import assets. See Creating a metadata import job.
- Metadata enrichment assets. See Creating a metadata enrichment job.
In addition, the following jobs are created automatically:
- A job for primary key analysis with the job type Relationship Analysis for Metadata Enrichment Assets when the metadata enrichment is created
- A job for relationship analysis with the job type Relationship Analysis for Metadata Enrichment Assets when the metadata enrichment is created
- A publish job with the job type Publish Metadata Enrichment Assets when you publish metadata enrichment results
You can view the job run details or delete the jobs but you cannot change any job settings or run such jobs manually.
Creating jobs for files in a project with deprecated Git integration
You can't create jobs directly in JupyterLab or RStudio. To create jobs for Notebooks or scripts that are created in JupyterLab or RStudio in a project with deprecated Git integration, you must push the files from the IDE to the Git repository associated with your project and then sync the repository files with the project. Any Notebooks, scripts, or RShiny apps that are pushed to a Git repository with a size of zero bytes are considered invalid and are not synced with the project.
You can create jobs after you have synced your Git files to create project assets:
- From the Notebook viewer for Notebooks. See Creating jobs in the Notebook viewer.
- From the project Assets page for Notebooks and scripts. See Creating jobs from the Assets page.
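Because zero-byte files are treated as invalid and skipped during sync, it can help to check for them before pushing. The following is a minimal sketch; the set of file extensions is an illustrative assumption:

```shell
#!/bin/sh
# List zero-byte notebook, script, and R files in the working tree before
# pushing. Any file reported here would be skipped when syncing the Git
# repository with the project. The extension list is an assumption.
find . -path ./.git -prune -o -type f \
  \( -name '*.ipynb' -o -name '*.py' -o -name '*.R' \) \
  -size 0 -print
```

If the command prints any paths, add content to those files (or remove them) before pushing.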
Creating jobs from the Assets page
You can create a job to run an asset from the project's Assets page.
In a project with default Git integration, you need to create a code-based job for non-listed assets by selecting a file from your local Git branch. See Creating jobs for code files.
To create jobs for a listed asset from the Assets page of a project:
- Select the asset from the section for your asset type and choose Create job from the options menu at the end of the table row.
- Define the job details by entering a name and an optional description.
- If you can select Settings, specify the settings that you want for the job.
- If you can select Configure, choose an environment runtime for the job. Depending on the asset type, you can optionally configure more settings, for example environment variables or script arguments.
  To avoid accumulating too many finished job runs and job run artifacts, set how long to retain finished job runs and job run artifacts like logs or notebook results. You can either select the number of days to retain the job runs or the number of most recent job runs to keep.
- On the Schedule page, you can optionally add a one-time or repeating schedule.
  Note: Scheduling jobs is not supported in Git-based projects.
  If you define a start day and time without selecting Repeat, the job runs exactly once at the specified day and time. If you select Repeat, the job runs at the timestamp indicated in the Repeat section between the start and end dates of the schedule.
  You can't change the time zone; the schedule uses your web browser's time zone setting. If you exclude certain weekdays, the job might not run as you expect. This can be due to a discrepancy between the time zone of the user who creates the schedule and the time zone of the compute node where the job runs.
- Optionally, set notifications for the job. You can select the type of alerts to receive.
- Review the job settings. Then, create the job and run it immediately, or create the job and run it later.
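The two retention options described above, keeping runs from the last N days or keeping only the last N runs, can be sketched as a small pure function. The run-record shape here is a simplifying assumption for illustration, not the product's data model:

```python
from datetime import datetime, timedelta

def runs_to_delete(runs, *, days=None, last_n=None):
    """Return finished runs that fall outside the retention window.

    `runs` is a list of (run_id, finished_at) tuples (an illustrative
    shape). Pass exactly one of `days` (retain runs newer than this many
    days) or `last_n` (retain only this many most recent runs).
    """
    ordered = sorted(runs, key=lambda r: r[1], reverse=True)  # newest first
    if days is not None:
        cutoff = datetime.now() - timedelta(days=days)
        return [r for r in ordered if r[1] < cutoff]
    return ordered[last_n:]  # everything older than the newest `last_n` runs

# Example: keep only the 2 most recent runs
now = datetime.now()
runs = [("a", now - timedelta(days=10)),
        ("b", now - timedelta(days=1)),
        ("c", now - timedelta(days=5))]
print([r[0] for r in runs_to_delete(runs, last_n=2)])  # ['a']
```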
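The weekday caveat in the scheduling step comes from the same wall-clock instant falling on different weekdays in different time zones. A short illustration using Python's standard `zoneinfo` module (the zones chosen are arbitrary examples):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# A run scheduled for Sunday 23:30 in the scheduling user's browser zone...
browser_time = datetime(2024, 1, 7, 23, 30, tzinfo=ZoneInfo("America/New_York"))

# ...is already Monday on a compute node that evaluates weekdays in UTC,
# so a rule like "exclude Mondays" would skip this run unexpectedly.
node_time = browser_time.astimezone(ZoneInfo("UTC"))
print(browser_time.strftime("%A"))  # Sunday
print(node_time.strftime("%A"))    # Monday
```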
Managing jobs
You can view all of the jobs that exist for your project from the project's Jobs page. With the Admin or Editor role for the project, you can view and edit job details, run jobs manually, and delete jobs. With the Viewer role, you can only view job details; you can't run or delete jobs.
To view the details of a specific job, click the job. From the job's details page, you can:
- View the runs for that job and the status of each run. If a run failed, you can select the run and view the log tail or download the entire log file to help you troubleshoot the run. A failed run might be related to a temporary connection or environment problem. Try running the job again. If the job still fails, you can send the log to Customer Support.
- Edit job settings by clicking Edit job, for example to change schedule settings or to pick another environment template.
- Run the job manually by clicking the run icon in the job's action bar. A scheduled job runs on its schedule, and you can also start it on demand.
- Delete the job by clicking the delete icon in the job's action bar.
Viewing and editing jobs in a tool
You can view and edit job settings associated with an asset directly in the following tools:

- Data Refinery, Notebook editor or viewer, SPSS Modeler, and Pipelines

  To view and change job settings in these tools:
  - In the tool, click the Jobs icon on the toolbar and select Save and view jobs. This action lists the jobs that exist for the asset.
  - Select a job to see its details. You can change job settings by clicking Edit job.

- DataStage

  To view or edit runtime settings in a DataStage flow:
  - Open the flow and click the Settings icon (the gear icon).
  - Click Run on the Settings page.

  Pipelines jobs can be viewed and edited in the Jobs tab of your Watson Studio project.

- IBM Match 360

  To view and manage jobs in IBM Match 360:
  - From the Cloud Pak for Data navigation menu, choose Data > Master data to open the IBM Match 360 service.
  - Go to Master data home and open the Jobs tab.

  The Jobs tab shows a list of jobs that have run, or are currently running, on this IBM Match 360 service instance. You can see details such as the job ID, job type, timestamp information, and status. You can also cancel any jobs that are currently running.

  Alternatively, from any master data configuration page, click the Processes icon in the action bar to see a list of running and recently completed jobs.
Jobs after upgrading Cloud Pak for Data
Job runs in projects fail if the job is using an environment that has been deleted or is no longer supported after a Cloud Pak for Data version upgrade. To get the job running again, edit the job to point to an alternative environment.
To prevent job runs from failing due to an upgrade, you can use either of the following methods:
- Migrate your environments before upgrading Cloud Pak for Data. For details, see:
- Create custom environments based on custom runtime images. Jobs associated with these environments will still run after an upgrade. For details, see Building custom images.
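After an upgrade, it can help to identify jobs that reference a removed environment before their next scheduled run fails. The following sketch filters a hypothetical job list; the dictionary keys are illustrative assumptions, not an actual API response shape:

```python
def jobs_needing_new_environment(jobs, available_env_ids):
    """Return jobs whose environment no longer exists after an upgrade.

    `jobs` is a list of dicts with hypothetical `name` and `environment_id`
    keys; `available_env_ids` is the set of environment IDs that survived
    the upgrade. Any job returned should be edited to point to an
    alternative environment.
    """
    return [job for job in jobs
            if job["environment_id"] not in available_env_ids]

jobs = [
    {"name": "refine-sales", "environment_id": "py39"},
    {"name": "score-model", "environment_id": "spark24"},  # removed runtime
]
broken = jobs_needing_new_environment(jobs, {"py39", "py310"})
print([job["name"] for job in broken])  # ['score-model']
```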
Learn more
- Creating jobs in Data Refinery
- Creating jobs in DataStage
- Creating jobs with Data Privacy
- Creating jobs in IBM Match 360
- Creating jobs in SPSS Modeler
- Creating jobs in the Notebook editor or Notebook viewer
- Creating jobs for running data quality rules
- Creating a metadata import job
- Creating a metadata enrichment job
- Creating code-based jobs
- Creating jobs for Pipelines
Parent topic: Working in projects