Creating jobs in deployment spaces

A job is a way of running a batch deployment, or a self-contained asset like a script, notebook, code package, or flow in Watson Machine Learning. You can select the input and output for your job and choose to run it manually or on a schedule. From a deployment space, you can create, schedule, run, and manage jobs.

Notebooks, scripts, code packages, and flows use notebook environments and do not require a batch deployment to run.

Creating a batch deployment job

Follow these steps when you are creating a batch deployment job:

Important: You must have an existing batch deployment to create a batch job.
  1. From the Deployments tab, select your deployment and click New job. The Create a job dialog box opens.
  2. In the Define details section, enter your job name and an optional description, and click Next.
  3. In the Configure section, select a hardware specification. For more information, see Compute requirements for batch deployment jobs. You can optionally configure environment variables and job run retention settings:
    • Optional: If you are deploying a Python script, an R script, or a notebook, you can enter environment variables to pass parameters to the job. Click Environment variables and enter the key-value pairs.
    • Optional: To avoid exhausting resources by retaining all historical job metadata, choose one of these options:
      • Click By amount to save only a set number of job runs and associated logs.
      • Click By duration (days) to save artifacts only for a specified number of days.
  4. Optional: In the Schedule section, set the Schedule toggle to on to schedule a run. You can set a start date and time and a repetition pattern. Click Next.
Note: If you don't specify a schedule, the job runs immediately.
  5. Optional: In the Notify section, set the Off toggle to on to turn on notifications for this job. Click Next.
Note: You can receive notifications for three types of events: success, warning, and failure.
  6. In the Choose data section, provide inline data that corresponds with your model schema. You can provide input in JSON format. Click Next. See Example JSON payload for inline data.
  7. In the Review and create section, verify your job details, and click Create and run.
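The environment variables from step 3 are exposed to the deployed script's process environment at run time. A minimal sketch of how a deployed Python script might read them; the variable names BATCH_SIZE and OUTPUT_NAME are illustrative choices, not names predefined by Watson Machine Learning:

```python
import os

def get_job_params():
    """Read parameters passed to the job as environment variables.

    Assumes the job was configured with the (hypothetical) keys
    BATCH_SIZE and OUTPUT_NAME in the Environment variables section;
    defaults apply when the job defines no such keys.
    """
    batch_size = int(os.environ.get("BATCH_SIZE", "100"))
    output_name = os.environ.get("OUTPUT_NAME", "results.csv")
    return batch_size, output_name
```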

Notes:

  • If you create a job based on a code package, you are asked to select a specific entrypoint file.
  • Scheduled jobs display on the Jobs tab of the deployment space.
  • Results of job runs are written to the specified output file and saved as a space asset.
  • A data asset can be a data source file that you promoted to the space, a connected data source, or tables from databases and files from file-based data sources.
  • If you exclude certain weekdays in your job schedule, the job might not run as you would expect. This can happen because of a discrepancy between the time zone of the user who created the schedule and the time zone of the main node where the job runs.

Example JSON payload for inline data

{
  "deployment": {
    "id": "<deployment id>"
  },
  "space_id": "<your space id>",
  "name": "test_v4_inline",
  "scoring": {
    "input_data": [{
      "fields": ["AGE", "SEX", "BP", "CHOLESTEROL", "NA", "K"],
      "values": [[47, "M", "LOW", "HIGH", 0.739, 0.056], [47, "M", "LOW", "HIGH", 0.739, 0.056]]
    }]
  }
}
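A payload of this shape can be built and sanity-checked programmatically before you submit it. A minimal sketch using only the Python standard library; the placeholder IDs stay placeholders:

```python
import json

def build_inline_payload(deployment_id, space_id, fields, values,
                         name="test_v4_inline"):
    """Build an inline-scoring job payload matching the example above.

    Every row in `values` must supply exactly one value per entry in
    `fields`, mirroring the model schema.
    """
    if any(len(row) != len(fields) for row in values):
        raise ValueError("each row must match the model schema fields")
    return {
        "deployment": {"id": deployment_id},
        "space_id": space_id,
        "name": name,
        "scoring": {"input_data": [{"fields": fields, "values": values}]},
    }

payload = build_inline_payload(
    "<deployment id>", "<your space id>",
    ["AGE", "SEX", "BP", "CHOLESTEROL", "NA", "K"],
    [[47, "M", "LOW", "HIGH", 0.739, 0.056]],
)
print(json.dumps(payload, indent=2))
```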

Queuing and concurrent job executions

The maximum number of concurrent jobs for each deployment is handled internally by the deployment service. For batch deployments, two jobs can run concurrently by default. Any job request for a batch deployment that already has two running jobs is placed in a queue and run later. When a running job completes, the next job in the queue is run. The queue has no size limit. A Kubernetes admin can change the number of concurrent jobs manually. For more information, see Changing the maximum number of concurrent batch jobs.
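The scheduling behavior described above amounts to a concurrency cap with an unbounded FIFO queue. A toy model of the semantics (not the service's actual implementation):

```python
from collections import deque

class BatchJobQueue:
    """Toy model of per-deployment batch job scheduling: at most
    `max_concurrent` jobs run at once; further submissions wait in an
    unbounded FIFO queue and start as running jobs complete."""

    def __init__(self, max_concurrent=2):
        self.max_concurrent = max_concurrent
        self.running = set()
        self.queued = deque()

    def submit(self, job_id):
        if len(self.running) < self.max_concurrent:
            self.running.add(job_id)   # capacity available: run now
        else:
            self.queued.append(job_id) # otherwise queue for later

    def complete(self, job_id):
        self.running.discard(job_id)
        if self.queued:                # promote the oldest queued job
            self.running.add(self.queued.popleft())
```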

Changing the maximum number of concurrent batch jobs

Prerequisites: You must be a Kubernetes admin to modify the maximum number of concurrent batch jobs that are running.

To change the maximum number of concurrent batch jobs:

  1. Put the Watson Machine Learning operator in maintenance mode:

    oc patch wmlbase wml-cr --type merge --patch '{"spec": {"ignoreForMaintenance": true}}'
    
  2. Update the wmlruntimemanager configmap value:

    1. Run the following command:

      oc project <control_plane>
      
    2. Back up the existing wmlruntimemanager configmap:

      oc get cm wmlruntimemanager -o yaml > wmlruntimemanager_org.yaml
      
    3. Update the number of parallel jobs from 2 to a larger value:

      oc get cm wmlruntimemanager -o yaml | sed -e 's|private = 2|private = <new limit>|' > wmlruntimemanager_new.yaml
      
    4. Apply the updated configmap:

      oc apply -f wmlruntimemanager_new.yaml
      
  3. Restart the wml-deployment-manager pod and then wait for wml-deployment-manager to be operational again:

    oc delete pod -l app=wml-deployment-manager
    

Note: Before you install any future upgrades, put the operator back in the normal operation mode:

oc patch wmlbase wml-cr --type merge --patch '{"spec": {"ignoreForMaintenance": false}}'

Limitation on using large inline payloads for batch deployments

Batch deployment jobs that use large inline payloads might get stuck in the starting or running state. For more information, see Known issues and limitations for Watson Machine Learning.

Tip: If you provide large payloads to batch deployments, use data references instead of inline data.
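A data-reference scoring section replaces the inline rows with a pointer to a data asset in the space. A sketch of the general shape; the "data_asset" type and the location fields follow the overall pattern of the Watson Machine Learning v4 jobs API, but verify the exact field names against the API reference for your release:

```python
def build_data_reference_scoring(asset_id, output_name):
    """Sketch of a scoring section that references a space data asset
    instead of carrying inline rows, and names an output asset for the
    results. Field names are assumptions to be checked against the
    API reference, not a verified contract.
    """
    return {
        "input_data_references": [{
            "type": "data_asset",
            "location": {"id": asset_id},
        }],
        "output_data_reference": {
            "type": "data_asset",
            "location": {"name": output_name},
        },
    }
```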

Retention of deployment job metadata

Job-related metadata is persisted and can be accessed until the job and its deployment are deleted.

Viewing deployment job details

When you create or view a batch job, the deployment ID and the job ID are displayed.

Job IDs

  • The deployment ID represents the deployment definition, including the hardware and software configurations and related assets.
  • The job ID represents the details for a job, including input data and an output location and a schedule for running the job.

Use these IDs to refer to the job in Watson Data API requests or in notebooks that use the watsonx.ai Python client library.
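For REST calls, the job ID typically appears in the request path and the space ID as a query parameter. A hedged sketch of building such a URL; the /ml/v4/deployment_jobs path matches the general shape of the Watson Machine Learning REST API, but confirm the exact path and required parameters in the API reference for your release:

```python
from urllib.parse import urlencode

def job_status_url(host, job_id, space_id):
    """Build a URL for querying one deployment job's details.

    `host` is your Cloud Pak for Data hostname; the endpoint path is
    an assumption to verify against the API reference.
    """
    query = urlencode({"space_id": space_id})
    return f"https://{host}/ml/v4/deployment_jobs/{job_id}?{query}"
```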

Jobs after upgrading Cloud Pak for Data

Job runs in deployment spaces fail if the job is using a software specification that was deleted or is no longer supported after a Cloud Pak for Data version upgrade. To run the job again, edit the job to point to a different software specification.

To prevent job runs from failing due to an upgrade, create custom software specifications. Jobs that are associated with these software specifications will still run after an upgrade. For more information, see Customizing deployment runtimes.

Parent topic: Managing predictive deployments