Create online deployment API

You can use this API to create an online deployment for a model.

HTTP method and URI path

POST /v3/published_models/${modelId}/deployments

Standard headers

Use the following standard HTTP header with this request:

Content-Type: application/json

Authorization: <Bearer token>

Required authorizations

The user ID associated with the token specified in the request header must be granted one of the following roles:

  • sysadm
  • mladm
  • apiuser (only if the model was created by the user)
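Putting the method, path, and headers together, a client request can be sketched as follows. The host name, port, model ID, and token are placeholders, not real values:

```python
MODEL_ID = "2e4d4282-cb64-4b27-b3c3-2b423cb0cc36"  # example model ID

# Assemble the endpoint URL and the standard headers described above.
url = f"https://mlz.example.com:9999/v3/published_models/{MODEL_ID}/deployments"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <token>",  # token of a user with one of the roles above
}
```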

Query parameter

Table 1. Query parameter for the Create online deployment API

In this table, the Parameter column lists the query parameters; the Type column lists the data type of each parameter; the Required or optional column indicates whether the parameter is required or optional for this request; and the Description column provides a brief description of the parameter.

Parameter Type Required or optional Description
sync Boolean Optional Create the online deployment in sync or async mode. Valid values are true and false. The default mode is sync if this parameter is not set.
Important: The scoring service is called to create the online deployment. In async mode, the response is returned without waiting for the scoring service to complete, so the status of the created online deployment will be INITIALIZING. Check the final status of the deployment with the Get deployment detail API; the status of a successfully created online deployment is ACTIVE. In sync mode, the response is returned only after the deployment action completes.
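In async mode, the create-then-poll flow described above can be sketched as follows. Here `get_deployment` stands in for a call to the Get deployment detail API; the function name and defaults are illustrative:

```python
import time

def wait_for_active(get_deployment, deployment_id, poll_seconds=5, max_polls=60):
    """Poll the Get deployment detail API (injected as get_deployment) until the
    deployment leaves the INITIALIZING state. Returns the final status string,
    e.g. ACTIVE for a successfully created online deployment."""
    for _ in range(max_polls):
        status = get_deployment(deployment_id)["entity"]["status"]
        if status != "INITIALIZING":
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"deployment {deployment_id} still INITIALIZING after polling")
```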

Request body

The request content is expected to contain a JSON object. See Table 2 for the description of the fields.
Table 2. Supported parameters for the Create online deployment API

In this table, the Parameter column lists the supported parameters; the Type column lists the data type of each parameter; the Required or optional column indicates whether the parameter is required or optional for this request; and the Description column provides a brief description of the parameter.

Parameter Type Required or optional Description

servingId

String

Optional

The servingId must be 3 to 36 characters long, contain only letters, numbers, or underscores, and must begin with a letter.

Note: All deployments that use the same servingId must share an identical model schema, including both input and output formats. Additionally, only one deployment with a given servingId can be deployed within a specific scoringGroupId.

type String Required Specify the deployment type: online or batch.
name String Required A unique name for the deployment.
description String Optional A description of the deployment.
author JSON object Optional

Specify the author information:

  • name (String, optional): The deployment creator's name.
  • email (String, optional): The deployment creator's email address.
deploy_info JSON object Required

Specify the information for creating the online deployment. deploy_info includes the scoring group ID, version sequence, engine type, and model version href.

  • scoringGroupId – String, required. The group ID of a scoring service defined in the MLz environment. The scoring service can be a standard scoring service, a scoring cluster, or a CICS-integrated scoring service, but it must be active. See the guid field returned by the Get scoring service list API.
  • engineType – String, required. Supported engine types are spark, pmml, scikit, xgboost, arima, sarimax, onnx, and watfore. Refer to the Get model version API.
  • artifactVersionHref – String, required. The URI of the model to deploy. It contains the model ID and model version ID. See the Get models API.
  • zaiu – Boolean, optional. Applicable only to ONNX models. When set to true, the deployment uses the z16™ on-chip AI accelerator for scoring. If both zaiu and telum2 are set to true, telum2 takes precedence. If neither flag is specified, the ONNX model runs on the CPU.
  • telum2 – Boolean, optional. Applicable only to ONNX models. When set to true, the deployment uses the z17™ on-chip AI accelerator for scoring. If both zaiu and telum2 are set to true, telum2 takes precedence. If neither flag is specified, the ONNX model runs on the CPU.
  • batching – Whether to enable online micro-batching, true or false. For ONNX model online scoring only.
  • maxLatencyInMs – The maximum latency in milliseconds for online micro-batching.
  • maxBatchSize – The maximum batch size for online micro-batching.
  • predictionHorizon – Effective only when engineType is set to "watfore". Determines the number of future time steps for which the Watson Core Time Series model generates predictions.
  • timeout – An optional duration, in milliseconds, for scoring requests. Applies only to online scoring with PMML and SnapML models. This parameter defines the maximum time a scoring request can run against the deployment before it is canceled.
    • Valid range: 1–60,000 milliseconds. For example, "timeout": 60000
    • Default value: not specified
      Note: By default, if the scoring timeout duration is not specified, scoring requests will run until completion. The timeout is not applicable to online batch scoring (JES).

    Once the timeout value is set, it is applied uniformly to all incoming online scoring requests for the deployment.

    • If a scoring request completes within the defined timeout duration, it returns a response as expected.
    • If a scoring request exceeds the defined timeout duration, it is canceled and a scoring request timed out error is returned.
Important:

Machine Learning for IBM z/OS® delivers exceptional throughput and performance for inferencing tasks. However, in rare scenarios, an inferencing request may exceed the expected duration. The scoring timeout option serves as a safeguard to automatically cancel such long-running inferencing requests.

Choose a reasonable timeout value. Setting a very low timeout may cause the majority of incoming inferencing requests to be rejected, leading to real-time online scoring failures.

Using a timeout introduces a minor performance overhead. Enable it only if necessary as a safeguard.
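The field rules above (the servingId pattern, the required deploy_info fields, and the 1–60,000 ms timeout range) can be checked client-side before submitting. The sketch below encodes only what Tables 1 and 2 document; the function name and problem strings are illustrative:

```python
import re

# servingId rule from Table 2: 3-36 characters, letters, digits, or
# underscores only, and the first character must be a letter.
SERVING_ID_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]{2,35}$")

def validate_request_body(body):
    """Return a list of problems found in a Create online deployment request
    body; an empty list means the documented rules are satisfied."""
    problems = []
    for field in ("type", "name", "deploy_info"):
        if field not in body:
            problems.append(f"missing required field: {field}")
    if body.get("type") not in (None, "online", "batch"):
        problems.append("type must be online or batch")
    serving_id = body.get("servingId")
    if serving_id is not None and not SERVING_ID_RE.fullmatch(serving_id):
        problems.append("servingId must be 3-36 letters, digits, or "
                        "underscores, and must begin with a letter")
    deploy_info = body.get("deploy_info") or {}
    for field in ("scoringGroupId", "engineType", "artifactVersionHref"):
        if field not in deploy_info:
            problems.append(f"deploy_info missing required field: {field}")
    timeout = deploy_info.get("timeout")
    if timeout is not None and (not isinstance(timeout, int)
                                or not 1 <= timeout <= 60000):
        problems.append("timeout must be an integer between 1 and 60000 ms")
    return problems
```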

Example of request body:
{
      "type": "online",
      "name": "CVT-online",
      "description": "This is online deployment created for tests",
      "author": {
          "name": "John Smith",
          "email": "john.smith@example.com"
        },
        "deploy_info": {
          "scoringGroupId": "8da4702c-7636-4e0f-92ad-214b1493d50f",
          "engineType": "spark",
          "artifactVersionHref": "/v3/ml_assets/models/2e4d4282-cb64-4b27-b3c3-2b423cb0cc36/versions/208be2e7-5cb7-4f4a-a08a-5673ff1717fa"
        }
  }
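The example body above can be submitted with Python's standard library. This is a sketch only: the host, port, and token are placeholders, and the final call that would actually send the request is left commented out.

```python
import json
import urllib.request

# Same body as the example above, minus the optional author fields.
body = {
    "type": "online",
    "name": "CVT-online",
    "description": "This is online deployment created for tests",
    "deploy_info": {
        "scoringGroupId": "8da4702c-7636-4e0f-92ad-214b1493d50f",
        "engineType": "spark",
        "artifactVersionHref": "/v3/ml_assets/models/2e4d4282-cb64-4b27-b3c3-2b423cb0cc36/versions/208be2e7-5cb7-4f4a-a08a-5673ff1717fa",
    },
}

request = urllib.request.Request(
    url="https://mlz.example.com:9999/v3/published_models/2e4d4282-cb64-4b27-b3c3-2b423cb0cc36/deployments",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": "Bearer <token>"},
    method="POST",
)
# response = urllib.request.urlopen(request)  # would submit the request
```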
 

Expected response

On completion, the service returns an HTTP response, which includes a status code that indicates whether your request completed. Status code 201 indicates successful completion, and the GUID of the created deployment is returned in the response metadata.

Response example of a successful request:
{
    "metadata": {
        "url": "https://127.0.0.0:9999/v3/published_models/179e5d7e-05a8-4ec9-bf23-381456663565/deployments/17650863-0dc2-4b32-81de-7c6f407085f1",
        "guid": "17650863-0dc2-4b32-81de-7c6f407085f1",
        "modified_at": "2022-12-28T06:50:25.387Z",
        "model_status": [],
        "created_at": "2022-12-28T06:50:09.135Z"
    },
    "entity": {
        "author": {
            "email": "john.smith@example.com",
            "name": "wmlz11"
        },
        "deploy_info": {
            "artifactVersionHref": "/v3/ml_assets/models/179e5d7e-05a8-4ec9-bf23-381456663565/versions/4dd86b2c-c09e-4328-acc1-999a57ab09eb",
            "engineType": "spark",
            "nextFire": "0",
            "scheduleStatus": "",
            "scoringGroupId": "a9955c98-2a95-429c-9413-9c86edbe2017",
            "versionSeq": "1"
        },
        "deployed_version": {
            "guid": "4dd86b2c-c09e-4328-acc1-999a57ab09eb",
            "url": "/v3/ml_assets/models/179e5d7e-05a8-4ec9-bf23-381456663565/versions/4dd86b2c-c09e-4328-acc1-999a57ab09eb"
        },
        "description": "This is online deployment created for tests",
        "model_type": "mllib",
        "name": "CVT-online",
        "published_model": {
            "author": {
                "name": "wmlz11"
            },
            "created_at": "2022-12-23T04:57:16.375Z",
            "description": "",
            "guid": "179e5d7e-05a8-4ec9-bf23-381456663565",
            "name": "churn",
            "url": "https://127.0.0.0:9999/v3/published_models/179e5d7e-05a8-4ec9-bf23-381456663565"
        },
        "runtime_environment": "spark",
        "scoring_url": "https://127.0.0.0:15779/iml/v2/scoring/online/17650863-0dc2-4b32-81de-7c6f407085f1",
        "status": "ACTIVE",
        "type": "online"
    }
}
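Client code typically needs only a few fields from this response: the deployment GUID, its status, and the scoring URL. A small helper (the name is illustrative) that extracts them from the parsed JSON:

```python
def summarize_deployment(response):
    """Pull the commonly needed fields from a Create online deployment
    response that has already been parsed into a dict."""
    return {
        "guid": response["metadata"]["guid"],
        "status": response["entity"]["status"],
        "scoring_url": response["entity"]["scoring_url"],
    }
```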
 

HTTP status codes

For unsuccessful requests, the service returns the status codes that are described in Table 3.

Table 3. HTTP error response
HTTP status code Error response Description
400 empty_deploy_info No deploy_info is provided. Specify deploy_info.
400 parsing_error No type is provided. Specify type.
400 not_supported_deployment_type Invalid type is provided. Specify type as online or batch.
400 duplicate_deployment_name The deployment name is already in use. Specify a unique online deployment name.
400 duplicate_deployment This version of the model is already deployed in the scoring service. Specify another version of the model or another scoring service.
400 timeout_not_supported The timeout value must be an integer greater than 0 and is supported only for the online deployment type with the PMML engine.
500 not_found artifactVersionHref is not valid. Specify the correct model version href.
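A client can map the error responses in Table 3 to remediation hints. The hint strings below are paraphrased from the table, not returned verbatim by the service:

```python
# Documented error responses (Table 3) mapped to short remediation hints.
ERROR_HINTS = {
    "empty_deploy_info": "Provide a deploy_info object in the request body.",
    "parsing_error": "Provide the type field.",
    "not_supported_deployment_type": "Set type to online or batch.",
    "duplicate_deployment_name": "Choose a unique deployment name.",
    "duplicate_deployment": "Use another model version or another scoring service.",
    "timeout_not_supported": "Use an integer timeout > 0 with an online PMML deployment.",
    "not_found": "Check that artifactVersionHref is a valid model version href.",
}

def hint_for(error_response: str) -> str:
    """Return a remediation hint for a documented error response code."""
    return ERROR_HINTS.get(error_response, "See the API documentation.")
```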