Create online deployment API

You can use this API to create an online deployment for a model.

HTTP method and URI path

POST /v3/published_models/${modelId}/deployments

Standard headers

Use the following standard HTTP header with this request:

Content-Type: application/json

Authorization: <Bearer token>

Required authorizations

The user ID associated with the token specified in the request header must be granted one of the following roles:

  • sysadm
  • mladm
  • apiuser (only if the model was created by the user)
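Putting the method, path, and headers together, a client request can be sketched as follows. The host name, port, model ID, and token are placeholders, not real values:

```python
MODEL_ID = "2e4d4282-cb64-4b27-b3c3-2b423cb0cc36"  # example model ID

# Assemble the endpoint URL and the standard headers described above.
url = f"https://mlz.example.com:9999/v3/published_models/{MODEL_ID}/deployments"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <token>",  # token of a user with one of the roles above
}
```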

Query parameter

Table 1. Query parameter for the Create online deployment API

In this table, the Parameter column lists the query parameters; the Type column lists the data type of each parameter; the Required or optional column indicates whether the parameter is required or optional for this request; and the Description column provides a brief description of the parameter.

Parameter Type Required or optional Description
sync Boolean Optional Create the online deployment in sync or async mode. Valid values are true and false. The default mode is sync if this parameter is not set.
Important: The scoring service is called to create the online deployment. In async mode, the response is returned without waiting for the scoring service to complete, so the status of the created online deployment will be INITIALIZING. Check the final status of the deployment with the Get deployment detail API; the status of a successfully created online deployment is ACTIVE. In sync mode, the response is returned only after the deployment action completes.
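In async mode, the create-then-poll flow described above can be sketched as follows. Here `get_deployment` stands in for a call to the Get deployment detail API; the function name and defaults are illustrative:

```python
import time

def wait_for_active(get_deployment, deployment_id, poll_seconds=5, max_polls=60):
    """Poll the Get deployment detail API (injected as get_deployment) until the
    deployment leaves the INITIALIZING state. Returns the final status string,
    e.g. ACTIVE for a successfully created online deployment."""
    for _ in range(max_polls):
        status = get_deployment(deployment_id)["entity"]["status"]
        if status != "INITIALIZING":
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"deployment {deployment_id} still INITIALIZING after polling")
```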

Request body

The request content is expected to contain a JSON object. See Table 2 for the description of the fields.
Table 2. Supported parameters for the Create online deployment API

In this table, the Parameter column lists the supported parameters; the Type column lists the data type of each parameter; the Required or optional column indicates whether the parameter is required or optional for this request; and the Description column provides a brief description of the parameter.

Parameter Type Required or optional Description

servingId

String

Optional

The servingId must be 3 to 36 characters long, contain only letters, numbers, or underscores, and must begin with a letter.

Note: All deployments that use the same servingId must share an identical model schema, including both input and output formats. Additionally, only one deployment with a given servingId can be deployed within a specific scoringGroupId.

type String Required Specify the deployment type: online or batch.
name String Required A unique name for the deployment.
description String Optional A description of the deployment.
author JSON object Optional

Specify the author information:

  • name (String, optional): The deployment creator's name.
  • email (String, optional): The deployment creator's email address.
deploy_info JSON object Required

Specify the information for creating the online deployment. deploy_info includes the scoring group ID, version sequence, engine type, and model version href.

  • scoringGroupId – String, required. The group ID of a scoring service defined in the MLz environment. The scoring service can be a standard scoring service, a scoring cluster, or a CICS-integrated scoring service, but it must be active. See the guid field returned by the Get scoring service list API.
  • engineType – String, required. Supported engine types are spark, pmml, scikit, xgboost, arima, sarimax, onnx, and watfore. Refer to the Get model version API.
  • artifactVersionHref – String, required. The URI of the model to deploy. It contains the model ID and model version ID. See the Get models API.
  • zaiu – Boolean, optional. Applicable only to ONNX models. When set to true, the deployment uses the z16™ on-chip AI accelerator for scoring. If both zaiu and telum2 are set to true, telum2 takes precedence. If neither flag is specified, the ONNX model runs on the CPU.
  • telum2 – Boolean, optional. Applicable only to ONNX models. When set to true, the deployment uses the z17™ on-chip AI accelerator for scoring. If both zaiu and telum2 are set to true, telum2 takes precedence. If neither flag is specified, the ONNX model runs on the CPU.
  • batching – Whether to enable online micro-batching, true or false. For ONNX model online scoring only.
  • maxLatencyInMs – The maximum latency in milliseconds for online micro-batching.
  • maxBatchSize – The maximum batch size for online micro-batching.
  • predictionHorizon – Effective only when engineType is set to "watfore". Determines the number of future time steps for which the Watson Core Time Series model generates predictions.
  • timeout – An optional duration, in milliseconds, for scoring requests. Applies only to online scoring with PMML and SnapML models. This parameter defines the maximum time a scoring request can run against the deployment before it is canceled.
    • Valid range: 1–60,000 milliseconds. For example, "timeout": 60000
    • Default value: not specified
      Note: By default, if the scoring timeout duration is not specified, scoring requests will run until completion. The timeout is not applicable to online batch scoring (JES).

    Once the timeout value is set, it is applied uniformly to all incoming online scoring requests for the deployment.

    • If a scoring request completes within the defined timeout duration, it returns a response as expected.
    • If a scoring request exceeds the defined timeout duration, it is canceled and a scoring request timed out error is returned.
Important:

Machine Learning for IBM z/OS® delivers exceptional throughput and performance for inferencing tasks. However, in rare scenarios, an inferencing request may exceed the expected duration. The scoring timeout option serves as a safeguard to automatically cancel such long-running inferencing requests.

Choose a reasonable timeout value. Setting a very low timeout may cause the majority of incoming inferencing requests to be rejected, leading to real-time online scoring failures.

Using a timeout introduces a minor performance overhead. Enable it only if necessary as a safeguard.
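The field rules above (the servingId pattern, the required deploy_info fields, and the 1–60,000 ms timeout range) can be checked client-side before submitting. The sketch below encodes only what Tables 1 and 2 document; the function name and problem strings are illustrative:

```python
import re

# servingId rule from Table 2: 3-36 characters, letters, digits, or
# underscores only, and the first character must be a letter.
SERVING_ID_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]{2,35}$")

def validate_request_body(body):
    """Return a list of problems found in a Create online deployment request
    body; an empty list means the documented rules are satisfied."""
    problems = []
    for field in ("type", "name", "deploy_info"):
        if field not in body:
            problems.append(f"missing required field: {field}")
    if body.get("type") not in (None, "online", "batch"):
        problems.append("type must be online or batch")
    serving_id = body.get("servingId")
    if serving_id is not None and not SERVING_ID_RE.fullmatch(serving_id):
        problems.append("servingId must be 3-36 letters, digits, or "
                        "underscores, and must begin with a letter")
    deploy_info = body.get("deploy_info") or {}
    for field in ("scoringGroupId", "engineType", "artifactVersionHref"):
        if field not in deploy_info:
            problems.append(f"deploy_info missing required field: {field}")
    timeout = deploy_info.get("timeout")
    if timeout is not None and (not isinstance(timeout, int)
                                or not 1 <= timeout <= 60000):
        problems.append("timeout must be an integer between 1 and 60000 ms")
    return problems
```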

Example of request body:
{
      "type": "online",
      "name": "CVT-online",
      "description": "This is online deployment created for tests",
      "author": {
          "name": "John Smith",
          "email": "john.smith@example.com"
        },
        "deploy_info": {
          "scoringGroupId": "8da4702c-7636-4e0f-92ad-214b1493d50f",
          "engineType": "spark",
          "artifactVersionHref": "/v3/ml_assets/models/2e4d4282-cb64-4b27-b3c3-2b423cb0cc36/versions/208be2e7-5cb7-4f4a-a08a-5673ff1717fa"
        }
  }
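The example body above can be submitted with Python's standard library. This is a sketch only: the host, port, and token are placeholders, and the final call that would actually send the request is left commented out.

```python
import json
import urllib.request

# Same body as the example above, minus the optional author fields.
body = {
    "type": "online",
    "name": "CVT-online",
    "description": "This is online deployment created for tests",
    "deploy_info": {
        "scoringGroupId": "8da4702c-7636-4e0f-92ad-214b1493d50f",
        "engineType": "spark",
        "artifactVersionHref": "/v3/ml_assets/models/2e4d4282-cb64-4b27-b3c3-2b423cb0cc36/versions/208be2e7-5cb7-4f4a-a08a-5673ff1717fa",
    },
}

request = urllib.request.Request(
    url="https://mlz.example.com:9999/v3/published_models/2e4d4282-cb64-4b27-b3c3-2b423cb0cc36/deployments",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": "Bearer <token>"},
    method="POST",
)
# response = urllib.request.urlopen(request)  # would submit the request
```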
 

Expected response

On completion, the service returns an HTTP response, which includes a status code that indicates whether your request completed. Status code 201 indicates successful completion, and the GUID of the created deployment is returned in the response metadata.

Response example of a successful request:
{
    "metadata": {
        "url": "https://127.0.0.0:9999/v3/published_models/179e5d7e-05a8-4ec9-bf23-381456663565/deployments/17650863-0dc2-4b32-81de-7c6f407085f1",
        "guid": "17650863-0dc2-4b32-81de-7c6f407085f1",
        "modified_at": "2022-12-28T06:50:25.387Z",
        "model_status": [],
        "created_at": "2022-12-28T06:50:09.135Z"
    },
    "entity": {
        "author": {
            "email": "john.smith@example.com",
            "name": "wmlz11"
        },
        "deploy_info": {
            "artifactVersionHref": "/v3/ml_assets/models/179e5d7e-05a8-4ec9-bf23-381456663565/versions/4dd86b2c-c09e-4328-acc1-999a57ab09eb",
            "engineType": "spark",
            "nextFire": "0",
            "scheduleStatus": "",
            "scoringGroupId": "a9955c98-2a95-429c-9413-9c86edbe2017",
            "versionSeq": "1"
        },
        "deployed_version": {
            "guid": "4dd86b2c-c09e-4328-acc1-999a57ab09eb",
            "url": "/v3/ml_assets/models/179e5d7e-05a8-4ec9-bf23-381456663565/versions/4dd86b2c-c09e-4328-acc1-999a57ab09eb"
        },
        "description": "This is online deployment created for tests",
        "model_type": "mllib",
        "name": "CVT-online",
        "published_model": {
            "author": {
                "name": "wmlz11"
            },
            "created_at": "2022-12-23T04:57:16.375Z",
            "description": "",
            "guid": "179e5d7e-05a8-4ec9-bf23-381456663565",
            "name": "churn",
            "url": "https://127.0.0.0:9999/v3/published_models/179e5d7e-05a8-4ec9-bf23-381456663565"
        },
        "runtime_environment": "spark",
        "scoring_url": "https://127.0.0.0:15779/iml/v2/scoring/online/17650863-0dc2-4b32-81de-7c6f407085f1",
        "status": "ACTIVE",
        "type": "online"
    }
}
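Client code typically needs only a few fields from this response: the deployment GUID, its status, and the scoring URL. A small helper (the name is illustrative) that extracts them from the parsed JSON:

```python
def summarize_deployment(response):
    """Pull the commonly needed fields from a Create online deployment
    response that has already been parsed into a dict."""
    return {
        "guid": response["metadata"]["guid"],
        "status": response["entity"]["status"],
        "scoring_url": response["entity"]["scoring_url"],
    }
```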
 

HTTP status codes

For unsuccessful requests, the service returns the status codes that are described in Table 3.

Table 3. HTTP error response
HTTP status code Error response Description
400 empty_deploy_info No deploy_info is provided. Specify deploy_info.
400 parsing_error No type is provided. Specify type.
400 not_supported_deployment_type Invalid type is provided. Specify type as online or batch.
400 duplicate_deployment_name The deployment name is already in use. Specify a unique online deployment name.
400 duplicate_deployment This version of the model is already deployed in the scoring service. Specify another version of the model or another scoring service.
400 timeout_not_supported The timeout value must be an integer greater than 0 and is supported only for the online deployment type with the PMML engine.
500 not_found artifactVersionHref is not valid. Specify the correct model version href.
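A client can map the error responses in Table 3 to remediation hints. The hint strings below are paraphrased from the table, not returned verbatim by the service:

```python
# Documented error responses (Table 3) mapped to short remediation hints.
ERROR_HINTS = {
    "empty_deploy_info": "Provide a deploy_info object in the request body.",
    "parsing_error": "Provide the type field.",
    "not_supported_deployment_type": "Set type to online or batch.",
    "duplicate_deployment_name": "Choose a unique deployment name.",
    "duplicate_deployment": "Use another model version or another scoring service.",
    "timeout_not_supported": "Use an integer timeout > 0 with an online PMML deployment.",
    "not_found": "Check that artifactVersionHref is a valid model version href.",
}

def hint_for(error_response: str) -> str:
    """Return a remediation hint for a documented error response code."""
    return ERROR_HINTS.get(error_response, "See the API documentation.")
```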