Creating a deployment for a custom foundation model

After a custom foundation model asset is created, you can create a deployment for the model to make it available for inferencing.

Prerequisites:

  1. You must set up your task credentials by generating an API key. For more information, see Managing task credentials.
  2. Before deploying your model, review the Available hardware specifications and pick a predefined hardware specification that matches your model.
  3. Additionally, review whether the available software specifications match your model architecture. For details, see Supported model architectures.

Creating a deployment from the watsonx.ai user interface

Follow these steps to create a deployment for a custom foundation model:

  1. In your deployment space or your project, go to the Assets tab.

  2. Find your model in the asset list, click the Menu icon, and select Deploy.

  3. Adjust model details, as needed.

    Note:
    • Use the Serving name field to specify a name for your deployment to use instead of the deployment ID.
    • The serving name must be unique within the namespace.
    • The serving name can contain only lowercase letters, numbers, and underscores ([a-z0-9_]) and can be at most 36 characters long.
    • In workflows where your custom foundation model is used periodically, consider assigning your model the same serving name each time you deploy it. This way, after you delete and then re-deploy the model, you can keep using the same endpoint in your code.

  4. Optional: If you want to override some of the base model parameters, click Model deployment parameters and then enter new parameter values. For information on available model parameters, see Global parameters for custom foundation models.

  5. Click Create.

Note:
  • If you use the watsonx-cfm-caikit-1.1 software specification to deploy your model, the value of the max_concurrent_requests parameter is not used.
  • Time-series models do not take any parameters. Any parameters that you provide when you deploy a custom time-series model are ignored.
  • Under Model tasks:
    • If you don't select any tasks:
      • If the model does not include a chat template, the default task is text generation.
      • If the model includes a chat template, the default tasks are: text generation and text chat.
    • The tasks that you select when deploying your model override the tasks that you set at the model creation stage.

Testing the deployment

Follow these steps to test your custom foundation model deployment:

  1. In your deployment space or your project, open the Deployments tab and click the deployment name.

  2. Click the Test tab to input prompt text and get a response from the deployed asset.

  3. Enter test data in one of the following formats, depending on the type of asset that you deployed:

    • Text: Enter text input data to generate a block of text as output.
    • Stream: Enter text input data to generate a stream of text as output.
    • JSON: Enter JSON input data to generate output in JSON format.

    (Screenshot: entering test data for a custom foundation model)

  4. Click Generate to get results that are based on your prompt.

Retrieving the endpoint for custom foundation model deployments

Follow these steps to retrieve the endpoint URL for your custom foundation model deployment. You need this URL to access the deployment from your applications:

  1. In your deployment space or your project, open the Deployments tab and click the deployment name.
  2. In the API Reference tab, find the private and public endpoint links and code snippets that you can use to include the endpoint details in an application.
Note:

If you specified a serving name when you created your online deployment, you see two endpoint URLs. The first URL contains the deployment ID, and the second URL contains your serving name. You can use either URL with your deployment.
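As an illustration, the two URL forms differ only in the path segment that identifies the deployment. The hostname, IDs, and inference path below are placeholders; copy the exact URLs from the API Reference tab:

```shell
# All values below are hypothetical placeholders.
host="us-south.ml.cloud.ibm.com"
deployment_id="f81b2f29-6ec3-4b9a-9a5d-0f1f2c3d4e5f"
serving_name="granite_speech_v1"

# Both URLs resolve to the same deployment.
echo "https://${host}/ml/v1/deployments/${deployment_id}/text/generation?version=2024-01-29"
echo "https://${host}/ml/v1/deployments/${serving_name}/text/generation?version=2024-01-29"
```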

Creating a deployment programmatically

To use the watsonx.ai API, you need a bearer token. For more information, see Credentials for programmatic access.

Note:
  • You can override the default values of your custom foundation model parameters in the online.parameters.foundation_model field. For information on available model parameters, see Global parameters for custom foundation models.
  • If you use the watsonx-cfm-caikit-1.1 software specification to deploy your model, the max_concurrent_requests parameter is not used.
  • Use the serving_name parameter to specify a name for your deployment to use instead of the deployment ID.
  • The serving name must be unique within the namespace.
  • The serving name can contain only lowercase letters, numbers, and underscores ([a-z0-9_]) and can be at most 36 characters long.
  • In workflows where your custom foundation model is used periodically, consider assigning your model the same serving name each time you deploy it. This way, after you delete and then re-deploy the model, you can keep using the same endpoint in your code.
  • Time-series models do not take any parameters. Any parameters that you provide when you deploy a custom time-series model are ignored.

To deploy a custom foundation model programmatically:

  1. Initiate model deployment. See this code for an example deployment to a space:

    curl -X POST "https://<your cloud hostname>/ml/v4/deployments?version=2024-01-29" \
    -H "Authorization: Bearer $TOKEN" \
    -H "content-type: application/json" \
    --data '{
      "asset": {
        "id": "b89ff4c6-9264-4e87-9b04-c98e973f9f67"
      },
      "hardware_spec": {
        "name": "1l40s-48g",
        "num_nodes": 1
      },
      "description": "Description",
      "name": "granite-speech",
      "space_id": "<space id>",
      "online": {
        "parameters": {
          "foundation_model": {
            "tool_call_parser": "granite",
            "chat_template": "granite-3.3.j2",
            "functions": [
                "text_generation",
                "text_chat",
                "image_chat"
            ]
          }
        }
      }
    }'
    

    For hardware_spec use:

    • General-purpose models: the name of the GPU-based configuration that you want to assign to your model
    • Time-series models: the size of the CPU-based configuration that you want to assign to your model

    For project deployments, use project_id instead of space_id.

    For models that use the chat API, you can use the chat_template field to provide the name of a template that overrides the model's standard chat template.

    For functions:

    • If you don't specify any functions:
      • If the model does not include a chat template, the default function is text_generation.
      • If the model includes a chat template, the default functions are text_generation and text_chat.
    • The functions that you specify when deploying your model override the functions that you set at the model creation stage.

    The deployment ID is returned in the API response, in the metadata.id field.
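    For example, the ID can be extracted from the response body with any JSON-aware tool. The sketch below uses python3 on an abbreviated, fabricated response:

```shell
# Abbreviated, fabricated response body from the deployment POST request.
response='{"metadata":{"id":"b3f0b6f2-0000-1111-2222-333344445555","name":"granite-speech"}}'

# Pull metadata.id out of the JSON.
deployment_id=$(printf '%s' "$response" \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["metadata"]["id"])')
echo "$deployment_id"
```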

  2. Use the deployment ID to poll for the deployment status. See this code for an example of how to poll for the status of a model that is deployed to a project:

    curl -X GET "https://<your cloud hostname>/ml/v4/deployments/<your deployment ID>?version=2024-01-29&project_id=<your project ID>" \
    -H "Authorization: Bearer $TOKEN"
    

    The deployed_asset_type is returned as custom_foundation_model. Wait until the status changes from initializing to ready.
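    The wait can be scripted as a simple polling loop. In this sketch the sequence of states is simulated with a fixed list; in a real script, each iteration would run the GET request above, parse entity.status.state from the response, and sleep between attempts:

```shell
# Simulated states standing in for repeated GET /ml/v4/deployments/<id> calls.
for state in initializing initializing ready; do
  echo "state: $state"
  case "$state" in
    ready)  echo "deployment is ready"; break ;;
    failed) echo "deployment failed";  break ;;
    *)      ;;  # still initializing: a real script would sleep here and retry
  esac
done
```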

Next steps

Prompting a custom foundation model