Deploying tuned models programmatically

You can programmatically deploy tuned foundation models in watsonx.ai that are customized for your use case.

You can deploy a foundation model that is tuned with any of the following techniques:

Full fine tuning
Low-rank adaptation (LoRA) fine tuning
Quantized low-rank adaptation (QLoRA) fine tuning

Required permissions: To deploy tuned models, you must have the Admin or Editor role in a project.
Required credentials: You must generate credentials to authenticate with watsonx.ai APIs. For details, see Generating a bearer token.
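If your cluster runs on Cloud Pak for Data, the following shell function is a minimal sketch of one way to obtain the bearer token that the requests on this page read from ${TOKEN}. It assumes the Cloud Pak for Data authorization endpoint POST /icp4d-api/v1/authorize, which accepts a username and an API key. The function name get_bearer_token is hypothetical. The sed-based extraction of the token field is a simplification; jq is more robust if it is installed.

```shell
#!/usr/bin/env bash
# Minimal sketch: request a bearer token for the watsonx.ai REST APIs.
# Assumptions: the cluster exposes the Cloud Pak for Data endpoint
# POST /icp4d-api/v1/authorize, you log in with a username and an API key,
# and neither value contains double quotes. get_bearer_token is a
# hypothetical helper name.
get_bearer_token() {
  local cpd_url="$1" username="$2" api_key="$3" response token
  response=$(curl -sS -X POST "${cpd_url}/icp4d-api/v1/authorize" \
    -H "content-type: application/json" \
    --data "{\"username\": \"${username}\", \"api_key\": \"${api_key}\"}") || return 1
  # Extract the value of the "token" field from the JSON response.
  token=$(printf '%s' "$response" | tr -d '\n' |
    sed -n 's/.*"token" *: *"\([^"]*\)".*/\1/p')
  if [ -z "$token" ]; then
    echo "no token in response: ${response}" >&2
    return 1
  fi
  printf '%s\n' "$token"
}

# Usage (replace the placeholders):
# TOKEN=$(get_bearer_token "https://cpd-<namespace-name>.apps.<OCP-domain>" "<username>" "<api_key>")
```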

Ways to develop

You can deploy base or custom foundation models tuned with various techniques by using these programming methods:

REST API
Python

To set up access to use the watsonx.ai deployment APIs, see the Developer resources.

Starting in version 2.3.1, the text generation API is deprecated and will be removed in a future release.

Alternatively, you can deploy tuned base or custom foundation models from the UI. For details, see Deploying tuned models from the UI.

REST API

The high-level steps are mostly the same for every tuning technique. They are also mostly the same whether you deploy an IBM-provided base foundation model or a custom foundation model.

The key differences are the values that you include in the REST API request body when you deploy the foundation model asset. For LoRA and QLoRA tuned models, you must also deploy an adapter model asset.

For API method details, see the Trainings and Deployments methods in the watsonx.ai API reference documentation.

Optional: If the auto_update_model option was not enabled when you tuned the model, no asset was created automatically for the tuned model in the repository service. In that case, you must create the tuned foundation model assets yourself.

Create the base foundation model asset by providing the tuned model details in the REST API request. The following example request creates an asset for a base model that is tuned with any of the supported techniques:

curl -X POST "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v4/models?version=2025-12-15" \
-H "Authorization: Bearer ${TOKEN}" \
-H "content-type: application/json" \
--data '{
  "name":"base foundation model asset",
  "space_id":"<space_id>",
  "foundation_model":{
    "model_id":"ibm/granite-3-1-8b-base"
  },
  "type":"base_foundation_model_1.0",
  "software_spec":{
    "name":"watsonx-cfm-caikit-1.1"
  }
}'
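The request above passes the body in single quotes, so the shell does not expand variables such as ${SPACE_ID} inside it. A minimal sketch that avoids this builds the body with printf, as in the hypothetical helper below. The helper reproduces the fields of the example request, and its name and argument order are assumptions. The usage comment also shows how you might capture the new asset ID from the metadata.id field of the response, assuming jq is installed.

```shell
#!/usr/bin/env bash
# Minimal sketch: build the base foundation model asset request body from
# shell variables. base_model_asset_payload is a hypothetical helper; values
# must not contain double quotes or backslashes.
# Arguments: space_id model_id [asset_name]
base_model_asset_payload() {
  local space_id="$1" model_id="$2" name="${3:-base foundation model asset}"
  printf '{"name": "%s", "space_id": "%s", "foundation_model": {"model_id": "%s"}, "type": "base_foundation_model_1.0", "software_spec": {"name": "watsonx-cfm-caikit-1.1"}}\n' \
    "$name" "$space_id" "$model_id"
}

# Usage: the double quotes around $(...) let the shell substitute the values.
# ASSET_ID=$(curl -sS -X POST "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v4/models?version=2025-12-15" \
#   -H "Authorization: Bearer ${TOKEN}" \
#   -H "content-type: application/json" \
#   --data "$(base_model_asset_payload "$SPACE_ID" "ibm/granite-3-1-8b-base")" | jq -r '.metadata.id')
```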

LoRA or QLoRA fine tuning only: Create the adapter model asset.

After the LoRA or QLoRA fine-tuning job completes, the adapter content is stored in a persistent volume claim (PVC), in a directory that is named after the model. In the request body that creates the adapter model asset, set foundation_model.model_id to the PVC name.

The following code sample shows how to create the LoRA or QLoRA adapter model asset:

curl -X POST "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v4/models?version=2025-12-15" \
-H "Authorization: Bearer ${TOKEN}" \
-H "content-type: application/json" \
--data '{
  "name":"lora adapter model asset",
  "space_id":"<space_id>",
  "foundation_model":{
    "model_id":"finetunedadapter-385316ea-1d25-4274-9550-e387a1355241"
  },
  "type":"lora_adapter_1.0",
  "software_spec":{
    "name":"watsonx-cfm-caikit-1.1"
  },
  "training":{
    "base_model":{
      "model_id":"ibm/granite-3-1-8b-base"
    },
    "task_id":"summarization",
    "fine_tuning":{
      "peft_parameters":{
        "type":"lora",
        "rank":8
      }
    }
  }
}'

In this example, the value of foundation_model.model_id is the PVC name.

Deploy the tuned foundation model assets.

Full fine tuning

The following sample REST API request deploys a full fine-tuned foundation model asset:

curl -X POST "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v4/deployments?version=2025-12-15" \
-H "Authorization: Bearer ${TOKEN}" \
-H "content-type: application/json" \
--data '{
  "asset":{
    "id": "<asset_id>"
  },
  "online":{
    "parameters":{
      "serving_name": "basefmservingname",
      "foundation_model": {
        "max_batch_weight": 10000,
        "max_sequence_length": 8192
      }
    }
  },
  "hardware_spec": {
    "id": "<hardware_spec_id>",
    "num_nodes": 1
  },
  "description": "Testing deployment using base foundation model",
  "name": "full_fine_tune_fm_deployment",
  "space_id": "<space_id>"
}'

Replace <asset_id> with the ID of the base foundation model asset. In hardware_spec, specify either id or name, but not both. Specify either project_id or space_id, but not both.

LoRA or QLoRA fine tuning

First, deploy the base foundation model asset. To create an online deployment that can serve LoRA or QLoRA fine-tuned models, set the enable_lora parameter to true in the REST API request body. The LoRA or QLoRA adapters are then deployed on top of this base foundation model deployment.

The following REST API request shows how to create an online deployment for the base foundation model asset:

curl -X POST "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v4/deployments?version=2025-12-15" \
-H "Authorization: Bearer ${TOKEN}" \
-H "content-type: application/json" \
--data '{
  "asset":{
    "id": "<asset_id>"
  },
  "online":{
    "parameters":{
      "serving_name": "basefmservingname",
      "foundation_model": {
        "max_batch_weight": 10000,
        "max_sequence_length": 8192,
        "enable_lora": true,
        "max_gpu_loras": 8,
        "max_cpu_loras": 16,
        "max_lora_rank": 32
      }
    }
  },
  "hardware_spec": {
    "id": "<hardware_spec_id>",
    "num_nodes": 1
  },
  "description": "Testing deployment using base foundation model",
  "name": "base_fm_deployment",
  "space_id": "<space_id>"
}'

Replace <asset_id> with the ID of the base foundation model asset. In hardware_spec, specify either id or name, but not both. Specify either project_id or space_id, but not both.
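The adapter asset example earlier on this page uses rank 8, and this base deployment sets max_lora_rank to 32. The hypothetical function below is a pre-flight check that can catch a rank mismatch before you deploy an adapter. It assumes that max_lora_rank behaves like the vLLM option of the same name, which caps the rank of adapters that a deployment can load. This is a sketch, and the validation that watsonx.ai actually performs might differ.

```shell
#!/usr/bin/env bash
# Minimal sketch: check an adapter's rank against the base deployment's
# max_lora_rank before deploying the adapter.
# Assumption: max_lora_rank is an upper bound on adapter rank (as in vLLM).
# check_lora_rank is a hypothetical helper.
# Returns 0 if the adapter fits, 1 if its rank is too large, 2 on bad input.
check_lora_rank() {
  local adapter_rank="$1" max_lora_rank="$2" r
  for r in "$adapter_rank" "$max_lora_rank"; do
    case "$r" in
      ''|*[!0-9]*) echo "rank must be a non-negative integer: '${r}'" >&2; return 2 ;;
    esac
  done
  if [ "$adapter_rank" -gt "$max_lora_rank" ]; then
    echo "adapter rank ${adapter_rank} exceeds max_lora_rank ${max_lora_rank}" >&2
    return 1
  fi
  return 0
}

# Usage with the values from the examples on this page:
# check_lora_rank 8 32 && echo "adapter rank fits"
```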

Use the ID of the base foundation model deployment to deploy the LoRA or QLoRA adapters as an additional layer on top of the base foundation model. The following sample REST API request creates a deployment for the adapter model asset:

curl -X POST "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v4/deployments?version=2025-12-15" \
-H "Authorization: Bearer ${TOKEN}" \
-H "content-type: application/json" \
--data '{
  "asset":{
    "id": "<asset_id>"
  },
  "online":{
    "parameters":{
      "serving_name": "lora_adapter_dep"
    }
  },
  "base_deployment_id": "<your base foundation model deployment ID>",
  "description": "Testing deployment using lora adapter model",
  "name": "lora_adapter_deployment",
  "space_id": "<space_id>"
}'

Replace <asset_id> with the ID of the LoRA or QLoRA adapter model asset. Specify either project_id or space_id, but not both.
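The request body for an adapter deployment must reference the base deployment and must contain either project_id or space_id, but not both. For that reason, you might want to generate the body from variables and fail early on a bad combination. The helper below is a hypothetical sketch that reproduces the fields of the request above. It assumes that the values contain no double quotes or backslashes. The checks it performs are this sketch's own, not the API's.

```shell
#!/usr/bin/env bash
# Minimal sketch: build the adapter deployment request body from variables
# and reject bodies that set both or neither of project_id and space_id.
# adapter_deployment_payload is a hypothetical helper; values must not
# contain double quotes or backslashes.
# Arguments: asset_id base_deployment_id serving_name project_id space_id
# (pass "" for the container ID that you do not use).
adapter_deployment_payload() {
  local asset_id="$1" base_deployment_id="$2" serving_name="$3"
  local project_id="$4" space_id="$5" scope
  if [ -n "$project_id" ] && [ -n "$space_id" ]; then
    echo "specify either project_id or space_id, not both" >&2
    return 1
  elif [ -n "$space_id" ]; then
    scope="\"space_id\": \"${space_id}\""
  elif [ -n "$project_id" ]; then
    scope="\"project_id\": \"${project_id}\""
  else
    echo "specify project_id or space_id" >&2
    return 1
  fi
  if [ -z "$base_deployment_id" ]; then
    echo "base_deployment_id is required" >&2
    return 1
  fi
  printf '{"asset": {"id": "%s"}, "online": {"parameters": {"serving_name": "%s"}}, "base_deployment_id": "%s", "description": "Testing deployment using lora adapter model", "name": "lora_adapter_deployment", %s}\n' \
    "$asset_id" "$serving_name" "$base_deployment_id" "$scope"
}

# Usage: pass the output as the --data value of the adapter deployment request.
# --data "$(adapter_deployment_payload "$ADAPTER_ASSET_ID" "$BASE_DEPLOYMENT_ID" lora_adapter_dep "" "$SPACE_ID")"
```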

The API response contains the deployment ID in the metadata.id field.

Poll for the deployment status by using the deployment ID and wait until the state changes from initializing to ready.
```
curl -X GET "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v4/deployments/<deployment_id>?version=2025-12-15&space_id=<space_id>" \
-H "Authorization: Bearer ${TOKEN}"
```
After the deployment is created successfully, the status response reports deployed_asset_type as base_foundation_model:
```
"deployed_asset_type": "base_foundation_model"
```
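Instead of rerunning the GET request by hand, you can poll in a loop. The following function is a minimal sketch. It assumes that the response reports the deployment state in a "state" field, that a failed deployment reports failed, and that TOKEN holds a valid bearer token. It also introduces the hypothetical variables POLL_INTERVAL (seconds between requests) and POLL_ATTEMPTS to bound the wait.

```shell
#!/usr/bin/env bash
# Minimal sketch: poll a deployment until its state is "ready".
# Assumptions: the GET response reports the state in a "state" field,
# a failed deployment reports "failed", and TOKEN holds a bearer token.
# wait_for_deployment, POLL_INTERVAL, and POLL_ATTEMPTS are hypothetical names.
wait_for_deployment() {
  local cpd_url="$1" deployment_id="$2" space_id="$3"
  local interval="${POLL_INTERVAL:-10}" attempts="${POLL_ATTEMPTS:-60}"
  local i=1 response state
  while [ "$i" -le "$attempts" ]; do
    response=$(curl -sS -X GET \
      "${cpd_url}/ml/v4/deployments/${deployment_id}?version=2025-12-15&space_id=${space_id}" \
      -H "Authorization: Bearer ${TOKEN}") || return 1
    # Extract the value of the "state" field from the JSON response.
    state=$(printf '%s' "$response" | tr -d '\n' |
      sed -n 's/.*"state" *: *"\([^"]*\)".*/\1/p')
    echo "attempt ${i}: state=${state:-unknown}" >&2
    case "$state" in
      ready) return 0 ;;
      failed) echo "deployment failed: ${response}" >&2; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for deployment ${deployment_id}" >&2
  return 1
}

# Usage:
# wait_for_deployment "https://cpd-<namespace-name>.apps.<OCP-domain>" "<deployment_id>" "<space_id>"
```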

To test your tuned model, send an inference request to the deployment by using the watsonx.ai text generation API:

curl -X POST "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v1/deployments/<deployment_id>/text/generation?version=2025-12-15" \
-H "Authorization: Bearer ${TOKEN}" \
-H "content-type: application/json" \
--data '{
  "input": "What is the boiling point of water?",
  "parameters": {
    "max_new_tokens": 200,
    "min_new_tokens": 20
  }
}'
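The example above hardcodes the prompt in a single-quoted body. To send an arbitrary prompt from a shell variable, you must escape it so that the body stays valid JSON. The following functions are a hypothetical sketch. json_escape handles only backslashes, double quotes, and newlines; it does not escape other control characters such as tabs. generate_text reuses the parameters from the example request.

```shell
#!/usr/bin/env bash
# Minimal sketch: send a prompt to a deployment through the text generation API.
# json_escape and generate_text are hypothetical helpers. json_escape escapes
# backslashes and double quotes and joins lines with \n; it does not escape
# other control characters such as tabs.
json_escape() {
  printf '%s' "$1" |
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
    awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

generate_text() {
  local cpd_url="$1" deployment_id="$2" prompt
  prompt=$(json_escape "$3")
  curl -sS -X POST \
    "${cpd_url}/ml/v1/deployments/${deployment_id}/text/generation?version=2025-12-15" \
    -H "Authorization: Bearer ${TOKEN}" \
    -H "content-type: application/json" \
    --data "{\"input\": \"${prompt}\", \"parameters\": {\"max_new_tokens\": 200, \"min_new_tokens\": 20}}"
}

# Usage:
# generate_text "https://cpd-<namespace-name>.apps.<OCP-domain>" "<deployment_id>" "What is the boiling point of water?"
```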

Python

You can deploy fine-tuned foundation models in IBM watsonx.ai programmatically by using the Python library. For details, see the Deployments.create() method in the Python library documentation.

To get started, see the following sample notebooks:

Full fine tuning: Use watsonx, and meta-llama-3-1-8b to Fine Tune with online banking queries annotated.
LoRA fine tuning: Use watsonx, and meta-llama-3-1-8b to Fine Tune with LoRA on online banking queries annotated.