Deploying tuned models programmatically
You can programmatically deploy tuned foundation models in watsonx.ai that are customized for your use case.
You can deploy a foundation model that is tuned with any of the following techniques:
- Full fine tuning
- Low-rank adaptation (LoRA) fine tuning
- Quantized low-rank adaptation (QLoRA) fine tuning
Required permissions: To deploy tuned models, you must have the Admin or Editor role in a project.
Required credentials: You must generate credentials to authenticate with watsonx.ai APIs. For details, see Generating a bearer token.
Ways to develop
You can deploy tuned base or custom foundation models by using these programming methods:
- REST API
- Python
To set up access to use the watsonx.ai deployment APIs, see the Developer resources.
Starting in 2.3.1, the text generation API is deprecated and will be removed in the future.
Alternatively, you can deploy tuned base or custom foundation models from the UI. For details, see Deploying tuned models from the UI.
REST API
The high-level steps are mostly the same regardless of the tuning technique and regardless of whether you deploy an IBM-provided base foundation model or a custom foundation model.
The key differences are the values that you include in the REST API request body when you deploy the foundation model asset, and the additional adapter model asset that you must deploy for LoRA and QLoRA tuned models.
For API method details, see the Trainings and Deployments methods in the watsonx.ai API reference documentation.
- Optional: If the auto_update_model option was not enabled when you tuned the model, an asset was not created automatically for the tuned model in the repository service. You must create the tuned foundation model assets.
  - Create the base foundation model asset by providing the tuned model details in the REST API request. Use the following REST API request example to create an asset for a base model that is tuned with any tuning method:
curl -X POST "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v4/models?version=2025-12-15" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "content-type: application/json" \
  --data '{
    "name": "base foundation model asset",
    "space_id": "<space_id>",
    "foundation_model": {
      "model_id": "ibm/granite-3-1-8b-base"
    },
    "type": "base_foundation_model_1.0",
    "software_spec": {
      "name": "watsonx-cfm-caikit-1.1"
    }
  }'
  - LoRA or QLoRA fine tuning only: Create the adapter model asset.
    After the training process for fine tuning the LoRA or QLoRA adapter completes, the content of the LoRA or QLoRA adapter is stored in the persistent volume claim (PVC). The model content is written to a directory named after the model in the PVC. You must provide the PVC name as the foundation_model.model_id value in the model creation input payload. The following code sample shows how to create the LoRA or QLoRA adapter model asset:
curl -X POST "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v4/models?version=2025-12-15" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "content-type: application/json" \
  --data '{
    "name": "lora adapter model asset",
    "space_id": "<space_id>",
    "foundation_model": {
      "model_id": "finetunedadapter-385316ea-1d25-4274-9550-e387a1355241"
    },
    "type": "lora_adapter_1.0",
    "software_spec": {
      "name": "watsonx-cfm-caikit-1.1"
    },
    "training": {
      "base_model": {
        "model_id": "ibm/granite-3-1-8b-base"
      },
      "task_id": "summarization",
      "fine_tuning": {
        "peft_parameters": {
          "type": "lora",
          "rank": 8
        }
      }
    }
  }'
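The two asset-creation request bodies above differ only in a few fields. The following Python sketch builds both bodies with the standard library so that you can inspect them before you send them. The function names are hypothetical, the field names and values are copied from the curl examples, and the host placeholder must be replaced with your cluster values. The peft_type parameter defaults to the "lora" value from the example; check the API reference for the value to use with QLoRA.

```python
import json
import urllib.request

# Placeholder host and API version copied from the REST examples; replace the host.
CPD_URL = "https://cpd-<namespace-name>.apps.<OCP-domain>"
API_VERSION = "2025-12-15"
SOFTWARE_SPEC = {"name": "watsonx-cfm-caikit-1.1"}


def base_model_asset_payload(name, space_id, model_id):
    """Body for POST /ml/v4/models that creates a base foundation model asset."""
    return {
        "name": name,
        "space_id": space_id,
        "foundation_model": {"model_id": model_id},
        "type": "base_foundation_model_1.0",
        "software_spec": dict(SOFTWARE_SPEC),
    }


def lora_adapter_asset_payload(name, space_id, pvc_name, base_model_id,
                               task_id, peft_type="lora", rank=8):
    """Body for POST /ml/v4/models that creates a LoRA or QLoRA adapter asset.

    The PVC that holds the trained adapter goes in foundation_model.model_id.
    """
    return {
        "name": name,
        "space_id": space_id,
        "foundation_model": {"model_id": pvc_name},
        "type": "lora_adapter_1.0",
        "software_spec": dict(SOFTWARE_SPEC),
        "training": {
            "base_model": {"model_id": base_model_id},
            "task_id": task_id,
            "fine_tuning": {"peft_parameters": {"type": peft_type, "rank": rank}},
        },
    }


def build_create_model_request(payload, token):
    """Prepare, but do not send, the POST /ml/v4/models request."""
    return urllib.request.Request(
        f"{CPD_URL}/ml/v4/models?version={API_VERSION}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

To send a prepared request, pass it to urllib.request.urlopen(). Keeping payload construction separate from the network call makes it easy to log or diff the exact body that you submit.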
- Deploy the tuned foundation model assets.
  - Full fine tuning
    The following sample REST API request deploys a full fine-tuned foundation model asset. Set asset.id to the ID of the base foundation model asset. In hardware_spec, specify either id or name, but not both. Specify either project_id or space_id, but not both.
curl -X POST "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v4/deployments?version=2025-12-15" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "content-type: application/json" \
  --data '{
    "asset": {
      "id": "<asset_id>"
    },
    "online": {
      "parameters": {
        "serving_name": "basefmservingname",
        "foundation_model": {
          "max_batch_weight": 10000,
          "max_sequence_length": 8192
        }
      }
    },
    "hardware_spec": {
      "id": "<hardware_spec_id>",
      "num_nodes": 1
    },
    "description": "Testing deployment using base foundation model",
    "name": "full_fine_tune_fm_deployment",
    "space_id": "<space_id>"
  }'
  - LoRA or QLoRA fine tuning
    - Deploy the base foundation model asset. To create an online deployment for LoRA or QLoRA fine-tuned models, you must set the enable_lora parameter to true in the REST API request body so that you can deploy the LoRA or QLoRA adapters on top of the base foundation model. Set asset.id to the ID of the base foundation model asset. In hardware_spec, specify either id or name, but not both. Specify either project_id or space_id, but not both. The following REST API request shows how to create an online deployment for the base foundation model asset:
curl -X POST "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v4/deployments?version=2025-12-15" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "content-type: application/json" \
  --data '{
    "asset": {
      "id": "<asset_id>"
    },
    "online": {
      "parameters": {
        "serving_name": "basefmservingname",
        "foundation_model": {
          "max_batch_weight": 10000,
          "max_sequence_length": 8192,
          "enable_lora": true,
          "max_gpu_loras": 8,
          "max_cpu_loras": 16,
          "max_lora_rank": 32
        }
      }
    },
    "hardware_spec": {
      "id": "<hardware_spec_id>",
      "num_nodes": 1
    },
    "description": "Testing deployment using base foundation model",
    "name": "base_fm_deployment",
    "space_id": "<space_id>"
  }'
    - Use the deployment ID of the base foundation model deployment to deploy the LoRA or QLoRA adapters as an additional layer on top of the base foundation model. Set asset.id to the ID of the LoRA adapter model asset, and specify either project_id or space_id, but not both. The following REST API request sample shows how to create a deployment for the adapter model asset:
curl -X POST "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v4/deployments?version=2025-12-15" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "content-type: application/json" \
  --data '{
    "asset": {
      "id": "<asset_id>"
    },
    "online": {
      "parameters": {
        "serving_name": "lora_adapter_dep"
      }
    },
    "base_deployment_id": "<your base foundation model deployment ID>",
    "description": "Testing deployment using lora adapter model",
    "name": "lora_adapter_deployment",
    "space_id": "<space_id>"
  }'
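The three deployment request bodies above share most of their structure, and each carries an either/or rule: hardware_spec takes id or name, and the deployment targets project_id or space_id. The following Python sketch builds the bodies and enforces those rules before anything is sent. The helper names are hypothetical; the field names and default values are copied from the curl examples, so adjust the LoRA limits for your adapters.

```python
def _scope(space_id=None, project_id=None):
    """Return exactly one of space_id or project_id, as the examples require."""
    if (space_id is None) == (project_id is None):
        raise ValueError("Specify either space_id or project_id, but not both.")
    if space_id is not None:
        return {"space_id": space_id}
    return {"project_id": project_id}


def _hardware_spec(hw_id=None, hw_name=None, num_nodes=1):
    """Return a hardware_spec that uses either id or name, never both."""
    if (hw_id is None) == (hw_name is None):
        raise ValueError("Specify either a hardware spec id or name, but not both.")
    key, value = ("id", hw_id) if hw_id is not None else ("name", hw_name)
    return {key: value, "num_nodes": num_nodes}


def base_deployment_payload(asset_id, name, serving_name, *, enable_lora=False,
                            hw_id=None, hw_name=None, space_id=None,
                            project_id=None, description="",
                            max_batch_weight=10000, max_sequence_length=8192):
    """Body for deploying a full fine-tuned model, or a base model for LoRA/QLoRA."""
    fm = {"max_batch_weight": max_batch_weight,
          "max_sequence_length": max_sequence_length}
    if enable_lora:
        # Limits copied from the LoRA/QLoRA example request.
        fm.update({"enable_lora": True, "max_gpu_loras": 8,
                   "max_cpu_loras": 16, "max_lora_rank": 32})
    payload = {
        "asset": {"id": asset_id},
        "online": {"parameters": {"serving_name": serving_name,
                                  "foundation_model": fm}},
        "hardware_spec": _hardware_spec(hw_id, hw_name),
        "description": description,
        "name": name,
    }
    payload.update(_scope(space_id, project_id))
    return payload


def adapter_deployment_payload(adapter_asset_id, base_deployment_id, name,
                               serving_name, *, space_id=None, project_id=None,
                               description=""):
    """Body for deploying a LoRA/QLoRA adapter on an existing base deployment."""
    payload = {
        "asset": {"id": adapter_asset_id},
        "online": {"parameters": {"serving_name": serving_name}},
        "base_deployment_id": base_deployment_id,
        "description": description,
        "name": name,
    }
    payload.update(_scope(space_id, project_id))
    return payload
```

For a LoRA or QLoRA model, you would first POST the body from base_deployment_payload(..., enable_lora=True), read the new deployment ID from the response, and then pass that ID as base_deployment_id to adapter_deployment_payload.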
- The API response contains the deployment ID in the metadata.id field.
- Poll for the deployment status by using the deployment ID and wait until the state changes from initializing to ready.
curl -X GET "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v4/deployments/<deployment_id>?version=2025-12-15&space_id=<space_id>" \
  -H "Authorization: Bearer ${TOKEN}"
After the deployment is created successfully, the polling status returns the deployed_asset_type as base_foundation_model:
"deployed_asset_type": "base_foundation_model"
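Reading the deployment ID from metadata.id and then polling can be automated. The following Python sketch uses only the standard library. It assumes that the deployment state is reported at entity.status.state and that "failed" is the terminal error state; neither path nor value is stated in this topic, so confirm them against the API reference. The helper names are hypothetical, and the status callable is injected so that the polling logic can be exercised without a cluster.

```python
import json
import time
import urllib.request

CPD_URL = "https://cpd-<namespace-name>.apps.<OCP-domain>"  # placeholder host
API_VERSION = "2025-12-15"


def deployment_id_from_response(body):
    """The create-deployment response carries the deployment ID in metadata.id."""
    return body["metadata"]["id"]


def fetch_deployment(deployment_id, space_id, token):
    """GET /ml/v4/deployments/<deployment_id> and return the parsed JSON body."""
    url = (f"{CPD_URL}/ml/v4/deployments/{deployment_id}"
           f"?version={API_VERSION}&space_id={space_id}")
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_until_ready(get_status, timeout_s=1800, interval_s=30, sleep=time.sleep):
    """Poll get_status() until the state is 'ready'.

    ASSUMPTION: the state lives at entity.status.state and 'failed' is terminal.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        body = get_status()
        state = body["entity"]["status"]["state"]
        if state == "ready":
            return body
        if state == "failed":
            raise RuntimeError("Deployment failed.")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Deployment still '{state}' after {timeout_s} s.")
        sleep(interval_s)
```

A typical call is wait_until_ready(lambda: fetch_deployment(dep_id, space_id, token)), where dep_id comes from deployment_id_from_response() applied to the create-deployment response.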
- Run inference against the deployment to test your tuned model by using the watsonx.ai text generation API:
curl -X POST "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v1/deployments/<deployment_id>/text/generation?version=2025-12-15" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "content-type: application/json" \
  --data '{
    "input": "What is the boiling point of water?",
    "parameters": {
      "max_new_tokens": 200,
      "min_new_tokens": 20
    }
  }'
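The same test request can be prepared from Python with the standard library. This is a sketch only: the text generation API is deprecated starting in 2.3.1, the helper names are hypothetical, and the response field read by generated_text() (results[0].generated_text) is an assumption that you should verify against the API reference.

```python
import json
import urllib.request

CPD_URL = "https://cpd-<namespace-name>.apps.<OCP-domain>"  # placeholder host
API_VERSION = "2025-12-15"


def build_generation_request(deployment_id, token, prompt,
                             max_new_tokens=200, min_new_tokens=20):
    """Prepare POST /ml/v1/deployments/<id>/text/generation with the example body."""
    if min_new_tokens > max_new_tokens:
        raise ValueError("min_new_tokens must not exceed max_new_tokens.")
    body = {"input": prompt,
            "parameters": {"max_new_tokens": max_new_tokens,
                           "min_new_tokens": min_new_tokens}}
    return urllib.request.Request(
        f"{CPD_URL}/ml/v1/deployments/{deployment_id}/text/generation"
        f"?version={API_VERSION}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def generated_text(response_body):
    """ASSUMPTION: the generated text is at results[0].generated_text."""
    return response_body["results"][0]["generated_text"]


# Usage (requires a reachable cluster and a valid token):
# req = build_generation_request("<deployment_id>", token,
#                                "What is the boiling point of water?")
# with urllib.request.urlopen(req) as resp:
#     print(generated_text(json.load(resp)))
```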
Python
You can deploy fine-tuned foundation models in IBM watsonx.ai programmatically by using the Python library. For details, see the Deployments.create() method.
To get started, see the following sample notebooks: