Tuning a foundation model programmatically
You can programmatically tune a set of foundation models in watsonx.ai to customize the models for your use case.
You can programmatically fine tune a foundation model by using the following techniques:
- Full fine tuning
- Low-rank adaptation (LoRA) fine tuning
- Quantized low-rank adaptation (QLoRA) fine tuning
You can use any fine tuning method to tune custom foundation models.
- Required permissions: To run tuning experiments, you must have the Admin or Editor role in a project.
- Required credentials: You must generate credentials to authenticate with watsonx.ai APIs. For details, see Generating a bearer token.
- Data format:
  - Tabular: JSON, JSONL
  - Tabular data from supported data connections. For details, see Data formats.
  Note: You can use the same training data with one or more tuning experiments.
- Data size: 50 to 10,000 input and output example pairs. The maximum file size is 200 MB.
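The size limits above can be checked locally before you upload a file. The following is a minimal sketch, not part of the watsonx.ai API. It assumes a JSONL file in which each line is an object with input and output keys; those key names, the function name check_training_file, and reading 200 MB as 200 x 1024 x 1024 bytes are all assumptions, so confirm the exact schema in Data formats.

```python
import json
import os

MIN_EXAMPLES = 50
MAX_EXAMPLES = 10_000
MAX_FILE_BYTES = 200 * 1024 * 1024  # assumption: 200 MB interpreted as MiB


def check_training_file(path):
    """Return a list of problems found in a JSONL training file (empty if none)."""
    problems = []
    if os.path.getsize(path) > MAX_FILE_BYTES:
        problems.append("file is larger than 200 MB")

    count = 0
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                problems.append(f"line {line_number}: not valid JSON")
                continue
            # Assumed schema: every example is an object with "input" and "output".
            if not isinstance(record, dict) or not record.keys() >= {"input", "output"}:
                problems.append(f"line {line_number}: missing input or output")
                continue
            count += 1

    if not MIN_EXAMPLES <= count <= MAX_EXAMPLES:
        problems.append(
            f"{count} valid examples; expected {MIN_EXAMPLES} to {MAX_EXAMPLES}"
        )
    return problems
```

An empty list means that the file passed these local checks only; the service might apply further validation.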
Ways to develop
You can tune foundation models by using these programming methods:
- REST API
- Python
- Node.js
Alternatively, you can use graphical tools from the watsonx.ai UI to tune foundation models. See Tuning Studio.
Supported foundation models
To get a list of foundation models that support full fine tuning, use the following API request:
curl -X GET \
'https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v1/foundation_model_specs?version=2025-02-20&filters=function_fine_tune_trainable'
To get a list of foundation models that support LoRA or QLoRA fine tuning, use the following API request:
curl -X GET \
'https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v1/foundation_model_specs?version=2025-02-20&filters=function_lora_fine_tune_trainable'
See Choosing a foundation model to tune.
You can use LoRA only with non-quantized models and QLoRA only with quantized models.
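The two list requests above differ only in the filters query parameter. As a minimal sketch, the URL can be built from the tuning technique; the function name model_specs_url and the FILTERS mapping are illustrative, and the filter values and version date are copied from the requests above. LoRA and QLoRA share one filter because the source lists a single request for both.

```python
from urllib.parse import urlencode

# Filter names copied from the curl requests above.
FILTERS = {
    "full": "function_fine_tune_trainable",
    "lora": "function_lora_fine_tune_trainable",
    "qlora": "function_lora_fine_tune_trainable",
}


def model_specs_url(base_url, technique, version="2025-02-20"):
    """Build the foundation_model_specs URL for one tuning technique."""
    if technique not in FILTERS:
        raise ValueError(f"unknown technique: {technique!r}")
    query = urlencode({"version": version, "filters": FILTERS[technique]})
    return f"{base_url.rstrip('/')}/ml/v1/foundation_model_specs?{query}"
```

Pass your own cluster address, for example https://cpd-<namespace-name>.apps.<OCP-domain>, as base_url and send the resulting URL in a GET request.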
REST API
The high-level steps are mostly the same for each technique. The key differences are the values that you include in the request body for the fine-tuning training job; these differences are highlighted in the following procedure. For details about the REST API methods, see Fine tunings in the watsonx.ai API reference documentation.
-
Create a training data file to use for tuning the foundation model.
For more information about the training data file requirements, see Data formats for tuning foundation models.
-
Make your training data file available for the API to use. You can do one of the following things:
-
UI method
To upload your JSON or JSONL file, follow the steps in Adding files to reference from the API.
-
API method
Create a connection or data asset by using the Data and AI Common Core API.
You will use the asset ID and training data file details when you add the training_data_references section of the REST API request body.
-
Use the watsonx.ai API to create a training experiment. See the Create a fine tuning job method.
Submit the POST request to the following endpoint:
curl --request POST 'https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v1/fine_tunings?version=2025-02-14' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${TOKEN}" \
--data '<request-body>'
Replace <request-body> with a JSON request body, such as one of the samples that follow.
Customize the experiment by setting values for various parameters in the FineTuningParameters payload. For details, see Choosing a model to tune and Parameters for tuning foundation models.
Set auto_update_model to true to save the generated output as an asset that you can use when you deploy the tuned foundation model later. Otherwise, you must save the tuned model or adapters that are generated by the experiment to the repository service to generate an asset_id before you can use them in the deployment.
The following sample request body creates a full fine-tuning experiment:
{
  "project_id": "<project-id>",
  "name": "my fft experiment",
  "auto_update_model": true,
  "tuned_model_name": "my-fine-tuned-model",
  "parameters": {
    "base_model": { "model_id": "ibm/granite-3-1-8b-base" },
    "task_id": "classification",
    "num_epochs": 10,
    "learning_rate": 0.00001,
    "batch_size": 5,
    "max_seq_length": 1024,
    "accumulate_steps": 1,
    "gpu": { "num": 4 }
  },
  "results_reference": {
    "location": { "path": "full_fine_tuning/results" },
    "type": "fs"
  },
  "training_data_references": [
    {
      "location": {
        "href": "/v2/assets/1e6591a2-c69d-4716-92e3-73e8c2270956?project_id=<project-id>",
        "id": "1e6591a2-c69d-4716-92e3-73e8c2270956"
      },
      "type": "data_asset"
    }
  ]
}
The following sample shows the API response:
{
  "entity": {
    "auto_update_model": true,
    "parameters": {
      "accumulate_steps": 1,
      "base_model": { "model_id": "ibm/granite-3-1-8b-base" },
      "batch_size": 5,
      "gpu": { "num": 4 },
      "learning_rate": 0.00001,
      "max_seq_length": 1024,
      "num_epochs": 10,
      "response_template": "\n### Response:",
      "task_id": "classification",
      "verbalizer": "### Input: \n\n### Response: "
    },
    "results_reference": {
      "location": {
        "path": "/projects/<project-id>/assets/full_fine_tuning/results",
        "notebooks_path": "/projects/<project-id>/assets/full_fine_tuning/results/63e98673-a2c0-45c1-8ac6-e26a47ec1914/notebooks",
        "training": "/projects/<project-id>/assets/full_fine_tuning/results/63e98673-a2c0-45c1-8ac6-e26a47ec1914",
        "training_status": "/projects/<project-id>/assets/full_fine_tuning/results/63e98673-a2c0-45c1-8ac6-e26a47ec1914/training-status.json",
        "assets_path": "/projects/<project-id>/assets/full_fine_tuning/results/63e98673-a2c0-45c1-8ac6-e26a47ec1914/assets"
      },
      "type": "fs"
    },
    "status": { "state": "pending" },
    "training_data_references": [
      {
        "location": {
          "href": "/v2/assets/1e6591a2-c69d-4716-92e3-73e8c2270956?project_id=<project-id>",
          "id": "1e6591a2-c69d-4716-92e3-73e8c2270956"
        },
        "type": "data_asset"
      }
    ],
    "tuned_model": {
      "name": "my-fine-tuned-model-63e98673-a2c0-45c1-8ac6-e26a47ec1914"
    }
  },
  "metadata": {
    "created_at": "2025-02-14T20:49:03.959Z",
    "id": "63e98673-a2c0-45c1-8ac6-e26a47ec1914",
    "modified_at": "2025-02-14T20:49:03.959Z",
    "name": "my fft experiment",
    "project_id": "<project-id>"
  }
}
The following sample request body creates a LoRA fine-tuning experiment:
{
  "project_id": "<project-id>",
  "name": "my LoRA experiment",
  "auto_update_model": true,
  "tuned_model_name": "my-lora-tuned-model",
  "parameters": {
    "base_model": { "model_id": "ibm/granite-3-1-8b-base" },
    "task_id": "classification",
    "num_epochs": 10,
    "learning_rate": 0.00001,
    "batch_size": 5,
    "max_seq_length": 4096,
    "accumulate_steps": 1,
    "gpu": { "num": 1 },
    "peft_parameters": {
      "type": "lora",
      "rank": 8,
      "lora_alpha": 32,
      "lora_dropout": 0.05,
      "target_modules": ["all-linear"]
    }
  },
  "results_reference": {
    "location": { "path": "fine_tuning/results" },
    "type": "fs"
  },
  "training_data_references": [
    {
      "location": {
        "href": "/v2/assets/1e6591a2-c69d-4716-92e3-73e8c2270956?project_id=<project-id>",
        "id": "1e6591a2-c69d-4716-92e3-73e8c2270956"
      },
      "type": "data_asset"
    }
  ]
}
The following sample shows the API response:
{
  "entity": {
    "auto_update_model": true,
    "parameters": {
      "accumulate_steps": 1,
      "base_model": { "model_id": "ibm/granite-3-1-8b-base" },
      "batch_size": 5,
      "gpu": { "num": 1 },
      "learning_rate": 0.00001,
      "max_seq_length": 1024,
      "num_epochs": 10,
      "peft_parameters": {
        "lora_alpha": 32,
        "lora_dropout": 0.05,
        "rank": 8,
        "target_modules": ["all-linear"],
        "type": "lora"
      },
      "response_template": "\n### Response:",
      "task_id": "classification",
      "verbalizer": "### Input: \n\n### Response: "
    },
    "results_reference": {
      "location": {
        "path": "/projects/<project-id>/assets/fine_tuning/results",
        "notebooks_path": "/projects/<project-id>/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1/notebooks",
        "training": "/projects/<project-id>/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1",
        "training_status": "/projects/<project-id>/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1/training-status.json",
        "assets_path": "/projects/<project-id>/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1/assets"
      },
      "type": "fs"
    },
    "status": { "state": "pending" },
    "training_data_references": [
      {
        "location": {
          "href": "/v2/assets/1e6591a2-c69d-4716-92e3-73e8c2270956?project_id=<project-id>",
          "id": "1e6591a2-c69d-4716-92e3-73e8c2270956"
        },
        "type": "data_asset"
      }
    ],
    "tuned_model": {
      "name": "my-lora-tuned-model-2491b2d9-bf96-4d3f-9ea7-8604861471e1"
    }
  },
  "metadata": {
    "created_at": "2025-02-14T19:47:36.629Z",
    "id": "2491b2d9-bf96-4d3f-9ea7-8604861471e1",
    "modified_at": "2025-02-14T19:47:36.629Z",
    "name": "My LoRA experiment",
    "project_id": "<project-id>"
  }
}
The following sample request body creates a QLoRA fine-tuning experiment:
{
  "project_id": "<project-id>",
  "name": "my QLoRA experiment",
  "auto_update_model": true,
  "tuned_model_name": "my-qlora-tuned-model",
  "parameters": {
    "base_model": { "model_id": "meta-llama/llama-3-1-70b-gptq" },
    "task_id": "classification",
    "num_epochs": 10,
    "learning_rate": 0.00001,
    "batch_size": 5,
    "max_seq_length": 1024,
    "accumulate_steps": 1,
    "gpu": { "num": 1 },
    "peft_parameters": {
      "type": "qlora",
      "rank": 8,
      "lora_alpha": 32,
      "lora_dropout": 0.05,
      "target_modules": []
    }
  },
  "results_reference": {
    "location": { "path": "fine_tuning/results" },
    "type": "fs"
  },
  "training_data_references": [
    {
      "location": {
        "href": "/v2/assets/1e6591a2-c69d-4716-92e3-73e8c2270956?project_id=<project-id>",
        "id": "1e6591a2-c69d-4716-92e3-73e8c2270956"
      },
      "type": "data_asset"
    }
  ]
}
The following sample shows the API response:
{
  "entity": {
    "auto_update_model": true,
    "parameters": {
      "accumulate_steps": 1,
      "base_model": { "model_id": "meta-llama/llama-3-1-70b-gptq" },
      "batch_size": 5,
      "gpu": { "num": 1 },
      "learning_rate": 0.00001,
      "max_seq_length": 1024,
      "num_epochs": 10,
      "peft_parameters": {
        "lora_alpha": 32,
        "lora_dropout": 0.05,
        "rank": 8,
        "target_modules": [],
        "type": "qlora"
      },
      "response_template": "\n### Response:",
      "task_id": "classification",
      "verbalizer": "### Input: \n\n### Response: "
    },
    "results_reference": {
      "location": {
        "path": "/projects/<project-id>/assets/fine_tuning/results",
        "notebooks_path": "/projects/<project-id>/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1/notebooks",
        "training": "/projects/<project-id>/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1",
        "training_status": "/projects/<project-id>/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1/training-status.json",
        "assets_path": "/projects/<project-id>/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1/assets"
      },
      "type": "fs"
    },
    "status": { "state": "pending" },
    "training_data_references": [
      {
        "location": {
          "href": "/v2/assets/1e6591a2-c69d-4716-92e3-73e8c2270956?project_id=<project-id>",
          "id": "1e6591a2-c69d-4716-92e3-73e8c2270956"
        },
        "type": "data_asset"
      }
    ],
    "tuned_model": {
      "name": "my-qlora-tuned-model-2491b2d9-bf96-4d3f-9ea7-8604861471e1"
    }
  },
  "metadata": {
    "created_at": "2025-02-14T19:47:36.629Z",
    "id": "2491b2d9-bf96-4d3f-9ea7-8604861471e1",
    "modified_at": "2025-02-14T19:47:36.629Z",
    "name": "My QLoRA experiment",
    "project_id": "<project-id>"
  }
}
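The three sample request bodies share most fields and differ mainly in the peft_parameters object and the GPU count. The following sketch builds a body for each technique from one helper. The function name build_fine_tuning_body and its arguments are illustrative assumptions, not part of any IBM library, and the hyperparameter values are copied from the full fine-tuning sample rather than recommended settings.

```python
import json


def build_fine_tuning_body(project_id, asset_id, model_id, name, tuned_model_name,
                           peft_type=None, results_path="fine_tuning/results"):
    """Return a request body dict; peft_type is None (full), "lora", or "qlora"."""
    if peft_type not in (None, "lora", "qlora"):
        raise ValueError(f"unsupported peft_type: {peft_type!r}")

    parameters = {
        "base_model": {"model_id": model_id},
        "task_id": "classification",
        "num_epochs": 10,
        "learning_rate": 0.00001,
        "batch_size": 5,
        "max_seq_length": 1024,
        "accumulate_steps": 1,
        # The samples use 4 GPUs for full fine tuning and 1 for LoRA and QLoRA.
        "gpu": {"num": 4 if peft_type is None else 1},
    }
    if peft_type is not None:
        parameters["peft_parameters"] = {
            "type": peft_type,
            "rank": 8,
            "lora_alpha": 32,
            "lora_dropout": 0.05,
            # The samples target all linear layers for LoRA and
            # pass an empty list for QLoRA.
            "target_modules": ["all-linear"] if peft_type == "lora" else [],
        }

    return {
        "project_id": project_id,
        "name": name,
        "auto_update_model": True,
        "tuned_model_name": tuned_model_name,
        "parameters": parameters,
        "results_reference": {"location": {"path": results_path}, "type": "fs"},
        "training_data_references": [
            {
                "location": {
                    "href": f"/v2/assets/{asset_id}?project_id={project_id}",
                    "id": asset_id,
                },
                "type": "data_asset",
            }
        ],
    }


if __name__ == "__main__":
    body = build_fine_tuning_body(
        "<project-id>", "<asset-id>", "ibm/granite-3-1-8b-base",
        "my LoRA experiment", "my-lora-tuned-model", peft_type="lora",
    )
    print(json.dumps(body, indent=2))
```

Serialize the returned dictionary with json.dumps and send it as the body of the POST request to the fine_tunings endpoint.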
Using custom parameters
The fine-tuning API accepts an optional custom object in the training payload that enables you to pass arbitrary parameters directly to the underlying trainer (fms-hf-tuning). This advanced feature provides flexibility
for experienced users who need to configure training parameters beyond the standard options.
Behavior
- When enabled: Values in custom.parameters are merged into the generated trainer configuration. If a key conflicts with a standard parameter, the custom value takes precedence (override order: custom.parameters > parameters).
- When disabled: The custom object is persisted and returned in API responses, but custom.parameters has no effect on training.
- Validation: No validation is performed on custom parameters. Incompatible parameters cause the trainer to fail at runtime, not at submission time.
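The override order described above can be pictured as a dictionary merge in which custom values are applied last. This is only a sketch of the precedence rule, not the service's implementation; the function name effective_trainer_config and the keys in the usage note are hypothetical.

```python
def effective_trainer_config(generated_config, custom_parameters, custom_enabled):
    """Illustrate the precedence rule: custom values win when the feature is enabled."""
    if not custom_enabled:
        # Feature disabled: custom parameters are stored but ignored for training.
        return dict(generated_config)
    # Feature enabled: later keys override earlier ones, so custom values win.
    return {**generated_config, **custom_parameters}
```

For example, if the generated configuration contains a num_train_epochs value of 5 and custom.parameters sets num_train_epochs to 10, the merged result uses 10 when the feature is enabled and 5 when it is disabled.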
Warning messages
The API returns warnings in the system.warnings array when custom parameters are used:
- Feature enabled: custom_parameters_warning - "Custom training parameters are used at your own risk. Custom parameters will override conflicting standard training parameters and may be incompatible, potentially causing training to fail."
- Feature disabled: custom_parameters_unsupported_warning - "Custom training parameters are not supported and will be ignored."
Example request with custom parameters
The following example shows a LoRA fine-tuning request that includes custom parameters:
{
"name": "my-lora-fine-tuning",
"space_id": "<space-id>",
"auto_update_model": true,
"parameters": {
"base_model": {
"model_id": "google/flan-t5-xl"
},
"task_id": "classification",
"accumulate_steps": 1,
"num_epochs": 5,
"learning_rate": 0.00005,
"batch_size": 16,
"max_seq_length": 2048,
"response_template": "\n### Response:",
"verbalizer": "### Input: \n\n### Response: ",
"gpu": {
"num": 1
},
"peft_parameters": {
"type": "lora",
"rank": 16,
"target_modules": ["all-linear"],
"lora_alpha": 32,
"lora_dropout": 0.05
},
"gradient_checkpointing": true
},
"custom": {
"parameters": {
"data_formatter_template": "Custom ### Input: \n\n### Response: ",
"num_train_epochs": 10,
"use_flash_attn": true
}
},
"results_reference": {
"connection": {},
"location": {
"path": "fine-tuning/experiment1"
},
"type": "container"
},
"training_data_references": [
{
"connection": {},
"type": "data_asset",
"location": {
"href": "https://api.dataplatform.cloud.ibm.com/v2/assets/<asset-id>?space_id=<space-id>",
"id": "<asset-id>"
}
}
]
}
Example response with custom parameters
The response includes the custom object and a warning in the system.warnings array:
{
"entity": {
"custom": {
"parameters": {
"data_formatter_template": "Custom ### Input: \n\n### Response: ",
"num_train_epochs": 10,
"use_flash_attn": true
}
},
"parameters": {
"accumulate_steps": 1,
"base_model": { "model_id": "google/flan-t5-xl" },
"batch_size": 16,
"gpu": { "num": 1 },
"gradient_checkpointing": true,
"learning_rate": 0.00005,
"max_seq_length": 2048,
"num_epochs": 5,
"peft_parameters": {
"type": "lora",
"rank": 16,
"target_modules": ["all-linear"],
"lora_alpha": 32,
"lora_dropout": 0.05
},
"response_template": "\n### Response:",
"task_id": "classification",
"verbalizer": "### Input: \n\n### Response: "
},
"results_reference": {
"connection": {},
"location": {
"path": "fine-tuning/experiment1",
"training": "fine-tuning/experiment1/<training-id>",
"training_status": "fine-tuning/experiment1/<training-id>/training-status.json",
"assets_path": "fine-tuning/experiment1/<training-id>/assets",
"model_path": "fine-tuning/experiment1/<training-id>/model",
"training_log": "fine-tuning/experiment1/<training-id>/data/fine_tunings/training.log"
},
"type": "container"
},
"status": {
"state": "pending"
},
"training_data_references": [
{
"connection": {},
"type": "data_asset",
"location": {
"href": "https://api.dataplatform.cloud.ibm.com/v2/assets/<asset-id>?space_id=<space-id>",
"id": "<asset-id>"
}
}
],
"tuned_model": {
"name": "my-lora-fine-tuning-<training-id>"
}
},
"metadata": {
"created_at": "2025-10-14T20:17:50.445Z",
"id": "<training-id>",
"name": "my-lora-fine-tuning",
"space_id": "<space-id>"
},
"system": {
"warnings": [
{
"id": "custom_parameters_warning",
"message": "Custom training parameters are used at your own risk. Custom parameters will override conflicting standard training parameters and may be incompatible, potentially causing training to fail."
}
]
}
}
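Because the custom object is returned whether or not the feature is enabled, the warning ID is the reliable signal of what happened to the custom parameters. The following sketch classifies a parsed response by the two warning IDs listed earlier. The function name and the returned labels are illustrative assumptions.

```python
def custom_parameters_status(response):
    """Classify a parsed fine-tuning response as "applied", "ignored", or "not_used"."""
    warnings = response.get("system", {}).get("warnings", [])
    ids = {warning.get("id") for warning in warnings}
    if "custom_parameters_warning" in ids:
        return "applied"  # feature enabled: custom values were merged
    if "custom_parameters_unsupported_warning" in ids:
        return "ignored"  # feature disabled: custom values had no effect
    return "not_used"  # neither warning is present
```

For the example response above, the function returns "applied".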
Checking training job status
-
To check the status of a training job, you can use the following request.
Use the metadata.id value that is returned in the response to the POST request as the value of the ID path parameter in the request.
curl --request GET 'https://cpd-<namespace-name>.apps.<OCP-domain>/ml/v1/fine_tunings/2491b2d9-bf96-4d3f-9ea7-8604861471e1?project_id=<project-id>&version=2025-02-14'
For the API reference, see Get fine tuning job.
The tuning experiment is finished when the state is completed.
If you included "auto_update_model": true in the request, then the model asset ID of the tuned model or adapter is listed in the entity.tuned_model.id field of the response from the GET request. Make a note of the model asset ID.
-
Use the watsonx.ai API to deploy your tuned model.
To deploy your tuned model, you must complete the appropriate steps for the tuning method used.
-
Low-rank adaptation or quantized low-rank adaptation: Complete the following tasks:
-
Create a base foundation model asset.
The model asset defines metadata for the foundation model that will be used as the base model. See Creating the model asset.
-
Deploy the base foundation model.
You need a dedicated instance of the base foundation model that can be used at inference time. See Deploying the base model.
-
Deploy the low-rank adapter asset that was generated by the tuning experiment.
Deploy adapters that can adjust the base model weights at inference time to customize the output for the task. See Deploying the LoRA adapter model asset.
-
Full fine tuning: See Deploying fine-tuned models.
-
Inference the tuned foundation model by using an inference endpoint that includes the unique ID of the deployment that hosts the tuned model.
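The status check in the preceding procedure is usually repeated until the job finishes. The following sketch polls a caller-supplied function that returns the parsed GET response, so it makes no assumption about the HTTP client. The function name wait_for_tuned_model is illustrative. It also assumes that the GET response carries the state in entity.status.state, as the POST response samples do, and that failed and canceled are terminal states; the source confirms only pending and completed.

```python
import time

# Assumption: terminal failure states. The source confirms only
# "pending" and "completed".
FAILURE_STATES = {"failed", "canceled"}


def wait_for_tuned_model(fetch_job, interval_seconds=30, max_checks=120,
                         sleep=time.sleep):
    """Poll fetch_job() until the job completes.

    Returns entity.tuned_model.id when present (it is listed when
    auto_update_model was true), otherwise None.
    """
    for _ in range(max_checks):
        job = fetch_job()
        state = job["entity"]["status"]["state"]
        if state == "completed":
            return job["entity"].get("tuned_model", {}).get("id")
        if state in FAILURE_STATES:
            raise RuntimeError(f"tuning job ended in state {state!r}")
        sleep(interval_seconds)
    raise TimeoutError(f"job not finished after {max_checks} checks")
```

Here fetch_job is any zero-argument callable that sends the GET request shown in the procedure and returns the response as a dictionary.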
Python
You can fine tune foundation models in IBM watsonx.ai programmatically by using the TuneExperiment class in the Python library. For details, see Working with TuneExperiment and FineTuner.
The FoundationModelsManager class has multiple helper methods that you can use to get a list of foundation models that are tunable. For details, see Foundation model helper methods.
To get started, see the following sample notebooks:
Node.js
You can fine tune foundation models in IBM watsonx.ai programmatically by using the createFineTuning method in the Node.js SDK. For more information, see the following resources:
To learn more, see the code example.