Tuning a foundation model programmatically

You can programmatically tune a set of foundation models in watsonx.ai to customize them for your use case.

Ways to develop

You can tune foundation models programmatically by using the watsonx.ai REST API.

Alternatively, you can use graphical tools from the watsonx.ai UI to tune foundation models. See Tuning Studio.

REST API

You can tune a foundation model programmatically by using one of the following methods:

  • Prompt tuning
  • Fine tuning

Follow the appropriate procedure for the tuning method that you want to use.

Prompt tuning by using the REST API

Prompt tuning a foundation model by using the API is a complex task. The sample Python notebooks simplify the process. You can use a sample notebook as a template for writing your own notebooks for prompt tuning. See Tuning a foundation model programmatically.

Supported foundation models

See Choosing a foundation model to tune.

To get a list of foundation models that support prompt tuning programmatically, you can use the following request:

curl -X GET \
  -H 'Authorization: Bearer {token}' \
  'https://{hostname}/ml/v1/foundation_model_specs?version=2025-02-20&filters=function_prompt_tune_trainable'
Note: You cannot prompt tune a custom foundation model or a foundation model from a watsonx.ai lightweight engine installation.
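The same filter call can be scripted. The following sketch only builds the request URL; the hostname is a placeholder and the helper function name is illustrative, not part of the API:

```python
# Sketch: build the GET URL that lists only prompt-tunable foundation
# models. Substitute your own hostname; authentication is omitted here.
from urllib.parse import urlencode

HOSTNAME = "my-onprem-instance.example.com"  # placeholder

def model_specs_url(hostname: str, version: str = "2025-02-20") -> str:
    """Return the URL that filters the model specs for prompt-tunable models."""
    query = urlencode({
        "version": version,
        "filters": "function_prompt_tune_trainable",
    })
    return f"https://{hostname}/ml/v1/foundation_model_specs?{query}"

url = model_specs_url(HOSTNAME)
print(url)
```

Send a GET request to this URL with your bearer token to retrieve the filtered list.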

Procedure

At a high level, prompt tuning a foundation model by using the API involves the following steps:

  1. Create a training data file to use for tuning the foundation model.

    For more information about the training data file requirements, see Data formats for tuning foundation models.

  2. Upload your training data file.

    You can choose to add the file by creating one of the following asset types:

    • Connection asset

      Note: Currently, only the Cloud Object Storage connection type is supported for prompt tuning.

      Use the Data and AI Common Core API to define a connection to your data asset.

      You will use the connection ID and training data file details when you add the training_data_references section of the REST request that you create in the next step.

    • Data asset

      To create a data asset, use the Data and AI Common Core API to define a data asset.

      You will use the asset ID and training data file details when you add the training_data_references section of the REST request that you create in the next step.

    For more information about the supported ways to reference a training data file, see Data references.
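For a data asset, the training_data_references entry can be assembled as follows. This is a minimal sketch; the asset ID and project ID are the illustrative values used elsewhere in this topic:

```python
# Sketch: build one entry for the training_data_references array of the
# tuning request. The IDs are sample values, not real assets.
def data_asset_reference(asset_id: str, project_id: str) -> dict:
    """Build a training data reference that points at a data asset."""
    return {
        "location": {
            "href": f"/v2/assets/{asset_id}?project_id={project_id}",
            "id": asset_id,
        },
        "type": "data_asset",
    }

ref = data_asset_reference(
    "1e6591a2-c69d-4716-92e3-73e8c2270956",
    "4e34d515-c61f-4f18-92b4-758be78d0a58",
)
```

For a connection asset, use "type": "connection_asset" and the connection ID instead.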

  3. Use the watsonx.ai API to create a training experiment.

    See create a training.

    You can specify parameters for the experiment in the TrainingResource payload. For more information about available parameters, see Parameters for tuning foundation models.

    For the task_id, specify one of the tasks that are listed as being supported for the foundation model in the response to the List the available foundation models method.

  4. Save the tuned model to the repository service to generate an asset_id that points to the tuned model.

    To save the tuned model, use the Watson Machine Learning API to create a new model.

  5. Use the watsonx.ai API to create a deployment for the tuned model.

    See create a deployment.

To inference a tuned model, you must use the inference endpoint that includes the unique ID of the deployment that hosts the tuned model. For more information, see the inference methods in the Deployments section.
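The inference URL can be derived from the deployment ID. The following sketch assumes the text generation endpoint of the deployments API; the hostname and deployment ID are placeholders:

```python
# Sketch: a tuned model is inferenced through its deployment, so the
# URL embeds the deployment ID, not the model ID.
def deployment_inference_url(hostname: str, deployment_id: str,
                             version: str = "2025-02-20") -> str:
    """Return the text-generation inference URL for a deployed tuned model."""
    return (f"https://{hostname}/ml/v1/deployments/{deployment_id}"
            f"/text/generation?version={version}")

url = deployment_inference_url("my-onprem-instance.example.com",
                               "63e98673-a2c0-45c1-8ac6-e26a47ec1914")
```

POST your inference request to this URL with your bearer token.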

Fine tuning a foundation model by using the REST API

You can use the watsonx.ai REST API to fine tune a foundation model with the following techniques:

  • Full fine tuning
  • Low-rank adaptation fine tuning
  • Quantized low-rank adaptation fine tuning

Supported foundation models

See Choosing a foundation model to tune.

To get a list of foundation models that support low-rank adaptation (LoRA) or quantized low-rank adaptation (QLoRA) fine tuning programmatically, you can use the following request:

curl -X GET \
  -H 'Authorization: Bearer {token}' \
  'https://{hostname}/ml/v1/foundation_model_specs?version=2025-02-20&filters=function_lora_fine_tune_trainable'

You can use QLoRA only with quantized models and LoRA only with non-quantized models.

You can fine tune custom foundation models, but you cannot apply the LoRA or QLoRA methods to them.
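These compatibility rules can be expressed as a local pre-flight check before you submit a job. This is a sketch only; it assumes that you can determine from the model specification whether a model is quantized or custom:

```python
# Sketch: validate a tuning method against the model's properties
# before submitting the training request.
def check_fine_tune_method(method: str, is_quantized: bool, is_custom: bool) -> None:
    """Raise ValueError if the tuning method cannot be applied to the model."""
    if is_custom and method in ("lora", "qlora"):
        raise ValueError("LoRA and QLoRA cannot be applied to custom foundation models")
    if method == "qlora" and not is_quantized:
        raise ValueError("QLoRA can be used only with quantized models")
    if method == "lora" and is_quantized:
        raise ValueError("LoRA can be used only with non-quantized models")
```

For example, full fine tuning of a custom model passes the check, but LoRA on a quantized model raises an error.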

Procedure

The high-level steps are mostly the same for each technique. The key differences are the values that you include in the request body for the fine-tuning training job, which are highlighted in this procedure.

  1. Create a training data file to use for tuning the foundation model.

    For more information about the training data file requirements, see Data formats for tuning foundation models.

  2. Make your training data file available for the API to use.

    You can do one of the following things:

    • UI method

      To upload your .json or .jsonl file, follow the steps in Adding files to reference from the API.

    • API method

      Use the Data and AI Common Core API to create a data asset.

    You will use the asset ID and training data file details when you add the training_data_references section of the request body that you create in the next step.

  3. Use the watsonx.ai API to create a training experiment.

    See create a training.

    Submit the POST request to this endpoint:

    curl --request POST 'https://my-onprem-instance.example.com/ml/v1/fine_tunings?version=2025-02-14' \
      --header 'Authorization: Bearer {token}' \
      --header 'Content-Type: application/json'
    

    Customize the experiment by changing values for parameters in the TrainingResource payload. For more information about the available parameters, see Parameters for tuning foundation models.

    Set auto_update_model to true to save the generated output as an asset that you can use when you deploy the tuned foundation model later. Otherwise, you must save the tuned model or adapters that are generated by the experiment to the repository service to generate an asset_id before you can use them in the deployment.
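The POST request can also be scripted. The following sketch only builds the URL, headers, and a minimal payload skeleton; the hostname and token are placeholders, and the skeleton omits most parameters:

```python
# Sketch: assemble the fine-tuning POST request. Substitute your own
# host, bearer token, and TrainingResource parameters.
import json

HOSTNAME = "my-onprem-instance.example.com"  # placeholder
TOKEN = "{token}"  # placeholder bearer token

url = f"https://{HOSTNAME}/ml/v1/fine_tunings?version=2025-02-14"
headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
}

# Minimal TrainingResource skeleton; see the sample request bodies in
# this topic for the full set of parameters.
body = {
    "project_id": "4e34d515-c61f-4f18-92b4-758be78d0a58",
    "name": "my fft experiment",
    "auto_update_model": True,
    "parameters": {"base_model": {"model_id": "ibm/granite-3-1-8b-base"}},
}
payload = json.dumps(body)

# With the third-party requests library, the call would be:
# response = requests.post(url, headers=headers, data=payload)
```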

    The following sample request body creates a full fine-tuning experiment.

    { 
      "project_id": "4e34d515-c61f-4f18-92b4-758be78d0a58",
      "name": "my fft experiment",
      "auto_update_model": true,
      "tuned_model_name": "my-fine-tuned-model",
      "parameters": {
        "base_model": {
          "model_id": "ibm/granite-3-1-8b-base" },
        "task_id": "classification",
        "num_epochs": 10,
        "learning_rate": 0.00001,
        "batch_size": 5,
        "max_seq_length": 1024,
        "accumulate_steps": 1,
        "gpu": {
          "num": 4
        }
      }, 
      "results_reference": {
        "location": {
          "path": "full_fine_tuning/results" },
        "type": "fs"
      }, 
      "training_data_references": [
        {
        "location": {
          "href":"/v2/assets/1e6591a2-c69d-4716-92e3-73e8c2270956project_id=4e34d515-c61f-4f18-92b4-758be78d0a58",
          "id":"1e6591a2-c69d-4716-92e3-73e8c2270956" },
        "type": "data_asset"
        }
      ]
    }
    

    The request output looks something like this:

    {
      "entity": {
        "auto_update_model": true,
        "parameters": {
          "accumulate_steps": 1,
          "base_model": {
            "model_id": "ibm/granite-3-1-8b-base"
          },
          "batch_size": 5,
          "gpu": {
            "num": 4
          },
          "learning_rate": 0.00001,
          "max_seq_length": 1024,
          "num_epochs": 10,
          "response_template": "\n### Response:",
          "task_id": "classification",
          "verbalizer": "### Input:  \n\n### Response: "
        },
        "results_reference": {
          "location": {
            "path": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/full_fine_tuning/results",
            "notebooks_path": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/full_fine_tuning/results/63e98673-a2c0-45c1-8ac6-e26a47ec1914/notebooks",
            "training": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/full_fine_tuning/results/63e98673-a2c0-45c1-8ac6-e26a47ec1914",
            "training_status": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/full_fine_tuning/results/63e98673-a2c0-45c1-8ac6-e26a47ec1914/training-status.json",
            "assets_path": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/full_fine_tuning/results/63e98673-a2c0-45c1-8ac6-e26a47ec1914/assets"
          },
          "type": "fs"
        },
        "status": {
          "state": "pending"
        },
        "training_data_references": [
          {
            "location": {
              "href": "/v2/assets/1e6591a2-c69d-4716-92e3-73e8c2270956project_id=4e34d515-c61f-4f18-92b4-758be78d0a58",
              "id": "1e6591a2-c69d-4716-92e3-73e8c2270956"
            },
            "type": "data_asset"
          }
        ],
        "tuned_model": {
          "name": "my-fine-tuned-model-63e98673-a2c0-45c1-8ac6-e26a47ec1914"
        }
      },
      "metadata": {
        "created_at": "2025-02-14T20:49:03.959Z",
        "id": "63e98673-a2c0-45c1-8ac6-e26a47ec1914",
        "modified_at": "2025-02-14T20:49:03.959Z",
        "name": "my fft experiment",
        "project_id": "4e34d515-c61f-4f18-92b4-758be78d0a58"
      }
    }
    

    The following sample request body creates a LoRA fine-tuning experiment.

    { 
      "project_id": "4e34d515-c61f-4f18-92b4-758be78d0a58",
      "name": "my LoRA experiment",
      "auto_update_model": true,
      "tuned_model_name": "my-lora-tuned-model",
      "parameters": {
        "base_model": {
          "model_id": "ibm/granite-3-1-8b-base" },
        "task_id": "classification",
        "num_epochs": 10,
        "learning_rate": 0.00001,
        "batch_size": 5,
        "max_seq_length": 4096,
        "accumulate_steps": 1,
        "gpu": {
          "num": 4
        },
        "peft_parameters": {
          "type": "lora",
          "rank": 8,
          "lora_alpha": 32,
          "lora_dropout": 0.05,
          "target_modules": ["all-linear"]
        }
      }, 
      "results_reference": {
        "location": {
          "path": "fine_tuning/results" },
        "type": "fs"
      }, 
      "training_data_references": [
        {
        "location": {
          "href":"/v2/assets/1e6591a2-c69d-4716-92e3-73e8c2270956project_id=4e34d515-c61f-4f18-92b4-758be78d0a58",
          "id":"1e6591a2-c69d-4716-92e3-73e8c2270956" },
        "type": "data_asset"
        }
      ]
    }
    

    The output of the request looks something like this:

    {
      "entity": {
        "auto_update_model": true,
        "parameters": {
          "accumulate_steps": 1,
          "base_model": {
            "model_id": "ibm/granite-3-1-8b-base"
          },
          "batch_size": 5,
          "gpu": {
            "num": 4
          },
          "learning_rate": 0.00001,
          "max_seq_length": 1024,
          "num_epochs": 10,
          "peft_parameters": {
            "lora_alpha": 32,
            "lora_dropout": 0.05,
            "rank": 8,
            "target_modules": [
              "all-linear"
            ],
            "type": "lora"
          },
          "response_template": "\n### Response:",
          "task_id": "classification",
          "verbalizer": "### Input:  \n\n### Response: "
        },
        "results_reference": {
          "location": {
            "path": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/fine_tuning/results",
            "notebooks_path": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1/notebooks",
            "training": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1",
            "training_status": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1/training-status.json",
            "assets_path": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1/assets"
          },
          "type": "fs"
        },
        "status": {
          "state": "pending"
        },
        "training_data_references": [
          {
            "location": {
              "href": "/v2/assets/1e6591a2-c69d-4716-92e3-73e8c2270956?project_id=4e34d515-c61f-4f18-92b4-758be78d0a58",
              "id": "1e6591a2-c69d-4716-92e3-73e8c2270956"
            },
            "type": "data_asset"
          }
        ],
        "tuned_model": {
          "name": "my-lora-tuned-model-2491b2d9-bf96-4d3f-9ea7-8604861471e1"
        }
      },
      "metadata": {
        "created_at": "2025-02-14T19:47:36.629Z",
        "id": "2491b2d9-bf96-4d3f-9ea7-8604861471e1",
        "modified_at": "2025-02-14T19:47:36.629Z",
        "name": "My LoRA experiment",
        "project_id": "4e34d515-c61f-4f18-92b4-758be78d0a58"
      }
    }
    

    The following sample request body creates a QLoRA fine-tuning experiment.

    { 
      "project_id": "4e34d515-c61f-4f18-92b4-758be78d0a58",
      "name": "my QLoRA experiment",
      "auto_update_model": true,
      "tuned_model_name": "my-qlora-tuned-model",
      "parameters": {
        "base_model": {
          "model_id": "meta-llama/llama-3-1-70b-gptq" },
        "task_id": "classification",
        "num_epochs": 10,
        "learning_rate": 0.00001,
        "batch_size": 5,
        "max_seq_length": 1024,
        "accumulate_steps": 1,
        "gpu": {
          "num": 4
        },
        "peft_parameters": {
          "type": "qlora",
          "rank": 8,
          "lora_alpha": 32,
          "lora_dropout": 0.05,
          "target_modules": []
        }
      }, 
      "results_reference": {
        "location": {
          "path": "fine_tuning/results" },
        "type": "fs"
      }, 
      "training_data_references": [
        {
        "location": {
          "href":"/v2/assets/1e6591a2-c69d-4716-92e3-73e8c2270956project_id=4e34d515-c61f-4f18-92b4-758be78d0a58",
          "id":"1e6591a2-c69d-4716-92e3-73e8c2270956" },
        "type": "data_asset"
        }
      ]
    }
    

    The output of the request looks something like this:

    {
      "entity": {
        "auto_update_model": true,
        "parameters": {
          "accumulate_steps": 1,
          "base_model": {
            "model_id": "meta-llama/llama-3-1-70b-gptq"
          },
          "batch_size": 5,
          "gpu": {
            "num": 4
          },
          "learning_rate": 0.00001,
          "max_seq_length": 1024,
          "num_epochs": 10,
          "peft_parameters": {
            "lora_alpha": 32,
            "lora_dropout": 0.05,
            "rank": 8,
            "target_modules": [],
            "type": "qlora"
          },
          "response_template": "\n### Response:",
          "task_id": "classification",
          "verbalizer": "### Input:  \n\n### Response: "
        },
        "results_reference": {
          "location": {
            "path": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/fine_tuning/results",
            "notebooks_path": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1/notebooks",
            "training": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1",
            "training_status": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1/training-status.json",
            "assets_path": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1/assets"
          },
          "type": "fs"
        },
        "status": {
          "state": "pending"
        },
        "training_data_references": [
          {
            "location": {
              "href": "/v2/assets/1e6591a2-c69d-4716-92e3-73e8c2270956?project_id=4e34d515-c61f-4f18-92b4-758be78d0a58",
              "id": "1e6591a2-c69d-4716-92e3-73e8c2270956"
            },
            "type": "data_asset"
          }
        ],
        "tuned_model": {
          "name": "my-qlora-tuned-model-2491b2d9-bf96-4d3f-9ea7-8604861471e1"
        }
      },
      "metadata": {
        "created_at": "2025-02-14T19:47:36.629Z",
        "id": "2491b2d9-bf96-4d3f-9ea7-8604861471e1",
        "modified_at": "2025-02-14T19:47:36.629Z",
        "name": "My QLoRA experiment",
        "project_id": "4e34d515-c61f-4f18-92b4-758be78d0a58"
      }
    }
    
  4. To check the status of a training job, you can use the following request.

    Use the metadata.id value that is returned in the response to the POST request as the ID path parameter in this request.

    curl --request GET 'https://my-onprem-instance.example.com/ml/v1/fine_tunings/2491b2d9-bf96-4d3f-9ea7-8604861471e1?project_id=4e34d515-c61f-4f18-92b4-758be78d0a58&version=2025-02-14'
    

    For the API reference, see Get fine tuning job.

    The tuning experiment is finished when the state is completed.

    If you included "auto_update_model": true in the request, then the model asset ID of the tuned model or adapter will be listed in the entity.tuned_model.id field of the response from the GET request. Make a note of the model asset ID.
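The status check can be expressed as a small helper. The response shape follows the GET output shown earlier; the model asset ID below is an illustrative value:

```python
# Sketch: return the tuned model asset ID once the job is completed.
from typing import Optional

def tuned_model_asset_id(response: dict) -> Optional[str]:
    """Extract entity.tuned_model.id from a completed fine-tuning job.

    Returns None while the job is not yet completed. The id field is
    present only when the job was submitted with auto_update_model true.
    """
    entity = response.get("entity", {})
    if entity.get("status", {}).get("state") != "completed":
        return None  # not completed yet (or the job failed)
    return entity.get("tuned_model", {}).get("id")

pending = {"entity": {"status": {"state": "pending"}}}
done = {"entity": {"status": {"state": "completed"},
                   "tuned_model": {"id": "a1b2c3d4-0000-1111-2222-333344445555"}}}
```

Polling this GET endpoint until the helper returns a non-None value gives you the asset ID to use in the deployment step.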

  5. Use the watsonx.ai API to deploy your tuned model.

    To deploy your tuned model, complete the steps for the tuning method that you used.

    • Low-rank adaptation or quantized low-rank adaptation: Complete the following tasks:

      1. Create a base foundation model asset.

        The model asset defines metadata for the foundation model that will be used as the base model. See Creating the model asset.

      2. Deploy the base foundation model.

        You need a dedicated instance of the base foundation model that can be used at inference time. See Deploying the base model.

      3. Deploy the low-rank adapter asset that was generated by the tuning experiment.

        Deploy adapters that can adjust the base model weights at inference time to customize the output for the task. See Deploying the LoRA adapter model asset.

    • Full fine tuning: See Deploying fine-tuned models.

  6. Inference the tuned foundation model.

    To inference a tuned model, use an inference endpoint that includes the unique ID of the deployment that hosts the tuned model.
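A minimal inference request body can be sketched as follows. The input-plus-parameters shape is an assumption based on the watsonx.ai text generation API; adjust the prompt and parameters for your task:

```python
# Sketch: build a minimal text-generation request body to POST to the
# deployment's inference endpoint.
def generation_request(prompt: str, max_new_tokens: int = 100) -> dict:
    """Build a text-generation request body for a deployed tuned model."""
    return {
        "input": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }

req = generation_request("Classify the sentiment of this review: great product!")
```

POST this body, with your bearer token, to the inference endpoint that includes the deployment ID.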

Parent topic: Coding generative AI solutions