Methods for tuning foundation models
Learn more about different tuning methods and how they work.
Models can be tuned in the following ways:
- Fine tuning: Using the base model's previous knowledge as a starting point, fine tuning adapts the model by training it on a smaller, task-specific dataset. This process changes the parameter weights that were set by the model's prior training to optimize the model for a task.

  The result of fine tuning is an entirely new model. More computational and storage resources are required to host the new tuned model that you create by fine tuning a foundation model.

  Fine tuning was added with the 5.0.3 release.
- Prompt tuning: Adjusts the content of the prompt that is passed to the model to guide the model to generate output that matches a pattern you specify. The underlying foundation model and its parameter weights are not changed; only the prompt input is altered.

  Although the result of prompt tuning is a new tuned model asset, the prompt-tuned model merely adds a layer of function that runs before the input is processed by the underlying foundation model. Because the underlying foundation model is not changed, it can be reused to address different business needs without being retrained each time. As a result, you reduce computational needs and inference costs.
To get started, see Tuning a foundation model.
Tuning method comparison
The following table compares the available tuning methods based on common criteria for choosing a tuning method.
| Criteria | Prompt tuning | Fine tuning |
|---|---|---|
| Tuning technique | Prompt vectors are tuned; model parameters remain fixed. | All model parameters are fine-tuned on the target task. |
| Tuned model outcomes | Effective when the target task is similar to the pretrained knowledge of the model. | Effective at customizing a model for a task when given sufficient data and compute resources. |
| Required compute resources | Fewer resources are required. Only the prompt vector is tuned; the underlying model is unaltered. | High computational resources and memory are required to fully update the model parameters. |
| Tuning time duration | Shorter timeframe that can range from 10 minutes to a few hours. | Longer timeframe. Exact duration depends on the model and dataset sizes. |
| Cost | Fewer resources are needed to prompt tune and host the tuned model. An effectively prompt-tuned smaller model can do the equivalent work of a larger model and requires fewer resources to host. | Factor in the cost of the extra resources that are required both to fine tune the model and to deploy and host the new fine-tuned model. |
| Purpose | Most suitable for quick adaptation, especially when computational resources are limited or the task is closely related to the model's pretraining. | Best for scenarios where maximum accuracy and task-specific adaptation are critical and resources are abundant. |
How fine tuning works
Use the Tuning Studio to run a fine-tuning experiment that uses supervised learning to train a foundation model on a specific task. You provide training data that consists of example pairs of user input and the foundation model output that you expect for your task.
The fine-tuning experiment manages a series of training runs in which the output that is generated by the foundation model is compared to the training data output. Based on the differences between the two outputs, the experiment adjusts the parameter weight values of the underlying foundation model. After many runs through the training data, the experiment finds the parameter weights that generate output that more closely matches the output that you want. The result of the fine-tuning experiment is a new foundation model that is tuned for your task.
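For illustration, here is a minimal sketch of this kind of supervised fine-tuning loop, written with the open source Hugging Face Transformers library. It shows the general technique only and is not the Tuning Studio implementation; the model name, training pairs, and output path are placeholder assumptions.

```python
# A minimal sketch of a supervised fine-tuning loop, using the open source
# Hugging Face Transformers library. Illustrative only; this is NOT the
# Tuning Studio implementation. The model name, training pairs, and output
# path are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # placeholder: any small causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

# Training data: pairs of user input and the expected model output.
pairs = [
    {"input": "Classify the sentiment: I loved this film.", "output": "positive"},
    {"input": "Classify the sentiment: The plot was dull.", "output": "negative"},
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

for epoch in range(3):  # many runs through the training data
    for pair in pairs:
        text = pair["input"] + "\n" + pair["output"]
        batch = tokenizer(text, return_tensors="pt")
        # The loss measures the difference between the generated output
        # and the expected output, token by token.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()      # adjusts ALL of the model's parameter weights
        optimizer.zero_grad()

# The result is an entirely new model that must be stored and hosted.
model.save_pretrained("my-fine-tuned-model")
```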
For more information about the fine-tuning process that is used in Tuning Studio, see Fine-tuning workflow.
How prompt tuning works
Foundation models are sensitive to the input that you give them. Your input, or how you prompt the model, can introduce context that the model uses to tailor its generated output. Prompt engineering to find the right prompt often works well. However, it can be time-consuming and error-prone, and its effectiveness can be restricted by the length of the context window that the underlying model allows.
Prompt tuning a model in the Tuning Studio applies machine learning to the task of prompt engineering. Instead of adding words to the input itself, prompt tuning is a method for finding a sequence of values that, when added as a prefix to the input text, improve the model's ability to generate the output you want. This sequence of values is called a prompt vector.
Normally, words in the prompt are vectorized by the model. Vectorization is the process of converting text to tokens, and then mapping each token to the numeric ID that the model's tokenizer defines for it. Lastly, the token IDs are encoded, meaning that they are converted into vector representations, which is the input format that is expected by the embedding layer of the model. Prompt tuning bypasses this text-vectorization process and instead crafts a prompt vector directly. This changeable prompt vector is concatenated with the vectorized input text, and the combined sequence is passed to the model as one input. Values in this crafted prompt vector act as extra context that influences the words that the model chooses to add to the output.
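The following sketch illustrates this mechanism in plain PyTorch. The vocabulary size, embedding dimension, prompt vector length, and token IDs are all illustrative assumptions, not values that Tuning Studio uses.

```python
# A minimal sketch of the prompt-vector mechanism in plain PyTorch.
# The vocabulary size, embedding dimension, prompt length, and token IDs
# are illustrative assumptions, not values that Tuning Studio uses.
import torch

vocab_size, embed_dim = 50_000, 768
num_virtual_tokens = 20  # length of the prompt vector (assumed)

# The model's embedding layer encodes token IDs as vectors.
embedding_layer = torch.nn.Embedding(vocab_size, embed_dim)

# The prompt vector is crafted directly in embedding space; it never
# passes through the tokenizer or the embedding layer.
prompt_vector = torch.nn.Parameter(torch.randn(num_virtual_tokens, embed_dim))

# Normal path for the input text: token IDs -> embedding layer.
input_ids = torch.tensor([101, 2023, 2003, 1037, 3231])  # example token IDs
input_embeds = embedding_layer(input_ids)                # shape (5, 768)

# The changeable prompt vector is concatenated as a prefix, and the
# combined sequence is passed to the model as one input.
combined = torch.cat([prompt_vector, input_embeds], dim=0)  # shape (25, 768)
```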
To find the best values for the prompt vector, you run a tuning experiment. You demonstrate the type of output that you want for a corresponding input by providing input and output example pairs in training data. With each training run of the experiment, the generated output is compared to the training data output. Based on what it learns from the differences between the two, the experiment adjusts the values in the prompt vector. After many runs through the training data, the experiment finds the prompt vector that works best.
You can choose to start the training process by providing text that is vectorized by the experiment. Or you can let the experiment use random values in the prompt vector. Either way, unless the initial values are exactly right, they will be changed repeatedly as part of the training process. Providing your own initialization text can help the experiment reach a good result more quickly.
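As an illustration of the frozen underlying model and the two initialization options, the following sketch uses the open source Hugging Face PEFT library, which offers one implementation of prompt tuning. It is not the Tuning Studio internals; the model name, initialization text, and prompt vector length are placeholder assumptions.

```python
# A minimal sketch of the two initialization options, using the open source
# Hugging Face PEFT library (one implementation of prompt tuning; it is not
# the Tuning Studio internals). The model name, initialization text, and
# prompt vector length are placeholder assumptions.
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

model_name = "bigscience/bloom-560m"  # placeholder: any small causal LM
model = AutoModelForCausalLM.from_pretrained(model_name)

# Option 1: start from initialization text that you provide. The text is
# vectorized to seed the prompt vector's initial values.
config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Classify the sentiment of the input:",
    num_virtual_tokens=20,
    tokenizer_name_or_path=model_name,
)

# Option 2: start from random values instead.
# config = PromptTuningConfig(
#     task_type=TaskType.CAUSAL_LM,
#     prompt_tuning_init=PromptTuningInit.RANDOM,
#     num_virtual_tokens=20,
# )

peft_model = get_peft_model(model, config)
# Only the prompt vector is trainable; the foundation model stays frozen.
peft_model.print_trainable_parameters()
```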
The result of the experiment is a tuned version of the underlying model. You submit input to the tuned model for inferencing, and the model generates output that follows the pattern that it was tuned to produce.
For more information about the prompt-tuning process that is used in Tuning Studio, see Prompt-tuning workflow.
Learn more
- IBM Research blog post: What is prompt-tuning?
- Research paper: The Power of Scale for Parameter-Efficient Prompt Tuning
- Tuning parameters
Parent topic: Tuning Studio