Parameters for tuning foundation models
Tuning parameters configure the tuning experiments that you use to tune the model.
During the experiment, the tuning model repeatedly adjusts the structure of the prompt so that its predictions can get better over time.
The following diagram illustrates the steps that occur during a tuning training experiment run. The parts of the experiment flow that you can configure are highlighted. These decision points correspond with experiment tuning parameters that you control.
The diagram shows the following steps of the experiment:
-
Starts from the initialization method that you choose to use to initialize the prompt.
If the initialization method parameter is set to
text
, then you must add the initialization text. -
If specified, tokenizes the initialization text and converts it into a prompt vector.
-
Reads the training data, tokenizes it, and converts it into batches.
The size of the batches is determined by the batch size parameter.
-
Sends input from the examples in the batch to the foundation model for the model to process and generate output.
-
Compares the model's output to the output from the training data that corresponds to the training data input that was submitted. Then, computes the loss gradient, which is the difference between the predicted output and the actual output from the training data.
At some point, the experiment adjusts the prompt vector that is added to the input based on the performance of the model. When this adjustment occurs depends on how the Accumulation steps parameter is configured.
-
Adjustments are applied to the prompt vector that was initialized in Step 2. The degree to which the vector is changed is controlled by the Learning rate parameter. The edited prompt vector is added as a prefix to the input from the next example in the training data, and is submitted to the model as input.
-
The process repeats until all of the examples in all of the batches are processed.
-
The entire set of batches are processed again as many times as is specified in the Number of epochs parameter.
Parameter details
The parameters that you change when you tune a model are related to the tuning experiment, not to the underlying foundation model.
The following table captures the default parameters for each open source foundation model.
Parameter name | Value options | Default value for flan-t5-xl-3b | Default value for llama-2-13b-chat | Learn more |
---|---|---|---|---|
Initialization method | Random, Text | Random | Random | Initializing prompt tuning |
Initialization text | None | None | None | Initializing prompt tuning |
batch_size | 1 - 16 | 16 | 8 | Segmenting the training data |
accumulate_steps | 1 - 128 | 16 | 16 | Segmenting the training data |
learning_rate | 0.00001 - 0.5 | 0.3 | 0.002 | Managing the learning rate |
num_epochs (number of training cycles) | 1 - 50 | 20 | 20 | Choosing the number of training runs to complete |
The default parameters that are used for tuning the granite-13b-instruct-v2 foundation model are adjusted based on the type of task you want the tuned model to do.
The following table captures the default parameters per supported task type for the granite-13b-instruct-v2 foundation model.
Parameter name | Value options | Default value for classification | Default value for generation | Default value for summarization | Learn more |
---|---|---|---|---|---|
batch_size | 1 - 16 | 8 | 16 | 8 | Segmenting the training data |
accumulate_steps | 1 - 128 | 32 | 16 | 1 | Segmenting the training data |
learning_rate | 0.00001 - 0.5 | 0.0006 | 0.0002 | 0.0002 | Managing the learning rate |
num_epochs (number of training cycles) | 1 - 50 | 20 | 20 | 40 | Choosing the number of training runs to complete |
Segmenting the training data
When an experiment runs, the experiment first breaks the training data into smaller batches, and then trains on one batch at a time. Each batch must fit in GPU memory to be processed. To reduce the amount of GPU memory that is needed, you can configure the tuning experiment to postpone making adjustments until more than one batch is processed. Tuning runs on a batch and its performance metrics are calculated, but the prompt vector isn't changed. Instead, the performance information is collected over some number of batches before the cumulative performance metrics are evaluated.
Use the following parameters to control how the training data is segmented:
Batch size Number of labeled examples (also known as samples) to process at one time.
For example, for a data set with 1,000 examples and a batch size of 10, the data set is divided into 100 batches of 10 examples each.
If the training data set is small, specify a smaller batch size to ensure that each batch has enough examples in it.
Accumulation steps: Number of batches to process before the prompt vector is adjusted.
For example, if the data set is divided into 100 batches and you set the accumulation steps value to 10, then the prompt vector is adjusted 10 times instead of 100 times.
Initializing prompt tuning
When you create an experiment, you can choose whether to specify your own text to serve as the initial prompt vector or let the experiment generate it for you. These new tokens start the training process either in random positions, or based on the embedding of a vocabulary or instruction that you specify in text. Studies show that as the size of the underlying model grows beyond 10 billion parameters, the initialization method that is used becomes less important.
The choice that you make when you create the tuning experiment customizes how the prompt is initialized.
Initialization method: Choose a method from the following options:
- Text: The Prompt Tuning method is used where you specify the initialization text of the prompt yourself.
- Random: The Prompt Tuning method is used that allows the experiment to add values that are chosen at random to include with the prompt.
Initialization text: The text that you want to add. Specify a task description or instructions similar to what you use for zero-shot prompting.
Managing the learning rate
The learning rate parameter determines how much to change the prompt vector when it is adjusted. The higher the number, the greater the change to the vector.
Choosing the number of training runs to complete
The Number of epochs parameter specifies the number of times to cycle through the training data.
For example, with a batch size of 10 and a data set with 1,000 examples, one epoch must process 100 batches and update the prompt vector 100 times. If you set the number of epochs to 20, the model is passed through the data set 20 times, which means it processes a total of 2,000 batches during the tuning process.
The higher the number of epochs and bigger your training data, the longer it takes to tune a model.
Learn more
Parent topic: Tuning a model