Parameters for tuning foundation models

Tuning parameters configure the tuning experiments that you use to tune the model.

During the experiment, the tuning model repeatedly adjusts the structure of the prompt so that its predictions can get better over time.

The following diagram illustrates the steps that occur during a tuning training experiment run. The parts of the experiment flow that you can configure are highlighted. These decision points correspond with experiment tuning parameters that you control.

Tuning experiment run process

The diagram shows the following steps of the experiment:

  1. Starts from the initialization method that you choose to use to initialize the prompt.

    If the initialization method parameter is set to text, then you must add the initialization text.

  2. If specified, tokenizes the initialization text and converts it into a prompt vector.

  3. Reads the training data, tokenizes it, and converts it into batches.

    The size of the batches is determined by the batch size parameter.

  4. Sends input from the examples in the batch to the foundation model for the model to process and generate output.

  5. Compares the model's output to the output from the training data that corresponds to the training data input that was submitted. Then, computes the loss gradient, which is the difference between the predicted output and the actual output from the training data.

    At some point, the experiment adjusts the prompt vector that is added to the input based on the performance of the model. When this adjustment occurs depends on how the Accumulation steps parameter is configured.

  6. Adjustments are applied to the prompt vector that was initialized in Step 2. The degree to which the vector is changed is controlled by the Learning rate parameter. The edited prompt vector is added as a prefix to the input from the next example in the training data, and is submitted to the model as input.

  7. The process repeats until all of the examples in all of the batches are processed.

  8. The entire set of batches are processed again as many times as is specified in the Number of epochs parameter.

Note: No layer of the base foundation model is changed during this process.

Parameter details

The parameters that you change when you tune a model are related to the tuning experiment, not to the underlying foundation model.

The following table captures the default parameters for each open source foundation model.

Table 1: Tuning parameter values for open source foundation models
Parameter name Value options Default value for flan-t5-xl-3b Default value for llama-2-13b-chat Learn more
Initialization method Random, Text Random Random Initializing prompt tuning
Initialization text None None None Initializing prompt tuning
batch_size 1 - 16 16 8 Segmenting the training data
accumulate_steps 1 - 128 16 16 Segmenting the training data
learning_rate 0.00001 - 0.5 0.3 0.002 Managing the learning rate
num_epochs (number of training cycles) 1 - 50 20 20 Choosing the number of training runs to complete

The default parameters that are used for tuning the granite-13b-instruct-v2 foundation model are adjusted based on the type of task you want the tuned model to do.

The following table captures the default parameters per supported task type for the granite-13b-instruct-v2 foundation model.

Table 2: Tuning parameter values for IBM foundation models
Parameter name Value options Default value for classification Default value for generation Default value for summarization Learn more
batch_size 1 - 16 8 16 8 Segmenting the training data
accumulate_steps 1 - 128 32 16 1 Segmenting the training data
learning_rate 0.00001 - 0.5 0.0006 0.0002 0.0002 Managing the learning rate
num_epochs (number of training cycles) 1 - 50 20 20 40 Choosing the number of training runs to complete

Segmenting the training data

When an experiment runs, the experiment first breaks the training data into smaller batches, and then trains on one batch at a time. Each batch must fit in GPU memory to be processed. To reduce the amount of GPU memory that is needed, you can configure the tuning experiment to postpone making adjustments until more than one batch is processed. Tuning runs on a batch and its performance metrics are calculated, but the prompt vector isn't changed. Instead, the performance information is collected over some number of batches before the cumulative performance metrics are evaluated.

Use the following parameters to control how the training data is segmented:

Batch size Number of labeled examples (also known as samples) to process at one time.

For example, for a data set with 1,000 examples and a batch size of 10, the data set is divided into 100 batches of 10 examples each.

If the training data set is small, specify a smaller batch size to ensure that each batch has enough examples in it.

Accumulation steps: Number of batches to process before the prompt vector is adjusted.

For example, if the data set is divided into 100 batches and you set the accumulation steps value to 10, then the prompt vector is adjusted 10 times instead of 100 times.

Initializing prompt tuning

When you create an experiment, you can choose whether to specify your own text to serve as the initial prompt vector or let the experiment generate it for you. These new tokens start the training process either in random positions, or based on the embedding of a vocabulary or instruction that you specify in text. Studies show that as the size of the underlying model grows beyond 10 billion parameters, the initialization method that is used becomes less important.

The choice that you make when you create the tuning experiment customizes how the prompt is initialized.

Initialization method: Choose a method from the following options:

  • Text: The Prompt Tuning method is used where you specify the initialization text of the prompt yourself.
  • Random: The Prompt Tuning method is used that allows the experiment to add values that are chosen at random to include with the prompt.

Initialization text: The text that you want to add. Specify a task description or instructions similar to what you use for zero-shot prompting.

Managing the learning rate

The learning rate parameter determines how much to change the prompt vector when it is adjusted. The higher the number, the greater the change to the vector.

Choosing the number of training runs to complete

The Number of epochs parameter specifies the number of times to cycle through the training data.

For example, with a batch size of 10 and a data set with 1,000 examples, one epoch must process 100 batches and update the prompt vector 100 times. If you set the number of epochs to 20, the model is passed through the data set 20 times, which means it processes a total of 2,000 batches during the tuning process.

The higher the number of epochs and bigger your training data, the longer it takes to tune a model.

Learn more

Parent topic: Tuning a model