Creating a fine-tuning experiment
Create a fine-tuning experiment that you can run to fine tune a foundation model from the Tuning Studio.
Starting with the 2.1.1 release, you can use the Low-rank Adaptation (LoRA) and Quantized Low-rank Adaptation (QLoRA) fine-tuning techniques programmatically. You cannot fine tune a foundation model by using the LoRA or QLoRA techniques from the Tuning Studio. For more information, see Tuning a foundation model programmatically.
Prerequisite task: Tuning Studio
To continue fine tuning a foundation model, complete the following steps:
- Choose the Generation task type. For more information about formatting prompts, see Verbalizer settings.
- Add the training data that is used to tune the model. You can upload a file or use an asset from your project. To see examples of how to format your file, expand What should your data look like?, and then click Preview template. For more information, see Data formats. A minimal sample file is also sketched after these steps.
- Optional: If you want to change the token size of the examples that are used during training, expand What should your data look like? to make adjustments. For more information, see Setting fine-tuning token limits.
- Optional: Click Configure parameters to edit the parameters that are used by the tuning experiment. The tuning run is configured with parameter values that represent a good starting point for tuning a model; you can adjust them if you want. For more information about the available parameters and what they do, see Tuning parameters. After you change parameter values, click Save.
- Click Start tuning. The tuning experiment begins. The run might take from one to many hours, depending on the size of your training data and the availability of compute resources. When the experiment is finished, the status shows as completed.
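The training data file pairs each input with the output that you want the tuned model to learn to produce. As a quick illustration, the following sketch writes and validates a small file in JSON Lines format. The file name and example records are hypothetical; use Preview template and the Data formats topic for the exact formats that your deployment accepts.

```python
import json

# Hypothetical training examples. Each record pairs an "input" prompt with
# the "output" text that you want the tuned model to generate.
examples = [
    {"input": "Summarize: The meeting covered budget and hiring plans.",
     "output": "Budget and hiring plans were discussed."},
    {"input": "Summarize: The server was patched and restarted overnight.",
     "output": "The server was patched and restarted."},
]

# Write one JSON object per line (JSON Lines format).
with open("train_data.jsonl", "w", encoding="utf-8") as f:
    for record in examples:
        f.write(json.dumps(record) + "\n")

# Sanity check: every line must parse and contain both fields.
with open("train_data.jsonl", encoding="utf-8") as f:
    for line_number, line in enumerate(f, start=1):
        record = json.loads(line)
        assert "input" in record and "output" in record, line_number
```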
Setting fine-tuning token limits
For natural language models, words are converted to tokens. 256 tokens is equal to approximately 130 to 170 words. 128 tokens is equal to approximately 65 to 85 words. However, token numbers are difficult to estimate and can differ by model. For more information, see Tokens and tokenization.
Each foundation model has a maximum sequence length, which is an upper limit to the number of tokens in the input prompt plus the number of tokens in the generated output from the model. Maximum sequence length is also known as context window length.
For a fine-tuning experiment, you can adjust the total sequence length that is applied to inference requests during the experiment. Use a sequence length that is long enough to encompass the text from the input-and-output example pairs in your training data in full. Otherwise, the examples will be truncated, leaving the output text incomplete. If the experiment uses incomplete output examples as the standard against which to compare model outputs, poor results are likely.
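Before you start an experiment, you can estimate whether your longest examples fit within a candidate sequence length by counting tokens with a tokenizer that matches your model. The following is a minimal sketch that uses the Hugging Face transformers library with a placeholder model name; because token counts differ by model, substitute the tokenizer for the foundation model that you are tuning.

```python
from transformers import AutoTokenizer

# Placeholder model name; use the tokenizer for the model that you are tuning.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")

# Candidate value for the Maximum sequence length slider.
MAX_SEQUENCE_LENGTH = 1024

def fits(input_text: str, output_text: str) -> bool:
    """Check that an input-and-output pair fits in the sequence length."""
    # The limit covers the input prompt plus the generated output, so count
    # both together. The verbalizer labels add a few more tokens on top.
    token_count = len(tokenizer(input_text + " " + output_text)["input_ids"])
    return token_count <= MAX_SEQUENCE_LENGTH

print(fits("Summarize: The meeting covered budget and hiring plans.",
           "Budget and hiring plans were discussed."))
```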
Verbalizer settings
A verbalizer specifies how to format the training examples that are submitted to the foundation model during a tuning experiment.
The verbalizer for fine tuning has two parts:
- Sample template
Defines the format in which the input and output example pairs from your training data are submitted to the model during a tuning experiment.
The default verbalizer has the following format:
### Input: {{input}} ### Response: {{output}}
The default verbalizer format adds the label ### Input: before the input text that it imports from the input field in the training data file. The verbalizer adds the label ### Response: before the output text that it imports from the corresponding output field in the training data file.
If you must change the verbalizer, do not change the {{input}} or {{output}} variables. These variables instruct the tuning experiment to extract text from the input and output segments of the examples in your training data file.
- Response sequence
Model output label segment of the verbalizer. The experiment uses the text in this field to determine where the input ends and the model output begins.
The default response sequence has the following format:
### Response:
The response sequence text must exactly match the output label from the verbalizer.
If you change the verbalizer text, you must update the response sequence also. Adjust the Response template character range slider until the field shows the output label text and nothing else. If the response label starts on a new line in the verbalizer, include the new line in the response sequence also.
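To make the template mechanics concrete, the following sketch applies the default verbalizer to one training record in plain Python and uses the response sequence to find where the input ends and the expected output begins. This illustrates the logic only; it is not the Tuning Studio implementation, and the training record is a hypothetical example.

```python
# Default verbalizer and response sequence from this topic.
VERBALIZER = "### Input: {{input}} ### Response: {{output}}"
RESPONSE_SEQUENCE = "### Response:"

def verbalize(record: dict) -> str:
    """Fill the {{input}} and {{output}} variables from one training record."""
    return (VERBALIZER
            .replace("{{input}}", record["input"])
            .replace("{{output}}", record["output"]))

record = {"input": "Summarize: The server was patched overnight.",
          "output": "The server was patched."}
text = verbalize(record)
print(text)

# The response sequence splits the formatted example into the prompt part
# and the expected model output. It must match the verbalizer label exactly.
prompt, expected_output = text.split(RESPONSE_SEQUENCE, maxsplit=1)
print(repr(prompt))
print(repr(expected_output))
```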
Troubleshooting a fine-tuning experiment
If your fine-tuning experiment does not complete successfully, and one of the following messages is displayed, try these solutions.
- Out of memory
An out of memory message means that there aren't enough resources available to complete the tuning experiment. You can try to reduce the demand on resources by making changes to the experiment configuration.
To change the configuration parameters that are mentioned in the following list, create a new tuning experiment. After selecting the task type, click Configure parameters to make changes, and then save your changes and start tuning.
- Reduce the batch size. Large batch sizes can increase the memory footprint. Although larger batch sizes can result in faster training times, try reducing the value if you run out of memory repeatedly. Set the Batch size slider to the value that you want to use.
- Reduce the number of gradient accumulation steps. Gradient accumulation steps contribute to the overall batch size, so try a lower value. Set the Accumulate steps slider to the value that you want to use. For how batch size, accumulation steps, and GPU count combine, see the sketch at the end of this troubleshooting section.
- Increase the number of GPUs. Fine tuning a large model consumes more memory and requires more GPUs. To dedicate more GPUs to the experiment, set the Number of GPUs slider to a higher number.
- Reduce the sequence length of the dataset to the lowest practical value. Larger sequence lengths consume more memory. If you can reduce the sequence length without harming the quality of the training data that is used during the experiment, do so. Consider this option especially if the sequence length is more than 4,000 tokens.
Remember, if the sequence length is too low, the output examples from your training data are truncated. This truncation means that incomplete output examples are used as the standard against which the foundation model output is compared when the quality of the results is evaluated. Using incomplete examples as a standard is likely to produce poor fine-tuning results.
To change the sequence length, exit the Configure parameters page. From the Add training data panel, expand the What should your data look like? section. Set the Maximum sequence length slider to the value that you want to use.
- RuntimeError: The size of tensor a must match the size of tensor b
This message is occasionally displayed when the dataset is small. From the Configure parameters page of the tuning experiment, set the Accumulate steps slider to 1.
- Could not find response key in the following instance
The text that is specified in the response template segment of the verbalizer cannot be found in the training data examples. Some tokenizers tokenize words at the start of a sequence differently from other parts of a sequence. To avoid this inconsistency, include a newline separator at the start of the response template in the verbalizer.
For example:
verbalizer: "### Input: {{input}} \n\n### Response: {{output}}" response_template: “\n### Response:”
Parent topic: Tuning Studio