Let’s walk through a sentiment analysis task that provides a clear illustration of prompt tuning’s mechanics and benefits. Suppose the goal is to adapt a 175-billion-parameter model to classify movie reviews as “positive” or “negative.” Full fine-tuning at this scale would be prohibitively expensive and slow. With prompt tuning, the process is as follows:
Start with a frozen pretrained model: The 175B parameter backbone remains entirely untouched, preserving its vast repository of general knowledge learned during pretraining.5
Add soft prompts: A small set of trainable vectors (for example, 20 virtual tokens) is attached to the input embeddings of every movie review. These vectors are not human-readable text; they are continuous embeddings that exist in the same high-dimensional space as the model’s vocabulary (for example, a 12,288-dimensional space for a model of this scale). Through optimization, these vectors learn to encode a continuous, task-specific signal that steers the model’s behavior.
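A minimal PyTorch sketch of this step, assuming the illustrative dimensions above (20 virtual tokens in a 12,288-dimensional embedding space); the initialization shown is a common choice, not a requirement:

```python
import torch
import torch.nn as nn

# Illustrative dimensions from the example above.
NUM_VIRTUAL_TOKENS = 20
EMBED_DIM = 12_288

# The soft prompt is just a small trainable matrix living in the same
# embedding space as real tokens. Initializing it from the embeddings of
# actual vocabulary words often speeds convergence, but random init works.
soft_prompt = nn.Parameter(torch.randn(NUM_VIRTUAL_TOKENS, EMBED_DIM) * 0.02)
print(soft_prompt.shape)  # torch.Size([20, 12288])
```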
Feed the input: The soft prompts are prepended to each input sequence before it reaches the model. For example:
[Soft Prompts] The movie was absolutely fantastic!
In this example, suppose that we initialize 20 soft prompt tokens for a sentiment analysis task. After training, the input might look like this internally:
[<v1>, <v2>, <v3>, ... <v20>, The, movie, was, absolutely, fantastic, !]
Here, each <vi> is a learned, high-dimensional prompt vector. The goal of training is to find the optimal values for these vectors, so that they guide the frozen model to correctly classify the sentiment of the subsequent text.
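The prepending itself takes only a few lines of PyTorch. In the sketch below, the vocabulary size, embedding table, and token ids are stand-ins for a real model’s, chosen only to make the tensor shapes concrete:

```python
import torch
import torch.nn as nn

VOCAB_SIZE, EMBED_DIM, NUM_VIRTUAL_TOKENS = 50_000, 12_288, 20

embedding_table = nn.Embedding(VOCAB_SIZE, EMBED_DIM)  # part of the frozen backbone
soft_prompt = nn.Parameter(torch.randn(NUM_VIRTUAL_TOKENS, EMBED_DIM) * 0.02)

# Made-up token ids standing in for "The movie was absolutely fantastic !"
input_ids = torch.tensor([[101, 2143, 2001, 3294, 6919, 999]])
token_embeds = embedding_table(input_ids)              # shape (1, 6, 12288)

# Prepend the 20 learned vectors to every sequence in the batch.
prompt_embeds = soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
inputs_embeds = torch.cat([prompt_embeds, token_embeds], dim=1)
print(inputs_embeds.shape)  # torch.Size([1, 26, 12288]): [<v1>..<v20>, tokens]
```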
Train only the soft prompts: Training proceeds on a labeled dataset of movie reviews. Backpropagation computes the error gradient through the entire network, but the optimization step updates only the parameters of the soft prompt embeddings. For the dimensions above, this means tuning roughly 246,000 parameters (20 × 12,288) instead of the model’s 175 billion weights.5
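A sketch of this optimization setup, reusing `embedding_table` and `soft_prompt` from the previous snippet, and assuming a hypothetical `model` that accepts precomputed embeddings via an `inputs_embeds` argument (as in the Hugging Face Transformers convention) plus a `dataloader` of labeled reviews:

```python
import torch
import torch.nn.functional as F

# Freeze every backbone weight: the 175B parameters receive no updates.
for param in model.parameters():
    param.requires_grad = False

# The optimizer sees only the soft prompt: roughly 246K trainable values.
# (Prompt tuning typically uses a much higher learning rate than full
# fine-tuning; 0.3 is a commonly reported setting.)
optimizer = torch.optim.AdamW([soft_prompt], lr=0.3)

for input_ids, labels in dataloader:          # batches of labeled reviews
    token_embeds = embedding_table(input_ids)
    prompt_embeds = soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    inputs_embeds = torch.cat([prompt_embeds, token_embeds], dim=1)

    logits = model(inputs_embeds=inputs_embeds).logits
    loss = F.cross_entropy(logits, labels)
    loss.backward()        # gradients flow through the frozen model...
    optimizer.step()       # ...but only the soft prompt embeddings change
    optimizer.zero_grad()
```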
Deploy with modularity: Once training is complete, the resulting set of 20 vectors constitutes the entire task-specific adaptation. To adapt the same base model to a different task, such as spam detection, one simply trains a new set of soft prompts on a spam dataset and swaps them in at inference time.
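In deployment terms, switching tasks amounts to loading a different small tensor; the file paths and `classify` helper below are hypothetical, and `model` and `embedding_table` are the frozen components from the earlier sketches:

```python
import torch

# One frozen backbone serves every task; each task ships as a tiny tensor.
sentiment_prompt = torch.load("prompts/sentiment.pt")  # (20, 12288)
spam_prompt = torch.load("prompts/spam.pt")            # trained separately

def classify(input_ids: torch.Tensor, prompt: torch.Tensor) -> torch.Tensor:
    """Prepend a task's soft prompt and run the shared frozen model."""
    token_embeds = embedding_table(input_ids)
    prompt_embeds = prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    inputs_embeds = torch.cat([prompt_embeds, token_embeds], dim=1)
    return model(inputs_embeds=inputs_embeds).logits

# Same weights, different behavior: only the prepended tensor changes.
review_logits = classify(input_ids, sentiment_prompt)
email_logits = classify(input_ids, spam_prompt)
```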
This technique offers substantial efficiency benefits. Instead of storing a separate, full copy of the model for each task (a 175B parameter model can require up to 350 GB), one needs to store only the task-specific prompt parameters, which for this example amount to well under a megabyte.1 This modularity makes prompt tuning a practical and cost-effective solution for large-scale model adaptation.2
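A quick back-of-the-envelope check of that comparison, using the example’s dimensions and FP16 storage (2 bytes per parameter):

```python
# Full fine-tuned copy vs. one soft prompt, both stored in FP16.
full_model_bytes = 175e9 * 2           # ~350 GB per task-specific model copy
soft_prompt_bytes = 20 * 12_288 * 2    # ~0.5 MB per task

print(f"{full_model_bytes / 1e9:.0f} GB vs {soft_prompt_bytes / 1e6:.2f} MB")
# -> 350 GB vs 0.49 MB: roughly six orders of magnitude smaller per task
```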