Date: 2025-07-29

In-context learning (ICL) is an advanced AI capability introduced in the seminal research paper “Language Models are Few-Shot Learners,” which unveiled GPT-3.[1] Unlike supervised learning, where a model undergoes a training phase with backpropagation to alter its parameters, ICL relies entirely on pretrained language models and keeps their parameters unchanged.

The model uses the prompt as a temporary guide to infer the task and generate the expected output. ICL works by recognizing relationships between the examples in the prompt, also known as input/output pairs, and applying the same mapping to new inputs. This process loosely resembles human analogical reasoning, where we solve new problems by drawing on previous experiences. It leverages patterns and knowledge learned during pretraining and adapts to new tasks at inference time, which makes it flexible and avoids the cost of retraining.

At its core, in-context learning works by conditioning a large language model (LLM) on a prompt that includes a set of examples (input/output pairs, also called in-context examples), typically written in natural language as part of the input sequence. These examples, often drawn from a dataset, are not used to retrain the model but are fed directly into its context window. The context window is the maximum amount of text, measured in tokens, that an LLM can process at once. It acts as the model's temporary working memory: everything the model conditions on when generating a response, including the examples, must fit inside it.
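Because the examples have to fit inside the context window, a prompt builder usually has to decide how many of them to keep. The following is a minimal sketch of that idea, not a definitive implementation. The names `rough_token_count` and `select_examples` are hypothetical, and counting whitespace-separated words is only a crude assumption standing in for a real tokenizer, which would split text differently.

```python
def rough_token_count(text: str) -> int:
    """Crude proxy for a token count: whitespace-separated words.

    Assumption: real tokenizers produce different counts; swap in the
    tokenizer of the model you actually use.
    """
    return len(text.split())


def select_examples(examples, query, budget):
    """Keep examples in order while the running total stays within budget.

    The query must always be present, so its cost is reserved first.
    Returns the kept examples and the number of (rough) tokens used.
    """
    used = rough_token_count(query)
    if used > budget:
        raise ValueError("query alone exceeds the context budget")

    kept = []
    for example in examples:
        cost = rough_token_count(example)
        if used + cost > budget:
            break  # the next example would overflow the window
        kept.append(example)
        used += cost
    return kept, used


if __name__ == "__main__":
    demos = [
        "Review: The movie was fantastic → Sentiment: Positive",
        "Review: I hated the storyline → Sentiment: Negative",
    ]
    query = "Review: The music was pleasant → Sentiment:"
    kept, used = select_examples(demos, query, budget=16)
    print(len(kept), used)
```

With a budget this small only the first example fits alongside the query; a larger budget keeps both. Stopping at the first example that overflows preserves the original ordering, which is one reasonable design choice among several.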

Formally, let the prompt consist of k examples in the form of input/output pairs:

$$C = \{(x_1, y_1), (x_2, y_2), \ldots, (x_k, y_k)\}$$

Given a new input $x$ and a candidate output space $Y = \{y_1, \ldots, y_m\}$, the model computes the probability of each possible output conditioned on the prompt:

$$P(y_j \mid x, C)$$

The prediction is determined by choosing the option with the highest probability:

$$\hat{y} = \arg\max_{y_j \in Y} P(y_j \mid x, C)$$

The model does not update its weights during this process. Instead, the transformer's attention mechanism lets it condition on the examples in the current prompt, so the pattern is inferred at inference time from the prompt alone.
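The selection rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular model is queried: `toy_log_score` is a hypothetical stand-in that scores a label by word overlap with the demonstrations carrying that label, where a real system would use the language model's log-probability of the label given the prompt. The scores are normalized over the candidate set with a softmax, which is also an assumption and does not change which candidate wins.

```python
import math


def toy_log_score(y, x, context):
    """Hypothetical stand-in for log P(y | x, C) from a language model.

    Scores label y by counting words that x shares with the
    demonstration inputs labeled y. Purely illustrative.
    """
    words = set(x.lower().split())
    overlap = sum(
        len(words & set(x_i.lower().split()))
        for x_i, y_i in context
        if y_i == y
    )
    return float(overlap)


def predict(score_fn, x, context, candidates):
    """Return the argmax candidate and the normalized distribution."""
    log_scores = [score_fn(y, x, context) for y in candidates]
    # Softmax over the candidate set, shifted by the max for stability.
    peak = max(log_scores)
    exps = [math.exp(s - peak) for s in log_scores]
    total = sum(exps)
    probs = {y: e / total for y, e in zip(candidates, exps)}
    best = max(candidates, key=probs.get)
    return best, probs


if __name__ == "__main__":
    C = [
        ("The movie was fantastic", "Positive"),
        ("I hated the storyline", "Negative"),
    ]
    Y = ["Positive", "Negative"]
    y_hat, probs = predict(toy_log_score, "The music was pleasant", C, Y)
    print(y_hat, probs)
```

Note that nothing in `predict` changes between calls: the only thing that varies is the context `C` passed in, which mirrors the point that the model's parameters stay fixed.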

To see this method in practice, consider a sentiment classification task. The prompt might look like this:

Review: The movie was fantastic → Sentiment: Positive

Review: I hated the storyline → Sentiment: Negative

Review: The music was pleasant → Sentiment:

The model completes the last line by predicting “Positive,” continuing the structure observed in the earlier input-label mappings. This example showcases few-shot learning, where the model infers the task and generates appropriate responses based on a small number of examples.
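A prompt like the one above can be assembled programmatically from labeled pairs. The sketch below is one possible way to do it; the function name `build_prompt` is hypothetical, and the "Review: … → Sentiment: …" template simply copies the layout of the example prompt rather than any required format.

```python
def build_prompt(examples, query):
    """Format labeled (text, label) pairs plus an unlabeled query.

    Each demonstration becomes one line; the final line repeats the
    template but leaves the label empty for the model to complete.
    """
    lines = [
        f"Review: {text} → Sentiment: {label}" for text, label in examples
    ]
    lines.append(f"Review: {query} → Sentiment:")
    # A blank line between entries, as in the example prompt.
    return "\n\n".join(lines)


if __name__ == "__main__":
    demos = [
        ("The movie was fantastic", "Positive"),
        ("I hated the storyline", "Negative"),
    ]
    print(build_prompt(demos, "The music was pleasant"))
```

The resulting string would then be sent to a language model as-is; the model call itself is omitted here because it depends on the provider and library in use.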