Context engineering is the practice of deliberately designing, structuring and optimizing the context provided to a large language model (LLM) to produce more accurate, relevant and reliable outputs.
It encompasses two closely related and commonly discussed techniques: Prompt engineering and retrieval augmented generation (RAG):
Context engineering brings these elements together into a broader framework. Context engineering is especially important in agentic AI systems or multistep reasoning tasks, as these cases rely on the model to read from and update its context multiple times before returning an output.
Stay up to date on the most important—and intriguing—industry trends on AI, automation, data and beyond with the Think newsletter. See the IBM Privacy Statement.
In modern machine learning (ML) systems, context is everything the model sees at inference time. Using an LLM as an example, context includes:
Context engineering involves structuring what information to include in the context that the LLM can access in real-time, and how to format that information so the LLM can use it correctly. The context window can be thought of as the model’s short-term memory.
LLMs (and AI agents built on top of them) do not have long-term memory. They generate responses solely based on the context window they have access to at inference time. As a result, the quality and structure of the context window directly influence the model’s reasoning, tool use and outputs.
A well-managed context window will lead to better reasoning, tool calls and answers. A poorly managed context window can lead to hallucinations, confusion or irrelevant outputs.
In real-world AI engineering, when building RAG systems or agents, performance often depends more on the quality of the information in the context window than on model size or training techniques.
The core steps of the context engineering process are as follows:
Choosing what information to include in the context. It includes filtering out irrelevant or redundant data and either reranking or reducing the context window to better fit the current task. This step also includes preventing context poisoning, when inaccurate or malicious information enters the context window.
Curating how all the information in the full context window is organized and given structure. That could mean using data formats like markdown or JSON to define instructions, providing examples of prompts and responses called few-shot learning and separating LLM instructions from data. Particularly with multistep or reasoning tasks, this kind of structure helps the LLM generate useful inferences.
How instructions are framed, the constraints and rules included in them and any requirements for the output formatting. They can be specific examples of how to format JSON responses or instructions to respond with only the name of a document rather than a description of it.
Includes techniques for fitting more useful information into limited token space. Finding ways to use that space more efficiently typically doesn’t involve actual data compression algorithms like Huffman encoding. Instead, techniques like summarization and deduplication can ensure that the LLM’s memory only contains relevant information.
Ordering matters for LLMs. The prompts passed to a model should include instructions and rules first, the most relevant context nearest to the top and deeper history or less important information further back in the window.
Managing how tool outputs are inserted into context and how LLMs determine when the current context or the models’ training doesn’t contain the required information and a tool call is required.
Context engineering encompasses many ways to manage the natural language that guides how the model generates an inference. Whether the model is deciding which tool to call or what doc to use, having the right context to inform that decision will improve the results.
There are several concepts that underpin context engineering, including:
Context retrieval is the process of selecting and pulling relevant information from external sources to include in the model’s context so it does not rely solely on its internal training data. Effective retrieval reduces hallucinations and improves factual grounding by supplying the model with evidence it can reference during generation.
This process typically involves searching databases, document stores or knowledge bases to find content that is most relevant to a user’s query. It can be based on keyword matching, semantic similarity or hybrid approaches that combine both. The goal of context retrieval is to ensure that the model has access to accurate, relevant and up-to-date information.
Context generation refers to the process of constructing or transforming information into a form that is more interpretable and actionable for the model, especially when raw, retrieved data is too large, noisy or unstructured.
The process can include summarizing long documents, extracting key facts, organizing data into structured formats or synthesizing multiple retrieved pieces of information into a coherent input. Context generation can also involve adding instructions, formatting the data clearly or combining retrieved content with system prompts and conversation history.
Together, context retrieval and context generation form an interdependent pipeline. Retrieval determines what information is selected, while generation determines how that information is shaped and presented. If retrieval brings in irrelevant or low-quality data, even well-structured context generation cannot fully compensate. Likewise, if generation is poorly designed, even highly relevant retrieved data might not be used effectively by the model.
Context processing is the set of steps that transform raw inputs, retrieved data and intermediate information into a clean, organized and usable form before it is passed to a language model. It acts as the bridge between context retrieval and final context generation, ensuring that all pieces of information are coherent, relevant and optimized for the model’s reasoning. Context processing can involve:
Context management is the ongoing process of controlling, organizing and maintaining the information that is included in a model’s context across interactions and over time. It ensures that the right information is available to the model when needed, while unnecessary or outdated information is removed or minimized. Key functions include:
Data retrieval is central to context engineering because it determines what and how external information is brought into the model’s working context at inference time. It directly controls the quality, relevance and factual grounding of the model output.
Retrieval is responsible for selecting the most useful pieces of information from potentially large and diverse data sources, such as codebases, databases or knowledge bases. If it surfaces relevant and accurate content, then the model can produce responses that are both precise and trustworthy.
Conversely, poor retrieval can introduce irrelevant, redundant or misleading information, which increases the likelihood of hallucinations or incorrect conclusions.
Retrieval also plays a key role in managing the limitations of the model’s context window. Retrieval mechanisms must prioritize and rank information so that the most important content is included. Techniques such as semantic search, hybrid search and reranking are essential parts of context engineering to ensure that the limited space is used effectively.
Retrieval enables dynamic knowledge access, as in a RAG generative AI (genAI) system. Instead of relying solely on training data, a system can pull in up-to-date or domain-specific information at run time. This capability is especially important in enterprise use cases, where the most valuable knowledge often resides in proprietary or frequently changing data sources.
Finally, retrieval interacts closely with how context is structured and presented. The way retrieved data is chunked, ordered and formatted influences how well the model can interpret and use it. Even highly relevant information can be underutilized if it is poorly organized or buried among less important details.
Retrieval is not just about finding data, but about preparing it to be effectively consumed within the broader context engineering pipeline.
Databases help determine how information is stored, retrieved, structured and ultimately injected into a model’s context. Different storage systems and formats influence what can be retrieved efficiently and how easily the model can interpret the retrieved data. Below are the key database categories that are important to understand:
Traditional relational databases (such as PostgreSQL or MySQL) store structured data in tables with defined schemas. They are useful when context engineering requires precise queries, filtering and joins across well-defined fields. Data retrieved from relational systems often needs to be converted into a readable textual or structured format before being passed to a model.
Vector databases are especially important in modern context engineering. Systems like Pinecone, Weaviate and FAISS store embeddings and enable semantic search. Instead of exact matching, they retrieve information based on meaning and similarity, which is critical for RAG systems. These databases determine which pieces of content are most relevant to a query at a conceptual level.
Search engines such as Elasticsearch or OpenSearch are also widely used. They support keyword-based and hybrid search, combining lexical and semantic retrieval. These features make them useful when both exact matches and conceptual similarity are important for building context.
Graph databases, such as Neo4j, represent data as nodes and relationships. These tools are valuable when context depends on connections between entities, such as knowledge graphs or recommendation systems. They help retrieve not just facts, but relationships, which can improve reasoning in complex queries.
In addition to databases, data formats are equally important because they determine how retrieved information is presented to the model.
Structured formats such as JSON and XML are commonly used because they preserve hierarchy and meaning, making it easier for models to interpret relationships between fields.
Tabular formats such as CSV are useful for numerical or categorical data but often need extra labeling or explanation to be meaningful in context.
Unstructured formats, such as plain text, PDFs and HTML, require preprocessing steps like chunking, cleaning and summarization before use. The way these formats are transformed directly impacts how well the model can understand and use the information.
Hybrid and multimodal formats include combinations of text with images, code or metadata. Context engineering must account for how to represent these diverse inputs in a way that fits within the model’s input constraints while preserving their meaning.
Overall, different databases determine how information is stored and retrieved, while data formats determine how that information is represented and understood. Effective context engineering requires aligning both so that the right information is not only retrieved but also presented in a way the model can use effectively.
There are a wide range of tools and technologies designed to help data scientists and machine learning engineers manage the context of one or many LLMs. Some of the most common include:
Watsonx.data enables you to scale analytics and AI with all your data, wherever it resides, through an open, hybrid and governed data store.
Deliver high-quality, trusted data faster by automating DataOps across pipelines, platforms, and teams.
Successfully scale AI with the right strategy, data, security and governance in place.