What Is Context Engineering?

By Joshua Noble

Context engineering, defined

Context engineering is the practice of deliberately designing, structuring and optimizing the context provided to a large language model (LLM) to produce more accurate, relevant and reliable outputs.

It encompasses two closely related and commonly discussed techniques: Prompt engineering and retrieval augmented generation (RAG):

Prompt engineering is the process of creating an LLM prompt that will lead to a better output. This method is a subset of context engineering, but narrower in scope. It doesn’t address the system prompts or any other information that the LLM has access to at inference time.

RAG focuses on ways to add information from documents or databases to the LLMs’ context window. However, RAG doesn’t address the system prompts, how message history is managed and how information from other tools is incorporated into the context.

Context engineering brings these elements together into a broader framework. Context engineering is especially important in agentic AI systems or multistep reasoning tasks, as these cases rely on the model to read from and update its context multiple times before returning an output.

The latest tech news, backed by expert insights

Stay up to date on the most important—and intriguing—industry trends on AI, automation, data and beyond with the Think newsletter. See the IBM Privacy Statement.

What is context in machine learning?

In modern machine learning (ML) systems, context is everything the model sees at inference time. Using an LLM as an example, context includes:

The system prompt (the instructions and rules the LLM receives)
The user’s query that is passed to the LLM
Any documents or information retrieved from an external source like a vector database or document store
Any data retrieved from structured data sources like JSON, CSV or schemas.
The history of the interactions and responses between the LLM and the user
The output of any additional systems such as tool outputs or agentic systems

Why is context engineering important?

Context engineering involves structuring what information to include in the context that the LLM can access in real-time, and how to format that information so the LLM can use it correctly. The context window can be thought of as the model’s short-term memory.

LLMs (and AI agents built on top of them) do not have long-term memory. They generate responses solely based on the context window they have access to at inference time. As a result, the quality and structure of the context window directly influence the model’s reasoning, tool use and outputs.

A well-managed context window will lead to better reasoning, tool calls and answers. A poorly managed context window can lead to hallucinations, confusion or irrelevant outputs.

In real-world AI engineering, when building RAG systems or agents, performance often depends more on the quality of the information in the context window than on model size or training techniques.

Think Keynotes

Power the agentic enterprise

Understand how AI-ready data platforms enable real-time insights and execution, while supporting secure, sovereign deployment across environments.

Explore watsonx.data

What are the key context engineering steps?

The core steps of the context engineering process are as follows:

Context selection
Context structuring
Prompt design and engineering
Context compression
Context sequencing
Tool and memory integration

1. Context selection

Choosing what information to include in the context. It includes filtering out irrelevant or redundant data and either reranking or reducing the context window to better fit the current task. This step also includes preventing context poisoning, when inaccurate or malicious information enters the context window.

2. Context structuring

Curating how all the information in the full context window is organized and given structure. That could mean using data formats like markdown or JSON to define instructions, providing examples of prompts and responses called few-shot learning and separating LLM instructions from data. Particularly with multistep or reasoning tasks, this kind of structure helps the LLM generate useful inferences.

3. Prompt design and engineering

How instructions are framed, the constraints and rules included in them and any requirements for the output formatting. They can be specific examples of how to format JSON responses or instructions to respond with only the name of a document rather than a description of it.

4. Context compression

Includes techniques for fitting more useful information into limited token space. Finding ways to use that space more efficiently typically doesn’t involve actual data compression algorithms like Huffman encoding. Instead, techniques like summarization and deduplication can ensure that the LLM’s memory only contains relevant information.

5. Context sequencing

Ordering matters for LLMs. The prompts passed to a model should include instructions and rules first, the most relevant context nearest to the top and deeper history or less important information further back in the window.

6. Tool and memory integration

Managing how tool outputs are inserted into context and how LLMs determine when the current context or the models’ training doesn’t contain the required information and a tool call is required.

Context engineering encompasses many ways to manage the natural language that guides how the model generates an inference. Whether the model is deciding which tool to call or what doc to use, having the right context to inform that decision will improve the results.

Core context engineering concepts

There are several concepts that underpin context engineering, including:

Context retrieval
Context generation
Context processing
Context management

Context retrieval

Context retrieval is the process of selecting and pulling relevant information from external sources to include in the model’s context so it does not rely solely on its internal training data. Effective retrieval reduces hallucinations and improves factual grounding by supplying the model with evidence it can reference during generation.

This process typically involves searching databases, document stores or knowledge bases to find content that is most relevant to a user’s query. It can be based on keyword matching, semantic similarity or hybrid approaches that combine both. The goal of context retrieval is to ensure that the model has access to accurate, relevant and up-to-date information.

Context generation

Context generation refers to the process of constructing or transforming information into a form that is more interpretable and actionable for the model, especially when raw, retrieved data is too large, noisy or unstructured.

The process can include summarizing long documents, extracting key facts, organizing data into structured formats or synthesizing multiple retrieved pieces of information into a coherent input. Context generation can also involve adding instructions, formatting the data clearly or combining retrieved content with system prompts and conversation history.

Together, context retrieval and context generation form an interdependent pipeline. Retrieval determines what information is selected, while generation determines how that information is shaped and presented. If retrieval brings in irrelevant or low-quality data, even well-structured context generation cannot fully compensate. Likewise, if generation is poorly designed, even highly relevant retrieved data might not be used effectively by the model.

Context processing

Context processing is the set of steps that transform raw inputs, retrieved data and intermediate information into a clean, organized and usable form before it is passed to a language model. It acts as the bridge between context retrieval and final context generation, ensuring that all pieces of information are coherent, relevant and optimized for the model’s reasoning. Context processing can involve:

Cleaning and normalizing data: These processes help to remove inconsistencies, noise or irrelevant elements.
Chunking and segmentation: In this step, large documents are broken into smaller, logically consistent pieces so they can fit within the context window and be more easily retrieved and understood.
Ranking and filtering: These steps reorder content based on relevance, remove redundancies or prioritize the most important facts, which helps ensure the most valuable information appears prominently in the final context.
Transformation and enrichment: Data can be converted into structured formats such as labeled text, tables or JSON-like representations. Additional metadata, summaries or annotations can be added. This step makes it easier for the model to interpret and reason over the content.
Compression: Because models have limited context windows, processing can involve summarizing or condensing information so that more useful content can fit into the available space without overwhelming the model.
Consistency and alignment: This step integrates retrieved data, user queries, system instructions and prior conversation history into a unified representation. This alignment reduces ambiguity and helps the model understand how different pieces of information relate to one another.

Context management

Context management is the ongoing process of controlling, organizing and maintaining the information that is included in a model’s context across interactions and over time. It ensures that the right information is available to the model when needed, while unnecessary or outdated information is removed or minimized. Key functions include:

Deciding what to retain and what to discard: Context management determines which parts of a conversation, retrieved documents or external data should be kept to support future reasoning and which parts can be safely omitted without losing important meaning.
Maintaining continuity across multiple interactions: In conversational systems, this means tracking relevant user inputs, system instructions and prior responses so that the model can respond consistently and with awareness of earlier context. This function can include summarizing past conversations or selectively storing key facts as memory.
Prioritizing information: Not all pieces of information have equal importance, so context management assigns priority to certain data based on relevance, recency or importance to the task. More relevant or recent information is typically kept closer to the model’s input, while less relevant details might be compressed or removed.
Handling the information lifecycle: This function works to update the context when new information becomes available, invalidate outdated data and ensure that the context reflects the most accurate and current state of knowledge. It is especially important in dynamic environments where information changes over time.
Coordinating with retrieval systems and processing systems: This step ensures that retrieved information is integrated properly, that processed data is consistent and that the overall context remains coherent. It also helps balance competing constraints such as limited token space, latency requirements and the need for high-quality outputs.

The importance of data retrieval in context engineering

Data retrieval is central to context engineering because it determines what and how external information is brought into the model’s working context at inference time. It directly controls the quality, relevance and factual grounding of the model output.

Selecting relevant, quality content

Retrieval is responsible for selecting the most useful pieces of information from potentially large and diverse data sources, such as codebases, databases or knowledge bases. If it surfaces relevant and accurate content, then the model can produce responses that are both precise and trustworthy.

Conversely, poor retrieval can introduce irrelevant, redundant or misleading information, which increases the likelihood of hallucinations or incorrect conclusions.

Managing context window limitations

Retrieval also plays a key role in managing the limitations of the model’s context window. Retrieval mechanisms must prioritize and rank information so that the most important content is included. Techniques such as semantic search, hybrid search and reranking are essential parts of context engineering to ensure that the limited space is used effectively.

Enabling dynamic knowledge access

Retrieval enables dynamic knowledge access, as in a RAG generative AI (genAI) system. Instead of relying solely on training data, a system can pull in up-to-date or domain-specific information at run time. This capability is especially important in enterprise use cases, where the most valuable knowledge often resides in proprietary or frequently changing data sources.

Preparing data for effective consumption

Finally, retrieval interacts closely with how context is structured and presented. The way retrieved data is chunked, ordered and formatted influences how well the model can interpret and use it. Even highly relevant information can be underutilized if it is poorly organized or buried among less important details.

Retrieval is not just about finding data, but about preparing it to be effectively consumed within the broader context engineering pipeline.

Understanding databases and data formats for effective context engineering

Databases help determine how information is stored, retrieved, structured and ultimately injected into a model’s context. Different storage systems and formats influence what can be retrieved efficiently and how easily the model can interpret the retrieved data. Below are the key database categories that are important to understand:

Relational databases

Traditional relational databases (such as PostgreSQL or MySQL) store structured data in tables with defined schemas. They are useful when context engineering requires precise queries, filtering and joins across well-defined fields. Data retrieved from relational systems often needs to be converted into a readable textual or structured format before being passed to a model.

NoSQL databases

NoSQL databases, such as MongoDB or Cassandra, handle semistructured or unstructured data and are more flexible than relational databases. They are often used when storing documents, logs or user-generated content, which are common inputs in context engineering pipelines.

Vector databases

Vector databases are especially important in modern context engineering. Systems like Pinecone, Weaviate and FAISS store embeddings and enable semantic search. Instead of exact matching, they retrieve information based on meaning and similarity, which is critical for RAG systems. These databases determine which pieces of content are most relevant to a query at a conceptual level.

Search engines

Search engines such as Elasticsearch or OpenSearch are also widely used. They support keyword-based and hybrid search, combining lexical and semantic retrieval. These features make them useful when both exact matches and conceptual similarity are important for building context.

Graph databases

Graph databases, such as Neo4j, represent data as nodes and relationships. These tools are valuable when context depends on connections between entities, such as knowledge graphs or recommendation systems. They help retrieve not just facts, but relationships, which can improve reasoning in complex queries.

In addition to databases, data formats are equally important because they determine how retrieved information is presented to the model.

Structured formats

Structured formats such as JSON and XML are commonly used because they preserve hierarchy and meaning, making it easier for models to interpret relationships between fields.

Tabular formats

Tabular formats such as CSV are useful for numerical or categorical data but often need extra labeling or explanation to be meaningful in context.

Unstructured formats

Unstructured formats, such as plain text, PDFs and HTML, require preprocessing steps like chunking, cleaning and summarization before use. The way these formats are transformed directly impacts how well the model can understand and use the information.

Hybrid and multimodal formats

Hybrid and multimodal formats include combinations of text with images, code or metadata. Context engineering must account for how to represent these diverse inputs in a way that fits within the model’s input constraints while preserving their meaning.

Overall, different databases determine how information is stored and retrieved, while data formats determine how that information is represented and understood. Effective context engineering requires aligning both so that the right information is not only retrieved but also presented in a way the model can use effectively.

Context engineering tools and technologies

There are a wide range of tools and technologies designed to help data scientists and machine learning engineers manage the context of one or many LLMs. Some of the most common include:

Vector databases: These databases (such as OpenSearch, Pinecone, Weaviate and FAISS) enable semantic retrieval based on meaning rather than exact matches. They store embeddings and allow context engineering pipelines to retrieve the most relevant pieces of information for a specific query, which is critical in RAG.
Frameworks and orchestration tools: Python libraries such as LangChain and LlamaIndex help developers build automated pipelines that handle data retrieval, data validation, user preference structures, prompt construction templates, tool usage and memory. They provide abstractions for chaining together different steps in context engineering workflows.
Embedding models and APIs: Services like OpenAI Embeddings API or open source models from Hugging Face convert text into vector representations that can be stored and searched in vector databases. Without embeddings, semantic retrieval would not be possible.
Evaluation and monitoring tools: Platforms such as Weights & Biases or LangSmith help track how context is constructed and how it affects model outputs. These tools allow developers to debug retrieval quality, prompt structure and overall system performance.

Authors

Joshua Noble

Data Scientist

Your AI can't act on what it can't access

Unify and access your data to help scale your AI

What is context engineering?