OpenRAG is a framework developed by IBM for building Retrieval-Augmented Generation (RAG) systems that connect large language models (LLMs) to external data sources like documents, databases, or knowledge bases.
It is a comprehensive, single-package RAG platform built on Langflow, Docling and OpenSearch. It provides tools to process data, retrieve relevant information and generate answers grounded in that data.
Fundamentally, OpenRAG is an open source toolkit for building state-of-the-art and context-aware AI systems that answer questions using your own data.
Large language models alone rely only on their training data and cannot access private or updated information. RAG applications solve this issue by retrieving relevant documents with semantic search and supplying them to the model as context, producing better informed and more reliable AI agents. OpenRAG provides a complete stack to implement this pipeline.
A typical pipeline takes a user query and passes it to a retriever, which performs a vector or hybrid search to find relevant documents. Those documents are reordered by a reranker, and the LLM then generates an answer using information from the most relevant ones.
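The retrieve → rerank → generate flow described above can be sketched in a few lines of Python. This is a toy illustration, not OpenRAG's actual implementation: the word-overlap retriever stands in for a vector or hybrid search against OpenSearch, and the `generate` function returns the assembled prompt instead of calling a real model.

```python
# Minimal sketch of a RAG pipeline: retrieve -> rerank -> generate.
# The scoring here is toy word overlap; a real deployment would use
# embeddings and a cross-encoder reranker.

def tokens(s: str) -> set[str]:
    """Crude tokenizer: lowercase, strip basic punctuation."""
    return set(s.lower().replace(".", "").replace("?", "").split())

def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    """Stand-in for vector/hybrid search: rank documents by word overlap."""
    q = tokens(query)
    scored = sorted(documents, key=lambda d: len(q & tokens(d)), reverse=True)
    return scored[:k]

def rerank(query: str, docs: list[str]) -> list[str]:
    """Reorder candidates; a real system would use a cross-encoder model."""
    q = tokens(query)
    return sorted(docs, key=lambda d: len(q & tokens(d)) / (len(d.split()) + 1), reverse=True)

def generate(query: str, context: list[str]) -> str:
    """Stand-in for the LLM call: returns the grounded prompt it would send."""
    return "Answer using only this context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

docs = [
    "OpenSearch supports hybrid vector and keyword retrieval.",
    "Docling parses PDFs into structured documents.",
    "Langflow orchestrates multi-step RAG workflows.",
]
query = "How does retrieval work?"
answer_prompt = generate(query, rerank(query, retrieve(query, docs)))
```

The key structural point is the same as in production systems: the model only sees the query plus whatever the retriever surfaced, so answer quality depends directly on retrieval quality.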
OpenRAG can integrate with enterprise artificial intelligence platforms such as IBM watsonx.data® as well as with other platforms, and with a range of LLMs and hosting options for those models. Organizations can connect the framework to models from different providers, including models hosted on IBM watsonx.ai® or on other external model hosting services.
The OpenRAG framework is open and composable so that companies can combine different embedding models, search engines, orchestration layers and LLMs depending on infrastructure and governance requirements.
OpenRAG is aimed at building enterprise gen AI assistants and knowledge systems. Common applications include customer support assistants that search documentation and support tickets, and internal knowledge copilots that answer employee questions using company documents. These applications can also include analytics tools that let users query large collections of reports or policies in natural language.
By combining retrieval, indexing and generative AI into a coordinated system, OpenRAG helps organizations deploy agentic applications grounded in their own data rather than relying solely on general-purpose model knowledge.
One configuration is a fully self-hosted architecture in which all components run inside an organization’s infrastructure. In this setup, document ingestion tools parse files and convert them into embeddings, the retrieval system is hosted internally on a Kubernetes cluster, and the orchestration workflow runs within the same environment.
For example, an enterprise could run OpenSearch for indexing and hybrid retrieval on internal servers. It could also use orchestration tools such as Langflow or other pipeline frameworks locally and host open-weight language models through inference servers like vLLM or Hugging Face Text Generation Inference. This architecture is common for organizations with strict data residency or security requirements because all documents, embeddings and model inference remain inside the organization’s network.
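In such a self-hosted setup, the application's queries would go straight to the internal OpenSearch cluster. The sketch below builds the body of a hybrid query combining a keyword (BM25) clause with a k-NN vector clause; the index and field names (`docs`, `text`, `embedding`) are assumptions for illustration, and OpenSearch's hybrid query additionally requires a search pipeline with a score-normalization processor, which is omitted here.

```python
# Sketch of a hybrid (keyword + vector) OpenSearch query body.
# Field names "text" and "embedding" are illustrative assumptions.

def build_hybrid_query(query_text: str, query_vector: list[float], k: int = 10) -> dict:
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    # BM25 keyword clause over the raw text field
                    {"match": {"text": {"query": query_text}}},
                    # k-NN clause over the embedding field
                    {"knn": {"embedding": {"vector": query_vector, "k": k}}},
                ]
            }
        },
    }

body = build_hybrid_query("password reset procedure", [0.1, 0.2, 0.3], k=5)
# With the opensearch-py client, this would be sent roughly as:
#   client.search(index="docs", body=body,
#                 params={"search_pipeline": "hybrid-pipeline"})
```

Because the query, the index and the embeddings all stay on internal servers, nothing about the documents ever leaves the network, which is the point of this architecture.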
For organizations that want to use models hosted by model providers, OpenRAG separates retrieval infrastructure from model inference. In this approach enterprise data and vector indexes remain self-hosted, while the language model runs in a managed cloud service.
For example, documents might be processed and indexed internally using OpenSearch, while queries are sent to external model providers such as the OpenAI API, for access to generative pretrained transformer (GPT) models, or Anthropic’s Claude for generation. The retrieval results are added to the prompt before the request is sent to the model. This configuration allows organizations to keep proprietary data within their infrastructure while benefiting from large, continuously updated hosted models.
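The "retrieve locally, generate remotely" step can be sketched as follows: retrieved passages are folded into the prompt, and only that prompt crosses the network boundary. The message format matches typical chat-completion APIs; the actual provider call is shown only as a comment, and the model name in it is an assumption.

```python
# Sketch of prompt assembly before an external model call.
# Only the assembled messages (not the raw index) leave the network.

def build_messages(question: str, passages: list[str]) -> list[dict]:
    # Number the passages so the model can cite them in its answer.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return [
        {"role": "system",
         "content": "Answer using only the provided context. Cite passages by number."},
        {"role": "user",
         "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

messages = build_messages(
    "What is the retention period for audit logs?",
    ["Audit logs are retained for 90 days.", "Access logs rotate weekly."],
)
# e.g., with an OpenAI-style client (model name is an assumption):
#   response = client.chat.completions.create(model="gpt-4o", messages=messages)
```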
Organizations that already use the IBM AI stack can deploy OpenRAG with IBM’s managed data and model platforms. In this configuration, enterprise data is stored and indexed within IBM watsonx.data, while models are served through IBM watsonx.ai using IBM Granite® models.
Retrieval pipelines, document processing and agent workflows can run within the IBM environment while still relying on open tools like OpenSearch for search. This architecture simplifies integration with IBM governance, security and data management services, which can be important for regulated industries.
Another configuration distributes different components across multiple cloud environments. For instance, ingestion and indexing might run on one cloud provider while model inference runs on another platform optimized for GPU workloads.
A retrieval index might run in a Kubernetes cluster while LLM inference is handled by a managed GPU service. In such an architecture, orchestration layers coordinate calls between services, retrieving documents from the search index, passing them to the model, and returning responses to the application.
OpenRAG also supports architectures where the retrieval system interacts with multiple services during reasoning. In these configurations, the orchestration layer acts as an agent controller that decides when to retrieve documents, query structured data sources or call external APIs.
The workflow engine might use Langflow to manage multi-step reasoning loops. During a single query, the system might retrieve documents from OpenSearch, query structured data systems, and make multiple calls to the language model before producing a final answer. This architecture supports complex tasks such as investigative research, analytics workflows or multi-step enterprise question answering.
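An agent-controller loop of this kind can be sketched as below. The tools are stubs and the routing policy is a deliberately trivial assumption (a real controller would typically ask the LLM which tool to call next); in OpenRAG the equivalent steps would be Langflow components calling OpenSearch, a database and the language model.

```python
# Sketch of a multi-step agent controller: per step, decide whether to
# retrieve documents, query structured data, or stop and answer.

def search_docs(q: str) -> list[str]:
    return [f"doc hit for: {q}"]            # stub for OpenSearch retrieval

def query_table(q: str) -> list[dict]:
    return [{"metric": "revenue", "q": q}]  # stub for a SQL lookup

def controller(question: str, max_steps: int = 3) -> dict:
    evidence: list = []
    for _ in range(max_steps):
        # Toy routing policy (assumption): metric questions hit structured
        # data first, then documents; a real controller would let the LLM decide.
        if "revenue" in question and not any(isinstance(e, dict) for e in evidence):
            evidence += query_table(question)
        elif not any(isinstance(e, str) for e in evidence):
            evidence += search_docs(question)
        else:
            break  # enough evidence; hand off to the LLM for the final answer
    return {"question": question, "evidence": evidence}

result = controller("How did revenue trend last quarter?")
```

The loop structure is what matters: each iteration may consult a different service, and the accumulated evidence, rather than a single retrieval pass, grounds the final generation.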
A lighter-weight configuration can be used for experimentation or departmental deployments. In this setup, developers run a simplified OpenRAG stack on a single machine or small cluster. Document ingestion, embedding generation and indexing all run locally, while models are served by a lightweight inference server or accessed through localhost. This architecture is useful for proofs of concept or developer experimentation before scaling to larger distributed environments.
The flexibility of OpenRAG comes from its separation of concerns: document ingestion, retrieval infrastructure, orchestration logic and model inference can each be deployed independently. This modularity allows organizations to design architectures that meet their requirements for data governance, scalability, cost and performance while still using a consistent RAG framework.
OpenRAG enables many compelling real-world use cases and can be deployed quickly and at scale.
One major use case is the creation of enterprise knowledge assistants that allow employees to query internal documentation with natural language questions. These systems typically rely on internal data like technical documentation, engineering manuals, internal wikis and project reports. With OpenRAG, documents are parsed with Docling, then processed and indexed into a searchable knowledge base.
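Between parsing and indexing sits a chunking step: parsed text is split into overlapping passages so that retrieval can return focused excerpts rather than whole manuals. The sketch below uses a simple word-window chunker; the window and overlap sizes are arbitrary assumptions, and Docling and OpenRAG provide their own chunking utilities.

```python
# Sketch of overlapping word-window chunking for indexing.
# Window/overlap sizes are illustrative assumptions.

def chunk(text: str, window: int = 120, overlap: int = 20) -> list[str]:
    words = text.split()
    step = window - overlap  # each chunk starts `step` words after the last
    return [
        " ".join(words[i:i + window])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

passages = chunk("word " * 300)  # a 300-word dummy document
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one passage, which keeps retrieval from missing answers that sit at the seams.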
When a user asks a question, the system retrieves the most relevant passages from the knowledge base and provides them as context to a language model. The model then generates a grounded answer that references the retrieved material. Employees can ask how to configure a system, where a specific policy is documented or what procedures apply to a particular scenario. The system effectively functions as an intelligent search and explanation layer over internal knowledge.
Imagine that a company is running a modified and slightly older version of a commonly used application for data analysis. If a user asks a generic chatbot for instructions on how to generate a report with the application, they’ll get inaccurate information.
This result happens because the internet will likely contain instructions only for the newest, unmodified version of the software. An OpenRAG system, by contrast, retrieves instructions specific to the internal version and provides the user with accurate instructions and guidance.
Customer support automation and technical help desks are another area where retrieval-augmented agents are valuable. Organizations often maintain large knowledge bases consisting of troubleshooting guides, support tickets, product documentation, firmware notes and customer issue histories. These materials are processed and indexed so that when a customer or support agent asks a question, OpenRAG retrieves the relevant information and the answer incorporates it.
An LLM could be fine-tuned on historical support conversations and product terminology to improve response accuracy. Documents parsed with Docling supply factual grounding that helps the language model synthesize information into clear explanations or step-by-step instructions.
Users might interact with the system through a web chat, a support portal or embedded AI assistants in applications. The system might guide customers through troubleshooting steps, summarize known issues or help agents quickly locate relevant documentation during live support calls.
OpenRAG can be used to build compliance assistants that analyze regulatory documents and internal policy frameworks. Organizations operating in regulated industries must often interpret complex sets of rules that come from government agencies, internal governance teams and industry standards bodies. These documents might include regulations, compliance manuals, risk assessments, audit reports and legal guidance.
The system processes these documents into a knowledge base and indexes them for retrieval. A domain-adapted language model can be fine-tuned on legal or regulatory text so that it better understands the structure and terminology of policies.
Users interact with the system by asking questions such as whether a proposed action complies with internal rules, how a specific regulation applies to a business process or what reporting requirements exist for a specific situation. The assistant retrieves relevant passages from the regulatory corpus and generates explanations that reference the original sources.
Another use case involves research environments that require exploration of large collections of documents. This process can include scientific papers, financial reports, patents or internal analytical studies. A knowledge base could be built from these documents and indexed so that researchers can query them conversationally.
Fine-tuned language models trained on domain literature can help interpret technical content and generate summaries or comparisons between documents. Users could interact with the system through analytical interfaces that allow them to ask complex research questions, request summaries of relevant papers, or compare findings across multiple documents. The system would retrieve relevant sections of reports and generate synthesized insights based on those sources.
OpenRAG facilitates interactions that combine structured data with document retrieval. For example, organizations might maintain datasets, reports, dashboards and analytical notes that describe business performance. The retrieval system indexes explanatory documents and metadata about datasets, while the language model helps interpret the results.
In such a system, users might ask analytical questions about trends, metrics or historical performance. The assistant retrieves raw data using SQL, or previously generated reports associated with the relevant data, and uses them to generate contextualized explanations. This type of interaction helps non-technical users explore complex data environments by asking questions in natural language rather than writing SQL queries, using spreadsheets or navigating multiple systems.
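The structured-data side of such an assistant can be sketched as follows: a natural-language question is mapped to SQL (hard-coded here; a real system would have the LLM generate it), the rows are fetched, and question, query and results are packed into a prompt for the model to explain. The table and column names are invented for illustration.

```python
# Sketch of combining SQL results with an LLM explanation prompt.
# The "sales" table and the hard-coded SQL are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quarter TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("Q1", 120.0), ("Q2", 135.5)])

question = "How did revenue change between Q1 and Q2?"
sql = "SELECT quarter, revenue FROM sales ORDER BY quarter"  # stand-in for LLM-generated SQL
rows = conn.execute(sql).fetchall()

# The model sees the question, the query and the raw rows, and is asked
# to produce a plain-language explanation grounded in those rows.
prompt = (
    "Explain these query results in plain language.\n"
    f"Question: {question}\nSQL: {sql}\nRows: {rows}"
)
```

Grounding the explanation in the actual rows, rather than asking the model to recall figures, is what keeps the generated narrative consistent with the data.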
OpenRAG can power collaborative knowledge interfaces that allow teams to interact with shared knowledge bases during projects. Documents such as design specifications, architecture diagrams, planning notes and research summaries can be indexed and retrieved during discussions. The assistant can provide summaries, extract key insights or help teams compare design alternatives.
A fine-tuned model trained on domain terminology improves access to and understanding of specialized technical content. Users interact with the assistant through chat systems, meeting assistants or embedded tools within development environments. These systems help teams quickly retrieve relevant knowledge and synthesize information from large document collections during collaborative work.
In all these use cases, OpenRAG enables a consistent pattern: documents and data sources are processed into a searchable knowledge base, retrieval systems locate relevant context, and LLMs generate grounded responses based on that retrieved information. The specific architecture, models and interaction patterns can vary widely depending on the organization’s data sources, security requirements and application goals.
OpenRAG is an open source project hosted on GitHub, with scripts to initialize the project through the Python tool runner uvx and Docker containers to install and manage Docling and OpenSearch. You can get started running everything on localhost on a laptop or desktop machine, or configure OpenRAG on platforms like watsonx®.