Published: 21 August 2024
Contributors: Vanna Winland, Erika Russi
LlamaIndex is an open source data orchestration framework for building large language model (LLM) applications. Available in Python and TypeScript, LlamaIndex provides tools and capabilities that simplify context augmentation for generative AI (gen AI) use cases through a retrieval-augmented generation (RAG) pipeline.
LlamaIndex allows users to curate and organize their private or custom data through data integration and context augmentation.
Context augmentation is the practice of providing data to the LLM’s context window, essentially augmenting the LLM with private or external data.
Widely used open source LLMs are pretrained on large amounts of public data. These large-scale LLMs help solve many real-world problems, but they take a considerable amount of time and resources to train for a specific use case. Additionally, a model’s internal knowledge is only as current as the data it was pretrained on, so context augmentation is necessary for a model to reflect awareness of current events.
Foundation models are gaining popularity because they are flexible, reusable AI models that can be applied to nearly any domain or task. Foundation models such as the IBM Granite™ series are trained on curated data, but whether an application starts from a model trained on a large data set or from a pretrained foundation model, the model will likely need to be connected to external sources of domain-specific data. This process is facilitated by a system framework that connects the LLM to private data, tailoring it to the use case or overall goal of the application. The part of the application stack that connects LLMs to custom data sources is the data framework, such as LlamaIndex.
Data comes from many sources in many formats; it is up to the data framework to ingest, transform and organize the data for LLMs to use. Data is often siloed and unstructured. To obtain and structure it, a data framework such as LlamaIndex must run the data through a process commonly called the ingestion pipeline.
Once the data is ingested and transformed into a format the LLM can use, the next step is to convert the information into a data structure for indexing. The common procedure is converting unstructured data into vector embeddings. This process is called “creating an embedding” in natural language processing (NLP) but is referred to as “indexing” in data terminology.1 Indexing is necessary because it enables the LLM to query and retrieve the ingested data through the vector index. The data can be indexed according to the chosen querying strategy.
Data integration facilitates context augmentation by integrating private data into the context window or “knowledge base” of the LLM. IBM’s Granite 3B and 8B models’ context window length was recently extended to 128,000 tokens.2 A larger context window enables the model to retain more text in its working memory, which enhances its ability to track key details throughout extended conversations, lengthy documents and codebases. This capability allows LLM-chatbots to produce responses that are coherent both in the short term and over a more extended context.
However, even with an extended context window, a fine-tuned model might incur considerable costs in both training and inference. Fine-tuning models with specific or private data requires data transformations and systems that promote efficient data retrieval methods for LLM prompting. RAG methodology is seen as a promising option to facilitate long-context language modeling.3
RAG is one of the best-known and most widely used methods for context augmentation. RAG enables LLMs to leverage a specialized knowledge base, enhancing their ability to provide more precise answers to questions.4 The typical retrieval augmentation process is performed in three steps: the most relevant data is retrieved from the knowledge base, that context is added to the user’s prompt and the LLM generates a response from the augmented prompt.
Data frameworks such as LlamaIndex streamline the data ingestion and retrieval process by providing comprehensive API calls for every step in the RAG pattern. This process is powered by the concept of a query engine that allows users to ask questions over their data. Querying over external data and prompting LLMs with contextual knowledge makes it possible to create domain-specific LLM applications.
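For illustration, here is a minimal sketch of this pattern in Python. It assumes the llama-index package is installed, a default LLM and embedding model are configured (for example, through an OpenAI API key) and that a hypothetical ./data folder holds the source documents:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Ingest: load documents from a local folder (hypothetical path)
documents = SimpleDirectoryReader("./data").load_data()

# Index: build a vector store index over the documents (in memory by default)
index = VectorStoreIndex.from_documents(documents)

# Query: ask a natural language question over the ingested data
query_engine = index.as_query_engine()
response = query_engine.query("What does the report say about revenue?")
print(response)
```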
LlamaIndex uses RAG to add and connect external data to the data pool that LLMs already have access to. Applications including query engines, chatbots and agents use RAG techniques to complete tasks.5
LlamaIndex’s workflow can be broken down into a few steps: loading (ingesting) the data, indexing it, storing the resulting indexes and querying over the data.
The goal of this workflow is to ingest and structure private or domain-specific data and give LLMs access to it. Having access to more relevant data allows LLMs to respond more accurately to prompts, whether for building chatbots or query engines.
Data ingestion, or “loading,” is the first step in connecting external data sources to an LLM. LlamaIndex refers to data ingestion as loading the data for the application to use. Most private or custom data is siloed in formats such as application programming interfaces (APIs), PDFs, images, structured query language (SQL) databases and many more. LlamaIndex can load over 160 different data formats, including structured, semistructured and unstructured datasets.
Data connectors, also known as data “loaders,” fetch and ingest data from its native source. The gathered data is transformed into a collection of data and metadata. These collections are called “Documents” in LlamaIndex. Data connectors, or “Readers” in LlamaIndex, ingest and load various data formats. LlamaIndex features an integrated reader that converts every file in a directory into documents, including Markdown, PDFs, Word documents, PowerPoint presentations, images, audio files and videos.6 To account for other data formats not included in the built-in functionality, data connectors are available through LlamaHub, a registry of open source data loaders. This step in the workflow builds a knowledge base over which indexes can be formed, so the data can be queried and used by LLMs.
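Readers return `Document` objects that pair raw text with metadata. As a rough sketch, a `Document` can also be constructed by hand (the text and metadata values here are purely illustrative):

```python
from llama_index.core import Document

# A Document couples source text with metadata that can inform retrieval
doc = Document(
    text="LlamaIndex is a data framework for LLM applications.",
    metadata={"file_name": "overview.txt", "category": "docs"},
)
print(doc.metadata)
```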
Once the data has been ingested, the data framework needs to transform and organize the data into a structure that can be retrieved by the LLM. Data indexes structure the data into representations that LLMs can use. LlamaIndex offers several different index types that complement the application’s querying strategy, including a vector store index, a summary index and a knowledge graph index.
Once the data has been loaded and indexed, the data can be stored. LlamaIndex supports many vector stores that vary in architecture, complexity and cost. By default, LlamaIndex stores all indexed data only in memory.
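Persisting an index to disk avoids reindexing on every run. A minimal sketch, assuming an `index` built as in the earlier example and a hypothetical ./storage directory:

```python
from llama_index.core import StorageContext, load_index_from_storage

# Write the in-memory index to disk (hypothetical directory)
index.storage_context.persist(persist_dir="./storage")

# Later, rebuild the index from the persisted files instead of reindexing
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
```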
Vector store index
The vector store index is commonly used in LLM applications following the RAG pattern because it excels at handling natural language queries. Retrieval accuracy from natural language queries depends on semantic search, or search based on meaning rather than keyword matching.7 Semantic search is enabled by converting input data into vector embeddings. A vector embedding is a numerical representation of the semantics of the data that the LLM can process. The mathematical relationship between vector embeddings allows LLMs to retrieve data based on the meaning of the query terms for more context-rich responses.
The `VectorStoreIndex` takes the data collections, or `Document` objects, and splits them into `Node`s: atomic units of data that each represent a “chunk” of the source `Document`. Once the data collection is split into chunks, vector embeddings of each chunk are created. The integrated data is now in a format that the LLM can run queries over. These vector indexes can be stored to avoid reindexing. The simplest way to store data is to persist it to disk, but LlamaIndex also integrates with multiple vector databases and embedding models.8
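The `Document`-to-`Node` split can also be controlled explicitly with a node parser. A sketch using LlamaIndex’s `SentenceSplitter`, with illustrative chunk settings:

```python
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

documents = [Document(text="...long source text...")]

# Split each Document into Node chunks with some overlap between chunks
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(documents)

# Build the vector index directly over the resulting Nodes
index = VectorStoreIndex(nodes)
```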
To search embeddings, the user query is first converted into a vector embedding. Next, a mathematical process is used to rank all embeddings based on their semantic similarity to the query.
The vector store index uses top-k semantic retrieval to return the most similar embeddings along with their corresponding chunks of text. LlamaIndex is designed to facilitate the creation and management of large-scale indexes for efficient information retrieval.
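The retrieval step can also be run on its own through a retriever. A sketch assuming an existing `index`; `similarity_top_k` sets how many of the most similar chunks are returned, and the question is a hypothetical placeholder:

```python
# Return the 3 most semantically similar chunks for a query
retriever = index.as_retriever(similarity_top_k=3)
results = retriever.retrieve("What were the key findings?")

for node_with_score in results:
    # Each result carries a similarity score and the source text
    print(node_with_score.score, node_with_score.text[:80])
```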
The final step in the workflow is the implementation of query engines to handle prompt calls to LLMs. Querying consists of three distinct stages: retrieval, postprocessing and response synthesis. Retrieval is when the most relevant documents are fetched and returned from the vector index. Postprocessing is when the retrieved embedding chunks, or `Node`s, are optionally refitted by reranking, transforming or filtering. Response synthesis is when the most relevant data and the prompt are combined and sent to the LLM to return a response.
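These stages can be configured when the query engine is constructed. A sketch, assuming an `index` built as before, that retrieves five chunks and then filters out any scoring below an illustrative similarity cutoff before synthesis:

```python
from llama_index.core.postprocessor import SimilarityPostprocessor

# Retrieval: fetch the top-5 chunks; postprocessing: drop low-scoring chunks;
# response synthesis: combine the survivors with the prompt and call the LLM
query_engine = index.as_query_engine(
    similarity_top_k=5,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)
response = query_engine.query("Summarize the main risks discussed.")
print(response)
```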
Query engines
Query engines allow users to ask questions over their data, taking in a natural language query and returning a context-rich response.9 Query engines are composed of one or many indexes and retrievers. Many query engines can be used at the same time for data collections with multiple index types. LlamaIndex offers several different query engines for structured and semistructured data, for example, a JSON query engine for querying JSON documents.
Data agents are LLM-driven AI agents capable of performing a range of tasks on data, including both reading and writing functions.10 LlamaIndex data agents are LLM-powered knowledge workers that can perform automated search and retrieval over unstructured, semistructured and structured data and can call external service APIs in a structured fashion, processing and storing the responses for later use.
AI agents can interact with their external environment through a set of APIs and tooling. LlamaIndex has support for an OpenAI Function agent (built on top of the OpenAI Function API) and a ReAct agent. The core components of data agents are a “reasoning loop,” the agent’s reasoning paradigm, and “tool abstractions,” the tools themselves.
Agents use a reasoning loop, or paradigm, to solve multistep problems. In LlamaIndex, both OpenAI Function and ReAct agents follow a similar pattern to decide which tools to use, as well as the sequence and the parameters with which to call each tool. This pattern of reasoning is called ReAct, or reasoning and action. The process can be a simple one-step tool selection or a more complex process in which several tools are picked at each step.
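As a minimal sketch of this loop, a plain Python function can be wrapped as a tool and handed to a ReAct agent, which then decides when and how to call it (this assumes the OpenAI integration package is installed and an API key is configured):

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the product."""
    return a * b

# Wrap the function as a tool the agent can choose to call
multiply_tool = FunctionTool.from_defaults(fn=multiply)

# The ReAct agent reasons about which tool to call, in what order, with what arguments
agent = ReActAgent.from_tools([multiply_tool], llm=OpenAI(), verbose=True)
print(agent.chat("What is 21.7 times 3?"))
```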
Tool abstractions outline how tools are accessed and used by the agent. LlamaIndex offers Tools and ToolSpecs; a ToolSpec is a Python class that represents a full API specification that an agent can interact with. The base tool abstraction defines a generic interface that can take in a series of arguments and return a generic tool output container that can capture any response. LlamaIndex provides tool abstractions that wrap around the existing data query engines, as well as abstractions for functions that can be used by the tool spec class.
A tool spec enables users to define complete services rather than individual tools that handle specific tasks. For example, the Gmail tool spec allows the agent to both read and draft emails.11 Here is an example of how that would roughly be defined:
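A simplified sketch of the pattern follows; the actual GmailToolSpec on LlamaHub has more methods and handles authentication, so the method bodies here are placeholders:

```python
from llama_index.core.tools.tool_spec.base import BaseToolSpec

class GmailToolSpec(BaseToolSpec):
    """A tool spec exposing a service as a set of agent-callable functions."""

    # Only the functions listed here are exposed to the agent as tools
    spec_functions = ["search_messages", "create_draft"]

    def search_messages(self, query: str):
        """Search the inbox for messages matching the query."""
        ...  # placeholder: call the Gmail API here

    def create_draft(self, to: str, subject: str, message: str):
        """Create a draft email with the given recipient, subject and body."""
        ...  # placeholder: call the Gmail API here

# to_tool_list() converts each listed function into an agent-usable tool
tools = GmailToolSpec().to_tool_list()
```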
Each function is converted into a tool, using the `FunctionTool` abstraction.
LlamaIndex uses LlamaHub’s Tool Repository, which consists of 15+ tool specs for agents to interact with. These specs cover services meant to enhance and enrich the agents’ capability to perform different actions.
Utility tools
LlamaIndex offers utility tools that can augment the capabilities of existing tools. These include the OnDemandLoaderTool, which turns any existing data loader into a tool that an agent can invoke on demand to load, index and query data, and the LoadAndSearchToolSpec, which wraps an existing tool so that its output is automatically indexed and made searchable.
LlamaIndex works with foundation models such as the IBM Granite™ series and Llama 2, with model providers such as OpenAI and with other LLM frameworks and tools including LangChain and Ollama. LLMs can be used in a myriad of ways, as stand-alone modules or plugged into other core LlamaIndex modules.12 LLMs can also be used to drive AI agents that act as knowledge workers following autonomous workflows.
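For example, an LLM can be called directly as a stand-alone module, with no index or query engine involved (the model name here is illustrative):

```python
from llama_index.llms.openai import OpenAI

# Use the LLM module directly for text completion, outside any RAG pipeline
llm = OpenAI(model="gpt-4o-mini")
response = llm.complete("Explain retrieval-augmented generation in one sentence.")
print(response)
```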
LlamaIndex has expanded its functionality to include LLM-driven AI agents that act as knowledge workers. LlamaIndex AI agents follow a ReAct (reasoning and action) behavioral pattern, a reasoning paradigm that instructs the agent to plan and reason through a task while iteratively improving its responses. This type of AI agent can be interpreted as a form of chain-of-thought prompting. Agents can use tooling, such as the LlamaHub collection of data connector and agent tools, to integrate with external services and other LLMs.
LlamaIndex offers several use case examples in their docs with links to tutorials.13
Learn more about generative AI
Learn more about LLMs in this explainer article
Learn more about how to clean and store data for consistent accessibility
Learn more about how models process data with vector embedding
Learn more about the open source orchestration framework LangChain
Learn more about prompt engineering
1 Elena Lowery, “Use watsonx.ai with LlamaIndex to build RAG applications,” IBM Community, May 28, 2024, https://community.ibm.com/community/user/watsonx/blogs/elena-lowery/2024/05/28/use-watsonxai-with-llamaindex-to-build-rag-applica.
2 Matt Stallone et al., “Scaling Granite Code Models to 128K Context,” arXiv.org, July 18, 2024, https://arxiv.org/abs/2407.13739 (link resides outside ibm.com).
3 Matt Stallone et al., “Scaling Granite Code Models to 128K Context.”
4 Kim Martineau, “What Is Retrieval-Augmented Generation (RAG)?,” IBM Research, May 1, 2024, https://research.ibm.com/blog/retrieval-augmented-generation-RAG.
5 “High-Level Concepts,” LlamaIndex, https://docs.llamaindex.ai/en/stable/getting_started/concepts/ (link resides outside ibm.com).
6 “Loading Data (Ingestion),” LlamaIndex, https://docs.llamaindex.ai/en/stable/understanding/loading/loading/ (link resides outside ibm.com).
7 Elena Lowery, “Use watsonx.ai with LlamaIndex to build RAG applications.”
8 Elena Lowery, “Use watsonx.ai with LlamaIndex to build RAG applications.”
9 “Query Engine,” LlamaIndex, https://docs.llamaindex.ai/en/latest/module_guides/deploying/query_engine/ (link resides outside ibm.com).
10 Jerry Liu, “Data Agents,” Medium, July 13, 2023, https://medium.com/llamaindex-blog/data-agents-eed797d7972f (link resides outside ibm.com).
11 “Google,” LlamaIndex, https://docs.llamaindex.ai/en/stable/api_reference/tools/google/ (link resides outside ibm.com).
12 “Using LLMs,” LlamaIndex, https://docs.llamaindex.ai/en/latest/module_guides/models/llms/ (link resides outside ibm.com).
13 “Use Cases,” LlamaIndex, https://docs.llamaindex.ai/en/latest/use_cases/ (link resides outside ibm.com).
14 “Question-Answering (RAG),” LlamaIndex, https://docs.llamaindex.ai/en/stable/use_cases/q_and_a/ (link resides outside ibm.com).
15 “Agents,” LlamaIndex, https://docs.llamaindex.ai/en/stable/use_cases/agents/ (link resides outside ibm.com).