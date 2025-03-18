Retains context, smart sizing, query-optimization, efficient retrieval
As artificial intelligence (AI) and natural language processing (NLP) continue to advance, efficiently managing large documents has become essential for information retrieval in generative AI applications. Traditional fixed-size chunking often ignores document structure and splits related content across chunks. The resulting fragmented context reduces the effectiveness of retrieval-augmented generation (RAG) applications.
Agentic chunking offers a more flexible solution by using AI agents to dynamically divide text into manageable pieces while preserving coherence and context.
Agentic chunking is an AI-based method in which agents assess the content and choose a segmentation strategy instead of relying on static rules. Whereas fixed-size chunking divides text at set intervals, agentic chunking adapts to the semantics of the document. It combines AI-driven text splitting, recursive chunking and chunk overlap to segment text while preserving context, which benefits models such as GPT and frameworks such as LlamaIndex. It also sizes chunks dynamically to make good use of the context window and enriches them with metadata, improving retrieval and AI model performance.
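For contrast, here is a minimal sketch of the fixed-size baseline, assuming character-based intervals (real pipelines usually count tokens). The function name `fixed_size_chunks` is hypothetical. It cuts the text every `size` characters regardless of word or sentence boundaries, which is the fragmentation that agentic chunking tries to avoid.

```python
def fixed_size_chunks(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text every `size` characters, repeating `overlap` characters between neighbors."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window already reaches the end of the text
            break
    return chunks


print(fixed_size_chunks("Models learn. Models improve.", 10))
# ['Models lea', 'rn. Models', ' improve.']
```

The cut through "learn" shows why fixed intervals can separate words and sentences that belong together.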
Agentic chunking plays a crucial role in building effective RAG systems, allowing for better data segmentation that improves both retrieval accuracy and the coherence of responses for various use cases such as building chatbots.
Key features of agentic chunking include:
Adaptive chunking strategy: Dynamically chooses the best chunking method based on the type of content, the intent behind the query and the retrieval requirements to ensure effective segmentation.
Dynamic chunk sizing: Adjusts chunk sizes in real time based on the semantic structure and context instead of sticking to fixed token limits.
Context-preserving overlap: Determines how much adjacent chunks should overlap to keep coherence intact and avoid losing essential information, which improves retrieval.
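Agentic chunking delegates these decisions to an LLM, but dynamic sizing and overlap can be illustrated with a simple rule-based sketch. The hypothetical `semantic_chunks` function below is not agentic: fixed heuristics stand in for the agent's judgment. It packs whole sentences into chunks of roughly `max_words` words, starts a new chunk at every paragraph break, and repeats the last `overlap_sentences` sentences of a chunk at the start of the next chunk within the same paragraph.

```python
import re


def semantic_chunks(text: str, max_words: int = 50, overlap_sentences: int = 1) -> list[str]:
    """Group whole sentences into chunks, breaking at paragraphs and carrying sentence overlap."""
    chunks: list[str] = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Naive sentence split on terminal punctuation followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
        current: list[str] = []
        for sentence in sentences:
            words = sum(len(s.split()) for s in current) + len(sentence.split())
            if current and words > max_words:
                chunks.append(" ".join(current))
                # Carry trailing sentences forward so the next chunk keeps some context.
                current = current[-overlap_sentences:] if overlap_sentences else []
            current.append(sentence)
        if current:
            chunks.append(" ".join(current))
    return chunks


print(semantic_chunks("A b c. D e f. G h i.\n\nJ k.", max_words=6))
# ['A b c. D e f.', 'D e f. G h i.', 'J k.']
```

Because boundaries follow sentences and paragraphs, chunk length varies with the text instead of being fixed. An agentic chunker makes the same kind of decision by asking the model where one topic ends and the next begins.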
Agentic chunking offers advantages over traditional chunking:
Retains context: Maintains crucial information without unnecessary breaks.
Smart sizing: Adjusts chunk boundaries according to meaning and significance.
Query-optimized: Continuously refines chunks to match specific queries.
Efficient retrieval: Improves search and RAG by minimizing unnecessary fragmentation.
In this tutorial, you will experiment with an agentic chunking strategy by using the IBM Granite-3.0-8B-Instruct model, now available on watsonx.ai®. The goal is to use efficient chunking to build an effective RAG pipeline.
You need an IBM Cloud® account to create a watsonx.ai project.
While you can choose from several tools, this tutorial walks you through how to set up an IBM account to use a Jupyter Notebook.
Log in to watsonx.ai by using your IBM Cloud account.
Create a watsonx.ai project.
You can get your project ID from within your project. Click the Manage tab. Then, copy the project ID from the Details section of the General page. You need this ID for this tutorial.
Create a Jupyter Notebook.
This step opens a notebook environment where you can copy the code from this tutorial. Alternatively, you can download this notebook to your local system and upload it to your watsonx.ai project as an asset. To view more Granite® tutorials, check out the IBM Granite Community. This Jupyter Notebook along with the datasets used can be found on GitHub.
Create a watsonx.ai Runtime service instance (select your appropriate region and choose the Lite plan, which is a free instance).
Generate an API key.
Associate the watsonx.ai Runtime service instance with the project that you created in watsonx.ai.
To set our credentials, we need the WATSONX_APIKEY and WATSONX_PROJECT_ID values. We also set the URL that serves as the API endpoint.
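A minimal sketch of this step, assuming the credentials are supplied as environment variables rather than typed in interactively. The helper name `load_watsonx_credentials` and the optional `WATSONX_URL` override are hypothetical. The default URL is the us-south (Dallas) regional endpoint, so replace it if your watsonx.ai Runtime instance is in another region.

```python
import os

# Dallas (us-south) regional endpoint; change it to match your service instance's region.
DEFAULT_WATSONX_URL = "https://us-south.ml.cloud.ibm.com"


def load_watsonx_credentials() -> dict[str, str]:
    """Read watsonx.ai credentials from environment variables (hypothetical helper)."""
    required = ("WATSONX_APIKEY", "WATSONX_PROJECT_ID")
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Set these environment variables first: {', '.join(missing)}")
    return {
        "url": os.environ.get("WATSONX_URL", DEFAULT_WATSONX_URL),
        "apikey": os.environ["WATSONX_APIKEY"],
        "project_id": os.environ["WATSONX_PROJECT_ID"],
    }
```

Keeping secrets out of the notebook source avoids committing them by accident. The returned values can then be passed to whichever watsonx.ai client you use.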
For this tutorial, we suggest using IBM's Granite-3.0-8B-Instruct model as the LLM to achieve similar results, but you are free to use any AI model of your choice. The foundation models available through watsonx are listed in the watsonx.ai documentation.
This function extracts the text content from IBM's explainer page on machine learning. It removes unwanted HTML elements (scripts and styles) and returns clean, readable text.
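A dependency-free sketch of such a function, assuming only Python's standard library (`html.parser` and `urllib.request`, used here in place of any third-party HTML parser). The hypothetical `extract_text` function skips everything inside `<script>` and `<style>` elements, collapses whitespace and joins the remaining text fragments with single spaces. `fetch_text` adds the download step and needs network access.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _VisibleTextParser(HTMLParser):
    """Collects text nodes, ignoring the contents of <script> and <style> elements."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(" ".join(data.split()))  # collapse internal whitespace


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document as one whitespace-normalized string."""
    parser = _VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def fetch_text(url: str) -> str:
    """Download a page and extract its visible text (requires network access)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return extract_text(response.read().decode(charset, errors="replace"))
```

This keeps headings and paragraph text but drops navigation structure; a dedicated HTML library gives finer control over which page regions to keep.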
Instead of using a fixed-length chunking method, we use an LLM to split the text based on meaning. This function prompts the LLM to split the text into semantically meaningful chunks, one per topic.
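A sketch of the idea under stated assumptions, not the tutorial's exact code. It assumes a hypothetical `llm` callable that takes a prompt string and returns the model's text, for example a thin wrapper around a watsonx.ai text-generation call. The prompt wording and the `---` delimiter are also assumptions, chosen so that the response can be parsed reliably. If the model returns nothing usable, the function falls back to a single chunk.

```python
import re
from typing import Callable

# Hypothetical prompt: asks for topic-based chunks separated by a line containing only '---'.
CHUNKING_PROMPT = (
    "Split the text below into chunks so that each chunk covers one distinct topic.\n"
    "Copy the sentences verbatim and do not summarize.\n"
    "Separate consecutive chunks with a line that contains only ---\n\n"
    "Text:\n{text}\n\nChunks:\n"
)

# Matches a delimiter line, allowing surrounding spaces or tabs.
DELIMITER = re.compile(r"^[ \t]*---[ \t]*$", re.MULTILINE)


def agentic_chunk(text: str, llm: Callable[[str], str]) -> list[str]:
    """Let the LLM decide the chunk boundaries, then parse its delimited answer."""
    response = llm(CHUNKING_PROMPT.format(text=text))
    chunks = [part.strip() for part in DELIMITER.split(response)]
    chunks = [c for c in chunks if c]
    return chunks or [text.strip()]  # fall back to one chunk if parsing yields nothing


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call that returns two topic chunks.
    return "Supervised learning uses labeled data.\n---\nModel optimization reduces error.\n"


source = "Supervised learning uses labeled data. Model optimization reduces error."
print(agentic_chunk(source, fake_llm))
# ['Supervised learning uses labeled data.', 'Model optimization reduces error.']
```

Asking for verbatim sentences keeps the chunks faithful to the source, which matters when they are later retrieved as evidence.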
Let's print the chunks for better understanding of their output structure.
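A small formatting helper can make the structure easier to inspect. The hypothetical `preview_chunks` function numbers each chunk, reports its word count (a rough proxy for size, since agentic chunks vary in length) and truncates long chunks to a fixed preview width.

```python
def preview_chunks(chunks: list[str], width: int = 60) -> str:
    """Return one line per chunk: index, word count and a truncated preview."""
    lines = []
    for i, chunk in enumerate(chunks, start=1):
        flat = " ".join(chunk.split())  # keep each preview on a single line
        preview = flat if len(flat) <= width else flat[: width - 3] + "..."
        lines.append(f"Chunk {i} | words={len(flat.split())} | {preview}")
    return "\n".join(lines)


print(preview_chunks(["Supervised learning uses labeled data.", "Model optimization\nreduces error."]))
# Chunk 1 | words=5 | Supervised learning uses labeled data.
# Chunk 2 | words=4 | Model optimization reduces error.
```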
Great! The output shows the chunks that the agents created.
Now that we have experimented with agentic chunking on the text, let's move along with our RAG implementation.
For this tutorial, we take the chunks produced by the agents and convert them to vector embeddings. Chroma DB is an open source vector store that we can use, and the langchain_chroma package gives us easy access to its functionality. Let's initialize our Chroma vector database, provide it with our embeddings model and add the documents produced by agentic chunking.
Create a Chroma vector database.
Convert each text chunk into a document object.
Add the documents to the vector database.
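The tutorial itself needs Chroma and a real embeddings model. To show what these three steps do without those dependencies, here is a toy in-memory substitute. It is not Chroma, and its bag-of-words `embed` function is a deliberate simplification of a neural embedding model. The `Document` class and the `add_documents` and `similarity_search` method names mirror LangChain's vector store interface, but everything below is a self-contained, hypothetical sketch.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    """Minimal stand-in for a LangChain Document: chunk text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real pipeline would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Stores (vector, document) pairs and ranks them by cosine similarity."""

    def __init__(self) -> None:
        self._items: list[tuple[Counter, Document]] = []

    def add_documents(self, docs: list[Document]) -> None:
        self._items.extend((embed(d.page_content), d) for d in docs)

    def similarity_search(self, query: str, k: int = 2) -> list[Document]:
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


chunks = [
    "Supervised learning uses labeled data.",
    "Model optimization adjusts weights to reduce error.",
    "Unsupervised learning finds patterns in unlabeled data.",
]
store = InMemoryVectorStore()  # step 1: create the vector database
docs = [Document(c, {"chunk_id": i}) for i, c in enumerate(chunks)]  # step 2: wrap each chunk
store.add_documents(docs)  # step 3: embed and store the documents
print(store.similarity_search("How does model optimization work?", k=1)[0].page_content)
# Model optimization adjusts weights to reduce error.
```

Storing the chunk index in `metadata` makes it possible to trace each retrieved passage back to the agentic chunk it came from.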
Now, we can create a prompt template for our LLM. This template ensures that we can ask multiple questions while maintaining a consistent prompt structure. Additionally, we can integrate our vector store as the retriever, finalizing the RAG framework.
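As a rough sketch of how the template and retriever fit together, assuming plain Python callables in place of LangChain objects: `retrieve` returns the text of the most relevant chunks for a question, and `llm` sends a prompt to the model. The template wording and the `build_rag_chain` name are hypothetical.

```python
from typing import Callable

# Hypothetical template that grounds the answer in the retrieved chunks.
RAG_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say that you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_rag_chain(
    retrieve: Callable[[str], list[str]],
    llm: Callable[[str], str],
) -> Callable[[str], str]:
    """Return a function that retrieves context, fills the template and calls the LLM."""

    def ask(question: str) -> str:
        context = "\n\n".join(retrieve(question))
        return llm(RAG_TEMPLATE.format(context=context, question=question))

    return ask


# Usage with stand-ins: the "LLM" echoes its prompt so the assembled prompt is visible.
ask = build_rag_chain(lambda q: ["Model optimization reduces error."], lambda prompt: prompt)
print(ask("What is model optimization?"))
```

Because every question goes through the same template, answers stay comparable across queries, and calling `llm` directly with only the question gives a no-context baseline for comparison.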
Using our agentic chunks in the RAG workflow, let's run a user query. First, we strategically prompt the model without any additional context from the vector store to test whether it answers from its built-in knowledge or truly uses the RAG context. The question is based on IBM's machine learning explainer.
The model's answer does not match the model optimization process described in the explainer. Without outside tools or information, it cannot provide the correct information and instead hallucinates. Now, let's send the same query to the RAG chain built on our agentic chunks.
Great! The Granite model correctly used the agentic RAG chunks as context to provide us with correct information about the model optimization process while preserving semantic coherence.
In this tutorial, you created chunks by using agentic chunking and built a RAG pipeline. Agentic chunks improved the LLM output by dynamically retrieving, refining and structuring information for more accurate and context-aware responses.
Using the Granite-3.0-8B-Instruct model, we successfully produced appropriate responses to a user query about the documents provided as context.
The text we used for this RAG implementation was loaded from IBM's explainer page on machine learning. The model provided information that it could access only through the provided context, because that content was not part of its initial knowledge base.
