The way language models process and segment text is shifting from a static approach to a more responsive one. Unlike traditional fixed-size chunking, which splits large documents at fixed points, agentic chunking employs AI-based techniques to analyze content dynamically and determine the best way to segment the text.
Agentic chunking combines AI-based text-splitting methods, recursive chunking, and chunk overlap, which work together to refine segmentation, preserving links between notable ideas while optimizing context windows in real time. With agentic chunking, each chunk is enriched with metadata to improve retrieval accuracy and overall model efficiency. This is particularly important in RAG applications, where the segmentation of data directly impacts retrieval quality and the coherence of the response. Because meaningful context is preserved in each of the smaller chunks, this approach is especially valuable for chatbots, knowledge bases, and generative AI (gen AI) use cases. Frameworks such as LangChain and LlamaIndex further improve retrieval efficiency, making this method highly effective.
1. Adaptive chunking strategy: Dynamically choosing the best chunking method based on the type of content, the intent behind the query, and the retrieval requirements to ensure effective segmentation.
2. Dynamic chunk sizing: Modifying chunk sizes in real time by considering the semantic structure and context, instead of sticking to fixed token limits.
3. Context-preserving overlap: Smartly assessing the overlap between chunks to keep coherence intact and avoid losing essential information, thereby enhancing retrieval efficiency.
Agentic chunking offers advantages over traditional chunking:
a. Retains context: Maintains crucial information without unnecessary breaks.
b. Smart sizing: Adjusts chunk boundaries according to meaning and significance.
c. Query-optimized: Continuously refines chunks to match specific queries.
d. Efficient retrieval: Improves search and RAG system output by minimizing unnecessary fragmentation.
In this tutorial, you will experiment with an agentic chunking strategy by using the IBM Granite-3.0-8B-Instruct model, now available on watsonx.ai®. The overall goal is to perform efficient chunking to effectively implement RAG.
You need an IBM Cloud® account to create a watsonx.ai project.
While you can choose from several tools, this tutorial walks you through how to set up an IBM account to use a Jupyter Notebook.
Log in to watsonx.ai by using your IBM Cloud account.
Create a watsonx.ai project.
You can get your project ID from within your project. Click the Manage tab. Then, copy the project ID from the Details section of the General page. You need this ID for this tutorial.
Create a Jupyter Notebook.
This step opens a notebook environment where you can copy the code from this tutorial. Alternatively, you can download this notebook to your local system and upload it to your watsonx.ai project as an asset. To view more Granite® tutorials, check out the IBM Granite Community. This Jupyter Notebook, along with the datasets used, can be found on GitHub.
Create a watsonx.ai Runtime service instance (select your appropriate region and choose the Lite plan, which is a free instance).
Generate an API key.
Associate the watsonx.ai Runtime service instance to the project that you created in watsonx.ai.
You will need a few libraries and modules for this tutorial. Make sure to import the following ones; if any are not installed, a quick pip installation resolves the problem.
Note: This tutorial was built using Python 3.12.7.
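The exact package set depends on your environment. A minimal sketch of the installs and imports used in the rest of this tutorial (the package list is an assumption based on the tools referenced later):

```python
# Install dependencies (run once; the list is illustrative, not a pinned requirements file).
# %pip install langchain-ibm langchain-chroma langchain-core requests beautifulsoup4

import os
import getpass

import requests
from bs4 import BeautifulSoup

from langchain_ibm import WatsonxLLM              # LangChain wrapper for watsonx.ai LLMs
from langchain_chroma import Chroma               # Chroma vector store integration
from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate
```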
To set our credentials, we need the WATSONX_APIKEY and WATSONX_PROJECT_ID values. We will also set the URL serving as the API endpoint.
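A minimal sketch for loading these credentials, assuming they are stored as environment variables or entered interactively:

```python
# Read credentials from the environment, or prompt for them if they are not set.
WATSONX_APIKEY = os.environ.get("WATSONX_APIKEY") or getpass.getpass("watsonx.ai API key: ")
WATSONX_PROJECT_ID = os.environ.get("WATSONX_PROJECT_ID") or input("watsonx.ai project ID: ")

# API endpoint; replace with the URL for the region of your watsonx.ai Runtime instance.
WATSONX_URL = "https://us-south.ml.cloud.ibm.com"
```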
For this tutorial, we suggest using IBM's Granite-3.0-8B-Instruct model as the LLM to achieve similar results. You are free to use any AI model of your choice. The foundation models available through watsonx can be found here.
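A hedged sketch of initializing the Granite model through the langchain_ibm wrapper (the decoding parameters are illustrative assumptions):

```python
llm = WatsonxLLM(
    model_id="ibm/granite-3-8b-instruct",  # Granite-3.0-8B-Instruct on watsonx.ai
    url=WATSONX_URL,
    apikey=WATSONX_APIKEY,
    project_id=WATSONX_PROJECT_ID,
    params={
        "decoding_method": "greedy",  # deterministic output for reproducible chunking
        "max_new_tokens": 2000,
    },
)
```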
This function extracts the text content from IBM's explainer page on machine learning, removes unwanted HTML elements (such as scripts and styles), and returns clean, readable text.
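A minimal sketch of such a function (the page URL and the parsing details are assumptions):

```python
def extract_text(url: str) -> str:
    """Fetch a web page and return its visible text, with scripts and styles removed."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Remove elements that carry no readable content.
    for tag in soup(["script", "style"]):
        tag.decompose()

    # Collapse whitespace into clean, readable text.
    return " ".join(soup.get_text(separator=" ").split())

text = extract_text("https://www.ibm.com/think/topics/machine-learning")
```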
Instead of using a fixed-length chunking method, this function leverages an LLM to split the text into semantically meaningful chunks based on topics.
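One way to sketch this splitter is to ask the LLM to reproduce the text with an explicit marker at each topic boundary, then split on that marker (the prompt wording and marker token are assumptions):

```python
def agentic_chunk(text: str, llm) -> list[str]:
    """Use the LLM to segment text at topic boundaries marked with <<<CHUNK>>>."""
    prompt = (
        "Split the following text into coherent, self-contained chunks, one per topic. "
        "Reproduce the text verbatim and insert the marker <<<CHUNK>>> between chunks.\n\n"
        f"{text}"
    )
    response = llm.invoke(prompt)
    # Split on the marker and discard empty fragments.
    return [chunk.strip() for chunk in response.split("<<<CHUNK>>>") if chunk.strip()]

chunks = agentic_chunk(text, llm)
```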
Let's print the chunks for better understanding of their output structure.
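For example:

```python
for i, chunk in enumerate(chunks, start=1):
    print(f"--- Chunk {i} ({len(chunk)} characters) ---")
    print(chunk[:300], "\n")  # preview the first 300 characters of each chunk
```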
Great! The agents successfully created the chunks, as shown in the output.
Now that we have experimented with agentic chunking on the text, let's move along with our RAG implementation.
For this tutorial, we take the chunks produced by the agents and convert them to vector embeddings. An open source vector store that we can use is Chroma DB, and we can easily access Chroma functionality through the langchain_chroma package. Let's initialize our Chroma vector database, provide it with our embeddings model, and add the documents produced by agentic chunking, as in the sketch after the following steps:
Create a Chroma vector database.
Convert each text chunk into a document object.
Add the documents to the vector database.
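A hedged sketch of these three steps (the embeddings model ID is an assumption; any embeddings model supported by LangChain works):

```python
from langchain_ibm import WatsonxEmbeddings

# 1. Create a Chroma vector database backed by a watsonx.ai embeddings model.
embeddings = WatsonxEmbeddings(
    model_id="ibm/slate-125m-english-rtrvr",
    url=WATSONX_URL,
    apikey=WATSONX_APIKEY,
    project_id=WATSONX_PROJECT_ID,
)
vector_store = Chroma(collection_name="agentic_chunks", embedding_function=embeddings)

# 2. Convert each text chunk into a Document object.
documents = [Document(page_content=chunk) for chunk in chunks]

# 3. Add the documents to the vector database.
vector_store.add_documents(documents)
```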
Now, we can create a prompt template for our LLM. This template ensures that we can ask multiple questions while maintaining a consistent prompt structure. Additionally, we can integrate our vector store as the retriever, finalizing the RAG framework.
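A minimal LCEL-style sketch of this chain (the template wording is illustrative):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

template = PromptTemplate.from_template(
    "Answer the question using only the context provided.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"
)

# The vector store doubles as the retriever that supplies context to the prompt.
retriever = vector_store.as_retriever()

def format_docs(docs):
    # Join the retrieved chunks into a single context string.
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | template
    | llm
    | StrOutputParser()
)
```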
Using these agentic chunks in the RAG workflow, let's start with a user query. First, we prompt the model without any additional context from the vector store we built, to test whether the model answers from its built-in knowledge or truly needs the RAG context. Using the machine learning explainer from IBM as the source, let's ask the question.
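For example, querying the bare model directly (the question below is illustrative):

```python
query = "What is the model optimization process in machine learning?"

# Baseline: the LLM alone, relying only on its built-in knowledge.
print(llm.invoke(query))
```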
Clearly, the model was not trained on information about the model optimization process, and without outside tools or information it cannot give us a correct answer: the model hallucinates. Now, let's provide the same query to the RAG chain built on our agentic chunks.
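Passing the same question through the RAG chain:

```python
# Same question, now grounded in the agentic chunks retrieved from Chroma.
print(rag_chain.invoke(query))
```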
Great! The Granite model correctly used the agentic chunks as context to provide correct information about the model optimization process while preserving semantic coherence.
In this tutorial, we generated smaller pieces of relevant information using AI agents in the chunking process and constructed a retrieval-augmented generation (RAG) pipeline.
This method improves information retrieval and context window optimization by using artificial intelligence and natural language processing (NLP). It streamlines data chunks to enhance retrieval efficiency when working with large language models (LLMs) such as OpenAI's GPT models.