In this recipe, you’ll learn how to combine several advanced tools to build an AI-powered multimodal RAG pipeline. This tutorial guides you through the following processes:
  1. Document preprocessing: Learn how to handle documents from various sources, parse and transform them into usable formats, and store them in vector databases using Docling. You will use a Granite vision-language model to generate text descriptions of the images in the documents.
  2. RAG: Understand how to connect LLMs such as Granite with external knowledge bases to enhance query responses and generate valuable insights.
  3. LangChain for workflow integration: Discover how to use LangChain to streamline and orchestrate document processing and retrieval workflows, enabling seamless interaction between different components of the system.
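The three steps above can be sketched end to end with a toy, pure-Python example. This is a stand-in, not the recipe itself: in the real pipeline Docling does the parsing, an embedding model and vector database handle storage, and the retrieved chunk is passed to a Granite LLM. Here, bag-of-words vectors and cosine similarity illustrate the same parse → index → retrieve flow:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Bag-of-words 'embedding' (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. "Parse" documents into text chunks (Docling's job in the real pipeline).
chunks = [
    "Granite is a family of open-source language models from IBM.",
    "Docling converts PDFs and other documents into structured text.",
    "LangChain orchestrates retrieval and generation workflows.",
]

# 2. Store each chunk with its embedding (a vector database's job in the real pipeline).
index = [(chunk, embed(chunk)) for chunk in chunks]

# 3. Retrieve the chunk most similar to the query; in RAG, this chunk would
#    be inserted into the LLM prompt as grounding context.
query = embed("which tool converts documents into text?")
best_chunk, _ = max(index, key=lambda item: cosine(query, item[1]))
print(best_chunk)
```

The real recipe swaps each stand-in for its production counterpart while keeping this exact shape of data flow.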
This recipe uses three cutting-edge technologies:
  • Docling: An open-source toolkit used to parse and convert documents.
  • Granite: A family of state-of-the-art models that provides robust natural language capabilities, including a vision-language model for image-to-text generation.
  • LangChain: A powerful framework used to build applications powered by language models, designed to simplify complex workflows and integrate external tools seamlessly.
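One detail worth seeing concretely is how the Granite vision-language model makes a multimodal document searchable: each image is replaced by a generated text description so the whole document can be embedded as text. The sketch below uses a hypothetical `caption` function in place of the real vision-model call made through Replicate in the recipe:

```python
def caption(image_ref: str) -> str:
    # Hypothetical stand-in for a Granite vision-language model call that
    # returns a text description of the referenced image.
    return f"[image description for {image_ref}]"

# A parsed document interleaves text blocks and image references. Replacing
# each image with its generated caption yields a purely textual document
# that a text embedding model can index alongside everything else.
parsed_blocks = [
    ("text", "Quarterly revenue grew 12%."),
    ("image", "figure-1.png"),
    ("text", "Growth was driven by cloud services."),
]
textual_blocks = [
    content if kind == "text" else caption(content)
    for kind, content in parsed_blocks
]
print(textual_blocks[1])  # "[image description for figure-1.png]"
```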
You will need a Replicate API token to run this recipe in Colab. Instructions for obtaining this credential can be found here.
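The Replicate Python client and LangChain’s Replicate integration both read the token from the `REPLICATE_API_TOKEN` environment variable, so a minimal setup cell, sketched here with a placeholder value you would replace with your own token, looks like:

```python
import os

# Set the token once per session; the replicate client and LangChain's
# Replicate wrapper pick it up from the environment automatically.
# Replace the placeholder with your actual token (in Colab, you would
# typically paste it interactively rather than hard-coding it).
os.environ.setdefault("REPLICATE_API_TOKEN", "<your-replicate-api-token>")
```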

Get started

Explore sample code in a GitHub repo

Try it out

Execute sample code in Colab