Documentation Index
Fetch the complete documentation index at: https://wwwpoc.ibm.com/llms.txt
Use this file to discover all available pages before exploring further.
Model Collection
View the full Granite Vision collection on Hugging Face
Run locally with Ollama
Download and run Granite Vision with Ollama
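For example (the model tag below is an assumption; check the Ollama library for the current Granite Vision tag):

```shell
# Pull and run a Granite Vision model locally with Ollama
# (tag is an assumption -- verify the current name in the Ollama library)
ollama run granite3.2-vision
```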
Demo
Check out Granite Vision in action
Multimodal RAG
Try out this Granite x Docling Multimodal RAG recipe
Overview
The Granite Vision models are designed for enterprise applications, specializing in visual document understanding. They are capable of performing a wide range of tasks, including extracting information from tables, charts, diagrams, sketches, and infographics, as well as general image analysis.

The family of vision models also includes Granite Vision Embedding, a novel multimodal embedding model for document retrieval. It enables queries on documents containing tables, charts, infographics, and complex layouts. By eliminating the need for text extraction, Vision Embedding simplifies and accelerates retrieval-augmented generation (RAG) pipelines.

Despite its lightweight architecture, Granite Vision 4.1 achieves excellent performance on chart, table, and semantic key-value pair (KVP) extraction. For detailed performance metrics and full evaluation results, see the Granite Vision 4.1 model card. Granite Vision models are released under the Apache 2.0 license, making them freely available for both research and commercial purposes, with full transparency into their training data.

Granite Vision Paper
Granite Vision 4 Technical Blog
Getting started
Follow the steps below to get started with Granite Vision 4.1 4B. This model is optimized for chart, table, and key-value pair extraction from enterprise documents.
Setup
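A minimal environment sketch for the Transformers examples; the package list is an assumption, not an official requirements file:

```shell
# Python 3.11 environment (matches the tested version noted below)
python3.11 -m venv .venv
source .venv/bin/activate

# Assumed packages for the Transformers examples; pin versions as needed
pip install transformers torch pillow
```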
Tested with python=3.11
Usage with Transformers
Chart and Table Tasks
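As a sketch of the Transformers flow (the model ID is an assumption; confirm the released name on the Hugging Face collection), you build a chat conversation whose image entry the chat template expands into the model's image tag:

```python
# Hypothetical model ID -- confirm the released name on the Hugging Face collection.
MODEL_ID = "ibm-granite/granite-vision-4.1-4b"

# One chat turn pairing an image with a question; the chat template expands the
# "image" entry into the model's image tag, so no manual tag handling is needed.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "chart.png"},
            {"type": "text", "text": "Extract the values in this bar chart as a table."},
        ],
    }
]


def run(messages):
    """Generate a response (needs transformers, torch, and a model download)."""
    from transformers import AutoModelForVision2Seq, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForVision2Seq.from_pretrained(MODEL_ID, device_map="auto")
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    return processor.decode(output[0], skip_special_tokens=True)
```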
You can pass image tags in your chat messages and the chat template handles the rest.
Key-Value Pair Extraction (KVP)
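The exact VAREX prompt wording is not reproduced here; as an illustrative sketch with hypothetical field names, you attach a JSON Schema to the request and ask the model for conforming JSON:

```python
import json

# JSON Schema describing the fields to extract (field names are illustrative).
schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "string"},
        "due_date": {"type": "string"},
    },
}

# VAREX-style instruction: hand the model the schema and request matching JSON.
# (The exact prompt wording VAREX expects is an assumption -- see the model card.)
prompt = (
    "Extract the following fields from the document and answer with a JSON "
    "object that conforms to this JSON Schema:\n" + json.dumps(schema, indent=2)
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "invoice.png"},
            {"type": "text", "text": prompt},
        ],
    }
]
```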
For KVP extraction, use the VAREX prompt format: provide a JSON Schema describing the fields to extract, and the model will return a JSON object with the extracted values.
Usage with vLLM
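A standard source build looks like this (the VLLM_USE_PRECOMPILED shortcut is optional and assumes a recent vLLM tree that supports it):

```shell
# Build vLLM from a source tree that contains commit d249a9e
git clone https://github.com/vllm-project/vllm.git
cd vllm
# Optionally reuse precompiled kernels to skip a long compile
VLLM_USE_PRECOMPILED=1 pip install -e .
```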
Granite Vision 4.1 is supported natively in vLLM as of commit d249a9e. Until an official release ships, install vLLM from source.
Serving with native LoRA
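Assuming the model ID matches the Hugging Face release, serving is a single command; no explicit LoRA flags are sketched here because vLLM is described as applying the adapter automatically:

```shell
# Start the OpenAI-compatible server (model ID is an assumption)
vllm serve ibm-granite/granite-vision-4.1-4b
```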
The model ships as a LoRA adapter on top of Granite 4.1 Micro. vLLM applies the adapter automatically for image requests, while text-only requests use the base model.
Client example
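A stdlib-only sketch of such a client; the server URL, model name, and image URL are assumptions that must match your deployment:

```python
import json
from urllib import request

# OpenAI-compatible chat payload: the image travels as an image_url content part.
# Model name and image URL are assumptions -- align them with your `vllm serve` setup.
payload = {
    "model": "ibm-granite/granite-vision-4.1-4b",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
                {"type": "text", "text": "Summarize this chart."},
            ],
        }
    ],
}


def query(base_url="http://localhost:8000/v1"):
    """POST the request to the running vLLM server and return the reply text."""
    req = request.Request(
        base_url + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```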
Query the running server using the OpenAI-compatible API.
Usage with Docling
Docling integrates Granite Vision for document conversion pipelines:
- Table structure recognition: use Granite Vision instead of the default TableFormer model (pip install docling[vlm])
- Chart data extraction: extract structured data from bar, pie, and line charts (pip install docling[granite_vision])
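As a rough sketch of Docling's conversion entry point (the file path is hypothetical, and wiring Granite Vision into the pipeline is configured through Docling's pipeline options, which are left to the Docling documentation):

```python
def convert(path="report.pdf"):
    """Convert a document with Docling's default pipeline (requires `pip install docling`).

    Swapping Granite Vision in for table-structure recognition or chart extraction
    goes through Docling's pipeline options; the exact option names are not shown
    here -- see the Docling documentation for the vlm / granite_vision extras.
    """
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(path)
    return result.document.export_to_markdown()
```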