Date: 2025-09-24

Granite-Docling is purpose-built for accurate and efficient document conversion, unlike most VLM-based approaches to optical character recognition (OCR) that aim to adapt large, general-purpose models to the task. Even at an ultra-compact 258M parameters, Granite-Docling’s capabilities rival those of systems several times its size, making it extremely cost-effective. The model goes well beyond mere text extraction: it handles both inline and floating math and code, excels at recognizing table structure and preserves the layout and structure of the original document. Whereas conventional OCR models convert documents directly to Markdown and lose connection to the source content, Granite-Docling’s unique method of faithfully translating complex structural elements makes its output ideal for downstream RAG applications.

Granite-Docling was developed by the team behind the celebrated open source Docling library, which turned one year old earlier this month. Docling provides tools, models and a command-line interface for document conversion, as well as plug-and-play integration with agentic AI workflows. Whereas the Docling library enables customizable ensemble pipelines, Granite-Docling is a single 258M-parameter VLM that parses and processes documents in one shot.

The new Granite-Docling is a product-ready evolution of the experimental SmolDocling-256M-preview model released by IBM Research in partnership with Hugging Face in March 2025. Granite-Docling replaces the SmolLM-2 language backbone used in SmolDocling with a Granite 3-based architecture and replaces the SigLIP visual encoder with the updated SigLIP2, but otherwise retains SmolDocling’s general methodology (while exceeding its performance).

Crucially, Granite-Docling addresses certain instabilities present in SmolDocling-256M-preview, such as an occasional tendency to get stuck in a loop, repeating the same token at a particular point on a page. While some imperfections are inevitable in any model, reliable enterprise use at scale requires confidence that no individual error will derail the workflow itself. IBM Research mitigated these instabilities in Granite-Docling through extensive dataset filtering and cleaning, removing samples with inconsistent or missing annotations as well as samples with irregularities that introduced counterproductive ambiguities.
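To make the repeated-token failure mode concrete, here is a minimal sketch of a downstream guard that a conversion pipeline could run over a model's output tokens. This is not IBM Research's mitigation, which was applied to the training data as described above; the function name `find_repetition_loop` and the `max_run` threshold are hypothetical, chosen only for illustration.

```python
from typing import Optional, Sequence, Tuple


def find_repetition_loop(
    tokens: Sequence[str], max_run: int = 20
) -> Optional[Tuple[int, str]]:
    """Return (start_index, token) for the first run of one identical token
    that is longer than max_run, or None if no such run exists.

    Hypothetical helper: the threshold is an assumption and would need
    tuning against real output, since legitimate content can also repeat.
    """
    if max_run < 1:
        raise ValueError("max_run must be at least 1")

    run_start = 0
    for i in range(1, len(tokens) + 1):
        # A run ends when the token changes or the sequence ends.
        if i == len(tokens) or tokens[i] != tokens[run_start]:
            if i - run_start > max_run:
                return run_start, tokens[run_start]
            run_start = i
    return None
```

A pipeline could call this on each page's decoded tokens and flag or retry any page for which it returns a result, so that a single degenerate page does not stall a large batch.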

Like SmolDocling before it, Granite-Docling accurately captures document content and structure at a fraction of the computational requirements of most competitive offerings. Performance evaluations on common document understanding benchmarks are provided in Granite-Docling-258M’s Hugging Face model card.