This article was featured in the Think newsletter.
A new model from AI research lab DeepSeek has gone viral, thanks to a novel technique that improves AI’s ability to remember by mimicking human memory. DeepSeek-OCR garnered 4,000 stars on GitHub within its first 24 hours and has stayed atop Hugging Face’s most popular model list for well over a week.
Instead of feeding text into the model as tokens, as most models do, DeepSeek-OCR converts documents into images, which the lab’s research shows could be more efficient than text tokens for training LLMs. The approach is “innovative,” said Kaoutar El Maghraoui, a Principal Research Scientist at IBM, in an interview with IBM Think. Using “vision token compression to convert document images into compact tokens helps slash context length and cost significantly,” she said. For enterprises, this could mean a cheaper way to harness more of AI’s power.
Using “vision tokens” instead of text tokens could dramatically cut the time and cost enterprises need to process large documents. For every 10 text tokens, the DeepSeek-OCR model needs only one vision token to represent the same information with 97% accuracy, according to a technical paper accompanying the model’s release. To put that compression into perspective, DeepSeek-OCR can process more than 200,000 pages per day on a single GPU, according to the researchers. A typical LLM might process several thousand pages a day using twenty or more GPUs. For this reason, DeepSeek-OCR addresses a common pain point for AI developers, El Maghraoui said: handling long, multi-page documents efficiently.
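The savings arithmetic can be sketched roughly as follows. The 10-to-1 ratio comes from the paper’s reported figures; the helper function, page count and tokens-per-page estimate below are purely illustrative, not part of DeepSeek-OCR’s actual code:

```python
# Back-of-envelope arithmetic for vision-token compression.
# The 10:1 ratio is the paper's reported figure (at ~97% accuracy);
# everything else here is an invented illustration.

def vision_token_count(text_tokens: int, compression_ratio: int = 10) -> int:
    """Estimate the vision tokens needed to represent `text_tokens`,
    assuming a roughly 10-to-1 compression ratio."""
    return -(-text_tokens // compression_ratio)  # ceiling division

# A hypothetical 500-page document at ~600 text tokens per page:
text_tokens = 500 * 600                  # 300000 text tokens
print(vision_token_count(text_tokens))   # prints 30000
```

At this ratio, a context window that would overflow with raw text tokens can hold roughly ten times as many pages, which is where the per-GPU throughput gains come from.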
Using visual tokens to train LLMs has caught the attention of the broader research community, including one of OpenAI’s founders, Andrej Karpathy. On X, Karpathy wrote that what he found most interesting about DeepSeek-OCR was the suggestion that “pixels” could potentially be “better inputs to LLMs than text.”
In addition to the efficient use of visual tokens, a second way the model achieves unprecedented efficiency, the researchers wrote, is by mimicking human memory and “forgetting” certain details over time. The “optical compression method enables a form of memory decay that mirrors biological forgetting ... Recent information maintains high fidelity, while distant memories naturally fade through increased compression ratios.”
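The paper’s description suggests that older context is stored at progressively higher compression ratios, so fine detail fades with distance while recent material stays sharp. A toy sketch of that idea follows; the ratio schedule and age thresholds are invented for illustration and are not DeepSeek-OCR’s actual implementation:

```python
# Toy sketch of compression-based memory decay, loosely modeled on the
# paper's description of "biological forgetting". The schedule below
# (thresholds and ratios) is invented purely for illustration.

def compression_ratio(age: int) -> int:
    """Return a higher compression ratio for older context."""
    if age < 10:
        return 1     # recent: near-lossless, high fidelity
    elif age < 100:
        return 10    # mid-range: ~10x optical compression
    else:
        return 20    # distant: aggressively compressed, details fade

def tokens_for(text_tokens: int, age: int) -> int:
    """Vision tokens budgeted for a chunk, given its age."""
    return -(-text_tokens // compression_ratio(age))  # ceiling division

print(tokens_for(1000, age=5))    # prints 1000
print(tokens_for(1000, age=50))   # prints 100
print(tokens_for(1000, age=500))  # prints 50
```

The effect mirrors the quote above: the same 1,000 tokens of information occupy less and less of the model’s budget as they recede into the past.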
DeepSeek isn’t the only company releasing open-source models to unlock value in enterprise data, said El Maghraoui. IBM’s Docling, launched more than a year ago, tackles the challenge of making documents machine-readable, but from a different angle. “Where DeepSeek-OCR shines in raw OCR throughput and token efficiency, Docling excels in end-to-end conversion and structure,” she said.
While it might seem that DeepSeek-OCR and Docling would compete for developers’ attention, the opposite is in fact true. The two could be highly “complementary,” El Maghraoui said. “DeepSeek-OCR could serve as a high-speed, high-accuracy text extraction front-end, while Docling could take those outputs and enrich them with formatting, tables and semantic structure for downstream analytics or enterprise workflows.”
Most experts, including IBM Master Inventor Martin Keen, agree that the new DeepSeek model is as intriguing for what it’s accomplished as for the new questions it raises. “Is there actually some utility in having large language models mimic human memory a bit more? Is there a reason that we evolved to remember things in the present very well, and then just to remember things more in abstract as time goes on?” he asked on a recent Mixture of Experts episode.
In addition to improving efficiency, teaching models to forget can also help remove specific unwanted data or behavior. From Google to IBM, researchers are working on a variety of projects teaching LLMs to forget or “unlearn” sensitive, untrusted or copyrighted data, using techniques that are faster than retraining models from scratch. Alongside this unlearning work, DeepSeek’s popular OCR model further suggests that teaching LLMs to forget, and mimicking the human brain more closely, could be one of AI’s next frontiers.