DeepSeek isn’t the only company releasing open-source models to unlock value in enterprise data, said El Maghraoui. IBM’s Docling, launched more than a year ago, tackles the challenge of making documents machine-readable, but from a different angle. “Where DeepSeek-OCR shines in raw OCR throughput and token efficiency, Docling excels in end-to-end conversion and structure,” she said.

Although DeepSeek-OCR and Docling might seem to compete for developers’ attention, the two could in fact be highly “complementary,” El Maghraoui said. “DeepSeek-OCR could serve as a high-speed, high-accuracy text extraction front-end, while Docling could take those outputs and enrich them with formatting, tables and semantic structure for downstream analytics or enterprise workflows.”

Most experts, including IBM Master Inventor Martin Keen, agree that the new DeepSeek model is as intriguing for what it’s accomplished as for the new questions it raises. “Is there actually some utility in having large language models mimic human memory a bit more? Is there a reason that we evolved to remember things in the present very well, and then just to remember things more in abstract as time goes on?” he asked on a recent Mixture of Experts episode.

Beyond improving efficiency, teaching models to forget can help remove specific unwanted data or behavior. From Google to IBM, researchers are working on a variety of projects to teach LLMs to forget, or “unlearn,” sensitive, untrusted or copyrighted data, using techniques that are faster than retraining models from scratch. Alongside this unlearning work, DeepSeek’s popular OCR model further suggests that teaching LLMs to forget, and to mimic the human brain more closely, could be one of AI’s next frontiers.