How IBM’s latest research is leading the optical character recognition (OCR) revolution and pushing the boundaries of capabilities.

Documents have always been (and continue to be) a significant data source for any business or corporation. It’s crucial to be able to scan and digitize physical documents to extract their information and represent them in a way that allows for further analysis (e.g., for a mortgage or loan process for a bank) no matter how the data is captured. Even for documents created digitally (e.g., PDF documents) the process of extracting information can be a challenge.

At IBM, we are treating this as a multi-disciplinary challenge spanning across computer vision, natural language understanding, information representation and model optimization. With this approach, we are advancing the state-of-the-art in document understanding, which allows our models to analyze the layout and reading order in complex documents and understand visuals and represent them in multimodality manners that understand plots, chart and diagrams.

This work led to the new enhanced optical character recognition (OCR) IBM has created to digitize important, valuable business documents more easily and accurately for the enterprise to extract information for analysis.

Cleaner and more accurate extraction creates multiple benefits, including the following:

  • Accelerated workflows
  • Automated document routing and content processing
  • Reduced costs
  • Superior data security
  • Disaster recovery

Also, there are a variety of use cases that utilize optical character recognition technology that will benefit from the enhancements being made by IBM. From data extraction to automating big data processing workflows, OCR powers many systems and services used every day.

Document understanding

Document understanding is the ability to read these business documents—either programmatically or by OCR—and interpret their content so it can take part in an automatic business process. An example of an automatic business process utilizing OCR would be insurance automated claims processing, where data is extracted from ID cards, claims forms and claim descriptions, among others.

To perform the digitization of documents, optical character recognition (OCR) is utilized. OCR is composed of two stages:

  • Detection: Localize the various words in the document.
  • Recognition: Identify the comprising characters in the detected words.

This means that with OCR, we know where the words are on the document and what those words are. However, when using OCR, challenges arise when documents are captured under any number of non-ideal conditions. This can include incorrect scanner settings, insufficient resolution, bad lighting (e.g., mobile capture), loss of focus, unaligned pages and added artifacts from badly printed documents.

Our team focused on these two challenging areas to address how the next generation of OCR technology can detect and extract data from low-quality and natural-scene image documents.

Better training and accuracy

Imagine for a moment that you are going to build a computer vision system for reading text in documents or extracting structure and visual elements. To train this system, you will undoubtedly need a lot of data that has to be correctly labeled and sanitized for human errors. Furthermore, you might realize that you require a different granularity of classes to train a better model—but acquiring new labeled data is costly. The cost will likely force you to make some compromises or use a narrower set of training regimens which may affect accuracy.

But what if you could quickly synthesize all of the data you need? How would that affect the way you approach the problem?

Synthetic data is at the core of our work in document understanding and our high-accuracy technology. As we developed our OCR model, we required significant amounts of data—data that is hard to acquire and annotate. As a result, we created new methods to synthesize data and apply optimization techniques to increase our architecture accuracy given that the synthetic data can be altered.

Now we are synthesizing data for object segmentation, text recognition, NLP-based grammatical correction models, entity grouping, semantic classification and entity linkage.

Another advantage of synthetic data generation is the ability to control the granularity and format of the labels, including different colors, font, font sizes, background noise, etc. This enables us to design architectures that can recognize punctuation, layout, handwritten characters and form elements.

By leveraging synthetic data to train models mentioned previously, we’re excited to announce this effort has resulted in a major update to our core OCR model, providing a significant boost in accuracy and lower processing time.

Higher-level document understanding

Not all documents within an enterprise are of equal value. For example, business documents are central to the operation of business and are at the heart of digital transformation. Such documents include contracts, loan applications, invoices, purchase orders, financial statements and many more. The information in these business documents is presented in natural language and is unstructured. Understanding these documents poses a change due to the complex document layout and the poor-quality scans.

Now with IBM’s latest OCR technology, these critical documents can be read and the key information contained within can be extracted.


As data continues to provide the key insights enterprises need to analyze their business, understand their customers and automate workflows, document-understanding technology like optical character recognition (OCR) is more important than ever.

IBM’s latest research is leading the OCR revolution by pushing the boundaries of OCR capabilities and raising the standard for OCR in the development community. We’re committed to improving our product and providing our customers with the highest level of performance and accuracy possible.

This new OCR technology is being rolled out across all IBM products utilizing OCR and will allow users to digitize important, valuable business documents more easily and accurately for the enterprise to extract information for analysis.

To learn more, check out the documentation and release notes.

The new OCR technology is already available in IBM Watson Discovery—try it out and get started today.

Was this article helpful?

More from Automation

What you need to know about the CCPA draft rules on AI and automated decision-making technology

9 min read - In November 2023, the California Privacy Protection Agency (CPPA) released a set of draft regulations on the use of artificial intelligence (AI) and automated decision-making technology (ADMT). The proposed rules are still in development, but organizations may want to pay close attention to their evolution. Because the state is home to many of the world's biggest technology companies, any AI regulations that California adopts could have an impact far beyond its borders.  Furthermore, a California appeals court recently ruled that…

Enhancing triparty repo transactions with IBM MQ for efficiency, security and scalability

3 min read - The exchange of securities between parties is a critical aspect of the financial industry that demands high levels of security and efficiency. Triparty repo dealing systems, central to these exchanges, require seamless and secure communication across different platforms. The Clearing Corporation of India Limited (CCIL) recently recommended (link resides outside IBM® MQ as the messaging software requirement for all its members to manage the triparty repo dealing system. Read on to learn more about the impact of IBM MQ…

How an AI Gateway provides leaders with greater control and visibility into AI services 

2 min read - Generative AI is a transformative technology that many organizations are experimenting with or already using in production to unlock rapid innovation and drive massive productivity gains. However, we have seen that this breakneck pace of adoption has left business leaders wanting more visibility and control around the enterprise usage of GenAI. When I talk with clients about their organization’s use of GenAI, I ask them these questions: Do you have visibility into which third-party AI services are being used across…

IBM Newsletters

Get our newsletters and topic updates that deliver the latest thought leadership and insights on emerging trends.
Subscribe now More newsletters