January 5, 2022 By IBM Cloud Education 4 min read

Optical character recognition (OCR) technology supports efficient business processes, saving time, money and other resources through automated data extraction and storage.

Optical character recognition (OCR) is sometimes referred to as text recognition. An OCR program extracts and repurposes data from scanned documents, camera images and image-only PDFs. OCR software singles out letters in the image, assembles them into words and then assembles the words into sentences, making the original content accessible and editable. It also eliminates the need for manual data entry.

OCR systems use a combination of hardware and software to convert physical, printed documents into machine-readable text. Hardware — such as an optical scanner or specialized circuit board — copies or reads text; then, software typically handles the advanced processing.

OCR software can take advantage of artificial intelligence (AI) to implement more advanced methods of intelligent character recognition (ICR), such as identifying languages or styles of handwriting. OCR is most commonly used to turn hard-copy legal or historical documents into PDF documents so that users can edit, format and search them as if they had been created with a word processor.

The history of optical character recognition

In 1974, Ray Kurzweil started Kurzweil Computer Products, Inc., whose omni-font optical character recognition (OCR) product could recognize text printed in virtually any font. He decided that the best application of this technology would be a machine-learning device for the blind, so he created a reading machine that could read text aloud in a text-to-speech format. In 1980, Kurzweil sold his company to Xerox, which was interested in further commercializing paper-to-computer text conversion.

OCR technology became popular in the early 1990s, when it was used to digitize historical newspapers. Since then, the technology has undergone several improvements. Today’s solutions can deliver near-perfect OCR accuracy, and advanced methods are used to automate complex document-processing workflows. Before OCR technology was available, the only way to digitize documents was to retype the text manually. Not only was this time-consuming, but it also came with inevitable inaccuracies and typing errors. Today, OCR services are widely available to the public. For example, Google Cloud Vision OCR is used to scan and store documents on your smartphone.

How does optical character recognition work?

Optical character recognition (OCR) uses a scanner to process the physical form of a document. Once all pages are copied, OCR software converts the document into a two-color (black-and-white) version. The scanned-in image or bitmap is analyzed for light and dark areas: the dark areas are identified as characters that need to be recognized, while the light areas are identified as background. The dark areas are then processed to find alphabetic letters or numeric digits. This stage typically involves targeting one character, word or block of text at a time. Characters are then identified using one of two algorithms: pattern recognition or feature detection.
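
The light/dark separation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fixed threshold of 128, the function name binarize and the tiny sample grid are hypothetical, and production OCR engines generally pick thresholds adaptively rather than using a constant.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel (0-255) to 1 (dark, candidate ink) or 0 (light, background).

    The fixed threshold is an assumption made for this sketch.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


# Hypothetical 3x4 grayscale scan: low values are dark, high values are light.
gray_image = [
    [250, 245, 30, 240],
    [248, 20, 25, 251],
    [252, 249, 35, 247],
]

binary = binarize(gray_image)
for row in binary:
    print(row)
```

The pixels marked 1 are the dark areas that later stages try to recognize as characters.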

Pattern recognition works by feeding the OCR program examples of text in various fonts and formats, which it then compares against the characters in the scanned document or image file to recognize them.
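
As a rough sketch of this comparison step, the Python below scores a binarized glyph against a handful of stored examples and picks the closest one. The 3x3 templates, the TEMPLATES table and the function names are all hypothetical and far smaller than anything a real OCR engine would use.

```python
# Hypothetical 3x3 templates: "1" is ink, "0" is background.
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}


def match_score(glyph, template):
    """Count the pixel positions where the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the template character with the highest pixel agreement."""
    return max(TEMPLATES, key=lambda ch: match_score(glyph, TEMPLATES[ch]))


# An "L" with one missing pixel, as might come from a noisy scan.
noisy_l = ["100", "100", "110"]
print(recognize(noisy_l))
```

Because the match is scored rather than exact, a glyph with a missing pixel can still resolve to its nearest template.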

Feature detection occurs when the OCR applies rules regarding the features of a specific letter or number to recognize characters in the scanned document. Features include the number of angled lines, crossed lines or curves in a character. For example, the capital letter “A” is stored as two diagonal lines that meet with a horizontal line across the middle. When a character is identified, it is converted into an ASCII code (American Standard Code for Information Interchange) that computer systems use to handle further manipulations.
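
A minimal sketch of the rule-based idea follows, assuming the stroke features have already been extracted from the image (that extraction is not shown). The Features type, the RULES table and the counts for "H" and "O" are hypothetical; only the description of "A" comes from the text above.

```python
from typing import NamedTuple, Optional


class Features(NamedTuple):
    """Stroke counts assumed to have been measured from a glyph image."""
    diagonals: int
    horizontals: int
    verticals: int
    curves: int


# Hypothetical rule table; real systems use richer features than simple counts.
RULES = {
    Features(diagonals=2, horizontals=1, verticals=0, curves=0): "A",
    Features(diagonals=0, horizontals=1, verticals=2, curves=0): "H",
    Features(diagonals=0, horizontals=0, verticals=0, curves=1): "O",
}


def classify(features: Features) -> Optional[str]:
    """Return the character whose rule matches exactly, or None if no rule applies."""
    return RULES.get(features)


char = classify(Features(diagonals=2, horizontals=1, verticals=0, curves=0))
ascii_code = ord(char)  # numeric ASCII code of the recognized character
print(char, ascii_code)
```

Here the two diagonals and one horizontal line resolve to "A", which is then stored as its ASCII code, 65.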

An OCR program also analyzes the structure of a document image. It divides the page into elements such as blocks of texts, tables or images. The lines are divided into words and then into characters. Once the characters have been singled out, the program compares them with a set of pattern images. After processing all likely matches, the program presents you with the recognized text.
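
One simple way to single out characters within a line, shown here only as an assumed illustration, is to look for blank columns between glyphs in the binarized image. The function name split_columns and the sample line are hypothetical, and real layout analysis must also handle tables, images, skew and touching characters.

```python
def split_columns(binary):
    """Return (start, end) column ranges containing ink, split at fully blank columns."""
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]

    spans = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                 # a glyph begins
        elif not ink and start is not None:
            spans.append((start, x))  # a glyph ends at the blank column
            start = None
    if start is not None:
        spans.append((start, width))  # glyph runs to the right edge
    return spans


# Hypothetical binarized text line holding two glyphs separated by blank columns.
line = [
    [1, 0, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 1, 0],
]

spans = split_columns(line)
glyphs = [[row[a:b] for row in line] for a, b in spans]
print(spans)
```

Each cropped glyph can then be passed to a matching step such as the pattern comparison the paragraph describes.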

The benefits of optical character recognition

The main benefit of optical character recognition (OCR) technology is that it simplifies the data-entry process by creating effortless text searches, editing and storage. OCR allows businesses and individuals to store files on their computers, laptops and other devices, ensuring constant access to all documentation.

The benefits of employing OCR technology include the following:

  • Reduce costs
  • Accelerate workflows
  • Automate document routing and content processing
  • Centralize and secure data (no fires, break-ins or documents lost in the back vaults)
  • Improve service by ensuring employees have the most up-to-date and accurate information

Optical character recognition use cases

The most well-known use case for optical character recognition (OCR) is converting printed paper documents into machine-readable text documents. Once a scanned paper document goes through OCR processing, the text of the document can be edited with a word processor like Microsoft Word or Google Docs.

OCR is often used as a hidden technology, powering many well-known systems and services in our daily life. Important but less-known use cases for OCR technology include automating data entry from documents such as passports, invoices, bank statements and business cards; automatic number plate recognition; assisting blind and visually impaired persons; and indexing documents for search engines.

OCR enables the optimization of big-data modeling by converting paper and scanned image documents into machine-readable, searchable PDF files. In documents that do not already have a text layer, processing and retrieving valuable information cannot be automated until OCR has been applied.

With OCR text recognition, scanned documents can be integrated into a big-data system that can then read client data from bank statements, contracts and other important printed documents. Instead of having employees examine countless image documents and manually feed inputs into an automated big-data processing workflow, organizations can use OCR to automate the input stage of data mining. OCR software can identify and extract the text in an image, save it as a text file and support JPG, JPEG, PNG, BMP, TIFF, PDF and other formats.

Optical character recognition and IBM

As a leader in global technology, IBM is constantly producing new and improved software applications for both business and personal use. Over the decades, IBM has improved upon its optical character recognition capability by combining it with artificial intelligence (AI).

Simply creating templates of documents is no longer sufficient because enterprises want insights as well. Combining AI and OCR is proving to be a winning strategy for data capture, because the recognition software collects information and comprehends the content at the same time. In practice, this means that AI tools can check for mistakes independently of a human user, providing streamlined fault management and saving time.

IBM Cloud Pak® for Business Automation, IBM’s leading offering for document processing, also helps take your automation a step further by infusing artificial intelligence (AI). Its features are designed to improve both your internal processes and your customers’ experiences.

To get more insights into document processing, optical character recognition, automation and the latest in AI, subscribe to the IBM Business Automation Insider. Learn how the latest products work, implement best practices and maximize your tech investments.
