Optical character recognition (OCR) is a technology that uses automated data extraction to quickly convert images of text into a machine-readable format.
OCR is sometimes referred to as text recognition. An OCR program extracts and repurposes data from scanned documents, camera images and image-only PDFs. OCR software singles out letters on the image, puts them into words, and then puts the words into sentences, thus enabling access to and editing of the original content. It also eliminates the wasted effort of redundant manual data entry.
OCR systems use a combination of hardware and software to convert physical, printed documents into machine-readable text. Hardware, such as an optical scanner or specialized circuit board, copies or reads text, then software typically handles the advanced processing.
OCR software can take advantage of artificial intelligence (AI) to implement more advanced methods of intelligent character recognition (ICR) for identifying languages or handwriting. Organizations often use the process of OCR to turn printed legal or historical documents into PDF documents so that users can edit, format and search the documents as if created with a word processor.
In 1974, Ray Kurzweil started Kurzweil Computer Products, Inc., whose omni-font OCR product could recognize text printed in virtually any font. He decided that the best application of this technology would be a machine learning (ML) device for the vision-impaired, so he created a reading machine that could read text aloud in a text-to-speech format. In 1980, Kurzweil sold his company to Xerox, which was interested in further commercializing paper-to-computer text conversion.
OCR technology became popular in the early 1990s, when it was used to digitize historical newspapers. Since then, the technology has undergone several improvements. Today, products can deliver near-perfect OCR accuracy, and advanced methods can automate complex document-processing workflows.
Before OCR technology became available, the only option to digitally format documents was manually reentering the text. Not only is the redundant input time-consuming, but it also comes with inevitable inaccuracies and typing errors. Today, OCR services are widely available to the public. For example, Google Cloud Vision OCR can be used to scan and store documents on your smartphone.
OCR software uses a scanner to convert the physical form of a document into editable, digital text. The OCR software can run as a free-standing program, an OCR application programming interface (API) or a web-based service. Processing typically proceeds through the following stages:
Image acquisition: All document pages are copied and then the OCR engine converts the digital document into a two-color or black-and-white version. The scanned-in image or bitmap is analyzed for light and dark portions. The program then identifies the dark portions as characters that need to be recognized, while light areas are identified as background.
Preprocessing: The digital image is cleaned to remove extraneous pixels. This preprocessing can include deskewing to correct for the image being improperly aligned during scanning, removing graphic rules and boxes that were part of the printed image and determining whether script text is included.
Text recognition: The dark portions are processed to find alphabetic letters, numeric digits or symbols. This stage typically involves targeting one character, word or block of text at a time. Characters are then identified by using one of two algorithms, either pattern recognition or feature recognition.
Layout recognition: A more complete OCR program will also analyze the structure of a document image. It divides the page into elements, such as blocks of text, tables or images. The lines are divided into words and then into characters. After the characters have been singled out, the program compares them with a set of pattern images. After processing all likely matches, the program returns the recognized text.
Postprocessing: The gathered information is stored as a digital file, either in an editable form or PDF. Some systems retain both the input image and the post-OCR versions for easier comparison and more complete document management.
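The acquisition and recognition stages above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular OCR engine works: the page is a tiny grayscale grid, the threshold of 128 is an assumed value, and the names `binarize`, `split_columns`, `match_glyph`, `recognize` and the three 3x3 `TEMPLATES` are hypothetical.

```python
# Toy OCR pipeline: binarize -> segment characters -> match stored patterns.
# All names and the 3x3 glyph templates are illustrative assumptions.

THRESHOLD = 128  # assumed cutoff: pixels darker than this count as ink

# Stored pattern images (1 = dark, 0 = light), one per known character.
TEMPLATES = {
    "L": ((1, 0, 0),
          (1, 0, 0),
          (1, 1, 1)),
    "O": ((1, 1, 1),
          (1, 0, 1),
          (1, 1, 1)),
    "T": ((1, 1, 1),
          (0, 1, 0),
          (0, 1, 0)),
}


def binarize(gray):
    """Convert grayscale rows (0-255) to a two-color bitmap (1 = dark)."""
    return [[1 if px < THRESHOLD else 0 for px in row] for row in gray]


def split_columns(bitmap):
    """Split the bitmap into character cells at fully light columns."""
    width = len(bitmap[0])
    cells, start = [], None
    for x in range(width + 1):
        has_ink = x < width and any(row[x] for row in bitmap)
        if has_ink and start is None:
            start = x  # first inked column of a new character
        elif not has_ink and start is not None:
            cells.append([row[start:x] for row in bitmap])
            start = None
    return cells


def match_glyph(cell):
    """Return (character, score) for the closest same-size template."""
    best_char, best_score = "?", 0.0
    for char, template in TEMPLATES.items():
        if len(template) != len(cell) or len(template[0]) != len(cell[0]):
            continue  # a real engine would rescale; this sketch skips it
        pairs = [(t, c) for t_row, c_row in zip(template, cell)
                 for t, c in zip(t_row, c_row)]
        score = sum(t == c for t, c in pairs) / len(pairs)
        if score > best_score:
            best_char, best_score = char, score
    return best_char, best_score


def recognize(gray):
    """Run the whole toy pipeline and return the recognized text."""
    cells = split_columns(binarize(gray))
    return "".join(match_glyph(cell)[0] for cell in cells)


D, W = 30, 220  # dark and light gray levels for the sample page
PAGE = [
    [D, W, W, W, D, D, D, W, D, D, D],
    [D, W, W, W, D, W, D, W, W, D, W],
    [D, D, D, W, D, D, D, W, W, D, W],
]
print(recognize(PAGE))  # prints LOT
```

Because the score is the fraction of agreeing pixels, a cell with a few flipped pixels still maps to its nearest template. The sketch omits deskewing, line splitting and rescaling, which a real program would need before matching.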
There are four types of OCR programs, listed here in order of increasing sophistication:
Simple OCR: Analysis is character-by-character pattern-matching, comparing scanned characters to the stored glyphs. With so many potential font and language combinations, the types of documents that can be analyzed are limited.
Optical mark recognition (OMR): OMR identifies checked boxes and other marks, such as bubbles in surveys or a signature on a form, as well as logos, symbols and watermarks. All can be identified by matching to stored images, as with simple OCR.
Intelligent character recognition (ICR): As mentioned previously, ICR brings in the power of AI. By using ML or deep learning, the OCR program learns to read just as humans do: through continual practice and training. A neural network reviews text repeatedly looking for distinctive attributes: the locations of curves, intersections, lines and loops.
Intelligent word recognition: This is the natural evolution of ICR, but now the AI has been trained to recognize an entire word from a single image rather than character by character, making it ultimately faster.
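The "distinctive attributes" mentioned under ICR can be made concrete with a hand-written feature extractor. The sketch below is not a neural network and is not taken from any OCR product. It computes a single attribute, the number of closed loops in a binary glyph, by flood-filling the background. The names `count_loops` and `parse` and the sample glyphs are hypothetical.

```python
from collections import deque


def parse(rows):
    """Turn rows of '#' (ink) and '.' (background) into a 0/1 grid."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def count_loops(glyph):
    """Count enclosed background regions in a binary glyph (1 = ink)."""
    # Pad with a light border so the outer background is one connected region.
    width = len(glyph[0]) + 2
    grid = [[0] * width]
    grid += [[0] + list(row) + [0] for row in glyph]
    grid += [[0] * width]
    height = len(grid)
    seen = [[False] * width for _ in range(height)]

    def flood(start_y, start_x):
        """Mark every light pixel 4-connected to the start pixel."""
        queue = deque([(start_y, start_x)])
        seen[start_y][start_x] = True
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < height and 0 <= nx < width
                        and grid[ny][nx] == 0 and not seen[ny][nx]):
                    seen[ny][nx] = True
                    queue.append((ny, nx))

    flood(0, 0)  # the outer background is not a loop
    loops = 0
    for y in range(height):
        for x in range(width):
            if grid[y][x] == 0 and not seen[y][x]:
                loops += 1  # an unvisited light region is enclosed by ink
                flood(y, x)
    return loops


GLYPH_L = parse(["#..", "#..", "#..", "#..", "###"])
GLYPH_O = parse(["###", "#.#", "#.#", "#.#", "###"])
GLYPH_B = parse(["###", "#.#", "###", "#.#", "###"])

print(count_loops(GLYPH_L), count_loops(GLYPH_O), count_loops(GLYPH_B))  # prints 0 1 2
```

A feature such as loop count does not depend on the exact font, which is one reason attribute-based recognition generalizes better than pixel-by-pixel comparison. In a trained system, attributes like this are learned rather than coded by hand.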
The benefits of employing OCR technology include the ability to:
Cut costs by reducing or eliminating redundant manual input.
Streamline workflows with the input of preprinted documents or written forms and speed research with searchable digital data.
Automate document routing, content processing and preparation for text mining.
Save the cost of storing yet more paper records.
Centralize and secure data sets for protection against fires, break-ins and documents lost in bank vaults.
Enable greater access to data for visually impaired staff and customers.
Improve service by giving employees the most up-to-date and accurate information.
The most well-known use case for OCR is converting printed paper documents into machine-readable text documents. After a scanned paper document goes through OCR processing, the text of the document can be edited with a word processor such as Microsoft Word or Google Docs. OCR can also accelerate workloads in many industries, including education, finance, healthcare, logistics and transportation, by processing and retrieving loan documents, patient records, insurance forms, labels, invoices and receipts.
OCR is often used as a hidden technology, powering many well-known systems and services in our daily lives. Important, but less-known, use cases for OCR technology include data-entry automation, assisting blind and visually impaired persons and indexing documents for search engines. OCR is also applied to passports, invoices, bank statements, check processing and transcription, business cards and automatic number plate recognition.
OCR enables the optimization of big-data modeling by converting paper and scanned image documents into machine-readable, searchable PDF files. Processing and retrieving valuable information requires first applying OCR in documents where text layers are not already present.
With OCR text recognition, scanned documents can be integrated into a big-data system that is then able to read client data from bank statements, contracts and other important printed documents. Instead of having employees examine countless image documents and manually feed inputs into an automated big-data processing workflow, organizations can use OCR to automate that process at the input stage of data mining. OCR software can extract text seen in pictures, save the text file and support multiple formats, including jpg, jpeg, png, bmp, tiff and pdf.
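As a rough sketch of that input stage, the snippet below pulls a few fields out of text that an OCR engine has already produced. Everything specific here is an assumption made for illustration: the sample statement text is invented, and the field labels, the `FIELD_PATTERNS` regular expressions and the `extract_fields` helper are hypothetical rather than part of any OCR product.

```python
import re

# Hypothetical OCR output from a scanned bank statement (invented sample text).
OCR_TEXT = """\
ACME BANK  Statement
Account No: 1234-5678
Statement Date: 2024-03-31
Closing Balance: 1,204.50
"""

# Assumed label formats; real statements vary and need their own patterns.
FIELD_PATTERNS = {
    "account": re.compile(r"Account No:\s*([\d-]+)"),
    "date": re.compile(r"Statement Date:\s*(\d{4}-\d{2}-\d{2})"),
    "balance": re.compile(r"Closing Balance:\s*([\d,]+\.\d{2})"),
}


def extract_fields(text):
    """Return the fields found in the OCR text; missing fields are None."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[name] = match.group(1) if match else None
    if record["balance"] is not None:
        # Strip thousands separators so the value can be used numerically.
        record["balance"] = float(record["balance"].replace(",", ""))
    return record


print(extract_fields(OCR_TEXT))
```

Returning `None` for a missing field, instead of raising an error, lets a downstream workflow route incomplete records to manual review.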
OCR has advanced significantly past the first business systems in 1974 and progress continues. Superior OCR programs can provide extraction of key insights from documents in suboptimal conditions, such as irregular fonts, insufficient resolution, bad lighting from mobile capture and various colors and backgrounds.
By incorporating computer vision and natural language processing, improved information representation and model optimization, businesses can now enjoy state-of-the-art document understanding. Improvements can include analysis of layout and reading order in complex documents, and understanding and representing visuals such as charts and diagrams. Some OCR programs now are driven by generative AI to help structure document data even faster. An “old” technology continues to learn new tricks.
OCR can assist with the data processing of structured text, image and video for tasks such as data enrichment, transformation, validation and cleansing.
Advancing the state of the art in document understanding, OCR now enables models to analyze the layout and reading order of complex documents, and to understand and represent visuals, including plots, charts and diagrams.