What is IBM Automation Document Processing?

Automation Document Processing provides the capabilities that help you build an AI-powered data enrichment tool for document processing and storage.

Manual document processing is a major obstacle for many enterprises: it stalls digital transformation initiatives and consumes time and resources. As the volume of data and documents continues to grow, automating your document processing becomes more important than ever. IBM Automation Document Processing combines AI and deep learning with low-code tooling to help you design, configure, and deploy a solution for document classification and data extraction.

The document processing application takes over the labor-intensive manual processing of documents. The document processing user can quickly catch and correct issues on documents, or batches of documents, that have already been classified and had their data extracted. When processing completes, your documents and your data are stored and ready for use by downstream applications.

How does Automation Document Processing work?

Document Processing Designer
You use the Designer interface to create the set of document types and related fields that make up your Document Processing project. Document Processing Designer combines an intuitive interface with a set of AI and deep learning tools that identify and learn the document types that matter to your organization. For each document type, you designate which pieces of information to extract as data for use by downstream applications. You can also apply tools to clean up and standardize the data as it is extracted.

You choose which documents you want to process, for example, invoices. You collect multiple electronic samples of invoices to create a document classification model. The Document Processing Designer uses this set to train the model to recognize documents as invoices.

An invoice can contain many data points. Which pieces of data are useful to you for records, searching, and integration with downstream applications? For example:
  • Total amount
  • Vendor
  • Date of transaction
  • Product ID
  • Customer name
  • Invoice number
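
As an illustration only, the invoice document type and its fields can be pictured as a small data structure. The sketch below is not the product's internal format: the class names FieldSpec and DocumentType and the value_type labels are hypothetical, chosen just to show how a document type groups the fields you decide to extract.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FieldSpec:
    """One piece of data to extract (hypothetical structure)."""
    name: str        # field name as shown to the user
    value_type: str  # hypothetical type label: "string", "date", or "amount"


@dataclass
class DocumentType:
    """A document type and the fields defined for it (hypothetical structure)."""
    name: str
    fields: list[FieldSpec] = field(default_factory=list)

    def field_names(self) -> list[str]:
        # Return the field names in the order they were defined.
        return [f.name for f in self.fields]


# The six invoice fields listed above, with assumed type labels.
invoice = DocumentType(
    name="Invoice",
    fields=[
        FieldSpec("Total amount", "amount"),
        FieldSpec("Vendor", "string"),
        FieldSpec("Date of transaction", "date"),
        FieldSpec("Product ID", "string"),
        FieldSpec("Customer name", "string"),
        FieldSpec("Invoice number", "string"),
    ],
)
```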

You use Document Processing Designer to create a data extraction model by teaching the model where each field is located on the document, naming the field, and creating a method for collecting and enriching the value of the field for each document. Again, you use sample documents to train the data extraction model.
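
To make the idea of cleaning and standardizing an extracted value concrete, here is a minimal sketch in plain Python. It does not use any product API. The function names normalize_amount and normalize_date are hypothetical, and the sketch assumes amounts that use a period as the decimal separator and dates written as month/day/year.

```python
import re
from datetime import datetime
from decimal import Decimal


def normalize_amount(raw: str) -> Decimal:
    """Strip currency symbols and thousands separators from an amount.

    Assumes a period is the decimal separator (an assumption of this sketch).
    """
    cleaned = re.sub(r"[^\d.\-]", "", raw)
    return Decimal(cleaned)


def normalize_date(raw: str, input_format: str = "%m/%d/%Y") -> str:
    """Convert a date string to ISO 8601 (YYYY-MM-DD).

    The default input format, month/day/year, is an assumption of this sketch.
    """
    return datetime.strptime(raw.strip(), input_format).date().isoformat()


# Example: raw values as they might appear on a scanned invoice.
total = normalize_amount("$1,234.50")
transaction_date = normalize_date(" 03/15/2024 ")
```

Standardizing values into one representation at extraction time means that downstream applications do not each need their own parsing rules.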

After you create and train your classification and extraction models, you formalize your project in two ways:
  • Determining which document types to deploy to your repository object store for longer-term storage.
  • Determining the property types that tag your document types in the object store, and their values.

To make it easier for your data to be used by downstream integrations, you create data definitions for these fields, or map them to existing data definitions. This provides consistency across automations and solutions.
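
The mapping from project fields to shared data definitions can be sketched as a simple lookup. This is an illustration under stated assumptions, not the product's mechanism: the dictionary FIELD_TO_DEFINITION, the definition names, and the function to_shared_record are all hypothetical.

```python
# Hypothetical mapping from field names in one project to shared data
# definition names that several automations agree on.
FIELD_TO_DEFINITION = {
    "Invoice number": "invoice_number",
    "Total amount": "total_amount",
    "Date of transaction": "transaction_date",
}


def to_shared_record(extracted: dict[str, str]) -> dict[str, str]:
    """Rename extracted fields to their shared data definition names.

    Fields without a mapping are skipped in this sketch.
    """
    record = {}
    for field_name, value in extracted.items():
        definition = FIELD_TO_DEFINITION.get(field_name)
        if definition is None:
            continue  # no data definition mapped for this field
        record[definition] = value
    return record


shared = to_shared_record({"Invoice number": "INV-001", "Vendor": "Acme"})
```

Because every project writes to the same definition names, a downstream integration can read invoice_number without knowing which project produced it.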

Deployment tools
After you build the Document Processing project in the Designer, you deploy the project to make it available for building your document processing application. The deployment process is also used to configure the repository to receive the processed documents from your end-user application.

Application templates and toolkits

You use the no-code and low-code application-building capabilities of Application Designer, customized templates and toolkits, and the AI model of your Document Processing project to create a document processing end-user application. This application recognizes your documents, extracts your relevant data, and presents issues to fix before the documents are sent to storage and the data is used in other systems.

You can preview, snapshot, and deploy your application to use for processing documents.

More advanced developers can use the toolkits to customize and extend the document processing application.

Document processing application and document management

The application that you build uses AI and deep learning to automatically detect, extract, and standardize the data in all your documents. Any anomalies are flagged according to your customized model and the priority that you set so that your document processing user can correct issues before the documents are finalized.
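
The flagging step can be pictured with a short sketch. The text above does not say how anomalies are detected, so the rules here are assumptions: a field is flagged when its value is empty or when a hypothetical confidence score falls below a threshold, and issues are ordered by a priority that you assign per field. The names ExtractedField, Issue, and find_issues are illustrative and not part of the product.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # hypothetical score between 0.0 and 1.0


@dataclass
class Issue:
    field_name: str
    reason: str
    priority: int  # lower number is reviewed first (an assumption)


def find_issues(fields, priorities, min_confidence=0.8):
    """Flag empty or low-confidence fields, ordered by priority."""
    issues = []
    for f in fields:
        priority = priorities.get(f.name, 99)  # unlisted fields go last
        if not f.value.strip():
            issues.append(Issue(f.name, "missing value", priority))
        elif f.confidence < min_confidence:
            issues.append(Issue(f.name, "low confidence", priority))
    return sorted(issues, key=lambda issue: issue.priority)


fields = [
    ExtractedField("Vendor", "Acme", 0.95),
    ExtractedField("Total amount", "1234.50", 0.55),
    ExtractedField("Invoice number", "", 0.90),
]
issues = find_issues(fields, {"Invoice number": 1, "Total amount": 2})
```

In this example, the empty invoice number is listed first because it has the higher priority, followed by the low-confidence total amount. The vendor field is not flagged.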

When you deploy your document processing application, you connect it to a content repository that manages the document types and the extracted data for each document. The solution is fully integrated with IBM FileNet® Content Manager, simplifying document and data storage by applying your existing filing architecture and business rules to each processed document. The content and metadata are automatically saved in FileNet within the appropriate document class.

End result
  • Your invoice is stored as a document in the content repository, with appropriate retention and access controls.
  • An associated JSON file reflects all the extracted data for the document.
  • Properties are set on the document with values that are controlled by the data definitions.
  • Your extracted data is cleaned, standardized, and ready for use in other applications.
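
As a rough picture of what an associated JSON file could contain, the following sketch serializes a set of extracted fields. The layout is hypothetical: the keys documentType, fields, name, and value are assumptions for illustration and do not describe the actual file that the product writes.

```python
import json


def build_extraction_json(document_type: str, fields: dict[str, str]) -> str:
    """Serialize extracted field values to a JSON string (hypothetical layout)."""
    payload = {
        "documentType": document_type,
        "fields": [
            {"name": name, "value": value} for name, value in fields.items()
        ],
    }
    return json.dumps(payload, indent=2)


invoice_json = build_extraction_json(
    "Invoice",
    {"Invoice number": "INV-001", "Total amount": "1234.50"},
)
```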

With IBM Automation Document Processing, you can replace manual document processing with an automated solution that reads documents, refines data, and makes that data available to downstream applications.