Understanding documents in watsonx.ai
Use the text processing APIs in the document understanding library to convert high-quality business documents into a simpler, text-based format that AI models can use. You can also process the text in documents such as contracts to find and isolate key pieces of information.
Simplifying your business documents by converting them into a text-based format is especially useful for retrieval-augmented generation tasks where you want to find information that is relevant to a user query and include it with the input to a foundation model. Including accurate contextual information in model input helps the foundation model to incorporate factual and up-to-date information in the model output.
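To illustrate the retrieval-augmented generation pattern described here, the following minimal sketch picks the passage that is most relevant to a user query and places it ahead of the question in the model input. It is an illustration only: the word-overlap scoring, the function names, and the sample passages are assumptions made for this example and are not part of the document understanding library. A production solution would typically use embeddings and a vector index instead of word overlap.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, passages: list[str], top_k: int = 1) -> list[str]:
    """Return up to top_k passages that share the most words with the query."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(p)), p) for p in passages]
    # Drop passages with no shared words, then sort by overlap, highest first.
    relevant = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [passage for _, passage in relevant[:top_k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved passages ahead of the question in the model input."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return f"Context:\n{context_block}\n\nQuestion: {query}\nAnswer:"


# Made-up passages standing in for text extracted from business documents.
passages = [
    "The contract term is 24 months.",
    "Payment is due within 30 days of the invoice date.",
    "The office is closed on public holidays.",
]
query = "When is payment due on an invoice?"
prompt = build_prompt(query, retrieve(query, passages))
print(prompt)
```

The resulting prompt string is what you would send to a foundation model, so the model can ground its answer in the retrieved passage.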
Capabilities
You can combine APIs from the document understanding library to analyze your documents. With the classification API, you can classify a document into one of several supported common document types without running a longer extraction task. You can then use the results from this pre-processing step with the extraction API to efficiently extract unstructured data, as well as structured data that is limited to the classified document type.
You can use the text processing APIs with documents that have different characteristics such as:
- Documents with content in multiple languages. See Supported languages.
- Documents in various file types that are stored in different storage types. For details, see:
The document understanding technology uses the following capabilities to scan and process your document:
- Optical character recognition: OCR detects text from images, scanned documents, and tables, and is useful for preserving information that is depicted in images, diagrams, or in text that is embedded in files such as scanned PDFs. Although OCR can extract text from noisy images, the quality of the image files must meet the minimum requirement of 80 DPI (dots per inch).
- Document structure identification: The APIs process document content from various data structures including tables, section titles, bulleted lists, paragraphs, and footnotes. The APIs also identify and remove commonly used content such as headers and footers.
- Key-value pair classification and extraction: Use key-value pair extraction to process documents that contain generic or domain-specific structured data, like invoices, utility bills, and more. The extraction mode classifies documents based on the document type. The extracted text is stored in a data structure called a schema where each piece of data (the value) is associated with a unique identifier (the key). The mode uses a pre-defined schema or a custom schema that you define. Key-value pairs are extracted with large language models (LLMs) and advanced vision-language processing.
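The idea of a schema, where each value is associated with a unique key, can be pictured with a small sketch. The schema below, its key names, and the sample values are hypothetical and chosen only for illustration. They do not reproduce the pre-defined schemas or the schema format that the extraction API uses.

```python
# Hypothetical custom schema for an invoice: each key is a unique identifier,
# and the description states which value is expected for that key.
INVOICE_SCHEMA = {
    "invoice_number": "Identifier printed on the invoice",
    "invoice_date": "Date the invoice was issued",
    "total_amount": "Total amount due, including tax",
}


def check_against_schema(extracted: dict, schema: dict) -> dict:
    """Split extracted key-value pairs into matched, missing, and unexpected keys."""
    return {
        "matched": {k: v for k, v in extracted.items() if k in schema},
        "missing": sorted(k for k in schema if k not in extracted),
        "unexpected": sorted(k for k in extracted if k not in schema),
    }


# Made-up extraction output, used only to exercise the check.
extracted = {"invoice_number": "INV-001", "total_amount": "120.00", "po_box": "42"}
report = check_against_schema(extracted, INVOICE_SCHEMA)
print(report)
```

A check like this can help you spot keys that the extraction did not fill in before you pass the results to a downstream application.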
You can configure the various document understanding capabilities by using settings that are common to both the text classification and extraction APIs. For details, see Common text processing parameters.
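Because OCR input must meet the 80 DPI minimum noted earlier, it can be useful to estimate the resolution of a scanned page before you submit it. The following sketch assumes that you know the pixel dimensions of the image and the physical size of the original page. The function names are hypothetical, and the way the service itself measures resolution is not described in this document.

```python
MIN_DPI = 80  # minimum image quality stated for OCR input in this document


def effective_dpi(width_px: int, height_px: int, width_in: float, height_in: float) -> float:
    """Return the lower of the horizontal and vertical resolution in dots per inch."""
    if width_in <= 0 or height_in <= 0:
        raise ValueError("physical dimensions must be positive")
    return min(width_px / width_in, height_px / height_in)


def meets_ocr_minimum(width_px: int, height_px: int, width_in: float, height_in: float) -> bool:
    """Check the estimated resolution against the 80 DPI minimum."""
    return effective_dpi(width_px, height_px, width_in, height_in) >= MIN_DPI


# A letter-size page (8.5 x 11 inches) scanned at two different pixel sizes.
print(meets_ocr_minimum(850, 1100, 8.5, 11.0))  # 100 DPI in both directions
print(meets_ocr_minimum(425, 550, 8.5, 11.0))   # 50 DPI in both directions
```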
Restrictions
- The text classification API can be used only with English-language documents.
- Key-value pair extraction is supported only for English-language documents.
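If you process documents in several languages, a small guard can apply these restrictions before you send a request. This is a sketch under stated assumptions: the operation labels and the function name are hypothetical, and the check covers only the two English-only restrictions listed here. Other operations are still subject to the list of supported languages, which this sketch does not check.

```python
# Hypothetical labels for the two operations that are restricted to English.
ENGLISH_ONLY = {"text_classification", "key_value_pair_extraction"}


def violates_english_restriction(operation: str, language_code: str) -> bool:
    """Return True if the operation is English-only and the document is not English."""
    # Treat codes such as "en", "en-US", and "en_GB" as English.
    primary = language_code.lower().replace("_", "-").split("-")[0]
    return operation in ENGLISH_ONLY and primary != "en"


print(violates_english_restriction("text_classification", "de"))
print(violates_english_restriction("key_value_pair_extraction", "en-US"))
```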
Ways to work
You can process documents stored in your watsonx.ai project programmatically with the following text processing REST API methods:
- Text classification: Use the text classification API to find out whether a document can be classified into a supported pre-defined schema for a common document type.
- Text extraction: Use the text extraction API to extract specific entities or types of information based on the document structure.
Workflow
You can use the following high-level steps to process your documents with the document understanding library:
- Store your input document in a supported storage type and define a connection to the storage type to access your document in a watsonx.ai project.
- Optional: Classify your document with the watsonx.ai text classification API into one of the supported common document types or into a custom document type. See Classifying text in documents.
- Extract various types of information from your document with the watsonx.ai text extraction API. In your text extraction request, specify the storage type in your project where the extracted results should be stored. See Extracting text from documents.
  If applicable, use the classification results to accurately configure your text extraction request to extract structured data from your document efficiently.
- Use the results from the text extraction process in a RAG solution. See Adding extracted text to your RAG solution.
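The steps above can be sketched as a helper that assembles the body of a text extraction request. This is a minimal illustration and not the actual API contract: the function name and every field name in the dictionary are placeholders chosen for this example, and the identifiers are made-up values. Check the API reference for the real request schema before you send anything to the service.

```python
import json
from typing import Optional


def build_extraction_request(
    project_id: str,
    connection_id: str,
    input_file: str,
    output_file: str,
    document_type: Optional[str] = None,
) -> dict:
    """Assemble an illustrative extraction request body (field names are assumptions)."""
    request = {
        "project_id": project_id,
        # Where the input document is stored, reached through a connection.
        "document_reference": {"connection_id": connection_id, "file_name": input_file},
        # Where the extracted results should be written.
        "results_reference": {"connection_id": connection_id, "file_name": output_file},
    }
    if document_type is not None:
        # Optional: use the classification result to narrow structured extraction.
        request["document_type"] = document_type
    return request


# Made-up identifiers; "invoice" stands in for a result from the optional classification step.
body = build_extraction_request(
    project_id="my-project-id",
    connection_id="my-connection-id",
    input_file="invoice.pdf",
    output_file="invoice.json",
    document_type="invoice",
)
print(json.dumps(body, indent=2))
```

Keeping the request assembly in one function makes it easy to skip the optional classification step: omit `document_type` and the same helper produces a request without it.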
Costs
The text processing APIs in the document understanding library are available with all plans. Billing is based on the number of pages that are processed. For details, see Billing details for generative AI assets.