Setting up watsonx.ai enhanced extraction handler
Extract text from images and structured data by configuring the watsonx.ai enhanced extraction handler.
Before you begin
Before you configure the watsonx.ai enhanced extraction handler, complete the following prerequisites:
- Install the Enhanced Extraction Extensions add-on. This add-on is a prerequisite for both the Watsonx add-on and the PDF Forms add-on. For more information, see Enhanced Extraction Extensions.
- Install the Watson Enhanced Extraction Extensions add-on. For more information, see Watson Extraction Extensions.
- Enable Persistent Text Extraction on the document classes where you want to use enhanced extraction. Persistent Text Extraction is an out-of-the-box feature in versions 5.7.0 and 26.0.0 and does not require an add-on installation. However, you must enable it on each document class. For more information, see Enabling persistent text extraction.
- Deploy Watson Document Understanding locally or configure watsonx.ai Text Extraction as a SaaS solution.
About this task
The watsonx.ai enhanced extraction handler extracts text from images and structured data by using either Watson Document Understanding (local deployment) or watsonx.ai Text Extraction (SaaS). You configure this handler through Content Conversion Actions in the Administration Console for Content Platform Engine.
Procedure
Complete the following steps to set up the watsonx.ai enhanced extraction handler.
Results
The watsonx.ai enhanced extraction handler is configured and ready to process documents. When documents are uploaded or updated, the handler extracts content and creates annotations. The Text Extraction Annotation (plain text) is always created. Depending on the extract types configured, the handler can also create markdown, kvps (key-value pairs in JSON format), and tables (tables in JSON format) annotations.
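The relationship between the configured extract types and the resulting annotations can be sketched as follows. This is an illustration only, not product code: the function name, the constants, and the "text" label are hypothetical, and only the four annotation kinds (plain text, markdown, kvps, tables) come from the description above.

```python
# Hypothetical names: nothing here is a product API. Only the four annotation
# kinds are taken from the Results description.
ALWAYS_CREATED = "text"  # stands for the Text Extraction Annotation (plain text)
OPTIONAL_TYPES = ("markdown", "kvps", "tables")


def expected_annotations(extract_types):
    """Return the annotation kinds to expect for the configured extract types."""
    requested = {t.strip().lower() for t in extract_types}
    unknown = requested - set(OPTIONAL_TYPES) - {ALWAYS_CREATED}
    if unknown:
        raise ValueError(f"Unrecognized extract types: {sorted(unknown)}")
    # Plain text is always present; optional kinds follow in a fixed order.
    return [ALWAYS_CREATED] + [t for t in OPTIONAL_TYPES if t in requested]


print(expected_annotations(["kvps", "markdown"]))
```

A check of this kind might be useful when you verify a test document after upload: if an expected annotation is missing, the extract types configured on the handler are a reasonable first place to look.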
What to do next
After the handler setup is complete, enable enhanced extraction for the document classes where you want to use this handler. For more information, see Enabling enhanced extraction.