Setting up the watsonx.ai enhanced extraction handler

Extract text from images and structured data by configuring the watsonx.ai enhanced extraction handler.

Before you begin

Before you configure the watsonx.ai enhanced extraction handler, complete the following prerequisites:

  • Install the Enhanced Extraction Extensions add-on. This add-on is a prerequisite for both the Watsonx add-on and the PDF Forms add-on. For more information, see Enhanced Extraction Extensions.
  • Install the Watson Enhanced Extraction Extensions add-on. For more information, see Watson Extraction Extensions.
  • Enable Persistent Text Extraction on the document classes where you want to use enhanced extraction. Persistent Text Extraction is an out-of-the-box feature in versions 5.7.0 and 26.0.0 and does not require an add-on installation. However, you must enable it on each document class. For more information, see Enabling persistent text extraction.
  • Deploy Watson Document Understanding locally or configure watsonx.ai Text Extraction as a SaaS solution.

About this task

The watsonx.ai enhanced extraction handler extracts text from images and structured data with either Watson Document Understanding (local deployment) or watsonx.ai Text Extraction (SaaS). You configure this handler in the Content Conversion Actions folder of the Administration Console for Content Platform Engine.

Procedure

Complete the following steps to set up the watsonx.ai enhanced extraction handler.

  1. Open the Administration Console for Content Platform Engine.
  2. Go to the Content Conversion Actions folder.
    1. In the domain navigation pane, select the object store.
    2. In the object store navigation pane, select the Events, Actions, Processes > Content Conversion Actions folder.
  3. Locate the watsonx.ai enhanced extraction handler.
    The handler is automatically created when the Watson Enhanced Extraction Extensions add-on is installed.
  4. Open the handler properties.
    Double-click the watsonx.ai handler to view its properties.
  5. On the Connection tab, select the service type and specify the connection parameters.
    • If you select WDU (Watson Document Understanding), specify the Service URL for your local Watson Document Understanding deployment.
    • If you select WatsonX AI, specify the Service URL, Watsonx API Key, Watsonx Space ID, ICOS URL, ICOS API Key, and ICOS Connection ID for your watsonx.ai SaaS service.
  6. On the Operation tab, configure the processing parameters for the watsonx service.
    Specify the Processing mode, OCR mode, OCR languages, and Extract types to control how documents are processed and what content is extracted.
  7. Enable the handler.
    On the General tab, select the Enabled checkbox to activate the watsonx.ai enhanced extraction handler.
  8. Save the configuration.
    Click Save to save the handler configuration.
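
Before you open the Connection tab, it can help to confirm that you have collected every value that your service type requires. The following Python sketch is a checklist helper only, not a product API: the REQUIRED_FIELDS table and the missing_connection_fields function are hypothetical names, the field labels are copied from the Connection tab step in this procedure, and the example URL is a placeholder.

```python
# Connection tab fields per service type, as listed in this procedure.
# This table and helper are illustrative; they do not call any product API.
REQUIRED_FIELDS = {
    "WDU": ["Service URL"],
    "WatsonX AI": [
        "Service URL",
        "Watsonx API Key",
        "Watsonx Space ID",
        "ICOS URL",
        "ICOS API Key",
        "ICOS Connection ID",
    ],
}


def missing_connection_fields(service_type, values):
    """Return the required fields that are absent or blank in `values`."""
    if service_type not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown service type: {service_type!r}")
    return [
        field
        for field in REQUIRED_FIELDS[service_type]
        if not str(values.get(field) or "").strip()
    ]


if __name__ == "__main__":
    # Placeholder value; replace with the details of your own deployment.
    collected = {"Service URL": "https://wdu.example.internal"}
    print(missing_connection_fields("WDU", collected))
    print(missing_connection_fields("WatsonX AI", collected))
```

An empty list means that all values for the selected service type are available. Any names in the list are values that you still need to obtain before you complete the Connection tab.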

Results

The watsonx.ai enhanced extraction handler is configured and ready to process documents. When documents are uploaded or updated, the handler extracts content and creates annotations. The Text Extraction Annotation (plain text) is always created. Depending on the extract types configured, the handler can also create markdown, kvps (key-value pairs in JSON format), and tables (tables in JSON format) annotations.
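
The relationship between the configured extract types and the resulting annotations can be sketched as follows. This Python example is a planning aid under stated assumptions: it assumes that each extract type maps to exactly one annotation of the same name, and the expected_annotations function and the "plain text" label are hypothetical names that do not come from the product.

```python
# Optional annotation types named in the Results section.
# Assumption: each configured extract type yields one annotation of that name.
OPTIONAL_ANNOTATIONS = ("markdown", "kvps", "tables")


def expected_annotations(extract_types):
    """List the annotations to expect for the given extract types."""
    unknown = set(extract_types) - set(OPTIONAL_ANNOTATIONS)
    if unknown:
        raise ValueError(f"Unrecognized extract types: {sorted(unknown)}")
    # The plain text annotation is always created, so it is listed first.
    selected = [name for name in OPTIONAL_ANNOTATIONS if name in extract_types]
    return ["plain text"] + selected


if __name__ == "__main__":
    print(expected_annotations([]))
    print(expected_annotations(["tables", "markdown"]))
```

With no extract types selected, only the plain text annotation is expected. Each additional extract type adds the corresponding annotation to the list.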

What to do next

After the handler setup is complete, enable enhanced extraction for the document classes where you want to use this handler. For more information, see Enabling enhanced extraction.