Getting started with document processing
This topic introduces watsonx Orchestrate document processing and its capabilities. The information is designed to help new users understand and use the features effectively. You will learn how to configure agents that can classify documents, extract fields from documents (such as contracts and invoices), and test how the extracted data is displayed to users in a chat.
The workflow starts by configuring the document classifier to identify the type of ingested document. Based on the classification results, the data is extracted from the documents. The workflow ends after the extracted fields are displayed to the users.
Creating an agent for document processing
Let's start by creating a new agent from scratch instead of using any pre-configured templates.
To create an agent:
- From the menu, go to Build. For the IBM Cloud environment, select a workspace in the workspace list or create a workspace for your agent.
- Select All Agents and click Create agent.
- Select Create from scratch.
- Enter a meaningful name for your agent, such as Demo_Document_Processing.
- Enter a description for your agent, such as This agent classifies documents and extracts data from documents. To learn how to write a good description for your agent, click the What makes a good description? option.
- Click Create.
The page where you can configure and test your new agent is displayed.

The screen has three main areas:
- Navigation panel: Move between sections of the agent builder, such as Profile, Knowledge, and others.
- Settings panel: Set up and configure the functions of your agent.
- Preview panel: Test the agent and adjust settings to improve the user experience.
For more information about creating agents, see Building agents.
Adding the document processing tools to the agent
On the navigation panel, you can add tools to the agent. We will create an agentic workflow from scratch and add document processing tools to the workflow.
To add the workflow to the agent:
- On the navigation panel, click Toolset.
- Click Add tool in the settings panel.
- Select Agentic workflow.
- Enter a meaningful name for the workflow. We will name the workflow Document Processing Demo.
- Click Start building. The workflow page is displayed with the start and end nodes.
Now, let us add a description for the new workflow and specify any inputs and outputs for the workflow.
Naming and defining the inputs and outputs
Descriptions explain what each tool does and how to use the tool, but they don’t affect how the tool functions. To help AI agents effectively choose the right tool to assist users, follow these guidelines when writing descriptions:
- Specify what the tool does and the specific tasks it supports. Clearly state its capabilities and any limitations.
- Explain when to use the tool. Include relevant keywords, user actions, or request types.
- Describe the situations or user requests that activate this tool. Focus on the intent or keywords that signal when the tool is relevant. This helps the agent decide the right moment to call the tool.
- If the tool works with other tools or agents, explain how they interact and when to use them together.
Inputs and outputs are essential in agentic workflows because they define how data enters and exits the workflow. They apply to the entire workflow. Inputs provide the necessary starting information for the workflow or enable the workflow to start with context. Outputs deliver the final result or structured data after the workflow completes. They make the workflow useful by returning meaningful results.
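To make the idea concrete, the contract of the workflow built in this topic can be sketched in a few lines of plain Python. This is an illustrative sketch with hypothetical function names, not the Orchestrate API: the workflow declares no input parameters (the user uploads the document mid-flow) and returns one structured output.

```python
# Hypothetical sketch of this topic's workflow contract: no input
# parameters (the document is collected from the user during the chat)
# and a single declared output parameter, class_name.
# All names here are illustrative, not the product API.
def run_document_workflow(collect_document):
    document_text = collect_document()  # collected mid-flow, not passed in
    # Stand-in for the document classifier node:
    class_name = "Invoice" if "invoice" in document_text.lower() else "Contract"
    return {"class_name": class_name}   # the workflow's declared output

result = run_document_workflow(lambda: "ACME invoice #1042")
print(result)  # {'class_name': 'Invoice'}
```

Because the output is structured rather than free text, the calling agent can reliably reuse the result downstream.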
In the example Demo_Document_Processing agent, we will add a description and an output parameter.
To add a description and an output parameter:
- On the workflow page, click Edit details.
- For this example workflow, enter the description This workflow classifies and extracts values from documents.
- Click the Parameters tab to specify an output parameter for this workflow. We don't need to specify any input parameters because the documents will be uploaded by the user to the workflow during a chat.
- Click Add output, and select string.
- In the Name field, enter class_name, and for Description, enter This parameter provides the document classification result.
- Click Add.
- Click Done.
We have now provided a description for the tool and also added an output parameter.
Now, to collect the document for ingestion, either add a user activity node to the workflow, or let the document classifier node that we create later prompt the user for a file upload.
Adding a user activity to collect documents (Optional)
To add a user activity node to the workflow:
- Hover your mouse over the connector line between the start and end nodes.
- Click the add flow items icon and select Flow nodes > User activity.
We want the users to upload files, so let us add a file upload interaction type to the user activity.
- In the user activity node, click Add, and select Collect from user > File upload.
The user activity node with the file upload interaction is added to the workflow.

The next step after the user uploads a document is to classify the document.
Classifying the uploaded documents
To classify the uploaded document, let us add a document classifier node to the workflow. A document classifier can automatically identify the document type. You can define the document classes such as invoices and contracts, and the classifier uses AI to categorize documents based on those classes.
To add a document classifier to the workflow:
- Hover your mouse over the connector line between the user activity and end nodes.
- Click the add flow items icon and select Flow nodes > Document classifier.
- Click the Document classifier node, click Add class, and add two document classes for this example workflow: Contract and Invoice.
The document classifier with the specified classes is added to the workflow.

Now, let's test this document classifier node to ensure that it is able to classify the documents as expected.
To test this document classifier:
- Select the document classifier in the workflow.
- Select Test classifier and upload a contract document and an invoice document.
For this example workflow, we are using a simple contract document and an invoice document. Similarly, you can upload any of your own contract and invoice documents to test the classifier.
It might take some time to upload the documents. After the documents are processed, the classifier shows the predicted class for each document, in this case Contract or Invoice. Documents that do not match any of the defined classes are classified as Other.

- After you get the predicted results, click Done in the document classifier dialog.
For more information about document classifiers, see Adding document classifiers.
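The classifier's labeling behavior can be summarized with a short sketch in plain Python (illustrative only; the actual classification is performed by the product's AI model): a document receives one of the defined classes, and anything that matches no class falls back to Other.

```python
# Illustrative sketch of the classification contract described above:
# a document is labeled with one of the defined classes, or with the
# fallback class "Other" when nothing matches.
DEFINED_CLASSES = ("Contract", "Invoice")

def resolve_class(predicted: str) -> str:
    """Return the predicted class if it is defined, else 'Other'."""
    return predicted if predicted in DEFINED_CLASSES else "Other"

print(resolve_class("Invoice"))  # Invoice
print(resolve_class("Resume"))   # Other
```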
Our example workflow can now classify a document that is uploaded by the user. So, next we will add a branch to the workflow to check the document type.
Adding a branch to check the document type
A branch decides which path to take in a workflow based on a condition. We use a branch in this example workflow to create two paths: one for extracting data from contracts and the other for extracting data from invoices.
To add a branch to the workflow:
- Hover your mouse over the connector line between the document classifier and end nodes.
- Click the add flow items icon and select Flow nodes > Branch.
- Click the branch to define the path conditions. You can define path conditions by using either the condition builder or the expression editor. When you create a branch, two paths are generated by default, but you can add more paths.
To set the path condition, we use the condition builder in this example.
- For Path 1, click Edit condition.
- Click the Plus icon, select the Document classifier node, and then select the class_name output.
- Select the == operator, and for the value, enter Contract.
When the class_name value is Contract, the workflow will now follow Path 1. You only need to define the condition for Path 1 because the workflow will follow Path 2 by default when the class_name value is not Contract.
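In plain code, the branch configured above behaves like a simple conditional. The sketch below is hypothetical (the real condition is evaluated by the workflow engine), but it shows why only Path 1 needs an explicit condition:

```python
# Sketch of the branch logic: Path 1 has an explicit condition
# (class_name == "Contract"); Path 2 needs none because it is the
# default path. Names are illustrative only.
def select_path(class_name: str) -> str:
    if class_name == "Contract":        # condition defined for Path 1
        return "Path 1: Contract extractor"
    return "Path 2: Invoice extractor"  # default path, no condition

print(select_path("Contract"))  # Path 1: Contract extractor
print(select_path("Invoice"))   # Path 2: Invoice extractor
```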

Now, the example workflow can process a document and check its output type. Next, we can add the document extractors to the workflow to extract contract and invoice details.
Adding document extractors to extract data
Document extractors can extract fields or entities, such as dates and names, from documents.
To add a document extractor to Path 1 for extracting contract data:
- Hover your mouse over the connector line for Path 1 between the branch node and the end node.
- Click the add flow items icon and select Flow nodes > Document extractor.
- Select Unstructured.
- Click the Edit fields icon and edit the name to Contract extractor.
- Use the watsonx/meta-llama/llama-3-2-11b-vision-instruct model in the Model list. However, you can change the model anytime and select one that is most accurate for extracting data.
- Upload a sample contract document.
The uploaded sample document helps in creating the fields to extract. The document does not train the model, and is not a part of the agent that is being configured.
It might take some time to upload the document. When the upload is complete, you can see a document preview.
- Click Add field to add fields for the information to extract from the documents, such as buyer, supplier, date, and others.

In the previous example, the personal, effective date, supplier, and buyer fields are added; the matched values are highlighted on the document preview panel.
To edit the field details, hover over the field and click the View field details icon. You can edit the field name, description, and data type. You can also add examples for the field to help the model understand what information you want to extract.
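Conceptually, each extraction field amounts to a small schema entry: a name, a description, a data type, and optional examples. The structure below is an illustrative sketch of the contract fields from this example, not the product's internal format:

```python
# Illustrative sketch of the contract extraction fields defined above.
# The keys and values are assumptions for clarity, not the product schema.
contract_fields = [
    {"name": "effective_date", "type": "string",
     "description": "The date on which the contract takes effect",
     "examples": ["January 1, 2025"]},
    {"name": "supplier", "type": "string",
     "description": "The party that provides the goods or services",
     "examples": ["ACME Corp."]},
    {"name": "buyer", "type": "string",
     "description": "The party that purchases the goods or services",
     "examples": []},
]

print([field["name"] for field in contract_fields])
```

Clear descriptions and examples in each entry serve the same purpose as the field details dialog: they tell the model what to look for.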
Similarly, add a document extractor to Path 2 for extracting invoice data:
- Hover your mouse over the connector line for Path 2 between the branch node and the end node.
- Click the add flow items icon and select Flow nodes > Document extractor.
- Select Unstructured.
- Click the Edit fields icon and edit the name to Invoice extractor.
- Use the watsonx/meta-llama/llama-3-2-11b-vision-instruct model in the Model list. However, you can change the model anytime and select one that is most accurate for extracting data.
- Upload a sample invoice document.
- Click Add field to add fields for the information to extract from the documents, such as buyer, supplier, date, and others.

In the previous example, the address, item, quantity, and other fields are added; the matched values are highlighted on the document preview panel.
For more information such as how to manage uploaded documents, add examples for fields, and other options, see Adding document extractors.
Now, the workflow can extract contract and invoice data from uploaded documents. Next, we will add user activity nodes to display the results to users, based on the extracted fields that we defined for each document type.
Displaying extracted data to users
We will add two user activity nodes to the workflow. One node to display the extracted contract data and the other to display the extracted invoice data.
To add a user activity node to display the contract data:
- Hover your mouse over the connector line between the contract extractor node and the end node.
- Click the add flow items icon and select Flow nodes > User activity.
- In the user activity node, click Add, and select Display to user > Message.
The user activity node with the message interaction is added to the workflow. Rename the message interaction to Display contract details.
- Click the message node, and add variables from the contract extractor node to display the extracted data.

To add a user activity node to display the invoice data:
- Hover your mouse over the connector line between the invoice extractor node and the end node.
- Click the add flow items icon and select Flow nodes > User activity.
- In the user activity node, click Add, and select Display to user > Message.
The user activity node with the message interaction is added to the workflow. Rename the message interaction to Display invoice details.
- Click the message node, and add variables from the invoice extractor node to display the extracted data.

Now that the workflow can display the output to users, we can conclude the flow by updating the end node.
Defining the end node
We will update the end node to use the class_name output parameter that we defined at the beginning of the topic.
To update the end node:
- Click the end node and select Edit data mapping.
You can see that auto-mapping is already applied to the class_name parameter. Because we defined only one output parameter for the workflow, auto-mapping works well in this case. For more complex workflows, such as when two nodes produce a class_name parameter, you can use explicit variable mapping to handle them.
For example, if you want to explicitly map the class_name parameter instead of using auto-mapping, click the Variable icon, and choose the class_name parameter from the Document classifier node.
If you have followed the steps in this topic, your workflow looks similar to the one in the following image:

- Click Done to save your changes. The page where you can configure and test your new agent is displayed.
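The data mapping at the end node amounts to a one-line assignment, which the following hypothetical sketch pictures: the workflow's declared class_name output is populated from the classifier node's result, which is exactly what auto-mapping does in this example.

```python
# Sketch of the end-node data mapping (illustrative names only):
# the workflow's declared output is filled from the classifier
# node's output.
classifier_node_output = {"class_name": "Contract"}  # produced upstream

# Explicit variable mapping, equivalent to auto-mapping in this example:
workflow_output = {"class_name": classifier_node_output["class_name"]}

print(workflow_output)  # {'class_name': 'Contract'}
```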
Adding a behavior to your agent
To trigger the correct agent, you must define the instructions within the Behavior section of your new agent.
To add a behavior for your new agent:
- On the navigation panel of the page, click Behavior.
- Define how and where your new agent should react to requests and respond to users. For example, for this agent, you can enter Invoke the Demo_Document_Processing tool and output the result.
Now, you can test your new agent by uploading documents.
Testing your agent
To trigger your agent for testing:
- Type Invoke the Demo_Document_Processing tool in the preview panel and press Enter.
- Upload your contract document. You can also repeat the test by uploading an invoice document.
After you upload your document, the agent automatically runs the workflow steps in the background. It extracts the relevant data based on the specified fields and generates a response containing the extracted information. In addition, it provides insights into the document’s classification.
Your results should look similar to the one in the following image:

This concludes the walkthrough of building an agent from scratch, configuring the agent to classify documents and extract fields from them, and testing the agent to see how the extracted data is displayed to users in a chat.