watsonx.ai actions

Default instance name: watsonx_ai.Actions.

Details

The watsonx.ai actions allow a Datacap application to submit questions, via an AI prompt, to the watsonx.ai service. The result from the prompt is saved in a Datacap field or DCO variable, so the application can store or act on the response from the AI model. The actions can provide page data to the model and have the model return the desired information. One might use these actions to ask the model to classify the page based on the page text. Another use might be to ask the model to find metadata or key-value pairs in the page data. While these are common steps within a Datacap application, the actions are also intended to be open, allowing an application developer to customize the prompt for any scenario where a model can be useful in processing page data.

These actions allow connectivity to the IBM watsonx.ai service; they are not intended for any other service provider.

Responses from large language models (LLMs) can vary greatly based on the question and provided context. It may take some experimentation to create prompts that return data in a consistent manner.

watsonx.ai Account Information

A watsonx.ai account is required to use the watsonx.ai actions. At the time of publishing, a watsonx.ai account can be created at this address: http://watsonx.ai.

For the Datacap watsonx.ai actions to submit requests, the target account must be properly configured with a workspace, and an API key must be generated. The API key is used to authenticate the Datacap actions with your watsonx.ai account. The watsonx.ai URLs, models, and configuration information are separate from the Datacap product. Refer to the watsonx.ai documentation for the steps to set up an account and to configure the required workspace and API key. It is also critical to stay current on the watsonx.ai URLs and models, as these can be deprecated and changed over time. Update your application before such a change takes effect on the service so that your Datacap watsonx.ai actions continue functioning as required.

Refer to the watsonx.ai website for all documentation regarding account setup, available models, pricing, and deprecation schedules.

Important Model Considerations

Any model, whether the default model or a user-specified model, can eventually be deprecated or removed by the service provider. This change is outside the control of the Datacap product. It is important to keep up with any notices from the service provider about an upcoming model deprecation or removal.

When changing to a new model, comprehensive testing is required to help ensure that the model's responses can be processed by the Datacap actions and that the results are acceptable for the application.

It is recommended to review the service documentation regarding the purpose of the selected model. Models that are targeted at responding to programmatic requests generally work better in a "zero-shot" approach than "chat" models, which typically use a back-and-forth exchange to refine an answer.

Important Prompt Considerations

There are two aspects to getting good results: the first is the model that is selected and the second is the prompt. One large language model may work well with a specific prompt while another model responds poorly to the same prompt. Some of the actions in this library allow any prompt, while others have restrictions on the prompt and on the required response from the model. Review the help text for the actions to determine whether an action fits the required use case. Perform sufficient testing to ensure that the model returns results in the format that is required by the action and that the results are acceptable for the application. It is up to the application to handle situations where the model does not return any results or returns inaccurate results. The models available are at the discretion of the watsonx.ai service. The way the models return results, and the contents of the results, are beyond the control of the Datacap product.

Action Summary

AskAFreeFormQuestion
Allows any prompt text and stores the result in a single location.
AskAQuestionUsingPageText
Allows a custom question and builds it into a prompt with page text.
AskForKVPsAndCreateFieldsUsingPageText
Asks the LLM for the keys it finds on page text and creates fields with those keys.
AskForPageValuesUsingKeys
Allows specification of specific keys to obtain and asks the LLM to find those keys in the page text. The results are stored in pre-defined fields where the corresponding key is specified.
SetAccessTokenURL
Sets the token API endpoint.
SetAPIKey
The unique API key for the target watsonx.ai account. This action must be configured with the key that was generated for the account.
SetDecodingMethod
Allows configuration of "greedy" or "sample".
SetEndpointURL
The prompt API endpoint.
SetMaximumNewTokens
The maximum number of tokens that can be returned.
SetModel
The model that evaluates the prompt.
SetProjectID
The watsonx.ai project ID. This action must be configured with the project ID of the target account.
SetTemperature
Controls the variation in a sample.
SetTimeLimit
The maximum time allowed for the transaction.

The action AskForPageValuesUsingKeys is a good candidate to retrieve the values of keys from a page of text. It allows fields to be found on a page without using the traditional Datacap method of setting up a fingerprint that defines the location of all fields on a page. To do this, page fields are created at setup time in the usual way for Datacap, and then tagged with the expected name of the key to find on the page. While there is no predetermined list of keys for a document type, testing has shown that a model will generally understand the expected keys for a document type. For example, on an invoice document, the LLM understands that "Invoice Number" is a key-value pair that typically exists. AskForPageValuesUsingKeys then fills all of the fields with the values that are returned from the model. Classification should be performed before using AskForPageValuesUsingKeys, so that the correct key names are requested for the page type. Page classification could be performed with standard Datacap techniques, or it might be performed by using the AskAFreeFormQuestion action: submit the document text to the model, ask it to classify the document, and then assign the page type based on the model response.
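
As an illustration, the classification step of this flow might resemble the following sketch. The prompt file path, the variable names, and the two-parameter form of AskAFreeFormQuestion shown here are assumptions for illustration only; refer to the help for each action for the actual parameters.

ReadFile("C:\MyApplication\ClassifyPrompt.txt","@P.MyPrompt")
AskAFreeFormQuestion("@P.MyPrompt","@P.SuggestedPageType")
rrSet("@EMPTY","@P.MyPrompt")

The application would then assign the page type based on the response stored in the variable, and a later rule on the classified page would call AskForPageValuesUsingKeys to request the key names that are tagged on the page fields.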

How To Use The Actions

The application must first configure the service parameters by using the "Set" actions. Once configured, the "Ask" actions can be used to submit query prompts and obtain results for use within a Datacap application.

The actions that must be called are the following:
  • SetProjectID
  • SetAPIKey
  • SetMaximumNewTokens

Additional Set actions can be called if needed to change the default values. The Set actions can be performed in any order, as long as they are configured before calling an Ask action. A Set action can also be called again to change parameters during processing. For example, a specific model may be set by using SetModel before an Ask action is called; SetModel might then be called again with a different model name before calling another Ask action.

Typically it is necessary to call SetMaximumNewTokens to increase the token limit to the value allowed by the model.
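
A minimal configuration sketch follows. The placeholder values must be replaced with the values from your watsonx.ai account; the single string parameter assumed here for each Set action, and the Ask parameters (which mirror the example later in this document), should be verified against the action help.

SetProjectID("<your watsonx.ai project ID>")
SetAPIKey("<your watsonx.ai API key>")
SetMaximumNewTokens("2000")
AskAQuestionUsingPageText("What is the invoice number?","@P\InvoiceNum","")

If a later question needs a different model, SetModel can be called again before the next Ask action, as described above.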

Issues With Results

The nature of models can cause returned results to change or sometimes be unpredictable. These issues are beyond the scope of the actions. If an action is not returning results in the correct format, try changing the keys that are being requested for the page, or try a different model.

When models are updated by the model provider, the change may cause different results or no results. Always fully test updated models before using them in production. If a model is changed while in production, results can be returned in unexpected ways. Changes by the service are independent of the Datacap product.

Model updates occur as determined by the service provider and are beyond the control of the Datacap product. It is up to the implementer who uses these actions to monitor model changes and fully test as needed.

If the results are truncated because the maximum allowed tokens were exceeded, increase the maximum token limit to the value supported by the model. It is possible for the token limit to reach the maximum that is allowed by the model when generating results for a single page of text. In this situation, the prompt needs to be changed to have the LLM return fewer results.
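
For example, assuming the selected model supports a larger output window, the limit might be raised as follows; the value 4096 is illustrative only, so use the maximum documented for your model.

SetMaximumNewTokens("4096")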

Page Text Format

When page text is included with the prompt sent to the model, it is included as plain text by default. If the page has a table or structured information, the model usually responds better when the page text is sent in HTML format, which allows the model to better understand the relationships within the text and can often help it provide better answers. If the HTML option is enabled in the action, a layout file must exist in addition to the CCO file. The Recognize() actions in the recognition libraries create a layout file that contains the page results. A CCO can then be created from the layout file by calling the action CreateCcoFromLayout in the SharedRecognitionTools action library.
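
As a sketch, a recognition rule that produces both files might look like the following. RecognizePageOCR_S is used here only as an example of a Recognize() action; the choice of engine and the parameterless call to CreateCcoFromLayout are assumptions to verify against the help for each library.

RecognizePageOCR_S()
CreateCcoFromLayout()

With both the layout file and the CCO in place, the HTML option on the Ask action can be enabled.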

It is suggested to use the recognition engine that is best at detecting the tables on the document, as table detection quality varies between recognition engines.

Tip: The prompt sent to the model can be found in the log when SetEnhancedLogging is enabled. When developing a prompt, it is suggested to copy the basic prompt, with the page text, from the log and paste it into the watsonx.ai prompt lab interactive prompt. Using the prompt lab, prompts can be quickly tweaked and tested. Be sure to delete the answer before trying a changed prompt. Once you have a prompt that is responding as desired, that new prompt can be used in the action instead of the default prompt.

Custom Prompts Using Text Files

The Ask actions allow the prompt to be customized and provided as a parameter to the action. While it is possible to construct the prompt as a long input parameter, this can be difficult to create and maintain. An alternative approach is to create a text file with the custom prompt. The text file can contain the custom prompt, plus any replacement parameters supported by the action.

To input a text file as the custom prompt, first create the prompt text in a plain text file. It can be helpful to use an interactive prompt to help create prompt text that provides desired results. Replacement parameters, such as PAGETEXT and LLMKEYS, can also be placed into the prompt text file.
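
As an illustration, a prompt text file might contain something like the following. The exact syntax and placement of the replacement parameters are described in the help for each action, so the literal PAGETEXT token below is an assumption.

Extract the invoice number, invoice date, and total amount from the document text below. Return each value on its own line in the form key: value. If a value is not found, return NONE for that key.

PAGETEXT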

The text file must be placed in a location accessible by the rules engine. A good location would be within the application directories. If needed, multiple text files can be created.

The file IO action ReadFile reads a plain text file into a DCO variable. The contents of this variable can then be passed to the Ask actions. This method allows for a clear way of creating prompt text and provides easy maintenance.

Placing the prompt text into a DCO variable increases the size of the DCO. This text is in addition to all of the information about the batch and pages that also exists in the DCO. A large DCO does not perform as well as a small DCO. It is highly recommended to remove the prompt text from the DCO after the action has completed by setting the variable to @EMPTY.

ReadFile("C:\MyApplication\PromptText1.txt","@P.MyPrompt")
AskAQuestionUsingPageText("@P.MyPrompt", "@P\InvoiceNum", "")
rrSet("@EMPTY", "@P.MyPrompt")

The example reads the custom prompt from a text file, passes it to an Ask question, then removes the prompt text from the DCO.

Token Use

When a response is returned, it uses tokens from the server. The number of tokens used by the request is listed in the log file.

Bad Request Error

The most common type of error is a "Bad Request". This error usually appears when working to get the actions set up with the correct URLs, keys, and so on. It is a generic error that means something is wrong with the request that was made to the endpoint. If you are receiving this error, confirm that the specified URLs are correct. Likewise, the API key and model names must be correct as well.

If the specified model does not exist on the service, or it has been spelled wrong, it typically causes a "Bad Request" error, and the response may not specifically state that the model name is invalid. Model names can change over time, including the default model name provided by these actions. Check the log for the exact model name being sent to the endpoint and confirm that the model name exactly matches the model name expected by the endpoint.

Prompt Engineering

By default, each of the Ask actions creates a prompt to the model. The built-in prompt may not satisfy all possible use cases. When the answer is not what is expected, that is not necessarily a defect in the model, but simply how the model interpreted the question and what it believes to be the best answer to the prompt.

Common problems with the answer:
  • The required data is not returned.
  • The required data is returned but contains extra text that is not relevant.

The results of a prompt are dependent on these things:
  • The prompt or question.
  • The data from the page and the format of the data.
  • The selected model to process the prompt.

The crafting of an appropriate prompt is called Prompt Engineering. Slight changes to the wording of a prompt can create notable differences in the response. Prompt Engineering is an interesting aspect of the AI world. The Datacap actions use a "zero-shot" approach to prompting, which means that the action has one chance to ask the question and get the correct answer. The prompt must be crafted in a way that provides enough information and direction to the model so that it returns the answer on the first try.

Each of the Ask actions allows a custom prompt to be provided. The custom prompt overrides the built-in prompt. The actions also allow the custom prompt to contain a special tag, which is replaced with the page text when the action submits the prompt to the model. The custom prompt can control every aspect of the prompt, including the location of the page text within the prompt.

One approach to crafting a custom prompt is to use the interactive prompt page in the watsonx.ai system. When the actions have extended logging enabled, the prompt is placed into the log file. An application developer can copy the prompt text from the log as a starting point and paste it into the freeform section of the watsonx.ai prompt GUI. From there, the text can be easily tweaked and run interactively to immediately see how a particular model responds. This interactive approach allows quick testing of different wording and different models to determine which works best for a use case. Once a good prompt has been created, it can be added to the action as a custom prompt. Refer to the help for the action for more information on how to include page data within the custom prompt.