Configuring your project settings
From the Configure tab in Document Processing Designer, you can import and export a project ontology. An ontology is a set of document type and field type definitions, along with associated enrichments. You can also set the extraction and display name language for your project, set up webhooks to receive notifications for certain events, and configure your Git server connection. You use the Git server to share and version the document processing project.
Procedure
Import / Export ontology
- To export a project as a .zip file, click Export project. By default, the exported file contains the document types, field types, and enrichments, which you can use to start training with new sample files. You can also include the training model and the sample training files in your export, for example if you want to move your entire project to a new instance of Document Processing.
- To import a project, click Import project and select a JSON or a .zip file to import.
When you import a .zip file, you have two options: overwrite the existing project, or merge the existing project. If you merge the existing project, document types, field types, enrichments, and sample training files are imported unless there is a conflict. Models are not imported.
Warning: Importing a JSON file, or a .zip file with the option Overwrite the existing project, erases all your existing data, including document types, field definitions, aliases, classification models, data models, and sample files. Any existing sample file must be uploaded and annotated again. Because it is destructive, this type of import is recommended only when you begin a new project. If your project has already been deployed, you cannot import a project.
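The difference between the two import options can be pictured with a small sketch. It is only an illustration of the behavior described above, not the Designer's implementation: it assumes that a conflict means an imported item has the same name as an existing item, and the function names and dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple

Definitions = Dict[str, dict]  # hypothetical layout: item name -> definition


def merge_import(existing: Definitions, incoming: Definitions) -> Tuple[Definitions, List[str]]:
    """Merge: add incoming items, but skip any item whose name already exists.

    Assumption: a "conflict" is a name that is already present in the project.
    """
    merged = dict(existing)
    skipped: List[str] = []
    for name, definition in incoming.items():
        if name in merged:
            skipped.append(name)  # conflict: the existing definition is kept
        else:
            merged[name] = definition
    return merged, skipped


def overwrite_import(existing: Definitions, incoming: Definitions) -> Definitions:
    """Overwrite: everything in the existing project is discarded."""
    return dict(incoming)


existing = {"Invoice": {"fields": ["Total"]}}
incoming = {"Invoice": {"fields": ["Amount"]}, "Receipt": {"fields": ["Date"]}}

merged, skipped = merge_import(existing, incoming)
print(sorted(merged))  # ['Invoice', 'Receipt']
print(skipped)         # ['Invoice']
print(overwrite_import(existing, incoming) == incoming)  # True
```

In the merge case the existing Invoice definition survives and only Receipt is added. In the overwrite case nothing from the existing project remains, which is why that option is recommended only for new projects.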
Note:
- The maximum size of an exported project is 1 GB. If the size of your exported .zip file exceeds this limit, clear the option Training sample files when you export the project. This action keeps the project file size within the allowed maximum.
- You can only import a project with the overwrite option from the previous version and you can only merge a project from the current version. For more information, see Known limitations.
After you import a project that was created in a previous version (23.0.1 or earlier), or after you upgrade from the previous version, it is recommended that you run classification training again.
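If you script your exports, a quick local check can show whether a .zip file is over the size limit and which entries take the most space. The following is a minimal sketch that uses only the Python standard library. It assumes that 1 GB means 1024³ bytes (the exact byte count is not stated here), and the function names are hypothetical.

```python
import zipfile

# Assumption: 1 GB is interpreted as 1024**3 bytes.
MAX_EXPORT_BYTES = 1024 ** 3


def exceeds_export_limit(size_bytes: int, limit: int = MAX_EXPORT_BYTES) -> bool:
    """Return True if a file of size_bytes is larger than the export limit."""
    return size_bytes > limit


def largest_entries(zip_path: str, top: int = 5):
    """Return the `top` largest entries of the archive as (name, uncompressed size)."""
    with zipfile.ZipFile(zip_path) as archive:
        entries = [(info.filename, info.file_size) for info in archive.infolist()]
    return sorted(entries, key=lambda entry: entry[1], reverse=True)[:top]
```

For example, you can call `exceeds_export_limit(os.path.getsize(path))` on an exported file, and then `largest_entries(path)` to confirm whether sample training files account for most of the size before you export again without them.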
Document processing
- Select the engine to use for Optical Character Recognition (OCR). By default, engine 1 is used. If you have low-quality documents to process or want to use advanced features such as handwriting recognition, you should select engine 2. Advanced features are not supported for production. Using both engines might impact processing time.
Language settings
- In Extraction language, select which languages are used in the documents that you plan to process. You can choose English, Dutch, French, German, Brazilian Portuguese, or Spanish.
Make sure to choose only the language or languages that are likely to be used in your document sets. Choosing more than one language can affect the accuracy of your document processing model.
- In Display name language, select the language that you use to enter display names for fields and document types. These are the names that are displayed in the Designer and in the applications.
The display name language is also used in the Content Engine as the localized string locale setting for document classes and properties. Document Processing project deployment supports only one language per project. If your organization has multiple projects with different language settings, these projects cannot be deployed to the same Content Engine server if they share common properties. For example, when you define data definitions during data standardization, you cannot map a field to an existing data definition that was created in a different language.
- In Extraction language (technology preview), you can select an additional language to test data extraction. This feature is available only in non-production environments, and only if you selected engine 1 as your extraction engine.
Git server configuration
- Create a connection to the Git server for the first project that you create in Document Processing Designer. This setting applies to all subsequent projects that you create.
- Select your Git vendor.
- Enter the value for your Git server organization URL. The Git server REST API URL is automatically populated according to the two previous parameters.
- Choose API key or Password, and enter your Git credentials.
- Click Test to verify that the connection is successful, then click Save.
In a production deployment, your administrator created a Git organization for you.
In a starter deployment, a Git server was set up for your environment during the installation process. You do not need to manually configure this setting.
Note: If you installed your deployment with network policies and your Git service requires an external connection, then see Configuring cluster security to create a customized egress network policy.
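To give an idea of how a REST API URL can follow from a vendor and an organization URL, here is a minimal sketch based on the publicly documented API base paths of common Git vendors. It is not the Designer's own logic, which is not described here. The vendor keys and the function name are hypothetical, and the list of vendors is only an example.

```python
from urllib.parse import urlsplit

# Publicly documented API base paths for self-hosted servers (illustrative list).
API_PATHS = {
    "github": "/api/v3",  # GitHub Enterprise Server
    "gitlab": "/api/v4",
    "gitea": "/api/v1",
}


def rest_api_url(vendor: str, organization_url: str) -> str:
    """Derive a REST API base URL from a vendor name and an organization URL."""
    parts = urlsplit(organization_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"Not an absolute URL: {organization_url!r}")
    vendor = vendor.lower()
    if vendor not in API_PATHS:
        raise ValueError(f"Unknown vendor: {vendor!r}")
    # The public github.com service uses a dedicated API host.
    if vendor == "github" and parts.netloc.lower() == "github.com":
        return "https://api.github.com"
    return f"{parts.scheme}://{parts.netloc}{API_PATHS[vendor]}"


print(rest_api_url("github", "https://github.com/my-org"))
# https://api.github.com
print(rest_api_url("gitlab", "https://gitlab.example.com/my-org"))
# https://gitlab.example.com/api/v4
```

If the automatically populated value in the Designer differs from what you expect, check it against your Git vendor's API documentation before you click Test.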
Webhook configuration
- Webhooks are custom HTTP callbacks that are triggered by certain events to send notifications to external applications. You must have a web server that can supply URLs to receive Document Processing notifications.
- To create a webhook, click Add a webhook.
- In the General tab, provide the following information:
- A name for your webhook.
- The URL to receive notifications.
- The number of times your webhook can be called in case of failure. The default value is 5, the minimum value is 0, and the maximum value is 100.
- Select the types of events that trigger your webhook. You can select one of the following events:
- Document Processing Complete
Note: If you select all documents, you are notified whenever the system finishes processing each of your documents. If you select specified documents, you must use an API to upload the documents, and the payload of the request must contain the webhookId parameter with a value that is equal to the current webhook ID. The ID is generated after the webhook is created. If you set the scope to specified documents but do not specify the webhookId parameter in the payload, the webhook is not notified when the document processing is complete.
- Classification Model Training Complete
- Extraction Model Training Complete
- Project Exporting Complete
- Project Importing Complete
- In the Advanced settings tab, provide the following optional information:
- Authorization token: When sending requests to the webhook URL, this token is sent as the Authorization HTTP header.
- HMAC signature secret: IBM Automation Document Processing generates a signature from the request body and this secret by using the Hash-based Message Authentication Code (HMAC) algorithm, and sends the signature to the client in a Digest HTTP header along with the request body. When the client receives the message, it uses the same HMAC algorithm and the received body to recompute the signature and compare it with the value in the header. Currently, HMAC is the only supported cryptography algorithm.
- Receiver registration ID: An identifier that you provide and that is sent to the webhook URL when the event occurs.
- Click Add webhook. After you create your webhook, its ID is generated. To retrieve the ID, edit the webhook and copy the webhook ID.
For more information, see Document Processing event webhooks.
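On the receiving side, the HMAC signature check that is described for the HMAC signature secret setting might look like the following minimal sketch. Only the use of HMAC and of the Digest header comes from the description above. The SHA-256 hash function, the hexadecimal encoding of the signature, and the function names are assumptions, so check the webhook reference for the actual format before you rely on this code.

```python
import hashlib
import hmac


def compute_signature(secret: bytes, body: bytes) -> str:
    """Compute an HMAC signature of the raw request body.

    Assumption: SHA-256 with hexadecimal output. The real hash function and
    encoding are not stated in this topic.
    """
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_authentic(secret: bytes, body: bytes, digest_header: str) -> bool:
    """Compare the value of the Digest header with a locally computed signature."""
    expected = compute_signature(secret, body).encode("ascii")
    received = digest_header.strip().encode("utf-8")
    # compare_digest runs in constant time, which avoids leaking timing details.
    return hmac.compare_digest(expected, received)


secret = b"my-webhook-secret"          # the HMAC signature secret that you configured
body = b'{"event": "example"}'         # hypothetical payload, for illustration only
header_value = compute_signature(secret, body)

print(is_authentic(secret, body, header_value))         # True
print(is_authentic(secret, b"tampered", header_value))  # False
```

Compute the signature over the raw bytes of the request body as received, before any JSON parsing, because reserializing the payload can change the bytes and cause the comparison to fail.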