Configuring your project settings

From the Configure tab in Document Processing Designer, you can import and export a project ontology. An ontology is a set of document type and field type definitions, along with associated enrichments. You can also set the extraction and display name language for your project, set up webhooks to receive notifications for certain events, and configure your Git server connection. You use the Git server to share and version the document processing project.

Procedure

Import / Export ontology

  • To export a project as a .zip file, click Export project. By default, the exported file contains the document types, field types, and enrichments, which you can use to start training with new sample files. You can also include the training model and the sample training files in your export, for example if you want to move your entire project to a new instance of Document Processing.
  • To import a project, click Import project and select a JSON or a .zip file to import.

    When you import a .zip file, you have two options: overwrite the existing project, or merge with the existing project. If you merge, document types, field types, enrichments, and sample training files are imported unless there is a conflict. Models are not imported.

    Warning: Importing a JSON file or a .zip file with the option Overwrite the existing project erases all your existing data, including document types, field definitions, aliases, classification models, data models, and sample files. Any existing sample file must be uploaded and annotated again. Because it is destructive, this type of import is only recommended when you begin a new project.

    If your project has already been deployed, you cannot import a project.

    Note:
    • By default, you can import only projects that are smaller than 1 GB. If your exported .zip file is 1 GB or larger, clear the Training sample files option when you export the project. This can reduce the file size enough for the ontology to be imported successfully.
    • A project that was exported from the previous version can be imported only with the overwrite option. A project can be merged only if it was exported from the current version. For more information, see Known limitations.

    It is recommended that you run classification training after you import a project that was created in a previous version (23.0.1 or earlier), or after you upgrade from the previous version.
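
The size limit in the note above can be checked before you attempt an import. The following is a minimal sketch, not part of the product: the function names are hypothetical, and it assumes that "1 GB" means 10**9 bytes (the product might use 2**30 bytes instead).

```python
import os

# Assumption: "1 GB" is read as 10**9 bytes; adjust if your environment uses 2**30.
IMPORT_LIMIT_BYTES = 10**9


def fits_import_limit(size_bytes: int, limit_bytes: int = IMPORT_LIMIT_BYTES) -> bool:
    """Return True when a file of the given size is strictly under the limit."""
    return size_bytes < limit_bytes


def archive_fits_import_limit(path: str) -> bool:
    """Check the size on disk of an exported .zip file against the default limit."""
    return fits_import_limit(os.path.getsize(path))
```

If the check fails, exporting again without the training sample files is the remedy that the note describes.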

Document processing

  • Select the engine to use for Optical Character Recognition (OCR). By default, engine 1 is used. If you have low-quality documents to process or want to use advanced features such as handwriting recognition, you should select engine 2. Advanced features are not supported for production. Using both engines might impact processing time.

Language settings

  • In Extraction language, select which languages are used in the documents that you plan to process. You can choose English, Dutch, French, German, Brazilian Portuguese, or Spanish.

    Make sure to choose only the language or languages that are likely to be used in your document sets. Choosing more than one language can affect the accuracy of your document processing model.

  • In Display name language, select the language that you use to enter display names for fields and document types. These are the names that are displayed in the Designer and in the applications.

    The display name language is also used in the Content Engine as the localized string locale setting for document classes and properties. Document Processing project deployment supports only one language per project. If your organization has multiple projects with different language settings, these projects cannot be deployed to the same Content Engine server if they share common properties. For example, when you define data definitions during data standardization, you cannot map a field to an existing data definition that was created in a different language.

  • In Extraction language (technology preview), you can select an additional language to test data extraction. This feature is a technology preview. It is available only in non-production environments, and only if you selected engine 1 as your extraction engine.

Git server configuration

  • Create a connection to the Git server for the first project that you create in Document Processing Designer. This setting applies to all subsequent projects that you create.
    1. Select your Git vendor.
    2. Enter the value for your Git server organization URL.
      The Git server REST API URL is automatically populated based on the Git vendor and the organization URL that you provided.
    3. Choose API key or Password, and enter your Git credentials.
    4. Click Test to verify that the connection is successful, then click Save.

    In a production deployment, your administrator created a Git organization for you.

    In a starter deployment, a Git server was set up for your environment during the installation process. You do not need to manually configure this setting.

    Note: If you set the shared_configuration.sc_egress_configuration.sc_restricted_internet_access parameter to true in the custom resource (CR) and you are connecting to a Git service that requires an external connection, you must create a customized egress network policy. For more information, see Configuring cluster security.

Webhook configuration

  • Webhooks are HTTP custom callbacks that are triggered by certain events to send notifications to external applications. You must have a web server that can supply URLs to receive Document Processing notifications.
    1. To create a webhook, click Add a webhook.
    2. In the General tab, provide the following information:
      • A name for your webhook.
      • The URL to receive notifications.
      • The number of times your webhook can be called in case of failure. The default value is 5, the minimum value is 0, and the maximum value is 100.
    3. Select the types of events that trigger your webhook. You can select one of the following events:
      • Document Processing Complete
        Note: If you select all documents, you are notified each time the system finishes processing one of your documents. If you select specified documents, you must use an API to upload the documents, and the payload of the request must contain the webhookId parameter with a value that is equal to the ID of the current webhook. The ID is generated after the webhook is created. If you set the scope to specified documents but do not include the webhookId parameter in the payload, the webhook is not notified when document processing is complete.
      • Classification Model Training Complete
      • Extraction Model Training Complete
      • Project Exporting Complete
      • Project Importing Complete
    4. In the Advanced settings tab, provide the following optional information:
      • Authorization token: When sending requests to the webhook URL, this token is sent as the Authorization HTTP header.
      • HMAC signature secret: IBM Automation Document Processing generates a signature from the request body by using the Hash-based Message Authentication Code (HMAC) algorithm, and sends the signature to the client with the request in a Digest HTTP header. When the client receives the message, it can apply the same HMAC algorithm to the received body to compute the signature and compare it with the value in the header. Currently, HMAC is the only supported cryptography algorithm.
      • Receiver registration ID: An identifier that you provide and that is sent to the webhook URL when the event occurs.
    5. Click Add webhook.
      After you create your webhook, its ID is generated. To retrieve the ID, edit the webhook and copy the webhook ID.

      For more information, see Document Processing event webhooks.
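
On the receiving side, the HMAC check described for the signature secret can be sketched as follows. This is an illustration under stated assumptions, not the documented wire format: it assumes that the hash function is SHA-256, that the signature is hex-encoded, and that the Digest header carries only that hex string. The function names are hypothetical. Confirm the actual format in Document Processing event webhooks before you rely on it.

```python
import hashlib
import hmac


def compute_signature(secret: str, body: bytes) -> str:
    """Return the hex-encoded HMAC of the raw request body.

    Assumption: SHA-256 and hex encoding; the product might use a different scheme.
    """
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def is_authentic(secret: str, body: bytes, digest_header: str) -> bool:
    """Compare the received Digest header value with a locally computed signature."""
    expected = compute_signature(secret, body).encode("utf-8")
    received = digest_header.strip().encode("utf-8")
    # compare_digest takes time independent of where the values differ,
    # which avoids leaking information through response timing.
    return hmac.compare_digest(expected, received)
```

Compute the signature over the raw bytes of the request body, before any JSON parsing, because re-serializing the payload can change the bytes and therefore the signature.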