Automation Document Processing API

The Document Processing API for Automation Document Processing combines intelligent capture with the flexibility of an API. It lets you extend your core enterprise content management stack and speeds up the classification of your documents and the extraction of data from them.

Prerequisites

  1. Make sure Document Processing is deployed and initialized successfully by retrieving the overall deployment status:
    acacm=$(oc get cm -o name | grep aca-config)
    oc get $acacm -o jsonpath='{.data.ACA_INIT_STATUS}'

    This command should return True. If that is not the case, review your deployment.

  2. Create a new project using Document Processing Designer if you have not done so already.
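
The deployment check in step 1 can be wrapped in a small guard so that a script stops early when initialization has not finished. The following is a minimal sketch; the function name check_aca_init_status is hypothetical and not part of the product.

```shell
# Hypothetical helper (not part of the product): succeeds only when the
# status string passed as the first argument is exactly "True".
check_aca_init_status() {
  if [ "$1" = "True" ]; then
    echo "Document Processing is initialized."
  else
    echo "ACA_INIT_STATUS is '$1'; review your deployment." >&2
    return 1
  fi
}

# Example with a literal value. On a cluster, pass the command output instead:
#   acacm=$(oc get cm -o name | grep aca-config)
#   check_aca_init_status "$(oc get "$acacm" -o jsonpath='{.data.ACA_INIT_STATUS}')"
check_aca_init_status "True"
```

Because the function returns a non-zero exit code for any value other than True, it can be chained with && in front of later steps.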

Retrieve the required information

Retrieve the Document Processing backend API URL information.

Document Processing backend URL:
echo "https://$(oc get route cpd -o jsonpath="{.spec.host}")/adp/aca"
You can verify the Document Processing build by running the following command:
curl -k https://$(oc get route cpd -o jsonpath="{.spec.host}")/adp/aca/ping
The result should be similar to the following:
<h1>IBM Content Analyzer Ping Page</h1><p>Build: APD-Backend/master_21.0.3.0.1200  Thu Sep 30 09:57:46 PDT 2021
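
If only the build identifier is needed, for example to log it, it can be cut out of the ping page. This is a rough sketch that assumes the page keeps the format shown above; extract_build is a hypothetical helper name.

```shell
# Hypothetical helper: reads the ping page on stdin and prints the token that
# follows "Build: ". Assumes the page keeps the format shown above.
extract_build() {
  sed -n 's/.*Build: \([^ ]*\).*/\1/p'
}

# Example with the sample output from this page. Against a live system, pipe
# the curl command in instead:
#   curl -sk "https://<cpd route host>/adp/aca/ping" | extract_build
sample='<h1>IBM Content Analyzer Ping Page</h1><p>Build: APD-Backend/master_21.0.3.0.1200  Thu Sep 30 09:57:46 PDT 2021'
printf '%s\n' "$sample" | extract_build
```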

Document Processing API details

You can authenticate either through a Zen API Key or a Zen token.

Authenticating with a Zen API Key
A Zen API Key does not store any credentials and has no expiration time. It is also a good choice if you need to call the Document Processing API with a service account. To authenticate with a Zen API Key:
  1. Open the Cloud Pak Platform UI (Zen) home page, for example https://<adp_url>/zen/#/homepage.
  2. Generate an API Key by clicking Profile and settings > API Key > Generate new key.
    Screenshot of the zen homepage
  3. Encode the Zen API Key with the username by using Base64 as follows:
    <username>:<api key> => <Base64 encoded>
  4. Send it as the Authorization header with the prefix ZenApiKey:
    Authorization: ZenApiKey <encoded value>
    For example:
    curl --location --request POST "https://${Zen_host}/adp/aca/v1/projects/$project_name/analyzers" \
    --header 'Authorization: ZenApiKey Y2VhZG1pbjo4akJZeFVtY296NFBWd2hHaEMzeE5GYThVcDFkRlpWWWhXVFVabXI4' \
    --form 'file=@"/C:/Users/SampleFiles/APT003.pdf"' \
    --form 'responseType="\"json\""' \
    --form 'jsonOptions="\"HR\", \"DC\", \"KVP\", \"TH\", \"OCR\", \"SN\", \"MT\", \"CB\", \"ST\", \"DS\", \"AI\", \"CHAR\""'
© Copyright IBM Corporation 2022
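
Steps 3 and 4 above can be sketched in shell as follows. The function names and the placeholder credentials are hypothetical; only the `<username>:<api key>` format and the ZenApiKey prefix come from the steps above.

```shell
# Hypothetical helper: prints the Base64 encoding of "<username>:<api key>".
# tr removes the line breaks that some base64 implementations insert.
encode_zen_api_key() {
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Hypothetical helper: prints the complete Authorization header line.
zen_api_key_header() {
  printf 'Authorization: ZenApiKey %s\n' "$(encode_zen_api_key "$1" "$2")"
}

# Placeholder credentials for illustration only.
zen_api_key_header "user" "key"
```

printf is used instead of echo so that no trailing newline ends up in the encoded value. The resulting line can be passed to curl through the --header option.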
    
Authenticating with a Zen token
Use curl and the jq command-line JSON processor to generate the Zen JSON Web Token (JWT). For the username value, pass a user that belongs to the appropriate group, such as captureadmins or projectadmins.
Zen_host=$(oc get route cpd -o jsonpath="{.spec.host}")
username=<username>
pwd=<password>
Zen_JWT=$(curl -u "$username:$pwd" -Ssk -X GET "https://$Zen_host/v1/preauth/validateAuth" | jq -r '.accessToken')
For more information about roles, see Business Teams permissions.
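
When the credentials are rejected, the token variable can end up empty or set to the string null, because jq -r prints null for a missing field. A minimal sketch of a check before the token is used; is_usable_token is a hypothetical helper name.

```shell
# Hypothetical helper: fails for an empty value or for the literal string
# "null", which jq -r prints when .accessToken is missing from the reply.
is_usable_token() {
  case "$1" in
    ""|null) return 1 ;;
    *)       return 0 ;;
  esac
}

# Zen_JWT is expected to be set by the command above; default to empty here.
if is_usable_token "${Zen_JWT:-}"; then
  echo "Token retrieved."
else
  echo "No access token returned; check the username, password, and host." >&2
fi
```

This only checks that a value was returned. It does not verify that the token is valid or unexpired.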
Setting the Document Processing project ID
  1. Open your project in Document Processing Designer and open the spbackend container log. Search for project_id to find the ID.
  2. Send the request with the Zen_JWT token:
    curl -k --location --request POST "https://${Zen_host}/adp/aca/v1/projects/$project_name/analyzers" \
    --header "Authorization: Bearer $Zen_JWT" \
    --form "file=@/tmp/TLG6TV.pdf" \
    --form "responseType=json" \
    --form "jsonOptions=ocr,dc,kvp,sn,hr,th,mt,ai,ds,char"
    Attention: The char option causes high memory usage in the postprocessing pod, which might cause the process to run out of memory and stop. If you need this option, monitor the postprocessing pods and increase their RAM accordingly.
Response handling
This section lists the response properties and shows example responses. The Document Processing API uses standard HTTP response codes to indicate whether a call succeeded. A 200 code indicates success. A 202 code indicates that the file is accepted for processing. A 4xx code indicates an error that might be caused by client input. A 5xx code indicates a server-related error. Only one document is allowed per API call. For more information about response codes, see Response codes.
Property     Data Type  Description
code         integer    The status associated with the response.
messageId    string     Document Processing message code.
message      string     The status associated with the response.
analyzerId   string     The ID associated with the API call. It is used in later requests.
fileNameIn   string     Name of the uploaded file.
type         array      The type of file that is generated.
errorId      string     Document Processing error code.
explanation  string     Explanation for the error.
action       string     The action that is needed to correct the error.
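
The response code classes described in this section can be mapped to outcomes in a script. The following is a minimal sketch; classify_status and its output labels are hypothetical and only mirror the 200, 202, 4xx, and 5xx cases named in the text.

```shell
# Hypothetical helper: maps an HTTP status code to the outcome classes
# described in this section.
classify_status() {
  case "$1" in
    200) echo "success" ;;
    202) echo "accepted for processing" ;;
    4??) echo "client error" ;;
    5??) echo "server error" ;;
    *)   echo "unexpected" ;;
  esac
}

# With curl, the status code can be captured separately from the body:
#   http_code=$(curl -sk -o response.json -w '%{http_code}' <request options>)
#   classify_status "$http_code"
classify_status 202
```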

Response examples

Example of a successful response
{ 
  "status": {
    "code": 202,
    "messageId": "CIWCA50000",
    "message": "Success"
  },
  "result": [
    {
      "status": {
        "code": 202,
        "messageId": "CIWCA11106",
        "message": "Content Analyzer request was created"
      },
      "data": {
        "message": "json processing request was created successful",
        "fileNameIn": "Legal Invoice 15.pdf",
        "analyzerId": "ac3afc50-2c52-11ec-b296-c35cda005f89",
        "type": [
          "json"
        ]
      }  
    }
  ]
}
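
Because the analyzerId is needed in later requests, it is convenient to pull it out of the response with jq. This sketch assumes the response has the shape shown above and that the first entry in result is the one of interest; extract_analyzer_id is a hypothetical helper name.

```shell
# Hypothetical helper: reads a response body on stdin and prints the
# analyzerId of the first entry in "result". Requires jq.
extract_analyzer_id() {
  jq -r '.result[0].data.analyzerId'
}

# Trimmed copy of the successful response shown above.
response='{"status":{"code":202},"result":[{"data":{"analyzerId":"ac3afc50-2c52-11ec-b296-c35cda005f89"}}]}'

# Run the example only when jq is installed.
if command -v jq >/dev/null 2>&1; then
  printf '%s\n' "$response" | extract_analyzer_id
fi
```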
Example of an error response
{
  "error": "invalid_token",
  "error_description": "access token is missing or invalid."
}