Automation Document Processing API
Automation Document Processing, using the Document Processing API, combines intelligent capture with the flexibility of an API. You can use it to extend the value of your core enterprise content management stack and to accelerate the extraction and classification of data in your documents.
Prerequisites
- Make sure Document Processing is deployed and initialized successfully by retrieving the overall deployment status:
acacm=$(oc get cm -o name | grep aca-config)
oc get "$acacm" -o jsonpath='{.data.ACA_INIT_STATUS}'
This command should return True. If it does not, review your deployment.
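The status check above can be wrapped in a small script so that automation stops early when Document Processing is not ready. This is a minimal sketch, assuming bash, an oc session that is already logged in, and a current project that holds the aca-config ConfigMap. The function names aca_init_ok and get_aca_init_status are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check the Document Processing initialization status before
# calling the API. Assumes oc is logged in and the current project
# contains the aca-config ConfigMap.

# Returns 0 only when the status value is exactly "True".
aca_init_ok() {
  [ "$1" = "True" ]
}

# Prints ACA_INIT_STATUS by using the same oc commands as the prerequisite step.
# Returns 1 if no ConfigMap name matches aca-config.
get_aca_init_status() {
  local acacm
  acacm=$(oc get cm -o name | grep aca-config) || return 1
  oc get "$acacm" -o jsonpath='{.data.ACA_INIT_STATUS}'
}
```

For example: if aca_init_ok "$(get_aca_init_status)"; then echo ready; else echo "Review your deployment" >&2; fi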
- Create a new project using Document Processing Designer if you have not done so already.
Retrieve the required information
Retrieve the Document Processing backend API URL information.
Document Processing backend URL:
echo "https://$(oc get route cpd -o jsonpath="{.spec.host}")/adp/aca"
You can verify the Document Processing build by running the following command:
curl -k https://$(oc get route cpd -o jsonpath="{.spec.host}")/adp/aca/ping
The result should be similar to the following:
<h1>IBM Content Analyzer Ping Page</h1><p>Build: APD-Backend/master_21.0.3.0.1200 Thu Sep 30 09:57:46 PDT 2021
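The URL and ping steps above can also be scripted. The following sketch, assuming bash and sed, builds the backend URL from the cpd route host and pulls the build string out of the ping page. The helper names aca_base_url and ping_build are hypothetical, and the sed pattern assumes the page keeps the "Build: " prefix shown in the sample output.

```shell
#!/usr/bin/env bash
# Sketch: build the backend URL and read the build string from the ping page.
# aca_base_url and ping_build are hypothetical helper names.

# Prints the backend base URL for a given cpd route host.
aca_base_url() {
  printf 'https://%s/adp/aca\n' "$1"
}

# Reads the ping page HTML on stdin and prints the text after "Build: "
# up to the next HTML tag or the end of the line.
ping_build() {
  sed -n 's/.*Build: \([^<]*\).*/\1/p'
}
```

For example: host=$(oc get route cpd -o jsonpath="{.spec.host}") and then curl -ks "$(aca_base_url "$host")/ping" | ping_build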
Document Processing API details
You can authenticate either through a Zen API Key or a Zen token.
- Authenticating with a Zen API Key
- A Zen API Key does not store any credentials and does not expire. It is also a good choice if you need to call the Document Processing API with a service account. To authenticate with a Zen API Key:
- Open the Cloud Pak Platform UI (Zen) home page, for example https://<adp_url>/zen/#/homepage.
- Generate an API Key by clicking .
- Encode the Zen API Key with the username by using Base64 as follows:
<username>:<api key> => <Base64 encoded>
- Send it as the Authorization header with the prefix ZenApiKey:
For example: Authorization: ZenApiKey <encoded value>
curl --location --request POST "https://${Zen_host}/adp/aca/v1/projects/$project_name/analyzers" \
  --header 'Authorization: ZenApiKey Y2VhZG1pbjo4akJZeFVtY296NFBWd2hHaEMzeE5GYThVcDFkRlpWWWhXVFVabXI4' \
  --form 'file=@"/C:/Users/SampleFiles/APT003.pdf"' \
  --form 'responseType="\"json\""' \
  --form 'jsonOptions="\"HR\", \"DC\", \"KVP\", \"TH\", \"OCR\", \"SN\", \"MT\", \"CB\", \"ST\", \"DS\", \"AI\", \"CHAR\""'
© Copyright IBM Corporation 2022
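The Base64 encoding step can be done in the shell. This sketch assumes bash and a base64 utility on the PATH; zen_api_key_auth is a hypothetical helper that prints the value to send after "Authorization:". printf is used instead of echo so that no trailing newline is encoded, and tr removes the line wrapping that GNU base64 adds to long output.

```shell
#!/usr/bin/env bash
# Sketch: encode "<username>:<api key>" with Base64 and print the
# Authorization header value. zen_api_key_auth is a hypothetical name.
zen_api_key_auth() {
  local username=$1 api_key=$2 encoded
  # printf does not append a newline, so only "<username>:<api key>" is encoded.
  # tr strips the line breaks GNU base64 inserts after 76 output characters.
  encoded=$(printf '%s:%s' "$username" "$api_key" | base64 | tr -d '\n')
  printf 'ZenApiKey %s\n' "$encoded"
}
```

For example: --header "Authorization: $(zen_api_key_auth "$username" "$api_key")"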
- Authenticating with a Zen token
- By using the jq JSON command-line processor and the curl tool, generate the Zen JSON web token (JWT). As the username value, pass a username that belongs to the appropriate group, such as captureadmins or projectadmins. For more information about roles, see Business Teams permissions.
Zen_host=$(oc get route cpd -o jsonpath="{.spec.host}")
username=<username in a group such as captureadmins or projectadmins>
pwd=<password>
Zen_JWT=$(curl -u "$username:$pwd" -Ssk -X GET "https://$Zen_host/v1/preauth/validateAuth" | jq -r '.accessToken')
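One pitfall with the token command above: if the response has no accessToken field (for example, after a wrong password), jq -r prints the string null instead of failing, so Zen_JWT is silently set to null. A minimal sketch, assuming bash, curl, and jq, that rejects an empty or null token before it is used; get_zen_jwt and zen_token_valid are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: request a Zen JWT and fail instead of returning "null".
# get_zen_jwt and zen_token_valid are hypothetical helper names.

# Returns 0 when the token is neither empty nor the literal string "null".
zen_token_valid() {
  [ -n "$1" ] && [ "$1" != "null" ]
}

# Prints the token for the given host, username, and password, or returns 1.
get_zen_jwt() {
  local host=$1 user=$2 password=$3 token
  token=$(curl -u "$user:$password" -Ssk -X GET "https://$host/v1/preauth/validateAuth" \
    | jq -r '.accessToken') || return 1
  zen_token_valid "$token" || { echo "Could not obtain a Zen token" >&2; return 1; }
  printf '%s\n' "$token"
}
```

For example: Zen_JWT=$(get_zen_jwt "$Zen_host" "$username" "$pwd") || exit 1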
- Setting the Document Processing project ID
- Open your project in Document Processing Designer and open the spbackend container log. Search for project_id to find the ID.
- Send in the request with the Zen_JWT token:
curl -k --location --request POST "https://${Zen_host}/adp/aca/v1/projects/$project_name/analyzers" \
  --header "Authorization: Bearer $Zen_JWT" \
  --form "file=@/tmp/TLG6TV.pdf" \
  --form "responseType=json" \
  --form "jsonOptions=ocr,dc,kvp,sn,hr,th,mt,ai,ds,char"
Attention: The char option causes high memory usage in the postprocessing pod, which might cause the process to run out of memory and stop. If you need to use this option, monitor the postprocessing pods and increase their RAM accordingly.
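The request above can be wrapped so that the HTTP status code is captured separately from the JSON body. This is a sketch under assumptions: analyzers_url and submit_document are hypothetical names, the project ID is the value found in the spbackend log, and the body is written to a file (response.json by default). It leaves out the char option because of the memory note above; add it back only if the postprocessing pods are sized for it.

```shell
#!/usr/bin/env bash
# Sketch: submit one document (one per call) with a Zen JWT.
# analyzers_url and submit_document are hypothetical helper names.

# Prints the analyzers endpoint for a route host and project ID.
analyzers_url() {
  printf 'https://%s/adp/aca/v1/projects/%s/analyzers\n' "$1" "$2"
}

# Posts the file, writes the JSON body to the output file, and prints
# only the HTTP status code on stdout.
submit_document() {
  local host=$1 project_id=$2 token=$3 file=$4 out=${5:-response.json}
  curl -ks -o "$out" -w '%{http_code}' \
    --request POST "$(analyzers_url "$host" "$project_id")" \
    --header "Authorization: Bearer $token" \
    --form "file=@$file" \
    --form "responseType=json" \
    --form "jsonOptions=ocr,dc,kvp,sn,hr,th,mt,ai,ds"
}
```

For example: code=$(submit_document "$Zen_host" "$project_id" "$Zen_JWT" /tmp/TLG6TV.pdf)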
- Response handling
- Response properties and examples. The Document Processing API uses standard HTTP response codes to indicate whether a call succeeded. A 200 code indicates success. A 202 code indicates that the file is accepted for processing. A 4xx code indicates an error that might be caused by client input. A 5xx code indicates a server-related error. Only one document is allowed per API call. For more information about response codes, see Response codes.
| Property | Data type | Description |
|---|---|---|
| code | integer | The status code associated with the response. |
| messageId | string | Document Processing message code. |
| message | string | The status message associated with the response. |
| analyzerId | string | The ID associated with the API call. It is used in later requests. |
| fileNameIn | string | Name of the uploaded file. |
| type | array | The type of file that is generated. |
| errorId | string | Document Processing error code. |
| explanation | string | Explanation for the error. |
| action | string | The action that is needed to correct the error. |
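The response code rules above map directly onto a case statement. A sketch, assuming bash; classify_status is a hypothetical helper, and the status code can come from curl -w '%{http_code}'.

```shell
#!/usr/bin/env bash
# Sketch: map an HTTP status code from the API to the categories
# described in the response handling rules. classify_status is hypothetical.
classify_status() {
  case "$1" in
    200) echo "success" ;;          # call succeeded
    202) echo "accepted" ;;         # file accepted for processing
    4[0-9][0-9]) echo "client-error" ;;  # likely caused by client input
    5[0-9][0-9]) echo "server-error" ;;  # server-related error
    *) echo "unexpected" ;;
  esac
}
```

For example: code=$(curl -ks -o response.json -w '%{http_code}' ...) and then classify_status "$code"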
Response examples
- Example of a successful response
{
  "status": {
    "code": 202,
    "messageId": "CIWCA50000",
    "message": "Success"
  },
  "result": [
    {
      "status": {
        "code": 202,
        "messageId": "CIWCA11106",
        "message": "Content Analyzer request was created"
      },
      "data": {
        "message": "json processing request was created successful",
        "fileNameIn": "Legal Invoice 15.pdf",
        "analyzerId": "ac3afc50-2c52-11ec-b296-c35cda005f89",
        "type": ["json"]
      }
    }
  ]
}
- Example of an error response
{
  "error": "invalid_token",
  "error_description": "access token is missing or invalid."
}
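Because the analyzerId is needed in later requests, a script will usually extract it from the response and stop on an error body. This minimal sketch assumes bash and jq, and takes the field paths from the two examples above (.result[0].data.analyzerId and .error_description); other responses might be shaped differently. parse_analyzer_response is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: read an API response (JSON on stdin) and print the analyzerId,
# or print the error description to stderr and return 1.
# parse_analyzer_response is a hypothetical helper name.
parse_analyzer_response() {
  local json
  json=$(cat)
  # jq -e exits 0 only when the expression evaluates to true.
  if printf '%s' "$json" | jq -e 'has("error")' >/dev/null; then
    printf '%s' "$json" | jq -r '.error_description' >&2
    return 1
  fi
  printf '%s' "$json" | jq -r '.result[0].data.analyzerId'
}
```

For example: analyzer_id=$(parse_analyzer_response < response.json) || exit 1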