Document Processing event webhooks

A webhook is a way for IBM Automation Document Processing to provide near real-time information to other interested applications or services. When the subscribed event occurs, Document Processing makes an HTTP POST request to the URL that is configured for the webhook.

Overview

Webhooks are user-defined HTTP callbacks that are made with HTTP POST. They provide a loosely coupled means of integration between different services. Document Processing makes such callbacks when a subscribed event occurs, such as the completion of document processing.

To receive these event notifications, a webhook implementation must be registered with the Document Processing project that is the source of the intended events. Registration includes providing a callback URL, authentication information, and signature information. When the event occurs, Document Processing makes an HTTP request to the webhook URL with the event payload as JSON, together with the security tokens. For example, when document processing completes, Document Processing can notify users through the webhook.

You register, update, or delete a webhook in the settings of your project in Document Processing Designer. For more information, see Configuring your project settings.

Payload

After the event occurs in Document Processing, the server looks at the current subscriptions. If there is any subscription for the event, the server serializes the event into JSON and sends it as an HTTP POST payload to the webhook receiver.

The JSON payload depends on the type of event that is being serialized. At a high level, every JSON payload contains the following elements:
eventId
The ID of an event. The webhook can use this ID to check for duplicate receipts of the event payload.
eventDateTime
The date and time of the event, in Coordinated Universal Time (UTC).
eventType
The type of the webhook event.
webhookId
The unique ID of the webhook, which is generated when you register the webhook.
sourceObjectId
The event source object ID. Webhooks can use this ID to retrieve the object that caused this event.
properties
The detailed information, which might be different for each event type:
DocumentProcessingComplete
  content:
  {
    status: "success" or "fail",
    uniqueId: "xxx",
    analyzerId: "xxxxx"
  }

ClassificationModelTrainingComplete
  content:
  {
    status: "success" or "fail",
    trainingId: "xxxxx"
  }

ExtractionModelTrainingComplete
  content:
  {
    status: "success" or "fail",
    trainingId: "xxxxx"
  }

ProjectExportingComplete
  content:
  {
    status: "success" or "fail",
    actionId: "xxxxx"
  }

ProjectImportingComplete
  content:
  {
    status: "success" or "fail",
    actionId: "xxxxx"
  }
Example of a request body in JSON:
{
  eventId: "xxxxx",
  eventDateTime: "",
  eventType: "",
  webhookId: "",
  projectId: "xxx",
  sourceObjectId: "xxx",
  receiverRegistrationId: "xxxx",
  properties: {
    ...
  }
}
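
As a rough sketch of how a receiver might consume this payload, the following Python snippet parses a request body, uses eventId to skip duplicate deliveries, and reads the status of the event. It assumes that the per-event details are nested under properties.content, as the listing above suggests. The function name handle_event, the in-memory set, and the sample values are illustrative only and are not part of Document Processing.

```python
import json

# In-memory record of processed event IDs. This is illustrative only;
# a real receiver would probably keep this in persistent storage.
_seen_event_ids = set()


def handle_event(raw_body):
    """Parse a webhook request body and return a short summary of it.

    Returns None when an event with the same eventId was already handled.
    """
    event = json.loads(raw_body)

    event_id = event["eventId"]
    if event_id in _seen_event_ids:
        return None  # duplicate receipt of the same event payload
    _seen_event_ids.add(event_id)

    # Assumption: the event-specific details sit under properties.content.
    content = event.get("properties", {}).get("content", {})
    return {
        "eventType": event.get("eventType"),
        "sourceObjectId": event.get("sourceObjectId"),
        "status": content.get("status"),
    }


# Placeholder payload that follows the shape of the example request body.
sample = json.dumps({
    "eventId": "evt-1",
    "eventType": "DocumentProcessingComplete",
    "sourceObjectId": "obj-1",
    "properties": {"content": {"status": "success", "uniqueId": "xxx"}},
})

first = handle_event(sample)
second = handle_event(sample)  # same eventId, so it is treated as a duplicate
print(first)
print(second)
```

Keeping track of eventId in this way matters because the same event can be posted more than once when an earlier delivery is not acknowledged.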

Security and validation

The webhook request and response happen asynchronously, and the webhook might wait a long time for the desired event. When Document Processing sends the event to the webhook, the server must secure the payload so that the integrity of the data can be verified and the source of the data can be confirmed.

If the authorizationToken parameter is not blank, its value is attached as the Authorization HTTP header.

If the signature parameter is not blank, then the signature of the request body is computed and attached as a Digest HTTP header, for example:
Digest: SHA-256=X48E9qOokqqrvdts8nOJRJN3OWDUoyWxBf7kbu9DBPE=

Document Processing sends the request body to the client along with a signature that is generated from the body by using the HMAC cryptography algorithm. Document Processing uses the Hash-based Message Authentication Code (HMAC) to provide both message integrity and authentication. HMAC relies on a secret key that is shared privately between the webhook and the Document Processing server. During the webhook registration, the credentials property is supplied with JSON that contains the credential type and the credential secret.

The JSON looks something like the following example:
{
  CredentialType: "HMAC",
  CredentialSecret: "xxxxxxxx"
}

When Document Processing works on the event queue, it generates the HMAC code and adds the code to the HTTP header HMAC. The payload for the webhook request is in plain text. Upon receiving the POST request from Document Processing, the webhook can read the HTTP header with the HMAC name and compute its own HMAC from the JSON payload and the CredentialSecret that is shared with the Document Processing server.
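
The verification step on the receiving side might look like the following Python sketch. It rests on several assumptions that this page does not confirm: that the hash function is SHA-256, that the header value is Base64-encoded, and that the value may carry a "SHA-256=" prefix as in the Digest example. The function names are hypothetical; check the header name and encoding that your deployment actually sends before relying on it.

```python
import base64
import hashlib
import hmac


def compute_signature(secret: str, body: bytes) -> str:
    """Return the Base64-encoded HMAC of the raw request body.

    Assumption: HMAC with SHA-256 and Base64 output.
    """
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def is_signature_valid(secret: str, body: bytes, header_value: str) -> bool:
    """Compare the received header value with a locally computed HMAC."""
    received = header_value.strip()
    # Assumption: the value may be prefixed with the algorithm name.
    if received.startswith("SHA-256="):
        received = received[len("SHA-256="):]
    expected = compute_signature(secret, body)
    # compare_digest runs in constant time to avoid leaking timing details.
    return hmac.compare_digest(expected.encode("utf-8"),
                               received.encode("utf-8"))


# Placeholder secret and body for illustration.
secret = "xxxxxxxx"
body = b'{"eventId": "xxxxx"}'
header = "SHA-256=" + compute_signature(secret, body)

print(is_signature_valid(secret, body, header))         # untouched body
print(is_signature_valid(secret, body + b" ", header))  # altered body
```

The HMAC is computed over the request body bytes exactly as they were received. Parsing the JSON and serializing it again before hashing can change whitespace or key order and produce a mismatch.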

If the computed HMAC is different from the HMAC that was included in the header, the data might have been tampered with or compromised, and the webhook can discard the message and send an appropriate HTTP status back. The Document Processing server retries posting the event after 1 second until it exhausts the retry count or receives the 200 status code from the webhook.
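
To make the retry behavior easier to reason about when you design a receiver, the following Python sketch models the delivery loop as described above. It is not the product's implementation: the function deliver_with_retry, the post callable, and the max_retries parameter are hypothetical, and the actual retry count is not stated on this page.

```python
import time
from typing import Callable


def deliver_with_retry(post: Callable[[], int],
                       max_retries: int,
                       delay_seconds: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call post() until it returns 200 or the retries are exhausted.

    post is a stand-in for one HTTP POST to the webhook; it returns the
    HTTP status code. Assumption: one initial attempt plus max_retries
    retries, with a fixed delay between attempts.
    """
    attempts = 1 + max_retries
    for attempt in range(attempts):
        if post() == 200:
            return True  # the webhook acknowledged the event
        if attempt < attempts - 1:
            sleep(delay_seconds)  # wait before the next retry
    return False  # retry count exhausted without a 200 response


# Simulated webhook that rejects two deliveries and then accepts the third.
responses = iter([401, 500, 200])
delivered = deliver_with_retry(lambda: next(responses),
                               max_retries=3,
                               sleep=lambda seconds: None)  # skip real waiting
print(delivered)
```

One consequence for receivers is that any status other than 200 leads to the same event being posted again, so a receiver that has already accepted and stored an event should still return 200 for a repeated delivery of it.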