IBM Support

Integrating Content Platform Engine with Business Automation Content Analyzer

Product Documentation


Abstract

You can use Business Automation Content Analyzer to classify documents from your Content Platform Engine. After classification, document metadata such as the document class, the document title, and other custom properties is updated automatically, enriching your metadata.

Content

You enable the integration of Business Automation Content Analyzer with Content Platform Engine by deploying the integration to an object store that you designate. The deployment uses Java commands together with a configuration file and a mapping file. You then use the Administration Console for Content Platform Engine to enable the classification method that you choose.

Getting started with the Business Automation Content Analyzer integration

Before you deploy the integration to your object store, obtain the following prerequisite systems and files:
  • A Linux system with Java 1.8 or later installed.
  • A FileNet Content Manager system, deployed in a container environment.
  • Content Platform Engine WSI client JAR files (V5.5.2 or later).
    • Download the CEWS client archive from the client download area of the Administration Console for Content Platform Engine.
      In the administration console, expand your domain, go to Client API Download > IBM FileNet Content Manager > Java CEWS client, and click Download Feature Set. The JavaCEWSclient.zip file is downloaded to your system.
  • OkHttp open source JAR files, including dependencies. Download the three required files from the Maven repository:
    • kotlin-stdlib-1.1.0.jar
    • okhttp-3.13.1.jar
    • okio-2.2.2.jar
  • A FileNet Content Manager administrative account with full control access to the object store that you designate. All Business Automation Content Analyzer and Content Platform Engine integration metadata and sweeps are created under the context of this user and are owned by this user.
  • The Business Automation Content Analyzer for Content Platform Engine JAR file.

    You need an IBM login and password and your customer number to access this download.

    1. Log in to IBM Fix Central with your IBM login and password.
    2. Under Find Product, in the Product selector field, type FileNet Content Engine.
    3. In the Installed version field, choose 5.5.3.0. In the Platform field, choose All, and then click Continue.
    4. Click Individual fix IDs, enter the following value: 5.5.3.0-P8CPE-ALL-LA005:your_customer_number, and click Continue.
    5. Select 5.5.3.0-P8CPE-ALL-LA005 and click Continue to download the archive.
Before you deploy, you must plan for the details of the integration. During deployment, you provide these details in a configuration file and a mapping file, which you create using the examples and parameter descriptions in the subsequent procedures:
  • Configuration JSON file: baca_config.json
  • Mapping JSON file: baca_mapping.json
For details about the required and optional parameters, see Configuration and mapping reference.
To deploy the Business Automation Content Analyzer integration to Content Platform Engine:
 
  1. On your Linux system, create a directory for the deployment, for example, /baca.
  2. Unzip the JavaCEWSclient.zip archive, and move the Content Platform Engine client files to the /baca directory.
  3. Create a subdirectory under the /baca directory called libs.
  4. Move the following files to the /baca/libs directory:
    • Business Automation Content Analyzer integration JAR file (from the Fix Central archive)
    • kotlin-stdlib-1.1.0.jar
    • okhttp-3.13.1.jar
    • okio-2.2.2.jar
  5. Create a file called baca_config.json and save it to the /baca directory. The following example shows the file contents. The values are examples only, and you should update them to match your environment:

    {
      "url": "https://<hostname>/backendsp/ca/rest/content/v1/contentAnalyzer",
      "certificateValidationEnabled": false,
      "connectionTimeout": 30000,
      "readTimeout": 30000,
      "writeTimeout": 30000,
      "functionalId": "fidname@companyname.com",
      "maximumJsonPages": 10,
      "maximumContentSize": 5242880,
      "deferralSeconds": 60,
      "maximumDeferrals": 20,
      "supportLogEnabled": true,
      "tempDirectory": "/opt/ibm/textext",
      "disableBacaFileDeletion": false,
      "details": [
        {
          "mimetype": "image/tiff",
          "fileExtension": "tiff",
          "responseTypes": "JSON",
          "jsonOptions": "dc,kvp"
        },
        {
          "mimetype": "application/pdf",
          "fileExtension": "pdf",
          "responseTypes": "JSON",
          "jsonOptions": "dc,kvp"
        },
        {
          "mimetype": "image/jpeg",
          "fileExtension": "jpeg",
          "responseTypes": "JSON",
          "jsonOptions": "dc,kvp"
        },
        {
          "mimetype": "image/png",
          "fileExtension": "png",
          "responseTypes": "JSON",
          "jsonOptions": "dc,kvp"
        },
        {
          "mimetype": "application/msword",
          "fileExtension": "doc",
          "responseTypes": "JSON",
          "jsonOptions": "dc,kvp"
        },
        {
          "mimetype": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
          "fileExtension": "docx",
          "responseTypes": "JSON",
          "jsonOptions": "dc,kvp"
        }
      ]
    }

  6. Create a file called baca_mapping.json and save it to the /baca directory. The following example shows the file contents. The values are examples only, and you should update them to match your environment:

    {
      "caseSensitiveAliasNames": false,
      "caseSensitiveKeyNames": false,
      "overridePopulatedTitle": true,
      "overridePopulatedProperties": false,
      "truncateStringProperties": true,
      "dateFormats": [
        "MM/dd/yy",
        "MM/dd/yyyy",
        "yyyy/MM/dd"
      ],
      "defaultTZDBTimezoneID": "America/Los_Angeles",
      "bacaClassAliasNames": {
        "invoice": "MappedInvoice",
        "police report": "MappedIncident",
        "estimates": "MappedIncident",
        "pricing schedule": "MappedPricing"
      },
      "bacaClassDefs": {
        "MappedInvoice": {
          "cmClassName": "McInvoice",
          "cmTitlePropertyName": "DocumentTitle",
          "keyvaluePropertyNames": {
            "zip": "McZipCode",
            "address": "McAddress",
            "order no.": "McOrderNumber",
            "ship to": "McAddress",
            "invoice#": "McOrderNumber",
            "ship": "McAddress",
            "po": "McOrderNumber",
            "date": "McCurrentDates",
            "invoice date": "McDate",
            "due date": "McDate",
            "phone": "McPhoneNumbers",
            "account": "McAccounts"
          }
        },
        "MappedIncident": {
          "cmClassName": "McIncident",
          "cmTitlePropertyName": "DocumentTitle",
          "keyvaluePropertyNames": {
            "date": "McDate",
            "report date": "McDate",
            "report type": "McIncident",
            "damage description": "McIncident",
            "estimate total": "McAmount",
            "instructions": "McInstructions"
          }
        },
        "MappedPricing": {
          "cmClassName": "McPricing",
          "cmTitlePropertyName": "DocumentTitle",
          "keyvaluePropertyNames": {
            "[date]": "McDate",
            "[property address]": "McAddress"
          }
        }
      }
    }

  7. Use a command like the following one to deploy the integration to the object store that you designate:

    java -cp libs/baca-integration-5.5.3-0-0.jar:* com.ibm.internal.cpe.baca.CmdLine -cmd deploy -user P8Admin -url http://<CPE-hostname>:9080/wsi/FNCEWS40MTOM -objectstore P8ObjectStore -apikey 33553355-4770-4ea5-bf68-9d032820fb54 -fidpassword mypwd -configfile baca_config.json -mapfile baca_mapping.json -libraries libs

    For more information on the command options, see Command reference.
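Before you run the deploy command, it can help to confirm that the directory layout from the steps above is in place. The following Java sketch is a hypothetical pre-flight helper, not part of the integration: it checks that the JVM reports Java 1.8 or later and lists any expected files that are missing. The file names come from the deployment steps and the example deploy command; the integration JAR name (baca-integration-5.5.3-0-0.jar) and the default /baca path are assumptions that may not match your download or environment.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical pre-flight check for the deployment directory (not shipped with the integration). */
final class BacaPreflight {

    // Names taken from the deployment steps; adjust the integration JAR name to match your archive.
    static final List<String> REQUIRED_LIBS = Arrays.asList(
            "baca-integration-5.5.3-0-0.jar",
            "kotlin-stdlib-1.1.0.jar",
            "okhttp-3.13.1.jar",
            "okio-2.2.2.jar");

    static final List<String> REQUIRED_FILES = Arrays.asList("baca_config.json", "baca_mapping.json");

    /** True if a java.specification.version value ("1.8", "11", "17", ...) is 1.8 or later. */
    static boolean isSupportedJava(String specVersion) {
        String[] parts = specVersion.split("\\.");
        // Java 8 reports "1.8"; Java 9 and later report only the major version.
        int major = parts[0].equals("1") ? Integer.parseInt(parts[1]) : Integer.parseInt(parts[0]);
        return major >= 8;
    }

    /** Lists the expected files missing from baseDir and from baseDir/libs. */
    static List<String> missingFiles(File baseDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED_FILES) {
            if (!new File(baseDir, name).isFile()) {
                missing.add(name);
            }
        }
        File libs = new File(baseDir, "libs");
        for (String name : REQUIRED_LIBS) {
            if (!new File(libs, name).isFile()) {
                missing.add("libs/" + name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        File baseDir = new File(args.length > 0 ? args[0] : "/baca");
        if (!isSupportedJava(System.getProperty("java.specification.version"))) {
            System.err.println("Java 1.8 or later is required.");
            System.exit(1);
        }
        List<String> missing = missingFiles(baseDir);
        if (missing.isEmpty()) {
            System.out.println("All expected files found in " + baseDir);
        } else {
            System.out.println("Missing: " + missing);
            System.exit(1);
        }
    }
}
```

The check does not cover the CEWS client files from JavaCEWSclient.zip, because the archive's file names are not listed in these steps.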

Submitting documents to Business Automation Content Analyzer for processing

Checked-in Content Manager documents can be submitted to Business Automation Content Analyzer by using either a sweep job or a document check-in event subscription.

Use the Administration Console for Content Platform Engine to configure the method you choose.

To create a sweep job to submit documents to Business Automation Content Analyzer:
  1. Log in to the Administration Console for Content Platform Engine as an administrative user, ideally the same user that deployed the integration.
  2. Expand the object store where you want to create the sweep, and expand Sweep Management > Job Sweeps.
  3. Click Custom Jobs > New.
  4. Enter a name for the sweep, for example, bacajob1.
  5. Set Sweep mode to Normal.
  6. Click Enabled, then click Next.
  7. Enter a value for the Target Class, which must be Document or a subclass of Document.
  8. For the Filter expression, enter an expression that selects only the documents to be processed by this sweep.
  9. Enable Include Sub-classes if this setting is appropriate for this sweep.
  10. Enable Record Failures.
  11. For the Sweep Action value, select BACA Job Sweep Action.
  12. Click Next, then Finish to create the sweep job. 
To create a check-in event subscription to submit documents to Business Automation Content Analyzer:
  1. Log in to the Administration Console for Content Platform Engine as an administrative user, ideally the same user that deployed the integration.
  2. Expand the object store where you want to create the subscription, and expand Data Design > Classes.
  3. Select Document or a subclass of Document, and under Actions, select New subscription...
  4. Enter a Display Name and Description for the subscription, for example, BACA Checkin Action, and click Next.
  5. Select a value for the Scope of the subscription, and click Next.
  6. Click Checkin Event, and click Next.
  7. For Event action, select BACA Document Event Handler, and click Next.
  8. Check Enable this subscription.
  9. Check Run synchronously.
  10. Optionally enable Include sub-classes.
  11. Click Next, then click Finish to create the subscription.
 

Configuration and mapping reference

Configuration file details

Note that the items in bold must be updated. All non-bold items can optionally be updated, but the default values that are provided in the sample are typically adequate for initial configuration.
Configuration File Parameters

| Field | Description |
|---|---|
| url | URL of the Business Automation Content Analyzer service. The URL must use HTTPS. |
| certificateValidationEnabled | If false, the SSL certificate is not validated: secure sockets are still used, but host name validation is not performed. |
| connectionTimeout | Timeout, in milliseconds, for a connection to Business Automation Content Analyzer. |
| readTimeout | Timeout, in milliseconds, for a Business Automation Content Analyzer read operation. |
| writeTimeout | Timeout, in milliseconds, for a Business Automation Content Analyzer write operation. |
| functionalId | The Business Automation Content Analyzer functional ID that is used to access Business Automation Content Analyzer. |
| maximumJsonPages | Maximum number of JSON pages to request from Business Automation Content Analyzer. |
| maximumContentSize | Maximum size of the content to be sent to Business Automation Content Analyzer. If the content size exceeds the maximum, the document is not sent to Business Automation Content Analyzer for processing. |
| deferralSeconds | The number of seconds that the request sweep waits before it polls Business Automation Content Analyzer to determine the state of a request. |
| maximumDeferrals | The number of times that the request sweep waits for Business Automation Content Analyzer to complete a request. The maximum time that the request sweep waits is deferralSeconds * maximumDeferrals. |
| supportLogEnabled | If true, changes made to documents are recorded in the BACASupportLog class. The class is not created by default and must be created in a separate deployment step (the create.supportlog command). |
| tempDirectory | The temporary directory where content is stored before it is sent to Business Automation Content Analyzer. Content is always copied to this directory before it is sent, and it is deleted after Business Automation Content Analyzer receives it. This parameter can normally be omitted, because the default value (/opt/ibm/textext) is sufficient in most cases. |
| disableBacaFileDeletion | Normally, the Business Automation Content Analyzer file is deleted when the Content Platform Engine analyzer document is deleted. If this flag is set to true, the Business Automation Content Analyzer file is not deleted; Business Automation Content Analyzer cleans it up automatically. |
| details | Detailed information for each supported MIME type (see the following Details table). |
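Two rules in this table reduce to simple arithmetic. With the sample values, the request sweep waits at most 60 × 20 = 1,200 seconds (20 minutes), and, assuming maximumContentSize is expressed in bytes (the table does not state the unit), 5242880 is exactly 5 MiB. The Java sketch below restates those rules; the class and method names are hypothetical, and treating content exactly equal to the maximum as still sent is an assumption, because the table only says that content exceeding the maximum is not sent.

```java
/** Hypothetical helpers that restate two rules from the configuration table. */
final class BacaConfigMath {

    /** Maximum time, in seconds, that the request sweep waits: deferralSeconds * maximumDeferrals. */
    static long maxWaitSeconds(int deferralSeconds, int maximumDeferrals) {
        // Widen to long before multiplying to avoid int overflow for large values.
        return (long) deferralSeconds * maximumDeferrals;
    }

    /** Content larger than maximumContentSize is not sent (assumption: equal size is still sent). */
    static boolean isSentForAnalysis(long contentSize, long maximumContentSize) {
        return contentSize <= maximumContentSize;
    }

    public static void main(String[] args) {
        long waitSeconds = maxWaitSeconds(60, 20);   // sample deferralSeconds and maximumDeferrals
        System.out.println("Maximum wait: " + waitSeconds + " s (" + waitSeconds / 60 + " min)");
        System.out.println("5 MiB = " + (5L * 1024 * 1024) + " bytes");
        System.out.println("6,000,000-byte document sent? " + isSentForAnalysis(6_000_000L, 5_242_880L));
    }
}
```

Running main prints a maximum wait of 1200 s (20 min) and reports that a 6,000,000-byte document is not sent under the sample limit.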
Details
These fields are repeated for each supported MIME type.
Mime Type Details

| Field | Description |
|---|---|
| mimetype | A MIME type supported by Business Automation Content Analyzer. |
| fileExtension | The file extension for the MIME type. Business Automation Content Analyzer supports only specific file extensions. |
| responseTypes | The response type requested from Business Automation Content Analyzer. Only JSON is currently supported by the Business Automation Content Analyzer/Content Platform Engine integration. |
| jsonOptions | Options that Business Automation Content Analyzer applies when it produces the JSON output. For details, see the Business Automation Content Analyzer documentation. |
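Each details entry pairs a MIME type with the file extension used for it. The Java sketch below mirrors the six entries from the sample baca_config.json as a lookup table, purely for illustration. Treating a MIME type without an entry as unsupported, and ignoring letter case and parameters such as "; charset=...", are assumptions of this sketch, not documented behavior of the integration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup built from the "details" entries in the sample baca_config.json. */
final class BacaMimeDetails {

    private static final Map<String, String> EXTENSIONS = new HashMap<>();

    static {
        EXTENSIONS.put("image/tiff", "tiff");
        EXTENSIONS.put("application/pdf", "pdf");
        EXTENSIONS.put("image/jpeg", "jpeg");
        EXTENSIONS.put("image/png", "png");
        EXTENSIONS.put("application/msword", "doc");
        EXTENSIONS.put("application/vnd.openxmlformats-officedocument.wordprocessingml.document", "docx");
    }

    /** Returns the configured extension, or empty if the MIME type has no details entry. */
    static Optional<String> extensionFor(String mimeType) {
        // MIME type names are case-insensitive; drop any parameters after ';'.
        String base = mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return Optional.ofNullable(EXTENSIONS.get(base));
    }

    public static void main(String[] args) {
        for (String type : new String[] {"application/pdf", "IMAGE/TIFF", "text/plain"}) {
            System.out.println(type + " -> " + extensionFor(type).orElse("(no details entry)"));
        }
    }
}
```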

Mapping file parameters

You update the mapping file with values for your object store and environment details.
Mapping File Parameters

| Field | Description |
|---|---|
| caseSensitiveAliasNames | If true, class alias names are case sensitive. If false, alias names are case insensitive. |
| caseSensitiveKeyNames | If true, key names (of key value pairs) are case sensitive. If false, key names are case insensitive. |
| overridePopulatedTitle | If true, a populated mapped title property is overwritten when Business Automation Content Analyzer supplies a value. If false, a populated title value is not overwritten. |
| overridePopulatedProperties | If true, populated mapped properties are overwritten when Business Automation Content Analyzer supplies a value. If false, a populated property is not overwritten. |
| truncateStringProperties | If true, a Business Automation Content Analyzer string that exceeds the maximum length of the Content Platform Engine property is truncated to that maximum length. If false, the Content Platform Engine property is not altered when the string exceeds the property's maximum length. |
| dateFormats | List of date formats for converting Business Automation Content Analyzer values to Content Platform Engine date-time properties. Each format must follow the Java SimpleDateFormat rules. The formats are applied in the order listed until one of them produces a valid date. |
| defaultTZDBTimezoneID | If a date-time value supplied by Business Automation Content Analyzer lacks a time zone indicator, the time zone of the Content Platform Engine server is used by default. You can override this default by supplying a time zone ID from the time zone database. To list the time zones in the database, use the show.timezones command (see Command reference). |
| bacaClassAliasNames | List of Business Automation Content Analyzer classification names and the corresponding mapping alias names (explained further in a later table). |
| bacaClassDefs | Details for each Business Automation Content Analyzer to Content Platform Engine class mapping item (explained further in a later table). |
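The dateFormats list is applied in order using Java SimpleDateFormat patterns, and defaultTZDBTimezoneID supplies the time zone when a value has none. The sketch below shows one way to apply an ordered list of patterns; it is not the integration's own code. Disabling lenient parsing and requiring a pattern to consume the whole value are choices made here so that a pattern only matches a complete, valid date; the integration may behave differently. One consequence of the SimpleDateFormat rules is worth knowing: the yy pattern interprets a four-digit year such as 2019 literally, so in the sample list 08/06/2019 is already matched by the first pattern, MM/dd/yy.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.time.ZoneId;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Optional;
import java.util.TimeZone;

/** Sketch of applying an ordered dateFormats list (not the integration's own implementation). */
final class BacaDateParser {

    /** Tries each pattern in order and returns the first strict match that consumes the whole value. */
    static Optional<Date> parse(String value, List<String> dateFormats, String tzdbId) {
        // ZoneId.of fails fast on an unknown ID; TimeZone.getTimeZone(String) would silently use GMT.
        TimeZone zone = TimeZone.getTimeZone(ZoneId.of(tzdbId));
        String text = value.trim();
        for (String pattern : dateFormats) {
            SimpleDateFormat format = new SimpleDateFormat(pattern);
            format.setLenient(false);   // reject out-of-range fields such as month 13
            format.setTimeZone(zone);   // the sample patterns carry no zone, so this zone applies
            ParsePosition position = new ParsePosition(0);
            Date date = format.parse(text, position);
            if (date != null && position.getIndex() == text.length()) {
                return Optional.of(date);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> formats = Arrays.asList("MM/dd/yy", "MM/dd/yyyy", "yyyy/MM/dd");
        for (String value : new String[] {"08/06/19", "08/06/2019", "2019/08/06", "August 6"}) {
            System.out.println(value + " -> " + parse(value, formats, "America/Los_Angeles"));
        }
    }
}
```

With the sample formats and time zone, 2019/08/06 fails the first two patterns (2019 is not a valid month) and is matched by yyyy/MM/dd as midnight Pacific time.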
bacaClassAliasNames list
This is a mapping of the Business Automation Content Analyzer classification name to an alias name contained in the mapping. Each alias name is described by an entry in the bacaClassDefs list.
Classification Name to Alias Name

| Field | Description |
|---|---|
| BACA classification name | bacaClassDefs mapping alias name |
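To illustrate how caseSensitiveAliasNames affects the lookup, the following Java sketch builds the bacaClassAliasNames map from the sample baca_mapping.json and resolves a classification name to its alias. Modeling case-insensitive matching with a TreeMap ordered by String.CASE_INSENSITIVE_ORDER is a choice made for this hypothetical sketch; the integration's exact matching rules are not described here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical model of the bacaClassAliasNames lookup from the sample mapping file. */
final class BacaAliasLookup {

    static Map<String, String> aliasMap(boolean caseSensitiveAliasNames) {
        Map<String, String> aliases;
        if (caseSensitiveAliasNames) {
            aliases = new HashMap<>();
        } else {
            // Keys compare without regard to case, modeling caseSensitiveAliasNames=false.
            aliases = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        }
        aliases.put("invoice", "MappedInvoice");
        aliases.put("police report", "MappedIncident");
        aliases.put("estimates", "MappedIncident");
        aliases.put("pricing schedule", "MappedPricing");
        return aliases;
    }

    /** Returns the bacaClassDefs alias for a classification name, if one is mapped. */
    static Optional<String> aliasFor(String classification, boolean caseSensitiveAliasNames) {
        return Optional.ofNullable(aliasMap(caseSensitiveAliasNames).get(classification));
    }

    public static void main(String[] args) {
        System.out.println(aliasFor("Police Report", false).orElse("(unmapped)"));
        System.out.println(aliasFor("Police Report", true).orElse("(unmapped)"));
    }
}
```

Both "police report" and "estimates" resolve to MappedIncident in the sample, so several classifications can share one class definition.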
bacaClassDefs list

Each entry in bacaClassDefs is keyed by an alias name that is listed in bacaClassAliasNames.
Class Definition to Alias

| Field | Description |
|---|---|
| cmClassName | The Content Manager class name (symbolic name). |
| cmTitlePropertyName | The Content Manager property name (symbolic name) of the title property of the class (for example, DocumentTitle). |
| keyvaluePropertyNames | List of Business Automation Content Analyzer field names extracted from the key value pair data, and the corresponding Content Manager property names (see the following table). |
keyvaluePropertyNames list
Key Value to Property

| Field | Description |
|---|---|
| Business Automation Content Analyzer field name (from the key value pair table) | Content Manager property name |
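The following Java sketch combines the keyvaluePropertyNames lookup with the overridePopulatedProperties and truncateStringProperties rules from the mapping file parameters table, for a single string property. It is a simplified, hypothetical model: the class and method names, representing an unpopulated property as null, and passing the property's maximum length as an int are assumptions for illustration. The same override rule would apply to the title property with overridePopulatedTitle; date conversion is left out.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical model of how one key value pair could update a mapped string property. */
final class BacaPropertyUpdate {

    /** keyvaluePropertyNames for MappedIncident in the sample file, matched case-insensitively. */
    static Map<String, String> incidentKeys() {
        Map<String, String> keys = new TreeMap<>(String.CASE_INSENSITIVE_ORDER); // caseSensitiveKeyNames=false
        keys.put("date", "McDate");
        keys.put("report date", "McDate");
        keys.put("report type", "McIncident");
        keys.put("damage description", "McIncident");
        keys.put("estimate total", "McAmount");
        keys.put("instructions", "McInstructions");
        return keys;
    }

    /**
     * Returns the value to store, or empty to leave the property unchanged.
     * currentValue is null when the property is not populated (an assumption of this sketch).
     */
    static Optional<String> newValue(String currentValue, String bacaValue, int maxLength,
                                     boolean overridePopulated, boolean truncateStrings) {
        if (currentValue != null && !overridePopulated) {
            return Optional.empty();                      // keep the populated value
        }
        if (bacaValue.length() <= maxLength) {
            return Optional.of(bacaValue);
        }
        // Too long: truncate to the property's maximum length, or leave the property unaltered.
        return truncateStrings ? Optional.of(bacaValue.substring(0, maxLength)) : Optional.empty();
    }

    public static void main(String[] args) {
        String property = incidentKeys().get("Report Type");   // "McIncident"
        Optional<String> value = newValue("Existing text", "Rear bumper damage", 64, false, true);
        System.out.println(property + " -> " + value.orElse("(unchanged)"));
    }
}
```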
 

Command reference

Commands and parameters

The following Java commands are supported for use with the Business Automation Content Analyzer integration:

Supported Commands (-cmd command)

| Command (-cmd) | Description | Required parameters |
|---|---|---|
| deploy | Deploy the Content Platform Engine/Business Automation Content Analyzer integration to the specified object store. | -user, -url, -objectstore, -apikey, -fidpassword, -configfile, -mapfile, -libraries |
| redeploy | Redeploy the code module (software upgrade). Deletes and recreates sweep actions, sweeps, and custom metadata; does not alter the configuration or mapping. Before you run this command, replace the Business Automation Content Analyzer/Content Platform Engine JAR file in the libs directory with the new version of the JAR file. Because of caching in Content Platform Engine, Content Platform Engine must be restarted before it begins using the new code module. | -user, -url, -objectstore, -libraries |
| update.apikey | Update the Business Automation Content Analyzer API key (configuration object). | -user, -url, -objectstore, -apikey |
| update.fidpassword | Update the Business Automation Content Analyzer functional ID password (configuration object). | -user, -url, -objectstore, -fidpassword |
| update.configuration | Update the Business Automation Content Analyzer configuration object by replacing the object JSON with the supplied JSON. Does not update the API key or the functional ID password. | -user, -url, -objectstore, -configfile |
| update.map | Update the mapping data of the Business Automation Content Analyzer configuration object. Use this command to replace the mapping after you edit the JSON in the mapping file. | -user, -url, -objectstore, -mapfile |
| show.timezones | Show the list of time zone names from the time zone database (see the mapping parameter defaultTZDBTimezoneID). | (none) |
| create.supportlog | Create the BACASupportLog class and related properties (normally created only by the IBM support team). | -user, -url, -objectstore |
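The required parameters in the commands table can be captured as data, which makes it easy to check a command line before running it. This Java sketch is a hypothetical helper, not part of the integration: it encodes the table as a map and reports which required parameters are missing from an argument array that includes -cmd and its value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical check of the required parameters listed in the commands table. */
final class BacaCommandCheck {

    private static final List<String> BASE = Arrays.asList("-user", "-url", "-objectstore");
    private static final Map<String, List<String>> REQUIRED = new HashMap<>();

    static {
        REQUIRED.put("deploy", with("-apikey", "-fidpassword", "-configfile", "-mapfile", "-libraries"));
        REQUIRED.put("redeploy", with("-libraries"));
        REQUIRED.put("update.apikey", with("-apikey"));
        REQUIRED.put("update.fidpassword", with("-fidpassword"));
        REQUIRED.put("update.configuration", with("-configfile"));
        REQUIRED.put("update.map", with("-mapfile"));
        REQUIRED.put("create.supportlog", with());
        REQUIRED.put("show.timezones", Collections.<String>emptyList());
    }

    /** -user, -url, and -objectstore plus any command-specific parameters. */
    private static List<String> with(String... extra) {
        List<String> all = new ArrayList<>(BASE);
        all.addAll(Arrays.asList(extra));
        return all;
    }

    /** Returns the required parameters missing from args, or ["-cmd"] if no command is given. */
    static List<String> missingParameters(String[] args) {
        List<String> given = Arrays.asList(args);
        int cmdIndex = given.indexOf("-cmd");
        if (cmdIndex < 0 || cmdIndex + 1 >= args.length) {
            return Collections.singletonList("-cmd");
        }
        List<String> required = REQUIRED.get(args[cmdIndex + 1]);
        if (required == null) {
            throw new IllegalArgumentException("Unknown command: " + args[cmdIndex + 1]);
        }
        List<String> missing = new ArrayList<>();
        for (String parameter : required) {
            if (!given.contains(parameter)) {
                missing.add(parameter);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("Missing: " + missingParameters(args));
    }
}
```

The check only looks for parameter names; it does not validate their values.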
 

The following parameters apply to the supported commands:

Command Parameters

| Parameter | Description |
|---|---|
| -user | Content Manager administrative user name. |
| -url | FileNet Content Manager URL for WSI client access. |
| -objectstore | Symbolic name of the object store where the integration is deployed. |
| -apikey | Business Automation Content Analyzer API key generated by the user listed as functionalId in the configuration file. |
| -fidpassword | Password of the Business Automation Content Analyzer functionalId. |
| -configfile | Full path to the configuration file. |
| -mapfile | Full path to the mapping file. |
| -libraries | Full path to the directory where the dependent libraries are located (the libs directory that you created during deployment). |

Command examples

The following Java commands show usage examples for interacting with the Business Automation Content Analyzer integration:

Deploy

 

java -cp libs/baca-integration-5.5.3-0-0.jar:* com.ibm.internal.cpe.baca.CmdLine -cmd deploy -user P8Admin -url http://<CPE-hostname>:9080/wsi/FNCEWS40MTOM -objectstore P8ObjectStore -apikey 33553355-4770-4ea5-bf68-9d032820fb54 -fidpassword mypwd -configfile baca_config.json -mapfile baca_mapping.json -libraries libs

 

Redeploy

 

java -cp libs/baca-integration-5.5.3-0-0.jar:* com.ibm.internal.cpe.baca.CmdLine -cmd redeploy -user P8Admin -url http://<CPE-hostname>:9080/wsi/FNCEWS40MTOM -objectstore P8ObjectStore -libraries libs

Update Business Automation Content Analyzer API Key

 

java -cp libs/baca-integration-5.5.3-0-0.jar:* com.ibm.internal.cpe.baca.CmdLine -cmd update.apikey -user P8Admin -url http://<CPE-hostname>:9080/wsi/FNCEWS40MTOM -objectstore P8ObjectStore -apikey 33553355-4770-4ea5-bf68-9d032820fb54

Update Functional Id password

 

java -cp libs/baca-integration-5.5.3-0-0.jar:* com.ibm.internal.cpe.baca.CmdLine -cmd update.fidpassword -user P8Admin -url http://<CPE-hostname>:9080/wsi/FNCEWS40MTOM -objectstore P8ObjectStore -fidpassword mypwd 

Update configuration object

 

java -cp libs/baca-integration-5.5.3-0-0.jar:* com.ibm.internal.cpe.baca.CmdLine -cmd update.configuration -user P8Admin -url http://<CPE-hostname>:9080/wsi/FNCEWS40MTOM -objectstore P8ObjectStore -configfile baca_config.json -verbose

Update mapping object

 

java -cp libs/baca-integration-5.5.3-0-0.jar:* com.ibm.internal.cpe.baca.CmdLine -cmd update.map -user P8Admin -url http://<CPE-hostname>:9080/wsi/FNCEWS40MTOM -objectstore P8ObjectStore -mapfile baca_mapping.json

Create support log

 

java -cp libs/baca-integration-5.5.3-0-0.jar:* com.ibm.internal.cpe.baca.CmdLine -cmd create.supportlog -user P8Admin -url http://<CPE-hostname>:9080/wsi/FNCEWS40MTOM -objectstore P8ObjectStore
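If you script updates, for example rerunning update.map after each edit to baca_mapping.json, the same command can be started from Java with ProcessBuilder. This is a hypothetical sketch: the host name cpe-host and the /baca working directory are placeholders, and the argument list simply mirrors the Update mapping object example. Because no shell is involved, the trailing * in the classpath is passed through unchanged and is expanded by the java launcher, which supports classpath wildcards.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical wrapper that runs the update.map example command from Java. */
final class BacaUpdateMap {

    /** Builds the same argument list as the "Update mapping object" example. */
    static List<String> command(String user, String url, String objectStore, String mapFile) {
        // File.pathSeparator is ":" on Linux; the launcher expands the "*" wildcard itself.
        String classpath = "libs/baca-integration-5.5.3-0-0.jar" + File.pathSeparator + "*";
        return new ArrayList<>(Arrays.asList(
                "java", "-cp", classpath, "com.ibm.internal.cpe.baca.CmdLine",
                "-cmd", "update.map",
                "-user", user,
                "-url", url,
                "-objectstore", objectStore,
                "-mapfile", mapFile));
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command(
                "P8Admin", "http://cpe-host:9080/wsi/FNCEWS40MTOM", "P8ObjectStore", "baca_mapping.json"));
        builder.directory(new File("/baca"));   // run from the deployment directory (placeholder path)
        builder.inheritIO();                     // show the tool's output in this console
        int exitCode = builder.start().waitFor();
        System.out.println("update.map exited with code " + exitCode);
    }
}
```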

Document Location

Worldwide

Document Information

Product: FileNet Content Manager
Version: 5.5.3
Platform: RedHat OpenShift
Modified date: 06 August 2019
UID: ibm10886811