IBM Datacap integration

IBM® Datacap integration with IBM Business Automation Content Analyzer helps you extract business insight from documents.

IBM® Datacap is a complete solution for data and documents that helps you streamline the capture, recognition, and classification of business documents and extract important information. Datacap Insight Edition lets you use IBM Business Automation Content Analyzer to process documents with capabilities that include OCR, document classification, headers, and key-value pair (KVP) field data extraction. The action libraries in Datacap help you call Content Analyzer.

Content Analyzer actions are embedded in Datacap. The actions give you an easy way to connect to the IBM Business Automation Content Analyzer API services to digitize, classify, and extract unstructured document content by using OCR and PDF text extraction. The integration can also enable Watson™ and other AI technologies to reveal business insight from documents.

The following functional enhancements are included in this integration:

Recognition during peak volume can be offloaded to Content Analyzer, which can scale up as needed. This reduces the burden on the machines that are running other Datacap Rulerunner tasks.
Content Analyzer can be used as an additional option for page or document classification within Datacap.
Extract key-value pairs from paragraphs and tables in the document.
After the text on the page is recognized, you can populate a field or a variable based on the targeted data that was extracted and saved as key-value pairs.
A page or a document can be recognized and a searchable PDF file or a text file can be created in a single step.
Get a complete picture of the data held in the documents by using IBM AI algorithms, and save the output in a JSON file.

These functions can be applied to different document types such as PDF, TIFF, TIF, JPG, JPEG, PNG, DOC, and DOCX.
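
As a small illustration, a capture workflow might check a file's extension against this list before it submits the file. The following Python sketch is not part of Datacap or Content Analyzer; the function name is_supported_document is hypothetical, and the extension set is copied from the document types listed above.

```python
from pathlib import Path

# Extensions taken from the document types listed above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".tiff", ".tif", ".jpg", ".jpeg", ".png", ".doc", ".docx",
}


def is_supported_document(filename: str) -> bool:
    """Return True if the file extension matches a listed document type.

    Hypothetical helper for illustration only; the comparison is
    case-insensitive and looks at the final extension of the name.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

A check like this only filters by file name. It does not verify that the file content is valid for the declared type.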

The Content Analyzer library that is added to Datacap includes the following actions:

ContentAnalyzerConfiguration: This action sets the URL and credentials that are used to connect to the Content Analyzer API services.
ContentAnalyzerSubmitRequest: This action uses the previously supplied credentials to connect to the Content Analyzer API services and upload the files for processing.
ContentAnalyzerGetOutput: This action uses the previously supplied credentials to connect to the Content Analyzer API services to download the processing results from the previous ContentAnalyzerSubmitRequest action.
ContentAnalyzerRetry: This action customizes the number of retries and the time to wait between each communication attempt when a Content Analyzer action fails.
ContentAnalyzerCreateLayout: This action converts the JSON output file that is generated by Content Analyzer to a layout XML file. OCR results and key-value-pair results are included in the layout file.
ContentAnalyzerSetTimeout: This action sets the number of seconds to wait before a Content Analyzer call is considered to be no longer running properly.
ContentAnalyzerFindExtractedText: This action populates a field or variable by using the key-value pair results that are saved in the layout XML file that is generated by the ContentAnalyzerCreateLayout action.
ContentAnalyzerSetProxy: This action allows optional specification of proxy service settings.
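
The retry behavior that ContentAnalyzerRetry configures (a number of retries and a wait time between attempts) can be pictured with a generic sketch. The following Python code is not the Datacap implementation and does not call any Datacap or Content Analyzer API. The name call_with_retry and its parameters are hypothetical and only illustrate the general pattern.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    max_retries: int = 3,
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying after a failure up to max_retries times.

    Hypothetical illustration: 'operation' stands in for any remote call,
    and 'sleep' is injectable so the wait can be replaced in tests.
    """
    retries_used = 0
    while True:
        try:
            return operation()
        except Exception:
            # Give up and re-raise once the retry budget is exhausted.
            if retries_used >= max_retries:
                raise
            retries_used += 1
            # Wait before the next communication attempt.
            sleep(wait_seconds)
```

With max_retries set to 3, the operation is attempted at most four times: the initial call plus three retries.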

For more information, see IBM Datacap Knowledge Center.