Datacap application development
This tutorial introduces you to the concepts and tasks that help you to develop your Datacap applications. Throughout the tutorial, you develop an application to process travel documents.
- Business Requirements and Application Architecture
The first step in developing any Datacap application is to define the business requirements.
- Datacap Studio
Datacap Studio is the Datacap application development environment that provides the tools that you need to develop and test your application.
- Document hierarchy
Document hierarchy defines the structure of the documents that you are processing and how Datacap processes each element within the structure. Document hierarchy is also referred to as the Setup DCO.
- The Datacap workflow
During the data capture process, documents go through a workflow that consists of several tasks, including page identification, character recognition, field validation, verification, and export. Some tasks require operator intervention, while other tasks run automatically.
- Document input
Datacap works primarily with TIFF image files. So, the first activity in any Datacap workflow is to convert the documents to TIFF format and insert the documents into an input repository.
- Page Identification
Page identification is one of the first steps in any Datacap application. All incoming pages are initially assigned the default page type Other. Before Datacap can assemble those pages into documents and extract data from the pages, it must determine the correct type for each page.
- Rule Execution
Rule execution refers to how rules are associated with specific objects in the document hierarchy and how Datacap processes a batch of documents.
- Document assembly
Datacap identifies incoming pages and assigns the correct page type by using fingerprint matching or one of the other identification methods. The next step assembles the batch of individual pages into documents according to the rules that are defined within the document hierarchy.
- Data recognition
Data recognition is the stage during which you locate the fields that you want to capture and then convert the fields into character-based data.
- Data Validation
Data validation determines whether captured data complies with the data integrity rules that are defined in your business requirements.
- Data verification
During verification, Datacap displays pages to an operator for manual checking and possible correction.
- Data export
Datacap can export data to a text file, an XML file, a database, a document management system, or a custom business process. The default output format is a text file, but you can use other actions to export data to a database or an XML file.
- Application Debugging
Application debugging requires that you review two runtime log files, which are the Rulerunner Service (RRS) log and the task log. The RRS log provides detailed information about each action and is most helpful to application developers. The task log documents internal calls and is used mostly by IBM software support.
- Handling line item grids
The techniques that you implemented so far rely on data at predictable locations on the page. When you receive an invoice, however, you do not know how many items it might contain. There might be just one item, or there might be a hundred items, possibly spanning multiple pages. Datacap includes actions to handle line item grids. You define the region on the page that might contain line items and define the structure of one line item. Datacap can then scan the region and locate all of the individual line items.
- Smart parameters
Smart parameters are action arguments that get evaluated at run time.
- Text matching
You can add flexibility to your applications by using text matching to identify pages and locate data.
- Pattern Matching
You can use Datacap pattern matching to identify pages and adjust misaligned or distorted images.
- Workflow automation, routing, and automatic fingerprint generation
You can configure Rulerunner to monitor the job queue and run background tasks like PageID, Profiler, and Export automatically whenever batches are pending.
- Datacap Web Client and remote scanning
You can now update your application by using Datacap Web Client Administrator and run a batch through the entire workflow by using a combination of web components and Rulerunner.
- Filter batches by group in the Job Monitor (Datacap Web Client)
In the Datacap Web Client, you can filter batches by groups in the Job Monitor based on your ADSI, LDAP, or LLLDAP group authentication.
- Fingerprint Management
Fingerprints are used both for page identification and for specifying recognition zones. The following topics review basic fingerprint functions, provide more details about the fingerprint database, and examine an alternative method for storing zone position information with fingerprint XML (FPXML) files. Later, you can update the TravelDocs application to use FPXML.
- Configuring content classification for XML layout block parsing
Some XML configuration file changes might be needed for IBM® Content Classification to properly parse the text blocks sent to it by the RunDecisionPlanForBlocks action.
- Application translation
You can translate the text in Datacap applications that is displayed in Datacap clients: Datacap Desktop, FastDoc (Job Monitor only), and Datacap Navigator. The following text can be translated: workflow names, job names, task names, shortcuts, descriptions, field names, document types, page types, and validation error messages.
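
The page identification and document assembly steps listed above can be pictured with a small Python sketch. This is not Datacap code and does not use any Datacap API: the hierarchy, the page type names (`Ticket`, `Receipt`), the page names, and the rule that the first listed page type opens a new document are all assumptions made for illustration. It only shows the general idea that every page starts as `Other`, receives a page type, and is then grouped into documents according to rules tied to the hierarchy.

```python
from dataclasses import dataclass, field

# Hypothetical hierarchy (not a real Setup DCO): each document type lists the
# page types it may contain. By assumption, the first entry opens a document.
DOCUMENT_HIERARCHY = {
    "TravelDoc": ["Ticket", "Receipt"],
}


@dataclass
class Page:
    name: str
    page_type: str = "Other"  # every incoming page starts as Other


@dataclass
class Document:
    doc_type: str
    pages: list = field(default_factory=list)


def assemble(pages):
    """Group identified pages into documents; return (documents, unassigned)."""
    # Map each opening page type to the document type it starts.
    opening = {types[0]: doc for doc, types in DOCUMENT_HIERARCHY.items()}
    documents, unassigned = [], []
    current = None
    for page in pages:
        if page.page_type in opening:
            # An opening page type starts a new document.
            current = Document(opening[page.page_type])
            current.pages.append(page)
            documents.append(current)
        elif current is not None and page.page_type in DOCUMENT_HIERARCHY[current.doc_type]:
            # A page type allowed by the current document is attached to it.
            current.pages.append(page)
        else:
            # Pages that were never identified (still Other) stay unassigned.
            unassigned.append(page)
    return documents, unassigned


# Illustrative batch; the page names are made up.
batch = [
    Page("tm000001", "Ticket"),
    Page("tm000002", "Receipt"),
    Page("tm000003"),  # identification failed, so the type is still Other
    Page("tm000004", "Ticket"),
]
documents, unassigned = assemble(batch)
```

In this sketch, the unidentified page is set aside rather than attached to a document, which mirrors why page identification has to succeed before assembly and extraction can proceed.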