Identifying unrecognized pages by using text matching

You can identify pages that fingerprint matching does not recognize by adding an Identify using Text Match function to the PageID rule. In this procedure, the function recognizes car rental pages by looking for text that is unique to that page type.

Procedure

To identify unrecognized pages by using text matching:

  1. In the Datacap Studio Rulesets pane, select the PageID ruleset and click Lock/Unlock ruleset for editing.
  2. Expand the PageID rule.
  3. Change the name of the existing function from PageID: Other Function 1 to Identify using Fingerprint. Then, change the parameter on the FindFingerprint action to False.
    Attention: Setting the parameter to False ensures that Datacap does not automatically generate a fingerprint file for unrecognized pages. If the current page does not match one of the existing fingerprints, this action fails and Datacap starts the next function, if there is one.
  4. Right-click the PageID rule and choose Add Function. Then, rename the new function to Identify using Text Match.
  5. Click the Actions library tab and add the actions that are shown in the following table to the Identify using Text Match function by clicking Add to function. Then, set the action parameters as shown.
    Library   Action        Parameter
    Locate    RegExFind     Car
    Locate    RegExFind     Pickup
    DCO       SetPageType   Rental_Agreement
    rrunner   rrSet         varSource = Text
                            varTarget = @P.MatchType

    Important: Both the Car and Pickup parameters are tested to avoid false matches on insurance pages. The rrSet action sets a page variable that you use later to identify which pages to process by using text matching.
    Attention: If the Identify using Fingerprint function succeeds in identifying the page, the Identify using Text Match function does not start.
  6. In the Rulesets pane, click Save. Then, click Lock/Unlock ruleset and choose Publish Ruleset.
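
The following Python sketch models the control flow that the procedure sets up. It is a conceptual illustration only, not Datacap code: every function name, the page dictionary, and the fingerprint lookup table are hypothetical stand-ins. It assumes, as the notes in the procedure describe, that a function stops at its first failing action and that the next function in the rule runs only when the previous one fails.

```python
import re

# Conceptual model only. None of these names are Datacap APIs; they are
# hypothetical stand-ins for the actions configured in the procedure.

def find_fingerprint(page, known_fingerprints):
    """Stand-in for FindFingerprint with its parameter set to False:
    succeed only if the page matches an existing fingerprint."""
    return page.get("fingerprint") in known_fingerprints

def regex_find(page, pattern):
    """Stand-in for RegExFind: succeed if the pattern occurs in the page text."""
    return re.search(pattern, page["text"]) is not None

def identify_using_fingerprint(page, known_fingerprints):
    if not find_fingerprint(page, known_fingerprints):
        return False  # the action fails, so the function fails
    # Assumption for illustration: a matched fingerprint supplies the page type.
    page["type"] = known_fingerprints[page["fingerprint"]]
    return True

def identify_using_text_match(page):
    # Actions run in order; the first failure ends the function.
    if not regex_find(page, "Car"):
        return False
    if not regex_find(page, "Pickup"):
        return False
    page["type"] = "Rental_Agreement"    # models SetPageType
    page["vars"]["MatchType"] = "Text"   # models rrSet: Text -> @P.MatchType
    return True

def run_page_id_rule(page, known_fingerprints):
    """Try each function in order; stop at the first one that succeeds."""
    functions = [
        lambda p: identify_using_fingerprint(p, known_fingerprints),
        identify_using_text_match,
    ]
    for function in functions:
        if function(page):
            return True
    return False

if __name__ == "__main__":
    fingerprints = {"fp1": "Insurance"}  # hypothetical fingerprint-to-type table
    rental = {"text": "Car class: Compact. Pickup: Airport", "vars": {}}
    run_page_id_rule(rental, fingerprints)
    print(rental["type"], rental["vars"])
```

In this model, a page that contains Car but not Pickup stays unidentified, which mirrors why both strings are tested. A page that matches a fingerprint never reaches the text match function, so its MatchType variable is never set.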