FindPatterns

Finds patterns in the document or page.

Restriction: This action does not support regular expressions that contain line-break or tab expressions.

Syntax

bool FindPatterns (string patternsFilePath)

Parameters

patternsFilePath
Path to the XML file that contains the patterns to match.

Returns

True.

Level

Page level

Details

Analyzes all blocks of text to determine whether addresses, dates, or custom expressions are present. Regular expressions are stored in an XML file as a list of patterns with the properties listed at the end of this topic. Each pattern must have a unique id attribute. The type attribute of a pattern names the DCO field to populate. The value of the pattern element is the regular expression.
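The following Python sketch illustrates how a pattern file of this shape can be read and applied. It is not how FindPatterns is implemented: Python's re module stands in for whatever regular expression engine the action uses, and the sample ids, types, expressions, and text are invented for illustration. The helper names load_patterns and find_patterns are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern file content; ids, types, and expressions are invented.
SAMPLE_XML = r"""
<Patterns>
  <Pattern id="zip1" type="zip" enabled="true">\d{5}(-\d{4})?</Pattern>
  <Pattern id="date1" type="date" enabled="true">\d{2}/\d{2}/\d{4}</Pattern>
  <Pattern id="date2" type="date" enabled="false">\d{4}-\d{2}-\d{2}</Pattern>
</Patterns>
"""


def load_patterns(xml_text):
    """Return enabled patterns as (id, type, compiled regex) tuples.

    Raises ValueError if an id is missing or used more than once.
    """
    root = ET.fromstring(xml_text.strip())
    seen_ids = set()
    patterns = []
    for node in root.iter("Pattern"):
        pattern_id = node.get("id")
        if not pattern_id or pattern_id in seen_ids:
            raise ValueError(f"missing or duplicate pattern id: {pattern_id!r}")
        seen_ids.add(pattern_id)
        # Assumption: a pattern without an enabled attribute counts as enabled.
        if node.get("enabled", "true").lower() != "true":
            continue
        expression = (node.text or "").strip()
        patterns.append((pattern_id, node.get("type"), re.compile(expression)))
    return patterns


def find_patterns(patterns, text):
    """Return {type: [matched strings]} for every pattern that matches text."""
    results = {}
    for _pattern_id, field_type, regex in patterns:
        for match in regex.finditer(text):
            results.setdefault(field_type, []).append(match.group(0))
    return results


patterns = load_patterns(SAMPLE_XML)
found = find_patterns(patterns, "Invoice dated 03/15/2021, ship to ZIP 90210-1234.")
print(found)
```

The sketch groups matches by the type attribute, which mirrors the idea that type identifies the field to populate. The disabled pattern date2 is skipped, and the uniqueness check on id fails early rather than silently overwriting a pattern.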

FindPatterns requires a previously created layout file (for example: tm000001_layout.xml) where text is grouped into blocks. See DocumentAnalytics actions for information on the layout XML file.

Example

Recognize()

FindPatterns("@APPVAR(values/gen/patternsPath)")

Format of a pattern in the pattern XML file

<Pattern id="uniqueString" type="documentHierarchyFieldType" enabled="true">
   regularExpression
</Pattern>
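Because the regular expression is stored as XML element text, characters such as <, >, and & must be escaped in the file. One way to avoid hand-escaping is to generate the element with an XML library. The sketch below uses Python's xml.etree.ElementTree; the helper build_pattern, the id refPattern1, the type reference, and the expression are all hypothetical and are not taken from the product.

```python
import xml.etree.ElementTree as ET


def build_pattern(pattern_id, field_type, expression, enabled=True):
    """Create a Pattern element whose text is the given regular expression."""
    node = ET.Element(
        "Pattern",
        {
            "id": pattern_id,
            "type": field_type,
            "enabled": "true" if enabled else "false",
        },
    )
    node.text = expression  # ElementTree escapes <, >, and & when serializing
    return node


# Hypothetical expression that contains characters XML treats as markup.
expression = r"Ref\s*<\d{4}>"

root = ET.Element("Patterns")
root.append(build_pattern("refPattern1", "reference", expression))

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Reading the file back yields the original, unescaped expression.
round_trip = ET.fromstring(xml_text).find("Pattern").text
```

In the serialized text the angle brackets appear as &lt; and &gt;, and an XML parser restores them when the file is read, so the expression seen by a consumer of the file is unchanged.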

Example of a pattern XML file

<Patterns>
<Pattern id="addressPattern1" type="us_address" enabled="true">
   (\d{1,5}.{1,16}(Alley|Avenue|(Ave\.?)|(Bvd\.?)|Blvd|Boulevard|Circle|(Cir\.?)
   |Street|(St\.?)|([P]\.?\s*?[O]\.?\s*?Box)|Drive|(Dr\.?)|(Cres\.?)|Crescent|Court|(Ct\.?)
   |Way|(Tr\.?)|Terrace|Trail|(Rd\.?)|Road|Lane|Highway|(Hwy\.?)|(Apt\.?)|(Pl\.?)|Place).*?
   ((?:(A[KLRZ]|C[AOT]|D[CE]|FL|GA|HI|I[ADLN]|K[SY]|LA|M[ADEINOST]|N[CDEHJMVY]|O[HKR]|P[AR]
   |RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]))|(Alabama|Alaska|Arizona|Arkansas|California|Colorado
   |Connecticut|Delaware|Florida|Georgia|Hawaii|Idaho|Illinois|Indiana|Iowa|Kansas|Kentucky
   |Louisiana|Maine|Maryland|Massachusetts|Michigan|Minnesota|Mississippi|Missouri|Montana
   |Nebraska|Nevada|New\s?(Hampshire|Jersey|Mexico|York)|North\s?(Carolina|Dakota)|Ohio|Oklahoma
   |Oregon|Pennsylvania|Rhode\s?Island|South\s?(Carolina|Dakota)|Tennessee|Texas|Utah|Vermont
   |Virginia|Washington|West\s?Virginia|Wisconsin|Wyoming)|(ALABAMA|ALASKA|ARIZONA|ARKANSAS
   |CALIFORNIA|COLORADO|CONNECTICUT|DELAWARE|FLORIDA|GEORGIA|HAWAII|IDAHO|ILLINOIS|INDIANA|IOWA
   |KANSAS|KENTUCKY|LOUISIANA|MAINE|MARYLAND|MASSACHUSETTS|MICHIGAN|MINNESOTA|MISSISSIPPI|MISSOURI
   |MONTANA|NEBRASKA|NEVADA|NEW\s?(HAMPSHIRE|JERSEY|MEXICO|YORK)|NORTH\s?(CAROLINA|DAKOTA)|OHIO
   |OKLAHOMA|OREGON|PENNSYLVANIA|RHODE\s?ISLAND|SOUTH\s?(CAROLINA|DAKOTA)|TENNESSEE|TEXAS|UTAH
   |VERMONT|VIRGINIA|WASHINGTON|WEST\s?VIRGINIA|WISCONSIN|WYOMING))\s*\d{5}((\-|\s*)\d{4})?)
</Pattern>
<Pattern id="datePattern1" type="date" enabled="true">
   (((?:J(anuary|u(ne|ly))|February|Ma(rch|y)|A(pril|ugust)|(((Sept|Nov|Dec)em)|Octo)ber)
   |(Jan|Feb|Mar|Apr|May|Aug|Sep|Sept|Oct|Nov|Dec))(\s*|\-)\d{1,2}\,?(\s*|\-)\d{4})
   |(\d{2}\/\d{2}\/\d{4})|(\d{2}th\s*((?:J(anuary|u(ne|ly))|February|Ma(rch|y)|A(pril|ugust)
   |(((Sept|Nov|Dec)em)|Octo)ber)|(Jan|Feb|Mar|Apr|May|Aug|Sep|Sept|Oct|Nov|Dec))[\s*\,]\d{4})
</Pattern>
</Patterns>