FindPatterns
Finds patterns in the document or page.
Restriction: This action does not support regular expressions that contain line breaks or tab expressions.
Syntax
bool FindPatterns (string patternsFilePath)
Parameters
- patternsFilePath: Path to XML file containing patterns to match.
Returns
True.
Level
Page level
Details
Analyzes all blocks of text to determine if addresses, dates, or custom expressions are present. Regular expressions are stored in an XML file as a list of patterns with the properties listed at the end of this topic. Each pattern must have a unique id attribute. The pattern type attribute is the DCO field to populate. The pattern value is the regular expression.
FindPatterns requires a previously created layout file (for example: tm000001_layout.xml) where text is grouped into blocks. See DocumentAnalytics actions for information on the layout XML file.
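The constraints described above (a unique id per pattern, a type attribute that names the field to populate, and a regular expression as the value) can be checked before the file is passed to FindPatterns. The following Python sketch is illustrative only and is not part of the product. The check_patterns helper is hypothetical, and it compiles each expression with Python's re module, which may accept or reject syntax differently from the engine that the action uses.

```python
import re
import xml.etree.ElementTree as ET


def check_patterns(xml_text):
    """Return a list of problems found in a patterns XML document.

    Hypothetical pre-flight check: it only verifies the rules stated in
    this topic (unique id, type present, non-empty expression).
    """
    problems = []
    seen_ids = set()
    root = ET.fromstring(xml_text)
    for pattern in root.iter("Pattern"):
        pattern_id = pattern.get("id")
        if not pattern_id:
            problems.append("pattern without an id attribute")
        elif pattern_id in seen_ids:
            problems.append(f"duplicate id: {pattern_id}")
        else:
            seen_ids.add(pattern_id)
        if not pattern.get("type"):
            problems.append(f"{pattern_id}: missing type attribute")
        expression = (pattern.text or "").strip()
        if not expression:
            problems.append(f"{pattern_id}: empty expression")
            continue
        try:
            # Assumption: Python's regex dialect is close enough for a syntax check.
            re.compile(expression)
        except re.error as exc:
            problems.append(f"{pattern_id}: invalid expression ({exc})")
    return problems
```

An empty list means that no problems were detected by these checks. It does not guarantee that the action accepts the file.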
Example
Recognize()
FindPatterns("@APPVAR(values/gen/patternsPath)")
Format of a pattern in the pattern XML file
<Pattern id="uniqueString" type="documentHierarchyFieldType" enabled="true">
regularExpression
</Pattern>
Example of a pattern XML file
<Patterns>
<Pattern id="addressPattern1" type="us_address" enabled="true">
(\d{1,5}.{1,16}(Alley|Avenue|(Ave\.?)|(Bvd\.?)|Blvd|Boulevard|Circle|(Cir\.?)
|Street|(St\.?)|([P]\.?\s*?[O]\.?\s*?Box)|Drive|(Dr\.?)|(Cres\.?)|Crescent|Court|(Ct\.?)
|Way|(Tr\.?)|Terrace|Trail|(Rd\.?)|Road|Lane|Highway|(Hwy\.?)|(Apt\.?)|(Pl\.?)|Place).*?
((?:(A[KLRZ]|C[AOT]|D[CE]|FL|GA|HI|I[ADLN]|K[SY]|LA|M[ADEINOST]|N[CDEHJMVY]|O[HKR]|P[AR]
|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]))|(Alabama|Alaska|Arizona|Arkansas|California|Colorado
|Connecticut|Delaware|Florida|Georgia|Hawaii|Idaho|Illinois|Indiana|Iowa|Kansas|Kentucky
|Louisiana|Maine|Maryland|Massachusetts|Michigan|Minnesota|Mississippi|Missouri|Montana
|Nebraska|Nevada|New\s?(Hampshire|Jersey|Mexico|York)|North\s?(Carolina|Dakota)|Ohio|Oklahoma
|Oregon|Pennsylvania|Rhode\s?Island|South\s?(Carolina|Dakota)|Tennessee|Texas|Utah|Vermont
|Virginia|Washington|West\s?Virginia|Wisconsin|Wyoming)|(ALABAMA|ALASKA|ARIZONA|ARKANSAS
|CALIFORNIA|COLORADO|CONNECTICUT|DELAWARE|FLORIDA|GEORGIA|HAWAII|IDAHO|ILLINOIS|INDIANA|IOWA
|KANSAS|KENTUCKY|LOUISIANA|MAINE|MARYLAND|MASSACHUSETTS|MICHIGAN|MINNESOTA|MISSISSIPPI|MISSOURI
|MONTANA|NEBRASKA|NEVADA|NEW\s?(HAMPSHIRE|JERSEY|MEXICO|YORK)|NORTH\s?(CAROLINA|DAKOTA)|OHIO
|OKLAHOMA|OREGON|PENNSYLVANIA|RHODE\s?ISLAND|SOUTH\s?(CAROLINA|DAKOTA)|TENNESSEE|TEXAS|UTAH
|VERMONT|VIRGINIA|WASHINGTON|WEST\s?VIRGINIA|WISCONSIN|WYOMING))\s*\d{5}((\-|\s*)\d{4})?)
</Pattern>
<Pattern id="datePattern1" type="date" enabled="true">
(((?:J(anuary|u(ne|ly))|February|Ma(rch|y)|A(pril|ugust)|(((Sept|Nov|Dec)em)|Octo)ber)
|(Jan|Feb|Mar|Apr|May|Aug|Sep|Sept|Oct|Nov|Dec))(\s*|\-)\d{1,2}\,?(\s*|\-)\d{4})
|(\d{2}\/\d{2}\/\d{4})|(\d{2}th\s*((?:J(anuary|u(ne|ly))|February|Ma(rch|y)|A(pril|ugust)
|(((Sept|Nov|Dec)em)|Octo)ber)|(Jan|Feb|Mar|Apr|May|Aug|Sep|Sept|Oct|Nov|Dec))[\s*\,]\d{4})
</Pattern>
</Patterns>
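To try out expressions from a patterns file against sample text outside the product, a small script along the following lines might help. It is a sketch under stated assumptions, not a reimplementation of FindPatterns: the find_matches helper and the SAMPLE_PATTERNS content are hypothetical, patterns whose enabled attribute is not "true" are assumed to be skipped, wrapped lines are assumed to belong to one single-line expression, and matching uses Python's re module, not the engine that the action uses.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, shortened patterns file used only for this illustration.
SAMPLE_PATTERNS = r"""
<Patterns>
  <Pattern id="datePattern1" type="date" enabled="true">
    (\d{2}\/\d{2}\/\d{4})
  </Pattern>
  <Pattern id="zipPattern1" type="zip" enabled="false">
    \d{5}
  </Pattern>
</Patterns>
"""


def find_matches(xml_text, text):
    """Return (type, matched text) pairs for every enabled pattern."""
    results = []
    for pattern in ET.fromstring(xml_text).iter("Pattern"):
        # Assumption: a pattern is active only when enabled is "true".
        if pattern.get("enabled", "true").lower() != "true":
            continue
        # Assumption: line wrapping is presentational, so the wrapped
        # lines are joined into one single-line expression.
        lines = (pattern.text or "").splitlines()
        expression = "".join(line.strip() for line in lines)
        if not expression:
            continue
        for match in re.finditer(expression, text):
            results.append((pattern.get("type"), match.group(0)))
    return results


if __name__ == "__main__":
    print(find_matches(SAMPLE_PATTERNS, "Invoice dated 03/15/2021 for ZIP 90210"))
```

In this sample the ZIP pattern is disabled, so only the date expression is evaluated against the text.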