ExtractText
Finds entities such as names and addresses in the text by using text analytics. The results are saved and can then be used by subsequent actions, such as FindExtractedText.
Syntax
bool ExtractText (string extractors)
Parameters
- extractors
- Smart parameter for a comma-separated list of extractors to process.
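Each entry in the extractors list uses the form Module.ViewName. The following Python sketch is a hypothetical helper, not part of Datacap. It splits such a list into (module, view) pairs and checks the format. It assumes that any smart-parameter references have already been resolved to literal text.

```python
import re

# Hypothetical helper: mirrors the documented "Module.ViewName" naming
# convention; it is NOT Datacap's own parser.
_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*\.[A-Za-z_][A-Za-z0-9_]*$")


def parse_extractors(extractors: str) -> list[tuple[str, str]]:
    """Split a comma-separated extractor list into (module, view) pairs."""
    pairs = []
    for raw in extractors.split(","):
        name = raw.strip()
        if not name:
            continue  # tolerate stray commas such as "A.B,,C.D"
        if not _NAME.match(name):
            raise ValueError(f"expected Module.ViewName, got {name!r}")
        module, view = name.split(".")
        pairs.append((module, view))
    return pairs


print(parse_extractors("DateTime.DateTime,Address.Address"))
# [('DateTime', 'DateTime'), ('Address', 'Address')]
```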
Returns
True.
Level
Document or Page level
Details
To extract entities from non-English text, set the page variable hr_locale to the required language before you call this action. For example, for Japanese, call rrset("ja","@P.hr_locale").
AQL extractors determine which entities are found. An initial set of pre-built extractors is provided. Although they work in many cases, they might not work in every case. For details about the pre-built extractors, see the IBM BigInsights documentation. You can create additional extractors by using the IBM BigInsights tools for creating AQL extractors.
Extractors are saved in compiled files that have the .tam extension. All .tam, dictionary, and table files in the \rrs\aql folder are loaded. The extractors that Datacap provides are exposed in DatacapPreBuilt_BasicFeatures.tam. To control which extractors run, add .tam files to the \rrs\aql folder or remove them from it.
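The following Python sketch lists the files in a folder that match the loaded types, so you can check what that folder contributes. It treats .tam (compiled extractors) and .dict (dictionaries) as loaded, because this page names both. Treating .csv as the extension for AQL table files is an assumption. The sketch looks only at the top level of the folder, and the function name files_to_load is hypothetical.

```python
from pathlib import Path

# ".tam" and ".dict" come from this page; ".csv" for table files is an assumption.
LOADED_SUFFIXES = {".tam", ".dict", ".csv"}


def files_to_load(aql_dir: str) -> list[str]:
    """Return the names of top-level files in aql_dir with a loaded extension."""
    return sorted(
        p.name
        for p in Path(aql_dir).iterdir()
        if p.is_file() and p.suffix.lower() in LOADED_SUFFIXES
    )
```

For example, call files_to_load(r"C:\Datacap\MyApp\rrs\aql"), where the application path is hypothetical.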
ExtractText requires a previously created layout file (for example, tm000001_layout.xml) in which the text is grouped into blocks. For information about the layout XML file, see the DocumentAnalytics actions.
Example
The following example populates the City field with the city from the first address whose state is California.
ExtractText(DateTime.DateTime,Address.Address)
FindExtractedText(@P\City,First,Address.Address,city,stateorprovince,(California)|(CA))
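The following Python sketch models the matching in this example with hypothetical in-memory data. It reads the FindExtractedText parameters as follows: target field, which match to use (First), extractor name, attribute to return (city), attribute to test (stateorprovince), and a regular expression. That reading is inferred from the example, not from the FindExtractedText reference. Whether Datacap applies the expression as a full or a partial match is also an assumption.

```python
import re

# Hypothetical extraction results: each Address.Address match is modeled as a
# dict of attribute -> text. Datacap's actual storage format is not shown here.
addresses = [
    {"city": "Portland", "stateorprovince": "OR"},
    {"city": "San Jose", "stateorprovince": "CA"},
    {"city": "Fresno", "stateorprovince": "California"},
]


def find_first(results, return_attr, match_attr, pattern):
    """Return return_attr of the first result whose match_attr fully matches pattern."""
    regex = re.compile(pattern)
    for entity in results:
        if regex.fullmatch(entity.get(match_attr, "")):
            return entity.get(return_attr)
    return None


city = find_first(addresses, "city", "stateorprovince", r"(California)|(CA)")
print(city)  # San Jose
```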
Support for external dictionaries
The ExtractText action supports AQL external dictionaries. With this feature, you can write annotators that do not need to be recompiled when the dictionary content changes.
You can export extractors from the IBM® InfoSphere® BigInsights web tools and place the exported folders in the rrs\aql\src folder. The AQL is compiled at run time.
Keep a backup of the RRS folder in case a file or folder becomes corrupted during copying or because of a misconfiguration.
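A backup can be a timestamped copy of the folder. The following Python sketch assumes that Datacap is not using the folder while the script runs. The backup_rrs function and the folder-naming scheme are hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_rrs(rrs_dir: str) -> Path:
    """Copy rrs_dir to a timestamped sibling folder and return its path."""
    src = Path(rrs_dir)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    dest = src.with_name(f"{src.name}_backup_{stamp}")
    shutil.copytree(src, dest)  # raises if dest already exists, so nothing is overwritten
    return dest
```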
Configuring custom annotators in Datacap
Complete the following steps to configure custom annotators in Datacap:
- After you create a custom extractor by using the BigInsights web tool, export it in zip format as "Executables" with the "Source files" option selected.
- Copy the TAM files from the export to the \rrs\aql folder. Do NOT copy the InputDocumentProcessor.TAM file; keep the original file in the rrs\aql folder.
- Copy the SRC folder from the exported zip file to the rrs\aql folder.
- Copy all the supported *.DICT files that BigInsights provides to the rrs\aql folder.
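The copy steps for the TAM files and the SRC folder can be scripted. The following Python sketch is hypothetical. It assumes that the export zip contains the .tam files and a folder named src (in any letter case). It does not handle the *.DICT step, because those files come from BigInsights rather than from the export.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def install_export(export_zip: str, aql_dir: str) -> list[str]:
    """Copy .tam files and the src folder from an exported zip into aql_dir.

    InputDocumentProcessor.tam is skipped so that the original copy already
    in aql_dir is kept. Returns the copied paths relative to aql_dir.
    """
    aql = Path(aql_dir)
    copied = []
    with zipfile.ZipFile(export_zip) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            parts = PurePosixPath(info.filename).parts  # zip entries use "/"
            if ".." in parts:
                continue  # never write outside aql_dir
            lowered = [p.lower() for p in parts]
            name = parts[-1]
            if lowered[-1].endswith(".tam"):
                if lowered[-1] == "inputdocumentprocessor.tam":
                    continue  # keep the original in rrs\aql
                target = aql / name  # TAM files go directly into rrs\aql
            elif "src" in lowered[:-1]:
                # Keep the layout below src, e.g. src/MyModule/main.aql
                idx = lowered.index("src")
                target = aql.joinpath("src", *parts[idx + 1:])
            else:
                continue  # anything else in the export is ignored
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as fin, open(target, "wb") as fout:
                shutil.copyfileobj(fin, fout)
            copied.append(target.relative_to(aql).as_posix())
    return copied
```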
Call custom extractors with the ExtractText action in the format Module.Viewname.
Verify the module name in the corresponding AQL file.
For example: ZIPCODE_BasicFeatures.ZCView
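You can read module and view names from the AQL source. In AQL, a module file begins with a module <name>; statement, and the views that the module produces are declared with output view <name>; statements. The following Python sketch is a hypothetical helper that builds the names to pass to ExtractText. It handles only those simple forms and ignores aliases and quoted identifiers. The sample AQL is illustrative only.

```python
import re

# Only the simple forms "module <name>;" and "output view <name>" are handled;
# quoted identifiers and "as '<alias>'" clauses are not interpreted.
MODULE_RE = re.compile(r"^\s*module\s+([A-Za-z_]\w*)\s*;", re.MULTILINE)
OUTPUT_RE = re.compile(r"^\s*output\s+view\s+([A-Za-z_]\w*)", re.MULTILINE)


def extractor_names(aql_text: str) -> list[str]:
    """Build ExtractText-style Module.ViewName strings from one AQL file."""
    module = MODULE_RE.search(aql_text)
    if module is None:
        raise ValueError("no 'module <name>;' statement found")
    return [f"{module.group(1)}.{view}" for view in OUTPUT_RE.findall(aql_text)]


sample = """module ZIPCODE_BasicFeatures;

create view ZCView as
  extract regex /\\b\\d{5}\\b/ on D.text as zip
  from Document D;

output view ZCView;
"""

print(extractor_names(sample))  # ['ZIPCODE_BasicFeatures.ZCView']
```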
After compilation completes, the compiled TAM files are saved in the \rrs\aql folder. Remove the rrs\aql\src folders after compilation completes.
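You can guard the cleanup so that the sources are kept until compiled output exists. The following Python sketch assumes two things: each module's sources are in rrs\aql\src\<ModuleName>, and each module compiles to <ModuleName>.tam in rrs\aql. Both this naming convention and the function name are assumptions. The export step already copies .tam files, so the presence of a .tam file does not prove that run-time compilation succeeded. Treat this check as a safety net, not as verification.

```python
import shutil
from pathlib import Path


def remove_src_after_compile(aql_dir: str) -> bool:
    """Delete aql_dir/src only if every module folder in it has a matching .tam.

    Returns True if the src folder was removed.
    """
    aql = Path(aql_dir)
    src = aql / "src"
    if not src.is_dir():
        return False
    modules = [d.name for d in src.iterdir() if d.is_dir()]
    missing = [m for m in modules if not (aql / f"{m}.tam").is_file()]
    if missing:
        return False  # keep sources until every module has a compiled .tam
    shutil.rmtree(src)
    return True
```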
List of pre-built extractors
Each of the following Datacap extractor names consists of two parts separated by a period: the InfoSphere BigInsights extractor name followed by the InfoSphere BigInsights attribute name.
See the IBM InfoSphere BigInsights documentation for more information about pre-built extractors.
Address.Address
City.City
Continent.Continent
Country.Country
Date.Dates
DateTime.DateTime
EmailAddress.EmailAddress
Facility.Facility
FinancialAnnouncements.CompanyEarningsAnnouncement
FinancialAnnouncements.AnalystEarningsEstimate
FinancialAnnouncements.CompanyEarningsGuidance
FinancialEvents.Alliance
FinancialEvents.Acquisition
FinancialEvents.JointVenture
FinancialEvents.Merger
Location.Location
NotesEmailAddress.NotesEmailAddress
Organization.Organization
Person.Person
PhoneNumber.PhoneNumber
StateOrProvince.StateOrProvince
URL.URL
WaterBody.WaterBody
ZipCode.ZipCode
BigInsightsChineseNER.PersonChinese
BigInsightsChineseNER.LocationChinese
BigInsightsChineseNER.OrganizationChinese
BigInsightsJapaneseNER.PersonJapanese
BigInsightsJapaneseNER.LocationJapanese
BigInsightsJapaneseNER.OrganizationJapanese