Extract Named Entities
Extracts named entities from plain text.
Command availability: IBM RPA SaaS and IBM RPA on premises
Description
Extracts named entities from plain text. You can set the type of entity to extract and the language used to identify named entities.
Named entities are real-world objects identified by their proper names, for example, a person, location, organization, product, or moment in time.
The results that this command returns might vary according to the selected NLP provider. For more information, see Setting an NLP provider.
Script syntax
IBM RPA's proprietary script language has a syntax similar to other programming languages. The script syntax defines the command's syntax in the script file. You can work with this syntax in IBM RPA Studio's Script mode.
extractNamedEntities --entities(NaturalLanguageEntities) [--culture(String)] --text(String) (List<String>)=values (String)=first (DataTable)=valuesmapping (Boolean)=success
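To make the syntax concrete, the following Python sketch composes an `extractNamedEntities` line from its parameters. The `build_extract_named_entities` helper is hypothetical and is not part of IBM RPA. It only mirrors the syntax above: `--culture` is optional and can be omitted, and each output is bound as `variable=output`. It does not escape quotation marks inside values, so treat it as an illustration rather than a script generator.

```python
from typing import Mapping, Optional

# Output names defined by the command syntax.
OUTPUTS = ("values", "first", "valuesmapping", "success")


def build_extract_named_entities(
    entities: str,
    text: str,
    culture: Optional[str] = None,
    bindings: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: compose an extractNamedEntities script line."""
    parts = ["extractNamedEntities", f'--entities "{entities}"']
    # --culture is shown in brackets in the syntax, so it is omitted when unset.
    if culture is not None:
        parts.append(f'--culture "{culture}"')
    parts.append(f'--text "{text}"')
    # Each output is bound as <variable>=<output name>.
    for output, variable in (bindings or {}).items():
        if output not in OUTPUTS:
            raise ValueError(f"unknown output: {output}")
        parts.append(f"{variable}={output}")
    return " ".join(parts)


line = build_extract_named_entities(
    "Person",
    "${entities}",
    culture="en-US",
    bindings={
        "values": "entitiesList",
        "first": "firstEntity",
        "valuesmapping": "valueMappings",
        "success": "result",
    },
)
print(line)
```

With the arguments used in Example 1, the sketch produces the same `extractNamedEntities` line that appears in that example.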
Input parameters
The following table displays the list of input parameters available in this command. In the table, you can see the parameter name when working in IBM RPA Studio's Script mode and its Designer mode equivalent label.
Designer mode label | Script mode name | Required | Accepted variable types | Description |
---|---|---|---|---|
Type of entity | entities | Required | NaturalLanguageEntities | Type of entity to extract from the text. See the entities parameter options for supported types of entities. |
Language | culture | Optional | Text, Culture | Language used to extract the named entities. If you define the language manually, make sure to enter only supported language codes. See the culture parameter options for supported languages. |
Text | text | Required | Text | Text that contains the named entities. |
entities parameter options
The following table displays the options available for the entities input parameter. The table shows the options available when working in Script mode and the equivalent label in the Designer mode.
Designer mode label | Script mode name | Supported provider | Entity type number | Description |
---|---|---|---|---|
All | All | Legacy and Watson NLP | | Any named entity. |
Organization | Organization | Legacy and Watson NLP | 2 | The name of a company or formally organized group. |
Person | Person | Legacy and Watson NLP | 4 | Any being: living or nonliving, fictional or real. |
Numeric (Legacy provider) | Numeric | Legacy | 8 | Monetary or any other type of numeric value. |
Time | Time | Legacy and Watson NLP | 16 | Any reference to a specific time. |
Place | Place | Legacy | 32 | The name of a location. |
Art Production (Legacy provider) | ArtProd | Legacy | 64 | A named work of art. |
Event (Legacy provider) | Event | Legacy | 128 | A cultural, sports, or any other type of event. |
Thing (Legacy provider) | Thing | Legacy | 256 | The name of a product. |
Abstract (Legacy provider) | Abstract | Legacy | 512 | An abstract named entity. |
Duration (IBM Watson NLP provider) | Duration | Watson NLP | 1024 | Any amount of time, as opposed to a specific point in time. |
Facility (IBM Watson NLP provider) | Facility | Watson NLP | 2048 | Specific, named man-made structures, such as buildings. |
Geographic Feature (IBM Watson NLP provider) | GeographicFeature | Watson NLP | 4096 | A geographic feature, such as the Grand Canyon, Niagara Falls, or Mount Everest. |
Job Title (IBM Watson NLP provider) | JobTitle | Watson NLP | 8192 | The name of a profession or an occupation. |
Location | Location | Legacy and Watson NLP | 16384 | All geopolitical regions, continents, and countries, as well as street names, states, provinces, cities, towns, and islands. |
Measure (IBM Watson NLP provider) | Measure | Watson NLP | 32768 | Measured amounts, including vague or implied amounts, but not money. |
Ordinal (IBM Watson NLP provider) | Ordinal | Watson NLP | 65536 | Numbers that identify order, rank, or position. |
Hashtag (IBM Watson NLP provider) | Hashtag | Watson NLP | 131072 | Text preceded by a hashtag (#). |
IP Address (IBM Watson NLP provider) | IPAddress | Watson NLP | 262144 | IPv4 and IPv6 addresses. |
Percent (IBM Watson NLP provider) | Percent | Watson NLP | 524288 | Any amount with a percent symbol (%) or words that identify a percentage. |
Twitter Handle (IBM Watson NLP provider) | TwitterHandle | Watson NLP | 1048576 | A Twitter account's username in the text. |
URL (IBM Watson NLP provider) | URL | Watson NLP | 2097152 | A link in the text. |
Number (IBM Watson NLP provider) | Number | Watson NLP | 4194304 | Whole numbers, decimals, and fractions, not including the item being counted. |
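The entity type numbers in the table above are distinct powers of two, from 2 to 4194304. The following Python sketch records them in an `IntEnum` so that a single entity type number, such as the one returned in valuesmapping, can be mapped back to its Script mode name. The `EntityType` enum and the `entity_name` function are hypothetical. The table does not say whether IBM RPA combines these numbers as bit flags, so the sketch only decodes single values. `All` has no type number and is left out.

```python
from enum import IntEnum


class EntityType(IntEnum):
    """Hypothetical helper: entity type numbers copied from the entities table."""
    Organization = 2
    Person = 4
    Numeric = 8
    Time = 16
    Place = 32
    ArtProd = 64
    Event = 128
    Thing = 256
    Abstract = 512
    Duration = 1024
    Facility = 2048
    GeographicFeature = 4096
    JobTitle = 8192
    Location = 16384
    Measure = 32768
    Ordinal = 65536
    Hashtag = 131072
    IPAddress = 262144
    Percent = 524288
    TwitterHandle = 1048576
    URL = 2097152
    Number = 4194304


def entity_name(type_number: int) -> str:
    """Return the Script mode name for one entity type number (ValueError if unknown)."""
    return EntityType(type_number).name


# Each listed number is a power of two: exactly one bit is set.
assert all((t.value & (t.value - 1)) == 0 for t in EntityType)

print(entity_name(4))      # Person
print(entity_name(16384))  # Location
```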
culture parameter options
The following table displays the options available for the culture input parameter. The table shows the options available when working in Script mode and the equivalent label in the Designer mode.
Designer mode label | Script mode name | Description | Supported provider |
---|---|---|---|
ar | ar | Arabic | Watson NLP |
zh-CN | zh-CN | Chinese (Simplified) | Watson NLP |
zh-TW | zh-TW | Chinese (Traditional) | Watson NLP |
cs | cs | Czech | Watson NLP |
da | da | Danish | Watson NLP |
nl | nl | Dutch | Watson NLP |
de-DE | de-DE | German | Watson NLP |
en-US | en-US | English | Legacy and Watson NLP |
fi | fi | Finnish | Watson NLP |
fr-FR | fr-FR | French (France) | Watson NLP |
fr-CA | fr-CA | French (Canada) | Watson NLP |
he | he | Hebrew | Watson NLP |
it-IT | it-IT | Italian | Watson NLP |
ja-JP | ja-JP | Japanese | Watson NLP |
ko-KR | ko-KR | Korean | Watson NLP |
nb | nb | Norwegian Bokmål | Watson NLP |
nn | nn | Norwegian Nynorsk | Watson NLP |
pt-BR | pt-BR | Portuguese (Brazil) | Legacy and Watson NLP |
pt-PT | pt-PT | Portuguese (Portugal) | Legacy and Watson NLP |
pl | pl | Polish | Watson NLP |
ro | ro | Romanian | Watson NLP |
ru-RU | ru-RU | Russian | Watson NLP |
sk | sk | Slovak | Watson NLP |
es-ES | es-ES | Spanish | Watson NLP |
sv | sv | Swedish | Watson NLP |
tr | tr | Turkish | Watson NLP |
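If you set the language manually, a check against the table above can catch an unsupported code before the bot runs. The following Python sketch is one way to express that check. The `is_supported_culture` function is hypothetical, and the provider strings `"Legacy"` and `"Watson NLP"` are the labels from the Supported provider column, not values accepted by `setNlpProvider`. The sketch assumes exact, case-sensitive matching of language codes.

```python
# Language codes per provider, copied from the culture table.
CULTURES_BOTH = {"en-US", "pt-BR", "pt-PT"}
CULTURES_WATSON_ONLY = {
    "ar", "zh-CN", "zh-TW", "cs", "da", "nl", "de-DE", "fi", "fr-FR", "fr-CA",
    "he", "it-IT", "ja-JP", "ko-KR", "nb", "nn", "pl", "ro", "ru-RU", "sk",
    "es-ES", "sv", "tr",
}


def is_supported_culture(code: str, provider: str) -> bool:
    """Hypothetical check: is `code` listed for `provider` ("Legacy" or "Watson NLP")?"""
    if provider == "Legacy":
        return code in CULTURES_BOTH
    if provider == "Watson NLP":
        return code in CULTURES_BOTH or code in CULTURES_WATSON_ONLY
    raise ValueError(f"unknown provider label: {provider}")


print(is_supported_culture("pt-BR", "Legacy"))      # True
print(is_supported_culture("de-DE", "Legacy"))      # False
print(is_supported_culture("de-DE", "Watson NLP"))  # True
```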
Output parameters
Designer mode label | Script mode name | Accepted variable types | Description |
---|---|---|---|
Values | values | List<Text> | List of named entities extracted from the text defined in the Text parameter. |
First value | first | Text | First named entity extracted from the text defined in the Text parameter. |
Values mapping | valuesmapping | Data Table | Returns a data table with the extracted entities and their mapped information. See the valuesmapping parameter options for details. |
Success | success | Boolean | Returns True if the named entities were successfully extracted; otherwise, returns False. |
valuesmapping parameter options
This parameter returns a data table with details of the extracted values.
Each row contains the following mapped values:
- The index value in the list of extracted named entities. Index values start at 1.
- The zero-based character position where the value first appears in the text.
- Length of the named entity, in characters.
- Extracted part of the text that contains the named entity.
- Entity type number. See the Entity type number column in the entities parameter options for supported type numbers.
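The following Python sketch shows how the valuesmapping columns relate to the input text. It is a hypothetical reconstruction, not the command's implementation: it assigns each extracted value a 1-based index, finds the value's first appearance with `str.find` (which returns a zero-based offset), and records its length and entity type number. With the text and Person results from Example 1, the computed positions and lengths match the Mappings output shown in that example. How the command reports a value that appears more than once is not documented, so the sketch simply uses the first match.

```python
from typing import List, NamedTuple


class MappingRow(NamedTuple):
    """Hypothetical model of one valuesmapping row."""
    index: int        # 1-based index in the list of extracted entities
    position: int     # zero-based character offset of the first appearance
    length: int       # length of the entity, in characters
    value: str        # extracted part of the text
    type_number: int  # entity type number from the entities table


def build_mapping(text: str, values: List[str], type_number: int) -> List[MappingRow]:
    """Rebuild mapping rows from extracted values (illustration only, not IBM RPA code)."""
    rows = []
    for index, value in enumerate(values, start=1):
        position = text.find(value)  # first appearance, or -1 if missing
        if position < 0:
            raise ValueError(f"{value!r} does not appear in the text")
        rows.append(MappingRow(index, position, len(value), value, type_number))
    return rows


# Input text and Person results (type number 4) from Example 1.
text = (
    "My favorite authors are Sir Arthur Conan Doyle, and Agatha Christie. "
    "I also like some books from Edgar Allan Poe, such as Eureka. "
    "But my favorite book of all time is Don Quixote, from Miguel de Cervantes."
)
values = ["Arthur Conan Doyle", "Agatha Christie", "Edgar Allan Poe",
          "Don Quixote", "Miguel de Cervantes"]

for row in build_mapping(text, values, 4):
    print(row.index, row.position, row.length, row.value, row.type_number, sep=", ")
```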
Example
Example 1: The following example uses the Legacy NLP provider to extract the names of people, such as famous authors, from the text.
defVar --name entities --type String
defVar --name entitiesList --type List --innertype String
defVar --name result --type Boolean
defVar --name firstEntity --type String
defVar --name valueMappings --type DataTable
setVar --name "${entities}" --value "My favorite authors are Sir Arthur Conan Doyle, and Agatha Christie. I also like some books from Edgar Allan Poe, such as Eureka. But my favorite book of all time is Don Quixote, from Miguel de Cervantes."
setNlpProvider --provider "Legacy"
extractNamedEntities --entities "Person" --culture "en-US" --text "${entities}" entitiesList=values firstEntity=first valueMappings=valuesmapping result=success
logMessage --message "Named entities list: ${entitiesList}\r\nFirst entity found: ${firstEntity}\r\nMappings: ${valueMappings}\r\nResult: ${result}" --type "Info"
// Output:
// Named entities list: [Arthur Conan Doyle,Agatha Christie,Edgar Allan Poe,Don Quixote,Miguel de Cervantes]
// First entity found: Arthur Conan Doyle
// Mappings: 1, 28, 18, Arthur Conan Doyle, 4
// 2, 52, 15, Agatha Christie, 4
// 3, 97, 15, Edgar Allan Poe, 4
// 4, 166, 11, Don Quixote, 4
// 5, 184, 19, Miguel de Cervantes, 4
Example 2: The following example uses the IBM Watson NLP provider to extract locations and measurements from the text.
defVar --name entities --type String
defVar --name entitiesList --type List --innertype String
defVar --name result --type Boolean
defVar --name firstEntity --type String
defVar --name valueMappings --type DataTable
defVar --name measureValues --type List --innertype String
defVar --name measureFirstValue --type String
defVar --name measureMappings --type DataTable
setVar --name "${entities}" --value "Mike and Jenny went to New York last season. They said it was freezing cold there, but it couldn\'t be colder than Helsinki. Edward told me that Berlin was below 14 degrees Fahrenheit, and I couldn\'t believe him."
setNlpProvider --provider "Watson"
// Location is supported by both providers; Place is supported only by the Legacy provider.
extractNamedEntities --entities "Location" --culture "en-US" --text "${entities}" entitiesList=values firstEntity=first valueMappings=valuesmapping result=success
extractNamedEntities --entities "Measure" --culture "en-US" --text "${entities}" measureValues=values measureFirstValue=first measureMappings=valuesmapping
logMessage --message "Named entities list: ${entitiesList}\r\nFirst entity found: ${firstEntity}\r\nMappings: ${valueMappings}\r\nResult: ${result}" --type "Info"
logMessage --message "Measures list: ${measureValues}\r\nFirst measure found: ${measureFirstValue}\r\nMeasure mappings: ${measureMappings}" --type "Info"
Limitations
Some entities are not supported in some languages by the Legacy
provider. See the Legacy
provider limitations below:
The following named entities are supported in en-US
:
- Person
The following named entities are not supported in pt-BR
and pt-PT
:
- Event
- Abstract
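The Supported provider column of the entities table and the Legacy limitations can be combined into a pre-flight check. The following Python sketch is hypothetical: `preflight` is not an IBM RPA command, and the provider labels are the column values rather than `setNlpProvider` arguments. It flags an entity that is not listed for the chosen provider (for example, `Place` with Watson NLP) and the Legacy exclusions for `pt-BR` and `pt-PT`. It does not encode the `en-US` note, because that list names only `Person` and does not state what else is excluded.

```python
from typing import List, Optional

# Supported provider for each Script mode name, copied from the entities table.
ENTITIES_BOTH = {"All", "Organization", "Person", "Time", "Location"}
ENTITIES_LEGACY_ONLY = {"Numeric", "Place", "ArtProd", "Event", "Thing", "Abstract"}
ENTITIES_WATSON_ONLY = {
    "Duration", "Facility", "GeographicFeature", "JobTitle", "Measure", "Ordinal",
    "Hashtag", "IPAddress", "Percent", "TwitterHandle", "URL", "Number",
}
# Legacy provider exclusions listed under Limitations.
LEGACY_EXCLUDED = {"pt-BR": {"Event", "Abstract"}, "pt-PT": {"Event", "Abstract"}}


def preflight(entity: str, provider: str, culture: Optional[str] = None) -> List[str]:
    """Hypothetical check; returns documented conflicts (an empty list if none are found)."""
    if provider == "Legacy":
        allowed = ENTITIES_BOTH | ENTITIES_LEGACY_ONLY
    elif provider == "Watson NLP":
        allowed = ENTITIES_BOTH | ENTITIES_WATSON_ONLY
    else:
        raise ValueError(f"unknown provider label: {provider}")
    problems = []
    if entity not in allowed:
        problems.append(f"{entity} is not listed for the {provider} provider")
    if provider == "Legacy" and entity in LEGACY_EXCLUDED.get(culture, set()):
        problems.append(f"{entity} is not supported in {culture} by the Legacy provider")
    return problems


print(preflight("Place", "Watson NLP"))
print(preflight("Event", "Legacy", "pt-BR"))
print(preflight("Location", "Watson NLP", "en-US"))
```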