Extract Named Entities

Extracts named entities from plain text.

Command availability: IBM RPA SaaS and IBM RPA on premises

Description

You can set the entity type and the language used to identify the named entities.

Named entities are real-world objects identified by their proper names, for example a person, location, organization, product, or moment in time.

The results that this command returns might vary according to the selected NLP provider. For more information, see Setting an NLP provider.

Script syntax

IBM RPA's proprietary script language has a syntax similar to other programming languages. The script syntax defines the command's syntax in the script file. You can work with this syntax in IBM RPA Studio's Script mode.

extractNamedEntities --entities(NaturalLanguageEntities) [--culture(String)] --text(String) (List<String>)=values (String)=first (DataTable)=valuesmapping (Boolean)=success

Input parameters

The following table displays the list of input parameters available in this command. In the table, you can see the parameter name when working in IBM RPA Studio's Script mode and its Designer mode equivalent label.

Designer mode label | Script mode name | Required | Accepted variable types | Description
Type of entity | entities | Required | NaturalLanguageEntities | Type of entity that is extracted from the text. See the entities parameter options for supported types of entities.
Language | culture | Optional | Text, Culture | Language used to extract the named entities. If you define the language manually, make sure to enter only supported language codes. See the culture parameter options for supported languages.
Text | text | Required | Text | Text that contains the named entities.

entities parameter options

The following table displays the options available for the entities input parameter. The table shows the options available when working in Script mode and the equivalent label in the Designer mode.

Designer mode label | Script mode name | Supported provider | Entity type number | Description
All | All | Legacy and Watson NLP |  | Any named entity.
Organization | Organization | Legacy and Watson NLP | 2 | The name of a company or formally organized group.
Person | Person | Legacy and Watson NLP | 4 | Any being: living, nonliving, fictional, or real.
Numeric (Legacy provider) | Numeric | Legacy | 8 | Monetary or any other type of numeric value.
Time | Time | Legacy and Watson NLP | 16 | Any reference to a specific time.
Place | Place | Legacy | 32 | A location's name.
Art Production (Legacy provider) | ArtProd | Legacy | 64 | A named work of art.
Event (Legacy provider) | Event | Legacy | 128 | A cultural, sporting, or any other type of event.
Thing (Legacy provider) | Thing | Legacy | 256 | A product's name.
Abstract (Legacy provider) | Abstract | Legacy | 512 | An abstract named entity.
Duration (IBM Watson NLP provider) | Duration | Watson NLP | 1024 | Any amount of time, as opposed to a specific point in time.
Facility (IBM Watson NLP provider) | Facility | Watson NLP | 2048 | Specific, named man-made structures, such as buildings.
Geographic Feature (IBM Watson NLP provider) | GeographicFeature | Watson NLP | 4096 | A geographic feature, such as the Grand Canyon, Niagara Falls, or Mount Everest.
Job Title (IBM Watson NLP provider) | JobTitle | Watson NLP | 8192 | The name of a profession or an occupation.
Location | Location | Legacy and Watson NLP | 16384 | Geopolitical regions, continents, countries, states, provinces, cities, towns, islands, and street names.
Measure (IBM Watson NLP provider) | Measure | Watson NLP | 32768 | Measured amounts, including vague or implied amounts, but not money.
Ordinal (IBM Watson NLP provider) | Ordinal | Watson NLP | 65536 | Numbers that identify order, rank, or position.
Hashtag (IBM Watson NLP provider) | Hashtag | Watson NLP | 131072 | Text preceded by a hashtag symbol (#).
IP Address (IBM Watson NLP provider) | IPAddress | Watson NLP | 262144 | IPv4 and IPv6 addresses.
Percent (IBM Watson NLP provider) | Percent | Watson NLP | 524288 | Any amount marked with the percent symbol (%) or words that identify percentages.
Twitter Handle (IBM Watson NLP provider) | TwitterHandle | Watson NLP | 1048576 | A Twitter account's username in the text.
URL (IBM Watson NLP provider) | URL | Watson NLP | 2097152 | A link in the text.
Number (IBM Watson NLP provider) | Number | Watson NLP | 4194304 | Whole numbers, decimals, and fractions, not including the item being counted.
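
The entity type numbers in the table above are the values reported for each match in the valuesmapping output. As a minimal illustration outside IBM RPA itself, the following Python sketch translates a type number back to its Script mode name. The dictionary and the helper name entity_name are hypothetical; only the numbers and names are taken from the table, and the All option is omitted because the table lists no number for it.

```python
# Entity type numbers copied from the entities parameter options table.
# "All" is left out because the table lists no type number for it.
ENTITY_TYPE_NUMBERS = {
    2: "Organization", 4: "Person", 8: "Numeric", 16: "Time", 32: "Place",
    64: "ArtProd", 128: "Event", 256: "Thing", 512: "Abstract",
    1024: "Duration", 2048: "Facility", 4096: "GeographicFeature",
    8192: "JobTitle", 16384: "Location", 32768: "Measure", 65536: "Ordinal",
    131072: "Hashtag", 262144: "IPAddress", 524288: "Percent",
    1048576: "TwitterHandle", 2097152: "URL", 4194304: "Number",
}


def entity_name(type_number: int) -> str:
    """Return the Script mode name for a type number, or 'Unknown'."""
    return ENTITY_TYPE_NUMBERS.get(type_number, "Unknown")


print(entity_name(4))      # Person
print(entity_name(32768))  # Measure
```

A lookup like this can help when reading a valuesmapping table that mixes several entity types, for instance after extracting with the All option.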

culture parameter options

The following table displays the options available for the culture input parameter. The table shows the options available when working in Script mode and the equivalent label in the Designer mode.

Designer mode label | Script mode name | Description | Supported provider
ar | ar | Arabic | Watson NLP
zh-CN | zh-CN | Chinese (Simplified) | Watson NLP
zh-TW | zh-TW | Chinese (Traditional) | Watson NLP
cs | cs | Czech | Watson NLP
da | da | Danish | Watson NLP
nl | nl | Dutch | Watson NLP
de-DE | de-DE | German | Watson NLP
en-US | en-US | English | Legacy and Watson NLP
fi | fi | Finnish | Watson NLP
fr-FR | fr-FR | French (France) | Watson NLP
fr-CA | fr-CA | French (Canada) | Watson NLP
he | he | Hebrew | Watson NLP
it-IT | it-IT | Italian | Watson NLP
ja-JP | ja-JP | Japanese | Watson NLP
ko-KR | ko-KR | Korean | Watson NLP
nb | nb | Norwegian Bokmål | Watson NLP
nn | nn | Norwegian Nynorsk | Watson NLP
pt-BR | pt-BR | Portuguese (Brazil) | Legacy and Watson NLP
pt-PT | pt-PT | Portuguese (Portugal) | Legacy and Watson NLP
pl | pl | Polish | Watson NLP
ro | ro | Romanian | Watson NLP
ru-RU | ru-RU | Russian | Watson NLP
sk | sk | Slovak | Watson NLP
es-ES | es-ES | Spanish | Watson NLP
sv | sv | Swedish | Watson NLP
tr | tr | Turkish | Watson NLP
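
Because the supported language codes differ by provider, it can be useful to check a code before passing it to the culture parameter. The following Python sketch is one possible way to express the table above as a lookup. The set names and the is_supported helper are hypothetical, and the provider names "Legacy" and "Watson" are assumed from the values used in this page's examples.

```python
# Culture codes per provider, copied from the culture parameter options table.
LEGACY_CULTURES = {"en-US", "pt-BR", "pt-PT"}
WATSON_CULTURES = {
    "ar", "zh-CN", "zh-TW", "cs", "da", "nl", "de-DE", "en-US", "fi",
    "fr-FR", "fr-CA", "he", "it-IT", "ja-JP", "ko-KR", "nb", "nn",
    "pt-BR", "pt-PT", "pl", "ro", "ru-RU", "sk", "es-ES", "sv", "tr",
}


def is_supported(culture: str, provider: str) -> bool:
    """Check a culture code against the table for 'Legacy' or 'Watson'."""
    supported = {"Legacy": LEGACY_CULTURES, "Watson": WATSON_CULTURES}
    if provider not in supported:
        raise ValueError(f"unknown provider: {provider}")
    return culture in supported[provider]


print(is_supported("pt-BR", "Legacy"))  # True
print(is_supported("de-DE", "Legacy"))  # False
```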

Output parameters

Designer mode label | Script mode name | Accepted variable types | Description
Values | values | List<Text> | List of named entities extracted from the text defined in the Text parameter.
First value | first | Text | First entity name extracted from the text defined in the Text parameter.
Values mapping | valuesmapping | Data Table | Returns a data table with the extracted entities and their mapped information. See valuesmapping parameter options for details.
Success | success | Boolean | Returns True if the named entity was successfully extracted; otherwise, returns False.

valuesmapping parameter options

This parameter returns a data table with details of the extracted values.

The following list shows the mapped values:

  • The index value in the list of extracted named entities. Index values start at 1.
  • Position, in characters, where the value first appears in the text. In the output of Example 1, positions are counted from 0.
  • Length of the named entity.
  • Extracted part of the text that contains the named entity.
  • Entity type number. See the Entity type number column in the entities parameter options for supported type numbers.
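
To make the row layout concrete, the following Python sketch models a mapping row and checks it against the text. It is an illustration outside IBM RPA itself: the Mapping type and the slice_entity helper are hypothetical, and the text and rows are copied from Example 1 on this page. In that sample data, the position behaves as a 0-based character offset while the index starts at 1.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    index: int        # 1-based index in the list of extracted entities
    position: int     # character offset of the entity in the text
    length: int       # number of characters in the entity
    text: str         # extracted part of the text
    type_number: int  # entity type number (4 is Person)


# Input text and mapping rows copied from Example 1.
TEXT = ("My favorite authors are Sir Arthur Conan Doyle, and Agatha Christie. "
        "I also like some books from Edgar Allan Poe, such as Eureka. "
        "But my favorite book of all time is Don Quixote, from Miguel de Cervantes.")

ROWS = [
    Mapping(1, 28, 18, "Arthur Conan Doyle", 4),
    Mapping(2, 52, 15, "Agatha Christie", 4),
    Mapping(3, 97, 15, "Edgar Allan Poe", 4),
    Mapping(4, 166, 11, "Don Quixote", 4),
    Mapping(5, 184, 19, "Miguel de Cervantes", 4),
]


def slice_entity(text: str, row: Mapping) -> str:
    """Cut the entity out of the text using the row's position and length."""
    return text[row.position:row.position + row.length]


for row in ROWS:
    # Each slice reproduces the extracted text when the position is 0-based.
    assert slice_entity(TEXT, row) == row.text
    print(row.index, slice_entity(TEXT, row))
```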

Example

Example 1: The following example uses the Legacy NLP provider to extract Person entities, such as the names of famous authors, from the text.

defVar --name entities --type String
defVar --name entitiesList --type List --innertype String
defVar --name result --type Boolean
defVar --name firstEntity --type String
defVar --name valueMappings --type DataTable
setVar --name "${entities}" --value "My favorite authors are Sir Arthur Conan Doyle, and Agatha Christie. I also like some books from Edgar Allan Poe, such as Eureka. But my favorite book of all time is Don Quixote, from Miguel de Cervantes."
setNlpProvider --provider "Legacy"
extractNamedEntities --entities "Person" --culture "en-US" --text "${entities}" entitiesList=values firstEntity=first valueMappings=valuesmapping result=success
logMessage --message "Named entities list: ${entitiesList}\r\nFirst entity found: ${firstEntity}\r\nMappings: ${valueMappings}\r\nResult: ${result}" --type "Info"
// Output:
// Named entities list: [Arthur Conan Doyle,Agatha Christie,Edgar Allan Poe,Don Quixote,Miguel de Cervantes]
// First entity found: Arthur Conan Doyle
// Mappings: 1, 28, 18, Arthur Conan Doyle, 4
// 2, 52, 15, Agatha Christie, 4
// 3, 97, 15, Edgar Allan Poe, 4
// 4, 166, 11, Don Quixote, 4
// 5, 184, 19, Miguel de Cervantes, 4

Example 2: The following example uses the IBM Watson NLP provider to extract names of locations and units of measurement from the text.

defVar --name entities --type String
defVar --name entitiesList --type List --innertype String
defVar --name result --type Boolean
defVar --name firstEntity --type String
defVar --name valueMappings --type DataTable
defVar --name fahrenheitValues --type List --innertype String
defVar --name fahrenheitFirstValue --type String
defVar --name fahrenheitMappings --type DataTable
setVar --name "${entities}" --value "Mike and Jenny went to New York last season. They said it was freezing cold there, but it couldn\'t be colder than Helsinki. Edward told me that Berlin was below 14 degrees Fahrenheit, and I couldn\'t believe him."
setNlpProvider --provider "Watson"
// The Place type is listed for the Legacy provider only, so Location is used with Watson NLP.
extractNamedEntities --entities "Location" --culture "en-US" --text "${entities}" entitiesList=values firstEntity=first valueMappings=valuesmapping result=success
extractNamedEntities --entities "Measure" --culture "en-US" --text "${entities}" fahrenheitValues=values fahrenheitFirstValue=first fahrenheitMappings=valuesmapping
logMessage --message "Named entities list: ${entitiesList}\r\nFirst entity found: ${firstEntity}\r\nMappings: ${valueMappings}\r\nResult: ${result}" --type "Info"
// Log the results of the Measure extraction as well.
logMessage --message "Measures list: ${fahrenheitValues}\r\nFirst measure found: ${fahrenheitFirstValue}\r\nMappings: ${fahrenheitMappings}" --type "Info"

Limitations

The Legacy provider does not support every entity type in every language. The Legacy provider limitations are as follows:

The following named entities are supported in en-US:

  • Person

The following named entities are not supported in pt-BR and pt-PT:

  • Event
  • Abstract