Entity-mentions
At a glance
The entity-mentions task encapsulates algorithms for extracting mentions of entities (such as persons, organizations, and dates) from the input text. The task offers implementations of strong entity extraction algorithms from each of three families: rule-based, classic ML, and deep learning.
Class definitions |
---|
watson_nlp.blocks.entity_mentions.rbr.RBR |
watson_nlp.workflows.entity_mentions.sire.SIRE |
watson_nlp.workflows.entity_mentions.transformer.Transformer |
watson_nlp.workflows.entity_mentions.bert.BERT |
watson_nlp.workflows.entity_mentions.bilstm.BiLSTM |
For language support, see Supported languages.
Pretrained models
Several pretrained models are available for common entity types such as person, organization, and date. Model IDs and their container images are listed below.
Model ID | Container Image |
---|---|
BERT models | |
entity-mentions_bert-workflow_lang_multi_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bert-workflow_lang_multi_stock:1.2.1 |
BiLSTM models | |
entity-mentions_bilstm-workflow_lang_ar_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_ar_stock:1.2.1 |
entity-mentions_bilstm-workflow_lang_de_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_de_stock:1.2.1 |
entity-mentions_bilstm-workflow_lang_en_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_en_stock:1.2.1 |
entity-mentions_bilstm-workflow_lang_es_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_es_stock:1.2.1 |
entity-mentions_bilstm-workflow_lang_fr_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_fr_stock:1.2.1 |
entity-mentions_bilstm-workflow_lang_it_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_it_stock:1.2.1 |
entity-mentions_bilstm-workflow_lang_ja_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_ja_stock:1.2.1 |
entity-mentions_bilstm-workflow_lang_ko_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_ko_stock:1.2.1 |
entity-mentions_bilstm-workflow_lang_nl_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_nl_stock:1.2.1 |
entity-mentions_bilstm-workflow_lang_pt_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_pt_stock:1.2.1 |
entity-mentions_bilstm-workflow_lang_zh-cn_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_zh-cn_stock:1.2.1 |
Ensemble models | |
entity-mentions_ensemble-workflow_lang_multi_distilwatbert | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_ensemble-workflow_lang_multi_distilwatbert:1.2.1 |
entity-mentions_ensemble-workflow_lang_multi_distilwatbert-cpu | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_ensemble-workflow_lang_multi_distilwatbert-cpu:1.2.1 |
RBR models | |
entity-mentions_rbr_lang_ar_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_ar_stock:1.2.1 |
entity-mentions_rbr_lang_cs_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_cs_stock:1.2.1 |
entity-mentions_rbr_lang_da_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_da_stock:1.2.1 |
entity-mentions_rbr_lang_de_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_de_stock:1.2.1 |
entity-mentions_rbr_lang_en_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_en_stock:1.2.1 |
entity-mentions_rbr_lang_es_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_es_stock:1.2.1 |
entity-mentions_rbr_lang_fi_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_fi_stock:1.2.1 |
entity-mentions_rbr_lang_fr_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_fr_stock:1.2.1 |
entity-mentions_rbr_lang_he_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_he_stock:1.2.1 |
entity-mentions_rbr_lang_hi_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_hi_stock:1.2.1 |
entity-mentions_rbr_lang_it_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_it_stock:1.2.1 |
entity-mentions_rbr_lang_ja_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_ja_stock:1.2.1 |
entity-mentions_rbr_lang_ko_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_ko_stock:1.2.1 |
entity-mentions_rbr_lang_nb_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_nb_stock:1.2.1 |
entity-mentions_rbr_lang_nl_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_nl_stock:1.2.1 |
entity-mentions_rbr_lang_nn_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_nn_stock:1.2.1 |
entity-mentions_rbr_lang_pl_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_pl_stock:1.2.1 |
entity-mentions_rbr_lang_pt_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_pt_stock:1.2.1 |
entity-mentions_rbr_lang_ro_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_ro_stock:1.2.1 |
entity-mentions_rbr_lang_ru_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_ru_stock:1.2.1 |
entity-mentions_rbr_lang_sk_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_sk_stock:1.2.1 |
entity-mentions_rbr_lang_sv_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_sv_stock:1.2.1 |
entity-mentions_rbr_lang_tr_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_tr_stock:1.2.1 |
entity-mentions_rbr_lang_zh-cn_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_zh-cn_stock:1.2.1 |
entity-mentions_rbr_lang_zh-tw_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_zh-tw_stock:1.2.1 |
SIRE models | |
entity-mentions_sire-workflow_lang_en_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_sire-workflow_lang_en_stock:1.2.1 |
Transformer models | |
entity-mentions_transformer-workflow_lang_multi_distilwatbert | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_transformer-workflow_lang_multi_distilwatbert:1.2.1 |
entity-mentions_transformer-workflow_lang_multi_distilwatbert-cpu | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_transformer-workflow_lang_multi_distilwatbert-cpu:1.2.1 |
entity-mentions_transformer-workflow_lang_multi_stock | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_transformer-workflow_lang_multi_stock:1.2.1 |
Entity models (PII) | |
entity-mentions_rbr_lang_multi_pii | docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_multi_pii:1.2.1 |
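Every row in the table above appears to follow one naming pattern: the image name is the model ID prefixed with `watson-nlp_` and tagged `1.2.1`. As a minimal sketch, assuming that pattern holds for all listed models, the image reference can be derived from the model ID. The helper name `image_for_model` and the constants are hypothetical and not part of any Watson NLP API.

```python
# Registry path and tag copied from the table rows above.
REGISTRY = "docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual"
TAG = "1.2.1"


def image_for_model(model_id: str, tag: str = TAG) -> str:
    """Build the container image reference for a model ID (hypothetical helper).

    Assumes the pattern seen in the table: <registry>/watson-nlp_<model id>:<tag>.
    """
    return f"{REGISTRY}/watson-nlp_{model_id}:{tag}"


print(image_for_model("entity-mentions_rbr_lang_en_stock"))
```

Always confirm the resulting reference against the table before pulling, since the pattern is inferred from the rows and not documented as a rule.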
For details of the Entity-mentions type system, see Understanding model type systems.
The generic entity models
The models for entity type systems have been trained and tested on labeled data from news reports. These models have two parts:
- A rule-based model (the rbr models), which handles syntactically regular entity types such as number, email, and phone.
- A model trained on labeled data for the more complex entity types such as person, organization, or location.
The rbr and bilstm models are monolingual: each model analyzes input text in a single language.
The bert model is multilingual: a single model can analyze input text in multiple languages.
The bilstm models use GloVe embeddings trained on the Wikipedia corpus in each language.
The bert model uses the Google Multilingual BERT model (Base, Cased, 104 languages).
The transformer model is optimized for GPU, but supports CPU usage. The transformer model uses an IBM-trained multilingual Foundation Model.
All models output non-overlapping entity mention spans. That is, each character in the input text belongs to at most one entity mention.
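The non-overlap guarantee can be checked on the client side. The following is a minimal sketch, assuming spans are half-open `[begin, end)` character ranges, which is what the sample response later in this page suggests (begin 12, end 24 for a 12-character string). The function name `spans_overlap` is hypothetical.

```python
from typing import List, Tuple


def spans_overlap(spans: List[Tuple[int, int]]) -> bool:
    """Return True if any two half-open [begin, end) spans share a character."""
    ordered = sorted(spans)  # sort by begin offset, then end offset
    # After sorting, any overlap must show up between two neighbouring spans.
    return any(
        prev_end > begin
        for (_, prev_end), (begin, _) in zip(ordered, ordered[1:])
    )


print(spans_overlap([(12, 24), (0, 5)]))
print(spans_overlap([(0, 10), (5, 12)]))
```

Spans that merely touch, such as `(0, 5)` and `(5, 8)`, are not treated as overlapping under the half-open assumption.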
The PII entity models
The PII models recognize personally identifiable information (PII) such as person names, Social Security numbers (SSNs), bank account numbers, and credit card numbers.
Due to the nature of PII, it is difficult to train machine learning models for the majority of PII, especially credit card numbers, passport numbers and other identifiers. Therefore, the PII model has two parts:
- A rule-based model handles the majority of the types by identifying common formats of PII entities and performing checksum or other validation as appropriate for each entity type. For example, credit card number candidates are validated using the Luhn algorithm.
- A model trained on labeled data for types where labeled data can be obtained, such as person and location. For this, use one of the models available for the entity v2 type system.
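To illustrate the kind of checksum validation mentioned above, here is a minimal sketch of the standard Luhn algorithm. It is not the rule-based model's actual code, and the function name `luhn_valid` is hypothetical. It assumes candidates may contain spaces or hyphens as separators.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    cleaned = number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    # Walk the digits right to left, doubling every second digit.
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # a widely used test card number
print(luhn_valid("4111 1111 1111 1112"))
```

A passing checksum only shows that a candidate is well formed; it does not prove that the number belongs to a real account.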
Running models
The Entity-mentions model request accepts the following fields:
Field | Type | Required / Optional / Repeated | Description |
---|---|---|---|
raw_document | watson_core_data_model.nlp.RawDocument | required | The input document on which to perform entity analysis |
language_code | str | optional | Language code corresponding to the text of the raw_document |
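As a rough sketch of how these two fields map onto a JSON request body for the REST endpoint, the snippet below builds the payload and omits `language_code` when it is not supplied. The function name `build_request_body` is hypothetical. The field names come from the table above.

```python
import json
from typing import Optional


def build_request_body(text: str, language_code: Optional[str] = None) -> str:
    """Serialize an Entity-mentions request body (hypothetical helper)."""
    body = {"raw_document": {"text": text}}  # required field
    if language_code is not None:
        body["language_code"] = language_code  # optional field
    return json.dumps(body)


print(build_request_body("My email is john@ibm.com.", "en"))
```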
Among other returned fields, Entity-mentions returns codes for EntityMentionClass and EntityMentionType, as noted below:
EntityMentionClass
Name | Number | Description |
---|---|---|
MENTC_UNSET | 0 | Not set by the mention tagger |
MENTC_SPC | 1 | The mention refers to a specific thing |
MENTC_NEG | 2 | The mention is negated |
MENTC_GEN | 3 | The mention is not SPC or NEG (note that this is different than UNSET) |
EntityMentionType
Name | Number | Description |
---|---|---|
MENTT_UNSET | 0 | Not set by the mention tagger |
MENTT_NAM | 1 | Named, loosely, proper noun |
MENTT_NOM | 2 | Nominal, descriptive noun |
MENTT_PRO | 3 | Pronoun, possessive determiner, or reference cardinal |
MENTT_NONE | 4 | None, a mention that is not NAM, NOM, or PRO (note that this is different than UNSET) |
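When a client receives these codes as numbers, it can help to mirror the two tables as enumerations. The sketch below uses Python's standard `enum.IntEnum`. The class definitions are local illustrations built from the tables above, not the generated protobuf types shipped with the client library.

```python
from enum import IntEnum


class EntityMentionClass(IntEnum):
    MENTC_UNSET = 0  # not set by the mention tagger
    MENTC_SPC = 1    # refers to a specific thing
    MENTC_NEG = 2    # negated mention
    MENTC_GEN = 3    # neither SPC nor NEG


class EntityMentionType(IntEnum):
    MENTT_UNSET = 0  # not set by the mention tagger
    MENTT_NAM = 1    # named, loosely a proper noun
    MENTT_NOM = 2    # nominal, descriptive noun
    MENTT_PRO = 3    # pronoun, possessive determiner, or reference cardinal
    MENTT_NONE = 4   # not NAM, NOM, or PRO


# Convert between the numeric code and the symbolic name.
print(EntityMentionType(1).name)
print(int(EntityMentionClass["MENTC_NEG"]))
```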
Example requests
REST API
curl -s \
"http://localhost:8080/v1/watson.runtime.nlp.v1/NlpService/EntityMentionsPredict" \
-H "accept: application/json" \
-H "content-type: application/json" \
-H "Grpc-Metadata-mm-model-id: entity-mentions_rbr_lang_multi_pii" \
-d '{ "raw_document": { "text": "My email is john@ibm.com." }, "language_code": "en" }'
Response
{"mentions":[
{"span":{
"begin":12,
"end":24,
"text":"john@ibm.com"
},
"type":"EmailAddress",
"producerId":{
"name":"RBR mentions",
"version":"0.0.1"
},
"confidence":0.8,
"mentionType":"MENTT_UNSET",
"mentionClass":"MENTC_UNSET",
"role":""
}
],
"producerId":{
"name":"RBR mentions",
"version":"0.0.1"
}
}
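A client will typically want to pull the type, covered text, and confidence out of a response like the one above. The following is a minimal sketch using only the standard `json` module. The function name `extract_mentions` is hypothetical, and it assumes the `begin` and `end` offsets index the original text as an end-exclusive slice, which holds for this ASCII example.

```python
import json

text = "My email is john@ibm.com."
response_json = """
{"mentions": [{"span": {"begin": 12, "end": 24, "text": "john@ibm.com"},
               "type": "EmailAddress",
               "producerId": {"name": "RBR mentions", "version": "0.0.1"},
               "confidence": 0.8,
               "mentionType": "MENTT_UNSET",
               "mentionClass": "MENTC_UNSET",
               "role": ""}],
 "producerId": {"name": "RBR mentions", "version": "0.0.1"}}
"""


def extract_mentions(response: dict, source_text: str) -> list:
    """Return (type, covered text, confidence) for each mention."""
    results = []
    for mention in response.get("mentions", []):
        span = mention["span"]
        # Recover the covered text from the offsets, not from span["text"].
        covered = source_text[span["begin"]:span["end"]]
        results.append((mention["type"], covered, mention["confidence"]))
    return results


mentions = extract_mentions(json.loads(response_json), text)
print(mentions)
```

Slicing the source text by offset is a cheap way to confirm that the offsets and the `text` field of each span agree.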
Python
import grpc
from watson_nlp_runtime_client import (
    common_service_pb2,
    common_service_pb2_grpc,
    syntax_types_pb2,
)

channel = grpc.insecure_channel("localhost:8085")
stub = common_service_pb2_grpc.NlpServiceStub(channel)

request = common_service_pb2.EntityMentionsRequest(
    raw_document=syntax_types_pb2.RawDocument(text="My email is john@ibm.com"),
    language_code="en",
)
response = stub.EntityMentionsPredict(
    request, metadata=[("mm-model-id", "entity-mentions_rbr_lang_multi_pii")]
)
print(response)
Response
mentions {
  span {
    begin: 12
    end: 24
    text: "john@ibm.com"
  }
  type: "EmailAddress"
  producer_id {
    name: "RBR mentions"
    version: "0.0.1"
  }
  confidence: 0.8
}
producer_id {
  name: "RBR mentions"
  version: "0.0.1"
}