Entity-mentions

At a glance

The entity-mentions task encapsulates algorithms for extracting mentions of entities (such as persons, organizations, and dates) from input text. It offers implementations of strong entity extraction algorithms from three families: rule-based, classic machine learning, and deep learning.

Class definitions
watson_nlp.blocks.entity_mentions.rbr.RBR
watson_nlp.workflows.entity_mentions.sire.SIRE
watson_nlp.workflows.entity_mentions.transformer.Transformer
watson_nlp.workflows.entity_mentions.bert.BERT
watson_nlp.workflows.entity_mentions.bilstm.BiLSTM

For language support, see Supported languages.

Pretrained models

Several pretrained models are available for common entity types such as person, organization, and date. Model IDs and their container images are listed below.

Model ID Container Image
BERT models
entity-mentions_bert-workflow_lang_multi_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bert-workflow_lang_multi_stock:1.2.1
BiLSTM models
entity-mentions_bilstm-workflow_lang_ar_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_ar_stock:1.2.1
entity-mentions_bilstm-workflow_lang_de_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_de_stock:1.2.1
entity-mentions_bilstm-workflow_lang_en_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_en_stock:1.2.1
entity-mentions_bilstm-workflow_lang_es_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_es_stock:1.2.1
entity-mentions_bilstm-workflow_lang_fr_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_fr_stock:1.2.1
entity-mentions_bilstm-workflow_lang_it_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_it_stock:1.2.1
entity-mentions_bilstm-workflow_lang_ja_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_ja_stock:1.2.1
entity-mentions_bilstm-workflow_lang_ko_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_ko_stock:1.2.1
entity-mentions_bilstm-workflow_lang_nl_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_nl_stock:1.2.1
entity-mentions_bilstm-workflow_lang_pt_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_pt_stock:1.2.1
entity-mentions_bilstm-workflow_lang_zh-cn_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_bilstm-workflow_lang_zh-cn_stock:1.2.1
Ensemble models
entity-mentions_ensemble-workflow_lang_multi_distilwatbert docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_ensemble-workflow_lang_multi_distilwatbert:1.2.1
entity-mentions_ensemble-workflow_lang_multi_distilwatbert-cpu docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_ensemble-workflow_lang_multi_distilwatbert-cpu:1.2.1
RBR models
entity-mentions_rbr_lang_ar_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_ar_stock:1.2.1
entity-mentions_rbr_lang_cs_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_cs_stock:1.2.1
entity-mentions_rbr_lang_da_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_da_stock:1.2.1
entity-mentions_rbr_lang_de_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_de_stock:1.2.1
entity-mentions_rbr_lang_en_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_en_stock:1.2.1
entity-mentions_rbr_lang_es_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_es_stock:1.2.1
entity-mentions_rbr_lang_fi_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_fi_stock:1.2.1
entity-mentions_rbr_lang_fr_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_fr_stock:1.2.1
entity-mentions_rbr_lang_he_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_he_stock:1.2.1
entity-mentions_rbr_lang_hi_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_hi_stock:1.2.1
entity-mentions_rbr_lang_it_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_it_stock:1.2.1
entity-mentions_rbr_lang_ja_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_ja_stock:1.2.1
entity-mentions_rbr_lang_ko_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_ko_stock:1.2.1
entity-mentions_rbr_lang_nb_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_nb_stock:1.2.1
entity-mentions_rbr_lang_nl_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_nl_stock:1.2.1
entity-mentions_rbr_lang_nn_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_nn_stock:1.2.1
entity-mentions_rbr_lang_pl_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_pl_stock:1.2.1
entity-mentions_rbr_lang_pt_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_pt_stock:1.2.1
entity-mentions_rbr_lang_ro_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_ro_stock:1.2.1
entity-mentions_rbr_lang_ru_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_ru_stock:1.2.1
entity-mentions_rbr_lang_sk_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_sk_stock:1.2.1
entity-mentions_rbr_lang_sv_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_sv_stock:1.2.1
entity-mentions_rbr_lang_tr_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_tr_stock:1.2.1
entity-mentions_rbr_lang_zh-cn_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_zh-cn_stock:1.2.1
entity-mentions_rbr_lang_zh-tw_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_zh-tw_stock:1.2.1
SIRE models
entity-mentions_sire-workflow_lang_en_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_sire-workflow_lang_en_stock:1.2.1
Transformer models
entity-mentions_transformer-workflow_lang_multi_distilwatbert docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_transformer-workflow_lang_multi_distilwatbert:1.2.1
entity-mentions_transformer-workflow_lang_multi_distilwatbert-cpu docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_transformer-workflow_lang_multi_distilwatbert-cpu:1.2.1
entity-mentions_transformer-workflow_lang_multi_stock docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_transformer-workflow_lang_multi_stock:1.2.1
Entity models (PII)
entity-mentions_rbr_lang_multi_pii docker-na-public.artifactory.swg-devops.com/wcp-ai-foundation-team-docker-virtual/watson-nlp_entity-mentions_rbr_lang_multi_pii:1.2.1

For details of the Entity-mention type system, see Understanding model type systems.

The generic entity models

The generic entity models have been trained and tested on labeled data from news reports. These models have two parts:

  • A rule-based model (the rbr models), which handles syntactically regular entity types such as number, email and phone.

  • A model trained on labeled data for the more complex entity types such as person, organization, or location.

The rbr and bilstm models are monolingual: each model knows how to analyze input text in a single language.

The bert model is multilingual: the single model can analyze input texts from multiple languages.

The bilstm models use GloVe embeddings trained on the Wikipedia corpus in each language.

The bert model uses the Google Multilingual BERT model (Large, Cased, 104 languages).

The transformer model is optimized for GPU, but supports CPU usage. The transformer model uses an IBM-trained multilingual Foundation Model.
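The split between monolingual and multilingual models can be sketched as a small lookup. This is only an illustration: `pick_model_id` is a hypothetical helper, the BiLSTM language set is copied from the pretrained-model table above, and falling back to the multilingual BERT model assumes that model covers the requested language (check Supported languages before relying on it).

```python
# Languages that have a monolingual BiLSTM model in the table above.
BILSTM_LANGUAGES = {"ar", "de", "en", "es", "fr", "it", "ja", "ko", "nl", "pt", "zh-cn"}

# Multilingual model ID, also copied from the table above.
MULTILINGUAL_BERT_ID = "entity-mentions_bert-workflow_lang_multi_stock"


def pick_model_id(language_code: str) -> str:
    """Hypothetical helper: prefer a monolingual BiLSTM model, else the multilingual BERT model."""
    code = language_code.strip().lower()
    if code in BILSTM_LANGUAGES:
        return f"entity-mentions_bilstm-workflow_lang_{code}_stock"
    # Assumption: the multilingual model supports this language.
    return MULTILINGUAL_BERT_ID


print(pick_model_id("fr"))
print(pick_model_id("sv"))
```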

All models output non-overlapping entity mention spans: each character in the input text belongs to at most one entity mention.
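The non-overlap property can be checked on the client side with a few lines of code. The sketch below assumes spans are given as `(begin, end)` pairs with an exclusive `end`, which is consistent with the sample response later on this page; `has_overlap` is a hypothetical helper, not part of the library.

```python
from typing import Iterable, Tuple


def has_overlap(spans: Iterable[Tuple[int, int]]) -> bool:
    """Return True if any two half-open [begin, end) spans share a character."""
    ordered = sorted(spans)
    # After sorting by begin offset, any overlap shows up between neighbours.
    for (_, prev_end), (next_begin, _) in zip(ordered, ordered[1:]):
        if next_begin < prev_end:
            return True
    return False


print(has_overlap([(12, 24), (0, 2)]))
print(has_overlap([(0, 5), (4, 8)]))
```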

The PII entity models

The PII models recognize personally identifiable information (PII) such as person names, Social Security numbers (SSN), bank account numbers, and credit card numbers.

Because of the nature of PII, it is difficult to train machine learning models for most PII types, especially credit card numbers, passport numbers, and other identifiers. Therefore, the PII model has two parts:

  • A rule-based model, which handles the majority of the types by identifying common formats of PII entities and applying checksums or other validations where appropriate for each entity type. For example, credit card number candidates are validated using the Luhn algorithm.

  • A model trained on labeled data for types where labeled data can be obtained, such as person and location. For this, use one of the models available for the entity v2 type system.
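To make the checksum step concrete, here is a minimal sketch of the Luhn algorithm mentioned above. It is a generic implementation and not the code used inside the PII model; `luhn_valid` is a hypothetical name, and accepting spaces and hyphens as separators is an assumption made for this example.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    # Assumption: spaces and hyphens are the only separators allowed.
    cleaned = candidate.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False

    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # Same as summing the two digits of the product.
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))
print(luhn_valid("4111 1111 1111 1112"))
```

A candidate that matches the credit card format but fails this check can be dropped, which is how a checksum reduces false positives for a purely format-based rule.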

Running models

The Entity-mentions model request accepts the following fields:

Field Type Required/Optional/Repeated Description
raw_document watson_core_data_model.nlp.RawDocument required The input document on which to perform entity analysis
language_code str optional Language code corresponding to the text of the raw_document

Among other returned fields, Entity-mentions returns codes for EntityMentionClass and EntityMentionType, as noted below:

EntityMentionClass

Name Number Description
MENTC_UNSET 0 Not set by the mention tagger
MENTC_SPC 1 The mention refers to a specific thing
MENTC_NEG 2 The mention is negated
MENTC_GEN 3 The mention is not SPC or NEG (note that this is different than UNSET)

EntityMentionType

Name Number Description
MENTT_UNSET 0 Not set by the mention tagger
MENTT_NAM 1 Named, loosely, proper noun
MENTT_NOM 2 Nominal, descriptive noun
MENTT_PRO 3 Pronoun, possessive determiner, or reference cardinal
MENTT_NONE 4 None, a mention that is not NAM, NOM, or PRO (note that this is different than UNSET)
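When a client receives these codes as numbers, a local mapping makes them readable. The enums below simply mirror the two tables above; they are an illustrative client-side copy, not the classes shipped with the library, and `describe` is a hypothetical helper.

```python
from enum import IntEnum


class EntityMentionClass(IntEnum):
    """Local mirror of the EntityMentionClass table."""
    MENTC_UNSET = 0
    MENTC_SPC = 1
    MENTC_NEG = 2
    MENTC_GEN = 3


class EntityMentionType(IntEnum):
    """Local mirror of the EntityMentionType table."""
    MENTT_UNSET = 0
    MENTT_NAM = 1
    MENTT_NOM = 2
    MENTT_PRO = 3
    MENTT_NONE = 4


def describe(mention_class: int, mention_type: int) -> str:
    """Turn numeric codes into their symbolic names."""
    return f"{EntityMentionClass(mention_class).name}/{EntityMentionType(mention_type).name}"


print(describe(1, 1))
```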

Example requests

REST API

curl -s \
  "http://localhost:8080/v1/watson.runtime.nlp.v1/NlpService/EntityMentionsPredict" \
  -H "accept: application/json" \
  -H "content-type: application/json" \
  -H "Grpc-Metadata-mm-model-id: entity-mentions_rbr_lang_multi_pii" \
  -d '{ "raw_document": { "text": "My email is john@ibm.com." }, "language_code": "en" }'

Response

{
  "mentions": [
    {
      "span": {
        "begin": 12,
        "end": 24,
        "text": "john@ibm.com"
      },
      "type": "EmailAddress",
      "producerId": {
        "name": "RBR mentions",
        "version": "0.0.1"
      },
      "confidence": 0.8,
      "mentionType": "MENTT_UNSET",
      "mentionClass": "MENTC_UNSET",
      "role": ""
    }
  ],
  "producerId": {
    "name": "RBR mentions",
    "version": "0.0.1"
  }
}
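A client will typically read the mentions out of this JSON body. The sketch below uses only the Python standard library and a trimmed copy of the response above; `extract_mentions` is a hypothetical helper, and treating `begin`/`end` as character offsets with an exclusive `end` is an assumption that matches this sample.

```python
import json

REQUEST_TEXT = "My email is john@ibm.com."

# Trimmed copy of the sample response shown above.
RESPONSE_BODY = """
{"mentions": [{"span": {"begin": 12, "end": 24, "text": "john@ibm.com"},
               "type": "EmailAddress", "confidence": 0.8,
               "mentionType": "MENTT_UNSET", "mentionClass": "MENTC_UNSET",
               "role": ""}]}
"""


def extract_mentions(response_body: str, source_text: str) -> list:
    """Flatten each mention and check its offsets against the source text."""
    results = []
    for mention in json.loads(response_body).get("mentions", []):
        span = mention["span"]
        begin, end = span["begin"], span["end"]
        results.append({
            "type": mention["type"],
            "text": span["text"],
            "begin": begin,
            "end": end,
            "confidence": mention.get("confidence"),
            # True when slicing the input reproduces the reported text.
            "offsets_match": source_text[begin:end] == span["text"],
        })
    return results


for item in extract_mentions(RESPONSE_BODY, REQUEST_TEXT):
    print(item)
```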

Python

import grpc

from watson_nlp_runtime_client import (
    common_service_pb2,
    common_service_pb2_grpc,
    syntax_types_pb2,
)

# Open a gRPC channel without TLS to the runtime.
channel = grpc.insecure_channel("localhost:8085")

stub = common_service_pb2_grpc.NlpServiceStub(channel)

request = common_service_pb2.EntityMentionsRequest(
    raw_document=syntax_types_pb2.RawDocument(text="My email is john@ibm.com"),
    language_code="en",
)

# The target model is selected through the "mm-model-id" metadata entry.
response = stub.EntityMentionsPredict(
    request, metadata=[("mm-model-id", "entity-mentions_rbr_lang_multi_pii")]
)

print(response)

Response

mentions {
  span {
    begin: 12
    end: 24
    text: "john@ibm.com"
  }
  type: "EmailAddress"
  producer_id {
    name: "RBR mentions"
    version: "0.0.1"
  }
  confidence: 0.8
}
producer_id {
  name: "RBR mentions"
  version: "0.0.1"
}