Named Entity Recognition annotator

The Named Entity Recognition annotator uses predefined rules to extract persons, locations, and organizations from the input text and maps these entities to facets.

By default, this annotator is not enabled. You can enable and configure the annotator when you configure the parser in the administration console. For each supported language, you can add entries to specify additional words to recognize as entities when documents are parsed. For example, you might specify energy as a word to extract as an organization entity. You can also specify words to block so that they are not recognized as entities.

For content analytics collections, the annotator maps information to the following predefined facets:

Table 1. Predefined facets for named entity extraction
Facet path	Facet name
$._ne.person	Person
$._ne.location	Location
$._ne.organization	Organization
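
The following Python sketch illustrates how additional words, blocked words, and the predefined facet paths in Table 1 could fit together. It is not the product's implementation or API. The function name, the rule_hits dictionary (a stand-in for the predefined rules), exact-match comparison of words, and the choice to let blocked words take precedence over additional words are all assumptions made for illustration.

```python
# Illustrative sketch only; not the annotator's actual logic or API.
# Facet paths are taken from Table 1.
FACET_PATHS = {
    "person": "$._ne.person",
    "location": "$._ne.location",
    "organization": "$._ne.organization",
}


def map_entities_to_facets(tokens, rule_hits, additional, blocked):
    """Map recognized words to facet paths.

    tokens     -- words of the input text, in order
    rule_hits  -- hypothetical stand-in for the predefined rules: word -> entity type
    additional -- user-added words: word -> entity type
    blocked    -- words that must not be recognized as entities
    """
    facets = {path: [] for path in FACET_PATHS.values()}
    for word in tokens:
        # Assumption: a blocked word is never extracted, even if a rule matches it.
        if word in blocked:
            continue
        entity_type = additional.get(word) or rule_hits.get(word)
        if entity_type is None:
            continue
        values = facets[FACET_PATHS[entity_type]]
        if word not in values:
            values.append(word)
    return facets


facets = map_entities_to_facets(
    tokens=["Alice", "visited", "Apple", "in", "Paris", "to", "discuss", "energy"],
    rule_hits={"Alice": "person", "Apple": "organization", "Paris": "location"},
    additional={"energy": "organization"},
    blocked={"Apple"},
)
```

In this sketch, energy is extracted as an organization because it was added as an additional word, and Apple is skipped because it is blocked.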

For enterprise search collections, you can specify to which facets the extracted information is mapped when you configure the annotator in the administration console. By default, the system maps the information to new facets that are named Person, Location, and Organization. You can specify different names for these new facets, or map the entities to existing flat facets. If you map the entities to existing facets, the entity values are added to any existing values for the facets.

After you make changes to the configuration of the Named Entity Recognition annotator, you must apply the changes to documents in the index. For enterprise search collections, you must start or restart the parser before you apply the changes to documents in the index. For content analytics collections, you must deploy the analytic resources before you apply the changes to documents in the index.

To apply the changes to documents in the index, run a full rebuild of the index. Alternatively, recrawl or reimport all documents so that they can be indexed again. If the document cache is not enabled for this collection, you must recrawl or reimport all documents.
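
The procedure above depends on the collection type and on whether the document cache is enabled. The following Python sketch summarizes that decision as a checklist. The function name and the collection type strings are hypothetical and do not correspond to any product command or API; the steps themselves are performed in the administration console.

```python
# Illustrative checklist only; names and strings here are hypothetical.
def steps_to_apply_changes(collection_type, document_cache_enabled):
    """Return the ordered steps for applying annotator configuration changes."""
    if collection_type == "enterprise_search":
        steps = ["Start or restart the parser"]
    elif collection_type == "content_analytics":
        steps = ["Deploy the analytic resources"]
    else:
        raise ValueError(f"Unknown collection type: {collection_type}")

    if document_cache_enabled:
        # With the document cache, a full index rebuild is an option.
        steps.append(
            "Run a full rebuild of the index, or recrawl or reimport all documents"
        )
    else:
        # Without the document cache, documents must be crawled or imported again.
        steps.append("Recrawl or reimport all documents")
    return steps


for step in steps_to_apply_changes("content_analytics", document_cache_enabled=False):
    print(step)
```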

The predefined named entity extraction rules support the following languages:

English
Chinese
French
German
Japanese
Spanish