Creating custom dictionaries
You can create custom dictionaries that contain terms in a specific domain of knowledge. When you include a custom dictionary in your UIMA pipeline, the pipeline identifies and annotates instances of these terms that are found in your documents.
About this task
To configure a custom dictionary, you must create a dictionary database and include the compiled dictionary file in the lexical analysis stage of your UIMA pipeline. You can then add entries to the database manually or import them.
In most European languages, the case that you specify for a term in the dictionary affects which matching terms are identified in your documents:
- An entry in lowercase matches lowercase, title case, and uppercase instances of the word.
- An entry in title case matches title case and uppercase instances of the word.
- An entry in uppercase matches only uppercase instances of the word.
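The three case rules can be restated as a small predicate. This Python sketch models only the rules listed above; it is not how Content Analytics Studio implements matching internally. The function name `entry_matches` is hypothetical, and treating mixed-case entries (such as "iPhone") as exact matches is an assumption, because the rules above do not cover them.

```python
def entry_matches(entry: str, token: str) -> bool:
    """Return True if a dictionary entry matches a document token under the
    case rules for most European languages (illustrative sketch only)."""
    # The letters themselves must be the same, ignoring case.
    if entry.lower() != token.lower():
        return False
    if entry.islower():
        # Lowercase entry: matches lowercase, title case, and uppercase tokens.
        return token.islower() or token.istitle() or token.isupper()
    if entry.istitle() and not entry.isupper():
        # Title-case entry: matches title case and uppercase tokens.
        return token.istitle() or token.isupper()
    if entry.isupper():
        # Uppercase entry: matches only uppercase tokens.
        return token.isupper()
    # Mixed-case entry (assumption, not covered by the rules): exact match only.
    return entry == token


print(entry_matches("dublin", "DUBLIN"))  # True
print(entry_matches("Dublin", "dublin"))  # False
```

Reading the rules this way makes the practical consequence clear: entering a term in lowercase gives the broadest matching, while uppercase is the most restrictive.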
If you have a database or spreadsheet that contains terms to add to your dictionary, you can save the data as a CSV file and import the data directly into the Content Analytics Studio dictionary database. Each row in the CSV file is treated as a separate entry in the database.
The column that contains the term can have a list of surface forms that are delimited by a separator character. The first surface form in the list is assumed to be the normal form. The separator can be any character that does not occur in the data. For example, the following data might be found in a CSV file that contains information about cities. Here the vertical bar (|) separates the surface forms, so "New York" is the normal form and "Big Apple" is an additional surface form:
City,POS,Country,Population
Dublin,Noun,Ireland,500000
New York|Big Apple,Noun,USA,8200000
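The following Python sketch shows one way to interpret such a file under the rules above: each row becomes one entry, the term column is split on the separator, and the first surface form becomes the normal form. It is not the Content Analytics Studio import mechanism. The function names `parse_dictionary_csv` and `pick_separator`, the `City` term column, and the `|` separator are assumptions taken from or made up for this example. `pick_separator` illustrates the requirement that the separator must not occur in the data.

```python
import csv
import io


def pick_separator(terms, candidates="|;^~#"):
    """Return the first candidate character that occurs in none of the terms."""
    for ch in candidates:
        if not any(ch in t for t in terms):
            return ch
    raise ValueError("no unused separator character available")


def parse_dictionary_csv(text, term_column="City", separator="|"):
    """Turn CSV rows into dictionary entries, one entry per row.

    The term column may hold several surface forms joined by `separator`;
    the first surface form is treated as the normal form.
    """
    entries = []
    for row in csv.DictReader(io.StringIO(text)):
        forms = [f.strip() for f in row[term_column].split(separator) if f.strip()]
        if not forms:
            continue  # skip rows with an empty term column
        # Every other column is kept as an attribute of the entry.
        attributes = {k: v for k, v in row.items() if k != term_column}
        entries.append({
            "normal_form": forms[0],
            "surface_forms": forms,
            "attributes": attributes,
        })
    return entries


SAMPLE = """City,POS,Country,Population
Dublin,Noun,Ireland,500000
New York|Big Apple,Noun,USA,8200000
"""

for entry in parse_dictionary_csv(SAMPLE):
    print(entry["normal_form"], entry["surface_forms"], entry["attributes"]["Country"])
# Dublin ['Dublin'] Ireland
# New York ['New York', 'Big Apple'] USA
```

Because the separator is not the CSV field delimiter, the standard `csv` module leaves "New York|Big Apple" intact as a single field, and splitting happens only within the term column.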
Procedure
To configure a custom dictionary:
What to do next
Whenever you add or modify dictionary entries, you must rebuild the dictionary file from the database before your pipeline can use the updated dictionary to analyze documents.
Tip: If you later need to edit the dictionary entry attributes, such as the default part of speech or constraint values for new database entries, right-click the dictionary database and click Properties. Click and double-click the row for the normal form. The dictionary must be closed before you can edit the attributes.