The Dictionary Lookup operator uses a dictionary, which contains a list of words, to extract concepts such as person names, product names, or locations from a database table. With the Dictionary Lookup operator, you can specify the text columns in which to find the concepts that are defined in the dictionary.
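The following minimal Python sketch illustrates dictionary-based concept extraction in general. It is not the operator's implementation, which this topic does not describe. The function name `dictionary_lookup`, the `Match` record, the whole-word and case-insensitive matching, and the longest-term-first rule are all assumptions chosen for illustration.

```python
import re
from typing import NamedTuple


class Match(NamedTuple):
    """One dictionary hit: its concept, the matched text, and its offsets."""
    concept: str
    covered_text: str
    begin: int
    end: int


def dictionary_lookup(text: str, dictionary: dict[str, str]) -> list[Match]:
    """Return every dictionary term found in text as a whole word, ignoring case.

    dictionary maps a term (for example "Berlin") to its concept (for example
    "Location"). Longer terms are tried first, so a multi-word entry such as
    "New York" wins over an overlapping shorter entry such as "York".
    """
    if not dictionary:
        return []  # an empty alternation would match empty strings
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    concept_of = {t.lower(): c for t, c in dictionary.items()}
    return [
        Match(concept_of[m.group(0).lower()], m.group(0), m.start(), m.end())
        for m in pattern.finditer(text)
    ]


# Hypothetical dictionary entries, for illustration only.
concepts = {"New York": "Location", "York": "Location", "Alice Smith": "Person"}
for match in dictionary_lookup("Alice Smith moved to New York.", concepts):
    print(match)
# Match(concept='Person', covered_text='Alice Smith', begin=0, end=11)
# Match(concept='Location', covered_text='New York', begin=21, end=29)
```

Because matches do not overlap, the shorter entry "York" is not reported a second time inside "New York".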
Before you begin
- Connect to a database, for example, DWESAMP.
- Create or import a dictionary in the Dictionaries folder of a data warehousing project.
- Create a new mining flow.
- Place a Table Source operator in the canvas and associate it with a source table.
Procedure
To define a Dictionary Lookup operator, follow these steps:
- From the Text Operators Palette, drag the Dictionary Lookup operator onto the canvas.
The Properties view of the Dictionary Lookup operator opens below the canvas and displays the General page.
- In the canvas, connect the output port of the Table Source operator to the input port of the Dictionary Lookup operator.
- Optional: On the General page of the Properties view of the Dictionary Lookup operator:
- In the Label entry field, replace the default label name for this Dictionary Lookup operator with a name of your choice.
- In the Description entry field, type a description for this Dictionary Lookup operator.
- On the Dictionary Settings page of the Properties view:
The list of input text columns contains the text columns of the source table that is associated with the Table Source operator.
The Language field displays the local language of the operating system. If Data warehousing in Db2 does not support the local language, English (United States) is displayed.
- From the list of input text columns, select the input text column to be analyzed.
- Optional: If the language of the text does not match the current locale, select the language of the selected input text column from the list of languages.
- On the Analysis Results page of the Properties view:
You can create several output ports for this Text operator. For each output port, an Output tab is provided on the Analysis Results page. To add more output ports, click the Add new port icon next to the Output tabs.
On each Output tab, you can map an annotation type from a referenced dictionary to the corresponding output port. The lookup results for that annotation type are then stored in the target table that is connected to this output port.
- Optional: If there are several output ports, click the first Output tab.
- From the list of dictionaries, select the dictionary that you want to map to this output port.
The list of dictionaries contains the dictionaries that are created or imported in the current data warehousing project.
- From the list of annotation types, select the annotation type that you want to map to this output port.
By default, the first annotation type is selected. Dictionaries that are created in the design studio contain a single annotation type. Third-party LanguageWare® dictionaries might contain multiple annotation types in one dictionary.
- Optional: Delete or rename columns in the list of result columns.
- Optional: On the Output Columns tab of the Properties view, you can move columns of the input table from the list of available columns to the list of output columns.
Typically, the primary key columns of the input table are added to the output columns. Including them lets you relate the information that is extracted from the input text column to the row from which it was extracted.
- Optional: On the Runtime Options tab of the Properties view, specify the number of parallel threads, the maximum text size at run time, and the maximum number of document errors.
- On the mining editor canvas, connect each output port of the Dictionary Lookup operator to the input port of a Table Target operator.
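The following Python sketch models the configuration that the steps above produce: each output port receives the hits for one annotation type, and each output row copies the primary key column of its source row. It is an illustration under stated assumptions, not the operator's implementation. The function names `compile_type` and `run_lookup`, the port names, the sample dictionary, and the result column names COVERED_TEXT, BEGIN, and END are all hypothetical, as is the whole-word, case-insensitive matching.

```python
import re

# Hypothetical dictionary: annotation type -> terms. Dictionaries created in the
# design studio have one type; a third-party dictionary might define several.
DICTIONARY = {
    "Person": ["Alice Smith", "Bob Jones"],
    "Location": ["Berlin", "New York"],
}

# Hypothetical mapping from output port to the annotation type it receives.
PORTS = {"Output1": "Person", "Output2": "Location"}


def compile_type(terms):
    """Whole-word, case-insensitive pattern; longer terms are tried first."""
    ordered = sorted(terms, key=len, reverse=True)
    return re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )


def run_lookup(rows, text_column, key_columns, dictionary, ports):
    """Return {port: [output row, ...]} with one output row per dictionary hit.

    Each output row copies the key columns of its source row, so a hit can be
    related back to the row from which it was extracted.
    """
    patterns = {t: compile_type(terms) for t, terms in dictionary.items() if terms}
    results = {port: [] for port in ports}
    for row in rows:
        text = row[text_column] or ""  # treat NULL text as empty
        for port, ann_type in ports.items():
            pattern = patterns.get(ann_type)
            if pattern is None:
                continue  # annotation type is missing or has no terms
            for m in pattern.finditer(text):
                out = {col: row[col] for col in key_columns}
                out.update(COVERED_TEXT=m.group(0), BEGIN=m.start(), END=m.end())
                results[port].append(out)
    return results


# Hypothetical source rows: ID plays the role of the primary key column.
rows = [
    {"ID": 1, "NOTES": "Alice Smith flew from Berlin to New York."},
    {"ID": 2, "NOTES": None},
]
for port, out_rows in run_lookup(rows, "NOTES", ["ID"], DICTIONARY, PORTS).items():
    print(port, out_rows)
```

Keeping one pattern per annotation type mirrors the idea that each Output tab maps exactly one type to one port, so each target table holds results of a single type.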
Example
This example is based on the following sample data in the sample database DWESAMP:
- The input table HEALTHCARE.HEART
- The dictionary HeartDisease.dict
- The type system HeartDisease.typesystem
You can import the sample dictionary from the samples\data\text directory of the Data warehousing in Db2 installation directory. The sample dictionary is also included in the sample project TextAnalysisSample.
To create the sample project:
- Click File -> New -> Example...
- On the Select Wizard page of the New Example wizard, expand the Data Warehousing Examples folder, select Text Analysis Sample from the list of wizards, and click Next.
- On the New Project page of the New Example wizard, use the default name for the new project or specify a different name, and click Finish.
In this example, the output port of the source table HEALTHCARE.HEART is connected to the input port of the Dictionary Lookup operator DL_HEART. The output port of the Dictionary Lookup operator DL_HEART is connected to the input port of the target table HeartDisease.
- From the Sources and Targets Palette, drag a Table Source operator onto the canvas.
The Select Database Table dialog opens.
- In the Select Database Table dialog, select the table HEALTHCARE.HEART.
- From the Text Operators Palette, drag a Dictionary Lookup operator onto the canvas.
The Properties view of the Dictionary Lookup operator opens below the canvas.
- In the canvas, connect the output port of the Table Source operator to the input port of the Dictionary Lookup operator.
- Optional: On the General page of the Properties view of the Dictionary Lookup operator:
- Replace the default label with the label DLU_HeartDisease.
- Add a description for this Dictionary Lookup operator.
- On the Dictionary Settings page of the Properties view:
- Select the input text column MEDICAL_HISTORY from the list of input text columns.
- Select the language English (United States) from the list of languages.
- On the Analysis Results page of the Properties view:
- Select the dictionary HeartDisease from the list of dictionaries.
- Select the annotation type HeartDisease from the list of annotation types.
- Delete the following columns from the list of result columns by selecting the columns and clicking the Delete icon:
- On the Output Columns page of the Properties view, select the column RECORD_ID in the list of available columns and click the right arrow to add it to the output columns.
- In the canvas, right-click the output port of the Dictionary Lookup operator and select Create Suitable Table... from the pop-up menu.
- In the Create Suitable Table dialog:
- Type the name HeartDisease in the Table name entry field.
- Select the table schema HEALTHCARE from the list of table schemas.
- Ensure that the Automatically create and connect to target operator check box is selected.
Now, you can start the mining flow.
Figure 1. Mining flow
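After the flow has run, the RECORD_ID column in the target table lets you relate each extracted concept back to its source row. The following sketch uses Python's built-in sqlite3 module as a stand-in for Db2, so the SQL is illustrative only. The COVERED_TEXT column and all row values are hypothetical; only the table roles, RECORD_ID, and MEDICAL_HISTORY come from this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-ins for HEALTHCARE.HEART and the HeartDisease target table.
# COVERED_TEXT and every value below are hypothetical.
cur.executescript("""
CREATE TABLE HEART (RECORD_ID INTEGER PRIMARY KEY, MEDICAL_HISTORY TEXT);
CREATE TABLE HEARTDISEASE (RECORD_ID INTEGER, COVERED_TEXT TEXT);
INSERT INTO HEART VALUES (1, 'Prior myocardial infarction, hypertension.');
INSERT INTO HEART VALUES (2, 'No cardiac history.');
INSERT INTO HEARTDISEASE VALUES (1, 'myocardial infarction');
""")

# Relate each extracted concept back to its source row through the primary key.
rows = cur.execute("""
SELECT h.RECORD_ID, d.COVERED_TEXT, h.MEDICAL_HISTORY
FROM HEART AS h
JOIN HEARTDISEASE AS d ON d.RECORD_ID = h.RECORD_ID
ORDER BY h.RECORD_ID
""").fetchall()
conn.close()

for record_id, concept, history in rows:
    print(record_id, concept, history, sep=" | ")
```

An inner join returns only rows with at least one extracted concept; a LEFT JOIN from HEART would also list records, such as RECORD_ID 2 here, in which nothing was found.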