Defining Dictionary Lookup operators

The Dictionary Lookup operator uses a dictionary, which contains a list of words, to extract concepts such as person names, product names, or locations from a database table. With the Dictionary Lookup operator, you specify the text columns in which to find the concepts that are defined in the dictionary.
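As a rough illustration of the idea, the following Python sketch finds dictionary terms in a piece of text and records where each match begins and ends. It is not the operator's implementation: the `Annotation` type, the `lookup` function, the `covered_text` field, and the sample terms and text are all hypothetical, and whole-word, case-insensitive matching is an assumption made for this sketch.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Annotation(NamedTuple):
    """One dictionary match (hypothetical structure for this sketch)."""
    begin: int         # offset of the first matched character
    end: int           # offset one past the last matched character
    covered_text: str  # the text that was matched


def lookup(text: str, dictionary: Iterable[str]) -> Iterator[Annotation]:
    """Yield one annotation per dictionary term found in the text."""
    # Try longer terms first so that a multi-word term is preferred
    # over a shorter term that starts at the same position.
    terms = sorted(set(dictionary), key=len, reverse=True)
    if not terms:
        return
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,  # assumption: matching ignores case
    )
    for match in pattern.finditer(text):
        yield Annotation(match.start(), match.end(), match.group(0))


# Hypothetical dictionary entries and input text, for illustration only.
terms = {"angina", "heart attack", "arrhythmia"}
text = "Patient reports angina; father had a heart attack."

annotations = list(lookup(text, terms))
for annotation in annotations:
    print(annotation)
```

Each annotation carries the matched text together with its begin and end offsets, which loosely corresponds to the result columns that you can keep or delete on the Analysis Results page.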
Before you begin
  1. Connect to a database, for example, DWESAMP.
  2. Create or import a dictionary in the Dictionaries folder of a data warehousing project.
  3. Create a new mining flow.
  4. Place a Table Source operator in the canvas and associate it with a source table.
Procedure

To define a Dictionary Lookup operator, follow these steps:

  1. From the Text Operators Palette, drag the Dictionary Lookup operator onto the canvas.

    The Properties view of the Dictionary Lookup operator is opened below the canvas. The General page of the Properties view is displayed.

  2. In the canvas, connect the output port of the Table Source operator with the input port of the Dictionary Lookup operator.
  3. Optional: On the General page of the Properties view of the Dictionary Lookup operator:
    1. In the Label entry field, replace the default label name for this Dictionary Lookup operator with a name of your choice.
    2. In the Description entry field, type a description for this Dictionary Lookup operator.
  4. On the Dictionary Settings page of the Properties view:

    The list of input text columns contains the text columns of the source table that is associated with the Table Source operator.

    In the Language field, the local language of the operating system is displayed. If Data warehousing in Db2 does not support the local language, English (United States) is displayed.

    1. From the list of input text columns, select the input text column to be analyzed.
    2. Optional: If the language of the text does not match the current locale, select the language of the selected input text column from the list of languages.
  5. On the Analysis Results page of the Properties view:

    You can create several output ports for this Text operator. For each output port, an Output tab is provided on the Analysis Results page. You can add more output ports by clicking the Add New Port icon next to the Output tabs.

    On the Output tabs, you can map an annotation type from a referenced dictionary to the current output port. The lookup results for the type of this dictionary are then stored in the target table that is connected to this output port.

    1. Optional: If there are several output ports, click the first Output tab.
    2. From the list of dictionaries, select the dictionary that you want to map to this output port.

      The list of dictionaries contains the dictionaries that are created or imported in the current data warehousing project.

    3. From the list of annotation types, select the annotation type that you want to map to this output port.

      By default, the first annotation type is selected.

      Dictionaries that are created in the design studio contain a single annotation type. Third-party LanguageWare® dictionaries might contain multiple types in one dictionary.

    4. Optional: Delete or rename columns in the list of result columns.
  6. Optional: On the Output Columns page of the Properties view, move columns of the input table from the list of available columns to the list of output columns.

    Typically, the primary key columns of the input table are added to the output columns. By adding the primary key columns of the input table to the output columns, you can relate the information that is extracted from the input text column to the row from which the information was extracted.

  7. Optional: On the Runtime Options page of the Properties view, specify the number of parallel threads, the maximum text size at run time, and the maximum number of document errors.
  8. On the mining-editor canvas, connect the output ports of the Dictionary Lookup operator to the input port of the Table Target operator.
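
The reason for adding primary key columns to the output columns (step 6) can be sketched in a few lines of Python. The rows, terms, and the `lookup_rows` function below are hypothetical, and plain substring matching stands in for the real dictionary lookup; the point is only that each result row carries the key of the input row it came from.

```python
# Hypothetical input rows: (primary key, text column).
rows = [
    (1, "History of angina."),
    (2, "No known conditions."),
    (3, "Arrhythmia and angina reported."),
]

# Hypothetical dictionary terms, all lowercase.
terms = ["angina", "arrhythmia"]


def lookup_rows(rows, terms):
    """Return (key, term) pairs for every term found in each row's text.

    Simplification for this sketch: case-insensitive substring matching.
    """
    results = []
    for key, text in rows:
        lowered = text.lower()
        for term in terms:
            if term in lowered:
                # Carrying the key along is what makes the result
                # traceable back to its source row.
                results.append((key, term))
    return results


results = lookup_rows(rows, terms)

# Because each result keeps the key, it can be related back to the input row.
text_by_key = dict(rows)
for key, term in results:
    print(key, term, "<-", text_by_key[key])
```

Without the key column, the result rows would list only the extracted terms, with no way to tell which input row each term was found in.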

Example

This example is based on the following sample data in the sample database DWESAMP:
  • The input table HEALTHCARE.HEART
  • The dictionary HeartDisease.dict
  • The type system HeartDisease.typesystem

You can import the sample dictionary from the samples\data\text directory of the Data warehousing in Db2 installation.

The sample dictionary is also included in the sample project TextAnalysisSample.

To create the sample project:
  1. Click File -> New -> Example...
  2. On the Select Wizard page of the New Example wizard, expand the Data Warehousing Examples folder, select Text Analysis Sample from the list of wizards, and click Next.
  3. On the New Project page of the New Example wizard, use the default name for the new project or specify a different name and click Finish.

In this example, the output port of the source table HEALTHCARE.HEART is connected with the input port of the Dictionary Lookup operator DL_HEART. The output port of the Dictionary Lookup operator DL_HEART is connected with the input port of the target table HeartDisease.

  1. From the Sources and Targets Palette, drag a Table Source operator onto the canvas.

    The Select Database Table dialog is opened.

  2. In the Select Database Table dialog, select the table HEALTHCARE.HEART.
  3. From the Text Operators Palette, drag a Dictionary Lookup operator onto the canvas.

    The Properties view of the Dictionary Lookup operator is opened below the canvas.

  4. In the canvas, connect the output port of the Table Source operator with the input port of the Dictionary Lookup operator.
  5. Optional: On the General page of the Properties view of the Dictionary Lookup operator:
    1. Replace the default label with the label DLU_HeartDisease.
    2. Add a description for this Dictionary Lookup operator.
  6. On the Dictionary Settings page of the Properties view:
    1. Select the input text column MEDICAL_HISTORY from the list of input text columns.
    2. Select the language English (United States) from the list of languages.
  7. On the Analysis Results page of the Properties view:
    1. Select the dictionary HeartDisease from the list of dictionaries.
    2. Select the annotation type HeartDisease from the list of annotation types.
    3. Delete the following columns from the list of result columns by selecting the columns and clicking the Delete icon:
      • begin
      • end
  8. On the Output Columns page of the Properties view, select the column RECORD_ID in the list of available columns and click the right arrow to add it to the output columns.
  9. In the canvas, right-click the output port of the Dictionary Lookup operator and select Create Suitable Table... from the pop-up menu.
  10. In the Create Suitable Table dialog:
    1. Type the name HeartDisease in the Table name entry field.
    2. Select the table schema HEALTHCARE from the list of table schemas.
    3. Ensure that the check box Automatically create and connect to target operator is selected.
Now, you can start the mining flow.
Figure 1. Mining flow

