Concept model: Settings tab

Use the Settings tab to define the text field value for the new input data, if necessary, and to select the scoring mode, which determines the data model of your output.

Note: This tab appears only when the model nugget is placed on the canvas. It is not available when you open this dialog box directly from the Models palette.

Scoring mode: Concepts as records

With this scoring mode, a new record is created for each concept/document pair. Typically, there are more records in the output than there were in the input.

In addition to the input fields, the following new fields are added to the data:

Table 1. Output fields for "Concepts as records"
Field    Description
Concept  Contains the extracted concept name found in the text data field.
Type     Stores the type of the concept as a full type name, such as Location or Person. A type is a semantic grouping of concepts. See the topic Type dictionaries for more information.
Count    Displays the number of occurrences for that concept (and its underlying terms) in the text body (record/document).

When you select this option, all other options except Accommodate punctuation errors are disabled.
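
To make the shape of the "Concepts as records" output concrete, here is a minimal Python sketch. It is not the product's implementation: the function names, the toy lexicon, and the `toy_extract` helper are hypothetical stand-ins for real concept extraction, and only the output fields Concept, Type, and Count come from the table above.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for real concept extraction.
LEXICON = {"paris": "Location", "alice": "Person"}


def toy_extract(text):
    """Return a (concept, type) pair for every lexicon hit in the text."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [(w, LEXICON[w]) for w in words if w in LEXICON]


def concepts_as_records(records, extract):
    """Emit one output row per concept/document pair.

    Each output row keeps the input fields and adds Concept, Type, and Count.
    """
    out = []
    for rec in records:
        counts = Counter(extract(rec["text"]))
        for (concept, type_name), n in counts.items():
            row = dict(rec)  # copy the input fields
            row.update({"Concept": concept, "Type": type_name, "Count": n})
            out.append(row)
    return out


rows = [
    {"id": 1, "text": "Alice flew to Paris. Paris was cold."},
    {"id": 2, "text": "Alice stayed home."},
]
scored = concepts_as_records(rows, toy_extract)
for row in scored:
    print(row["id"], row["Concept"], row["Type"], row["Count"])
```

In this sketch, two input records yield three output records, because the first document contains two distinct concepts. This is why the output typically has more records than the input.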

Scoring mode: Concepts as fields

With this scoring mode, each input record produces exactly one output record, no matter how many concepts are found in the document. Therefore, there are just as many output records as there were in the input. However, each record (row) now contains one new field (column) for each concept that was selected (using the check mark) on the Model tab. The value of each concept field depends on whether you select Flags or Counts as the field value on this tab.

Note: If you are using very large data sets, for example with a Db2 database, you may encounter processing problems with Concepts as fields due to the amount of data. In this case, we recommend using Concepts as records instead.

Field Values. Choose whether the new field for each concept will contain a count or a flag value.

  • Flags. Use this option to obtain flags with two distinct values in the output, such as Yes/No, True/False, T/F, or 1 and 2. The storage types are set automatically to reflect the values chosen. For example, if you enter numeric values for the flags, they are automatically handled as integer values. The storage types for flags can be string, integer, real number, or date/time. Enter a flag value for True and for False.
  • Counts. Used to obtain a count of how many times the concept occurred in a given record.
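
The difference between the two field values can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's code: the function `concepts_as_fields`, its parameters, and the per-record concept counts are all hypothetical, and the concept counts are assumed to have been produced by an earlier extraction step.

```python
def concepts_as_fields(records, concept_counts, selected, mode="flags",
                       true_value="T", false_value="F"):
    """Add one field per selected concept to each input record.

    records        -- list of dicts (the input rows)
    concept_counts -- list of dicts, one per record: concept -> occurrences
    selected       -- concepts that should become fields
    mode           -- "flags" or "counts" (hypothetical parameter names)
    """
    if mode not in ("flags", "counts"):
        raise ValueError("mode must be 'flags' or 'counts'")
    out = []
    for rec, counts in zip(records, concept_counts):
        row = dict(rec)  # one output row per input row
        for concept in selected:
            n = counts.get(concept, 0)
            if mode == "counts":
                row[concept] = n
            else:
                row[concept] = true_value if n > 0 else false_value
        out.append(row)
    return out


records = [{"id": 1}, {"id": 2}]
concept_counts = [{"paris": 2, "alice": 1}, {"alice": 1}]
selected = ["paris", "alice"]

flagged = concepts_as_fields(records, concept_counts, selected, mode="flags")
counted = concepts_as_fields(records, concept_counts, selected, mode="counts")
print(flagged)
print(counted)
```

Both calls return two rows for two input rows. Only the content of the concept fields changes: a flag records whether the concept occurred at all, while a count records how often it occurred.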

Field name extension. Specify an extension for the field name. Field names are generated by using the concept name plus this extension.

  • Add as. Specify where the extension should be added to the field name. Choose Prefix to add the extension to the beginning of the string. Choose Suffix to add the extension to the end of the string.
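
As a rough illustration of how a generated field name is assembled, the following Python sketch joins the concept name and the extension directly. The function name, the `add_as` parameter, and the example extension strings are hypothetical. Whether the product inserts a separator or sanitizes characters is not stated here, so the sketch assumes plain concatenation.

```python
def concept_field_name(concept, extension, add_as="suffix"):
    """Build a field name from a concept name plus an extension.

    add_as -- "prefix" puts the extension at the beginning of the string,
              "suffix" puts it at the end (assumed plain concatenation).
    """
    if add_as == "prefix":
        return extension + concept
    if add_as == "suffix":
        return concept + extension
    raise ValueError("add_as must be 'prefix' or 'suffix'")


suffix_name = concept_field_name("paris", "_concept", add_as="suffix")
prefix_name = concept_field_name("paris", "concept_", add_as="prefix")
print(suffix_name, prefix_name)
```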

Accommodate punctuation errors. This option temporarily normalizes text containing punctuation errors (for example, improper usage) during extraction to improve the extractability of concepts. It is particularly useful when the text is short and of poor quality (as, for example, in open-ended survey responses, e-mail, and CRM data), or when the text contains many abbreviations.