Concept Model: Fields tab

On the Fields tab, you specify the text field for the new input data, if necessary.

Note: This tab appears only when the model nugget is placed in the stream. It does not appear when you access this output directly in the Models palette.

Text field. Select the field containing the text to be mined. This field depends on the data source.

Document type. The document type specifies the structure of the text. Select one of the following types:

Full text. Use for most documents or text sources. The entire text is scanned for extraction. Unlike the other option, this option has no additional settings.
Structured text. Use for bibliographic forms, patents, and any files that contain regular structures that can be identified and analyzed. This document type is used to skip all or part of the extraction process. It allows you to define term separators, assign types, and impose a minimum frequency value. If you select this option, you must click the Settings button and enter text separators in the Structured Text Formatting area of the Document Settings dialog box. See the topic Document Settings for Fields Tab for more information.

Input encoding. This option is available only if you indicated that the text field represents Pathnames to documents. It specifies the default text encoding. The text is converted from the specified or recognized encoding to ISO-8859-1, so even if you specify another encoding, the extraction engine converts the text to ISO-8859-1 before processing it. Any characters that cannot be represented in ISO-8859-1 are converted to spaces.
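To make the effect of this conversion concrete, the following Python sketch models the behavior described above: decode the document from its source encoding, then replace every character outside ISO-8859-1 (code points U+0000 to U+00FF) with a space. This is an illustration only, not how the extraction engine is implemented; the function name to_latin1_with_spaces is hypothetical, and the sketch assumes the source encoding is already known (automatic encoding recognition is not modeled).

```python
def to_latin1_with_spaces(raw: bytes, source_encoding: str) -> str:
    """Hypothetical model of the input-encoding step.

    Decodes raw bytes using source_encoding (assumed known), then replaces
    each character that ISO-8859-1 cannot represent with a single space.
    """
    text = raw.decode(source_encoding)
    out = []
    for ch in text:
        # ISO-8859-1 covers exactly the code points U+0000..U+00FF.
        out.append(ch if ord(ch) <= 0xFF else " ")
    return "".join(out)


if __name__ == "__main__":
    sample = "café – naïve".encode("utf-8")
    # The en dash (U+2013) is outside ISO-8859-1 and becomes a space;
    # the accented letters é and ï are kept because they fit.
    print(repr(to_latin1_with_spaces(sample, "utf-8")))  # 'café   naïve'
```

Because each unmappable character becomes exactly one space, word boundaries are roughly preserved, but text in scripts outside Latin-1 (for example Japanese) is reduced to whitespace, which is why the input encoding and text language settings matter.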

Text language. Identifies the language of the text being mined; this is the main language detected during extraction. Contact your sales representative if you are interested in purchasing a license for a supported language to which you do not currently have access.