Concept Model: Fields tab
Use the Fields tab to define the text field for the new input data, if necessary.
Text field. Select the field containing the text to be mined. This field depends on the data source.
Document type. The document type specifies the structure of the text. Select one of the following types:
- Full text. Use for most documents or text sources. The entire set of text is scanned for extraction. Unlike the other options, there are no additional settings for this option.
- Structured text. Use for bibliographic forms, patents, and any files that contain regular structures that can be identified and analyzed. This document type is used to skip all or part of the extraction process. It allows you to define term separators, assign types, and impose a minimum frequency value. If you select this option, you must click the Settings button and enter text separators in the Structured Text Formatting area of the Document Settings dialog box. See the topic Document Settings for Fields Tab for more information.
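As a rough illustration of how term separators and a minimum frequency value interact for structured text, the following Python sketch splits each record on separator characters and keeps only the terms that occur often enough. It is not the product's extraction engine: the function name `extract_structured_terms`, its parameters, the default separators, and the lowercasing step are all assumptions made for this example.

```python
import re
from collections import Counter


def extract_structured_terms(records, separators=";,", min_frequency=2):
    """Hypothetical helper: split records on separators and filter by frequency."""
    # Build a character class from the separator characters (assumed input format).
    pattern = "[" + re.escape(separators) + "]"
    counts = Counter()
    for record in records:
        for term in re.split(pattern, record):
            # Normalizing case and whitespace is an assumption of this sketch.
            term = term.strip().lower()
            if term:
                counts[term] += 1
    # Keep only terms that meet the minimum frequency value.
    return {term: n for term, n in counts.items() if n >= min_frequency}


records = ["text mining; data mining, patents", "Patents; text mining"]
frequent_terms = extract_structured_terms(records)
```

With these two sample records, "data mining" appears only once and is dropped, while the other two terms are kept.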
Input encoding. This option is available only if you indicated that the text field represents Pathnames to documents. It specifies the default text encoding. The text is converted from the specified or recognized encoding to ISO-8859-1, so even if you specify another encoding, the extraction engine converts the text to ISO-8859-1 before processing it. Any characters that do not fit into the ISO-8859-1 encoding definition are converted to spaces.
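To preview what this conversion might do to a document, the following Python sketch decodes the input and replaces every character that has no ISO-8859-1 representation with a space. It only mimics the behavior described above and is not the extraction engine's own code. The function name `to_iso_8859_1_with_spaces` and the UTF-8 default are assumptions made for this example.

```python
def to_iso_8859_1_with_spaces(raw: bytes, source_encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: convert text to ISO-8859-1, using spaces for unmappable characters."""
    text = raw.decode(source_encoding)
    converted = []
    for ch in text:
        try:
            ch.encode("iso-8859-1")
            converted.append(ch)
        except UnicodeEncodeError:
            # The character does not fit into ISO-8859-1, so use a space instead.
            converted.append(" ")
    return "".join(converted).encode("iso-8859-1")


sample = "café €5".encode("utf-8")
converted_sample = to_iso_8859_1_with_spaces(sample)
```

In this sample the accented "é" survives because ISO-8859-1 includes it, while the euro sign does not and becomes a space.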
Text language. Identifies the language of the text being mined; this is the main language detected during extraction. Contact your sales representative if you are interested in purchasing a license for a supported language to which you do not currently have access.