Each field in an IBM® Content Classification system
has a unique name, a data type, and a content
type. You can use the Management Console to
view or change the field definitions.
Field name
Specifies
the name of the field. You can type a name for a new field, but you
cannot change this value when you view the properties of an existing
field.
Data type
The data
type specifies the kind of information that is stored in each field
and classified by Content Classification.
The following data types are supported:
- Binary
- Identifies a content field that contains the binary content of
documents such as Microsoft Word files, PDF files, and HTML files.
Content Classification text extractors do not process the binary content; it
is sent directly to a decision plan without filtering. Use the Binary data
type only with the advanced decision plan action called Use
binary callback.
- Create associated document property fields
- Specifies that the system is to automatically create metadata
fields that contain information about the document. The following
document property fields are created:
- MIME type (such as TXT, HTML, and PDF)
- Character set (the document encoding, such as Windows-1252 or
UTF-16)
- Language (the language of the source text)
- File extension
- File name
These document properties are not used for text extraction. They
are concatenated into a single field called Binary_Field_Name.Properties,
where Binary_Field_Name is the name of the Binary field.
The value of this field can be used by a decision plan.
- Numeric
- Identifies a content field that contains numbers.
- Text
- Identifies a content field that contains text.
- Document
- Identifies a content field that contains raw data extracted
from binary documents such as Microsoft Word files, PDF files, and
HTML files by using Content Classification text
extractors. When the data type is set to Document,
the content type is set automatically to None.
For single-value fields, this data type is reserved for Content Classification
applications that use the AnalyzeText, CreateTextBinary,
and SuggestBinary API functions. For multiple-value
fields, it is reserved for applications that use the Decide, DecideFeedback,
SuggestDocument, CreateTextDocument, and AnalyzeTextDocument API functions.
- Create associated document property fields
- Specifies that the system is to automatically create metadata
fields that contain information about the document. The following
document property fields are created:
- MIME type (such as TXT, HTML, and PDF)
- Character set (the document encoding, such as Windows-1252 or
UTF-16)
- Language (the language of the source text)
- File extension
- File name
- DocumentProperty
- Identifies a content item field that contains information
about the document. When the data type is set to DocumentProperty,
the content type is set automatically to None.
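The data type rules above can be sketched as a small data model. This is an illustration only, not the Content Classification API: the FieldDefinition class, its attribute names, and the default content type of Other are assumptions made for this example. The sketch captures three points from the descriptions: a field name cannot change after the field is created, the Document and DocumentProperty data types force the content type to None, and a Binary field with associated document properties exposes a combined Binary_Field_Name.Properties field.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataType(Enum):
    BINARY = "Binary"
    NUMERIC = "Numeric"
    TEXT = "Text"
    DOCUMENT = "Document"
    DOCUMENT_PROPERTY = "DocumentProperty"


# Data types whose content type is always None, per the descriptions above.
FORCED_NONE = {DataType.DOCUMENT, DataType.DOCUMENT_PROPERTY}


@dataclass(frozen=True)  # frozen: the field name cannot be changed after creation
class FieldDefinition:
    """Hypothetical model of a field definition (not a product class)."""

    name: str
    data_type: DataType
    content_type: str = "Other"  # default chosen for this sketch only
    create_property_fields: bool = False

    def __post_init__(self) -> None:
        if self.data_type in FORCED_NONE:
            # Override whatever content type the caller supplied.
            object.__setattr__(self, "content_type", "None")

    def properties_field_name(self) -> Optional[str]:
        """Name of the combined properties field for a Binary field, if any."""
        if self.data_type is DataType.BINARY and self.create_property_fields:
            return f"{self.name}.Properties"
        return None
```

Under these assumptions, FieldDefinition("Attachment", DataType.DOCUMENT, content_type="Body") ends up with the content type None, and a Binary field named Raw that is created with create_property_fields=True reports Raw.Properties as its combined properties field.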
Content type
The
content type specifies the role or purpose of the field and defines
the type of natural language processing that is applied to the field.
The following content types are supported. You can also configure
the system to include custom content types by importing and exporting
language customization data.
- Body
- Identifies a content item field that contains the main text of
the document, such as the body of a document in an enterprise content
management (ECM) repository, the body of an email message, or the body of
an attachment.
- DocTitle
- Identifies a content item field that contains the document title.
- None
- This content type is assigned automatically to fields that have the
Document or DocumentProperty data type, and you cannot change the value.
- Other
- Identifies a content item field that does not match one of the
other content type options.
- PlainText
- Identifies a content item field that contains textual content.
This option is recommended for documents in an ECM repository.
- Sender
- For email content only. Identifies a content item field that contains
an email address, such as a Sender, From, CC, or To field.
- Subject
- For email content only. Identifies a content item field that contains
the subject of an email message, such as a Subject field.
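The restrictions on content types can be summarized in a short sketch. It is not taken from the product: the ContentType enumeration and the selectable_content_types helper are hypothetical names, and whether the Management Console filters its choices this way is an assumption. The sketch encodes only what the list above states: None is assigned automatically and is never a user choice, and Sender and Subject apply to email content only. Custom content types are ignored here.

```python
from enum import Enum
from typing import List


class ContentType(Enum):
    BODY = "Body"
    DOC_TITLE = "DocTitle"
    NONE = "None"
    OTHER = "Other"
    PLAIN_TEXT = "PlainText"
    SENDER = "Sender"
    SUBJECT = "Subject"


# Content types described above as "For email content only".
EMAIL_ONLY = {ContentType.SENDER, ContentType.SUBJECT}


def selectable_content_types(email_content: bool) -> List[ContentType]:
    """Return the content types a user could pick in this sketch."""
    choices = []
    for content_type in ContentType:
        if content_type is ContentType.NONE:
            continue  # assigned automatically, never chosen by hand
        if content_type in EMAIL_ONLY and not email_content:
            continue  # Sender and Subject apply to email content only
        choices.append(content_type)
    return choices
```

For content that is not email, the helper returns Body, DocTitle, Other, and PlainText; for email content it also returns Sender and Subject.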