Field properties

Each field in an IBM® Content Classification system has a unique name by which it is identified, a data type, and a content type. You can use the Management Console to view or change the field definitions.

Field name

Specifies the name of the field. You can type a name for a new field, but you cannot change this value when you view the properties of an existing field.

Data type

The data type specifies the kind of information that is stored in each field and classified by Content Classification. The following data types are supported:

Binary
Identifies a content field that contains binary content from binary documents such as Microsoft Word files, PDF files, and HTML files. The binary content is not extracted by Content Classification text extractors; it is sent directly to a decision plan without filtering. Use the Binary data type only with the advanced decision plan action called Use binary callback.
Create associated document property fields
Specifies that the system automatically creates metadata fields that contain information about the document. The following document property fields are created:
  • MIME type (such as TXT, HTML, and PDF)
  • Character set (the document encoding, such as Windows-1252 or UTF-16)
  • Language (the language of the source text)
  • File extension
  • File name
These document properties are not used for text extraction. They are concatenated into a single field named Binary_Field_Name.Properties, where Binary_Field_Name is the name of the binary field. A decision plan can use the value of this field.
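The naming convention for the combined properties field can be sketched as follows. This is a minimal illustration, not the product's implementation: the function names, the dictionary keys, the property order, and the semicolon separator are all assumptions, because the actual concatenation format is not specified here.

```python
# Hypothetical keys for the five document properties listed above.
PROPERTY_KEYS = ("mime_type", "charset", "language", "file_extension", "file_name")


def properties_field_name(binary_field_name: str) -> str:
    """Derive the name of the combined properties field for a binary field."""
    return f"{binary_field_name}.Properties"


def concatenate_properties(props: dict, sep: str = ";") -> str:
    """Join the property values into one string.

    The separator and the ordering are assumptions made for illustration only.
    """
    return sep.join(str(props.get(key, "")) for key in PROPERTY_KEYS)


example = {
    "mime_type": "PDF",
    "charset": "UTF-16",
    "language": "en",
    "file_extension": "pdf",
    "file_name": "report.pdf",
}
field_name = properties_field_name("Attachment")
field_value = concatenate_properties(example)
```

For a binary field named Attachment, the sketch yields a field named Attachment.Properties whose value a decision plan could then inspect.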
Numeric
Identifies a content field that contains numbers.
Text
Identifies a content field that contains text.
Document
Identifies a content field that contains raw data extracted from binary documents such as Microsoft Word files, PDF files, and HTML files by using Content Classification text extractors. When the data type is set to Document, the content type is set automatically to None.

For single-value fields, this data type is reserved for Content Classification applications that use the AnalyzeText, CreateTextBinary, and SuggestBinary API functions. For multiple-value fields, this data type is reserved for applications that use the Decide, DecideFeedback, SuggestDocument, CreateTextDocument, and AnalyzeTextDocument API functions.

Create associated document property fields
Specifies that the system automatically creates metadata fields that contain information about the document. The following document property fields are created:
  • MIME type (such as TXT, HTML, and PDF)
  • Character set (the document encoding, such as Windows-1252 or UTF-16)
  • Language (the language of the source text)
  • File extension
  • File name
DocumentProperty
Identifies a content item field that contains information about the document. When the data type is set to DocumentProperty, the content type is set automatically to None.

Content type

The content type specifies the role or purpose of the field and defines the type of natural language processing that is applied to the field. You can also configure the system to include custom content types by importing and exporting language customization data. The following content types are supported:
Body
Identifies a content item field that contains the main text of the document, such as the body of a document in an enterprise content management repository, the body of an email message, or the body of an attachment.
DocTitle
Identifies a content item field that contains the document title.
None
This content type is assigned automatically to Document and DocumentProperty data types, and you cannot change the value.
Other
Identifies a content item field that does not match one of the other content type options.
PlainText
Identifies a content item field that contains textual content. This option is recommended for documents in an ECM repository.
Sender
For email content only. Identifies a content item field that contains an email address, such as a Sender, From, CC, or To field.
Subject
For email content only. Identifies a content item field that contains the subject of an email message, such as a Subject field.
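
The relationship between the three field properties can be summarized in a short sketch. This is an illustrative model only, not a Content Classification API: the class names, the FIXED_NONE set, and the with_content_type helper are hypothetical. It encodes two rules stated in this topic: the name of an existing field cannot be changed, and the Document and DocumentProperty data types always carry the content type None.

```python
from dataclasses import dataclass, replace
from enum import Enum


class DataType(Enum):
    BINARY = "Binary"
    NUMERIC = "Numeric"
    TEXT = "Text"
    DOCUMENT = "Document"
    DOCUMENT_PROPERTY = "DocumentProperty"


class ContentType(Enum):
    BODY = "Body"
    DOC_TITLE = "DocTitle"
    NONE = "None"
    OTHER = "Other"
    PLAIN_TEXT = "PlainText"
    SENDER = "Sender"
    SUBJECT = "Subject"


# Data types whose content type is set automatically to None.
FIXED_NONE = {DataType.DOCUMENT, DataType.DOCUMENT_PROPERTY}


@dataclass(frozen=True)  # frozen: attributes (including the name) cannot be reassigned
class FieldDefinition:
    name: str
    data_type: DataType
    content_type: ContentType

    def __post_init__(self):
        # Override whatever was requested when the data type fixes the content type.
        if self.data_type in FIXED_NONE:
            object.__setattr__(self, "content_type", ContentType.NONE)

    def with_content_type(self, content_type: ContentType) -> "FieldDefinition":
        """Return a copy with a new content type; the name stays the same."""
        return replace(self, content_type=content_type)


attachment = FieldDefinition("Attachment", DataType.DOCUMENT, ContentType.BODY)
subject = FieldDefinition("Subject", DataType.TEXT, ContentType.SUBJECT)
```

In this sketch, attachment.content_type ends up as ContentType.NONE even though Body was requested, while subject keeps the Subject content type. Custom content types, which the system can also support, are outside the scope of this model.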