What data formats are supported?

To import data into a content set, the data must be in one of the following supported formats.

Supported external formats
  • Files in file system folders, such as Microsoft Word documents, MSG files, PDF documents, XML files, and HTML documents
  • XML files that conform to the IBM® Content Classification XML schema
  • CSV (comma-separated values) files
  • PST (Microsoft Outlook personal folders) files
  • NSF (Lotus Notes® database) files, supported by running a script that is installed with IBM Content Classification

When you import data from a file system folder, each file in the folder becomes a content item and, by default, the folder name is used as the category name. When you import XML files that conform to the Content Classification XML schema, CSV files, or PST files, the structure of the imported file determines the structure of the content set: each item in the imported file (for example, each row in a CSV file) becomes a content item.
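The default folder-to-category mapping can be pictured with a short sketch. This is a hypothetical model written for illustration only, not the Content Classification import API: the function name folder_to_content_items and the dictionary keys name, category, and path are assumptions. It walks a folder tree and turns every file into one content item whose category is the name of the folder that contains it.

```python
import tempfile
from pathlib import Path


def folder_to_content_items(root):
    """Hypothetical model of a file system import (not the product API).

    Each file becomes one content item. Its category defaults to the
    name of the folder that directly contains the file.
    """
    items = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            items.append({
                "name": path.name,
                "category": path.parent.name,  # default: folder name as category
                "path": str(path),
            })
    return items


if __name__ == "__main__":
    # Build a small throwaway folder tree and show the resulting mapping.
    with tempfile.TemporaryDirectory() as tmp:
        for folder, filename in [("Invoices", "a.pdf"), ("Contracts", "b.docx")]:
            (Path(tmp) / folder).mkdir()
            (Path(tmp) / folder / filename).write_text("sample")
        for item in folder_to_content_items(tmp):
            print(item["category"], item["name"])
```

Because the category is taken from the immediate parent folder, a file in a nested subfolder is assigned to that subfolder's name in this sketch; the actual product may offer other options when you import.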

Fields in the imported files (such as XML tags, or the column headers of a CSV file created in Microsoft Excel) correspond to the fields in the content set. For more information about fields, see Defining fields.
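For a CSV file, the correspondence between column headers and content set fields can be sketched with Python's standard csv module. This is an illustrative model only, not how Content Classification reads files internally; the function csv_to_content_items and the sample columns Subject, Body, and Category are hypothetical. The header row supplies the field names, and each following row becomes one content item.

```python
import csv
import io


def csv_to_content_items(text):
    """Hypothetical model of a CSV import (not the product API).

    The header row supplies the field names; every following row
    becomes one content item, keyed by those field names.
    """
    reader = csv.DictReader(io.StringIO(text))
    fields = list(reader.fieldnames)  # column headers -> content set fields
    items = [dict(row) for row in reader]
    return fields, items


sample = """Subject,Body,Category
Refund request,Please refund order 123,Billing
Password reset,I cannot log in,Support
"""

fields, items = csv_to_content_items(sample)
print(fields)
print(len(items), "content items")
```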

Supported internal formats
  • Classification Workbench content set

When you import data, you can create a content set from data in a combination of formats (for example, PST and CSV files).