Output paths for exported documents

When you export documents as XML, the export program creates a directory under the output file path that you specify when you configure export options.

Exported file types

Depending on the types of documents that you export, the output can include .xml files, .xmi files, .dat files, or any combination of these file types.

Crawled documents

When you export crawled documents, the document metadata is saved in an XML file with the extension .xml. The document content is stored in a separate data file with the extension .dat. If the content is a PDF file, you can open the .dat file with a PDF reader.

If you select the option to use field names or facet paths as XML elements, the content file uses the extension of the original file, if it can be determined, instead of .dat.

The names of metadata fields for exported crawled documents match the original metadata fields as they are defined in the data source, such as the column names of a table in a relational database.

Analyzed documents

When you export analyzed documents, the document metadata and content can be stored in an XML file with the extension .xml. Because content is converted to plain text when the documents are parsed and analyzed, the metadata and content can be merged into the same XML file.

When you enable the CAS as XMI format option, the information stored in the common analysis structure (CAS) is converted to XML Metadata Interchange (XMI) format and exported as XMI files with the extension .xmi.
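Because an .xmi file is an XML document, a standard XML parser can give a quick overview of what an exported CAS contains. The following is a minimal sketch, not part of the export program: the function name count_xmi_elements and the sample document (including its http://example.com/types namespace and element names) are made up for illustration, and the sketch assumes that each child of the root element represents one item stored in the CAS.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_xmi_elements(xmi_text: str) -> Counter:
    """Count the child elements of the XMI root, grouped by qualified tag name."""
    root = ET.fromstring(xmi_text)
    # ElementTree reports namespaced tags in the form "{namespace-uri}localname".
    return Counter(child.tag for child in root)


# Hypothetical sample; a real exported file contains the types of your own collection.
SAMPLE_XMI = (
    '<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" '
    'xmlns:ex="http://example.com/types" xmi:version="2.0">'
    '<ex:Token xmi:id="1" begin="0" end="5"/>'
    '<ex:Token xmi:id="2" begin="6" end="11"/>'
    '<ex:Sentence xmi:id="3" begin="0" end="11"/>'
    '</xmi:XMI>'
)

for tag, total in count_xmi_elements(SAMPLE_XMI).most_common():
    print(tag, total)
```

To inspect a file on disk instead of a string, ET.parse(path).getroot() returns the same kind of root element.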

The names of metadata fields for exported analyzed documents are the mapped index field names. Only fields that are configured to be Returnable in the index field definition are exported.

Analyzed documents also have annotations that were added to the documents by annotators and other linguistic and analytical processes. Only annotations that are configured to be indexed as facets or index fields are included in the output file.

Searched documents

When you export results from an enterprise search application or content analytics miner, you choose whether you want to export the crawled document output, the analyzed document output, or both types of output.

Directory structures for exported documents

When documents are exported as XML, the output directory name is based on the time that the export occurs. For example, if the export starts on 2009/06/11 at 13:00, then the directory name is 200906111300.
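The naming rule can be reproduced in a script that needs to locate or sort export directories. The following is a minimal sketch, assuming that the name always has the form year, month, day, hour, and minute with no separators, as in the example above. The function names are hypothetical.

```python
from datetime import datetime

# Year, month, day, hour, minute with no separators, as in 200906111300.
DIRECTORY_NAME_FORMAT = "%Y%m%d%H%M"


def export_directory_name(started: datetime) -> str:
    """Format an export start time as a directory name."""
    return started.strftime(DIRECTORY_NAME_FORMAT)


def export_start_time(directory_name: str) -> datetime:
    """Recover the export start time from a directory name."""
    return datetime.strptime(directory_name, DIRECTORY_NAME_FORMAT)


print(export_directory_name(datetime(2009, 6, 11, 13, 0)))
print(export_start_time("200906111300"))
```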

The export program saves up to 1,000 export files in the directory. If there are more than 1,000 files to be exported, the export program creates subdirectories and saves up to 1,000 files in each subdirectory. These subdirectories are named sequentially, beginning with the number 0.

The output files are also named sequentially, beginning with the number 0. Separate sequences identify the data (.dat) and XML (.xml) output.
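The sequential numbering can be expressed as simple arithmetic. The following is a minimal sketch, assuming a zero-based position in the export sequence, three-digit file names, and files that are always placed in a numbered subdirectory, as suggested by the sample layout for crawled documents. The function name relative_export_path is hypothetical and is not part of the export program.

```python
from pathlib import PurePosixPath

FILES_PER_DIRECTORY = 1000  # limit stated in the text


def relative_export_path(index: int, extension: str) -> PurePosixPath:
    """Map a zero-based sequence index to 'subdirectory/file' under the timestamp directory."""
    if index < 0:
        raise ValueError("index must not be negative")
    # The quotient selects the subdirectory; the remainder is the file number within it.
    subdirectory, position = divmod(index, FILES_PER_DIRECTORY)
    return PurePosixPath(str(subdirectory)) / f"{position:03d}{extension}"


print(relative_export_path(0, ".dat"))
print(relative_export_path(1999, ".xml"))
```

Under these assumptions, the first file lands in 0/000.dat and the two-thousandth metadata file lands in 1/999.xml.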

For example, the following sample layouts show how the export directories and files are named:

Table 1. Sample directory structures for crawled or analyzed documents

Crawled
  content
    200906111300
      0
        000.dat
        001.dat
        ...
        999.dat
      1
        000.dat
        001.dat
        ...
        999.dat
  metadata
    200906111300
      0
        000.xml
        001.xml
        ...
        999.xml
      1
        000.xml
        001.xml
        ...
        999.xml

Analyzed
  CAS
    200906111200
      exported_typesystem.xml
      0
        000.xmi
        001.xmi
        ...
        999.xmi
  content
    200906111300
      0
        001.xml
        002.xml
        ...
        999.xml
      1
        001.xml
        002.xml
        ...
        999.xml
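
A script that post-processes crawled output usually needs each content file together with its metadata file. The following is a minimal sketch that walks the Crawled layout shown above. It assumes that a content file and a metadata file with the same subdirectory and the same number describe the same document, which the layout suggests but the text does not state. The function name pair_crawled_files and the export_root parameter are hypothetical.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def pair_crawled_files(export_root: Path, timestamp: str) -> List[Tuple[Path, Optional[Path]]]:
    """Pair each crawled content file with the metadata file that has the same number.

    Assumption: content/<timestamp>/0/000.* and metadata/<timestamp>/0/000.xml
    belong to the same document.
    """
    content_dir = export_root / "Crawled" / "content" / timestamp
    metadata_dir = export_root / "Crawled" / "metadata" / timestamp

    pairs = []
    # Sort for a stable, repeatable order of results.
    for content_file in sorted(p for p in content_dir.rglob("*") if p.is_file()):
        relative = content_file.relative_to(content_dir)
        # The content extension can be .dat or the original file extension,
        # so match on the subdirectory and file number only.
        candidate = metadata_dir / relative.with_suffix(".xml")
        pairs.append((content_file, candidate if candidate.is_file() else None))
    return pairs
```

A call such as pair_crawled_files(Path("/path/to/output"), "200906111300") returns a list of pairs in which the second item is None when no matching metadata file is found.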