When you export documents as XML, the export program creates a directory under the output file path that you specify when you configure export options.
Depending on the types of documents that you export, the output can include .xml files, .xmi files, .dat files, or a combination of these file types.
If you select the option to use field names or facet paths as XML elements, then the file extension matches the extension of the original file, if it can be determined, instead of .dat.
The names of metadata fields for exported crawled documents match the original metadata field names as they are defined in the data source, such as the column name of a table in a relational database.
When you enable the CAS as XMI format option, the information stored in the common analysis structure (CAS) is converted to XML Metadata Interchange (XMI) format and exported as XMI files with the extension .xmi.
The names of metadata fields for exported analyzed documents are the mapped index field names. Only fields that are configured to be Returnable in the index field definition are exported.
Analyzed documents also have annotations that were added to the documents by annotators and other linguistic and analytical processes. Only annotations that are configured to be indexed as facets or index fields are included in the output file.
When documents are exported as XML, the output directory name is based on the time that the export occurs. For example, if the export starts on 2009/06/11 at 13:00, then the directory name is 200906111300.
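The naming rule in the example above can be reproduced with a short Python sketch, which might be useful when a script needs to predict or locate the directory for a given export. The function name `export_directory_name` is hypothetical and is not part of the export program. The sketch assumes that the name is the export start time formatted as year, month, day, hour, and minute, as the example suggests.

```python
from datetime import datetime


def export_directory_name(start: datetime) -> str:
    """Format an export start time as YYYYMMDDHHMM (hypothetical helper)."""
    return start.strftime("%Y%m%d%H%M")


# The example from the text: an export that starts on 2009/06/11 at 13:00.
print(export_directory_name(datetime(2009, 6, 11, 13, 0)))  # prints 200906111300
```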
The export program saves up to 1,000 export files in the directory. If there are more than 1,000 files to be exported, the export program creates subdirectories and saves up to 1,000 files in each subdirectory. These additional directories are named sequentially, beginning with the number 0.
The output files are also named sequentially, beginning with the number 0. Different sequences identify the data (.dat) and XML (.xml) output.
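Because exported files can be spread across the top-level export directory and numbered subdirectories, a consumer of the export might walk the whole tree instead of assuming a fixed layout. The following Python sketch groups the files by extension. The function name `collect_export_files` and the extension filter are assumptions for illustration only. Content files that keep their original extension instead of .dat are not matched by this filter.

```python
from collections import defaultdict
from pathlib import Path

# Extensions named in the text; this filter is an assumption of the sketch.
EXPORT_EXTENSIONS = {".xml", ".xmi", ".dat"}


def collect_export_files(export_dir):
    """Group files under export_dir, including subdirectories, by extension."""
    grouped = defaultdict(list)
    for path in sorted(Path(export_dir).rglob("*")):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in EXPORT_EXTENSIONS:
            grouped[suffix].append(path)
    return dict(grouped)
```

For example, `collect_export_files("200906111300")` returns a dictionary that maps each extension that was found to a sorted list of paths, regardless of how many subdirectories the export created.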
Sample output path for crawled documents | Sample output path for analyzed documents |
---|---|
|
|