Exporting documents as CSV files

When you configure options for exporting documents in the administration console, a wizard helps you specify information for exporting documents as comma-separated value (CSV) files.

About this task

You can import the exported CSV files into spreadsheet applications, such as Lotus® Symphony® and Microsoft Excel, to create statistical reports. In addition, the documents are exported as if they were star-schema tables, which means you can load the files into a relational database and then use a business intelligence application to analyze the data along with other structured data.
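
As a rough illustration of loading the exported files into a relational database, the following Python sketch uses the standard sqlite3 module. It assumes the default three-column layout of doc_fact.csv (ID, Document ID, Document Date) and a single-value dimension file named filesize.csv, as in the examples in the procedure. The table names, column names, and the load_star_schema helper are hypothetical and are not produced by the product.

```python
import csv
import io
import sqlite3

# Sample rows in the layout shown in the procedure examples (no header row).
DOC_FACT_CSV = """\
1,file:///C:data/xml/apple.xml,1199186392296
2,file:///C:data/xml/banana.xml,1199272792171
3,file:///C:data/xml/citrus.xml,1199272792218
"""
FILESIZE_CSV = """\
1,100
2,125
3,98
"""


def load_star_schema(doc_fact_text, filesize_text):
    """Load a fact file and one dimension file into an in-memory database.

    The table and column names are assumptions chosen for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE doc_fact (id INTEGER, uri TEXT, doc_date INTEGER)")
    conn.execute("CREATE TABLE filesize (doc_id INTEGER, filesize INTEGER)")
    fact_rows = [
        (int(r[0]), r[1], int(r[2]))
        for r in csv.reader(io.StringIO(doc_fact_text)) if r
    ]
    size_rows = [
        (int(r[0]), int(r[1]))
        for r in csv.reader(io.StringIO(filesize_text)) if r
    ]
    conn.executemany("INSERT INTO doc_fact VALUES (?, ?, ?)", fact_rows)
    conn.executemany("INSERT INTO filesize VALUES (?, ?)", size_rows)
    return conn


conn = load_star_schema(DOC_FACT_CSV, FILESIZE_CSV)
# Join the dimension table to the fact table on the document ID.
rows = conn.execute(
    "SELECT d.uri, f.filesize FROM doc_fact d "
    "JOIN filesize f ON f.doc_id = d.id ORDER BY f.filesize DESC"
).fetchall()
for uri, size in rows:
    print(uri, size)
```

With real exports, the in-memory strings would be replaced by the files in the export directory, and a production database would typically use its own bulk-load utility instead of row-by-row inserts.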

If you use IBM® Cognos® Business Intelligence and plan to export documents so that you can generate reports or do additional analysis through a relational database, you must set up the integration between Watson Explorer Content Analytics and IBM Cognos BI before you run the wizard.

Procedure

To export documents as CSV files:

  1. On the Collections view, expand the collection that you want to configure.
    • To export crawled or analyzed documents, check the Parse and Index pane and ensure that the parse and index process is running. Click the icon to export document content and metadata.
    • To export searched documents, expand the Search pane for an enterprise search collection or Analytics pane for a content analytics collection. Click the icon to configure settings for exporting documents from search results.
  2. Click Export documents as CSV files, and then click Configure to start the export wizard. If you previously ran the wizard, the path that you specified for the CSV files is displayed.
  3. Specify the fields to export and how you want to export them. By default, the ID, Document ID (URI), and Document Date are exported as one CSV file, which is named doc_fact.csv. The ID is an automatically assigned number that is unique for each exported document. The ID changes each time that the document is exported.

    For crawled documents, all native fields that are mapped to index fields in this collection can be exported. For example, if a column in a database table is named LSTNAME, and it is mapped to an index field named lastname, then LSTNAME is listed on this page.

    For analyzed or searched documents, only fields that have been marked as Returnable can be exported.

    Configure the following options for exporting fields:

    Do not export
    Specifies that the field will not be exported when the documents are exported.
    A column for document fact table
    Exports the field as an additional column of the doc_fact.csv file. For example, if the option for __$FileName$__ is selected, the doc_fact.csv file will include the ID, Document ID, Document Date, and file name of the document. An example of the content in the doc_fact.csv file is:
    1,file:///C:data/xml/apple.xml,1199186392296,apple.xml
    2,file:///C:data/xml/banana.xml,1199272792171,banana.xml
    3,file:///C:data/xml/citrus.xml,1199272792218,citrus.xml
    A table for dimension
    Exports the field as a separate CSV file. For example, filesize.csv is created in addition to the doc_fact.csv file. If you change the file name that is suggested by the wizard, ensure that you specify a unique name. You cannot have two CSV files with the same name.
    The generated CSV file includes the ID of the associated document and the value of the field. For example, the filesize.csv file might be:
    1,100
    2,125
    3,98
    IDs in a CSV file are unique, but they are not sorted. In this example, the document that has the ID 1 in the doc_fact.csv file has the __$FileSize$__ field, and the value of the field is 100.
    If you select the A document can have more than one value for this field check box, another CSV file is generated that has the suffix _brg. Select this check box if the document can contain more than one field with the same name. For example, there are two component field instances in this document:
    URI: file:///C:data/carrepair/001.xml
    date:1199186392296
    title:repair history
    customer:00001
    component:shaft
    component:tire
    If you export the component field to the component.csv file and select this check box, the component_brg.csv file is also created and it contains relationships between the doc_fact.csv file and the component.csv file. Examples of the content in these files are:
    doc_fact.csv
    1,file:///C:data/carrepair/001.xml,1199186392296
    
    component_brg.csv
    1,1
    1,2
    
    component.csv
    1,shaft
    2,tire
  4. If you are configuring export options for analyzed documents or searched documents, specify the facets that you want to export.
    Configure the following options for exporting facets:
    Do not export
    Specifies that the facet will not be exported when the documents are exported.
    A table for a dimension
    Exports all subfacets of the facet into one CSV file. If you change the file name that is suggested by the wizard, ensure that you specify a unique name. You cannot have two CSV files with the same name.

    A CSV file that stores the relationship between this file and the doc_fact.csv is also created. For example, if Phrase Constituent is selected to be exported into a file named phrase.csv, a file named phrase_brg.csv is also created. Examples of the content in these files are:

    doc_fact.csv
    1,file:///C:/samples/firststep/data/xml/00000000.xml,1199186392296
    2,file:///C:/samples/firststep/data/xml/00000001.xml,1199272792171
    
    phrase_brg.csv
    1,64
    1,94
    1,101
    1,31
    2,182
    2,170
    2,185
    2,176
    2,134
    2,138
    
    phrase.csv
    64,noun_phrase,nouns,container straw lemon tea
    94,noun_phrase,adp_noun,from ... juice pack
    101,noun_phrase,nouns,juice pack
    31,noun_phrase,mod_noun,lemon ... tea
    182,noun_phrase,adp_noun,of ... thread
    170,pred_phrase,verb_noun,be ... something
    185,noun_phrase,adp_noun,inside ... cup
    176,noun_phrase,adp_noun,like ... piece
    134,noun_phrase,mod_noun,tampering ... thread
    138,pred_phrase,verb_noun,tamper ... thread
  5. Specify the directory where the CSV files are to be created and click Finish to complete the wizard. The directory must exist and allow write access.
  6. In the Parse and Index pane, stop and restart parsing and indexing. If documents are already indexed, you must rebuild the index from the document cache.
  7. Optional: After documents are exported, expand the Export area to see the status of export requests. For example, you can see the number of documents that have been exported so far, whether the export request is completed, and whether any errors occurred.
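
The relationship among the doc_fact.csv file, a _brg file, and a dimension file that step 3 describes can be sketched in Python as follows. The sketch assumes, based on the examples in step 3, that each row of the _brg file holds a document ID followed by the ID of a row in the dimension file, and that the files have no header row. The read_rows and values_per_document helpers are hypothetical names used only for illustration.

```python
import csv
import io
from collections import defaultdict

# Sample content copied from the component example in step 3.
DOC_FACT_CSV = "1,file:///C:data/carrepair/001.xml,1199186392296\n"
COMPONENT_BRG_CSV = "1,1\n1,2\n"
COMPONENT_CSV = "1,shaft\n2,tire\n"


def read_rows(text):
    """Parse CSV text into a list of rows, skipping empty lines."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def values_per_document(doc_fact_text, bridge_text, dimension_text):
    """Map each document URI to the list of field values it carries.

    Assumes bridge rows are (document ID, dimension row ID) and
    dimension rows are (dimension row ID, value).
    """
    dimension = {row[0]: row[1] for row in read_rows(dimension_text)}
    values_by_doc = defaultdict(list)
    for row in read_rows(bridge_text):
        doc_id, dim_id = row[0], row[1]
        values_by_doc[doc_id].append(dimension[dim_id])
    # The second column of doc_fact.csv is the Document ID (URI).
    return {
        row[1]: values_by_doc.get(row[0], [])
        for row in read_rows(doc_fact_text)
    }


result = values_per_document(DOC_FACT_CSV, COMPONENT_BRG_CSV, COMPONENT_CSV)
print(result)
```

For the sample content, the document 001.xml resolves to the two component values shaft and tire. In a relational database, the same result comes from joining the fact table to the dimension table through the bridge table.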