About this task
How and when documents are exported depends on how content
is added to the collection and whether the collection uses a document
cache. Documents are exported when the parse and index services run.
The parse and index services start when content is added to the collection
or when the index is rebuilt.
- Crawled documents
- How the crawler is configured to run also controls how new, changed,
and deleted documents are exported. The first time that the crawler
crawls a data source, all documents are crawled. In subsequent crawls:
- If the crawler is configured to crawl all updates, then the crawler
checks for new, changed, and deleted documents. The export program
exports the new and changed documents. You can configure an option
to export information about deleted documents. In this case, when
you export documents as XML, an XML file is created for each deleted
document. In the XML output, the value of the Type attribute of the
/Document element is DELETED.
- If the crawler is configured to crawl new and modified documents
only, the crawler does not check for deleted documents and information
about the deleted documents is not generated.
If you select the option to crawl new and modified documents only,
the crawler looks for documents with modification dates that are
later than the previous crawl time. If you copy files to a resource,
the modification date might not change, which means that the crawler
might not detect that the files were added to the resource. For
example, if you copy files to a Windows folder, Windows does not
automatically change the modification date of the files. To ensure
that such files are crawled, select the option to crawl all updates
or the option to do a full crawl.
- If the crawler is configured to do a full crawl, then the entire
crawl space is crawled and all documents that match your export criteria
are exported, regardless of whether the documents were updated or
deleted since the previous crawl.
- Imported documents
- All documents imported to a collection are passed to the parse
and index services. If you configure a collection to export documents,
all imported documents will be exported when they are processed by
the parse and index services.
- Rebuilding the index
- If the document cache is enabled for the collection, crawled and
imported documents are saved in the cache. When you rebuild the index,
documents in the cache are passed to the parse and index services.
Thus, documents can be exported by restarting the index build.
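The three crawler options described under "Crawled documents" can be compared with a small model. The following Python sketch is illustrative only and is not the product's implementation: the `Doc` class, the `plan_export` function, the mode names, and the use of a checksum to detect changes are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    mtime: int      # modification time, in arbitrary time units
    checksum: str   # assumed change marker; the real crawler's check may differ


def plan_export(mode, previous, current, last_crawl_time):
    """Return (exported, deleted) document names for one crawl.

    previous and current map a document name to its Doc state at the
    previous crawl and at this crawl. The mode names are hypothetical.
    """
    if mode == "full":
        # The entire crawl space is crawled; everything present is exported.
        return sorted(current), []
    if mode == "new_and_modified":
        # Only the modification date is consulted; deletions are not checked.
        changed = [name for name, doc in current.items()
                   if doc.mtime > last_crawl_time]
        return sorted(changed), []
    if mode == "all_updates":
        # Compare against the previous state to find new, changed, and deleted.
        changed = [name for name, doc in current.items()
                   if name not in previous
                   or previous[name].checksum != doc.checksum]
        deleted = [name for name in previous if name not in current]
        return sorted(changed), sorted(deleted)
    raise ValueError(f"unknown crawl mode: {mode!r}")


previous = {"a.txt": Doc(100, "c1"), "b.txt": Doc(100, "c2"), "c.txt": Doc(100, "c3")}
current = {
    "a.txt": Doc(100, "c1"),   # unchanged
    "b.txt": Doc(250, "c2x"),  # modified after the previous crawl
    "d.txt": Doc(50, "c4"),    # copied in, old modification date kept
}
LAST_CRAWL = 200               # c.txt was removed since the previous crawl

for mode in ("all_updates", "new_and_modified", "full"):
    print(mode, plan_export(mode, previous, current, LAST_CRAWL))
```

In this model, the new-and-modified mode exports only b.txt and misses d.txt because its modification date predates the previous crawl. The all-updates mode exports b.txt and d.txt and reports c.txt as deleted. Whether information about the deleted document is actually exported depends on the option described above.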
If you change the export options, such as enabling the export of
analyzed documents as XML files, you must restart the parse and index
services for the change to take effect. Restarting the parse
and index services also initializes some export actions. For example,
if the collection is configured to export documents as CSV files,
the export process creates a directory and CSV files to save the exported
documents.
Exporting to IBM Cognos BI: If you use IBM® Cognos® Business
Intelligence, the wizard helps you specify information for exporting
documents to a relational database or as comma-separated values (CSV)
files. See also the related topics about setting up the integration
between Watson Explorer Content Analytics and IBM Cognos BI.
Exporting to IBM DB2: If you plan to export documents to an IBM DB2®
database, you must install the DB2 Client on the Watson Explorer
Content Analytics server. In a distributed installation, install the
DB2 Client on the master server. When you configure the export, specify
the appropriate JAR files, such as db2jcc.jar and db2jcc_license_cu.jar,
which are installed with the DB2 Client.
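Before you configure the export, it can help to confirm that the JAR files are present on the server. The following Python sketch is one possible pre-flight check, not part of the product. The function name `missing_db2_jars` is hypothetical, and the directory that holds the JAR files depends on how the DB2 Client was installed, so you must supply it yourself.

```python
from pathlib import Path

# JAR file names taken from the text above.
REQUIRED_JARS = ("db2jcc.jar", "db2jcc_license_cu.jar")


def missing_db2_jars(driver_dir):
    """Return the required JAR file names that are not found in driver_dir."""
    base = Path(driver_dir)
    return [name for name in REQUIRED_JARS if not (base / name).is_file()]
```

An empty list means that both files were found in the directory that you passed in. Any names in the returned list indicate files to locate before you specify the JAR paths in the export configuration.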