To use information from Watson Explorer Content Analytics for other purposes, such as data warehousing, business intelligence, classification, and eDiscovery or compliance applications, you can export documents from collections and then import the exported data into your applications.
When you configure options for exporting documents, you specify whether the documents are exported as XML files or CSV files to a file system, exported to a relational database, or exported according to a format and location that is specified by a custom plug-in. Watson Explorer Content Analytics does not provide any utilities for importing the exported documents into other applications.
In this scenario, assume that your content management system supports importing documents from a file system, but you also want to import documents that are stored in a BLOB column in a relational database. You can configure a database crawler to crawl the BLOB column and then export the crawled data to XML files on a file system. You can then import the exported files into your content management system.
In this scenario, assume that you want to analyze reports about defects in automobiles. The reports contain structured data, such as a problem code or the date of the report. Each report also includes free-text descriptions of the customer's complaint and of how the problem was addressed. For example, the problem report might be "Customer smelled burning odor under the hood" and the problem solution might be "Rusty connection to the fuel pump relay was replaced."
You can create an annotator to extract industry-specific keywords or patterns from unstructured text, such as the symptoms of a problem and the names of replacement parts. If you configure a collection to use the annotator, relevant unstructured data can be analyzed, extracted, and annotated when crawled documents enter the document processing pipeline. You can export this analyzed data to XML files, CSV files, or a relational database, and then import the exported data into your business intelligence application.
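The following Python sketch illustrates the kind of keyword and pattern matching that such an annotator performs on the sample report text. It is not the Watson Explorer Content Analytics annotator API: the term list, the regular expressions, and the `Annotation` and `annotate` names are hypothetical, chosen only for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical domain dictionaries; a real annotator would load curated terms.
PART_TERMS = ["fuel pump relay", "fuel pump", "relay", "hood"]
SYMPTOM_PATTERNS = {
    "odor": re.compile(r"\b(burning|gas|fuel)\s+(odor|smell)\b", re.IGNORECASE),
    "corrosion": re.compile(r"\b(rust(y|ed)?|corroded)\b", re.IGNORECASE),
}


@dataclass(frozen=True)
class Annotation:
    kind: str   # "part" or "symptom"
    label: str  # normalized term or pattern name
    begin: int  # character offset where the match starts
    end: int    # character offset just past the match
    text: str   # matched text as it appears in the document


def annotate(text):
    """Return part and symptom annotations found in free text, ordered by offset."""
    annotations = []
    taken = []  # spans already claimed by a longer part term
    # Try longer terms first so "fuel pump relay" wins over "fuel pump".
    for term in sorted(PART_TERMS, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
            if any(m.start() < e and s < m.end() for s, e in taken):
                continue  # overlaps a longer term that was already matched
            taken.append((m.start(), m.end()))
            annotations.append(Annotation("part", term, m.start(), m.end(), m.group(0)))
    for label, pattern in SYMPTOM_PATTERNS.items():
        for m in pattern.finditer(text):
            annotations.append(Annotation("symptom", label, m.start(), m.end(), m.group(0)))
    return sorted(annotations, key=lambda a: a.begin)


if __name__ == "__main__":
    for a in annotate("Rusty connection to the fuel pump relay was replaced."):
        print(a.kind, a.label, repr(a.text))
```

Matching the longest term first keeps an extracted part name specific ("fuel pump relay" rather than "relay"), which matters when the annotations are later grouped in a business intelligence report.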
If you use IBM Cognos® Business Intelligence (IBM Cognos BI), you can configure Watson Explorer Content Analytics to export documents directly to a relational database. You can then run online analytical processing (OLAP) queries against the reports to do a more in-depth analysis of both structured and unstructured data.
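As an illustration only, the following Python sketch uses an in-memory SQLite database to show the kind of aggregate query that exported data makes possible, combining a structured field (the report date) with a value extracted by an annotator (the part name). The `defect_reports` table, its columns, and the sample rows are hypothetical; the actual table layout depends on how you configure the export, and IBM Cognos BI provides its own tools for building OLAP reports.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table that stands in for exported defect reports.

    The schema and rows are hypothetical sample data for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE defect_reports ("
        " doc_id TEXT PRIMARY KEY,"
        " problem_code TEXT,"  # structured field from the report
        " report_date TEXT,"   # ISO 8601 date string
        " part TEXT)"          # value extracted by the annotator
    )
    conn.executemany(
        "INSERT INTO defect_reports VALUES (?, ?, ?, ?)",
        [
            ("r1", "P100", "2015-03-02", "fuel pump relay"),
            ("r2", "P100", "2015-03-17", "fuel pump relay"),
            ("r3", "P220", "2015-04-05", "brake line"),
            ("r4", "P100", "2015-04-09", "fuel pump relay"),
        ],
    )
    conn.commit()
    return conn


def reports_by_part_and_month(conn):
    """Roll up report counts by extracted part and by month of the report date."""
    return conn.execute(
        """
        SELECT part, substr(report_date, 1, 7) AS month, COUNT(*) AS reports
        FROM defect_reports
        GROUP BY part, month
        ORDER BY part, month
        """
    ).fetchall()


if __name__ == "__main__":
    for row in reports_by_part_and_month(build_demo_db()):
        print(row)
```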
In this scenario, assume that you are asked by your legal compliance department to gather documents in response to a discovery request regarding patent infringement. The documents of interest are crawled, analyzed, and stored in the index. However, the collection also includes documents that are of no value to the current investigation. When you search the collection, you can specify criteria that limit the results to documents that are relevant to the discovery request. You can then export those documents to XML files and import the exported files into your eDiscovery system, such as IBM eDiscovery Manager.
In another scenario, assume that you need to train an IBM Content Classification knowledge base. When you query the collection, you can export documents that match your search conditions as XML files. When the documents are exported, a catalog.xml file that contains information about the fields in the documents is also exported. If you import the document XML files and catalog.xml file into Classification Workbench, you can use the data to train knowledge bases and decision plans. By repeatedly searching collections and exporting documents, you can improve how content is classified over time.
To export documents, you must enable the document cache for the collection.
When you configure export options for crawled or analyzed documents, you can specify whether the documents are to be added to the index. For example, if you use Watson Explorer Content Analytics primarily as a means to collect documents or collect analytical data about content, then you might want to export the documents without adding them to the index.
When you configure export options for searched documents, you can configure schedules to control when the documents are exported from the document cache. You can create a general schedule for all export requests and configure custom schedules for individual requests. You can also schedule a request to run incrementally. In this case, only documents that were added to the index since the export program last ran are exported.
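The incremental selection rule can be sketched as follows. This is a minimal Python model under stated assumptions, not the export program itself: the `IndexedDoc` and `IncrementalExporter` names are hypothetical, the sketch assumes that each document records the time it was added to the index, and it assumes that the first run exports every document added up to that point.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    added: datetime  # assumed: time the document was added to the index


@dataclass
class IncrementalExporter:
    # Cutoff of the previous run; None means the request has never run.
    last_run: Optional[datetime] = None

    def run(self, docs: Iterable[IndexedDoc], now: datetime) -> List[str]:
        """Return IDs of documents added after the previous run and up to `now`."""
        selected = [
            d.doc_id
            for d in docs
            if d.added <= now and (self.last_run is None or d.added > self.last_run)
        ]
        # Remember this run's cutoff so that the next run starts where this one ended.
        self.last_run = now
        return selected


if __name__ == "__main__":
    docs = [
        IndexedDoc("a", datetime(2024, 1, 1, 8)),
        IndexedDoc("b", datetime(2024, 1, 2, 9)),
    ]
    exporter = IncrementalExporter()
    print(exporter.run(docs, datetime(2024, 1, 2)))  # first run
    print(exporter.run(docs, datetime(2024, 1, 3)))  # picks up only "b"
```

Bounding each run by the same cutoff that the next run starts from means that a document added while a run is in progress is exported by the following run instead of being skipped.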