Analyzing documents with a Watson Explorer Content Analytics pipeline

You can analyze documents with a Watson Explorer Content Analytics pipeline and review the resulting annotations in Content Analytics Studio. For example, you can determine how modifications that you make in Watson Explorer Content Analytics affect the annotations that are produced by the pipeline.

Before you begin

Before you can analyze documents with a Watson Explorer Content Analytics pipeline, you must configure a Watson Explorer Content Analytics server connection file.

About this task

You can send sample documents in your Content Analytics Studio project to a Watson Explorer Content Analytics server to be annotated by the document processing pipeline that is associated with a particular collection. The resulting annotations are then returned to Content Analytics Studio for your review and analysis.

You can analyze a single text file or a collection of documents with the Watson Explorer Content Analytics pipeline. Analyzing a single document is useful to identify the UIMA annotation types that are currently generated by the pipeline on the Watson Explorer Content Analytics server. You can annotate a collection of files to see the results of the pipeline on a larger set of documents. The results can be saved and compared with results from other versions of the pipeline to analyze how performance is affected by modifications that you made in Watson Explorer Content Analytics, such as changes to the analytic resources.
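As an illustration of what identifying the generated annotation types can look like, the following minimal sketch tallies annotation types for one document. It assumes that the annotations have already been copied out of Content Analytics Studio into plain (type name, begin, end) tuples. The tuple layout, the `example.*` type names, and the `count_annotation_types` helper are hypothetical and are not part of the product.

```python
from collections import Counter

# Hypothetical annotations for one document: (type name, begin offset, end offset).
# The type names are illustrative only; they are not types that the server is known to produce.
annotations = [
    ("example.Person", 0, 5),
    ("example.Person", 20, 27),
    ("example.Location", 31, 37),
]


def count_annotation_types(annotations):
    """Return a Counter that maps each annotation type name to its number of occurrences."""
    return Counter(type_name for type_name, _begin, _end in annotations)


type_counts = count_annotation_types(annotations)

# Print one line per annotation type, in alphabetical order.
for type_name, count in sorted(type_counts.items()):
    print(f"{type_name}: {count}")
```

Running the same tally on the results of two pipeline versions gives a quick indication of whether a modification added or removed an annotation type.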

Analyzing documents with a pipeline on a Watson Explorer Content Analytics server can also help you test the pipeline that you are developing in Content Analytics Studio:
  • You can determine how the existing Watson Explorer Content Analytics pipeline compares with the UIMA pipeline that you are developing in Content Analytics Studio.
  • After you develop and test your UIMA pipeline in Content Analytics Studio, you can export the pipeline to Watson Explorer Content Analytics and verify that the annotation results are as you expect.

Procedure

To annotate documents on a Watson Explorer Content Analytics server:

  1. In the Studio Explorer view, select the documents to analyze. Choose one of the following options:
    • To analyze a single document:
      1. Open a document in the editor view.
      2. Right-click the document in the editor view and click Analyze Document with Watson Explorer Content Analytics.
    • To annotate a collection of documents:
      1. In the Studio Explorer view, select one or more documents or folders that contain documents.
      2. Right-click the selected documents, and select Analyze Collection with Watson Explorer Content Analytics.
  2. In the File Selection window, select the Watson Explorer Content Analytics ICACONFIG connection file that you defined in the Configuration/Servers directory of your project.
  3. In the Collections window, select the Watson Explorer Content Analytics collection that is associated with the pipeline that you want to use to annotate the documents.

Results

The files are sent to the Watson Explorer Content Analytics server and annotated by using the pipeline of the specified collection. The resulting annotations are returned to Content Analytics Studio and displayed as if they were generated by a local UIMA pipeline. The annotations that are generated by the default Watson Explorer Content Analytics text analysis engines are displayed in the default view of the Outline view. To view the annotations that are generated by a custom text analysis engine, select the common analysis structure (CAS) view that you use for custom text analysis from the View list. If you exported the custom pipeline from Content Analytics Studio to Watson Explorer Content Analytics, the name of the associated CAS view is lrw-view.

What to do next

If you analyzed a collection of documents, you can click the Save icon in the Collection Analysis view to save the results as an ANNOTATION file in the Results directory. To compare the current results to a saved annotation file that was generated by a previous version of the pipeline on the Watson Explorer Content Analytics server or by a UIMA pipeline in Content Analytics Studio, click the Compare annotations icon.
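Conceptually, comparing two sets of results amounts to finding the annotations that appear in only one of them. The following sketch shows that idea under stated assumptions: it does not read ANNOTATION files (their format is not described here) and it is not how the Compare annotations icon is implemented. The (type name, begin, end) tuples, the `example.*` type names, and the `compare_annotations` helper are hypothetical.

```python
def compare_annotations(previous, current):
    """Return (removed, added) for two collections of annotation tuples.

    removed: annotations that are only in the previous results.
    added:   annotations that are only in the current results.
    """
    previous_set, current_set = set(previous), set(current)
    removed = sorted(previous_set - current_set)
    added = sorted(current_set - previous_set)
    return removed, added


# Hypothetical results for the same document from two versions of a pipeline.
previous_run = [("example.Person", 0, 5), ("example.Location", 31, 37)]
current_run = [("example.Person", 0, 5), ("example.Location", 31, 40)]

removed, added = compare_annotations(previous_run, current_run)
print("Only in previous results:", removed)
print("Only in current results:", added)
```

In this sketch, an annotation whose span changed is reported as one removed and one added annotation, because the tuples are compared exactly.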