Open, scalable analytics pipeline

As content is crawled, documents and records are fed into the Watson Content Analytics pipeline for analysis.

Each item is converted to text and sent through a series of processing and analysis steps. Each step annotates the content item with additional information and cleans, clarifies, and extracts meaning from the item.
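The step-by-step annotation described above can be pictured with a minimal Python sketch. It is not the Watson Content Analytics or UIMA API: the names `ContentItem`, `to_text`, `annotate_tokens`, `annotate_token_count`, and `run_pipeline` are hypothetical, and the analysis steps are deliberately trivial placeholders. The point is only that each step reads what earlier steps produced and adds its own annotations to the same item.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ContentItem:
    """Hypothetical container for one crawled document or record."""
    raw: bytes
    text: str = ""
    annotations: Dict[str, object] = field(default_factory=dict)


def to_text(item: ContentItem) -> ContentItem:
    # Conversion step: decode the raw bytes and collapse runs of whitespace.
    decoded = item.raw.decode("utf-8", errors="replace")
    item.text = " ".join(decoded.split())
    return item


def annotate_tokens(item: ContentItem) -> ContentItem:
    # Analysis step: attach a simple whitespace tokenization.
    item.annotations["tokens"] = item.text.split()
    return item


def annotate_token_count(item: ContentItem) -> ContentItem:
    # Later steps can build on annotations added by earlier steps.
    item.annotations["token_count"] = len(item.annotations["tokens"])
    return item


# The pipeline is an ordered list of steps (assumed order for this sketch).
STEPS: List[Callable[[ContentItem], ContentItem]] = [
    to_text,
    annotate_tokens,
    annotate_token_count,
]


def run_pipeline(item: ContentItem, steps=STEPS) -> ContentItem:
    # Pass the item through every step in order.
    for step in steps:
        item = step(item)
    return item
```

Calling `run_pipeline(ContentItem(raw=b"some crawled bytes"))` returns the same item with its `text` and `annotations` fields filled in by the steps.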

As provided, Watson Content Analytics includes a powerful set of annotators that annotate content and extract meaning from it.

Because Watson Content Analytics implements the open Unstructured Information Management Architecture (UIMA) framework, you can fully customize the processing of content and take advantage of existing UIMA annotators, both open source and commercial, to enhance your automated content analysis capabilities.

To develop custom annotators, you can use IBM Watson Content Analytics Studio as an alternative to manually developing annotators with the UIMA software development kit (SDK). IBM Watson Content Analytics Studio is a separately installable component of Watson Content Analytics.

Deep, automated analysis of content can be a computing-intensive operation. To support fast and efficient analysis of large amounts of content, Watson Content Analytics provides a highly scalable implementation of the UIMA specification, which allows you to distribute content analysis across multiple machines.
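The idea of spreading analysis over several workers can be sketched with the Python standard library. In this sketch a local thread pool stands in for multiple machines, an assumption made only to keep the example self-contained. The names `analyze` and `analyze_all` are hypothetical, and nothing here reflects how Watson Content Analytics actually schedules or distributes work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def analyze(text: str) -> Dict[str, object]:
    # Placeholder analysis step: count whitespace-separated tokens.
    tokens = text.split()
    return {"text": text, "token_count": len(tokens)}


def analyze_all(texts: List[str], workers: int = 4) -> List[Dict[str, object]]:
    # Each item is analyzed independently of the others, so any free
    # worker can take the next item. Threads stand in for machines here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(analyze, texts))
```

Because the items do not depend on one another, adding workers increases the number of items analyzed at the same time without changing the result for any single item.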