The system components collect data from across your
enterprise, parse and analyze that information to extract meaning,
and create a text index that users can query.
A collection represents the set of sources that users
can search and mine with a single query. When you create a collection,
you specify which sources you want to include and configure options
for how users can query the indexed data.
You can create multiple collections, and each collection can contain
data from various data sources. For example, you might create a collection
that includes documents from IBM® DB2® and IBM Content
Manager Enterprise Edition databases, or a collection
that includes documents from IBM FileNet® P8 object
stores and Microsoft SharePoint
repositories. When users query a collection, the results can
include documents from any of its data sources.
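The idea that one collection spans several sources, yet answers a single query, can be illustrated with a toy inverted index. This is a minimal sketch for intuition only: `ToyCollection`, `Document`, and the source labels `db2` and `sharepoint` are hypothetical names, not the product's API, and the real crawlers and index are far more involved.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    source: str  # hypothetical label for the data source the document came from
    text: str


class ToyCollection:
    """Hypothetical stand-in for a collection: one index shared by several sources."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)  # term -> doc IDs
        self._docs: dict[str, Document] = {}

    def add_source(self, documents: list[Document]) -> None:
        # Documents from every source go into the same inverted index.
        for doc in documents:
            self._docs[doc.doc_id] = doc
            for term in doc.text.lower().split():
                self._postings[term].add(doc.doc_id)

    def query(self, text: str) -> list[Document]:
        # One query is answered from the combined index; every term must match.
        terms = text.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted((self._docs[d] for d in hits), key=lambda d: d.doc_id)


coll = ToyCollection()
coll.add_source([Document("db2-1", "db2", "invoice overdue for supplier")])
coll.add_source([Document("sp-1", "sharepoint", "supplier contract and invoice terms")])
print([d.source for d in coll.query("supplier invoice")])  # ['db2', 'sharepoint']
```

Because both sources feed one index, the single query returns matches from each of them without the user having to search the sources separately.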
To help you get started quickly, Watson Explorer Content Analytics provides several feature
packages. Each package contains predefined configuration settings,
typically designed for a specific purpose or industry. When you create
a collection by selecting a feature package, the settings are automatically
applied. If the package includes sample data, the parsing and indexing
processes begin automatically.
The type of collection that you create determines which functions
are available for configuring the collection:
- Enterprise search collection
  These collections support search and retrieval functions, including
  the ability to browse and narrow results by selecting facets, sort
  documents by relevance or date, preview documents in the search results,
  and view thumbnail images of certain types of documents. You can also
  enable some analytic features for searching these collections,
  such as the ability to see correlation scores and how results are
  distributed along a timeline chart.
- Content analytics collection
  In addition to search, these collections support content mining
  functions, such as the ability to explore correlations, deviations,
  and trends in your data. You can also export analysis results to data
  warehouses or business intelligence applications, and generate reports
  that can be saved in comma-separated values (CSV) format or opened
  with IBM Cognos® Business Intelligence tools.
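Narrowing results by a facet and reading a correlation score can be sketched with plain counts. The example below assumes a lift-style formula (a facet value's share of the narrowed results divided by its share of the whole collection); that formula, the function names, and the `product`/`topic` facets are illustrative assumptions, and the product's own correlation calculation may differ.

```python
from collections import Counter


def facet_counts(docs: list[dict], facet: str) -> Counter:
    """Count how many documents carry each value of a facet."""
    return Counter(doc[facet] for doc in docs if facet in doc)


def correlation(subset: list[dict], collection: list[dict], facet: str) -> dict[str, float]:
    """Lift-style score (an assumed formula): a value's share of the subset
    divided by its share of the whole collection. Scores above 1.0 mean the
    value is over-represented in the subset. Assumes subset is drawn from collection."""
    sub, full = facet_counts(subset, facet), facet_counts(collection, facet)
    return {
        value: (count / len(subset)) / (full[value] / len(collection))
        for value, count in sub.items()
    }


collection = [
    {"id": 1, "product": "router", "topic": "outage"},
    {"id": 2, "product": "router", "topic": "billing"},
    {"id": 3, "product": "modem", "topic": "billing"},
    {"id": 4, "product": "modem", "topic": "billing"},
]
# Narrow the results by selecting a facet value, then inspect another facet.
results = [doc for doc in collection if doc["topic"] == "outage"]
print(facet_counts(results, "product"))             # Counter({'router': 1})
print(correlation(results, collection, "product"))  # {'router': 2.0}
```

Here "router" appears in all outage results but in only half of the collection, so its score of 2.0 flags it as strongly associated with outages.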
Creating and administering a collection involves the following
activities:
- Collecting data
  The crawler components collect documents from data
  sources, either continuously or according to a schedule that
  you specify. Crawling frequently helps ensure that users have access
  to current information. In addition to crawling data sources, you
  can add content to a collection by importing CSV files.
- Analyzing data
  The analytics pipeline extracts text from documents,
  performs linguistic analysis, identifies meaningful words and phrases,
  extracts entities, and applies custom analysis to each document. This
  detailed content analysis produces facets that users can use to explore
  the content.
- Indexing data
  The index components add data from new and changed
  documents to the index. The index components also perform global analysis
  of the documents in a collection to calculate correlation scores
  and to detect duplicate and near-duplicate documents. In a content
  analytics collection, a separate index can be created for facets.
  You can also create an overlay index to exclude words
  and phrases from the search results.
- Searching and mining content
  An enterprise search application provides an interactive
  graphical interface for finding and retrieving specific documents.
  The content analytics miner provides an interactive graphical
  interface for exploring analyzed content to discover relationships
  and anomalies.
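The indexing step above mentions detecting duplicate and nearly duplicate documents. One common, general-purpose technique for this is to compare sets of word shingles with Jaccard similarity; the sketch below uses that technique as a stand-in and is not a description of how the product itself detects duplicates. The function names and the 0.8 threshold are illustrative assumptions.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(
    docs: dict[str, str], threshold: float = 0.8, k: int = 3
) -> list[tuple[str, str, float]]:
    """Compare every pair of documents and report pairs at or above threshold."""
    ids = sorted(docs)
    sigs = {doc_id: shingles(docs[doc_id], k) for doc_id in ids}
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = jaccard(sigs[a], sigs[b])
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs


docs = {
    "a": "the quarterly report shows revenue growth in all regions",
    "b": "the quarterly report shows revenue growth in all regions today",
    "c": "employee handbook covering vacation and travel policy",
}
print(near_duplicates(docs))  # [('a', 'b', 0.875)]
```

This all-pairs comparison grows quadratically with the number of documents, so it only suits small examples; at collection scale, approximate methods such as MinHash with locality-sensitive hashing are typically used instead.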