Add a collection

You can add a collection using the IBM Watson® Explorer Admin Console or the Watson™ Explorer Content Miner.

Click Add Collection on the Collections page of Watson Explorer Admin Console or Watson Explorer Content Miner, and then choose a template on the Collection Template page.

The following default collection types are available.

Content Mining
This is a collection for general content mining. (Version 12.0.1 and later.)
Sentiment Analysis
This is a collection for sentiment analysis. (Version 12.0.1 and later.)
This is a text analytics collection. (Version 12.0.0 only.)
  • For IBM Watson Explorer Admin Console, this is an enterprise search collection type that you can choose for use in IBM Watson Explorer oneWEX Application Builder. (Version 12.0.2 and later.)
  • For Watson Explorer Content Miner, this template does not exist. (Version 12.0.2 and later.)

Other collection templates might be available if you have created new collection templates by deploying rankers.

Applies to version and subsequent versions unless specifically overridden.
If IBM Watson Explorer oneWEX is installed on IBM® Cloud Private with multiple index partitions, the following advanced options are available.

Number of index partitions
Specifies the number of index partitions. To maximize the parallelism of a single query request, set this value to the number of pods for the Discovery service. You can set this option only when you create a collection.
Enable index replication
Select this option to enable a backup replica of the index. Replication improves service availability: if a pod goes down, another pod works as a backup. Note that the required system resources are doubled. You can set this option only when you create a collection.
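To make the effect of the partition count concrete, the following Python sketch shows a generic scatter-gather search over partitioned indexes: each partition is queried in parallel and the partial results are merged. It is an illustration under stated assumptions, not a description of how oneWEX implements partitioning. The CRC32-based document placement, the toy term-count scoring, and all function names are hypothetical, and threads stand in for Discovery service pods.

```python
import heapq
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition_of(doc_id: str, num_partitions: int) -> int:
    """Place a document in a partition using a stable hash (hypothetical scheme)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_partitions


def build_partitions(docs: dict[str, str], num_partitions: int) -> list[dict[str, str]]:
    """Split {doc_id: text} into num_partitions independent toy indexes."""
    partitions: list[dict[str, str]] = [{} for _ in range(num_partitions)]
    for doc_id, text in docs.items():
        partitions[partition_of(doc_id, num_partitions)][doc_id] = text
    return partitions


def search_partition(partition: dict[str, str], term: str) -> list[tuple[int, str]]:
    """Toy scoring: how often the term occurs in each document of one partition."""
    hits = []
    for doc_id, text in partition.items():
        score = text.lower().split().count(term)
        if score:
            hits.append((score, doc_id))
    return hits


def search(partitions: list[dict[str, str]], term: str, k: int = 3) -> list[tuple[int, str]]:
    """Scatter the query to all partitions in parallel, then gather the top k hits."""
    # One worker per partition, standing in for one pod per partition.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partial_results = pool.map(lambda p: search_partition(p, term), partitions)
        hits = [hit for partial in partial_results for hit in partial]
    return heapq.nlargest(k, hits)


docs = {"d1": "disk error on server", "d2": "error error timeout", "d3": "login ok"}
print(search(build_partitions(docs, 2), "error", k=2))  # [(2, 'd2'), (1, 'd1')]
```

With one worker per partition, every partition is searched at the same time, which is the intuition behind matching the partition count to the number of pods. The merged result is the same whatever the partition count; only the amount of parallel work changes.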

After you select a collection type and click Next, provide a name and a description for your collection. The collection creation wizard guides you through the rest of the collection creation process. These steps are described below.

Add a dataset to your collection

You can select an existing dataset from the drop-down list. Alternatively, you can create a new dataset by uploading a CSV file or by crawling the file system.

Upload CSV
For instructions on uploading a CSV file, see Importers.
File System
Before you crawl the file system, you must give IBM Watson Explorer oneWEX access to it. For more information, see Providing access to the local filesystem from Watson Explorer oneWEX. You can select multiple directories to crawl. Subdirectories are also crawled.

After you create a dataset, it is crawled for data. When the crawl completes, you can proceed to the next step.

Configure collection fields

To configure this collection initially, select the title, body, and timestamp fields, which applications typically use, and any metadata fields. For advanced usage, you can configure the fields further after you create the collection.

You can configure the following fields.

Body field
Specifies unstructured text content data to be analyzed. For an analytics collection, the enrichment process enriches this field in order to analyze documents in later stages. For a search collection, the field is tokenized for better search precision.
Title field
Specifies the document title. Document titles are used in various ways in IBM Watson Explorer Content Miner. For example, the Documents view has a Title column. In both analytics and search collections, this field is tokenized for better search precision.
Date field
Specifies the document date. The document date is used as the DATE column in the Documents view, and is also used in time-series based analytics views such as the Time Series, Topic, and Trends views.
Metadata Facets
Select the fields that you want to use as facets for your analysis. You cannot select the body field or the title field. Fields that you select here are treated as facet values and are displayed in the Facet tree. You can use these facet values in Watson Explorer Content Miner analysis views. This step is important because Watson Explorer Content Miner requires facets for text analytics processes.
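The choices on this page amount to assigning dataset columns to roles. The following Python sketch records those choices and checks them; it is illustrative only. The FieldConfig class, the validate function, and the column names are hypothetical and are not part of any Watson Explorer API. The checks encode two statements from this step: the body and title fields cannot be selected as metadata facets, and Content Miner needs facets for text analytics.

```python
from dataclasses import dataclass, field


@dataclass
class FieldConfig:
    """Hypothetical summary of the choices made when configuring collection fields."""
    body: str   # unstructured text to analyze
    title: str  # document title
    date: str   # document date used by time-series views
    facets: list[str] = field(default_factory=list)  # metadata fields used as facets


def validate(config: FieldConfig) -> list[str]:
    """Return a list of problems; an empty list means the choices look consistent."""
    problems = []
    for name in (config.body, config.title):
        if name in config.facets:
            problems.append(f"{name!r} is the body or title field and cannot be a facet")
    if not config.facets:
        # Content Miner requires facets for text analytics, so flag an empty selection.
        problems.append("no metadata facets selected")
    return problems


config = FieldConfig(body="comment", title="subject", date="created", facets=["product", "region"])
print(validate(config))  # []
```

Passing facets=["comment", "region"] with the same body field would report that "comment" cannot be a facet.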

Enrich your collection

This step does not apply to search collections.

Enrichment is the process of generating annotations from unstructured text content. Only existing annotators are listed here, but you can create and apply more later. Enrichments selected here are applied to analyzable text fields (the body and title fields in typical collections).

Annotators
Select the annotators to enable for this collection. Selected annotators enrich the body text content. The Part of Speech annotator is selected by default. For more information, see Annotators.
Classifiers
Select the classifier modules to enable for this collection. Selected classifiers are used to classify results into categories. For more information, see Classifiers.
Language identification
Specify how the language of the text content is determined for the enrichment process. Choose automatic detection or a specific language. The following languages are supported.
  • Arabic, Czech, Danish, German, English, Spanish, French, Hebrew, Italian, Japanese, Korean, Dutch, Polish, Portuguese, Romanian, Russian, Slovak, Turkish, Chinese

Specify the facets for analysis

A facet is a unit of analysis. You analyze the unstructured content with facets and various statistics. Specifying a meaningful label for each facet is important for a successful analysis.

You can review the available facets that were produced by the annotators, classifiers, and metadata fields that you selected in previous steps, and you can modify these facets.

You can specify the default visualization for each facet.

Applies to version 12.0.3 and subsequent versions unless specifically overridden.
Value Filters let you specify a set of regular expressions that filter noisy values out of your analysis results. For example, if the Part of Speech facet contains the value TR and you set the filter TR on that facet, TR is hidden in every analysis result. To configure filters, click Edit... to open the Value Filters dialog, specify a list of regular expressions, and click Apply. Then click Save; clicking Apply alone does not save your settings. The filters use the JavaScript regular expression syntax supported by your browser.
Note: All filtering is done on the client side. If you specify complex filter expressions, performance is not guaranteed.
Note: This option is a simple UI filter. All statistics are calculated including the filtered values; those values are only hidden in the UI after all analysis is done. This behavior differs from the Stop Words function in the Exploration tab.
Note: Value Filters are not applied to descendant or ancestor facets. If you specify a filter on the Part of Speech facet, that filter is not applied to its descendants, such as "Noun" and "Verb". To filter those descendants too, set the same filter on each of them.
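The three notes on Value Filters describe display-time hiding: statistics include every value, and a filter applies only to the facet it is set on. The following Python sketch models that behavior under stated assumptions. It uses Python's re module, whose syntax overlaps with but is not identical to the JavaScript regular expressions the product accepts. It assumes a value is hidden when a pattern matches the whole value, which the text above does not specify, and the function name, the facet path strings, and the sample counts are hypothetical.

```python
import re


def visible_values(facet_path, counts, filters):
    """Return the facet values to display for one facet.

    counts: {value: frequency} computed over all documents; never modified.
    filters: {facet_path: [regex, ...]}; only filters registered for this exact
    path are used, so descendant and ancestor facets do not inherit them.
    """
    patterns = [re.compile(p) for p in filters.get(facet_path, [])]
    return {
        value: n
        for value, n in counts.items()
        # Assumption: a value is hidden when a pattern matches the whole value.
        if not any(p.fullmatch(value) for p in patterns)
    }


filters = {"Part of Speech": ["TR"]}
pos_counts = {"TR": 12, "report": 30, "send": 18}
noun_counts = {"TR": 3, "server": 9}

total = sum(pos_counts.values())  # statistics still include TR: 60
print(visible_values("Part of Speech", pos_counts, filters))        # {'report': 30, 'send': 18}
print(visible_values("Part of Speech/Noun", noun_counts, filters))  # {'TR': 3, 'server': 9}
```

Because the descendant path "Part of Speech/Noun" has no filter of its own, TR stays visible there until the same filter is added for it.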

You can specify a range rule for facets of type Long, Double, or Date. Click Edit next to the facet and modify the range rule. For further details, see Interval Faceting.
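A range rule groups numeric facet values into intervals so that the analysis counts ranges rather than individual numbers. The following Python sketch shows that bucketing for numeric (Long or Double) values. It does not reproduce the product's range rule syntax, which is documented under Interval Faceting. The boundary list, the half-open intervals with an inclusive lower bound, and the function names are assumptions made for illustration.

```python
import math
from bisect import bisect_right
from collections import Counter


def interval_labels(boundaries):
    """Build labels such as '[0, 10)' for the intervals around sorted boundaries."""
    edges = [-math.inf, *boundaries, math.inf]
    return [f"[{lo}, {hi})" for lo, hi in zip(edges, edges[1:])]


def bucket_counts(values, boundaries):
    """Count how many values fall into each interval (lower bound inclusive)."""
    labels = interval_labels(boundaries)
    # bisect_right returns how many boundaries are <= v, which is the interval index.
    counts = Counter(labels[bisect_right(boundaries, v)] for v in values)
    # Keep every interval, including empty ones, in ascending order.
    return {label: counts.get(label, 0) for label in labels}


prices = [3.5, 12.0, 12.0, 47.9, 150.0, -1.0]
print(bucket_counts(prices, [0, 10, 100]))
# {'[-inf, 0)': 1, '[0, 10)': 1, '[10, 100)': 3, '[100, inf)': 1}
```

A value that equals a boundary, such as 10, falls into the interval that starts at that boundary.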

Save your collection

Applies to version 12.0.1 and subsequent versions unless specifically overridden.
You can select Enable Domain Adaptation Curator to facilitate natural language processing in Watson Explorer Content Miner. For more information, see Domain Adaptation Curator.

You can choose what happens after you save your collection. Three choices are available if you create the collection in Watson Explorer Content Miner, but only one is available if you create it in Watson Explorer Admin Console.

Run indexing now
The indexing process starts soon after the collection is created. Available in Watson Explorer Content Miner and Watson Explorer Admin Console.
Open the collection to configure advanced options
Watson Explorer Content Miner opens the edit page for the collection so that you can review or change the collection configuration. To run indexing, select Start Index in the collection card of Watson Explorer Content Miner or the collections page of Watson Explorer Admin Console. Available in Watson Explorer Content Miner only.

Do nothing
The indexing process does not start after the collection is created. To run indexing, select Start Index in the collection card of Watson Explorer Content Miner or the collections page of Watson Explorer Admin Console. Available in Watson Explorer Content Miner only.