Add a ranker

You can create a ranker with the Create Ranker wizard.

The wizard steps are described below.

Create Ranker

Specify the name and description of the ranker. Select the ranker type from the drop-down list.

Add a dataset to your collection

You can select an existing dataset from the drop-down list. Alternatively, you can create a new dataset by uploading a CSV file or by crawling the file system.

Upload CSV
For instructions on uploading a CSV file, see Importers.
File System
Before crawling the file system, you must give IBM Watson® Explorer oneWEX access to it. For more information, see Providing access to the local filesystem from Watson Explorer oneWEX. You can select multiple directories to crawl. Subdirectories are also crawled.

After you create a dataset, the dataset is crawled for data. When the crawl has completed, you can proceed to the next step.

Training Data

Applies to version 12.0.2.2 and subsequent versions unless specifically overridden.
Specify the collection that provides the training data for the classifier or ranker. Select an existing collection or select *New collection* to create a new collection. The collection should include ground truth labels.
Note: A collection that is selected here cannot be deleted until the classifier or ranker that uses it is deleted.

Similar Document Ranking Setting

Specify fields for machine learning training and prediction. The following fields are required.

Answer Field
Specifies the field to use as the answer field.
Answer Field Type
Specifies the answer field type.
Attribute Type
The answer field contains a list of types.
ID Type
The answer field contains a list of similar document IDs.
Collection Template
Specifies the name and description of the collection template that the ranker generates. You must specify this template to create a collection that uses this ranking.
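As an illustration of the ID Type setting, the following Python sketch writes a small training CSV file in which one column holds the ground truth list of similar document IDs. This is a minimal sketch under stated assumptions, not a format defined by the product: the column names (doc_id, title, body, similar_ids) and the semicolon list separator are hypothetical, so check them against your importer settings before you upload a file.

```python
import csv
import io

# Hypothetical column names. In the wizard, you would select
# "similar_ids" as the Answer Field and "ID Type" as its type.
FIELDNAMES = ["doc_id", "title", "body", "similar_ids"]

# Assumption: multiple IDs in one cell are joined with ";".
# The separator that your importer expects may differ.
LIST_SEPARATOR = ";"


def build_training_csv(documents):
    """Return CSV text with one row per document and its similar document IDs."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for doc in documents:
        writer.writerow({
            "doc_id": doc["doc_id"],
            "title": doc["title"],
            "body": doc["body"],
            "similar_ids": LIST_SEPARATOR.join(doc["similar_ids"]),
        })
    return buffer.getvalue()


# Made-up sample documents for illustration only.
documents = [
    {"doc_id": "D1", "title": "Printer jam", "body": "Paper is stuck in tray 2.",
     "similar_ids": ["D2", "D3"]},
    {"doc_id": "D2", "title": "Paper feed error", "body": "The tray does not feed paper.",
     "similar_ids": ["D1"]},
    {"doc_id": "D3", "title": "Tray misaligned", "body": "Tray 2 does not close.",
     "similar_ids": ["D1"]},
]

csv_text = build_training_csv(documents)
print(csv_text)
```

The resulting text can be saved to a file and uploaded in the Upload CSV step. The csv module quotes any cell that contains a comma, so body text with commas stays in a single column.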

Configure collection fields

To initially configure this collection, select the title, body, and timestamp fields, which are typically used by applications, and the metadata fields. For advanced use, you can further configure the fields after you create the collection.

You can configure the following fields.

Body field
Specifies the unstructured text content to be analyzed. For an analytics collection, the enrichment process enriches this field so that documents can be analyzed in later stages. For a search collection, the field is tokenized for better search precision.
Title field
Specifies the document title. Document titles are used in various ways in IBM Watson Explorer Content Miner. For example, the Documents view has a Title column. In both analytics and search collections, this field is tokenized for better search precision.
Date field
Specifies the document date. The document date is used in the Documents view as the DATE column, and is also used in time series-based analytics views such as the Time Series, Topic, and Trends views.
Metadata Facets
Select the fields that you want to use as facets for your analysis. You cannot select the body field or the title field. Fields selected here are treated as facet values and are displayed in the Facet tree. You can use these facet values in Watson™ Explorer Content Miner analysis views. This step is important because Watson Explorer Content Miner requires facets for text analytics processes.
Note: Whether or not you select these fields, IBM Watson Explorer oneWEX will use all metadata facet fields.
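The selection rules above can be checked before you start the wizard, for example when you prepare a field list for a new dataset. The following Python sketch is only an illustration: the function name, its parameters, and the sample field names are hypothetical and are not part of any Watson Explorer API. It encodes one rule from this step, namely that the body field and the title field cannot be selected as metadata facets.

```python
def validate_field_selection(body_field, title_field, date_field, metadata_facets):
    """Return a list of problems with a planned field selection.

    All names are hypothetical. The only rule taken from the wizard is
    that the body and title fields cannot be used as metadata facets.
    """
    problems = []
    reserved = {body_field, title_field}
    for facet in metadata_facets:
        if facet in reserved:
            problems.append(
                f"'{facet}' is the body or title field and cannot be a metadata facet"
            )
    # A duplicated facet is not an error in the wizard as far as the
    # source states, but it usually indicates a mistake in the plan.
    if len(set(metadata_facets)) != len(metadata_facets):
        problems.append("metadata facets contain duplicates")
    return problems


# A valid plan: no facet overlaps with the body or title field.
ok = validate_field_selection("body", "title", "created", ["product", "region"])

# An invalid plan: "title" is also listed as a metadata facet.
bad = validate_field_selection("body", "title", "created", ["title", "region"])

print(ok)
print(bad)
```

The first call returns an empty list, and the second returns one message that names the offending field.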

Enrich your collection

Enrichment is a process to generate annotations from unstructured text content. Only existing annotations are listed here, but you can create and apply more later. Enrichments selected here are applied to analyzable text fields (body and title fields in typical collections).

Annotators
Select annotators to be enabled for this collection. Selected annotators enrich the body text content. The Part of Speech annotator is selected by default. For more information, see Annotators.
Classifiers
Select classifier modules to be enabled for this collection. Selected classifiers are used to classify results into categories. For more information, see Classifiers.
Language identification
Specify how the language of the text content is determined for the enrichment process. Choose automatic detection or a specific language. The following languages are supported.
  • Arabic, Czech, Danish, German, English, Spanish, French, Hebrew, Italian, Japanese, Korean, Dutch, Polish, Portuguese, Russian, Slovak, Turkish, Chinese

Specify the facets for analysis

A facet is a unit of analysis. You analyze the unstructured content with facets and various statistics. Specifying a meaningful label for each facet is important for a successful analysis.

You can review the available facets that were produced by the annotators, classifiers, and metadata fields that you selected in previous steps. You can modify these facets.

Confirm

Confirm the configuration. If you want to change any settings, go back to the relevant wizard step and modify it.