Edit a collection

You can edit an existing collection and modify the fields, enrichment, and facets that were defined when the collection was created. You can also specify options for exploring the collection.

You can open a collection for editing in IBM Watson® Explorer Admin Console or in IBM Watson Explorer Content Miner. The editing screen has six tabs where you can modify the collection. There is also a sidebar that displays information about the collection.

Edit

You can modify the name and description of the collection. You can also set the time zone of the collection. This setting is used in date-related analysis.

Applies to version 12.0.2 and subsequent versions unless specifically overridden. If this is a secured collection, you can disable pre-filtering or post-filtering of search results.

Ignore document-level access controls in the index: Disable pre-filtering of search results.
Do not validate current credentials before returning results: Disable post-filtering of search results.

For more information, see Document-level security.

Applies to version 12.0.1 and subsequent versions unless specifically overridden. You can view, but not change, the setting of the Domain Adaptation Curator.

Applies to version 12.0.2.1 and subsequent versions unless specifically overridden. If IBM Watson Explorer oneWEX is installed on IBM® Cloud Private with multiple index partitions, you can view, but not change, the following advanced options.

Number of index partitions: Specifies the number of index partitions.
Enable index replication: Enables a backup index replica.

Fields

You have more control over fields when editing a collection than you did when you created the collection. You can specify field indexing options in the Field indexing option table. The table columns are described here.

Fields

The dataset field name.

Field type

The dataset field type.

Index type

A drop-down list of index types. The allowable types depend on the field type.

For example, when the Field type is String, the Index type options are:

String - case sensitive, exact match
Analyzable text content - case normalized, fuzzy match
Tokenized text - case normalized, text is tokenized (broken up into meaningful elements, or tokens)
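
The difference between these three index types can be sketched in a few lines of Python. This is only an illustration of the matching behaviors named above, not the product's analyzers: the function names are hypothetical, and "fuzzy match" is approximated here with a character-similarity cutoff from difflib, which is an assumption because this topic does not describe how the fuzzy matching is done.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and break it into word tokens."""
    return re.findall(r"\w+", text.lower())


def exact_match(field_value, query):
    """String: case sensitive, the whole value must be identical."""
    return field_value == query


def token_match(field_value, query):
    """Tokenized text: case normalized, every query token must be present."""
    field_tokens = set(tokenize(field_value))
    return all(token in field_tokens for token in tokenize(query))


def fuzzy_match(field_value, query, cutoff=0.8):
    """Analyzable text content: case normalized, near matches are accepted.

    The similarity cutoff is a stand-in chosen for this sketch only.
    """
    field_tokens = tokenize(field_value)
    return all(
        any(SequenceMatcher(None, q, t).ratio() >= cutoff for t in field_tokens)
        for q in tokenize(query)
    )


value = "Engine Failure Report"
print(exact_match(value, "engine failure report"))  # False: case differs
print(token_match(value, "FAILURE engine"))         # True: tokens match after lowercasing
print(token_match(value, "failures"))               # False: no identical token
print(fuzzy_match(value, "failures"))               # True: close to "failure"
```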

Free text searchable

Makes the field available for a free-text search.

Metadata facet

Specifies whether to use the field as a metadata facet. Not applicable when the index type is Analyzable text content or Tokenized text.

You can enable or disable n-gram segmentation.

You can modify the title and date fields. You can only view the body field.

Enrichment

The Enrichment tab is identical to the enrichment step when creating a collection.

Facet

The Facet tab is identical to the facet step when creating a collection.

Exploration

You can configure options to improve the precision of search results, ensuring that the most relevant results are ranked higher in the result set, and to customize query results by associating dictionaries.

Ranker

You can select a ranker from the drop-down list. For more information, see Rankers.

Natural language query

Improve the user search experience by using Natural Language Processing. The following options are available.

Disjunctify threshold

The minimum number of annotations required for the disjunctify strategy to be applied.

Maximum number of annotations

The maximum number of annotations taken into account when building the query.

Annotation conversion strategies

Specify how each annotation in the original query is converted. Click Edit to open the Conversion Strategy Configuration dialog. The following strategies are available:

Original text: Converts the annotation into a multi-term query. Searches for documents that contain the terms in the annotated span.
Refinement: Converts the annotation into a facet refinement query. Searches for documents that have the same annotation as the annotated query.
Facet value: Converts the span into a multi-term query. Searches for documents that contain the terms in the annotation's facet value.
Phrase: Converts the annotation span into a phrase query. Searches for documents that contain the exact sequence of terms in the annotated span.
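
The practical difference between a multi-term query (Original text) and a phrase query (Phrase) can be shown with a small Python sketch. The helper names are hypothetical, and the sketch assumes that a multi-term query requires all of its terms to be present; this topic does not state whether the terms are combined with AND or OR.

```python
import re


def tokenize(text):
    """Lowercase the text and break it into word tokens."""
    return re.findall(r"\w+", text.lower())


def matches_terms(document, span):
    """Multi-term: every term of the span occurs somewhere in the document.

    Treating the terms as an AND is an assumption made for this sketch.
    """
    doc_tokens = set(tokenize(document))
    return all(term in doc_tokens for term in tokenize(span))


def matches_phrase(document, span):
    """Phrase: the terms occur in the document in the same order, adjacent."""
    doc = tokenize(document)
    phrase = tokenize(span)
    n = len(phrase)
    return n > 0 and any(doc[i:i + n] == phrase for i in range(len(doc) - n + 1))


document = "The pump failure caused an oil leak"
print(matches_terms(document, "oil pump"))   # True: both terms are present
print(matches_phrase(document, "oil pump"))  # False: the terms are not adjacent
print(matches_phrase(document, "oil leak"))  # True: exact sequence found
```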

Blacklist words

A list of noise words to remove from the query. The blacklist filters terms out of natural language queries.

Whitelist words

A list of words that are candidates for query terms.

Stop words

You can use a stop words dictionary to filter query terms out of normal queries. Click Upload to upload the dictionary, which is a UTF-8-encoded text file with one stop word on each line.
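
A stop words dictionary in this format can be produced with a short script. The following Python sketch writes one stop word per line in UTF-8; the function names, the file name stop_words.txt, and the sample words are hypothetical.

```python
import tempfile
from pathlib import Path


def render_stop_words(words):
    """Return the file content: one stop word per line, blanks and repeats dropped."""
    unique = []
    for word in words:
        word = word.strip()
        if word and word not in unique:
            unique.append(word)
    return "".join(f"{word}\n" for word in unique)


def write_stop_words(words, path):
    """Write the dictionary as a UTF-8-encoded text file."""
    Path(path).write_text(render_stop_words(words), encoding="utf-8")


# Example: write to a temporary directory and show the result.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "stop_words.txt"  # hypothetical file name
    write_stop_words(["the", "a", "an", " the "], target)
    print(target.read_text(encoding="utf-8"))
```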

Synonym

Upload a synonym list file. Each line is a set of comma-delimited words that are synonyms. The file must be encoded as UTF-8.
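
As a sketch of the file layout, the following Python code builds a synonym list with one comma-delimited group per line. The function name and sample words are hypothetical. The sketch writes commas without surrounding spaces and rejects words that themselves contain a comma; whether the product tolerates spaces or supports escaping is not stated in this topic.

```python
def render_synonyms(groups):
    """Return the file content: one comma-delimited synonym group per line."""
    lines = []
    for group in groups:
        words = [word.strip() for word in group if word.strip()]
        if any("," in word for word in words):
            raise ValueError("A synonym cannot contain a comma")
        if len(words) >= 2:  # a single word has no synonyms to list
            lines.append(",".join(words))
    return "".join(f"{line}\n" for line in lines)


content = render_synonyms([
    ["car", "automobile", "vehicle"],
    ["tv", "television"],
])
print(content)

# Save with UTF-8 encoding before uploading, for example:
# open("synonyms.txt", "w", encoding="utf-8").write(content)
```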

Spotlight

Matches user query text to a map of top results, configured using an elevate.xml file. For more information, see The Query Elevation Component.
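
The Query Elevation Component is an Apache Solr feature, and its elevate.xml file maps query text to the document IDs that should appear at the top of the results. The following Python sketch generates a file in that Solr layout (an elevate root, query elements with a text attribute, and doc elements with an id attribute). The function name, query text, and document IDs are hypothetical, and it is an assumption that the collection accepts the file exactly as Solr defines it.

```python
import xml.etree.ElementTree as ET


def build_elevate_xml(elevations):
    """Build elevate.xml content from a {query text: [document IDs]} mapping."""
    root = ET.Element("elevate")
    for query_text, doc_ids in elevations.items():
        query = ET.SubElement(root, "query", {"text": query_text})
        for doc_id in doc_ids:
            ET.SubElement(query, "doc", {"id": doc_id})
    ET.indent(root)  # pretty-print; requires Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


# Hypothetical query text and document IDs.
print(build_elevate_xml({"annual report": ["doc-001", "doc-002"]}))
```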

Document relevancy score

Modify the query relevancy score of documents by boosting specified fields. Click Edit, select a field from the drop-down list, and then select a boost factor for that field.

If the query contains no field search prefixes, all the fields configured here are searched.

Training of machine learning models

Select machine-learning models. You can select Vector representation of words or Document recommendation on collaboration activities.

Document flags

Applies to version 12.0.1 and subsequent versions unless specifically overridden. You can enable and create document flags on the Document flags tab. After you have created flags, you can apply them to documents in IBM Watson Explorer Content Miner. For more information, see Documents view.

The Document flags tab displays existing flags, which you can edit or delete. To add a flag, click Add flag. The Add flag dialog opens, where you can name the new flag, add a description, and set the flag color.

Index rebuilding after modifying a collection

When you change a collection and click Save, either a Rebuild Index dialog or a Warning dialog might open. These are described below.

Rebuild Index

You will see this dialog if you modify metadata facets or enrichment options. You have three choices.

Cancel: Saving of the collection is canceled.
No: The collection is saved, but the modified settings are applied only to documents that are processed after the indexer process is updated with the saved settings. To update the indexer explicitly, stop and then restart it.
Yes: The collection is saved and all documents are reevaluated and indexed with the new settings.

Warning

You will see this dialog if you modify the index type or sortable options, or enable n-gram segmentation. You can either cancel the save or select Restart a full index build and click OK. If you choose the full index build, the collection is saved, all the indexed data is erased, and IBM Watson Explorer starts re-creating the indexes for all of the documents.
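
The relationship between the settings you change and the dialog that opens can be summarized as a lookup. The following Python sketch encodes only what is described above; the setting keys and the function name are hypothetical labels, not product identifiers. It returns every dialog that applies, because this topic does not say which dialog opens when both kinds of setting are changed in one save.

```python
# Hypothetical keys for the settings named in this topic.
REBUILD_INDEX_TRIGGERS = {"metadata_facet", "enrichment"}
WARNING_TRIGGERS = {"index_type", "sortable", "ngram_segmentation"}


def dialogs_after_save(changed_settings):
    """Return the names of the dialogs that the changed settings call for."""
    changed = set(changed_settings)
    dialogs = set()
    if changed & REBUILD_INDEX_TRIGGERS:
        dialogs.add("Rebuild Index")  # Cancel, No, or Yes
    if changed & WARNING_TRIGGERS:
        dialogs.add("Warning")        # cancel, or restart a full index build
    return dialogs


print(dialogs_after_save(["enrichment"]))   # {'Rebuild Index'}
print(dialogs_after_save(["index_type"]))   # {'Warning'}
print(dialogs_after_save(["description"]))  # set()
```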