IBM Support

Configuring Natural Language Query capabilities in Watson Explorer Analytical Components

Fix Readme


Abstract

This document describes how to configure Natural Language Query (NLQ) capabilities in Watson Explorer Analytical Components, introduced in Version 11.0.2 Fix Pack 1.

Content

Enabling Natural Language Query (NLQ) in Analytical Components

Follow these steps to enable Natural Language Query (NLQ) capabilities in Watson Explorer Analytical Components as an administrator.

  1. Log in to the administrative console and create a collection. Both enterprise search collections and content analytics collections support NLQ.
  2. Import or crawl data into the collection and make sure the data is ready for search or analysis.
  3. Run esadmin check to verify that the nlqservice.node1 session is started. If the session is not started, run esadmin nlqservice.node1 start. In a distributed environment, start the session on each search node, for example esadmin nlqservice.node2 start.
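
In a distributed environment, the start commands from step 3 can be generated per node. The following Python sketch only builds the command lines quoted above and prints them for review. It assumes that search nodes are numbered consecutively as node1, node2, and so on, which the source implies but does not state, and the helper name start_commands is hypothetical.

```python
def start_commands(node_count):
    """Build 'esadmin nlqservice.nodeN start' for nodes 1..node_count.

    Assumption: search nodes are numbered consecutively from node1.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [
        ["esadmin", f"nlqservice.node{n}", "start"]
        for n in range(1, node_count + 1)
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    print("esadmin check")
    for cmd in start_commands(2):
        print(" ".join(cmd))
```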

Performing NLQ on Content Miner

Follow these steps to run a natural language query in the Search UI or Content Miner UI after the nlqservice session is started and the collection is ready for analysis.

  1. Open the Search UI or Content Miner UI, and select the NLQ check box next to the search box.
  2. Enter a natural language query that suits the data in the collection, and click the Start to search button. For example, enter "what can be a problem caused by oversensing of noise" to analyze car accident reports.
  3. After the result appears on the screen, check how the natural language query was converted. For example, "what can be a problem caused by oversensing of noise" is converted to "can OR problem OR caused OR oversensing OR noise".

Notes:
  • The AND operator is converted to the OR operator when the number of remaining keywords is higher than the threshold. The default threshold value is 5.
  • The threshold value is configurable. See Configuring Query Modifier for detailed steps.
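
The operator switch described in the notes can be pictured with a short Python sketch. It is an illustration, not the product's implementation, and the function name join_keywords is hypothetical. The notes say "higher than", yet the example query leaves five keywords and is converted with OR at the default threshold of 5, so the sketch assumes the comparison includes the threshold itself. That boundary behavior is an inference, not documented behavior.

```python
DEFAULT_THRESHOLD = 5  # default threshold value stated in the notes


def join_keywords(keywords, threshold=DEFAULT_THRESHOLD):
    """Join keywords with OR once their count reaches the threshold.

    Below the threshold the keywords are space-separated, which is how
    this document writes an AND query.
    Assumption: the comparison is inclusive (>=), inferred from the example.
    """
    if len(keywords) >= threshold:
        return " OR ".join(keywords)
    return " ".join(keywords)


if __name__ == "__main__":
    remaining = ["can", "problem", "caused", "oversensing", "noise"]
    print(join_keywords(remaining))                # default threshold
    print(join_keywords(remaining, threshold=10))  # raised threshold
```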

Configuring Query Modifier

Query Modifier behavior is configured in the querymodifier.yml file. Follow these steps to change it.

  1. Open the $ES_NODE_ROOT/master_config/nlqservice/querymodifier.yml file.
  2. Edit the file to customize the behavior, and save the changes. For detailed examples, see Query Modifier Configuration examples.
  3. Run esadmin config sync to apply the change.
  4. Run esadmin nlqservice.node1 stop and then esadmin nlqservice.node1 start to restart the session. In a distributed environment, restart the session on each search node.
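
Steps 3 and 4 have to run in order, and step 4 repeats on every search node. The Python sketch below lists the commands in that order using only the command lines quoted in the steps. The helper name restart_plan is hypothetical, and consecutive node numbering (node1, node2, and so on) is an assumption.

```python
def restart_plan(node_count):
    """Return the commands from steps 3 and 4, in order, for all search nodes.

    Assumption: search nodes are numbered consecutively from node1.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    plan = [["esadmin", "config", "sync"]]  # step 3: apply the edited file
    for n in range(1, node_count + 1):      # step 4: restart each session
        session = f"nlqservice.node{n}"
        plan.append(["esadmin", session, "stop"])
        plan.append(["esadmin", session, "start"])
    return plan


if __name__ == "__main__":
    # Print the plan for review rather than executing it.
    for cmd in restart_plan(2):
        print(" ".join(cmd))
```

To execute the plan instead of printing it, each entry could be passed to subprocess.run(cmd, check=True) on a server where esadmin is on the path.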

Query Modifier Configuration examples

The following examples show how to configure Query Modifier in step 2 of Configuring Query Modifier.

Configuring Disjunctify Parse Strategy

Follow these steps to change the Disjunctify Parse Strategy configuration that converts AND operators into OR operators if the operator has more terms than the threshold.

  1. Find the disjunctify: section in the querymodifier.yml file.
  2. Change minimumRequiredTerms: from 5 to the value that you want. In this example, change the value from 5 to 10.

With this configuration, for example:
  • The query "what can be a problem caused by oversensing of noise" is converted to "can problem caused oversensing noise" using the AND operator, because the number of remaining keywords (5) is not more than the threshold (10).

Configuring Part of Speech Based Removal Parse Strategy

Follow these steps to change the Part of Speech Noiseword Removal configuration, which removes words based on their part-of-speech (POS) tag.

  1. Identify the POS tags to remove. See https://www.ibm.com/support/knowledgecenter/en/SSPT3X_4.2.0/com.ibm.swg.im.infosphere.biginsights.aqlref.doc/doc/parts-of-speech.html for the list of POS tags.
  2. Find the noisePOSTags: section in the querymodifier.yml file.
  3. Add the POS tag to the default: section. In this example, MD is the POS tag for modal verbs, so add MD.

With this configuration, for example:
  • The query "what can be a problem caused by oversensing of noise" is converted to "problem caused oversensing noise". The word "can" is removed because it is a modal verb in this query.
  • The query "The entire unit including the spray can was evaluated" is converted to "entire unit including the spray can evaluated". The word "can" is kept because it is a noun, not a modal verb, in this query.
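
The difference between the two results comes down to the tag attached to each occurrence of "can". The Python sketch below illustrates the filtering idea only. The POS tags are hand-assigned for illustration (the product tags the query itself), the noise tag set is hypothetical apart from MD, and the function name remove_noise_pos is not part of the product.

```python
def remove_noise_pos(tagged_tokens, noise_pos_tags):
    """Drop tokens whose POS tag is in noise_pos_tags, keeping the order."""
    return [word for word, tag in tagged_tokens if tag not in noise_pos_tags]


# Hand-assigned tags for illustration (assumption, not product output).
modal_query = [
    ("what", "WP"), ("can", "MD"), ("be", "VB"), ("a", "DT"),
    ("problem", "NN"), ("caused", "VBN"), ("by", "IN"),
    ("oversensing", "NN"), ("of", "IN"), ("noise", "NN"),
]
noun_phrase = [("spray", "NN"), ("can", "NN"), ("was", "VBD"), ("evaluated", "VBN")]

# Hypothetical tag set; the real entries live under noisePOSTags: in querymodifier.yml.
noise_tags = {"WP", "MD", "VB", "VBD", "DT", "IN"}

if __name__ == "__main__":
    print(remove_noise_pos(modal_query, noise_tags))  # "can" tagged MD is dropped
    print(remove_noise_pos(noun_phrase, noise_tags))  # "can" tagged NN is kept
```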

Configuring Dictionary-Based Removal Parse Strategy

Follow these steps to remove words based on a dictionary that you configure.

  1. Find the noiseWords: section in the querymodifier.yml file.
  2. Add a word that you want to remove from the query. In this example, add "caused".

With this configuration, for example:
  • The query "what can be a problem caused by oversensing of noise" is converted to "problem oversensing noise", because "caused" is listed as a word to remove.
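
Dictionary-based removal amounts to a membership test against the configured word list. A minimal Python sketch follows, assuming the keywords have already passed part-of-speech removal. The function name remove_noise_words is hypothetical, and case-insensitive matching is an assumption rather than documented behavior.

```python
def remove_noise_words(keywords, noise_words):
    """Drop keywords that appear in the noise-word dictionary.

    Assumption: matching ignores case.
    """
    blocked = {word.lower() for word in noise_words}
    return [k for k in keywords if k.lower() not in blocked]


if __name__ == "__main__":
    remaining = ["problem", "caused", "oversensing", "noise"]
    print(" ".join(remove_noise_words(remaining, ["caused"])))
```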

Configuring NLQ keyword expansion with Ontolection

Follow these steps to set up NLQ keyword expansion with the Ontolection trainer.

1. Preparing Training Data for Ontolection

Prepare the training data that you want to use. Make sure that the system is configured to map the text data to an analyzable field.

2. Creating Ontolection Model

Run the Java program from the command line to create the Ontolection model, and wait until the model is created. The syntax of the file paths depends on the operating system.

$ES_INSTALL_ROOT/_jvm/jre/bin/java -jar $ES_INSTALL_ROOT/lib/ontlectiontrainer.jar -trainOntolection -corpus <training_data> -pear $ES_INSTALL_ROOT/packages/pears/<language>.pear -persistModel <model_output_location> -outputPath <output_location>

For example,

$ES_INSTALL_ROOT/_jvm/jre/bin/java -jar $ES_INSTALL_ROOT/lib/ontlectiontrainer.jar -trainOntolection -corpus ./FDA-Sense.csv -pear $ES_INSTALL_ROOT/packages/pears/en.pear -persistModel ./myOntolectionModel -outputPath ./myOntolection

Notes:

  • The processing time varies with the size of the training data, from minutes to several days.
  • The created ontolection file (not the ontolection model) shows which keywords were learned as semantically similar. In this example, check the "myOntolection" file.
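
Because the trainer command is long and the path syntax differs by platform, it can help to assemble the argument list programmatically. The Python sketch below uses only the jar name and options shown in the command above. The function name trainer_command and its parameters are hypothetical, and the sketch does not run the trainer.

```python
import os


def trainer_command(corpus, language, model_out, ontolection_out, install_root=None):
    """Assemble the Ontolection trainer command line as an argument list.

    install_root defaults to the ES_INSTALL_ROOT environment variable.
    """
    root = install_root or os.environ["ES_INSTALL_ROOT"]
    return [
        os.path.join(root, "_jvm", "jre", "bin", "java"),
        "-jar", os.path.join(root, "lib", "ontlectiontrainer.jar"),
        "-trainOntolection",
        "-corpus", corpus,
        "-pear", os.path.join(root, "packages", "pears", f"{language}.pear"),
        "-persistModel", model_out,
        "-outputPath", ontolection_out,
    ]


if __name__ == "__main__":
    cmd = trainer_command(
        "./FDA-Sense.csv", "en", "./myOntolectionModel", "./myOntolection",
        install_root="/opt/es",  # placeholder install location for the demo
    )
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting problems when paths contain spaces.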

3. Configuring Ontolection

Follow these steps to configure Ontolections after the ontolection model is created.

  1. Copy the created ontolection model to a directory, such as $ES_NODE_ROOT/esdata/nlq/myOntolectionModel.
  2. Open the $ES_NODE_ROOT/master_config/nlqservice/querymodifier.yml file.
  3. Find the ontolections: section, and add a <collection id>: <path to the ontolection model> entry under the wordEmbeddingsMap: section, for example:

    ontolections:
      wordEmbeddingsMap:
        col_12345: /home/esadmin/esdata/nlq/myOntolectionModel

  4. Run esadmin config sync to apply the change.
  5. Run esadmin nlqservice.node1 stop and then esadmin nlqservice.node1 start to restart the session. In a distributed environment, restart the session on each search node.

Notes:
  • Place the ontolection model in a directory other than $ES_NODE_ROOT/esdata/master_config/<collection id>.indexservice, even though the comment in the $ES_NODE_ROOT/master_config/nlqservice/querymodifier.yml file recommends that directory.
  • In a distributed environment, if the directory specified in the $ES_NODE_ROOT/master_config/nlqservice/querymodifier.yml file is not shared across the search nodes, place the ontolection model manually on each search node. All servers share the same querymodifier.yml file, so the model must be at the same path on every node.

Confirming NLQ keyword expansion result with Ontolection on Content Miner

Follow these steps to confirm the NLQ keyword expansions with Ontolection on Content Miner.

  1. Open the Content Miner UI.
  2. Search documents with NLQ and click "Show advanced" next to the search box.
  3. Open the "Query Expansion" tab.

In the "Query Expansion" tab, the "Ontolection" suggestion type indicates that the suggestion comes from Ontolection.

Configuring Query Keyword Suggestions with Ontolection

Follow these steps to configure query keyword suggestions with Ontolection as an administrator.

  1. Log in to the administrative console as an administrator, select a content analytics collection, and select "Search quality management" in the search server configuration.
  2. Select "Edit configurations" in the Action menu on the "Query Configuration" tab.
  3. In the "Edit Configuration" dialog, change the "Search result threshold for spelling correction:" value from 100 to the value that you want, and click the "Apply" button.

Notes:
  • Query keywords suggested by Ontolection are displayed as the "spelling correction" type.
  • Spelling correction keywords are displayed when the number of search results is lower than the threshold. The default threshold value is 100. For example, when the number of search results is 1000 and the threshold is 100, the query keywords suggested by Ontolection are not displayed.
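
The display rule in the second note is a single comparison. A minimal Python sketch of the rule as stated follows. The function name suggestions_displayed is hypothetical and the code is not taken from the product.

```python
DEFAULT_SPELLING_THRESHOLD = 100  # default value stated in the notes


def suggestions_displayed(result_count, threshold=DEFAULT_SPELLING_THRESHOLD):
    """Return True when the result count is lower than the threshold."""
    return result_count < threshold


if __name__ == "__main__":
    print(suggestions_displayed(1000))  # the example in the notes: not displayed
    print(suggestions_displayed(50))    # fewer results than the threshold
```

Raising the threshold in the "Edit Configuration" dialog therefore makes the suggestions appear for queries that return more results.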

Product: IBM Watson Explorer
Business Unit: IBM Software w/o TPS
Platform: AIX, Linux, Windows
Version: 11.0.2
Line of Business: Data and AI

Document Information

Modified date:
17 June 2018

UID

swg22004851