Overview

Important: For ease of use, you should use the Terminology Manager module of the Results Module to create Thesaurus-Driven semantic expansions. Data-Driven semantic expansion is still configured in the manner described in this tutorial.
Note: Ontolections that are created by using the methods described in this tutorial (crawling a taxonomy or data-driven conceptual search) are not associated with any Results Module environment. This means that, when specified in a project, the ontolection will be used for semantic expansion regardless of the environment settings for that installation of Watson™ Explorer Engine.

Helping users find the "right" search results is the ultimate goal of any application, and depends both on what users are looking for and the queries that they use to try to find that information. To improve the quality and usability of search results, Watson Explorer Engine has always enabled application developers to influence the prominence (ranking) of selected search results, and has introduced mechanisms such as clustering that make it easier for users to identify and explore related sets of results. However, finding the right results still boils down to asking the right questions - which, in the search field, means starting with the right queries and query terms.

Conceptual search techniques make it easy for users to expand their queries by automatically suggesting related terms to add to those queries. This provides an easy-to-use but sophisticated mechanism for discovering search results that users might otherwise have missed because of their initial choice of query terms.

When conducting a search, the users of an application typically have one of two goals in mind: either they are trying to find a specific piece of information, or they are doing general research on a topic. In information retrieval terms, such users are interested, respectively, in either the precision of their results or the completeness of those results (commonly referred to as recall). Standard web search mechanisms focus on locating specific documents (precision), while conceptual search focuses on expanding the number of potentially relevant results that are returned. This increases recall but inherently reduces precision because search results are being returned based on matching any of a larger set of terms.
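The trade-off between precision and recall can be made concrete with a small calculation. The following is a minimal sketch, not part of Watson Explorer Engine: the function name and the document ids are hypothetical, and the numbers are chosen only to illustrate how an expanded query can raise recall while lowering precision.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a set of returned document ids."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    # Precision: fraction of returned documents that are relevant.
    precision = hits / len(returned) if returned else 0.0
    # Recall: fraction of relevant documents that were returned.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four documents are relevant to the user's need.
relevant = {1, 2, 3, 4}
# The original query returns two documents, both relevant.
original = {1, 2}
# The expanded query returns more documents, two of them not relevant.
expanded = {1, 2, 3, 5, 6}

p_orig, r_orig = precision_recall(original, relevant)
p_exp, r_exp = precision_recall(expanded, relevant)
print(f"original: precision={p_orig:.2f} recall={r_orig:.2f}")
print(f"expanded: precision={p_exp:.2f} recall={r_exp:.2f}")
```

In this made-up example, the expanded query finds one more relevant document (recall rises from 0.50 to 0.75) at the cost of two irrelevant ones (precision falls from 1.00 to 0.60).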

Note: When using query expansion with Microsoft Internet Explorer, long expansions can exceed Internet Explorer's URL character limit (2,083 characters). Contact product support for assistance with this issue.

There are three types of search expansion available through the Watson Explorer Engine application. These types are defined as:

  • Semantic Expansion -- when enabled, the Watson Explorer Engine allows automatic and manual expansion of the original query with synonyms and other related terms previously imported into an ontolection.
  • Wildcard Expansion -- when enabled, the Watson Explorer Engine replaces a wildcard term in a query with an OR'd combination of the words that match the wildcard pattern from all the dictionaries used by the project.
  • Stem Expansion -- when enabled, the Watson Explorer Engine replaces a stem-expanded word with an OR'd combination of the words having the same stem, using the stemmer(s) specified in the meta.stem_expand_stemmer variable.
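
The idea of replacing a term with an OR'd combination of matching dictionary words can be sketched in a few lines of Python. This is an illustration only and makes several assumptions: the dictionary is a hypothetical in-memory word set, the stemmer is a toy suffix stripper rather than any stemmer shipped with Watson Explorer Engine, and the "OR" output string is not the product's actual query syntax.

```python
import fnmatch


def wildcard_expand(pattern, dictionary):
    """Replace a wildcard term with an OR'd list of matching dictionary words."""
    matches = sorted(w for w in dictionary if fnmatch.fnmatchcase(w, pattern))
    return " OR ".join(matches) if matches else pattern


def toy_stem(word):
    """Hypothetical stemmer: strip one common English suffix, if present."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def stem_expand(term, dictionary):
    """Replace a term with an OR'd list of dictionary words sharing its stem."""
    stem = toy_stem(term)
    matches = sorted(w for w in dictionary if toy_stem(w) == stem)
    return " OR ".join(matches) if matches else term


# Hypothetical dictionary contents.
dictionary = {"search", "searches", "searching", "searched", "seam", "query"}

print(wildcard_expand("sea*", dictionary))
print(stem_expand("searching", dictionary))
```

Note that the wildcard pattern sea* also matches "seam", while stem expansion of "searching" only picks up words that reduce to the stem "search". This shows why the two expansion types can return different term sets for similar input.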

The section entitled Stemming Expanded Query describes how to integrate stemming and semantic query expansion. More information on wildcard and stem expansion can be found in the Defining the Outputs for a Custom Dictionary section of this manual. This tutorial will concentrate on semantic expansion.

Watson Explorer Engine provides two approaches to semantic expansion:

  • Thesaurus-Driven: Watson Explorer Engine enables you to leverage information that you already have about synonyms, acronyms, and generally related terms and offer these as additional terms to use in subsequent queries. Many corporations maintain an enterprise dictionary or thesaurus that identifies business-specific terminology, synonyms, acronyms, and so on. Watson Explorer Engine applications can incorporate this time-tested, domain-specific information to provide a smarter and more site-specific search experience. Because the terms in a thesaurus have already been determined to be relevant to an enterprise or application domain, selecting any of the additional query terms that thesaurus-driven conceptual search offers should have little impact on the precision of individual queries.
  • Data-Driven: Watson Explorer Engine can automatically identify related terms and concepts from a domain collection or from a domain list of industry-specific concepts, and these terms and concepts can then be applied to a given search.

Thesaurus-driven and data-driven conceptual search can be used separately or together in any application, providing opportunities for expanding recall through pre-defined terms, automatically identified terms, or both. Related terms that are revealed by these conceptual search techniques are displayed in a pop-up dialog that provides an easy way of selecting and de-selecting the additional terms that you want to factor into your search. This dialog also makes it easy for end users to iteratively repeat their search with the new terms to help them find what they are looking for. Because each new search combines newly selected terms with existing query terms, this technique is often generically referred to as query expansion.
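
The select-and-repeat cycle described above can be sketched as two small functions: one that suggests related terms for a query, and one that combines the terms a user selected with the original query. This is a minimal sketch under stated assumptions: the thesaurus is a hypothetical in-memory mapping rather than an ontolection, the function names are invented for illustration, and the parenthesized OR syntax is not necessarily the query syntax that Watson Explorer Engine generates.

```python
# Hypothetical thesaurus: each term maps to related terms or synonyms.
thesaurus = {
    "laptop": ["notebook", "portable computer"],
    "tv": ["television"],
}


def suggest_terms(query, thesaurus):
    """Return the related terms that could be offered for each query term."""
    suggestions = {}
    for term in query.lower().split():
        if term in thesaurus:
            suggestions[term] = list(thesaurus[term])
    return suggestions


def expand_query(query, selected):
    """Combine the original query terms with the terms the user selected."""
    parts = []
    for term in query.split():
        extras = selected.get(term.lower(), [])
        if extras:
            # Quote multi-word terms so they stay together as a phrase.
            alternatives = [term] + [f'"{e}"' if " " in e else e for e in extras]
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            parts.append(term)
    return " ".join(parts)


query = "laptop battery"
offered = suggest_terms(query, thesaurus)
# Suppose the user keeps only "notebook" from the offered terms.
expanded = expand_query(query, {"laptop": ["notebook"]})
print(offered)
print(expanded)
```

Because the expanded query is itself an ordinary query string, the same two steps can be repeated on each new search, which mirrors the iterative use of the dialog.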

Adding enterprise data such as a thesaurus to a Watson Explorer Engine application is done by crawling that data to produce a special type of search collection that can be used internally by Watson Explorer Engine to identify and suggest related or alternate terms. The term ontology is commonly used in computer science to refer to modeling a domain of knowledge by identifying a set of basic terms and defining the relationships between them. Because the special type of search collection used by Watson Explorer Engine conceptual search support is based on a set of related terms that are specific to the domain of an application or enterprise, this type of search collection is referred to as an ontolection.

Note: Before working through this tutorial, you should already have worked through the Watson Explorer Engine Metadata Tutorial and the Watson Explorer Engine Refinement Tutorial.

This tutorial explains how to begin integrating Watson Explorer Engine conceptual search capabilities into your applications. This tutorial:

  • introduces the portions of the Watson Explorer Engine administration tool that you can use to configure and customize a basic thesaurus-driven conceptual search
  • introduces the portions of the Watson Explorer Engine administration tool that you can use to configure and customize a basic data-driven conceptual search
Note: The outline provided in the following section is intended as a general checklist, not as a complete list of steps for this task. The complete list of steps is in the tutorial section following the overview.

To proceed to the next section of this tutorial, click Prerequisites for This Tutorial.