Technical Blog Post
Add your insights to user queries - Query expansion and document ranking with IBM Content Analytics with Enterprise Search V3.0
Jane Singer works in the IBM Content Classification development team in the Software Labs in Jerusalem, Israel. Jane leads the L3 support team for Content Classification. She has also worked in the OmniFind Enterprise Search L2, QA for Case Manager and Content Navigator mobile teams. She has written multiple developerWorks articles and is the author of IBM Classification Module: Make It Work for You.
Users formulate search queries using terms they are familiar with. Documents, however, typically present concepts in fairly formal terminology, while users who search for those concepts tend to use informal language. There is no way to predict exactly how a user will formulate a search.
One approach to handling concept synonyms is to build synonym dictionaries and add the terms to the index. This approach requires re-indexing and can be expensive in terms of both time and index size.
In the upcoming IBM Redbooks publication, IBM Content Analytics with Enterprise Search: Discovering Actionable Insight from Your Content, I included step-by-step instructions for techniques to tune your search strategy.
IBM Content Analytics with Enterprise Search (ICAwES) V3.0 has new functionality that lets you add query expansions at runtime, with no need for re-indexing. Each expansion consists of two rules: a rule for matching and a rule for expansion.
Matching rules can be based on keywords configured within the rule, regular expressions, or dictionary lookups. Each dictionary listing contains a preferred term and any number of synonyms, acronyms, etc.
Expansion rules can either supplement the search with additional terms (for example, “monitor or screen”) or replace the matched term with the result of a dictionary lookup, a regular expression, or a keyword.
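To make the two-rule idea concrete, here is a minimal Python sketch of dictionary-based matching and expansion at query time. It is an illustration only and is not the ICAwES API or configuration format: the `DICTIONARY` contents, the `lookup` and `expand_query` functions, and the `OR` output syntax are all assumptions invented for this example, and it covers only the dictionary-lookup case, not keywords or regular expressions.

```python
# Hypothetical synonym dictionary: preferred term -> synonyms.
# The entries are made up for illustration.
DICTIONARY = {
    "monitor": ["screen", "display"],
    "notebook": ["laptop"],
}


def lookup(term):
    """Matching rule: return [preferred, *synonyms] if the term is in any entry."""
    t = term.lower()
    for preferred, synonyms in DICTIONARY.items():
        if t == preferred or t in synonyms:
            return [preferred] + synonyms
    return []


def expand_query(query, replace=False):
    """Expansion rule: supplement matched terms, or replace them.

    replace=False keeps the user's term and ORs in the other variants.
    replace=True swaps the matched term for the preferred term.
    """
    expanded = []
    for token in query.split():
        variants = lookup(token)
        if not variants:
            expanded.append(token)  # no match, leave the term untouched
        elif replace:
            expanded.append(variants[0])  # preferred term only
        else:
            user_term = token.lower()
            alternatives = [user_term] + [v for v in variants if v != user_term]
            expanded.append("(" + " OR ".join(alternatives) + ")")
    return " ".join(expanded)


print(expand_query("broken screen"))
print(expand_query("broken screen", replace=True))
```

Because the rewriting happens on the query string rather than on the indexed documents, changing the dictionary in a sketch like this takes effect on the next search without touching the index, which is the property the runtime expansion feature is described as providing.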
Some search results may contain many irrelevant documents, or “noise.” Document ranking can improve the precision of query results by pushing less relevant results onto pages that the user is unlikely ever to open.
A document can receive a boost score based on a range of ranking factors:
- How the expansion rule is matched
- The fields where the match occurs (matches in the title field may be more relevant than matches in the abstract field)
- Aggregation: groups can be prioritized according to independent ranking (PDF documents ranked higher than text files)
Document ranking is based not only on relevancy to the user search, but also on the quality and usefulness of the documents.
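The ranking factors listed above can be sketched as a simple multiplicative boost. This is a hedged illustration, not how ICAwES computes scores: the `Hit` class, the `boost` and `rank` functions, and every weight in the three tables are hypothetical values chosen only to show how match type, field, and document group might combine.

```python
from dataclasses import dataclass

# Hypothetical weights, invented for this example.
MATCH_BOOST = {"original": 1.0, "expanded": 0.5}  # how the expansion rule matched
FIELD_BOOST = {"title": 2.0, "abstract": 1.0}     # where the match occurred
TYPE_BOOST = {"pdf": 1.5, "txt": 1.0}             # document group priority


@dataclass
class Hit:
    doc_id: str
    match_kind: str  # "original" user term or "expanded" term
    field: str       # field that contained the match
    doc_type: str    # group used for aggregation-style prioritization


def boost(hit):
    """Combine the three factors; unknown values fall back to a neutral 1.0."""
    return (
        MATCH_BOOST.get(hit.match_kind, 1.0)
        * FIELD_BOOST.get(hit.field, 1.0)
        * TYPE_BOOST.get(hit.doc_type, 1.0)
    )


def rank(hits):
    """Order hits so that the highest boost comes first."""
    return sorted(hits, key=boost, reverse=True)


hits = [
    Hit("a", "original", "abstract", "txt"),
    Hit("b", "expanded", "title", "pdf"),
    Hit("c", "original", "title", "pdf"),
]
print([h.doc_id for h in rank(hits)])
```

In this sketch a title match in a PDF on the user's original term outranks the same match reached through an expanded term, which in turn outranks an abstract match in a text file. Whether factors should multiply or add, and what the weights should be, is a tuning decision for each collection.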
For more information, see the “Query expansion” chapter of the upcoming IBM Redbooks publication.
For blog posts related to IBM Content Analytics with Enterprise Search, see:
- Unlock the hidden value of your unstructured content and gain new business insight: Really ? For us ?
- Identifying meaningful token with IBM Content Analytics Studio
- Add your insights to user queries - Query expansion and document ranking
- IBM Content Analytics - Import and Export OOTB (Out Of the Box)
For the IBM Redbooks publication, see:
- IBM Content Analytics with Enterprise Search: Discovering Actionable Insight from Your Content (coming soon!)