Search 'n' Rescue: Spellcheck (Did you mean?...)
From time to time, users will search for terms that are misspelled, causing no results to be returned for their search. In this scenario, we will attempt to perform spellchecking on the search terms used to see if we can find similar terms to use instead. If spellcheck is not tuned properly, then you will see either bad suggestions or no suggestions returned by spellcheck, making it difficult for users to find what they're looking for. To start, we will first review the different types of spellcheck available.
Types of Spellcheck
The first thing to understand is which type of spellcheck is being used, as each type determines the terms available for spellcheck in a different way.
To determine which one you're using, you can check the wc_spellcheck searchComponent definition in solrconfig.xml for the CatalogEntry search core. Here is an example from an OOTB solrconfig.xml configuration:
<searchComponent name="wc_spellcheck" class="solr.SpellCheckComponent">
<!-- name of the type on which basis SpellChecker query will be analyzed -->
Each of the alternate options has its own pros and cons, depending on how your Search functionality is used. For example, if you only want a select few words available for spellchecking, it may be better to use FileBasedSpellChecker, which restricts the spellcheck terms to what is defined in spelling.txt. However, you will then need to maintain a file of available spellcheck terms, which can be an excessive amount of work if your catalog changes regularly. Similarly, with IndexBasedSpellChecker, you will need to make sure that the spellcheck index is rebuilt to keep it consistent, which adds to the time that indexing takes for the CatalogEntry core. For more information on these alternative spellchecking options, you can review Solr's documentation on Spell Checking.
The rest of this post will assume that you are using the out-of-box Commerce Search spellcheck (DirectSolrSpellChecker). Once you have confirmed the spellcheck being used, the next step is to test spellcheck against the index.
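As a rough aid for confirming which spellchecker a core is configured with, the sketch below reads a solrconfig.xml fragment and lists the classname of each spellchecker under wc_spellcheck. The sample XML is an assumption modeled on the usual Solr layout (a `lst name="spellchecker"` holding a `str name="classname"`), not a copy of the out-of-box file, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed sample, modeled on the standard Solr spellchecker layout.
# Your real solrconfig.xml will contain more settings than this.
SAMPLE_CONFIG = """
<config>
  <searchComponent name="wc_spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spellCorrection</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
    </lst>
  </searchComponent>
</config>
"""


def spellchecker_classes(config_xml, component="wc_spellcheck"):
    """Return the classname of every spellchecker in the named component.

    A spellchecker without an explicit classname is reported as None.
    """
    root = ET.fromstring(config_xml)
    classes = []
    for comp in root.iter("searchComponent"):
        if comp.get("name") != component:
            continue
        for checker in comp.findall("lst[@name='spellchecker']"):
            node = checker.find("str[@name='classname']")
            classes.append(node.text.strip() if node is not None else None)
    return classes


if __name__ == "__main__":
    print(spellchecker_classes(SAMPLE_CONFIG))
```

To run this against a real configuration, read the file contents from your CatalogEntry core's solrconfig.xml and pass them in place of the sample string.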
While you can test spellcheck through the storefront, you won't be able to troubleshoot spellchecking issues this way unless you collect the tracing requested in the Search Runtime MustGather. If you have access to query the CatalogEntry core directly, you can quickly get information about your spellchecking scenarios. For example, let's say you want to make sure that coffeee will return coffee as a spellcheck suggestion. You can execute the following direct search query to confirm this:
Here, you can see that coffee is successfully returned as a spellcheck suggestion, so you now know that scenario works. However, let's say that you try this with hockee to make sure that hockey is returned (as you have recently added hockey equipment to the catalog):
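For readers who want to script these direct checks, here is a minimal sketch that builds a spellcheck query URL for a given term. The host, port, and core name are placeholders that you would replace with your own values, and the sketch assumes the core's /select handler has the spellcheck component attached. The spellcheck parameters are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Assumed values: replace with your own Search server and core.
SOLR_BASE = "http://localhost:3737/solr"   # hypothetical host and port
CORE = "MC_10001_CatalogEntry_en_US"       # hypothetical core name


def spellcheck_url(term, base=SOLR_BASE, core=CORE):
    """Build a direct /select URL that asks only for spellcheck suggestions."""
    params = {
        "q": term,
        "rows": 0,                  # no documents needed, only suggestions
        "spellcheck": "true",
        "spellcheck.q": term,
        "spellcheck.count": 5,
        "spellcheck.collate": "true",
        "wt": "json",
    }
    return f"{base}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Print one URL per test term; open them in a browser or with curl.
    for term in ("coffeee", "hockee"):
        print(spellcheck_url(term))
```

Keeping rows at 0 makes the response small, so the spellcheck section is easy to spot.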
Do I have the expected spellcheck suggestion in the index?
To confirm that you have the expected spellcheck suggestion in the index, you can simply query against the spellCorrection field. This is because the spellCorrection field is the one that the spellcheck component will be querying against to find possible suggestions. Continuing the hockey example, you can query against the spellCorrection field for the term like so:
Here, we can see that we have 25 catalog entries that contain hockey as a possible spellcheck suggestion. If you had 0 returned here for your given spellcheck suggestion, then you would have no chance to get that spellcheck suggestion returned, as the spellcheck component only considers terms available in spellCorrection as possible spellcheck suggestions. Now that you have confirmed that hockey does exist in the index, the next step should be to compare this result against the total number of documents in the index to see if we are hitting our minimum/maximum frequency ratio.
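The spellCorrection check can be scripted in the same way. The sketch below builds a field query for a term and reads the hit count from a Solr JSON response. The base URL and core name are placeholders, and the sample response is hand-written to mirror the 25 hits in the hockey example, not captured from a real server.

```python
from urllib.parse import urlencode


def spell_correction_url(term, base, core):
    """Build a /select URL that counts documents whose spellCorrection
    field contains the given term."""
    params = {"q": f"spellCorrection:{term}", "rows": 0, "wt": "json"}
    return f"{base}/{core}/select?{urlencode(params)}"


def num_found(solr_json):
    """Return the total hit count from a parsed Solr JSON response."""
    return solr_json["response"]["numFound"]


if __name__ == "__main__":
    # Hypothetical host, port, and core name.
    print(spell_correction_url("hockey",
                               "http://localhost:3737/solr",
                               "MC_10001_CatalogEntry_en_US"))
    # Hand-written sample that mirrors the hockey example.
    sample = {"response": {"numFound": 25, "start": 0, "docs": []}}
    print(num_found(sample))
```

A count of 0 here means the term cannot be suggested at all, so there is no point tuning the frequency ratios until the term is in the index.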
Spellcheck Frequency Ratio
In the CatalogEntry core's solrconfig.xml, we have two frequency ratios set for spellcheck:
<!-- maximum percentage of documents in which word can appear for the word to be considered as one to correct (0.01 value means 1%) -->
The maxQueryFrequency is the upper bound: as the configuration comment says, a word that appears in more than this fraction of documents is considered too common to be one to correct. The thresholdTokenFrequency is the lower bound, preventing terms that are too rare from being used as a spellcheck suggestion. These default ratios may not make sense for your catalog structure, so you will want to tune these values so that the suggestions you expect are returned. For example, say you check the total index size and see that it is 1.2 million catalog entries:
If we now check the ratio of hockey against the total index size, we get 25/1,200,000 ≈ 0.00002, which is definitely smaller than the default thresholdTokenFrequency. If you lower thresholdTokenFrequency to 0.000001 and restart the Search server, you should now see hockey returned as a spellcheck suggestion. In general, the larger your catalog, the more likely it is that you will need to reduce thresholdTokenFrequency to pick up your expected suggestions. Similarly, you may also need to increase maxQueryFrequency so that fewer terms are excluded from spellchecking. If you followed these steps for directly testing your spellcheck suggestions, but the suggestions are still not returned as expected on your storefront, there may be an issue with the spellcheck query parameters being used by the storefront.
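The ratio check above is simple arithmetic, and a small helper makes it easy to repeat for other terms. In this sketch the value 0.0001 is only an assumed stand-in for the configured threshold, so read the real value from your own solrconfig.xml. The plain comparison also approximates how Solr applies the threshold internally, so treat the result as a guide.

```python
def term_ratio(doc_count, total_docs):
    """Fraction of documents in the index that contain the term."""
    if total_docs <= 0:
        raise ValueError("total_docs must be positive")
    return doc_count / total_docs


def meets_threshold(doc_count, total_docs, threshold_token_frequency):
    """True if the term is frequent enough to be offered as a suggestion
    (simplified comparison against thresholdTokenFrequency)."""
    return term_ratio(doc_count, total_docs) >= threshold_token_frequency


if __name__ == "__main__":
    ASSUMED_THRESHOLD = 0.0001    # assumption, not read from any config
    LOWERED_THRESHOLD = 0.000001  # the lowered value used in this post

    ratio = term_ratio(25, 1_200_000)
    print(f"hockey ratio: {ratio:.8f}")
    print("passes assumed threshold:",
          meets_threshold(25, 1_200_000, ASSUMED_THRESHOLD))
    print("passes lowered threshold:",
          meets_threshold(25, 1_200_000, LOWERED_THRESHOLD))
```

With the numbers from the hockey example, the term fails the assumed threshold and passes the lowered one, which matches the behaviour described above.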
Storefront spellcheck not working still?
If you collect the Search Runtime MustGather, you can look at the Final Solr query trace line to confirm the query parameters being used:
[12/27/16 15:56:17:485 EST] 0000008d SolrRESTSearc 1 com.ibm.commerce.foundation.server.services.rest.search.processor.solr.SolrRESTSearchExpressionProcessor performSearch(SelectionCriteria) Final Solr query expression: q=coffeee&rows=0&spellcheck.count=5&spellcheck=true&spellcheck.collate=true&spellcheck.collateExtendedResults=true&spellcheck.onlyMorePopular=false&spellcheck.accuracy=0.5&spellcheck.alternativeTermCount=5&spellcheck.maxResultsForSuggest=3&spellcheck.maxCollations=5&spellcheck.maxCollationTries=10&spellcheck.q=coffeee
You can then review the different spellcheck parameters to confirm if there are any unexpected settings, which you can then compare against the set configurations in your CatalogEntry core's solrconfig.xml as well as wc-component.xml (Search_eardir/xml/config/com.ibm.commerce.catalog/). For example, if spellcheck.accuracy is set higher than you expect, then you should update the SpellCheckAccuracy in wc-component.xml to match your expectations. For more information on the different spellcheck parameters available in Commerce Search, you can review the Spell Correction section for the Knowledge Center page on wc-component.xml configurations.
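When the trace line is long, it can help to split it into individual parameters before comparing them with your configuration. The following sketch pulls the spellcheck parameters out of a Final Solr query expression line. The sample is a shortened copy of the trace shown above, and the function name is hypothetical.

```python
from urllib.parse import parse_qsl

MARKER = "Final Solr query expression: "

# Shortened copy of the trace line shown in this post.
SAMPLE_TRACE = (
    "[12/27/16 15:56:17:485 EST] 0000008d SolrRESTSearc 1 "
    "performSearch(SelectionCriteria) Final Solr query expression: "
    "q=coffeee&rows=0&spellcheck.count=5&spellcheck=true"
    "&spellcheck.collate=true&spellcheck.accuracy=0.5"
    "&spellcheck.maxResultsForSuggest=3&spellcheck.q=coffeee"
)


def spellcheck_params(trace_line):
    """Return the spellcheck-related query parameters from a trace line.

    Returns an empty dict if the marker text is not present.
    """
    _, found, query = trace_line.partition(MARKER)
    if not found:
        return {}
    pairs = parse_qsl(query.strip(), keep_blank_values=True)
    return {key: value for key, value in pairs
            if key == "spellcheck" or key.startswith("spellcheck.")}


if __name__ == "__main__":
    for key, value in sorted(spellcheck_params(SAMPLE_TRACE).items()):
        print(f"{key} = {value}")
```

Printing the parameters one per line makes an unexpected value, such as a high spellcheck.accuracy, easier to spot.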
If you have any helpful tips for troubleshooting spellcheck issues, or if you have any questions about the information provided, do not hesitate to post a comment below!