IBM® Informix® 12.10

# Fuzzy searches

A fuzzy search searches for text that matches a term closely instead of exactly. Fuzzy searches help you find relevant results even when the search terms are misspelled.

To perform a fuzzy search, append a tilde (~) to the end of the search term. For example, the search term bank~ returns rows that contain tank, benk, or banks.

bts_contains(column, 'bank~')

You can add an optional parameter after the tilde in a fuzzy search to specify the degree of similarity. The value can be between 0 and 1; values closer to 1 require a higher degree of similarity. The default degree of similarity is 0.5, which means that words with a degree of similarity greater than 0.5 are included in the search.
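As a small illustration of the syntax, the following Python sketch builds the search string that is passed as the second argument of bts_contains(). The helper name fuzzy_term is hypothetical and is not part of Informix; the range check assumes that both 0 and 1 are accepted, which the text above does not state explicitly.

```python
def fuzzy_term(term, similarity=None):
    """Return a fuzzy search term such as 'bank~' or 'bank~0.8'.

    Hypothetical helper, not an Informix API: it only formats the string
    that a client would pass to bts_contains().
    """
    if similarity is None:
        # No parameter: the server applies its default degree of similarity.
        return f"{term}~"
    if not 0 <= similarity <= 1:
        # Assumption: the bounds 0 and 1 are treated as inclusive here.
        raise ValueError("similarity must be between 0 and 1")
    return f"{term}~{similarity}"


print(fuzzy_term("bank"))       # bank~
print(fuzzy_term("bank", 0.8))  # bank~0.8
```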

The degree of similarity between a search term and a word in the index is determined by using the following formula:

similarity = 1 - (edit_distance / min(len(term), len(word)))

The edit distance between the search term and the indexed word is calculated by using the Levenshtein distance (edit distance) algorithm. The min() function returns the smaller of the two len() values, which are the lengths of the search term and the indexed word. The following table shows the values used to calculate similarity and the resulting similarity between the search term "tone" and various indexed words.

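Table 1. Sample set of comparisons

| Term | Length of term | Word | Length of word | Edit distance | Similarity |
|------|----------------|------|----------------|---------------|------------|
| tone | 4 | tone | 4 | 0 | 1.00 |
| tone | 4 | ton | 3 | 1 | 0.67 |
| tone | 4 | tune | 4 | 1 | 0.75 |
| tone | 4 | tones | 5 | 1 | 0.75 |
| tone | 4 | once | 4 | 2 | 0.50 |
| tone | 4 | tan | 3 | 2 | 0.33 |
| tone | 4 | two | 3 | 3 | 0.00 |
| tone | 4 | terrible | 8 | 6 | -0.50 |
| tone | 4 | fundamental | 11 | 9 | -1.25 |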
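
The formula can be checked with a short Python sketch. It uses the standard dynamic-programming form of the Levenshtein distance and assumes non-empty strings. The function names levenshtein and similarity are illustrative only, and the code is not the server's implementation.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def similarity(term, word):
    """similarity = 1 - edit_distance / min(len(term), len(word))"""
    return 1 - levenshtein(term, word) / min(len(term), len(word))


words = ["tone", "ton", "tune", "tones", "once", "tan", "two",
         "terrible", "fundamental"]
for word in words:
    print(f"{word:<12} {levenshtein('tone', word)} {similarity('tone', word):.2f}")
```

Because the divisor is the length of the shorter string, the result becomes negative whenever the edit distance exceeds that length, as in the last two rows of the table.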

For example, the following query uses the default degree of similarity, so it matches words whose similarity to the search term tone is greater than 0.50:

bts_contains(text, 'tone~')

This query returns rows that contain these words: tone, ton, tune, and tones. Rows that contain the word once are not included because the degree of similarity for once is exactly 0.50, not greater than 0.50. The following query would include the rows that contain the word once:

bts_contains(text, 'tone~0.49')

Tip: Test the behavior of specifying the degree of similarity with your data before you rely on it in your application.
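
The effect of the threshold can be sketched in Python by using the similarity values from Table 1. The sketch assumes the strict greater-than comparison described above; the function name matching_words is hypothetical, and actual server behavior should still be verified against your data.

```python
# Similarity values for the search term "tone", taken from Table 1.
similarities = {
    "tone": 1.00, "ton": 0.67, "tune": 0.75, "tones": 0.75,
    "once": 0.50, "tan": 0.33, "two": 0.00,
}


def matching_words(scores, threshold=0.5):
    """Keep words whose similarity is strictly greater than the threshold."""
    return [word for word, score in scores.items() if score > threshold]


print(matching_words(similarities))        # ['tone', 'ton', 'tune', 'tones']
print(matching_words(similarities, 0.49))  # ['tone', 'ton', 'tune', 'tones', 'once']
```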

If the number of indexed tokens that match your fuzzy query exceeds 1024, you receive the following error:

(BTSB0) - bts clucene error: Too Many Clauses

To solve this problem, you can make the query more restrictive or you can recreate the bts index with the max_clause_count index parameter set to a number greater than 1024.
