Fuzzy Grouping

Selecting the option Accommodate spelling for a minimum root character limit of: in the Extraction Settings dialog enables the fuzzy grouping algorithm.

Fuzzy grouping helps to group commonly misspelled or closely spelled words. During the extraction process, it temporarily strips all vowels (except for the first vowel) and double or triple consonants from the extracted terms and then compares the stripped forms. If the stripped forms are the same, the original terms are grouped together in the final extraction list, under the term that occurs most frequently in the data.
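The stripping and grouping steps described above can be sketched in a few lines of Python. This is only an illustration, not the product's actual implementation: the names fuzzy_key and group_terms are hypothetical, the order of the two stripping steps (repeated consonants first, then vowels) is an assumption, and the minimum root character limit is not modeled.

```python
import re
from collections import defaultdict

VOWELS = set("aeiou")

def fuzzy_key(word):
    """Return a comparison key for a word (hypothetical helper).

    Assumption: repeated consonants are collapsed first, then every
    vowel after the first one is dropped.
    """
    # Collapse double or triple consonants to a single consonant.
    word = re.sub(r"([^aeiou])\1+", r"\1", word.lower())
    kept = []
    seen_vowel = False
    for ch in word:
        if ch in VOWELS:
            if seen_vowel:
                continue  # drop every vowel after the first
            seen_vowel = True
        kept.append(ch)
    return "".join(kept)

def group_terms(term_counts):
    """Group terms that share a key under the most frequent term."""
    buckets = defaultdict(list)
    for term in term_counts:
        buckets[fuzzy_key(term)].append(term)
    groups = {}
    for terms in buckets.values():
        lead = max(terms, key=lambda t: term_counts[t])
        groups[lead] = sorted(terms)
    return groups

if __name__ == "__main__":
    counts = {"color": 5, "colour": 2, "furniture": 3, "furnature": 1}
    print(group_terms(counts))
```

In this sketch, colour is grouped under color because both reduce to the same key and color has the higher count in the sample data.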

Note: If the two terms being compared are assigned to different types, excluding the <Unknown> type, then the fuzzy grouping technique is not applied to this pair. In other words, the terms must belong to the same type, or at least one of them must belong to the <Unknown> type, in order for the technique to be applied.
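One way to read the type rule in this note is as a simple check performed before two terms are compared. The function name may_compare and the sample type names other than <Unknown> are illustrative assumptions, not part of the product.

```python
UNKNOWN = "<Unknown>"

def may_compare(type_a, type_b):
    """Return True if fuzzy grouping may be applied to a pair of terms.

    Assumed reading of the rule: the pair qualifies when both terms have
    the same type, or when at least one of them has the <Unknown> type.
    """
    return type_a == type_b or UNKNOWN in (type_a, type_b)

if __name__ == "__main__":
    # "<Location>" and "<Person>" are placeholder type names.
    print(may_compare("<Location>", "<Location>"))
    print(may_compare("<Location>", UNKNOWN))
    print(may_compare("<Location>", "<Person>"))
```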

If you enabled this feature and found that two words with similar spelling were incorrectly grouped together, you may want to exclude them from fuzzy grouping. You can do this by entering the incorrectly matched pairs into the Exceptions section in the Advanced Resources tab. See the topic About Advanced Resources for more information.

The following example demonstrates how fuzzy grouping is performed. If fuzzy grouping is enabled, these words appear to be the same and are matched in the following manner:

	color  -> colr                 mountain -> montn
	colour -> colr                 montana  -> montn

	modeling  -> modlng            furniture -> furntr
	modelling -> modlng            furnature -> furntr

In the preceding example, you would most likely want to exclude mountain and montana from being grouped together. Therefore, you could enter them in the Exceptions section in the following manner:

	mountain      montana

Important! In some cases, fuzzy grouping exceptions do not stop two words from being paired because certain synonym rules are also being applied. If this happens, try entering synonyms using the exclamation mark wildcard (!) to prevent the words from becoming synonymous in the output. See the topic Defining Synonyms for more information.

Formatting Rules for Fuzzy Grouping Exceptions

• Define only one exception pair per line.

• Use simple or compound words.

• Use only lowercase characters for the words. Uppercase words will be ignored.

• Use a TAB character to separate the two words in a pair.
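The formatting rules above can be checked with a small validation sketch such as the following. It is a hypothetical helper, not the product's parser: the name parse_exceptions is invented for illustration, and it assumes that a line containing any uppercase character, or a line that does not hold exactly two TAB-separated entries, is skipped.

```python
def parse_exceptions(text):
    """Return the set of valid exception pairs found in the given text.

    Assumptions: one pair per line, the two entries separated by a single
    TAB, lowercase only. Lines that break these rules are skipped.
    """
    pairs = set()
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) != 2:
            continue  # not exactly one pair on this line
        left, right = (part.strip() for part in parts)
        if not left or not right:
            continue  # one side of the pair is empty
        if left != left.lower() or right != right.lower():
            continue  # entries with uppercase characters are ignored
        # A frozenset makes the pair order-independent.
        pairs.add(frozenset((left, right)))
    return pairs

if __name__ == "__main__":
    sample = "mountain\tmontana\nColor\tcolour\nno separator here\n"
    print(parse_exceptions(sample))
```

With the sample input, only the mountain/montana line passes: the second line contains an uppercase character and the third has no TAB separator. Compound words containing spaces are accepted because only the TAB is treated as a separator.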