The IBM® Netezza® SQL language supports two fuzzy string search functions: Levenshtein Edit Distance and Damerau-Levenshtein Edit Distance.
A fuzzy string search is a form of approximate string matching that is based on defined techniques or algorithms. These functions compare two strings to show how similar or different they are. These functions support VARCHAR and CHAR data types (Latin9 encoding) only.
<int4 value> = le_dst(<str_expr_1>, <str_expr_2>)
The return value indicates how different the two input strings are calculated according to the Levenshtein edit distance algorithm. A value of 0 indicates that the strings are equivalent without any modifications. The algorithm computes the number of modifications that are required to change the first string into the second string. The strings are case-sensitive. A modification is a change such as an addition, deletion, letter case-change, or substitution of a single character.
For example, le_dst('sow','show') returns a value of 1 (the addition of the character h); le_dst('hello','Hollow') returns a value of 3 (the substitution of e for o, the capitalization of H, and the addition of w).
Because the string comparisons are case-sensitive, you can use functions such as upper() and lower() to change the letter casing of strings before the comparison and ignore case-change modifications. For example, select le_dst('Smith','SMYTH') returns a value of 4 (three uppercase letter changes and a letter substitution). The function select le_dst(upper('Smith'),'SMYTH') returns a value of 1 (the I/Y letter substitution).
<int4 value> = dle_dst (<str_expr_1>, <str_expr_2>)
The value that is returned indicates how different the two input strings are calculated according to the Damerau-Levenshtein edit distance algorithm. The strings are case-sensitive. Similar to the Levenshtein algorithm, a modification is a change such as an addition, deletion, letter case-change, or substitution of a single character. However, in the Damerau-Levenshtein algorithm, a character transposition change such as ‘two’ to ‘tow’ counts as one change, not two. A value of 0 indicates that the strings are equivalent without any modifications. Similar to the le_dst() function, the string comparisons are case-sensitive; you can use functions such as upper() and lower() to change the case of the input strings before the comparison and ignore case-change modifications.