Fuzzy string search functions
The Netezza Performance Server SQL language supports two fuzzy string search functions: Levenshtein Edit Distance and Damerau-Levenshtein Edit Distance.
A fuzzy string search is a form of approximate string matching that is based on defined techniques or algorithms. These functions compare two strings to show how similar or different they are. These functions support VARCHAR and CHAR data types (Latin9 encoding) only.
Levenshtein edit distance
le_dst(<str_expr_1>, <str_expr_2>)The return value is an integer (int4) that indicates how different the two input strings are calculated according to the Levenshtein edit distance algorithm. A value of 0 indicates that the strings are equivalent without any modifications. The algorithm computes the number of modifications that are required to change the first string into the second string. The strings are case-sensitive. A modification is a change such as the addition, deletion, case change, or substitution of a single character.
le_dst('sow','show')returns a value of 1 (the addition of the character h).le_dst('hello','Hollow')returns a value of 3 (the substitution of e for o, the capitalization of H, and the addition of w).
select le_dst('Smith','SMYTH')returns a value of 4 (three uppercase letter changes and the i-to-Y letter substitution).select le_dst(upper('Smith'),'SMYTH')returns a value of 1 (the I-to-Y letter substitution).
Damerau-Levenshtein edit distance
dle_dst (<str_expr_1>, <str_expr_2>)- A modification is a change such as the addition, deletion, case change, or substitution of a single character.
- The return value is an integer (int4) that indicates the number of modifications required to change the first string into the second string.
- A value of 0 indicates that the strings are equivalent without any modifications.
- The string comparisons are case-sensitive.
le_dst('two','tow')returns a value of 2 (changing the w to an o and the o to a w).dle_dst('two','tow')returns a value of 1 (transposing the w and o).