The QualityStage user guide (Chapter 11, Match Comparisons) explicitly states that the CNT_DIFF comparison type is only for numeric data.
For example, see:
http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r5/index.jsp?topic=/com.ibm.swg.im.iis.qs.ug.doc/topics/logarithm.html (CNT_DIFF is listed in table 2, "Match comparisons that apply to numbers").
However, I tried this comparison with alphabetic strings and it works without any problem, exactly as it does for numeric strings. For example, the pairs AAAAAAAAAAAAAAAAAAAA / AAAAAAAAAAAAAAAAAbcd (alpha) and 11111111111111111111 / 11111111111111111234 (numeric) produce the same weight, 1.6.
Is it safe to use CNT_DIFF for non-numeric strings, or is this a documentation bug?
What algorithm does the CNT_DIFF comparison use (edit distance, Jaro-Winkler, ...)?
CNT_DIFF also works for non-numeric strings (documentation bug)?
Updated on 2011-01-28T17:36:37Z by OlegT.
Re: CNT_DIFF also works for non-numeric strings (documentation bug)? 2011-01-28T17:06:30Z
P.S. In my test I used the match command CNT_DIFF with Param1=5.
P.S. Is there a way to edit my messages on this forum?
smithha (23 posts)
Re: CNT_DIFF also works for non-numeric strings (documentation bug)? 2011-01-28T17:30:10Z
- OlegT
It's more a documentation nuance: in "Compares two strings of numbers", 'strings' includes character data (char, varchar, etc.).
CNT_DIFF is designed to test for keystroke errors, and it does evaluate strings, including strings of numeric or alphabetic values. It is most commonly used on numeric or date strings, because those strings are more likely to contain only keystroke errors. If you have data such as license or part 'numbers' that contain mixed alphanumeric characters, those are also good candidates for CNT_DIFF.
For most text data that has any freeform aspect to it (names, products, addresses, descriptions, etc.) I would not use CNT_DIFF.
The algorithms used in the Match Comparisons are proprietary to IBM.
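Since the real CNT_DIFF algorithm is proprietary, the following is only a hypothetical sketch of why a keystroke-error comparison behaves the same on alpha and numeric strings. It swaps in optimal string alignment distance (restricted Damerau-Levenshtein: each substitution, insertion, deletion, or adjacent transposition counts as one error) as the error counter, plus a linear weight proration by a tolerance modeled on Param1. The function names `osa_distance` and `prorated_weight`, the proration rule, and the weights 4.0 / -2.0 are all assumptions for illustration, not QualityStage's API or its actual weighting.

```python
# Hypothetical sketch only: NOT IBM's proprietary CNT_DIFF algorithm.
# Optimal string alignment (restricted Damerau-Levenshtein) distance is
# swapped in as a stand-in "keystroke error" counter.


def osa_distance(a: str, b: str) -> int:
    """Count edits (substitution, insertion, deletion, adjacent
    transposition), each costing 1. Works on any characters."""
    m, n = len(a), len(b)
    # d[i][j] = distance between a[:i] and b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition ("12" vs "21") counts as one error.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


def prorated_weight(diffs: int, tolerance: int,
                    agree_w: float, disagree_w: float) -> float:
    """Hypothetical weighting: full agreement weight at 0 errors, full
    disagreement weight once errors exceed `tolerance` (modeled on Param1),
    and a linear step of 1/(tolerance + 1) of the weight range per error."""
    if diffs > tolerance:
        return disagree_w
    fraction = diffs / (tolerance + 1)
    return agree_w - fraction * (agree_w - disagree_w)


if __name__ == "__main__":
    # Pairs modeled on the test in the question: 3 trailing characters differ.
    alpha = osa_distance("A" * 20, "A" * 17 + "bcd")
    numeric = osa_distance("1" * 20, "1" * 17 + "234")
    print(alpha, numeric)  # 3 3
    # Same error count -> same weight, whatever the character class.
    # 4.0 / -2.0 are made-up weights, so this does not reproduce the 1.6 weight.
    print(prorated_weight(alpha, 5, 4.0, -2.0) == prorated_weight(numeric, 5, 4.0, -2.0))  # True
```

The point of the sketch is that a character-level error count never asks whether a character is a digit or a letter, which is consistent with the identical weights observed for the alpha and numeric pairs.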
Re: CNT_DIFF also works for non-numeric strings (documentation bug)? 2011-01-28T17:36:37Z
- smithha
Thanks for your help. Now it is clear to me how to use CNT_DIFF.