OlegT.
14 Posts

CNT_DIFF also works for non-numeric strings (documentation bug)?

2011-01-28T17:01:41Z
Hi,
The QualityStage user guide explicitly states (Chapter 11, Match Comparisons) that the CNT_DIFF comparison type is only for numeric data.
For example, see:
http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r5/index.jsp?topic=/com.ibm.swg.im.iis.qs.ug.doc/topics/logarithm.html (CNT_DIFF is in table 2 "Match comparisons that apply to numbers").

However, I tried this comparison with alphabetic strings and it works without any problem, exactly as it does for numeric strings. For example, the pairs AAAAAAAAAAAAAAAAAAAA / AAAAAAAAAAAAAAAAAbcd (alpha) and 11111111111111111111 / 11111111111111111234 (numeric) produce the same weight = 1.6
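To illustrate why a comparison that only counts differing characters would not care whether the data is numeric, here is a minimal Python sketch. It is not the QualityStage algorithm, and it does not reproduce the 1.6 weight. The helper name `count_differences` and the position-by-position counting rule are assumptions made purely for illustration.

```python
def count_differences(a: str, b: str) -> int:
    """Count the positions at which two strings differ.

    Hypothetical helper: a plain position-by-position comparison,
    NOT the actual CNT_DIFF implementation.
    """
    # Pad the shorter string with spaces so that a length mismatch
    # is also counted as differing positions.
    width = max(len(a), len(b))
    a, b = a.ljust(width), b.ljust(width)
    return sum(1 for x, y in zip(a, b) if x != y)


# The two test pairs from this post.
alpha = count_differences("AAAAAAAAAAAAAAAAAAAA", "AAAAAAAAAAAAAAAAAbcd")
numeric = count_differences("11111111111111111111", "11111111111111111234")

# Both pairs differ in the same number of positions, so any score
# derived only from this count would be identical for the two pairs.
print(alpha, numeric, alpha == numeric)
```

Under this assumption the character class never enters the calculation, which would be consistent with both pairs receiving the same weight.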

Is it safe to use CNT_DIFF for non-numeric strings? Is this perhaps a documentation bug?
Which algorithm does the CNT_DIFF comparison use (Edit Distance, Jaro-Winkler, ...)?

Regards,
Oleg
  • OlegT.
    14 Posts

    Re: CNT_DIFF also works for non-numeric strings (documentation bug)?

    ‏2011-01-28T17:06:30Z  
    P.S. In my test I used the CNT_DIFF match comparison with Param1=5.

    P.S. Is there a way to edit my messages on this forum?
  • smithha
    23 Posts

    Re: CNT_DIFF also works for non-numeric strings (documentation bug)?

    ‏2011-01-28T17:30:10Z  
    Hi Oleg,

    It's more of a documentation nuance: in "Compares two strings of numbers", 'strings' includes character data (char, varchar, etc.).

    CNT_DIFF is designed to test for keystroke errors and it does evaluate strings, including strings of numeric or alpha values. It is most commonly used against numeric or date strings, as those types of strings are more likely to only have keystroke errors. If you have data such as license or part 'numbers' that contain mixed alphanumeric data, those are also good candidates to test using CNT_DIFF.

    For most text data that has any freeform aspect to it (names, products, addresses, descriptions, etc.) I would not use CNT_DIFF.

    The algorithms used in the Match Comparisons are proprietary to IBM.

    Harald
  • OlegT.
    14 Posts

    Re: CNT_DIFF also works for non-numeric strings (documentation bug)?

    ‏2011-01-28T17:36:37Z  
    Hi Harald,
    Thanks for your help. Now it is clear to me how to use CNT_DIFF.

    Regards,
    Oleg