Latest Post - 2011-02-24T15:58:03Z by OlegT.
OlegT.

Custom ruleset - special character in token (classification file)

2011-02-18T15:03:58Z
Hi all,
I am designing a custom standardization rule set for person names.
There is a problem with classifying (via the classification file) tokens that contain special characters such as a quote or a dash: QualityStage marks these tokens with class @ (mixed alphanumeric) instead of applying my custom classification rule.

For example, take the first name ANA-MARIA.
I inserted this line into the classification file (F is my token class for first names): "ANA-MARIA ANA-MARIA F"
But during testing, QualityStage marks this token with class @.
Of course, I could add these characters to the STRIPLIST. But is there another solution?

Thanks,
Oleg
Updated on 2011-02-24T15:58:03Z by OlegT.
  • RobertDickson

    Re: Custom ruleset - special character in token (classification file)

    2011-02-18T21:01:30Z
    Hi,

    Yes, this is expected behavior in all versions prior to 8.5. You can add the dash to the seplist (not the striplist) and then do an override for the pattern F-F. In all versions prior to 8.5, you can ONLY have purely alphabetic characters in the first column of the classification table.
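    To make the seplist/override suggestion concrete, here is a small Python sketch. It is a simplified, hypothetical model, not QualityStage code. It assumes that seplist characters split tokens and are kept as one-character tokens unless they are also in the striplist, that only purely alphabetic tokens are looked up in the classification table (the pre-8.5 limitation), and that any other multi-character token falls back to class @. The names tokenize, classify, standardize and OVERRIDES are illustrative only, and the real "?" class (which merges consecutive unknown words) is not modeled.

```python
# Simplified, hypothetical model of QualityStage tokenizing and classification.
# It is NOT the product's implementation; it only mirrors the behavior described above.

def tokenize(text: str, seplist: str, striplist: str) -> list[str]:
    """Split text into tokens using SEPLIST/STRIPLIST semantics (simplified)."""
    tokens: list[str] = []
    current = ""
    for ch in text:
        if ch in seplist:
            if current:
                tokens.append(current)
                current = ""
            if ch not in striplist:
                tokens.append(ch)  # a separator that is not stripped becomes its own token
        elif ch in striplist:
            continue  # stripped without splitting the surrounding token
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def classify(token: str, cls: dict[str, str]) -> str:
    """Return a one-character class for a token (simplified default classes)."""
    if token.isalpha():
        return cls.get(token, "?")  # pure alpha: look up the classification table
    if token.isdigit():
        return "^"
    if len(token) == 1:
        return token  # a kept separator such as "-" is its own class
    return "@"  # mixed token, e.g. ANA-MARIA when "-" is not a separator


# Hypothetical pattern override: treat the pattern F-F as one compound first name.
OVERRIDES = {"F-F": "FirstName"}


def standardize(text: str, seplist: str, striplist: str,
                cls: dict[str, str]) -> tuple[str, dict[str, str]]:
    tokens = tokenize(text, seplist, striplist)
    pattern = "".join(classify(t, cls) for t in tokens)
    column = OVERRIDES.get(pattern)
    if column:
        return pattern, {column: "".join(tokens)}  # rebuild the name, dash included
    return pattern, {}


CLS = {"ANA": "F", "MARIA": "F"}
print(standardize("ANA-MARIA", " ", " ", CLS))   # ('@', {})
print(standardize("ANA-MARIA", " -", " ", CLS))  # ('F-F', {'FirstName': 'ANA-MARIA'})
```

    With only the space as a separator, ANA-MARIA stays one mixed token and gets class @. Once the dash is in the seplist but not in the striplist, the pattern becomes F-F, and the (hypothetical) override can rebuild the compound first name with its dash.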

    In 8.5, you can have mixed types (like ANA-MARIA).

    By the way, 'ANA-MARIA HARMKE' would be handled correctly in the USNAME rule set. Maybe you could use that rule set as a base for your customizations?

    Regards,
    Robert
  • OlegT.

    Re: Custom ruleset - special character in token (classification file)

    2011-02-21T12:10:31Z
    Hi Robert,
    thanks for your very detailed answer!
    > You can add the dash to the seplist (not the striplist), and then do an override for F-F.
    Using the SEPLIST is not a very good approach in my case. ANA-MARIA is a VERY simple example; what would you suggest for, say, V'YACHESLAV or MAR'IAN? ;)

    > By the way, 'ANA-MARIA HARMKE' would be handled correctly in the USNAME
    I am designing rules for the Russian and Ukrainian languages, so I cannot use the standard rule sets.

    Thanks,
    Oleg
  • SystemAdmin

    Re: Custom ruleset - special character in token (classification file)

    2011-02-21T17:25:01Z
    Hi Oleg,

    Just ideas:

    1) Put the entries into the CLS without the apostrophe etc. (this is a drawback). In addition, add the apostrophe etc. to the striplist but not to the seplist. That way, V'yacheslav will match Vyacheslav in the CLS.

    2) You could use a table of names (like the city table, where you can put anything) and check class @ tokens against it. It could probably serve as a supplement to the CLS containing only the irregular entries. The drawback is that you need to maintain two tables (the CLS and the new .tbl).
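    Both ideas can be sketched with a simplified, hypothetical Python model (not QualityStage code). Idea 1 is modeled as putting the apostrophe in the striplist only, so V'YACHESLAV is reduced to VYACHESLAV before the CLS lookup; idea 2 is modeled as a supplementary dictionary of irregular names that is consulted before a mixed token falls back to class @. The table contents, the assumption that the dash is not a separator here, and the function names are all illustrative.

```python
# Simplified, hypothetical model of the two ideas above; NOT QualityStage code.

def tokenize(text: str, seplist: str, striplist: str) -> list[str]:
    """Split on SEPLIST characters and drop STRIPLIST characters (simplified)."""
    tokens: list[str] = []
    current = ""
    for ch in text:
        if ch in seplist:
            if current:
                tokens.append(current)
                current = ""
            if ch not in striplist:
                tokens.append(ch)  # kept separator becomes its own token
        elif ch in striplist:
            continue  # idea 1: "'" is stripped, so V'YACHESLAV becomes VYACHESLAV
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def classify(token: str, cls: dict[str, str], irregular: dict[str, str]) -> str:
    """Classify a token, checking the irregular-names table before class @."""
    if token.isalpha():
        return cls.get(token, "?")  # regular CLS lookup (pure alpha only)
    if token.isdigit():
        return "^"
    if len(token) == 1:
        return token
    # Idea 2: a mixed token would be "@"; try the supplementary table first.
    return irregular.get(token, "@")


SEPLIST = " "                             # assumed: the dash is not a separator
STRIPLIST = " '"                          # apostrophe stripped, but not a separator
CLS = {"VYACHESLAV": "F", "MARIAN": "F"}  # entries stored without the apostrophe
IRREGULAR = {"ANA-MARIA": "F"}            # hypothetical supplementary names table

for name in ["V'YACHESLAV", "MAR'IAN", "ANA-MARIA", "ANA-PETRA"]:
    tokens = tokenize(name, SEPLIST, STRIPLIST)
    print(name, tokens, [classify(t, CLS, IRREGULAR) for t in tokens])
```

    In this model the apostrophe is lost from the classified token, which is the drawback mentioned in idea 1, and any name with a special character that is missing from the extra table still ends up as @.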
  • OlegT.

    Re: Custom ruleset - special character in token (classification file)

    2011-02-24T15:58:03Z
    Hi Andrei,
    > 1) put entries to CLS without apostrophe etc (this is a drawback..). And with it add apostrophe etc to striplist
    Yes, this approach will most probably work quite well.

    > 2) you could use a table with names (like city table where you can put anything) and check class @ against it.
    It is also an interesting idea. But maybe the easiest way is to try the new 8.5 DataStage release.
    Thanks to all for the information and for sharing ideas.

    Regards,
    Oleg