4 replies Latest Post - ‏2011-02-24T15:58:03Z by OlegT.
OlegT.
14 Posts

Custom ruleset - special character in token (classification file)

2011-02-18T15:03:58Z
Hi all,
I am designing a custom standardization rule set for person names.
I have a problem classifying (via the classification file) tokens that contain special characters such as a quote or a dash: QualityStage marks these tokens with class @ (mixed alpha-numeric) instead of applying my custom classification rule.

For example, the first name is ANA-MARIA.
I inserted the following line into the classification file (F is my token class for first names): "ANA-MARIA ANA-MARIA F"
But during testing QualityStage marks this token with class @.
Of course I can add these characters to the STRIPLIST. But maybe there is another solution?

Thanks,
Oleg
Updated on 2011-02-24T15:58:03Z by OlegT.
  • RobertDickson
    38 Posts

    Re: Custom ruleset - special character in token (classification file)

    ‏2011-02-18T21:01:30Z  in response to OlegT.
    Hi,

    Yes, this is expected behavior for all versions prior to 8.5. You can add the dash to the seplist (not the striplist), and then do an override for F-F. In all versions prior to 8.5, you can ONLY have pure alpha characters in the first column of the classification table.

    In 8.5, you can have mixed types (like ANA-MARIA).
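
The effect of the seplist suggestion can be illustrated with a toy model in Python. This is only a sketch and not QualityStage code: the functions `tokenize` and `classify` are hypothetical, the class symbols `+` and `^` for unclassified words and numbers are assumptions, and the pre-8.5 restriction is modelled by looking up only pure alpha tokens in the classification table.

```python
def tokenize(text, seplist, striplist):
    """Toy tokenizer: SEPLIST characters end a token, STRIPLIST characters are dropped.

    A separator that is not also in the striplist is kept as its own token.
    """
    tokens, current = [], []
    for ch in text:
        if ch in seplist:
            if current:
                tokens.append("".join(current))
                current = []
            if ch not in striplist:
                tokens.append(ch)
        elif ch in striplist:
            continue  # stripped characters simply disappear
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def classify(token, table):
    """Toy classifier: only pure alpha tokens are looked up (pre-8.5 behavior)."""
    if token.isalpha():
        return table.get(token, "+")  # assumed symbol for an unclassified word
    if token.isdigit():
        return "^"  # assumed symbol for a number
    if len(token) == 1:
        return token  # a lone special character acts as its own class
    return "@"  # mixed token


def pattern(text, seplist, striplist, table):
    """Concatenate the class of every token into a pattern string."""
    return "".join(classify(t, table) for t in tokenize(text, seplist, striplist))


table = {"ANA": "F", "MARIA": "F"}  # hypothetical classification entries

# Dash not in the seplist: one mixed token, class @
print(pattern("ANA-MARIA", " ", " ", table))
# Dash in the seplist but not in the striplist: pattern F-F
print(pattern("ANA-MARIA", " -", " ", table))
```

In this model, the F-F pattern is what an override would then have to handle.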

    By the way, 'ANA-MARIA HARMKE' would be handled correctly in the USNAME rule set. Maybe you could use that rule set as a base for your customizations?

    Regards,
    Robert
    • OlegT.
      14 Posts

      Re: Custom ruleset - special character in token (classification file)

      ‏2011-02-21T12:10:31Z  in response to RobertDickson
      Hi Robert,
      thanks for your very detailed answer!
      > You can add the dash to the seplist (not the striplist), and then do an override for F-F.
      Using the SEPLIST is not a very good approach in my case. ANA-MARIA is a VERY simple example; what would you suggest for, say, V'YACHESLAV or MAR'IAN? ;)

      > By the way, 'ANA-MARIA HARMKE' would be handled correctly in the USNAME
      I am designing rules for the Russian and Ukrainian languages, so I cannot use the standard rule sets.

      Thanks,
      Oleg
      • SystemAdmin
        533 Posts

        Re: Custom ruleset - special character in token (classification file)

        ‏2011-02-21T17:25:01Z  in response to OlegT.
        Hi Oleg,

        Just ideas:

        1) Put the entries into the CLS without the apostrophe etc. (this is a drawback), and add the apostrophe etc. to the striplist but not to the seplist. In that case, V'yacheslav will match Vyacheslav in the CLS.

        2) You could use a table of names (like the city table, where you can put anything) and check class @ tokens against it. It could probably serve as a supplement to the CLS, holding only the irregular entries. A drawback is that you need to maintain two tables (the CLS and the new .tbl).
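
Both ideas can be sketched with a toy model in Python. This is not QualityStage code: the dictionaries `CLS` and `IRREGULAR`, the helper names, and the `+` symbol for an unclassified word are all assumptions made for illustration.

```python
STRIPLIST = " '"  # assumed: the apostrophe is stripped but is not a separator

# Hypothetical classification entries, stored without apostrophes (idea 1)
CLS = {"VYACHESLAV": "F", "MARIAN": "F"}

# Hypothetical supplementary table with irregular entries only (idea 2)
IRREGULAR = {"ANA-MARIA": "F"}


def strip_chars(token, striplist=STRIPLIST):
    """Remove every striplist character from the token."""
    return "".join(ch for ch in token if ch not in striplist)


def classify_stripped(token):
    """Idea 1: strip special characters first, then look the token up in the CLS."""
    key = strip_chars(token.upper())
    if key.isalpha():
        return CLS.get(key, "+")  # assumed symbol for an unclassified word
    return "@"


def classify_with_table(token):
    """Idea 2: leave the token intact and check class @ tokens against a second table."""
    key = token.upper()
    if key.isalpha():
        return CLS.get(key, "+")
    return IRREGULAR.get(key, "@")


print(classify_stripped("V'yacheslav"))
print(classify_stripped("MAR'IAN"))
print(classify_with_table("ANA-MARIA"))
```

The sketch also makes the trade-off visible: idea 1 loses the original spelling in the lookup key, while idea 2 keeps it at the cost of a second table to maintain.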
        • OlegT.
          14 Posts

          Re: Custom ruleset - special character in token (classification file)

          ‏2011-02-24T15:58:03Z  in response to SystemAdmin
          Hi Andrei,
          > 1) put entries to CLS without apostrophe etc (this is a drawback..). And with it add apostrophe etc to striplist
          Yes, most probably this approach will work quite well.

          > 2) you could use a table with names (like city table where you can put anything) and check class @ against it.
          That is also an interesting idea. But maybe the easiest way is to try the new (8.5) DataStage release.
          Thanks to all for the info and for sharing ideas.

          Regards,
          Oleg