3 replies. Latest post: 2013-03-26T09:59:35Z by SystemAdmin
SystemAdmin

Nouns are not correctly recognized in Dutch

2013-03-20T09:29:38Z
I am analyzing some documents related to healthcare.
An example of the text is: "rustig voorste oogsegment".

For some reason, the word "oogsegment" is not recognized as a noun. Instead, "oog" and "segment" are each recognized as separate nouns. When I search for "oogsegment", no documents are returned. In Content Analytics Studio, I can see that "oog" and "segment" are recognized, but "oogsegment" is not.

How can I solve this?
Updated on 2013-03-26T09:59:35Z by SystemAdmin
  • SystemAdmin

    Re: Nouns are not correctly recognized in Dutch

    2013-03-20T14:54:11Z  in response to SystemAdmin
    This is caused by the decomposition paradigm, which splits "oogsegment" into "oog" and "segment".
    You can add compound words of this type to a custom dictionary and use them in your model if you need to.
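
    To make the effect concrete, here is a toy sketch in Python. It is not ICA's decomposition algorithm or API; the lexicon, the custom dictionary set, and the analyze function are all hypothetical names made up for illustration. It only shows why a compound gets split when its parts are known words, and why a custom dictionary entry for the whole word keeps it intact.

```python
# Toy illustration only: this is NOT ICA's decomposition algorithm or API.
BASE_LEXICON = {"oog", "segment", "rustig", "voorste"}  # hypothetical base lexicon
CUSTOM_DICTIONARY = {"oogsegment"}                       # hypothetical custom entries


def analyze(word, custom=frozenset()):
    """Return the tokens a toy analyzer would produce for one word."""
    # A word known as a whole (custom entry or base lexicon) is kept intact.
    if word in custom or word in BASE_LEXICON:
        return [word]
    # Otherwise try every split point and accept the first one
    # where both halves are known words.
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in BASE_LEXICON and tail in BASE_LEXICON:
            return [head, tail]
    # Unknown word that cannot be split: leave it as it is.
    return [word]


print(analyze("oogsegment"))                     # ['oog', 'segment']
print(analyze("oogsegment", CUSTOM_DICTIONARY))  # ['oogsegment']
```

    In a real model the custom dictionary is configured in the product itself; the sketch just shows the precedence idea.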
    • SystemAdmin

      Re: Nouns are not correctly recognized in Dutch

      2013-03-25T15:00:43Z  in response to SystemAdmin
      Is there a way to switch off the "decomposition paradigm"? Working with a custom dictionary will be a hell of a job, because we will have a lot of "compound" words.
      • SystemAdmin

        Re: Nouns are not correctly recognized in Dutch

        2013-03-26T09:59:35Z  in response to SystemAdmin
        If you want to see compound words as single tokens in the rules editor so that you can write complex rules on top of them, I suggest creating a rule that annotates any sequence of nouns that have the feature "isConnectedToPrevious" as a "CompoundNoun", and then using this new type in subsequent rules. It is a bit like creating a shallow parsing grammar for your model.
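
        The logic of such a rule can be sketched in plain Python. This is not the Studio rule syntax; the Token class, its field names, and the compound_nouns function are hypothetical, and I am assuming here that "isConnectedToPrevious" is set on a token that was split off from the token directly before it.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str                                # e.g. "noun", "adjective"
    is_connected_to_previous: bool = False  # assumed meaning: split off the previous token


def compound_nouns(tokens):
    """Join each noun with the connected nouns that follow it into one string."""
    compounds = []
    current = []  # texts of the noun sequence being built
    for tok in tokens:
        # A connected noun extends the sequence started by an earlier noun.
        if tok.pos == "noun" and tok.is_connected_to_previous and current:
            current.append(tok.text)
            continue
        # Anything else ends the sequence; keep it only if it has 2+ parts.
        if len(current) > 1:
            compounds.append("".join(current))
        current = [tok.text] if tok.pos == "noun" else []
    if len(current) > 1:
        compounds.append("".join(current))
    return compounds


tokens = [
    Token("rustig", "adjective"),
    Token("voorste", "adjective"),
    Token("oog", "noun"),
    Token("segment", "noun", is_connected_to_previous=True),
]
print(compound_nouns(tokens))  # ['oogsegment']
```

        Two nouns that merely stand next to each other, without the connected flag, are left alone, so ordinary noun sequences are not turned into compounds.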

        If you really want to turn off decomposition, that is an advanced use case, and you need to contact IBM through the support channel from which you bought the ICA license.

        It is possible to turn off decomposition, but please be aware that there are side effects, mainly on part-of-speech tagging precision, which may degrade.