  • Latest Post - 2013-03-26T09:59:35Z by SystemAdmin
SystemAdmin

Nouns are not correctly recognized in Dutch

2013-03-20T09:29:38Z
I am analyzing some healthcare-related documents.
An example of the text is: "rustig voorste oogsegment" (roughly "quiet anterior eye segment").

For some reason, the word "oogsegment" is not recognized as a noun. Instead, "oog" and "segment" are each recognized as separate nouns. When I search for "oogsegment", no documents are found. In Content Analytics Studio I can see that the words "oog" and "segment" are recognized, but the word "oogsegment" is not.

How can I solve this?
Updated on 2013-03-26T09:59:35Z by SystemAdmin
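The search behaviour described above can be illustrated with a toy inverted index. This is a minimal Python sketch, not how ICA actually builds its index: it assumes the index only receives the tokens produced after decomposition, so an exact lookup of the whole compound finds nothing. The `build_index` function and the sample document are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token].add(doc_id)
    return index


# Hypothetical token stream for the example sentence, assuming the index
# receives the compound already split into "oog" and "segment".
docs = {"doc1": ["rustig", "voorste", "oog", "segment"]}
index = build_index(docs)

# .get() avoids creating an empty entry in the defaultdict for a missing key.
print(sorted(index.get("oogsegment", set())))  # []
print(sorted(index.get("oog", set())))         # ['doc1']
```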
  • SystemAdmin

    2013-03-20T14:54:11Z
    This is due to the decomposition paradigm, which splits the compound "oogsegment" into "oog" and "segment".
    You can add this type of compound word to a custom dictionary and use it in your model if you need to.
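To make the decomposition and the custom-dictionary workaround concrete, here is a toy Python sketch. It is not ICA's decomposer: it assumes a small hypothetical lexicon, tries only two-part splits, and ignores Dutch linking elements such as "-s-". The `decompose` function and its `custom_dictionary` argument are illustrative names; a word listed in the custom dictionary is kept whole instead of being split.

```python
# Hypothetical lexicon of simple words known to the toy decomposer.
LEXICON = {"rustig", "voorste", "oog", "segment"}


def decompose(word, lexicon, custom_dictionary):
    """Split a word into two known parts unless it should stay whole.

    Words found in custom_dictionary (or already in the lexicon) are
    returned unchanged; otherwise the first split point where both halves
    are lexicon words is used.
    """
    if word in custom_dictionary or word in lexicon:
        return [word]
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in lexicon and tail in lexicon:
            return [head, tail]
    return [word]  # no known split: keep the word as one token


print(decompose("oogsegment", LEXICON, set()))           # ['oog', 'segment']
print(decompose("oogsegment", LEXICON, {"oogsegment"}))  # ['oogsegment']
```

The trade-off raised in the follow-up question is visible here: every compound that should stay whole needs its own entry in `custom_dictionary`.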
  • SystemAdmin

    2013-03-25T15:00:43Z
    Is there a way to switch off the "decomposition paradigm"? Working with a custom dictionary would be a hell of a job, because we have a lot of compound words.
  • SystemAdmin

    2013-03-26T09:59:35Z
    If you want to see compound words as single tokens in the rules editor so that you can write complex rules on top of them, I would suggest creating a rule that annotates any sequence of nouns carrying the "isConnectedToPrevious" feature as a "CompoundNoun", and then using this new type in subsequent rules. In effect, this is like creating a shallow parsing grammar for your model.

    Turning off decomposition is advanced usage; if you really need it, you will have to contact IBM through the support channel from which you bought your ICA (IBM Content Analytics) license.

    Decomposition can be turned off, but be aware that this has side effects, mainly on part-of-speech tagging precision, which may degrade.
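The CompoundNoun rule suggested in the last reply is written in the ICA rules editor, not in code, but its logic can be modelled in a few lines of Python. This sketch assumes each token carries a part-of-speech tag, character offsets, and a boolean standing in for the "isConnectedToPrevious" feature; the `Token` and `Annotation` classes and their field names are hypothetical. It reads "a sequence of nouns" as two or more adjacent noun tokens where each token after the first is connected to the previous one.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str                     # part-of-speech tag, e.g. "noun"
    begin: int                   # character offsets into the document text
    end: int
    connected_to_previous: bool  # stands in for "isConnectedToPrevious"


@dataclass
class Annotation:
    type: str
    begin: int
    end: int


def annotate_compound_nouns(tokens):
    """Annotate runs of connected noun tokens as "CompoundNoun"."""
    annotations = []
    i = 0
    while i < len(tokens):
        if tokens[i].pos != "noun":
            i += 1
            continue
        j = i
        # Extend the run while the next token is a noun attached to this one.
        while (j + 1 < len(tokens)
               and tokens[j + 1].pos == "noun"
               and tokens[j + 1].connected_to_previous):
            j += 1
        if j > i:  # only runs of two or more parts form a compound
            annotations.append(
                Annotation("CompoundNoun", tokens[i].begin, tokens[j].end))
        i = j + 1
    return annotations


text = "rustig voorste oogsegment"
tokens = [
    Token("rustig", "adjective", 0, 6, False),
    Token("voorste", "adjective", 7, 14, False),
    Token("oog", "noun", 15, 18, False),
    Token("segment", "noun", 18, 25, True),
]
for a in annotate_compound_nouns(tokens):
    print(a.type, text[a.begin:a.end])  # CompoundNoun oogsegment
```

Because the annotation spans the original character offsets, its covered text is the full compound "oogsegment", which subsequent rules can then match as one unit.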