I am analyzing some healthcare-related documents.
An example of the text is: "rustig voorste oogsegment" (Dutch, roughly "quiet anterior eye segment").
For some reason, the word "oogsegment" is not recognized as a noun. Instead, "oog" and "segment" are each recognized as separate nouns. When I search for "oogsegment", no documents are returned. In Content Analytics Studio, I can see that the words "oog" and "segment" are recognized, but the word "oogsegment" is not.
How can I solve this?
3 replies Latest Post - 2013-03-26T09:59:35Z by SystemAdmin
Nouns are not correctly recognized in Dutch
Re: Nouns are not correctly recognized in Dutch
2013-03-20T14:54:11Z in response to SystemAdmin
This is due to the decomposition paradigm, which decomposes "oogsegment" into "oog" and "segment".
You can add compound words of this type to a custom dictionary and use them in your model if you need to.
Re: Nouns are not correctly recognized in Dutch
2013-03-26T09:59:35Z in response to SystemAdmin
If you want to see compound words as single tokens in the rules editor so that you can write complex rules on top of them, I would suggest creating a rule that annotates any sequence of nouns carrying the feature "isConnectedToPrevious" as a "CompoundNoun", and then using this new type in subsequent rules. It is a bit like creating a shallow parsing grammar for your model.
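To make the idea concrete, here is a small Python sketch of the grouping logic such a rule would express. This is not ICA or Content Analytics Studio code: the `Token` class, the part-of-speech labels, and the `merge_compound_nouns` function are hypothetical names made up for illustration, and it assumes that "isConnectedToPrevious" marks a token that decomposition split off from the token immediately before it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """Hypothetical stand-in for a token annotation (not an ICA type)."""
    text: str
    pos: str
    # Assumption: True when decomposition split this token off the previous one.
    is_connected_to_previous: bool = False


def merge_compound_nouns(tokens: List[Token]) -> List[Token]:
    """Join each run of connected nouns into one 'CompoundNoun' token."""
    merged: List[Token] = []
    for tok in tokens:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and tok.pos == "noun"
            and tok.is_connected_to_previous
            and prev.pos in ("noun", "CompoundNoun")
        ):
            # Extend the previous noun (or the compound built so far).
            merged[-1] = Token(
                prev.text + tok.text, "CompoundNoun", prev.is_connected_to_previous
            )
        else:
            merged.append(tok)
    return merged


tokens = [
    Token("rustig", "adjective"),
    Token("voorste", "adjective"),
    Token("oog", "noun"),
    Token("segment", "noun", is_connected_to_previous=True),
]
result = merge_compound_nouns(tokens)
print([(t.text, t.pos) for t in result])
# [('rustig', 'adjective'), ('voorste', 'adjective'), ('oogsegment', 'CompoundNoun')]
```

In the Studio itself this would be written as a parsing rule rather than as code; the sketch only shows which tokens the "CompoundNoun" annotation would be expected to cover.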
If you really want to turn off decomposition, that is an advanced use case, and you need to contact IBM through the support channel for the vendor from which you bought the ICA license.
Turning off decomposition is possible, but please be aware that it has side effects, mainly on part-of-speech tagging precision, which may degrade.