Topic
  • 2 replies
  • Latest Post - 2013-01-17T16:14:36Z by SystemAdmin
SystemAdmin

Annotation from character rule does not work in sentence

2013-01-16T17:03:34Z
Hello there,

I am trying to annotate dates of the following form: 10. Nov. 2012

For this I created a character rule:

    $day = (0? $zahlO0)|((1|2) $zahlM0)|(3 (0|1));
    $month = ... Nov|Nov.|November ...;
    $year = ((1|2) $zahlM0)? $zahlM0 $zahlM0;
    $date = $day \. \p{space} $month \p{space} $year;
    $date {anno: ...};


It works fine until the date appears inside a sentence (unless the sentence begins with the date):
"10. November 2012 is a good day." => is annotated
"The 10. November 2012 is a good day." => not annotated

I assume the dot after the day (and after the abbreviated month) is interpreted as the end of a sentence, and that this causes the problem.
So I tried a dictionary for the days and months instead and let a parsing rule annotate the date. That worked, but the day entries from the dictionary (01., 02., etc.) affected other annotations. A parsing rule that takes care of the dot does not work either when there are words in front of the date: if it is a "phrases" parsing rule, I get a message that it cannot be used across multiple sentences (or something to that effect), and if it is an "aggregate" rule, I cannot ensure that no other word appears between the day and the month.

So, does anyone have an idea or know a workaround for this problem?
Or is there a way to use a dictionary only in the context of a single parsing rule?
  • SystemAdmin

    Re: Annotation from character rule does not work in sentence

    2013-01-17T01:48:10Z
    One possible workaround is to create character rules that affect tokenization. This means you need to create a character rules file with the "Affect Tokenization" option enabled (or select your existing character rules file, open its properties, and change the Tokenization option).

    With the following character rules created with the "Affect Tokenization" option, both of your sample sentences are annotated as expected.

    
    $zahlM0 = [0-9];
    $day = (0? $zahlM0)|((1|2) $zahlM0)|(3 (0|1));
    $month = Nov|Nov.|November;
    $year = ((1|2) $zahlM0)? $zahlM0 $zahlM0;
    $date = $day \. \p{space} $month \p{space} $year;
    $date {anno: "com.ibm.langware.MyDate"};
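    For readers without LanguageWare at hand, here is a rough Python sketch of what the $date rule in this reply matches, plus a loose analogy for why letting the rule affect tokenization can help. It is not how LanguageWare works internally. The assumptions are that \p{space} behaves like a single whitespace character, that "Nov." means the literal abbreviation, and that \b can stand in for token boundaries. The helper names find_dates and split_sentences, and the naive "dot followed by whitespace" sentence splitter, are hypothetical and only illustrate the question's assumption that the dot after the day ends a sentence.

```python
import re

# Rough translation of the $date character rule into a Python regex.
# Assumptions (not LanguageWare semantics): \p{space} ~ one whitespace char,
# "Nov." is the literal abbreviation, and \b stands in for token boundaries.
ZAHL_M0 = r"[0-9]"                                   # $zahlM0
DAY = rf"(?:0?{ZAHL_M0}|[12]{ZAHL_M0}|3[01])"        # $day
MONTH = r"(?:November|Nov\.?)"                       # $month
YEAR = rf"(?:[12]{ZAHL_M0})?{ZAHL_M0}{ZAHL_M0}"      # $year
DATE = re.compile(rf"\b{DAY}\.\s{MONTH}\s{YEAR}\b")  # $date


def find_dates(text: str) -> list[str]:
    """Hypothetical helper: return every substring matched by DATE."""
    return [m.group(0) for m in DATE.finditer(text)]


def split_sentences(text: str, protect_dates: bool) -> list[str]:
    """Hypothetical naive splitter: a '.' followed by whitespace (or the end
    of the text) ends a sentence. With protect_dates=True, dots inside a
    DATE match are ignored, a loose analogy for the character rule defining
    the token before sentences are split."""
    protected: set[int] = set()
    if protect_dates:
        for m in DATE.finditer(text):
            protected.update(range(m.start(), m.end()))
    sentences: list[str] = []
    start = 0
    for i, ch in enumerate(text):
        ends_here = ch == "." and (i + 1 == len(text) or text[i + 1].isspace())
        if ends_here and i not in protected:
            sentences.append(text[start:i + 1].strip())
            start = i + 1
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences


if __name__ == "__main__":
    s = "The 10. November 2012 is a good day."
    print(find_dates(s))                            # ['10. November 2012']
    print(split_sentences(s, protect_dates=False))  # ['The 10.', 'November 2012 is a good day.']
    print(split_sentences(s, protect_dates=True))   # ['The 10. November 2012 is a good day.']
```

    Without protection, the naive splitter cuts the sentence after "10.", so a rule applied per sentence never sees the whole date; with the date span treated as one unit first, the sentence stays intact. Note that this toy splitter would also cut a sentence that starts with the date, so it does not explain why that case already worked in LanguageWare.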
    
  • SystemAdmin

    Re: Annotation from character rule does not work in sentence

    2013-01-17T16:14:36Z
    Thank you, it works fine!