Topic
  • 4 replies
  • Latest Post - ‏2012-09-27T14:00:23Z by OlgaLind
OlgaLind
34 Posts

Pinned topic What is the best solution for homogeneous parts of sentence?

‏2012-09-20T10:52:52Z |
Hi,

I would like to know whether there is a flexible way to handle a pattern in which several homogeneous parts of a sentence are separated by commas or the conjunction "and", and everything between those separators must be annotated as a single annotation.

The pattern which I have is the following:

DictLocationIndicator -> (one or more) token1 -> (optional) "and" -> (optional) "," -> (zero or more) token2 -> (optional) "and" -> (optional) "," -> (zero or more) token3 -> ... As you can see, there is a repeating pattern, so it looks like we need to use groups in the rule.

In human language it would be something like this:

*(There are located) an old shop, a new shop and a shop which has a red roof.

So I need the parts between the commas or the "and" conjunction to be annotated. In this particular sentence we would get three annotations:
an old shop, a new shop, a shop which has a red roof.

Is it possible to create such a rule? When I try groups, I get one-word annotations such as "an", "old", "shop", but not "an old shop".

Thank you
Updated on 2012-09-27T14:00:23Z by OlgaLind
  • SystemAdmin
    197 Posts

    Re: What is the best solution for homogeneous parts of sentence?

    ‏2012-09-20T13:40:07Z  
    Hi, There is a way to do this, but it is not as "simple" in this case because you are using tokens. It would be easier if you were using annotations between the "commas" and "and".

    I attached the rule that creates the output you are looking for (you just need to import it into a rules database, preferably a new one so it doesn't interfere with other rules). It will be quite hard to show you with screenshots because the selection tree is a bit long.

    This rule will detect the entities between "located" and the first comma, the entities between commas, and the one after "and" in the sentence:
    Located an old shop, an old shop, an old shop and a new shop with a new roof.

    You will need to create two custom dictionaries and add them to your configuration, because the rule needs them as input types: one dictionary with the type DictAnd containing the word "and", and a second with the type DictLocation containing the word "located".

    When you look at the rule, you will see how the groups are built: each group covers "comma followed by an entry" or "and followed by an entry", instead of the other way around.
    Also, in the Constraints tab, I gave priority to the And dictionary so that it breaks the sentence. If you don't do this, the last annotation will cover "an old shop and a new shop with a new roof".

    Please have a look at the rule and see how it is constructed; this will help you understand the logic behind it.
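The attached rule itself is not reproduced in this thread, so the following is only a rough Python sketch of the idea it describes: find the trigger word, then let every comma or "and" close the current entry. The names `TRIGGERS`, `SEPARATORS`, and `split_entries` are hypothetical stand-ins for the DictLocation dictionary, the DictAnd dictionary plus the comma token, and the rule; they are not part of the product.

```python
import re

# Hypothetical stand-ins for the dictionaries described in the post.
TRIGGERS = {"located"}       # plays the role of DictLocation
SEPARATORS = {",", "and"}    # the "," token plus the DictAnd entry


def split_entries(sentence):
    """Return the token runs that follow a trigger word, split at separators."""
    # Words and commas become tokens; other punctuation is ignored.
    tokens = re.findall(r"\w+|,", sentence.lower())
    # Nothing is annotated unless a trigger word is present.
    start = next((i for i, t in enumerate(tokens) if t in TRIGGERS), None)
    if start is None:
        return []
    entries, current = [], []
    for tok in tokens[start + 1:]:
        if tok in SEPARATORS:
            # A separator always closes the current entry. This mirrors giving
            # the "and" dictionary priority so it cannot end up inside an entry.
            if current:
                entries.append(" ".join(current))
            current = []
        else:
            current.append(tok)
    if current:
        entries.append(" ".join(current))
    return entries


if __name__ == "__main__":
    text = "Located an old shop, an old shop, an old shop and a new shop with a new roof."
    for entry in split_entries(text):
        print(entry)
```

Treating "and" as an ordinary token instead of a separator would make the last entry swallow everything after the final comma, which is the behaviour the priority setting is meant to prevent.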
  • OlgaLind
    34 Posts

    Re: What is the best solution for homogeneous parts of sentence?

    ‏2012-09-25T07:29:54Z  
    Hi Amine,

    Thank you for sharing your experience, I was able to test the rule successfully. However, the rule which you provided does not cover this case:

    There are located a new shop and an old shop and a shop with a red roof

    I agree this is not a very natural use of "and", but who knows what we might encounter :)

    I also created my own rules for parsing homogeneous parts of a sentence; if you are interested, please look at the attachment. There are two rules. To use them, you need to create a dictionary "DictLOcationIndicator" containing the word "located".

    Thanks!
  • SystemAdmin
    197 Posts

    Re: What is the best solution for homogeneous parts of sentence?

    ‏2012-09-25T09:50:46Z  
    You are right, I didn't take this scenario into consideration.
    You can cover it with the rule I sent you if you make a small modification: set the repeats for the group containing "and" to "0 or more" (*) instead of optional (?).

    There is also a scenario like this:
    There are located a new shop, and an old shop and a shop with a red roof.

    In this case, you just need to add an optional token with the value "," before the "and" in the last group. This lets the rule cover the example above.

    The attached updated rule is easier to maintain and expand when you find more scenarios.
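As a rough analogue of the two changes described above, the sketch below uses Python regular expressions instead of the rule editor. The patterns are an assumption about the rule's shape (trigger, first entry, repeated ", entry" group, then an "and entry" group), not a transcription of the attachment. `ORIGINAL`, `UPDATED`, and `covers` are hypothetical names.

```python
import re

# One word that is not "and", and an entry made of such words.
WORD = r"(?!and\b)[a-z]+"
ENTRY = rf"{WORD}(?: {WORD})*"
# Optional leading words, then the trigger word "located".
PREFIX = r"(?:[a-z]+ )*?located "

# Assumed original shape: the "and" group is optional (?).
ORIGINAL = re.compile(rf"{PREFIX}{ENTRY}(?:, {ENTRY})*(?: and {ENTRY})?")
# Modified shape: the "and" group repeats (*) and allows an optional "," first.
UPDATED = re.compile(rf"{PREFIX}{ENTRY}(?:, {ENTRY})*(?:,? and {ENTRY})*")


def covers(pattern, sentence):
    """True if the pattern accounts for the whole sentence (final period ignored)."""
    return pattern.fullmatch(sentence.lower().rstrip(".")) is not None


if __name__ == "__main__":
    samples = [
        "Located an old shop, an old shop, an old shop and a new shop with a new roof.",
        "There are located a new shop and an old shop and a shop with a red roof",
        "There are located a new shop, and an old shop and a shop with a red roof.",
    ]
    for s in samples:
        print(covers(ORIGINAL, s), covers(UPDATED, s), s)
```

Under these assumptions, only the first sample is fully covered by `ORIGINAL`, while `UPDATED` covers all three, which matches the behaviour reported in the thread.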
  • OlgaLind
    34 Posts

    Re: What is the best solution for homogeneous parts of sentence?

    ‏2012-09-27T14:00:23Z  
    Thank you, I tested your rule today. It really does match all lowercase homogeneous parts of a sentence, and this basic rule can be used in many projects. Mine also works properly, except that it is two rules, not one.

    By the way, at first I couldn't figure out why your rule didn't work, and it took me quite a while to realise that I had to replace your dictionary names with mine in the rule. Although the dictionary names were the same, the prefixes were different.