4 replies Latest Post - ‏2012-10-02T13:42:39Z by SystemAdmin
SystemAdmin
197 Posts

Populate and access a structured data (Object or DB) extracted from a text

‏2012-08-28T13:14:24Z |
Hi All,

I would like to have a view in ICA with a mix of unstructured and structured data.

The system crawls company news. There will be a page that shows the latest news related to a specific company (unstructured data) together with some data extracted from the news, for example:

"Samsung Electronics has signed an agreement with Dutch chipmaker ASML to invest €779m in the latter's Customer Co-Investment Program that aims to develop new chip making technology."
In this case, I would like to show, for every company, the investments mentioned in the crawled news. On the page we would have: Investment: €779m.

To do this, I would like to access structured data, similar to an object or a database: Company->Investment->Value.

My question is: how can I populate and access structured data extracted from text using ICA 3.0?

Carlo
Updated on 2012-10-02T13:42:39Z by SystemAdmin
  • bfoyle
    60 Posts

    Re: Populate and access a structured data (Object or DB) extracted from a text

    ‏2012-08-28T21:57:39Z  in response to SystemAdmin
    I believe the best course of action is to work on a custom annotator stage that essentially looks up the relational data in the tables when a specific annotation is identified. In the example given, if Samsung Electronics is annotated as a company, then that company name could be looked up in the db and some resulting value could theoretically be appended as an additional annotation.
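
As a rough illustration of the lookup described above: this is a plain Python sketch, not ICA's custom-stage API. The table `company_info`, its `attribute` column, the placeholder value, and the dictionary shape used for annotations are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical relational table holding one looked-up attribute per company.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company_info (name TEXT PRIMARY KEY, attribute TEXT)")
conn.execute(
    "INSERT INTO company_info VALUES (?, ?)",
    ("Samsung Electronics", "placeholder value"),
)


def enrich(annotations, conn):
    """Append a 'company_lookup' annotation for each company found in the table."""
    extra = []
    for ann in annotations:
        if ann["type"] != "company":
            continue
        row = conn.execute(
            "SELECT attribute FROM company_info WHERE name = ?", (ann["text"],)
        ).fetchone()
        if row is not None:
            extra.append(
                {"type": "company_lookup", "text": row[0], "source": ann["text"]}
            )
    return annotations + extra


annotations = [{"type": "company", "text": "Samsung Electronics"}]
result = enrich(annotations, conn)
```

Companies that are not present in the table simply receive no additional annotation.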
    • SystemAdmin
      197 Posts

      Re: Populate and access a structured data (Object or DB) extracted from a text

      ‏2012-08-29T19:03:02Z  in response to bfoyle
      Hi Bob,

      Thank you for the answer.

      Just to confirm my understanding: I will extract some annotations from the text, such as company ("Samsung Electronics") and investment ("€779m"), and populate a database.
      A custom view will then access the database to show the information Company: Samsung Electronics and Investment: €779m.

      Someone might ask: why not use document facets? The problem is that a single document can contain more than one company and more than one investment. I would like to see the sum of investments for each company across many documents.
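
To make the requirement concrete, here is a minimal sketch of the data shape in question: one record per (company, investment) pair, so a single document can contribute several records. "Company B", the document names, and all amounts other than the €779m from the example are invented for illustration.

```python
from collections import defaultdict

# Hypothetical extraction results; a document may appear in several records.
records = [
    {"doc": "doc1", "company": "Samsung Electronics", "investment_eur": 779_000_000},
    {"doc": "doc1", "company": "Company B", "investment_eur": 10_000_000},
    {"doc": "doc2", "company": "Company B", "investment_eur": 5_000_000},
]

# Sum the investments per company across all documents.
totals = defaultdict(int)
for rec in records:
    totals[rec["company"]] += rec["investment_eur"]
```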

      Is this the best/correct way?

      Carlo
      • bfoyle
        60 Posts

        Re: Populate and access a structured data (Object or DB) extracted from a text

        ‏2012-08-31T16:46:01Z  in response to SystemAdmin
        I think I misunderstood the original question / scenario. Is the investment amount in the documents you are analyzing or is it in an external database (outside of the documents)?

        What I was initially describing would work in the case where you extract the company name from the document / news article and then need to look up the investment amount for that company in an external database.

        As I re-read your question, the company information and the investment amount are both in the document / news article. If so, then both can be annotated and extracted using standard rules and dictionaries; no custom stage is required.

        I think the next thing you are looking for is a total value of investment across multiple documents for company x. For this, I would first normalize the investment value in my annotator in ICA Studio so that it is a numeric value, not text. Then, when ICA processes this data through the pipeline, I would specify the export-to-database option. In that database, you should be able to query for company name x and get back the sum of the investment values extracted from the documents and exported to the db.
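
The normalize-then-sum idea above could look roughly like this. It is a sketch in plain Python with SQLite, not the ICA Studio normalizer or the ICA database export; the `investment` table layout, the supported suffixes (k, m, bn), and the function name `normalize_amount` are assumptions made for the example.

```python
import re
import sqlite3
from decimal import Decimal

# Assumed suffix conventions; real news text may use others.
MULTIPLIERS = {"": 1, "k": 10**3, "m": 10**6, "bn": 10**9}
AMOUNT_RE = re.compile(r"\s*([€$£])\s*(\d+(?:\.\d+)?)\s*(k|m|bn)?\s*", re.IGNORECASE)


def normalize_amount(text):
    """Convert a string such as '€779m' to (currency symbol, integer amount)."""
    match = AMOUNT_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized amount: {text!r}")
    symbol, number, suffix = match.groups()
    multiplier = MULTIPLIERS[(suffix or "").lower()]
    return symbol, int(Decimal(number) * multiplier)


# Hypothetical export target: one row per extracted investment.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE investment (doc_id TEXT, company TEXT, currency TEXT, amount INTEGER)"
)
extracted = [("doc1", "Samsung Electronics", "€779m")]
for doc_id, company, raw in extracted:
    currency, amount = normalize_amount(raw)
    conn.execute(
        "INSERT INTO investment VALUES (?, ?, ?, ?)",
        (doc_id, company, currency, amount),
    )

# Sum per company; filtering on currency avoids adding euros to dollars.
total = conn.execute(
    "SELECT SUM(amount) FROM investment WHERE company = ? AND currency = ?",
    ("Samsung Electronics", "€"),
).fetchone()[0]
```

Storing the currency next to the amount is a design choice of this sketch: without it, a plain SUM would silently mix currencies.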

        A couple of caveats.
        First, I haven't set up a test and tried this, so this is only what I expect to work and what I would try...there may be hiccups.

        Second, I don't know your content, but it occurs to me that the premise of what you are doing may be flawed. Document1 may be referring to the same investment as Document56 and Document127...so summing them may overstate the investment amount.
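
One possible way to guard against the double counting described above is to collapse identical facts before summing. This is only a heuristic sketch: the `investment` table and its `counterparty` column are assumed, and two genuinely different investments with the same company, counterparty, and amount would be merged incorrectly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE investment (doc_id TEXT, company TEXT, counterparty TEXT, amount INTEGER)"
)
# Three documents reporting the same investment.
conn.executemany(
    "INSERT INTO investment VALUES (?, ?, ?, ?)",
    [
        ("Document1", "Samsung Electronics", "ASML", 779_000_000),
        ("Document56", "Samsung Electronics", "ASML", 779_000_000),
        ("Document127", "Samsung Electronics", "ASML", 779_000_000),
    ],
)

# Naive sum counts the investment once per document.
naive_total = conn.execute(
    "SELECT SUM(amount) FROM investment WHERE company = ?",
    ("Samsung Electronics",),
).fetchone()[0]

# Deduplicated sum counts each distinct (company, counterparty, amount) once.
deduped_total = conn.execute(
    """
    SELECT SUM(amount) FROM (
        SELECT DISTINCT company, counterparty, amount
        FROM investment
        WHERE company = ?
    )
    """,
    ("Samsung Electronics",),
).fetchone()[0]
```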
        • SystemAdmin
          197 Posts

          Re: Populate and access a structured data (Object or DB) extracted from a text

          ‏2012-10-02T13:42:39Z  in response to bfoyle
          Hi Bob,

          Yes, your second description is correct.

          I ran some tests and the XML export works well, since it shows the position (Begin and End) for each aspect and I can

          This is only for a test; in the final solution I intend to write a UIMA annotator to handle it.
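
For illustration, Begin and End positions like the ones mentioned above can be used to recover the covered text of each annotation. The XML layout below is invented for this sketch and is not the actual ICA export schema; it also assumes that End is exclusive.

```python
import xml.etree.ElementTree as ET

# Hypothetical export layout: document text plus annotations with offsets.
sample = """<document>
  <text>Samsung Electronics has signed an agreement with Dutch chipmaker ASML.</text>
  <annotation type="company" begin="0" end="19"/>
  <annotation type="company" begin="65" end="69"/>
</document>"""

root = ET.fromstring(sample)
text = root.findtext("text")

# Slice the document text with each annotation's offsets (end exclusive).
covered = [
    (ann.get("type"), text[int(ann.get("begin")):int(ann.get("end"))])
    for ann in root.iter("annotation")
]
```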

          Thank you !