I would like to have a view in ICA with a mix of unstructured and structured data.
The system crawls company news. There will be a page showing the latest news related to a specific company (unstructured data), along with some data extracted from the news, for example:
"Samsung Electronics has signed an agreement with Dutch chipmaker ASML to invest €779m in the latter's Customer Co-Investment Program that aims to develop new chip making technology."
In this case, I would like to show, for all companies, the investments found in the crawled news. The page would show: Investment: €779m.
To do this, I would like to access structured data similar to an object or a database: Company->Investment->Value.
My question is: how can I populate and access structured data extracted from text using ICA 3.0?
Populate and access a structured data (Object or DB) extracted from a text
Updated on 2012-10-02T13:42:39Z by SystemAdmin
bfoyle 060001WDQ3 60 Posts
Re: Populate and access a structured data (Object or DB) extracted from a text
2012-08-28T21:57:39Z
This is the accepted answer.
I believe the best course of action is to work on a custom annotator stage that essentially looks up the relational data in the tables when a specific annotation is identified. In the example given, if Samsung Electronics is annotated as a company, then that company name could be looked up in the db and some resulting value could theoretically be appended as an additional annotation.
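The lookup idea can be sketched outside ICA in plain Python. This is only an illustration of the logic, not ICA or UIMA code: the annotation dictionaries, the `company_info` table, the `sector` column, and the function name `enrich_company_annotations` are all hypothetical, and the stored value is made-up sample data.

```python
import sqlite3

def enrich_company_annotations(annotations, conn):
    """Return the annotations plus one looked-up annotation per known company."""
    enriched = list(annotations)
    for ann in annotations:
        if ann["type"] != "company":
            continue
        # Look up the annotated company name in the (hypothetical) relational table.
        row = conn.execute(
            "SELECT sector FROM company_info WHERE name = ?", (ann["text"],)
        ).fetchone()
        if row is not None:
            # Append the looked-up value as an additional annotation on the same span.
            enriched.append({"type": "company_sector", "text": row[0],
                             "begin": ann["begin"], "end": ann["end"]})
    return enriched

# In-memory database with made-up sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company_info (name TEXT PRIMARY KEY, sector TEXT)")
conn.execute("INSERT INTO company_info VALUES ('Samsung Electronics', 'Electronics')")

annotations = [{"type": "company", "text": "Samsung Electronics", "begin": 0, "end": 19}]
result = enrich_company_annotations(annotations, conn)
```

In a real pipeline the same lookup would live inside the custom annotator stage, with the database connection opened once at initialization rather than per document.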
Re: Populate and access a structured data (Object or DB) extracted from a text
2012-08-29T19:03:02Z
- bfoyle 060001WDQ3
Thank you for the answer.
Just to confirm our understanding: I will get some annotations from the text, such as company ("Samsung Electronics") and investment ("€779m"), and populate a database.
A custom view will then access the database to show the information Company: Samsung Electronics and Investment: €779m.
Somebody may ask: why not use document facets? The problem is that I can have more than one company and more than one investment in a single document, and I would like to see the sum of investments for each company across many documents.
Is this the best/correct way?
bfoyle 060001WDQ3 60 Posts
Re: Populate and access a structured data (Object or DB) extracted from a text
2012-08-31T16:46:01Z
- SystemAdmin 110000D4XK
What I was initially describing would work in the case where you extract the company name from the document / news article and then need to look up the investment amount for that company in an external database.
As I re-read what you are asking, the company information and the investment amount are both in the document / news article. If so, then both can be annotated and extracted using standard rules and dictionaries...no custom stage required.
I think the next thing you are looking for, then, is a total value of investment across multiple documents for company x. For this, I would first need to normalize the investment value in my annotator in ICA Studio to be a number...not text. Then, when ICA processes this data through the pipeline, I would specify an export to database option. In that database, you should be able to query for company name x and get back the sum of the investment values extracted from the documents and exported to the db.
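To make the normalize-then-sum idea concrete, here is a minimal Python sketch using an in-memory SQLite database in place of the real export target. Everything in it is an assumption for illustration: the `investment` table and its columns, the `normalize_amount` helper, the supported suffixes (`k`, `m`, `bn`), the treatment of commas as thousands separators, and all rows except the €779m figure, which are made up. It does not reflect the schema ICA actually exports.

```python
import re
import sqlite3

# Assumed suffix convention; real news text may use other forms ("million", "mn", ...).
_MULTIPLIERS = {"k": 1_000, "m": 1_000_000, "bn": 1_000_000_000}

def normalize_amount(text):
    """Turn a string such as '€779m' into (currency_symbol, numeric_value)."""
    match = re.fullmatch(r"\s*([€$£])\s*([\d.,]+)\s*(k|m|bn)?\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized amount: {text!r}")
    symbol, digits, suffix = match.groups()
    value = float(digits.replace(",", ""))  # assumes ',' is a thousands separator
    if suffix:
        value *= _MULTIPLIERS[suffix.lower()]
    return symbol, value

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE investment (doc_id TEXT, company TEXT, currency TEXT, amount REAL)"
)

# (document, company, raw annotated text); only the first amount comes from the thread.
extracted = [
    ("doc1", "Samsung Electronics", "€779m"),
    ("doc2", "Samsung Electronics", "€1.5m"),
    ("doc2", "Example Corp", "€20k"),
]
for doc_id, company, raw in extracted:
    currency, amount = normalize_amount(raw)
    conn.execute("INSERT INTO investment VALUES (?, ?, ?, ?)",
                 (doc_id, company, currency, amount))

# Sum per company and currency, so that different currencies are never added together.
totals = {
    (company, currency): total
    for company, currency, total in conn.execute(
        "SELECT company, currency, SUM(amount) FROM investment GROUP BY company, currency"
    )
}
```

Storing one row per extracted investment, rather than one row per document, is what allows a document to mention several companies and several amounts, which was the limitation raised about document facets.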
A couple of caveats.
First, I haven't set up a test and tried this so this is all my expectation and what I would try to do...but there may be hiccups.
Second, I don't know your content, but it occurs to me that you should be aware that the principle of what you are doing may be flawed. Document1 may be referring to the same investment as Document56 and Document127...so summing them may overstate the investment amount.
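The double-counting risk can be shown with a small Python sketch. The de-duplication rule used here (same company, same counterparty, same amount means the same investment) is only an assumed heuristic, and it can be wrong in the other direction by merging two genuinely separate investments of equal size. The row layout and the function name `sum_distinct_investments` are hypothetical.

```python
def sum_distinct_investments(rows):
    """Sum amounts per company, counting each (company, counterparty, amount) once."""
    seen = set()
    totals = {}
    for doc_id, company, counterparty, amount in rows:
        key = (company, counterparty, amount)
        if key in seen:
            continue  # another document reporting an investment already counted
        seen.add(key)
        totals[company] = totals.get(company, 0.0) + amount
    return totals

# Three documents reporting what is presumably the same Samsung/ASML investment.
rows = [
    ("Document1", "Samsung Electronics", "ASML", 779_000_000.0),
    ("Document56", "Samsung Electronics", "ASML", 779_000_000.0),
    ("Document127", "Samsung Electronics", "ASML", 779_000_000.0),
]

naive_total = sum(amount for _, _, _, amount in rows)  # counts the investment three times
deduped = sum_distinct_investments(rows)               # counts it once
```

Whether such a rule is good enough depends on the content; a date or deal identifier extracted from the text, if available, would make a safer key.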
Re: Populate and access a structured data (Object or DB) extracted from a text
2012-10-02T13:42:39Z
- bfoyle 060001WDQ3
Yes, your second description is correct.
I ran some tests and the XML export works well, since it shows the position (Begin and End) for each aspect and I can
This is only for a test; in the final solution I intend to write a UIMA annotator to handle it.
Thank you!