Building Extractors using InfoSphere™ BigInsights Text Analytics in Eclipse

NisanthSimon | Tags: hadoop biginsights infosphere articles aql big analysis text data analytics
 
This post gives an overview of:

1) Text Analytics system
2) How to set up the Text Analytics system in Eclipse
3) How to build extractors to extract structured information from unstructured or semi-structured text
4) How to run and cross-verify the extracted results in Eclipse
5) How to extract the AOG from the Eclipse project
 
Prerequisites
Install IBM InfoSphere BigInsights v1.2.
Eclipse version 3.4 or 3.6 is required.

 
1) Text Analytics system
Many enterprises maintain large amounts of unstructured data such as emails, logs, call-center records, wikis, and blogs. There is a growing need to understand this data and draw better insight from it. The Text Analytics system makes it scalable and easy to develop extractors that pull structured information out of unstructured data. The information extraction system is built around Annotation Query Language (AQL), a declarative rule language with a familiar SQL-like syntax.

Refer to http://publib.boulder.ibm.com/infocenter/bigins/v1r2/index.jsp?topic=%2Fcom.ibm.swg.im.infosphere.biginsights.doc%2Fdoc%2Fbiginsights_aqlref_con_aql-overview.html for the various constructs available in AQL.
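AQL has its own syntax, which the reference linked above documents. As a rough analogy only, the Python sketch below shows what "extracting structured information from unstructured text" amounts to: a rule is applied to a document and yields spans, each with begin and end offsets plus the covered text. The rule (two consecutive capitalized words), the Span type, and the function name are assumptions made for illustration. They are not part of BigInsights or AQL.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    begin: int  # offset of the first character of the match
    end: int    # offset one past the last character of the match
    text: str   # the covered text


# Hypothetical rule: two consecutive capitalized words form a name candidate.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")


def extract_name_candidates(document: str) -> List[Span]:
    """Return one Span per match of the rule in the document."""
    return [Span(m.start(), m.end(), m.group())
            for m in NAME_PATTERN.finditer(document)]


doc = "Please forward the call log to John Smith before Friday."
for span in extract_name_candidates(doc):
    print(span)
```

The point of the sketch is the shape of the output. An extractor turns free text into rows that can be filtered, joined, and inspected, which is what the later steps in Eclipse do with the real results.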
 
 
2) How to set up the Text Analytics system in Eclipse
a) Start BigInsights and open the BigInsights console at http://localhost:8080/BigInsights/console/NodeAdministration.jsp
b) Click the Eclipse Tool tab at the top right of the console.
c) Open Eclipse, choose Install New Software, and provide the plug-in repository URL shown on the Eclipse Tool tab.

3) How to build extractors to extract structured information from unstructured or semi-structured text
Follow the steps below to build annotators in Eclipse.

a) Open the Eclipse workspace. Go to File --> New --> Other --> BigInsights --> BigInsights Project and create a new project.

b) To create an annotator file, right-click the project you created and choose New --> BigInsights --> AQL file. Provide a name for the AQL file. You can then write AQL in it.


4) How to run and cross-verify the extracted results in Eclipse
a) Open the project properties and set the main AQL file. You can add dictionaries, UDFs, and other dependent JARs to the data path.

b) To execute the AQL, right-click the main AQL file and open the run configuration. Update the Input collection field.

c) The extracted results appear under Annotation Explorer. The Span Attribute Value column holds the extracted text, and the Span Attribute Name column holds the view and attribute in the form ViewName.AttributeName.
 
There are various options for filtering the results. Here I filter the results to show only the view NameCandidate.
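To make the two columns and the filter concrete, here is a small Python sketch of the same idea. The rows, the PhoneNumber view, the name and number attributes, and the filter_by_view helper are all made up for illustration. Only the NameCandidate view name and the ViewName.AttributeName convention come from the steps above.

```python
from typing import Dict, List

# Hypothetical result rows shaped like the two Annotation Explorer columns.
rows: List[Dict[str, str]] = [
    {"span_attribute_name": "NameCandidate.name", "span_attribute_value": "John Smith"},
    {"span_attribute_name": "PhoneNumber.number", "span_attribute_value": "555-0100"},
    {"span_attribute_name": "NameCandidate.name", "span_attribute_value": "Mary Jones"},
]


def filter_by_view(rows: List[Dict[str, str]], view_name: str) -> List[Dict[str, str]]:
    """Keep only rows whose attribute name starts with '<view_name>.'."""
    prefix = view_name + "."
    return [r for r in rows if r["span_attribute_name"].startswith(prefix)]


for row in filter_by_view(rows, "NameCandidate"):
    # Split 'ViewName.AttributeName' into its two parts.
    view, attribute = row["span_attribute_name"].split(".", 1)
    print(view, attribute, row["span_attribute_value"])
```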
 
You can view the provenance by double-clicking 'Explain' in the NameCandidate tab. Provenance lets you backtrack through the views to understand which views are responsible for an extracted result.
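The backtracking idea can be sketched in a few lines of Python. This is not how the Eclipse tooling computes provenance. It only models the notion that a view is built from other views, so a result can be traced back through them. The FirstName and LastName views and the dependency table are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical view dependencies: each view lists the views it is built from.
VIEW_INPUTS: Dict[str, List[str]] = {
    "NameCandidate": ["FirstName", "LastName"],
    "FirstName": ["Document"],
    "LastName": ["Document"],
    "Document": [],
}


def trace_provenance(view: str, depth: int = 0) -> List[str]:
    """Walk back from a view to every view it depends on, indenting by depth."""
    lines = ["  " * depth + view]
    for parent in VIEW_INPUTS.get(view, []):
        lines.extend(trace_provenance(parent, depth + 1))
    return lines


print("\n".join(trace_provenance("NameCandidate")))
```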
 
 
To cross-verify against the input corpus, click the file name in the Input Document column under Annotation Explorer. Then select an attribute, and its value is highlighted in the input corpus.
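Highlighting works because each extracted value is a span, that is, a pair of offsets into the original document. A minimal Python sketch of that mapping follows. The highlight function and the [[ ]] markers are hypothetical stand-ins for what the editor does visually.

```python
def highlight(text: str, begin: int, end: int,
              left: str = "[[", right: str = "]]") -> str:
    """Wrap the characters in text[begin:end] with marker strings."""
    if not 0 <= begin <= end <= len(text):
        raise ValueError("span offsets fall outside the document")
    return text[:begin] + left + text[begin:end] + right + text[end:]


doc = "Call John Smith tomorrow."
begin = doc.index("John Smith")
print(highlight(doc, begin, begin + len("John Smith")))
```

Checking a result this way shows at a glance whether a rule matched the intended text or caught too much or too little.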
 
5) How to extract the AOG from the Eclipse project

AOG is the compiled plan of the extractor. You can extract the AOG by right-clicking the project and choosing Export --> Text Analytics.

You have now generated the AOG. You can deploy it in BigInsights and test it against a larger corpus. I will cover how to run the AOG from JAQL in my next blog post.