IBM Accelerator for Machine Data Analytics, Part 2: Speeding up analysis of new log types

Sonali Surange (ssurange@us.ibm.com), Software Architect, IBM
Sonali Surange is an IBM Software Architect working on IBM's big data products and technologies. She has filed numerous patents, published over 15 technical papers with IBM developerWorks, and presented at numerous technical conferences. Sonali is a past recipient of the IBM Outstanding Technical Achievement Award and the Women of Color STEM Technical All Star Award, and was recognized as an IBM developerWorks Professional Author in 2012.

Summary:  Enterprises generate machine logs from diverse sources in large volumes. IBM® Accelerator for Machine Data Analytics simplifies the implementation work needed to speed up analysis of semi-structured, unstructured, or structured textual data.


Date:  17 Jan 2013
Level:  Intermediate


Understand the code

Review the code in email.aql and extractor_email.aql, shown in Listing 3.


Listing 3. email.aql
                
module email;

-- Base view: every email-like token found anywhere in the document text.
create view email_base as
select
R.match
from
Regex(/([A-Za-z.\d;'\-<>]+\@[A-Za-z.\d;'\-<>]+(?=(?=,)\s*[A-Za-z.\@\d';\-<>]+|))/, Document.text) R;

-- Keep only addresses whose 4 preceding characters are "To:" plus whitespace.
create view toEmail as
select
e.match
from
email_base e
where MatchesRegex(/To:\s/, LeftContext(e.match, 4));

-- Final receiver view, exposing the span, text, and field_type fields.
create view To as
select D.match as span, GetText(D.match) as text, GetString('To') as field_type
from toEmail D;

export view To;

-- Keep only addresses whose 6 preceding characters are "From:" plus whitespace.
create view fromEmail as
select
e.match
from
email_base e
where MatchesRegex(/From:\s/, LeftContext(e.match, 6));

-- Final sender view, exposing the span, text, and field_type fields.
create view From as
select D.match as span, GetText(D.match) as text, GetString('From') as field_type
from fromEmail D;

export view From;

Observe the following in the code.

  • A base view, email_base, is created to match email addresses anywhere in the document text.
  • A view, fromEmail, is created to pick the email addresses that represent the sender.
  • A view, From, is created from the view fromEmail. This is a final view to export.
  • Similarly, a view, toEmail, is created to pick the email addresses that represent the receiver.
  • A view, To, is created from the view toEmail. This is also a final view to export.
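To experiment with the matching logic outside the AQL runtime, the two-stage pattern in Listing 3 (a base view, then a filter on the text immediately to the left of each match) can be approximated in Python. This is only a rough sketch under stated assumptions: the function names are hypothetical, Python's re module stands in for the AQL Regex, MatchesRegex, and LeftContext functions, MatchesRegex is assumed to test the whole left-context span, and the sample header text is invented for illustration.

```python
import re

# Same pattern as email_base in Listing 3, with the wrapped line joined.
EMAIL = re.compile(
    r"([A-Za-z.\d;'\-<>]+\@[A-Za-z.\d;'\-<>]+"
    r"(?=(?=,)\s*[A-Za-z.\@\d';\-<>]+|))"
)

def email_base(text):
    """Return (begin, end) offsets of every candidate address."""
    return [m.span(1) for m in EMAIL.finditer(text)]

def left_context(text, span, n):
    """Emulate LeftContext: the n characters just before the span."""
    begin, _ = span
    return text[max(0, begin - n):begin]

def filter_by_label(text, spans, label):
    """Keep spans whose left context is '<label>:' plus one whitespace character."""
    pattern = re.compile(re.escape(label) + r":\s")
    width = len(label) + 2  # label, colon, one whitespace character
    # Assumption: MatchesRegex must match the entire span, hence fullmatch.
    return [s for s in spans if pattern.fullmatch(left_context(text, s, width))]

# Invented sample input, not taken from the tutorial's data.
sample = "From: alice@example.com\nTo: bob@example.com\n"
spans = email_base(sample)
from_spans = filter_by_label(sample, spans, "From")
to_spans = filter_by_label(sample, spans, "To")

print([sample[b:e] for b, e in from_spans])  # ['alice@example.com']
print([sample[b:e] for b, e in to_spans])    # ['bob@example.com']
```

The context width is derived from the label length, which mirrors the hard-coded 4 and 6 passed to LeftContext in the listing.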

Note: The exported final views must follow this simple naming convention.

  • The view must contain a field called span, representing the span where the value is found.
  • The view must contain a field called text, representing the text value found.
  • The view must contain a field called field_type, holding the name of the view (in Listing 3, 'To' or 'From').
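As an informal illustration of this convention, the shape of one exported row can be modeled in Python. The names ExportedRow, make_row, and follows_convention are hypothetical and exist only for this sketch; they are not part of the accelerator or of AQL, and the sample text is invented.

```python
from typing import NamedTuple, Tuple

# The three field names every exported view is expected to provide.
REQUIRED_FIELDS = ("span", "text", "field_type")

class ExportedRow(NamedTuple):
    span: Tuple[int, int]  # begin and end offsets of the match
    text: str              # the matched text itself
    field_type: str        # name of the exported view, e.g. "To" or "From"

def make_row(document, begin, end, field_type):
    """Build one row the way the To and From views do in Listing 3."""
    return ExportedRow((begin, end), document[begin:end], field_type)

def follows_convention(columns):
    """True if a view's column names include every required field."""
    return set(REQUIRED_FIELDS).issubset(columns)

doc = "From: alice@example.com"  # invented sample input
row = make_row(doc, 6, 23, "From")

print(row.text)                            # alice@example.com
print(follows_convention(row._fields))     # True
print(follows_convention(("match",)))      # False
```

A view such as fromEmail, which exposes only a match column, would fail this check, which is why the listing wraps it in the From view before exporting.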

You will soon see how these naming conventions are used, but first, publish the custom application.
