Topic
  • 4 replies
  • Latest Post - ‏2013-09-11T05:16:56Z by MichaelSelehov
MichaelSelehov
3 Posts

Pinned topic Using predefined views with TextExtract

2013-09-04T09:07:45Z

Hello!

I'm trying to write my first Streams application, which should read a text file and extract every person name it finds. It's very close to the FeatureDemo example from the Text Toolkit, but because I need to support several languages, and for other reasons, I would rather not use the simple .aql script from that sample.

As far as I understand, the Text Toolkit has built-in views that I can use without writing my own .aql scripts, and the "Person" view looks like exactly what I need. Unfortunately, I couldn't find any example code that uses the predefined views. I tried to intuitively write "something that should work", but it didn't help.

This is what I have so far (and obviously it isn't working at all):

      stream<rstring Person> persons = com.ibm.streams.text.analytics::TextExtract(document) {
               param
               outputViews: "Person";
               outputMode: "multiPort";
      }

Any help would be appreciated.

  • KrisWH
    13 Posts
    ACCEPTED ANSWER

    Re: Using predefined views with TextExtract

    ‏2013-09-09T20:32:53Z  

    In the SPL, you're referencing the module Person in BigInsightsWesternNERMultilingual.jar, but that module doesn't have any output views, which is why you get an error. As you guessed, you have to create your own module (you called yours test) that declares output views, but your SPL never referenced that module.

    Here's an example. First, we'll output all the fields of the Person view as rstrings. Next, we'll create our own output view and output just the first-name field of the Person view.

    Create the following AQL file in a directory called importDemo. It imports the Person view and then creates our own view with just the first name.

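    module importDemo;

    -- Import the predefined Person view and output all of its fields.
    import view Person from module BigInsightsExtractorsExport as myPerson;
    output view myPerson;

    -- Our own view: just the first name.
    create view myPerson2 as
         select A.firstname as first from myPerson A;

    output view myPerson2;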

    Now, use the following SPL fragment to call the AQL:

    (stream<rstring firstname, rstring middlename, rstring lastname, rstring person> PersonStream;
     stream<rstring first> myPerson2Stream) = com.ibm.streams.text.analytics::TextExtract(inputStream) {
          param
          modulePath: "/home/streamsadmin/TextExtract/lib/TextAnalytics/data/tam/BigInsightsWesternNERMultilingual.jar";
          uncompiledModules: "../importDemo";  // if relative, relative to the data directory
          outputMode: "multiPort";
          tokenizer: "MULTILINGUAL";
    }

    Since the AQL has two output views and you're using both of them, you don't need the outputViews parameter in this case. You will probably want to adjust the AQL so that it produces just the output view you need, in the format you need.
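
    As a rough illustration of that kind of adjustment (a sketch only, not tested against a Streams installation): if all you need is the full name, the module could expose a single view built from the person field of Person. The module name personNames and the view name names are hypothetical, and the .aql file is assumed to live in a directory named after the module, as with importDemo.

```aql
module personNames;  -- hypothetical module name

-- Import the predefined Person view under a local alias.
import view Person from module BigInsightsExtractorsExport as myPerson;

-- Keep only the full-name field; 'names' is a hypothetical view name.
create view names as
     select P.person as person from myPerson P;

output view names;
```

    With a single output view, the TextExtract invocation would presumably declare one output stream such as stream<rstring person>, with uncompiledModules pointing at the personNames directory instead of importDemo.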

    One note: when you import a view, its name doesn't change. Even though you import it as myPerson and refer to it as myPerson, if you were to list it in the output views, you'd call it Person, and its full name remains BigInsightsExtractorsExport.Person. If you want to output it under a different name, you need to use

    output view ... as ...
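
    For instance, a minimal sketch, assuming the alias is written as a single-quoted string (worth checking against the AQL reference for your release). The output name PersonNames is hypothetical:

```aql
module importDemo;

import view Person from module BigInsightsExtractorsExport as myPerson;

-- Output the imported view under an explicit external name.
output view myPerson as 'PersonNames';
```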
    

    (By the way, you're right not to use the getNames AQL in FeatureDemo. That's really toy AQL that isn't meant to be used; it's just there to show how the TextExtract operator works.)

  • MohanDani
    1 Post

    Re: Using predefined views with TextExtract

    ‏2013-09-06T16:21:13Z  

    Hi,

    You can use the built-in AQL modules. To locate them, go to <StreamsInstall Location>/toolkits/com.ibm.streams.text/lib/TextAnalytics/data/tam

    In this folder you will find BigInsightsChineseNERMultilingual.jar, BigInsightsJapaneseNERMultilingual.jar, BigInsightsWesternNERMultilingual.jar and BigInsightsWesternNERStandard.jar.

    You can extract (unjar) these, locate the .tam files that provide the views you are interested in, and supply the path to those .tam files to resolve your issue. Specify the tam path and the view name, and it should work just like the FeatureDemo example. Please let me know if you have additional problems; I will help you out.

  • MichaelSelehov
    3 Posts

    Re: Using predefined views with TextExtract

    ‏2013-09-09T11:47:58Z  
    • MohanDani
    • ‏2013-09-06T16:21:13Z

    Hi!

    Thank you for the hints. I found that it isn't really necessary to unjar the modules, since I can specify the .jar file itself as the module path. But I'm still struggling to make this work. I've tried two different approaches. The first was plain usage of the "Person" view:

          stream<rstring person> names_rstring = com.ibm.streams.text.analytics::TextExtract(documents) {
                   param
                   moduleName: "Person";
                   modulePath: "/home/streamsadmin/TextExtract/lib/TextAnalytics/data/tam/BigInsightsWesternNERMultilingual.jar";
                   outputMode: "multiPort";        
          }

    This one gave me the following error: "Wrong number of output ports, expected 0 found 1"

    I thought it might be because I hadn't specified any output view, so I tried adding the outputViews: "Person" param, but then got the following: "Invalid output name 'Person' (valid names are [])"

    Then I tried to use an .aql file with the following body:

    module test;
     
    import view Person;
     
    create view names as
    select P.person as person from Person P;
     
    output view names;

    But then I got a very ridiculous error when compiling it:

    Exception in thread "Thread-6" com.ibm.avatar.api.exceptions.CompilerException: Compiling AQL encountered 1 errors: 
    null

     

    Meanwhile, I also tried the createTypes.pl script, but it never worked for me; it fails with the following Java exception: "Exception in thread "main" java.lang.ClassFormatError: com.ibm.streams.text.InspectAQL (unrecognized class file version)"

    What are my mistakes? It looks like I'm pretty close to the goal and need just a couple more steps.

  • MichaelSelehov
    3 Posts

    Re: Using predefined views with TextExtract

    ‏2013-09-11T05:16:56Z  
    • KrisWH
    • ‏2013-09-09T20:32:53Z

    Hi Kristen!

    Thank you very much for your help! I finally got it to work, and I feel that I now know how to use the predefined extractors. There is a lot more to do for my project, but at least it is no longer stuck at the very beginning.

    Thanks again!

    Michael.