Examples

These examples demonstrate required and optional parameters for several common scenarios.

Using extractors created in the Information Extraction Web Tool

You can create extractors in the web tool that is part of BigInsights and is also included with Streams. To learn how to access the tool or export the extractors you create there, see this page. After you have created and exported an extractor, you can use it with this operator to analyze streaming data.

The following snippet demonstrates how to configure the TextExtract operator to use the exported extractor. It assumes that you have exported an extractor from the web tool and placed the contents of the exported zip file in the etc/exportedAQL folder of your SPL project. It also assumes that the extractor is named "PhoneNumberExtractor" and that it extracts a name and a phone number.


composite ExtractPhoneNumbers {
  graph
    stream<rstring inputRecords, int32 id,
           rstring name, rstring address> line = FileSource(){
      param
        file: "input.txt";
        format: txt; 
    }

    stream<rstring name, rstring number>  PhoneNumber = com.ibm.streams.text.analytics::TextExtract(line){
      param 
        moduleSearchPath:  "etc/exportedAQL" ; //location of the unpacked zip file within the project
        outputViews: "PhoneNumberExtractor"; //name of the extractor we created
        inputDoc: "inputRecords";
        outputMode: "multiPort";
        tokenizer: "STANDARD";
    }         

    () as Sink1 = FileSink(PhoneNumber){
      param 
        file: "NamesAndPhoneNumbers.txt";
        format: csv; 
        flush: 1u; 
    }
}

Other examples

  • Scenario: You have an application that uses the pre-built Sentiment extractor together with other compiled or uncompiled modules.
  • Required parameters:
    • moduleName
    • modulePath
    • externalResourcesDir
  • Optional parameters:
    • uncompiledModules
    Example:
    
       stream<OutputType> CustomSentimentStream = TextExtract(Input)
       {
           param
             inputDoc: $inputName;
             moduleName: "<module name>";
             uncompiledModules: "<path to any uncompiled modules>";
             modulePath: "<path to your compiled module>", pathToActionAPIJars(STANDARD), pathToSentimentJars(STANDARD);
             // default is STREAMS_INSTALL/toolkits/com.ibm.streams.text/impl/lib/dictionaries/sentiment
             externalResourcesDir: "<path to the required sentiment dictionaries>";
       }
    
  • Scenario: You would like to use one of the pre-built extractors in your application. For example, you would like to identify mentions of person names in the incoming text.

  • Required parameters:
    • moduleName: "BigInsightsExtractorsOutput"
    • modulePath: The path to the pre-built extractors within the toolkit. You can use the various utility functions included in this toolkit to retrieve the path to the right module. In this instance we use the BigInsightsWesternNERStandard function to get the path to the standard modules.
      
      type PeopleType = tuple<rstring firstname, rstring middlename, rstring lastname, rstring person>;

      stream<list<PeopleType> Person> PeopleNameStream = TextExtract(Input) {
        param
         moduleName: "BigInsightsExtractorsOutput"; // the module that contains all the pre-built extractors, including the Person extractor
         modulePath: BigInsightsWesternNERStandard(); // utility function that returns the path to the standard module
         inputDoc: "input_attr"; // name of the attribute in the input stream that holds the text to analyze
      }
      
    On input containing the text "Mary said hi, then Jane Roberts called Fanny Thomas", the output is a list containing the following three tuples:
    
       firstname="Mary",middlename="",lastname="",person="Mary",
       firstname="Jane",middlename="",lastname="Roberts",person="Jane Roberts",
       firstname="Fanny",middlename="",lastname="Thomas",person="Fanny Thomas"
    
  • Scenario: You have a zip file of uncompiled and compiled modules that was created by using the web tool in IBM BigInsights Text Analytics. For this example, we assume that the zip file has been extracted into a folder on the local file system.

  • Required parameters:
    • moduleSearchPath: Specify the path to the folder into which the zip file's contents have been extracted.
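    Example: A minimal sketch of this scenario. The folder name etc/extractedZip and the stream and attribute names are illustrative placeholders, not toolkit names.
    
       // Compile and load every module found in the folder that holds the extracted zip contents.
       stream<rstring match> Results = com.ibm.streams.text.analytics::TextExtract(Text) {
         param
           moduleSearchPath: "etc/extractedZip"; // folder containing the zip file's contents
           inputDoc: "oneline";                  // attribute that carries the document text
       }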
  • Scenario: You have uncompiled modules that you want to compile and load.
    • Required parameters:
      • uncompiledModules: Specify the path to the uncompiled modules.
      • modulePath (optional): If the uncompiled modules refer to other modules, specify the path to the dependencies.
      • moduleName (optional): Specify the name(s) of any compiled modules that you want to load.
    • Optional parameters:
      • externalView, externalDictionary, externalTable, languageCode, moduleOutputDir, outputMode, passThrough (only if the outputMode parameter value is not singlePort), and tokenizer.
  • Scenario: You have only compiled modules to be loaded.
    • Required parameters:
      • moduleName: Specify the compiled module names that are to be loaded.
      • modulePath: Specify the path for the compiled modules that are listed in the moduleName parameter.
    • Optional parameters:
      • externalView, externalDictionary, externalTable, languageCode, outputMode, passThrough (only if the outputMode parameter value is not singlePort), and tokenizer.
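    Example: A minimal sketch of the compiled-modules-only scenario. The module name, path, and attribute name are illustrative placeholders.
    
       // Load a pre-compiled module from a local folder.
       stream<rstring match> Matches = com.ibm.streams.text.analytics::TextExtract(Input) {
         param
           moduleName: "myModule";                  // name of the compiled module to load
           modulePath: "/home/tester/text/modules"; // folder that contains the compiled module
           inputDoc: "oneline";                     // attribute that carries the document text
       }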

The following example demonstrates the use of external dictionaries and the uncompiledModules and moduleOutputDir parameters:


composite ExtractNames{
  graph
    stream<rstring oneline> line = FileSource(){
      param 
        file: "input.txt"; 
        format: line; 
    }
    // Find names in the input document 
    stream<rstring match> names = com.ibm.streams.text.analytics::TextExtract(line) {
      param 
        moduleName: "DictTest"; 
        uncompiledModules: "/home/tester/text/data/loadExtDictTest";
        moduleOutputDir: "/home/tester/text/data/DictTest"; 
        externalDictionary: "loadExtDictTest.externalDict=/home/tester/text/data/strictfirst.dict"; 
        outputMode: "multiPort"; 
    }          
    // Write the extracted names to a file. 
    () as NamesSink = FileSink(names) { 
      param 
        file: "names.out"; 
        format: csv; 
        flush: 1u; 
    } 
}

The following example demonstrates the use of external tables:


composite TablesDemo{
  graph
    // Read the input file, one line per tuple.
    stream<rstring oneline> line = FileSource() {
      param 
        file: "input.txt"; 
        format: line; 
    }
    // Find titles 
    (stream<rstring match> titles ; 
     stream<rstring match> names) 
     = com.ibm.streams.text.analytics::TextExtract(line ) {
      param
        moduleName: "TableTest"; 
        uncompiledModules: "/home/tester/text/data/loadExtTableTest";
        moduleOutputDir: "/home/tester/text/data/TableTest"; 
        externalTable:
          "loadExtTableTest.extTab1=/home/tester/text/data/ExtTab1.csv",
          "loadExtTableTest.extTab2=/home/tester/text/data/ExtTab2.csv"; 
        outputMode: "multiPort"; 
    }      
    // Write the extracted titles to a file. 
    () as NamesSink = FileSink(titles) { 
      param 
        file: "titles.out"; 
        format: csv; 
        flush: 1u; 
    } 
}

The following example demonstrates the use of external views. The modules in this example expect an external view along with the input document. The schema for the external views is specified in the input port. The attributes from the input port are then mapped to the external views (as defined in the module) by using the externalView parameter.


composite Main { 
  graph 
    stream<rstring fileName, rstring contents,
           list<tuple<rstring firstName, rstring lastName, uint32 age>> infoextView, 
           list<tuple<rstring caller, rstring callee, uint64 duration>> callInfoextView> Annotated = FileSource() {
      param 
        file: "sampleFile.dat"; 
    }
    // Extract matches, mapping the input attributes to the external views 
    (stream<list<tuple<rstring names>> titles, 
    list<tuple<rstring match>> matchList> Phone2) = 
    com.ibm.streams.text.analytics::TextExtract(Annotated){ 
      param
        uncompiledModules:"/home/tester/text/data/loadExtViewTest"; 
        moduleOutputDir:"/home/tester/text/data/ViewTest"; 
        externalView: 
          "loadExtViewTest.info=infoextView",
          "loadExtViewTest.callInfo=callInfoextView"; 
        outputMode: "singlePort"; 
        inputDoc: "contents"; 
    }
    () as NamesSink = FileSink(Phone2) {
      param 
        file: "Phone2.out"; 
        format: csv; 
        flush: 1u; 
    } 
}

The following example demonstrates the use of the passThrough parameter:


composite PassThroughDemo{
  graph
    // Read the input file, one line per tuple.
    stream<rstring oneline> line = FileSource()  {
      param 
        file: "input.txt"; 
        format: line; 
    }  
    (stream<rstring file, rstring person, rstring phone,
    rstring personphone> demo3_passThrough; 
    stream<rstring file, rstring entireDoc> demo3_nophone) 
    = TextExtract( line) { 
      param 
        moduleName: "phoneModule";
        modulePath: "/home/tester/com.ibm.streams.text/data/test";
        inputDoc: "oneline";
        passThrough: true;  
        outputMode: "multiPort"; 
    } 
    () as NamesSink1 = FileSink(demo3_nophone){ 
      param 
        file: "demo3_nophone.out";
        format: csv; 
        flush: 1u; 
    }  
    () as NamesSink2 = FileSink(demo3_passThrough) { 
      param
        file: "demo3_passThrough.out"; 
        format: csv; 
        flush: 1u; 
    } 
}