createTypes script

The createTypes.pl script can be used to create applications from modules that contain Annotated Query Language (AQL) files. For example, after creating extractors in the Information Extraction Web Tool in BigInsights, you can use the script to generate an SPL application that uses those extractors. The generated stream processing applications can be used as a starting point for developing applications, and the generated types might make it easier to maintain SPL applications that involve text analytics.

The createTypes.pl script takes modules or TAM files as arguments. The script can be used to create output stream types that match the output views of the AQLs in the modules or TAM files. See the documentation for the TextExtract operator for additional information on the mapping of AQL types to SPL types. Note that if an AQL output view has a field with a null type, then the generated SPL attribute for that field will have a type of rstring. The script can also optionally create a composite operator that you can use as part of an application. These features make it possible to call createTypes.pl from a Makefile and generate the SPL from modules or TAM files, which can help you detect incompatibilities between your modules or TAM files and your application before run time.
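The Makefile use case described above can be approximated in plain shell. The following is a minimal sketch, not part of the toolkit: the function name regenerate_types and the CREATE_TYPES override variable are hypothetical, and whether this flag combination is sufficient for a particular module is an assumption. It reruns the script only when a file in the module directory is newer than the generated SPL file.

```shell
#!/bin/sh
# Hypothetical override so the command can be replaced (for example by "echo")
# for a dry run. The default assumes createTypes.pl is on the PATH.
CREATE_TYPES="${CREATE_TYPES:-createTypes.pl}"

# regenerate_types MODULE_DIR OUT_FILE
#   MODULE_DIR: directory that contains the uncompiled AQL module
#   OUT_FILE:   generated SPL file, given with its .spl extension so that the
#               freshness check looks at the file that is actually written
regenerate_types() {
    module_dir="$1"
    out_file="$2"

    # Skip the run if the output exists and no module file is newer than it.
    if [ -f "$out_file" ] && \
       [ -z "$(find "$module_dir" -type f -newer "$out_file")" ]; then
        echo "up to date: $out_file"
        return 0
    fi

    # Only flags from the parameter list in this document are used.
    "$CREATE_TYPES" --uncompiledModules "$module_dir" \
                    --makecomposite \
                    --outputfile "$out_file"
}
```

A build step could call, for example, regenerate_types myModule ExtractorTypes.spl before compiling the application, so that a failure of the script surfaces at build time rather than at run time.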

Parameters

The script provides the following optional parameters to control the output types produced and the parameters of the TextExtract operator:
  • compositeName: The name to use for the composite operator that contains the TextExtract operator. By default, the script derives a composite name from the AQL file.
  • convertSpan: If this parameter is specified, the script converts span data types to string data types.
  • copyallattributes: Copy all the original attributes to the output. The new attributes are added to the end of the original attributes. This parameter can optionally be used with the makecomposite parameter.
  • extDictionaries: The names (including the path names) for the external dictionaries.
  • extraAttr attrName: Deprecated and might become obsolete. Use the copyallattributes parameter instead.

    Add an attribute with the name attrName to all the output streams (except the passThrough stream) of the composite operator. This attribute must be present on the input stream for the composite, and you must pass the type of the attribute to the composite operator in the attrName_type parameter. This parameter can be specified multiple times.

  • extTables: The names (including the path names) for the external tables.
  • floattype: The type of float variable to use. Options are float32 or float64.
  • help: Displays usage information for the script.
  • inputDoc: The default text field. See the text parameter description for the TextExtract operator.
  • inttype: The type of integer to use. Options are int32 or int64.
  • languageCode: The ISO language code that is used by IBM BigInsights.
  • main: Create a Main composite operator to call the created composite operator.
  • makecomposite: Create a composite operator as well as the types.
  • moduleOutputDir: The path name where the modules are saved after they are compiled.
  • modulePath: The path name where the modules are located.
  • moduleSearchPath: The path to search for uncompiled modules. Most useful when consuming AQL that is exported from the BigInsights Text Analytics web tool.
  • modules: The names of the modules to be loaded.
  • namespace: The namespace of the composite operator. You must create this namespace directory and move the generated files into it.
  • noconvertSpan: If this parameter is specified, do not convert spans to strings.
  • nospl: Do not generate SPL; print the views and types to standard output and then exit.
  • outputfile: The name of the output file. The SPL extension is automatically added if it is not specified.
  • outputMode: The output mode. Options are singlePort or multiPort.
  • outputview: The output view to activate. It can be specified multiple times. If this parameter is not specified, all output views are included.
  • passThrough: Set the passThrough parameter to true in the composite operator.
  • spantype: Set the type for Span data types and set the noconvertSpan parameter.
  • stringtype: The type of string. Options are ustring or rstring.
  • strictMapping: Controls whether null values returned from the BigInsights Text Analytics runtime result in an exception or are ignored.
  • suppressPunctuation: Create a suppressPunctuation parameter in the composite operator with a default value of true.
  • tokenizer: The default tokenizer parameter. See the tokenizer parameter description for the TextExtract operator.
  • uncompiledModules: The names of the modules that are to be compiled.
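As an illustration of how several of these parameters combine, the following minimal sketch wraps two invocations in shell functions. The function names and the CREATE_TYPES override variable are hypothetical, and the sketch assumes that each parameter is passed with a leading --, as in the examples in this document. The first function uses nospl to inspect the views and types; the second selects the wider string, integer, and float types.

```shell
#!/bin/sh
# Hypothetical override so the command can be replaced for a dry run.
CREATE_TYPES="${CREATE_TYPES:-createTypes.pl}"

# list_views MODULE_PATH MODULE_NAME
# Print the views and types to standard output without generating SPL.
list_views() {
    "$CREATE_TYPES" --modulePath "$1" --modules "$2" --nospl
}

# generate_wide_types MODULE_PATH MODULE_NAME OUT_FILE
# Generate types that use ustring, int64, and float64.
generate_wide_types() {
    "$CREATE_TYPES" --modulePath "$1" --modules "$2" \
                    --stringtype ustring \
                    --inttype int64 \
                    --floattype float64 \
                    --outputfile "$3"
}
```

Running list_views first can help confirm which output views exist before any are selected with the outputview parameter.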

Examples

In some scenarios, you might want to create a reusable instantiation of a TextExtract operator with a particular set of parameters. For example, imagine that you have a simple uncompiled module, created without the BigInsights web tool, that uses some of the pre-built named entity extractors. These pre-built extractors can be found in $STREAMS_INSTALL/toolkits/com.ibm.streams.text/lib/TextAnalytics/data/tam/named-entities. To use the standard set of named entity extractors, you could use the following command:


createTypes.pl --uncompiledModules <path_to_custom_module> \
--modulePath <path_to_pre_built_extractors>/BigInsightsWesternNERStandard.jar \
--makecomposite \
--main \
--compositeName sample

That example produces an SPL file with the following contents. Note that the modulePath parameter is now set to the same value used by the script.


type PersonType = rstring firstname, rstring middlename, rstring lastname, rstring person;
type myPerson2Type = rstring first;
type AllAnnotationsType = list<PersonType> Person , list<myPerson2Type> myPerson2 ;
// a composite for outputMode singlePort
public composite sample(input inputStream;
output AllAnnotationsStream ) { 
graph
  (stream<AllAnnotationsType> AllAnnotationsStream) 
  = com.ibm.streams.text.analytics::TextExtract(inputStream) {
    param
      modulePath: 
      "<path_to_pre_built_extractors>/BigInsightsWesternNERStandard.jar";
      uncompiledModules: "<path_to_custom_module>";
      outputMode: "singlePort";
      tokenizer: "STANDARD";
  }
}

This composite operator can be useful if it is used in multiple places. To have a ready-to-go application, you can include the --main option, which also produces a Main.spl file that calls this composite operator.

Another common use case is generating SPL for an extractor that was created with the BigInsights web tool. You can export the uncompiled AQL for the extractor by using the "Export AQL" feature in the web tool. The export produces a zip file that contains all the modules needed to run your extractor within Streams. You can then use the createTypes script to produce a composite that can be used within your Streams application. For example, assume that you have created an extractor named PersonSearch and that the exported zip file has been extracted into a folder called exportedAQL. You can use the moduleSearchPath parameter with the script to generate an SPL application that is ready to process text with the TextExtract operator.

Executing the following command produces an SPL application that uses the PersonSearch extractor to analyze the file named by the inputFile submission-time value.


createTypes.pl --moduleSearchPath exportedAQL/ --outputView PersonSearch  --makecomposite --outputfile PersonSearchExtractor.spl --outputMode multiPort --main --compositeName PersonSearchExtractor

Two files will be generated, one named Main.spl and another named PersonSearchExtractor.spl. The parameter outputView allows you to indicate that only the output from the PersonSearch extractor should be produced. The main parameter indicates that a complete application should be created.

Contents of PersonSearchExtractor.spl:


	type PersonSearchType = rstring name, rstring city;
	
	// a composite for outputMode multiPort
	public composite PersonSearch(input InputStream;
			output PersonSearchStream) {
	
	graph
		(stream<PersonSearchType> PersonSearchStream) 
			= com.ibm.streams.text.analytics::TextExtract(InputStream) {
		param
			moduleSearchPath: "/homes/sample/exportedAQL";
			outputMode: "multiPort";
			outputViews: "PersonSearch";
			tokenizer: "STANDARD";
		}
	}

Contents of Main.spl:


composite Main {
    param
        expression<rstring> $inputFile: getSubmissionTimeValue("inputFile");
    graph
        stream<rstring inputText> Instream = FileSource() {
            param
                file: $inputFile;
                format: line;
        }

        stream<PersonSearchType> PersonSearch = PersonSearch(Instream) {
        }

        () as SinkPersonSearch = FileSink(PersonSearch) {
            param
                file: "PersonSearch_PersonSearch.txt";
        }
}

The moduleSearchPath parameter must be an absolute path or a path relative to the directory from which the createTypes.pl script is run.
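One way to avoid depending on the current directory is to resolve the path before calling the script. The following is a minimal sketch under that assumption. The helper names abs_dir and generate_from_export and the CREATE_TYPES override variable are hypothetical; the flags mirror the PersonSearch command shown earlier.

```shell
#!/bin/sh
# Hypothetical override so the command can be replaced for a dry run.
CREATE_TYPES="${CREATE_TYPES:-createTypes.pl}"

# abs_dir DIR: print the absolute form of an existing directory.
# The subshell keeps the caller's working directory unchanged.
abs_dir() {
    (CDPATH= cd -- "$1" && pwd)
}

# generate_from_export EXPORT_DIR VIEW OUT_FILE
# Resolve EXPORT_DIR to an absolute path, then pass it as moduleSearchPath.
generate_from_export() {
    search_path=$(abs_dir "$1") || return 1
    "$CREATE_TYPES" --moduleSearchPath "$search_path" \
                    --outputView "$2" \
                    --makecomposite \
                    --outputfile "$3"
}
```

For example, generate_from_export exportedAQL PersonSearch PersonSearchExtractor.spl passes the same absolute module path regardless of the directory from which the surrounding build is started, and returns a nonzero status if the directory does not exist.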