What's new in version 2.0.0?

Version 2.0.0 of the Telecommunications Event Data Analytics (TEDA) toolkit provides several new features.

Important: Some functional enhancements and extensions require you to adapt existing TEDA applications (interfaces) to work with the new release. See the What's changed? section for details on how to migrate existing projects.

The highlights of the new release are:

  • Benefit from the improved configuration of your Lookup Manager application
    • The XML format description is simpler.
    • The database access specification and the mapping of database columns are more flexible.
    • The database source can now delete lookup data.
    • The database credentials and the database name can be configured with an application configuration in the IBM Streams console.
  • Benefit from support for the partitioned BloomFilter in your ITE application
    • Housekeeping and clean-up for data deduplication are much faster than the traditional time-scheduled housekeeping of data stored in the Bloom filter.
    • The Bloom filter cleans its partitions automatically, without interrupting file processing.
    • If you enable data deduplication checkpointing, the framework uses the checkpoint information only after a file fails or after a job is cancelled.
    • You can search for duplicates in all partitions. This simplifies the configuration and accelerates housekeeping in the typical scenario of time-based duplicate detection for a dedicated time slot.
  • Benefit from support for exporting streams of your ITE application to other applications, which can connect dynamically and apply their own business logic to the exported data.
    • You select the export interfaces with the new configuration parameter ite.export.streams.
    • The connected applications do not cause back-pressure in the ITE application.
  • Benefit from the new DirectoryWatch operator, which uses the system's inotify functionality to add watches on directories and report file changes, using less CPU than the standard spl.adapter::DirectoryScan operator. Hint: Check whether your file system supports inotify before you use the operator.

  • The CSVParse operator provides new custom output functions to get error descriptions when parsing fails

  • Benefit from the new functions in the com.ibm.streams.teda.file and com.ibm.streams.teda.file.path namespaces

  • The Lookup Manager provides new parameters that enhance the handling of CSV files with enrichment data.
    • You decide how to handle header lines and empty lines.
    • You can use quoted attribute values.
    • You can specify the separator character or characters that delimit the fields in the CSV input.
    • You can define the end-of-line marker.
  • The Lookup Manager creates a new unique prefix for shared memory segments, which allows one host to be shared by different users and by different Lookup Manager jobs.

  • Use the new <namespace>.context.custom::ContextContainer operator to implement multi-level contexts or contexts with different algorithms
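To illustrate why the partitioned housekeeping described in the ITE bullets above is cheap, here is a minimal C++17 sketch of a time-partitioned Bloom filter. It assumes one partition per time slot and that starting a new slot simply drops the oldest partition, so no stored entries are scanned; the searchAllPartitions flag only mirrors the parameter name. The class PartitionedBloomFilter and its methods are hypothetical and are neither the interface nor the implementation of the TEDA BloomFilter operator.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch: each partition holds the keys of one time slot.
class PartitionedBloomFilter {
public:
    PartitionedBloomFilter(std::size_t maxPartitions, std::size_t bitsPerPartition,
                           std::size_t numHashes)
        : maxPartitions_(maxPartitions), bits_(bitsPerPartition), hashes_(numHashes) {
        assert(maxPartitions > 0 && bitsPerPartition > 0 && numHashes > 0);
        partitions_.emplace_back(bits_, false);
    }

    // Starts a new time slot. Housekeeping is a constant-time drop of the
    // oldest partition instead of a scan over stored entries.
    void rotate() {
        partitions_.emplace_back(bits_, false);
        if (partitions_.size() > maxPartitions_) partitions_.pop_front();
    }

    // Returns true if the key was possibly seen before (duplicate candidate),
    // then records the key in the newest partition.
    bool checkAndInsert(const std::string& key, bool searchAllPartitions) {
        bool seen = false;
        if (searchAllPartitions) {
            for (const auto& p : partitions_)
                if (contains(p, key)) { seen = true; break; }
        } else {
            seen = contains(partitions_.back(), key);
        }
        insert(partitions_.back(), key);
        return seen;
    }

private:
    using Bits = std::vector<bool>;

    // Double hashing: bit index i is (h1 + i * h2) mod partition size.
    std::size_t index(const std::string& key, std::size_t i) const {
        std::size_t h1 = std::hash<std::string>{}(key);
        std::size_t h2 = std::hash<std::string>{}(key + "#") | 1;
        return (h1 + i * h2) % bits_;
    }
    bool contains(const Bits& p, const std::string& key) const {
        for (std::size_t i = 0; i < hashes_; ++i)
            if (!p[index(key, i)]) return false;
        return true;
    }
    void insert(Bits& p, const std::string& key) {
        for (std::size_t i = 0; i < hashes_; ++i) p[index(key, i)] = true;
    }

    std::size_t maxPartitions_, bits_, hashes_;
    std::deque<Bits> partitions_;
};
```

Checking only the newest partition is faster, while searching all partitions catches duplicates across the whole retention window at the cost of a higher false-positive rate, because every partition contributes its own false-positive probability.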
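The inotify-based approach of the DirectoryWatch operator can be sketched with the Linux inotify API directly. This C++17 sketch assumes that a file counts as ready when it is closed after writing (IN_CLOSE_WRITE) or moved into the directory (IN_MOVED_TO); the class name DirWatcher and that event selection are assumptions, not the documented behavior of the DirectoryWatch operator. The kernel queues events as they happen, so no directory listing is repeated the way a periodic scan does it.

```cpp
#include <sys/inotify.h>
#include <unistd.h>

#include <cassert>
#include <filesystem>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not the DirectoryWatch operator: watches one directory
// and reports names of non-directory entries that were closed after writing
// or moved into the directory.
class DirWatcher {
public:
    explicit DirWatcher(const std::filesystem::path& dir) {
        if (!std::filesystem::is_directory(dir))
            throw std::runtime_error("not a directory: " + dir.string());
        fd_ = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
        if (fd_ < 0) throw std::runtime_error("inotify_init1 failed");
        if (inotify_add_watch(fd_, dir.c_str(), IN_CLOSE_WRITE | IN_MOVED_TO) < 0) {
            close(fd_);
            throw std::runtime_error("inotify_add_watch failed");
        }
    }
    ~DirWatcher() { close(fd_); }  // closing the descriptor also drops the watch
    DirWatcher(const DirWatcher&) = delete;
    DirWatcher& operator=(const DirWatcher&) = delete;

    // Drains all queued events without blocking.
    std::vector<std::string> poll() {
        std::vector<std::string> names;
        alignas(inotify_event) char buf[4096];
        for (;;) {
            ssize_t len = read(fd_, buf, sizeof buf);
            if (len <= 0) break;  // -1 with EAGAIN means the queue is empty
            for (char* p = buf; p < buf + len;) {
                const auto* ev = reinterpret_cast<const inotify_event*>(p);
                if (ev->len > 0 && !(ev->mask & IN_ISDIR)) names.emplace_back(ev->name);
                p += sizeof(inotify_event) + ev->len;
            }
        }
        return names;
    }

private:
    int fd_ = -1;
};
```

Because inotify reports only changes made through the local kernel, changes that another host makes on a network file system such as NFS are typically not reported, which is one reason to check file system support first.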

The following defects are resolved in Telecommunications Event Data Analytics toolkit version 2.0.0:

  • Operators
    • The StructureParse operator now compiles successfully for every SPL output schema. As before, the operator supports output attributes that do not participate in the mapping; these attributes are assigned their default values. In earlier releases, however, an output attribute of a type such as enum caused a compile failure. The mapping definition still supports only the following primitive types: boolean, intb, uintb, floatb, rstring[n], and blob[n].
    • The BloomFilter operator now returns correct duplicate results when you use the searchAllPartitions parameter.
  • Application Framework
    • The com.ibm.streams.teda.internal::DirScan operator now reports each file only once when it is used with a relative directory path that is specified with the ite.ingest.directory.inputListFile parameter.
    • The LookupCache operator is now thread-safe and can run in a parallel region within the same PE.
    • The new <namespace>.lookup::MultiLookupOperator is a redesign of the TEDA enrichment operation based on shared memory access. It is a runtime-optimized version of the <namespace>.lookup::LookupCache operator for applications that require multiple enrichment operations in sequence. Because it performs multiple lookup operations in a single operator, it avoids the overhead of chaining several operators. Additionally, the operator lets you define a filter expression for each lookup operation and insert default values if no match is found.
    • The build issue in ITE applications that use variant C with two groups is resolved.
    • The new Lookup Manager customization sample XML file removes the misleading relation between key and value expressions. The input stream attribute that is used in the key assignment definition is not required in the value assignment of the store definition.
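The idea of running several lookups in one operator, each with its own filter and default value, can be modeled in a few lines of C++17. In this sketch the lookup tables are plain in-memory hash maps standing in for the shared memory segments, and LookupStep and enrich are hypothetical names; the actual parameters of the MultiLookupOperator are not shown here.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model of chained enrichment, not the MultiLookupOperator API.
using Record = std::unordered_map<std::string, std::string>;
using Table = std::unordered_map<std::string, std::string>;

struct LookupStep {
    const Table* table;                         // stands in for shared memory lookup data
    std::string keyAttr;                        // input attribute used as the lookup key
    std::string outAttr;                        // attribute that receives the result
    std::string defaultValue;                   // inserted if no match is found
    std::function<bool(const Record&)> filter;  // the step runs only if this returns true
};

// Applies all lookup steps to one record in a single pass.
void enrich(Record& rec, const std::vector<LookupStep>& steps) {
    for (const auto& s : steps) {
        if (s.filter && !s.filter(rec)) continue;
        auto key = rec.find(s.keyAttr);
        auto hit = (key == rec.end()) ? s.table->end() : s.table->find(key->second);
        rec[s.outAttr] = (hit == s.table->end()) ? s.defaultValue : hit->second;
    }
}
```

In this sketch, later steps see the attributes written by earlier ones, so a step can use the result of a previous lookup as its key.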

Benefit from the new functions in the com.ibm.streams.teda.file and com.ibm.streams.teda.file.path namespaces

The symlink function creates symbolic links in the file system.

The space function determines the total, free, and available disk space capacity for a mounted file system.
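The exact SPL signatures of symlink and space are not reproduced here. As a point of reference, C++17's std::filesystem provides the same operations: std::filesystem::space returns capacity, free, and available bytes, where available is the space usable by an unprivileged process and can be smaller than free. The wrapper names makeSymlink and diskSpace are hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical wrapper: creates `link` pointing to `target` and reports
// failure (for example, an existing link) by returning false instead of throwing.
bool makeSymlink(const fs::path& target, const fs::path& link) {
    std::error_code ec;
    fs::create_symlink(target, link, ec);
    return !ec;
}

// Hypothetical wrapper: capacity, free, and available bytes of the
// file system that contains `p`.
fs::space_info diskSpace(const fs::path& p) {
    return fs::space(p);
}
```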

The dirname function extracts the parent directory from the provided path.

The filename function extracts the file name from the provided path.

The stem function extracts the file name without its extension from the provided path.

The extension function extracts the extension of the file name from the provided path.
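For comparison, std::filesystem::path in C++17 decomposes a path into the same four parts. The helper names below are hypothetical. Note that std::filesystem keeps the leading dot in the extension, strips only the last extension in the stem, and treats a name such as .hidden as a stem without an extension; whether the TEDA functions follow the same conventions is an assumption to verify in the toolkit reference.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helpers mirroring the described semantics of the TEDA
// dirname, filename, stem, and extension functions.
std::string dirnameOf(const std::string& p) { return fs::path(p).parent_path().string(); }
std::string filenameOf(const std::string& p) { return fs::path(p).filename().string(); }
std::string stemOf(const std::string& p) { return fs::path(p).stem().string(); }
std::string extensionOf(const std::string& p) { return fs::path(p).extension().string(); }
```

For "/data/in/cdr_20240101.csv.gz", these helpers return "/data/in", "cdr_20240101.csv.gz", "cdr_20240101.csv", and ".gz".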

Use the new <namespace>.context.custom::ContextContainer operator to implement multi-level contexts or contexts with different algorithms

The new <namespace>.context.custom::ContextContainer operator allows you to implement multi-level context logic or contexts with different algorithms.

With its default implementation, you address <namespace>.context.custom::ContextDataProcessor instances as before: tuples that are sent to an instance are processed there and do not leave that ContextDataProcessor instance.

Some use cases require contexts to communicate with each other. For example, you send tuples to several contexts, and the outputs that their algorithms produce must be merged to produce the final result. In this case, you customize the ContextContainer composite operator instead of using the ContextDataProcessor.
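The difference between isolated contexts and merged contexts can be modeled outside SPL. In this C++17 sketch, which uses hypothetical Context and Container classes rather than the TEDA operators, events are distributed round-robin, so the same key can reach several contexts and a merge step must combine their partial sums into the final result.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model, not the ContextContainer or ContextDataProcessor API.
struct Event {
    std::string key;
    long value;
};

// One context: keeps its own per-key state (here, a running sum).
class Context {
public:
    void process(const Event& e) { sums_[e.key] += e.value; }
    const std::map<std::string, long>& partial() const { return sums_; }

private:
    std::map<std::string, long> sums_;
};

// Container: routes events round-robin, so one key can be spread over
// several contexts, then merges the partial results into the final result.
class Container {
public:
    explicit Container(std::size_t n) : contexts_(n) { assert(n > 0); }

    void submit(const Event& e) { contexts_[next_++ % contexts_.size()].process(e); }

    std::map<std::string, long> merge() const {
        std::map<std::string, long> total;
        for (const auto& c : contexts_)
            for (const auto& [k, v] : c.partial()) total[k] += v;
        return total;
    }

private:
    std::vector<Context> contexts_;
    std::size_t next_ = 0;
};
```

With isolated contexts, each context's state would already be final; the merge step exists only because the routing spreads related tuples across contexts.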

Benefit from improved error messages for rejected CSV input records

If a CSV record cannot be parsed because of format errors, the information logged to the reject file now contains the detailed error reason in addition to the error code and the line number. For example: 1,"Conversion error: cannot convert value at position 4 to integral type, value='notanumber'",40
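The reject record layout in the example can be reproduced with a small C++17 sketch. It assumes a configurable single-character separator, double-quoted values with doubled quotes as escapes, 1-based field positions, and a fixed error code of 1; the functions splitCsv and checkIntegers are hypothetical and are not the implementation of the CSVParse operator.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Splits one CSV line on `sep`; double quotes enclose values that contain
// the separator, and "" inside a quoted value stands for one literal quote.
std::vector<std::string> splitCsv(const std::string& line, char sep) {
    std::vector<std::string> fields(1);
    bool quoted = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        char ch = line[i];
        if (quoted) {
            if (ch == '"' && i + 1 < line.size() && line[i + 1] == '"') {
                fields.back() += '"';
                ++i;
            } else if (ch == '"') {
                quoted = false;
            } else {
                fields.back() += ch;
            }
        } else if (ch == '"') {
            quoted = true;
        } else if (ch == sep) {
            fields.emplace_back();
        } else {
            fields.back() += ch;
        }
    }
    return fields;
}

// Converts every field to an integer. On the first failure, returns a reject
// record "<code>,\"<reason>\",<line number>"; returns nullopt if all fields convert.
std::optional<std::string> checkIntegers(const std::string& line, char sep, long lineNo) {
    const auto fields = splitCsv(line, sep);
    for (std::size_t pos = 0; pos < fields.size(); ++pos) {
        try {
            std::size_t used = 0;
            std::stol(fields[pos], &used);
            if (used != fields[pos].size()) throw std::invalid_argument("trailing characters");
        } catch (const std::exception&) {
            std::string value;  // double embedded quotes so the record stays valid CSV
            for (char ch : fields[pos]) {
                value += ch;
                if (ch == '"') value += '"';
            }
            return "1,\"Conversion error: cannot convert value at position " +
                   std::to_string(pos + 1) + " to integral type, value='" + value + "'\"," +
                   std::to_string(lineNo);
        }
    }
    return std::nullopt;
}
```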

The CSVParse operator provides new custom output functions to get error descriptions when parsing fails

The CSVParse operator provides two new custom output functions on the error port. The rstring MessageId() function returns the ID of the parse or conversion error that occurred. The rstring Message() function returns a description of that error.