IBM InfoSphere Streams Version 4.1.1

Operator DataExplorerPush


The DataExplorerPush operator pushes data from InfoSphere Streams into IBM InfoSphere Data Explorer.

The DataExplorerPush operator uses the BigIndex API to insert records into InfoSphere Data Explorer 8.2-2, 8.2-3, and 9.0-0. Each input tuple that is processed updates or adds a single record in InfoSphere Data Explorer. When you specify the recordIdAttribute parameter, the DataExplorerPush operator assumes that the parameter value contains the record ID for the record to be inserted. If a record with the same record ID and the same record type exists within the collection, an update occurs. When you do not specify the recordIdAttribute parameter, InfoSphere Streams generates a record ID based on an internal record ID generation algorithm.

Each attribute in the input tuple maps to a field in the record with the same name as the attribute, except when the attributes are specified by the recordIdAttribute or suppress parameters. The attributes that are specified by the optional recordIdAttribute and suppress parameters are not added to the record.
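
The mapping rule above can be sketched in a few lines of Python. This is only an illustration of the rule, not the operator's implementation; the function name build_record_fields and the sample attribute names are hypothetical.

```python
def build_record_fields(tuple_attrs, record_id_attribute=None, suppress=()):
    """Return the fields that would be added to the record for one input tuple.

    Hypothetical helper: models only the documented name-to-name mapping.
    """
    excluded = set(suppress)
    if record_id_attribute is not None:
        # The record ID attribute identifies the record; it is not added as a field.
        excluded.add(record_id_attribute)
    # Every remaining attribute becomes a field with the same name.
    return {name: value for name, value in tuple_attrs.items() if name not in excluded}


fields = build_record_fields(
    {"id": "42", "title": "Report", "internalFlag": True},
    record_id_attribute="id",
    suppress=["internalFlag"],
)
print(fields)
```

In this sketch, only the title attribute becomes a field, because id is named by recordIdAttribute and internalFlag is listed in suppress.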

Behavior in a consistent region

The DataExplorerPush operator is not supported in a consistent region. The operator does not checkpoint or restore its internal state.

Exceptions

The following errors and exceptions can occur:
  • Runtime errors that cause the operator to stop running. The DataExplorerPush operator throws an exception and terminates in the following cases:
    • The file that is specified in the connectionDocument parameter, or the default etc/connections.txt file, does not exist. This situation also occurs when the user that is running the SPL application does not have access to the file.
    • The InfoSphere Data Explorer endpoint URI that is configured in the ZooKeeper namespace through the entity model is invalid or cannot be reached.
    • The user ID that is configured in the ZooKeeper namespace through the entity model does not have sufficient privileges or is invalid.
    • The password for the user, which is configured in the ZooKeeper namespace through the entity model, is incorrect.
    • The ZooKeeper namespace in the connection file is a nonexistent namespace.
    • The record type that is specified in the operator invocation does not refer to an entity type that exists in the entity model file that is loaded in the ZooKeeper namespace.
    • The servers that are identified in the ZooKeeper endpoints are not valid server names.
    • The connection file referred to by the SPL application is invalid. The connection file is invalid in the following scenarios:
      • The file contains an invalid entry. Valid entries are zookeeperNamespace and zookeeperEndpoints.
      • An entry in the file does not have a value.
      • The file contains duplicate entries for any of the allowable entries.
      • The file does not contain both of the lines that specify the ZooKeeper namespace and the ZooKeeper endpoints.
    • The recordIdAttribute parameter, or one of the retrievableAttributes, sortableAttributes, filterableAttributes, nonSearchableAttributes, or suppress parameters, refers to an attribute that is not present in the input stream.
    • An attribute in the input stream has an unsupported data type, the suppress parameter is specified, and this attribute is not present in the suppress parameter list.
  • Runtime errors that cause a particular record to fail during indexing. For example:
    • Any tuple whose indexing request results in a RequestStatus of ERROR is not indexed and is written to the optional error output port, if one is specified. If the optional error output port is not specified, the failed record is silently dropped. In both of these cases, the operator continues to process the subsequent tuples without terminating.
  • Runtime errors that cause a particular record to be dropped without being written to the optional error output port, even if one is specified:
    • A record contains an empty string or a string with only whitespace characters in the attribute that is specified by the recordIdAttribute parameter.
    • A record contains an empty string or a string with only whitespace characters in all of the attributes that are not present in the suppress parameter list. In this case, all of the attributes that are not present in the suppress parameter list have rstring or ustring data types.
  • Compile-time errors. The DataExplorerPush operator throws a compile-time error in the following scenarios:
    • An attribute in the input stream has an unsupported data type and the suppress parameter is not specified.

Examples
These examples demonstrate how to use the DataExplorerPush operator.

Summary

Ports
This operator has 1 input port and 1 or more output ports.
Windowing
This operator does not accept any windowing configurations.
Parameters
This operator supports 13 parameters. (connectionDocument, collectionName, recordIdAttribute, recordType, retrievableAttributes, sortableAttributes, filterableAttributes, nonSearchableAttributes, suppress, flushBatchSize, flushInterval, indexThreadCount, maximumInProgressRequests)
Metrics
This operator reports 4 metrics.

Properties

Implementation
Java

Input Ports

Ports (0)

The DataExplorerPush operator has one non-windowed input port.


Output Ports

Assignments
Java operators do not support output assignments.
Ports (0)

The DataExplorerPush operator has one optional output port.

If the insertion of a record fails, a tuple is submitted to the optional output port. The output tuple is submitted after the processing of the corresponding input is complete. Tuple ordering is not preserved: the tuples that are written to the optional output port after indexing failures can appear in a different order from the input tuples that caused them.

The output port has up to five attributes: three mandatory and two optional. The attributes can have any name. The first mandatory attribute is an embedded tuple that contains the values of any attributes that match attributes of the input tuple that caused the indexing error. The other two mandatory attributes contain the record ID of the record that failed to be indexed and the indexing error message. The data types of these two attributes are rstring. The two optional attributes, which also have rstring data types, contain the collection name and the record type. The optional attributes are mapped positionally: the first optional attribute contains the collectionName value and the second optional attribute (if it is specified) contains the recordType value.
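
As a rough illustration of this layout, the following Python sketch assembles an error tuple. It is not the operator's code: the function build_error_tuple, its parameters, and the order of the record ID and the error message within the output are assumptions made for the example.

```python
def build_error_tuple(input_tuple, embedded_schema, record_id, error_message,
                      collection_name=None, record_type=None, optional_count=0):
    """Assemble an error tuple as a plain Python tuple (hypothetical helper).

    embedded_schema lists the attribute names of the embedded tuple;
    optional_count is how many optional rstring attributes the output
    schema declares (0, 1, or 2).
    """
    # Copy only the attributes that also exist in the failed input tuple.
    embedded = {name: input_tuple[name] for name in embedded_schema if name in input_tuple}
    # Assumed order: embedded tuple, record ID, error message.
    out = [embedded, record_id, error_message]
    # Optional attributes are positional: collection name first, then record type.
    out.extend([collection_name, record_type][:optional_count])
    return tuple(out)


error = build_error_tuple(
    {"id": "7", "title": "x", "count": 1},
    embedded_schema=["title", "missing"],
    record_id="7",
    error_message="index failed",
    collection_name="docs",
    record_type="article",
    optional_count=1,
)
print(error)
```

With one optional attribute declared, only the collection name is appended; the record type appears only when a second optional attribute exists.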


Ports (1...)

Properties

Parameters

connectionDocument

This optional parameter specifies the name of the connection file, which contains the information that is necessary to connect to the InfoSphere Data Explorer instance.

If a relative path is specified, the file name is relative to the application directory. If this parameter is not specified, the operator looks for the connection information in application_directory/etc/connections.txt.

The connection file must contain the following lines:

zookeeperNamespace=zookeeper_namespace
zookeeperEndpoints=zookeeper_endpoints

zookeeper_namespace is the name of the ZooKeeper namespace that is created. The entity model that contains information that is required in a clustered environment is stored in the ZooKeeper namespace. zookeeper_endpoints is a single string or a set of strings that specify the ZooKeeper endpoints. The string must have one of the following formats:
  • A port is specified for each of the ZooKeeper servers: zookeeperServer1:Port1,zookeeperServer2:Port2,...,zookeeperServerN:PortN
  • The port is not specified for any of the ZooKeeper servers: zookeeperServer1,zookeeperServer2,...,zookeeperServerN
  • The port is specified for some of the ZooKeeper servers: zookeeperServer1:Port1,zookeeperServer2,...,zookeeperServerN

If a port is not specified, the default value is 2181.
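
The endpoint formats and the default port can be illustrated with a small Python sketch. The function parse_endpoints is hypothetical, and the sketch assumes that whitespace around each entry can be discarded; it does not reflect how the operator itself parses the value.

```python
DEFAULT_ZOOKEEPER_PORT = 2181  # documented default when no port is given


def parse_endpoints(endpoints):
    """Split a zookeeperEndpoints value into (host, port) pairs (hypothetical helper)."""
    result = []
    for entry in endpoints.split(","):
        entry = entry.strip()  # assumption: surrounding whitespace is not significant
        if not entry:
            raise ValueError("empty ZooKeeper endpoint")
        host, sep, port = entry.partition(":")
        # Use the explicit port when one is present, otherwise the default.
        result.append((host, int(port) if sep else DEFAULT_ZOOKEEPER_PORT))
    return result


print(parse_endpoints("zookeeperServer1:2182,zookeeperServer2,zookeeperServer3:2183"))
```

The example covers the mixed format, where only some of the servers carry an explicit port.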

These lines can occur in any order, and spaces around the equal sign are ignored. Both of these lines are mandatory. For example, if you specify zookeeperNamespace but do not specify zookeeperEndpoints, a runtime error occurs.
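
The rules for a valid connection file can be summarized in a short Python sketch. The function parse_connection_file is hypothetical, and skipping blank lines is an assumption; the checks mirror the documented rules (only the two allowed entries, a value for each, no duplicates, both entries present).

```python
ALLOWED_KEYS = {"zookeeperNamespace", "zookeeperEndpoints"}


def parse_connection_file(text):
    """Validate connection file contents and return the entries (hypothetical helper)."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are skipped
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()  # spaces around '=' are ignored
        if not sep or key not in ALLOWED_KEYS:
            raise ValueError(f"invalid entry: {line!r}")
        if not value:
            raise ValueError(f"missing value for {key}")
        if key in entries:
            raise ValueError(f"duplicate entry: {key}")
        entries[key] = value
    missing = ALLOWED_KEYS - entries.keys()
    if missing:
        raise ValueError(f"missing entries: {sorted(missing)}")
    return entries


config = parse_connection_file(
    "zookeeperEndpoints = zookeeperServer1:2181\nzookeeperNamespace=myNamespace\n"
)
print(config)
```

The sample input uses a made-up namespace and server name, lists the entries in reverse order, and puts spaces around one equal sign, all of which the documented rules allow.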

collectionName

This optional parameter specifies the collection name on the IBM InfoSphere Data Explorer instance that is used for indexing.

recordIdAttribute

This optional parameter specifies the name of the attribute that contains the record identifier for the record that is to be pushed to InfoSphere Data Explorer. If this parameter is specified and a record with the same record ID exists in the collection in InfoSphere Data Explorer, an update occurs. If a record with the same record ID exists in the same collection but has a different record type, the update does not occur. If the parameter is not specified, InfoSphere Streams generates the record ID based on an internal record ID generation algorithm.

The attribute that is represented by the recordIdAttribute parameter has an int8, int16, int32, uint8, uint16, uint32, rstring, or ustring data type. The attribute is mapped to its string representation by using the tuple.getString() function.

recordType

This mandatory parameter specifies the type of record that is built.

retrievableAttributes

This optional parameter specifies the list of input attributes whose fields are marked as retrievable. Fields are not retrievable by default.

sortableAttributes

This optional parameter specifies the list of input attributes whose fields are marked as sortable. Fields are not sortable by default.

filterableAttributes

This optional parameter specifies the list of input attributes whose fields are marked as filterable. Fields are not filterable by default.

nonSearchableAttributes

This optional parameter specifies the list of input attributes whose fields are marked as nonsearchable. Fields are searchable by default.

suppress

This optional parameter specifies the list of input attributes that are not added as fields.

flushBatchSize

This optional parameter specifies the batch size at which the operator flushes data from the queue and sends it to InfoSphere Data Explorer. The default value is 100.

flushInterval

This optional parameter specifies the time interval in milliseconds that the operator waits before it flushes data from the queue. The operator waits for the duration that is specified by the flushInterval parameter and then flushes the data even if it has not reached the flushBatchSize limit. The default value is 5000 milliseconds.
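
The interplay between flushBatchSize and flushInterval can be sketched as follows. This is a simplified single-threaded model, not the operator's implementation; the class FlushBuffer, its attributes, and the injectable clock are hypothetical.

```python
import time


class FlushBuffer:
    """Flush when the batch size is reached or the interval elapses (hypothetical model)."""

    def __init__(self, flush_batch_size=100, flush_interval_ms=5000, clock=time.monotonic):
        self.flush_batch_size = flush_batch_size
        self.flush_interval_s = flush_interval_ms / 1000.0
        self.clock = clock
        self.pending = []
        self.flushed_batches = []  # stands in for data sent to the server
        self.last_flush = clock()

    def add(self, record):
        self.pending.append(record)
        self.maybe_flush()

    def maybe_flush(self):
        now = self.clock()
        size_reached = len(self.pending) >= self.flush_batch_size
        interval_elapsed = now - self.last_flush >= self.flush_interval_s
        if size_reached or interval_elapsed:
            if self.pending:
                self.flushed_batches.append(self.pending)
                self.pending = []
            self.last_flush = now


# Drive the buffer with a fake clock so the example is deterministic.
now = [0.0]
buffer = FlushBuffer(flush_batch_size=3, flush_interval_ms=5000, clock=lambda: now[0])
for record in ["a", "b", "c", "d"]:
    buffer.add(record)
now[0] = 5.0  # five seconds later, the interval has elapsed
buffer.maybe_flush()
print(buffer.flushed_batches)
```

The first batch is flushed because it reaches the size limit; the remaining record is flushed only after the interval elapses, even though the batch is not full.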

indexThreadCount

This optional parameter specifies the number of threads in the operator that can flush data to InfoSphere Data Explorer. The default value is 20.

maximumInProgressRequests

This optional parameter specifies the maximum size of the queue to hold data. When the maximum size for the queue is reached, all subsequent calls to InfoSphere Data Explorer are blocked until there is space in the queue. The default value is 5000.
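
The blocking behavior of a bounded queue can be demonstrated with Python's standard queue module. This is an analogy only: the operator is implemented in Java, and the variable names and the small limit used here are chosen for the demonstration.

```python
import queue

# A small limit keeps the demonstration short; the documented default is 5000.
MAXIMUM_IN_PROGRESS_REQUESTS = 2

requests = queue.Queue(maxsize=MAXIMUM_IN_PROGRESS_REQUESTS)
requests.put("request-1")
requests.put("request-2")

# The queue is full, so a further put blocks; the timeout ends the wait here.
try:
    requests.put("request-3", timeout=0.1)
    blocked = False
except queue.Full:
    blocked = True

# Once a request completes and leaves the queue, there is space again.
requests.get()
requests.put("request-3", timeout=0.1)
print(blocked, requests.qsize())
```

Without the timeout, the third put would simply wait until another thread removed an item, which corresponds to the blocking described above.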


Metrics

nRecordsPushed - Counter

The number of records that were pushed to IBM InfoSphere Data Explorer and indexed.

nRequestsOutstanding - Gauge

The number of records that are waiting to be indexed.

nRecordsFailed - Counter

The number of records that failed to be pushed to IBM InfoSphere Data Explorer.

nRecordsWithNonIndexableFields - Counter

The number of records containing fields that are non-indexable. This metric is incremented in the following cases:
  • A record contains an empty string or a string with only whitespace characters in the attribute that is specified by the recordIdAttribute parameter.
  • A record contains an empty string or a string with only whitespace characters in all of the attributes that are not present in the suppress parameter list. In this case, all of the attributes that are not present in the suppress parameter list have rstring or ustring data types.
  • A record contains only attributes that have rstring or ustring data types. All of the attributes have only whitespace characters, including the attribute that is specified in the recordIdAttribute parameter. This scenario is a combination of the two other scenarios.
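
The conditions in this list can be expressed as a small Python check. The helper names are hypothetical, and the sketch reads the second condition literally: every attribute outside the suppress list must be an empty or whitespace-only string.

```python
def is_blank(value):
    """True for an empty string or a string of only whitespace characters."""
    return isinstance(value, str) and value.strip() == ""


def has_non_indexable_fields(tuple_attrs, record_id_attribute=None, suppress=()):
    """Return True when the record would be counted by the metric (hypothetical helper)."""
    # Case 1: the record ID attribute is empty or whitespace only.
    if record_id_attribute is not None and is_blank(tuple_attrs[record_id_attribute]):
        return True
    # Case 2: every attribute outside the suppress list is a blank string.
    candidates = [value for name, value in tuple_attrs.items() if name not in set(suppress)]
    return bool(candidates) and all(is_blank(value) for value in candidates)


print(has_non_indexable_fields({"id": "  ", "title": "x"}, record_id_attribute="id"))
print(has_non_indexable_fields({"id": "7", "title": "x", "note": ""}, record_id_attribute="id"))
```

The third case in the list needs no separate branch, because a record in which every attribute is blank already satisfies both of the other conditions.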

Libraries

Java Operator Code
Library Path: ../../impl/lib/com.ibm.streams.dataexplorer.jar
The name of the BigSearch API JAR file. This is an environment variable that must be set before you use this operator.
Library Path: @BIGSEARCH_JAR@