IBM Streams 4.2.1

Operator NetezzaLoad

The NetezzaLoad operator performs high speed loads into a Netezza database. This operator uses a delimited string that is created by the NetezzaPrepareLoad operator as input and loads by using the external table interface. For information about external tables, see the Netezza documentation about data loading.

The Netezza ODBC driver is required to use the NetezzaLoad operator.

In addition to the parameters listed, the operator accepts unrecognized parameters and attempts to use each one as an external table option. These options are described in the Netezza documentation about data loading. The parameter name must match an external table option name, which is not case-sensitive, and the parameter value is used as the value of that option. Options that you might specify include Delimiter, NullValue, EscapeChar, MaxErrors, SocketBufSize, and LogDir.
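
As an illustration only, the following Python sketch models how a parameter list could be split into the 12 operator parameters named in this reference and pass-through external table options. The function name split_params and the connection and access values are hypothetical, and the sketch is not the operator's implementation. Option names are lowercased to reflect that they are not case-sensitive.

```python
# The 12 operator parameters named in this reference.
OPERATOR_PARAMS = {
    "access", "connection", "connectionDatabase", "connectionDocument",
    "connectionPassword", "connectionPolicy", "connectionUser", "fifoDir",
    "loadOnPunctuation", "reconnectionBound", "reconnectionInterval",
    "reconnectionPolicy",
}


def split_params(params):
    """Separate recognized operator parameters from pass-through options.

    Unrecognized names are treated as external table options and are
    normalized to lowercase, because option names are not case-sensitive.
    """
    operator_params = {}
    external_table_options = {}
    for name, value in params.items():
        if name in OPERATOR_PARAMS:
            operator_params[name] = value
        else:
            external_table_options[name.lower()] = value
    return operator_params, external_table_options


operator_params, external_table_options = split_params({
    "connection": "netezzaConn",   # hypothetical specification names
    "access": "loadAccess",
    "DELIMITER": "|",              # same option as Delimiter
    "maxErrors": 2,                # same option as MaxErrors
})
print(operator_params)
print(external_table_options)
```

Nothing in this sketch rejects an unknown option name, which mirrors the fact that an invalid option is detected only when the operator runs.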

No validation occurs at compile time, because the operator is unaware of these values. If an invalid option is specified, an error occurs when the operator runs. Internally, the external table interface writes the problematic delimited strings to the nzbad file. The location of the file is specified by the LogDir external table option; the default is /tmp. The nzbad file is overwritten on every load. When the error threshold is reached, the records in the file are written, one per tuple, to the optional error output port. If the error threshold is not reached, the nzbad file is not checked, because checking it on every load would degrade the performance of the operator.

Tip: If your application has multiple NetezzaLoad operator instances that use the error output port and write to the same table, consider specifying a different LogDir value for each instance. This strategy ensures that errors are not attributed to the wrong instance. For example, an error that occurs in instance A might be attributed to instance B if both instances share the same LogDir value.

Behavior in a consistent region

A NetezzaLoad operator can be used in a consistent region. It cannot be the start operator of a consistent region.

In a consistent region, a NetezzaLoad operator cannot have its commitOnPunct parameter set to true. It cannot have a control port. The configured value of the transaction_batchsize is ignored. Instead, database commits are performed on consistent region checkpoints, and database rollbacks are performed on consistent region resets. If a consistent region is reset and tuples are replayed after a NetezzaLoad operator has successfully completed its checkpoint, it is possible for the same inserts to be attempted more than one time.

On drain: The execution of the operator's insert statement completes. If this results in any database errors and the operator has an error output port, error tuples are generated and submitted to the error output port.

On checkpoint: A database commit is performed.

On reset: A rollback is performed.
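
The checkpoint and reset behavior described above can be pictured with a toy Python model. This is a sketch under stated assumptions, not the operator's code: the class name, its method names, and the sample rows are hypothetical, and the database is reduced to two lists.

```python
class ConsistentRegionLoadModel:
    """Toy model: rows are staged until a checkpoint commits them."""

    def __init__(self):
        self.staged = []      # rows inserted but not yet committed
        self.committed = []   # rows made durable by a commit

    def on_tuple(self, row):
        self.staged.append(row)

    def on_checkpoint(self):
        # On checkpoint: a database commit is performed.
        self.committed.extend(self.staged)
        self.staged.clear()

    def on_reset(self):
        # On reset: a rollback discards uncommitted rows only.
        self.staged.clear()


model = ConsistentRegionLoadModel()
for row in ("1|a", "2|b"):
    model.on_tuple(row)
model.on_checkpoint()          # this operator's checkpoint succeeds

# The region is then reset and the same tuples are replayed.
model.on_reset()
for row in ("1|a", "2|b"):
    model.on_tuple(row)
model.on_checkpoint()

print(model.committed)  # ['1|a', '2|b', '1|a', '2|b']
```

The repeated rows in the output illustrate why the same inserts can be attempted more than one time: the rollback cannot undo rows that an earlier checkpoint already committed.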

Exceptions

The MaxErrors setting in the Netezza external tables interface indicates the number of errors that can occur before the system stops processing rows and rolls back all contents of the current batch. Setting MaxErrors to zero causes the system to never return an error. Such errors occur, for example, when a string value is placed into a numeric field in the database. When these errors occur, Netezza writes the problematic row (the delimited string) to the nzbad file.

Like other external table options, the MaxErrors setting can be specified as an operator parameter. If an error occurs that causes the current MaxErrors threshold to be reached, the operator reads the contents of the current nzbad file and writes them to the operator log. In addition, the records from the file are written to the optional error output port if it is used in the application.
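
The threshold behavior can be sketched in Python as follows. This is a simplified model under assumptions, not Netezza's or the operator's implementation: the function names, the validity check, and the sample rows are hypothetical, and a list stands in for the nzbad file.

```python
def load_batch(rows, is_valid, max_errors):
    """Return (rolled_back, reported_rows) for one load.

    max_errors == 0 is modeled as "never return an error".
    """
    bad_rows = []                      # stands in for the nzbad file
    for row in rows:
        if is_valid(row):
            continue
        bad_rows.append(row)
        if max_errors > 0 and len(bad_rows) >= max_errors:
            # Threshold reached: the batch is rolled back and the
            # rejected rows are reported, one per error tuple.
            return True, bad_rows
    # Threshold not reached: the nzbad file is not checked.
    return False, []


def first_field_is_numeric(row):
    # Hypothetical check: the first column must load into a numeric field.
    return row.split("|")[0].isdigit()


rows = ["1|a", "x|b", "3|c", "y|d"]
print(load_batch(rows, first_field_is_numeric, 2))  # (True, ['x|b', 'y|d'])
print(load_batch(rows, first_field_is_numeric, 3))  # (False, [])
print(load_batch(rows, first_field_is_numeric, 0))  # (False, [])
```

With a threshold of 2, nothing is reported until the second bad row is seen. With a threshold of 3 or 0, the same two bad rows go unreported because the threshold is never reached.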

If NoRetry is specified for the reconnectionPolicy parameter and the initial connection attempt fails while the NetezzaLoad operator is starting, it throws an ODBCOperatorShutdownException.

Other exceptions, such as database connection errors, are logged in the same way as in other database toolkit operators.

Summary

Ports
This operator has 2 input ports and 1 output port.
Windowing
This operator does not accept any windowing configurations.
Parameters
This operator supports arbitrary parameters in addition to 12 specific parameters.

Required: access, connection

Optional: connectionDatabase, connectionDocument, connectionPassword, connectionPolicy, connectionUser, fifoDir, loadOnPunctuation, reconnectionBound, reconnectionInterval, reconnectionPolicy

Metrics
This operator reports 3 metrics.

Properties

Implementation
C++
Threading
Always - Operator always provides a single threaded execution context.

Input Ports

Ports (0)

The NetezzaLoad operator has one required input port.

The tuples on the required input port must consist of one rstring attribute. This attribute must contain a delimited string, which is typically created by the NetezzaPrepareLoad operator. The NetezzaLoad operator writes records at intervals that are defined by the loadOnPunctuation parameter and the transaction_batchsize attribute within the connection.xml document. The input port is non-mutating and its punctuation mode is Oblivious.

Ports (1)

The NetezzaLoad operator has one optional input port, which allows operator configuration to be changed at run time.

Output Ports

Assignments
This operator does not allow assignments to output attributes.
Ports (0)

The NetezzaLoad operator is configurable with one optional error output port. The output port is non-mutating. The optional error output port submits one or more tuples when the error threshold specified by the MaxErrors external table option is reached. For example, if MaxErrors is 2, the error output port submits a tuple when the second error occurs.

Parameters

This operator supports arbitrary parameters in addition to 12 specific parameters.

Required: access, connection

Optional: connectionDatabase, connectionDocument, connectionPassword, connectionPolicy, connectionUser, fifoDir, loadOnPunctuation, reconnectionBound, reconnectionInterval, reconnectionPolicy

access

Specifies the name of an access_specification element in the connection specifications document.

connection

Specifies the name of the connection_specification element in the connection specification document that identifies the external service to which the operator connects.

connectionDatabase

Specifies the data source name of the database. If specified, this value overrides anything specified in the 'database' attribute of the ODBC element of the connection.xml file that is specified by the connectionDocument parameter.

connectionDocument

Specifies the pathname of a file containing the connection and access specifications identified by the connection and access parameters. If not specified, the file etc/connections.xml (under the application directory) is used.

connectionPassword

Specifies the password used to connect to the database. If specified, this value overrides anything specified in the 'password' attribute of the ODBC element of the connection.xml file that is specified by the connectionDocument parameter.

connectionPolicy

This optional parameter specifies the policy that is used to determine when a database connection, or subsequent reconnection after a database connection failure, occurs. The valid values are Immediate and Deferred. The default value is Immediate.

If Immediate is specified, the connection to the database is attempted when the operator is started. If a connection fails while an operator is running, a reconnection is attempted immediately when the disconnection is detected. This connection behavior minimizes delays in tuple processing.

If Deferred is specified, the initial connection to the database is not attempted until a connection is needed, which is usually the first time that an SQL statement is run. If a connection fails while an operator is running, the operator does not try to connect to the database until the next time that a connection is needed.

connectionUser

Specifies the user name or identifier used to connect to the database. If specified, this value overrides anything specified in the 'user' attribute of the ODBC element of the connection.xml file that is specified by the connectionDocument parameter.
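
The override rule shared by the connectionDatabase, connectionUser, and connectionPassword parameters can be sketched in Python. This is a minimal model under assumptions: the function name and all sample values are hypothetical, and a plain dictionary stands in for the attributes of the ODBC element.

```python
# Operator parameter -> attribute of the ODBC element that it overrides.
OVERRIDES = {
    "connectionDatabase": "database",
    "connectionUser": "user",
    "connectionPassword": "password",
}


def effective_odbc_attributes(odbc_attributes, operator_params):
    """Apply operator-level overrides on top of the document's values."""
    effective = dict(odbc_attributes)
    for param, attribute in OVERRIDES.items():
        if param in operator_params:
            effective[attribute] = operator_params[param]
    return effective


# Hypothetical values read from the ODBC element.
document_values = {"database": "SALES_DSN", "user": "loader", "password": "from-file"}

print(effective_odbc_attributes(document_values, {"connectionUser": "etl_user"}))
# {'database': 'SALES_DSN', 'user': 'etl_user', 'password': 'from-file'}
```

Only the attributes that have a corresponding operator parameter are replaced; the remaining values still come from the document.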

fifoDir

This optional parameter specifies a directory where the operator can create a FIFO that is used internally by the operator. The default parameter value is the /tmp directory. The name of the FIFO is a unique value for each instance of the operator and the FIFO is removed when the operator terminates in a normal fashion.

loadOnPunctuation

This optional parameter specifies whether a window punctuation closes the current external table load and forces its records to be committed. The default value is false. If the parameter is set to true, when a window punctuation is received on the input port, the operator concludes the current load and sends that data to the database. Then, the running row count is reset to zero, and a new load is started on the next incoming tuple.
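
A small Python model can show how window punctuation partitions the input into loads. It is a sketch under assumptions rather than the operator's logic: the names are hypothetical, a sentinel object stands in for a window punctuation, the sketch assumes that a punctuation with no pending rows starts no load, and the effect of transaction_batchsize is deliberately left out.

```python
PUNCT = object()   # stands in for a window punctuation


def plan_loads(stream, load_on_punctuation=False):
    """Return (concluded_loads, pending_rows) for a stream of rows."""
    concluded = []
    pending = []                       # rows in the load that is still open
    for item in stream:
        if item is PUNCT:
            if load_on_punctuation and pending:
                # Conclude the current load; the row count restarts at zero.
                concluded.append(pending)
                pending = []
            continue
        # The next incoming tuple starts (or extends) the open load.
        pending.append(item)
    return concluded, pending


stream = ["a", "b", PUNCT, "c", PUNCT, PUNCT, "d"]
print(plan_loads(stream, load_on_punctuation=True))
# ([['a', 'b'], ['c']], ['d'])
print(plan_loads(stream, load_on_punctuation=False))
# ([], ['a', 'b', 'c', 'd'])
```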

reconnectionBound

This optional parameter specifies the number of successive connection attempts that occur when a connection fails or a disconnect occurs. It is used only when the reconnectionPolicy parameter is set to BoundedRetry; otherwise, it is ignored. The default parameter value is 5.

reconnectionInterval

This optional parameter specifies the amount of time (in seconds) that the operator waits between successive connection attempts. It is used only when the reconnectionPolicy parameter is set to BoundedRetry or InfiniteRetry; otherwise, it is ignored. The default parameter value is 10.

reconnectionPolicy

This optional parameter specifies the policy that is used by the operator to handle database connection failures. The valid values are: NoRetry, InfiniteRetry, and BoundedRetry. The default value is InfiniteRetry.

If NoRetry is specified and a database connection failure occurs, the operator does not try to connect to the database again. The operator shuts down at startup time if the initial connection attempt fails.

If BoundedRetry is specified and a database connection failure occurs, the operator tries to connect to the database again up to a maximum number of times. The maximum number of connection attempts is specified in the reconnectionBound parameter. The sequence of connection attempts occurs at startup time. If a connection does not exist, the sequence of connection attempts also occurs before each operator is run.

If InfiniteRetry is specified, the operator continues to try to connect indefinitely until a connection is made. This behavior blocks all other operator operations while a connection is not successful. For example, if an incorrect connection password is specified in the connection configuration document, the operator remains in an infinite startup loop until a shutdown is requested.
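
The three policies can be compared with a short Python sketch. It is a model under assumptions, not the operator's implementation: the function names are hypothetical, and the sketch assumes that reconnectionBound counts the retries made after the first failed attempt, which this reference does not state explicitly. The defaults of 5 attempts and 10 seconds are taken from the reconnectionBound and reconnectionInterval descriptions.

```python
import time


def connect_with_policy(connect, policy="InfiniteRetry", bound=5,
                        interval=10.0, sleep=time.sleep):
    """Return the number of attempts made before connect() succeeded.

    Raises ConnectionError when the policy gives up.
    """
    if policy not in ("NoRetry", "BoundedRetry", "InfiniteRetry"):
        raise ValueError(f"unknown reconnectionPolicy: {policy}")
    attempts = 1
    if connect():
        return attempts
    retries = 0
    while policy == "InfiniteRetry" or (policy == "BoundedRetry" and retries < bound):
        sleep(interval)                # wait between successive attempts
        retries += 1
        attempts += 1
        if connect():
            return attempts
    raise ConnectionError(f"gave up after {attempts} attempt(s)")


def flaky(failures):
    """Hypothetical connect() that fails `failures` times, then succeeds."""
    calls = {"n": 0}

    def connect():
        calls["n"] += 1
        return calls["n"] > failures

    return connect


no_wait = lambda seconds: None         # skip real sleeping in the demo
print(connect_with_policy(flaky(2), "BoundedRetry", bound=5, sleep=no_wait))   # 3
print(connect_with_policy(flaky(7), "InfiniteRetry", sleep=no_wait))           # 8
try:
    connect_with_policy(flaky(1), "NoRetry", sleep=no_wait)
except ConnectionError as exc:
    print(exc)                         # gave up after 1 attempt(s)
```

With InfiniteRetry the loop ends only when connect() succeeds, which corresponds to the blocking behavior described above when the credentials never become valid.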

Metrics

droppedTuples - Counter

Number of input tuples which could not be loaded successfully.

successfulLoadCount - Counter

Number of successful Netezza loads.

failedLoadCount - Counter

Number of failed Netezza loads.

Libraries

No description for library.
Command: ../../Common/ODBCLibInfo.pl