IBM Db2

The IBM Db2 source reads data from an IBM Db2 database. For information about supported versions, see Supported systems and versions.

The IBM Db2 source can use multiple threads to process tables in parallel.

When you configure the IBM Db2 source, you specify the database name, host name, and port number. In addition, you can enable the source to use SSL. You can also use a connection to configure the source.

The source generates record header attributes that enable you to use the sources of a record in flow processing.

Before you use the IBM Db2 source, you must complete some prerequisite tasks.

Prerequisites

About this task

Before you configure the IBM Db2 source, complete the following prerequisite tasks.

Procedure

Verify that the Db2Data Collector flight.enable property is set to auto or true.
To connect to IBM Db2 z/OS, install and activate an appropriate license certificate:
1. Install an IBM Db2 Connect Unlimited Edition license certificate. For more information, see the IBM Db2 documentation.
2. Activate the license certificate. For more information, see the IBM Db2 documentation.

Multithreaded processing

The IBM Db2 source uses multiple concurrent threads to process data based on the Number of Threads property. Each table is processed by only one thread a time, which ensures that the order of batches and records from each table is preserved.

As the flow runs, each thread connects to the source system, creates a batch of data, and passes the batch to an available flow runner. A flow runner is a sourceless flow instance - an instance of the flow that includes all of the processors, executors, and targets in the flow and handles all flow processing after the source.

Each flow runner processes one batch at a time, just like a flow that runs on a single thread. When the flow of data slows, the flow runners wait idly until they are needed, generating an empty batch at regular intervals. You can configure the Runner Idle Time flow property to specify the interval or to opt out of empty batch generation.

Multithreaded flows preserve the order of records within each batch, just like a single-threaded flow. But since batches are processed by different flow runners, the order that batches are written to targets is not ensured.

For more information about multithreaded flows, see Multithreaded flow overview.

Record header attributes

The IBM Db2 source creates record header attributes that include information about the originating file for the record.

You can use the record:attribute or record:attributeOrDefault functions to access the information in the attributes. For more information about working with record header attributes, see Working with header attributes.

The IBM Db2 source creates the following record header attributes:

schemaName - Name of the schema containing the table the record comes from.
tableName - Name of the table the record comes from.

Field attributes

The IBM Db2 source generates field attributes for columns converted certain data types in Data Collector. The attributes provide additional information about each field.

The source generates the following field attributes data types:

Data Collector Data Type	Generated Field Attribute	Description
Date	unit	Original unit in which the date was obtained from IBM Db2.
Datetime	unit	Original unit in which the date was obtained from IBM Db2.
Decimal	precision	Precision of the value obtained from IBM Db2.
Decimal	bit_width	Original bit width in which the value was obtained from IBM Db2.
Decimal	scale	Scale of the value obtained from IBM Db2.
Decimal and String	flight_type	Original flight type.
Double and Float	precision	Original precision obtained from IBM Db2.
Integer, Long, and Short	bit_width	Original bit width in which the value was obtained from IBM Db2.
Integer, Long, and Short	signed	Whether the original value obtained from IBM Db2 was signed.
Time	unit	Original unit in which the date was obtained from IBM Db2.

IBM Db2 data types

The IBM Db2 source converts IBM Db2 data types into Data Collector data types before passing them to the flow.

The following table describes how IBM Db2 data types are converted to Data Collector data types:

IBM Db2 Data Type	Data Collector Data Type
BigInt	Long
Binary	Byte Array
Binary_View	Byte Array
Boolean	Boolean
Char	String
Clob	String
Date	Date
Dbclob	String
DecFloat	String
Decimal	Decimal
Double	Double
Duration	Long
Fixed_Size_Binary	Byte Array
Fixed_Size_List	List
Graphic	String
Integer	Integer
Large_Binary	Byte Array
Large_List	List
Large_List_View	List
List	List
List_View	List
Nchar	String
Numeric	Decimal
Nvarchar	String
Real	Float
SmallInt	Short
Struct	Map
Time	Time
Timestamp	Datetime
Varchar	String
Vargraphic	String
XML	String

Key column data types

The source supports the following Data Collector data types as key columns:

Boolean
Char
Date
Datetime
Decimal
Double
Float
Integer
Long
Short
String
Time
Zoned Datetime

Troubleshooting IBM Connectivity Service

The source connects to the database through IBM Connectivity Service. IBM Connectivity Service and all stages that use the service are bundled into the IBM Connectivity Service stage library, streamsets-datacollector-ibm-connectivity-service-lib. By default, Data Collector runs IBM Connectivity Service when the IBM Connectivity Service stage library is installed.

If IBM Connectivity Service stops responding, Data Collector attempts to restart the service. If Data Collector is unable to restart IBM Connectivity Service, you can use the following API calls to view the IBM Connectivity Service state and restart IBM Connectivity Service:

Call	CURL request
Get IBM Connectivity Service state	curl --location 'https://<engine url>/rest/v1/flight/state' --header 'Authorization: Bearer <IBM Cloud API key>
Restart IBM Connectivity Service	curl --location 'https://<engine url>//rest/v1/flight/restart' --header 'Authorization: Bearer <IBM Cloud API key>
Get IBM Connectivity Service logs	curl --location 'https://<engine url>//rest/v1/flight/state' --header 'Authorization: Bearer <IBM Cloud API key>

Note: You use an IBM Cloud API key to make these API calls. For information on managing IBM Cloud API keys, see the IBM Cloud documentation.

Configuring an IBM Db2 source

About this task

Configure an IBM Db2 source to read data from an IBM Db2 database.

Procedure

In the Properties panel, on the General tab, configure the following properties:

General Property	Description
Name	Stage name.
Description	Optional description.
On Record Error	Error record handling for the stage: Discard - Discards the record. Send to Error - Sends the record to the flow for error handling. Stop Flow - Stops the flow.

On the DB2 tab, configure the following properties:

Db2 Property	Description
Connection 7.4 and later	Connection that defines the information that is required to connect to an external system. To connect to an external system, you can select a connection that contains the details, or you can specify the details in local properties. When you select a connection, the flow canvas hides properties that are defined in the connection.
Host	Host name or IP address of the database.
Port	Database port number.
Database Type	Type of Db2 database to connect to.
Database	Database name.
Use SSL	Enables SSL/TLS.
SSL Certificate	SSL certificate of the host. Enter a credential function that returns the certificate or enter the contents of the certificate.

On the Credentials tab, configure the following properties:

Credentials Property Description

User Name Database user name.

Password Database password.

Credentials Property	Description
User Name	Database user name.
Password	Database password.

On the Tables tab, configure the following properties:

Tables Property	Description
Schema	Name of the schema that contains the table to read. Using simple or bulk edit mode, click Add to add additional schemas.
Table Name	Name of the table to read. Using simple or bulk edit mode, click Add to add additional tables.
Key Columns	Ordered list of key columns to use as offset columns of the select query. As a best practice, a key column should be an incremental and unique column that does not contain null values. If the column is not unique, meaning multiple rows can have the same value for this column, there is potential for data loss when the flow restarts.
Maximum Batch Size	Maximum number of records processed at one time. Honors values up to the Data Collector maximum batch size.
Number of Threads	Number of threads the source generates and uses for multithreaded processing.

On the Advanced tab, optionally configure advanced properties.

The defaults for these properties should work in most cases:

Advanced Property	Description
Connection Validation RPC Timeout	Maximum number of seconds to wait for an IBM Connectivity Service connection or authentication. Default is 3 minutes.
Generic RPC Timeout	Maximum number of seconds to wait for all other IBM Connectivity Service actions. Default is 15 minutes.