IBM Db2

The IBM Db2 source reads data from an IBM Db2 database. For information about supported versions, see Supported systems and versions.

The IBM Db2 source can use multiple threads to process tables in parallel.

When you configure the IBM Db2 source, you specify the database name, host name, and port number. In addition, you can enable the source to use SSL. You can also use a connection to configure the source.

The source generates record header attributes that enable you to use the sources of a record in flow processing.

Before you use the IBM Db2 source, you must complete some prerequisite tasks.

Prerequisites

About this task

Before you configure the IBM Db2 source, complete the following prerequisite tasks.

Procedure

  1. Verify that the Db2Data Collector flight.enable property is set to auto or true.
  2. To connect to IBM Db2 z/OS, install and activate an appropriate license certificate:
    1. Install an IBM Db2 Connect Unlimited Edition license certificate. For more information, see the IBM Db2 documentation.
    2. Activate the license certificate. For more information, see the IBM Db2 documentation.

Multithreaded processing

The IBM Db2 source uses multiple concurrent threads to process data based on the Number of Threads property. Each table is processed by only one thread a time, which ensures that the order of batches and records from each table is preserved.

As the flow runs, each thread connects to the source system, creates a batch of data, and passes the batch to an available flow runner. A flow runner is a sourceless flow instance - an instance of the flow that includes all of the processors, executors, and targets in the flow and handles all flow processing after the source.

Each flow runner processes one batch at a time, just like a flow that runs on a single thread. When the flow of data slows, the flow runners wait idly until they are needed, generating an empty batch at regular intervals. You can configure the Runner Idle Time flow property to specify the interval or to opt out of empty batch generation.

Multithreaded flows preserve the order of records within each batch, just like a single-threaded flow. But since batches are processed by different flow runners, the order that batches are written to targets is not ensured.

For more information about multithreaded flows, see Multithreaded flow overview.

Record header attributes

The IBM Db2 source creates record header attributes that include information about the originating file for the record.

You can use the record:attribute or record:attributeOrDefault functions to access the information in the attributes. For more information about working with record header attributes, see Working with header attributes.

The IBM Db2 source creates the following record header attributes:
  • schemaName - Name of the schema containing the table the record comes from.
  • tableName - Name of the table the record comes from.

Field attributes

The IBM Db2 source generates field attributes for columns converted certain data types in Data Collector. The attributes provide additional information about each field.

The source generates the following field attributes data types:
Data Collector Data Type Generated Field Attribute Description
Date unit Original unit in which the date was obtained from IBM Db2.
Datetime unit Original unit in which the date was obtained from IBM Db2.
Decimal precision Precision of the value obtained from IBM Db2.
Decimal bit_width Original bit width in which the value was obtained from IBM Db2.
Decimal scale Scale of the value obtained from IBM Db2.
Decimal and String flight_type Original flight type.
Double and Float precision Original precision obtained from IBM Db2.
Integer, Long, and Short bit_width Original bit width in which the value was obtained from IBM Db2.
Integer, Long, and Short signed Whether the original value obtained from IBM Db2 was signed.
Time unit Original unit in which the date was obtained from IBM Db2.

IBM Db2 data types

The IBM Db2 source converts IBM Db2 data types into Data Collector data types before passing them to the flow.

The following table describes how IBM Db2 data types are converted to Data Collector data types:
IBM Db2 Data Type Data Collector Data Type
BigInt Long
Binary Byte Array
Binary_View Byte Array
Boolean Boolean
Char String
Clob String
Date Date
Dbclob String
DecFloat String
Decimal Decimal
Double Double
Duration Long
Fixed_Size_Binary Byte Array
Fixed_Size_List List
Graphic String
Integer Integer
Large_Binary Byte Array
Large_List List
Large_List_View List
List List
List_View List
Nchar String
Numeric Decimal
Nvarchar String
Real Float
SmallInt Short
Struct Map
Time Time
Timestamp Datetime
Varchar String
Vargraphic String
XML String

Key column data types

The source supports the following Data Collector data types as key columns:

  • Boolean
  • Char
  • Date
  • Datetime
  • Decimal
  • Double
  • Float
  • Integer
  • Long
  • Short
  • String
  • Time
  • Zoned Datetime

Troubleshooting IBM Connectivity Service

The source connects to the database through IBM Connectivity Service. IBM Connectivity Service and all stages that use the service are bundled into the IBM Connectivity Service stage library, streamsets-datacollector-ibm-connectivity-service-lib. By default, Data Collector runs IBM Connectivity Service when the IBM Connectivity Service stage library is installed.

If IBM Connectivity Service stops responding, Data Collector attempts to restart the service. If Data Collector is unable to restart IBM Connectivity Service, you can use the following API calls to view the IBM Connectivity Service state and restart IBM Connectivity Service:
Call CURL request
Get IBM Connectivity Service state

curl --location 'https://<engine url>/rest/v1/flight/state' --header 'Authorization: Bearer <IBM Cloud API key>

Restart IBM Connectivity Service

curl --location 'https://<engine url>//rest/v1/flight/restart' --header 'Authorization: Bearer <IBM Cloud API key>

Get IBM Connectivity Service logs

curl --location 'https://<engine url>//rest/v1/flight/state' --header 'Authorization: Bearer <IBM Cloud API key>

Note: You use an IBM Cloud API key to make these API calls. For information on managing IBM Cloud API keys, see the IBM Cloud documentation.

Configuring an IBM Db2 source

About this task

Configure an IBM Db2 source to read data from an IBM Db2 database.

Procedure

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    On Record Error Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the flow for error handling.
    • Stop Flow - Stops the flow.
  2. On the DB2 tab, configure the following properties:
    Db2 Property Description
    Connection

    7.4 and later

    Connection that defines the information that is required to connect to an external system.

    To connect to an external system, you can select a connection that contains the details, or you can specify the details in local properties. When you select a connection, the flow canvas hides properties that are defined in the connection.

    Host Host name or IP address of the database.
    Port Database port number.
    Database Type Type of Db2 database to connect to.
    Database Database name.
    Use SSL Enables SSL/TLS.
    SSL Certificate SSL certificate of the host. Enter a credential function that returns the certificate or enter the contents of the certificate.
  3. On the Credentials tab, configure the following properties:
    Credentials Property Description
    User Name Database user name.
    Password Database password.
  4. On the Tables tab, configure the following properties:
    Tables Property Description
    Schema

    Name of the schema that contains the table to read.

    Using simple or bulk edit mode, click Add to add additional schemas.

    Table Name Name of the table to read.

    Using simple or bulk edit mode, click Add to add additional tables.

    Key Columns Ordered list of key columns to use as offset columns of the select query.

    As a best practice, a key column should be an incremental and unique column that does not contain null values. If the column is not unique, meaning multiple rows can have the same value for this column, there is potential for data loss when the flow restarts.

    Maximum Batch Size Maximum number of records processed at one time. Honors values up to the Data Collector maximum batch size.
    Number of Threads Number of threads the source generates and uses for multithreaded processing.
  5. On the Advanced tab, optionally configure advanced properties.

    The defaults for these properties should work in most cases:

    Advanced Property Description
    Connection Validation RPC Timeout Maximum number of seconds to wait for an IBM Connectivity Service connection or authentication.

    Default is 3 minutes.

    Generic RPC Timeout Maximum number of seconds to wait for all other IBM Connectivity Service actions.

    Default is 15 minutes.