IBM watsonx.data

The IBM watsonx.data target writes data to IBM watsonx.data. For information about supported versions, see Supported systems and versions.

When you configure the IBM watsonx.data target, you specify the host name, port number, and deployment details. In addition, you can enable the target to use SSL. You can also use a connection to configure the target.

You specify tables to load the data into and optionally configure the target to create new tables when needed.

Before you use the IBM watsonx.data target, you must complete some prerequisite tasks.

Prerequisite

About this task

Before you configure the IBM watsonx.data target, complete the following prerequisite task.

Procedure

Verify that the Db2Data Collector flight.enable property is set to auto or true.

Specifying tables

You can use the IBM watsonx.data target to write to one or more tables.

Specify the tables to use based on how many tables you want to write to:
Single table
To write data to a single table, simply enter the table name and schema name for the table.
Multiple tables in a single schema
To write data to multiple tables in a single schema, specify a record field or record header attribute in the record that defines the table names as follows.
To specify a field that defines the table names:
${record:value('/<field_name>')}
To specify an attribute that defines the table names:
${record:attribute('<attribute_name>')}
Multiple tables in multiple schemas
To write data to multiple tables in multiple schemas, specify record fields or record header attributes in the record that define the table and schema names as follows.
To specify a field that defines the table or schema names:
${record:value('/<field_name>')}
To specify an attribute that defines the table or schema names:
${record:attribute('<attribute_name>')}

Use the Table Name and Schema Name properties on the Table tab to specify the tables to write to.

Creating tables

You can configure the IBM watsonx.data target to create tables when they do not exist in IBM watsonx.data.

The target creates tables, columns, and primary key columns as follows:
Table
The target creates a table if the specified table does not exist. The target names new tables from the first record processed.
Table columns
In the created tables, the target determines the columns from the first record processed. The target infers the data types from the data in the record.
Primary key columns
The target creates tables without primary key columns.

Select the Auto Create Table property on the Tables tab to configure the target to create tables.

watsonx.data data types

The IBM watsonx.data target converts Data Collector data types into IBM watsonx.data data types.

The following Data Collector data types are converted to the UTC time zone during conversion to watsonx.data data types:
  • Date
  • Datetime
  • Time
  • Zoned Datetime
The following table describes how Data Collector data types are converted to standard IBM watsonx.data data types:
Data Collector data type IBM watsonx.data data type
Boolean Boolean
Byte Integer
Char Varchar
Date Date
Datetime Timestamp
Decimal Decimal
Double Double
Float Real
Integer Integer
Long Bigint
Short Integer
String Varchar
Time Time
Zoned Datetime Timestamp

Troubleshooting IBM Connectivity Service

The IBM watsonx.data target connects to the database through IBM Connectivity Service. IBM Connectivity Service and all stages that use the service are bundled into the IBM Connectivity Service stage library, streamsets-datacollector-ibm-connectivity-service-lib. By default, Data Collector runs IBM Connectivity Service when the IBM Connectivity Service stage library is installed.

If IBM Connectivity Service stops responding, Data Collector attempts to restart the service. If Data Collector is unable to restart IBM Connectivity Service, you can use the following API calls to view the IBM Connectivity Service state and restart IBM Connectivity Service:
Call CURL request
Get IBM Connectivity Service state

curl --location 'https://<engine url>/rest/v1/flight/state' --header 'Authorization: Bearer <IBM Cloud API key>

Restart IBM Connectivity Service

curl --location 'https://<engine url>//rest/v1/flight/restart' --header 'Authorization: Bearer <IBM Cloud API key>

Get IBM Connectivity Service logs

curl --location 'https://<engine url>//rest/v1/flight/state' --header 'Authorization: Bearer <IBM Cloud API key>

Note: You use an IBM Cloud API key to make these API calls. For information on managing IBM Cloud API keys, see the IBM Cloud documentation.

Configuring an IBM watsonx.data target

About this task

Configure an IBM watsonx.data target to write data to IBM watsonx.data.

Procedure

  1. In the Properties panel, on the General tab, configure the following properties:
    General property Description
    Name Stage name.
    Description Optional description.
    Required Fields Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.

    Records that do not include all required fields are processed based on the error handling configured for the flow.

    Preconditions Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

    Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the flow for error handling.
    • Stop Flow - Stops the flow.
  2. On the watsonx.data tab, configure the following properties:
    watsonx.data property Description
    Connection

    7.4 and later

    Connection that defines the information that is required to connect to an external system.

    To connect to an external system, you can select a connection that contains the details, or you can specify the details in local properties. When you select a connection, the flow canvas hides properties that are defined in the connection.

    Host IBM watsonx.data host name or IP address.
    Port IBM watsonx.data port number.
    Deployment Type Type of deployment to connect to:
    • IBM watsonx.data on IBM Cloud
    • IBM watsonx.data on Red Hat OpenShift
    CRN Name of the watsonx.data cloud resource.

    Required for watsonx.data deployed on IBM Cloud.

    Instance ID IBM watsonx.data instance ID. You can find the instance ID in the navigation menu of the watsonx.data console.

    Required for watsonx.data deployed on Red Hat OpenShift.

    Engine ID IBM watsonx.data Presto (Java) engine ID.
    Engine Host IBM watsonx.data Presto (Java) engine host name or IP address.
    Engine Port IBM watsonx.data Presto (Java) engine port number.
    Use SSL Enables SSL/TLS.
    SSL Certificate SSL certificate of the host. Enter a credential function that returns the certificate or enter the contents of the certificate.
  3. On the Credentials tab, configure the following properties:
    Credentials property Description
    Authentication Method Authentication method to use for IBM watsonx.data deployed on RedHat OpenShift:
    • Username and Password
    • Username and API Key
    Username User name for accessing the watsonx.data data source.

    Required for watsonx.data deployed on Red Hat OpenShift.

    Password Password associated with the user name for accessing the watsonx.data data source.

    Required for watsonx.data deployed on Red Hat OpenShift using user name and password authentication.

    API Key API key associated with the user name for accessing the watsonx.data data source.

    For IBM watsonx.data deployed on IBM Cloud, you can find the API key on the IBM Cloud console under Manage > Access (IAM) > API keys.

    For IBM watsonx.data deployed on Red Hat OpenShift, the API key can be generated from the watsonx.data profile settings.

    Required for watsonx.data deployed on IBM Cloud and for watsonx.data deployed on Red Hat OpenShift using user name and API key authentication.
  4. On the Tables tab, configure the following properties:
    Tables property Description
    Catalog Name Name of the catalog that contains the schema to write to.
    Schema Name Name of the schema that contains the tables to write to. You can use an expression that evaluates to the record field or record header attribute in the record containing the schema name.
    Table Name Names of the table to write to. You can us an expression that evaluates to the record field or record header attribute in the record containing the table name.
    Auto Create Table If the specified table does not exist in the database, the target creates the table in the specified schema.
  5. On the Advanced tab, optionally configure advanced properties.

    The defaults for these properties should work in most cases:

    Advanced property Description
    Ignore Extra Fields Ignores extra record fields not present in the target table. When this property is disabled, extra record fields are sent to error.
    Number of Threads Maximum number of connections that the target uses to write to the database across all flow runners. If this property is set to a number smaller than the number of flow runners, the target uses one connection for each flow runner.

    Records in a batch going to the same table are processing in a single thread, in the same order they are received.

    The default, 0, ensures that the target uses the same number of connections as threads used by the flow.

    Connection Validation RPC Timeout Maximum number of seconds to wait for an IBM Connectivity Service connection or authentication.

    Default is 3 minutes.

    Generic RPC Timeout Maximum number of seconds to wait for all other IBM Connectivity Service actions.

    Default is 15 minutes.