IBM watsonx.data

The IBM watsonx.data target writes data to IBM watsonx.data. For information about supported versions, see Supported systems and versions.

When you configure the IBM watsonx.data target, you specify the host name, port number, and deployment details. In addition, you can enable the target to use SSL. You can also use a connection to configure the target.

You specify tables to load the data into and optionally configure the target to create new tables when needed.

Before you use the IBM watsonx.data target, you must complete some prerequisite tasks.

Prerequisite

About this task

Before you configure the IBM watsonx.data target, complete the following prerequisite task.

Procedure

Verify that the Db2Data Collector flight.enable property is set to auto or true.

Specifying tables

You can use the IBM watsonx.data target to write to one or more tables.

Specify the tables to use based on how many tables you want to write to:

Single table

To write data to a single table, simply enter the table name and schema name for the table.

Multiple tables in a single schema

To write data to multiple tables in a single schema, specify a record field or record header attribute in the record that defines the table names as follows.

To specify a field that defines the table names:

${record:value('/<field_name>')}

To specify an attribute that defines the table names:

${record:attribute('<attribute_name>')}

Multiple tables in multiple schemas

To write data to multiple tables in multiple schemas, specify record fields or record header attributes in the record that define the table and schema names as follows.

To specify a field that defines the table or schema names:

${record:value('/<field_name>')}

To specify an attribute that defines the table or schema names:

${record:attribute('<attribute_name>')}

Use the Table Name and Schema Name properties on the Table tab to specify the tables to write to.

Creating tables

You can configure the IBM watsonx.data target to create tables when they do not exist in IBM watsonx.data.

The target creates tables, columns, and primary key columns as follows:

Table: The target creates a table if the specified table does not exist. The target names new tables from the first record processed.
Table columns: In the created tables, the target determines the columns from the first record processed. The target infers the data types from the data in the record.
Primary key columns: The target creates tables without primary key columns.

Select the Auto Create Table property on the Tables tab to configure the target to create tables.

watsonx.data data types

The IBM watsonx.data target converts Data Collector data types into IBM watsonx.data data types.

The following Data Collector data types are converted to the UTC time zone during conversion to watsonx.data data types:

Date
Datetime
Time
Zoned Datetime

The following table describes how Data Collector data types are converted to standard IBM watsonx.data data types:

Data Collector data type	IBM watsonx.data data type
Boolean	Boolean
Byte	Integer
Char	Varchar
Date	Date
Datetime	Timestamp
Decimal	Decimal
Double	Double
Float	Real
Integer	Integer
Long	Bigint
Short	Integer
String	Varchar
Time	Time
Zoned Datetime	Timestamp

Troubleshooting IBM Connectivity Service

The IBM watsonx.data target connects to the database through IBM Connectivity Service. IBM Connectivity Service and all stages that use the service are bundled into the IBM Connectivity Service stage library, streamsets-datacollector-ibm-connectivity-service-lib. By default, Data Collector runs IBM Connectivity Service when the IBM Connectivity Service stage library is installed.

If IBM Connectivity Service stops responding, Data Collector attempts to restart the service. If Data Collector is unable to restart IBM Connectivity Service, you can use the following API calls to view the IBM Connectivity Service state and restart IBM Connectivity Service:

Call	CURL request
Get IBM Connectivity Service state	curl --location 'https://<engine url>/rest/v1/flight/state' --header 'Authorization: Bearer <IBM Cloud API key>
Restart IBM Connectivity Service	curl --location 'https://<engine url>//rest/v1/flight/restart' --header 'Authorization: Bearer <IBM Cloud API key>
Get IBM Connectivity Service logs	curl --location 'https://<engine url>//rest/v1/flight/state' --header 'Authorization: Bearer <IBM Cloud API key>

Note: You use an IBM Cloud API key to make these API calls. For information on managing IBM Cloud API keys, see the IBM Cloud documentation.

Configuring an IBM watsonx.data target

About this task

Configure an IBM watsonx.data target to write data to IBM watsonx.data.

Procedure

In the Properties panel, on the General tab, configure the following properties:

General property	Description
Name	Stage name.
Description	Optional description.
Required Fields	Fields that must include data for the record to be passed into the stage. Tip: You might include fields that the stage uses. Records that do not include all required fields are processed based on the error handling configured for the flow.
Preconditions	Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions. Records that do not meet all preconditions are processed based on the error handling configured for the stage.
On Record Error	Error record handling for the stage: Discard - Discards the record. Send to Error - Sends the record to the flow for error handling. Stop Flow - Stops the flow.

On the watsonx.data tab, configure the following properties:

watsonx.data property	Description
Connection 7.4 and later	Connection that defines the information that is required to connect to an external system. To connect to an external system, you can select a connection that contains the details, or you can specify the details in local properties. When you select a connection, the flow canvas hides properties that are defined in the connection.
Host	IBM watsonx.data host name or IP address.
Port	IBM watsonx.data port number.
Deployment Type	Type of deployment to connect to: IBM watsonx.data on IBM Cloud IBM watsonx.data on Red Hat OpenShift
CRN	Name of the watsonx.data cloud resource. Required for watsonx.data deployed on IBM Cloud.
Instance ID	IBM watsonx.data instance ID. You can find the instance ID in the navigation menu of the watsonx.data console. Required for watsonx.data deployed on Red Hat OpenShift.
Engine ID	IBM watsonx.data Presto (Java) engine ID.
Engine Host	IBM watsonx.data Presto (Java) engine host name or IP address.
Engine Port	IBM watsonx.data Presto (Java) engine port number.
Use SSL	Enables SSL/TLS.
SSL Certificate	SSL certificate of the host. Enter a credential function that returns the certificate or enter the contents of the certificate.

On the Credentials tab, configure the following properties:

Credentials property	Description
Authentication Method	Authentication method to use for IBM watsonx.data deployed on RedHat OpenShift: Username and Password Username and API Key
Username	User name for accessing the watsonx.data data source. Required for watsonx.data deployed on Red Hat OpenShift.
Password	Password associated with the user name for accessing the watsonx.data data source. Required for watsonx.data deployed on Red Hat OpenShift using user name and password authentication.
API Key	API key associated with the user name for accessing the watsonx.data data source. For IBM watsonx.data deployed on IBM Cloud, you can find the API key on the IBM Cloud console under Manage > Access (IAM) > API keys. For IBM watsonx.data deployed on Red Hat OpenShift, the API key can be generated from the watsonx.data profile settings. Required for watsonx.data deployed on IBM Cloud and for watsonx.data deployed on Red Hat OpenShift using user name and API key authentication.

On the Tables tab, configure the following properties:

Tables property	Description
Catalog Name	Name of the catalog that contains the schema to write to.
Schema Name	Name of the schema that contains the tables to write to. You can use an expression that evaluates to the record field or record header attribute in the record containing the schema name.
Table Name	Names of the table to write to. You can us an expression that evaluates to the record field or record header attribute in the record containing the table name.
Auto Create Table	If the specified table does not exist in the database, the target creates the table in the specified schema.

On the Advanced tab, optionally configure advanced properties.

The defaults for these properties should work in most cases:

Advanced property	Description
Ignore Extra Fields	Ignores extra record fields not present in the target table. When this property is disabled, extra record fields are sent to error.
Number of Threads	Maximum number of connections that the target uses to write to the database across all flow runners. If this property is set to a number smaller than the number of flow runners, the target uses one connection for each flow runner. Records in a batch going to the same table are processing in a single thread, in the same order they are received. The default, 0, ensures that the target uses the same number of connections as threads used by the flow.
Connection Validation RPC Timeout	Maximum number of seconds to wait for an IBM Connectivity Service connection or authentication. Default is 3 minutes.
Generic RPC Timeout	Maximum number of seconds to wait for all other IBM Connectivity Service actions. Default is 15 minutes.