IBM Cloud Object Storage

6.4 and later

The IBM Cloud Object Storage target writes data to IBM Cloud Object Storage. For information about supported versions, see Supported systems and versions.

When you configure the IBM Cloud Object Storage target, you specify the endpoint to connect to. In addition, you can enable the target to use SSL. You can also use a connection to configure the target.

You specify the file name, file suffix, and file type to write to, and optionally specify file and object prefixes. You also specify the table and table type to load the data into.

Before you use the IBM Cloud Object Storage target, you must complete a prerequisite task.

Prerequisite

About this task

Before you configure the IBM Cloud Object Storage target, complete the following prerequisite task.

Procedure

Verify that the Data Collector flight.enable property is set to auto or true.

Bucket

When you configure the bucket where records should be written, you can specify an exact bucket name or you can use an expression that evaluates to a bucket name.

For example, to write to buckets based on data in the Type field, you can use the following expression to define the bucket: ${record:value('/Type')}.

With this expression, the target writes each record to the bucket named by the value in the record's Type field.
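The routing effect of such an expression can be illustrated with a minimal Python sketch. This is only a simulation of the idea, not how the target evaluates expressions: the helper names bucket_for and group_by_bucket are hypothetical, records are modeled as plain dictionaries, and raising an error for a record without a Type value is an assumption, since the behavior for missing fields is not described here.

```python
from collections import defaultdict


def bucket_for(record: dict, field: str = "Type") -> str:
    """Return the bucket name taken from the given record field (hypothetical helper)."""
    value = record.get(field)
    if not value:
        # Assumption: a record without a usable value cannot be routed.
        raise ValueError(f"record has no usable value in field {field!r}")
    return str(value)


def group_by_bucket(records: list) -> dict:
    """Group records by the bucket that the field value resolves to."""
    buckets = defaultdict(list)
    for record in records:
        buckets[bucket_for(record)].append(record)
    return dict(buckets)


# Sample records with invented values, for illustration only.
records = [
    {"Type": "orders", "id": 1},
    {"Type": "returns", "id": 2},
    {"Type": "orders", "id": 3},
]
grouped = group_by_bucket(records)
```

In this sketch, records 1 and 3 land in a bucket named orders and record 2 lands in a bucket named returns.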

Troubleshooting IBM Connectivity Service

The IBM Cloud Object Storage target connects to IBM Cloud Object Storage through IBM Connectivity Service. IBM Connectivity Service and all stages that use the service are bundled into the IBM Connectivity Service stage library, streamsets-datacollector-ibm-connectivity-service-lib. By default, Data Collector runs IBM Connectivity Service when the IBM Connectivity Service stage library is installed.

If IBM Connectivity Service stops responding, Data Collector attempts to restart the service. If Data Collector is unable to restart IBM Connectivity Service, you can use the following API calls to view the IBM Connectivity Service state and restart IBM Connectivity Service:

Get IBM Connectivity Service state:

curl --location 'https://<engine url>/rest/v1/flight/state' --header 'Authorization: Bearer <IBM Cloud API key>'

Restart IBM Connectivity Service:

curl --location 'https://<engine url>/rest/v1/flight/restart' --header 'Authorization: Bearer <IBM Cloud API key>'

Get IBM Connectivity Service logs:

curl --location 'https://<engine url>/rest/v1/flight/state' --header 'Authorization: Bearer <IBM Cloud API key>'

Note: You use an IBM Cloud API key to make these API calls. For information on managing IBM Cloud API keys, see the IBM Cloud documentation.
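If you prefer to script these calls, the following Python sketch shows one possible way to issue the state and restart requests with the standard library. The function names build_flight_request and call_flight are hypothetical, and the sketch assumes that the calls are plain GET requests (the curl default when no data is sent) and that the engine URL is passed with its scheme, for example https://host.

```python
import urllib.request

# Actions that appear in the curl examples of this section.
ALLOWED_ACTIONS = ("state", "restart")


def build_flight_request(engine_url: str, action: str, api_key: str) -> urllib.request.Request:
    """Build the request for an IBM Connectivity Service call (hypothetical helper)."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    # Strip a trailing slash so the path never contains a double slash.
    url = f"{engine_url.rstrip('/')}/rest/v1/flight/{action}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def call_flight(engine_url: str, action: str, api_key: str, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    request = build_flight_request(engine_url, action, api_key)
    # urlopen follows redirects by default, similar to curl --location.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, call_flight("https://<engine url>", "state", "<IBM Cloud API key>") would request the service state. The response format is not described here, so the sketch returns the body unparsed.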

Configuring an IBM Cloud Object Storage target

About this task

Configure an IBM Cloud Object Storage target to write data to IBM Cloud Object Storage.

Procedure

  1. In the Properties panel, on the General tab, configure the following properties:
    General property - Description
    Name - Stage name.
    Description - Optional description.
    Required Fields - Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.

    Records that do not include all required fields are processed based on the error handling configured for the flow.

    Preconditions - Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

    Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error - Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the flow for error handling.
    • Stop Flow - Stops the flow.
  2. On the COS tab, configure the following properties:
    COS property - Description
    Connection (7.4 and later) - Connection that defines the information that is required to connect to an external system.

    To connect to an external system, you can select a connection that contains the details, or you can specify the details in local properties. When you select a connection, the flow canvas hides properties that are defined in the connection.

    Endpoint - URL of the endpoint to connect to.
    Use SSL - Enables SSL.

    Enable only if the IBM Cloud Object Storage instance is configured to accept SSL connections.

    SSL Certificate - SSL certificate of the host. Enter a credential function that returns the certificate or enter the contents of the certificate.
  3. On the Credentials tab, configure the following properties:
    Credentials property - Description
    Authentication Method - Authentication method to use:
    • Access key and secret key
    • Resource instance and API key
    • Resource instance, API key, access key and secret key
    • Service credentials (full JSON snippet)
    Access Key - IBM Cloud Object Storage access key ID.

    You can find the access key on the IBM Cloud console under Resources > Storage > Service credentials > access_key_id.

    Required for access key and secret key authentication.

    Secret Key - IBM Cloud Object Storage secret key.

    You can find the secret key on the IBM Cloud console under Resources > Storage > Service credentials > secret_access_key.

    Required for access key and secret key authentication.

    API Key - API key associated with the IBM Cloud Object Storage service.

    You can find the API key on the IBM Cloud console under Resources > Storage > Service credentials > apikey.

    Required for resource instance and API key authentication.

    Resource Instance ID - IBM Cloud Object Storage resource instance ID.

    You can find the resource instance ID on the IBM Cloud console under Resources > Storage > Service credentials > resource_instance_id.

    Required for resource instance and API key authentication.

    Service Credentials (full JSON snippet) - The contents of the IBM Cloud Object Storage service credentials file.

    You can find the service credentials on the IBM Cloud console under Resources > Storage > Service credentials. Click the name of the credentials to expand them.

  4. On the Files tab, configure the following properties:
    Files property - Description
    Bucket - Bucket to use when writing records.

    Enter a bucket name or define an expression that evaluates to bucket names.

    When using datetime variables in the expression, be sure to configure the time basis for the stage.

    File Name Suffix - Suffix to use for file names, such as txt or json. When used, the target adds a period and the configured suffix as follows: <file name>.<suffix>.

    You can include periods within the suffix, but do not start the suffix with a period. Forward slashes are not allowed.

    File Name Prefix - Optional prefix to use for file names. When used, the target adds the configured prefix followed by an underscore as follows: <prefix>_<file name>.<suffix>.
    Object Key Prefix - Prefix to use for all object keys.
    File Format - Format of the file to write to:
    • Avro
    • CSV
    • Delimited
    • Excel
    • JSON
    • ORC
    • Parquet
    • SAV
    • XML
    First Line is Header - Use the first line written to the file as the header.

    Available for CSV, Delimited, and Excel file formats.

    Delimiter - Optional delimiter to use in the file name.
    Include Runner IDs - Include runner IDs in the file name.

    Use for multithreaded flows.

  5. On the Tables tab, configure the following properties:
    Tables property - Description
    Table Format - Format of the table to write to:
    • Delta Lake
    • Flat File
    • Iceberg

    Default is Flat File.

    Table Name - Name of the table to write to.

    Required for Delta Lake and Iceberg table formats.

    Table Namespace - Namespace that contains the table to write to.

    Required for Iceberg table format.

    Endpoint Folder - Endpoint folder that contains the table to write to.

    Required for Delta Lake and Iceberg table formats.

    Table Data File Format - Format of the table data file.
  6. On the Advanced tab, configure the following properties:
    Advanced property - Description
    Connection Validation RPC Timeout - Maximum number of seconds to wait for an IBM Connectivity Service connection or authentication.

    Default is 180 seconds (3 minutes).

    Generic RPC Timeout - Maximum number of seconds to wait for all other IBM Connectivity Service actions.

    Default is 900 seconds (15 minutes).
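
The file naming rules described for the File Name Suffix and File Name Prefix properties can be sketched in Python as follows. This is an illustration only, not the target's implementation: the function build_file_name is hypothetical, and the file_name argument stands in for the base name that the target produces, which is not specified here.

```python
def build_file_name(file_name: str, suffix: str = "", prefix: str = "") -> str:
    """Compose <prefix>_<file name>.<suffix> following the documented rules."""
    # The suffix can contain periods but must not start with one.
    if suffix.startswith("."):
        raise ValueError("suffix must not start with a period")
    # Forward slashes are not allowed in the suffix.
    if "/" in suffix:
        raise ValueError("suffix must not contain forward slashes")
    # The prefix is optional; when set, it is joined with an underscore.
    name = f"{prefix}_{file_name}" if prefix else file_name
    # The suffix is joined with a period when set.
    return f"{name}.{suffix}" if suffix else name
```

For example, a prefix of sales, a base name of data, and a suffix of json produce sales_data.json, and a suffix of tar.gz is accepted because periods are allowed inside the suffix.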