Properties reference: CloudObjectStorage Connector

This topic lists all properties that you can set to configure the stage.

Connection

Login URL
The URL for logging in to IBM Cloud Object Storage. Find this URL by going to https://console.ng.bluemix.net/dashboard/services, clicking the Cloud Object Storage service, and then clicking Endpoint in the left pane. Copy the value of the public endpoint that you want to use.
  • Type: string
  • Default: https://s3-api.us-geo.objectstorage.softlayer.net
Use resource instance ID
Choose this option to use the resource instance ID instead of the access key.
  • Type: boolean
  • Default: false
Access key
Connecting to the IBM COS service with the S3 API requires credentials and an endpoint. Credentials consist of an Access Key and a Secret Key. Find the Access Key by going to https://console.ng.bluemix.net/dashboard/services, clicking the Cloud Object Storage service, clicking Service credentials in the left pane, and then clicking View credentials in the Actions column of the Service Credentials table. Copy the value of access_key, not including quotation marks.
  • Type: string
Secret key
Connecting to the IBM COS service with the S3 API requires credentials and an endpoint. Credentials consist of an Access Key and a Secret Key. Find the Secret Key by going to https://console.ng.bluemix.net/dashboard/services, clicking the Cloud Object Storage service, clicking Service credentials in the left pane, and then clicking View credentials in the Actions column of the Service Credentials table. Copy the value of secret_key, not including quotation marks.
  • Type: protected string
IAM URL
The URL that IBM Cloud Object Storage should use to authenticate the API key with Identity and Access Management (IAM).
  • Type: string
Resource instance ID
The identifier of the resource instance that you created when you ordered IBM Cloud Object Storage. Find the resource instance ID by going to https://console.ng.bluemix.net/dashboard/services, clicking the Cloud Object Storage service, clicking Service credentials in the left pane, and then clicking View credentials in the Actions column of the Service Credentials table. Copy the value of resource_instance_id, not including quotation marks.
  • Type: string
API key
A token that is used to call the IBM Cloud Object Storage APIs. API keys are assigned roles that grant them authorization to call certain sets of APIs. Find the API key by going to https://console.ng.bluemix.net/dashboard/services, clicking the Cloud Object Storage service, clicking Service credentials in the left pane, and then clicking View credentials in the Actions column of the Service Credentials table. Copy the value of api_key, not including quotation marks.
  • Type: protected string
Region
Use this option to specify the geographical location of the data centers where the data should be read from or where the data should be stored.
  • Type: string
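The connection properties describe two alternative credential sets. The following Python sketch checks which values are still missing before a connection is attempted. The dictionary keys are hypothetical names chosen for illustration, and the pairing of API key with resource instance ID is an assumption drawn from the descriptions above, not a documented rule of the connector.

```python
def missing_credentials(props):
    """Return the hypothetical property names that still need a value."""
    # Assumption: the boolean property switches between two credential sets.
    if props.get("use_resource_instance_id", False):
        required = ["api_key", "resource_instance_id"]
    else:
        required = ["access_key", "secret_key"]
    # A property counts as missing when it is absent or empty.
    return [name for name in required if not props.get(name)]


print(missing_credentials({"access_key": "abc"}))
print(missing_credentials({"use_resource_instance_id": True, "api_key": "k"}))
```

An empty list means that the selected credential set is complete.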

Usage

Bucket
The name of the bucket that contains the files to read or write.
  • Type: string
Create bucket
Use this option to create the bucket with the name given in the bucket property.
  • Type: boolean
  • Default: false
Storage class
Select a storage class for the created bucket from the following list: Standard, Vault, Cold vault, Flex.
  • Type: selection
  • Default: Standard
  • Values:
    • Standard
    • Vault
    • Cold vault
    • Flex
Read mode
The method for reading files.
  • Type: selection
  • Default: Read a single file
  • Values:
    • Read a single file
    • Read multiple files using wildcards
    • Read multiple files using regex expression
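The difference between the wildcard and regular expression read modes can be illustrated with a short Python sketch. It uses the standard fnmatch and re modules, so the exact pattern syntax that the connector accepts might differ. The file names are made up for the example.

```python
import fnmatch
import re

# Hypothetical object names in a bucket.
names = ["sales_2023.csv", "sales_2024.csv", "returns_2024.csv", "notes.txt"]


def match_wildcard(pattern, candidates):
    """Select names with a shell-style wildcard pattern such as sales_*.csv."""
    return [n for n in candidates if fnmatch.fnmatchcase(n, pattern)]


def match_regex(pattern, candidates):
    """Select names that match the regular expression in full."""
    compiled = re.compile(pattern)
    return [n for n in candidates if compiled.fullmatch(n)]


print(match_wildcard("sales_*.csv", names))
# ['sales_2023.csv', 'sales_2024.csv']
print(match_regex(r"(sales|returns)_2024\.csv", names))
# ['sales_2024.csv', 'returns_2024.csv']
```

A wildcard is usually enough for a shared prefix or suffix. A regular expression is useful when the selection needs alternation or character classes.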
Write mode
Use this property to specify whether to write to the file with the specified name or to delete the file that matches the name.
  • Type: selection
  • Default: Write
  • Values:
    • Write
    • Delete
File name
Specify the name of the file.
  • Type: string
Infer schema
Use this property to infer the schema (metadata) from the input data in the file.
  • Type: boolean
  • Default: false
Infer as VARCHAR
Treat the data in all columns as VARCHARs.
  • Type: boolean
  • Default: false
File format
The format of the file.
  • Type: selection
  • Default: CSV
  • Values:
    • Delimited
    • CSV
    • Parquet
    • Avro
    • JSON
    • Excel
    • ORC
Partitioned
Write the file as multiple partitions.
  • Type: boolean
  • Default: false
Wave handling properties
Use the properties in this category to define how the data is handled when it is streamed in waves (that is, in batches) from the upstream stage. Typically, these properties are required when source stages are configured to send the data in waves.
  • Type: category
Append unique identifier
Use this property to choose whether a unique identifier is appended to the file name. When this property is set to Yes, the unique identifier is appended to the file name, and a new file is written for every wave of data that is streamed into the stage. When this property is set to No, the file is overwritten on every wave.
  • Type: boolean
  • Default: false
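The effect of this property on file names can be sketched in Python as follows. The function name, the use of a UUID, and the dot separator are assumptions made for illustration. The format of the identifier that the connector actually appends is not stated here.

```python
import uuid


def wave_file_name(file_name, append_unique_identifier, identifier=None):
    """Return the name of the file to write for one wave of data."""
    if not append_unique_identifier:
        # Same name on every wave, so each wave overwrites the previous file.
        return file_name
    # Assumption: a random hex string stands in for the unique identifier.
    identifier = identifier or uuid.uuid4().hex
    return f"{file_name}.{identifier}"


print(wave_file_name("orders.csv", False))
print(wave_file_name("orders.csv", True, identifier="wave1"))
```

With the identifier enabled, two waves produce two distinct files. With it disabled, only the data from the last wave remains.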
File size threshold
Specify the threshold for the file size in megabytes. Processing nodes will start a new file each time the size exceeds the value specified in the threshold.
  • Type: integer
  • Default: 1
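The rollover behavior can be approximated with the following Python sketch, which groups encoded records into files and starts a new file once the current file exceeds the threshold. It assumes that 1 megabyte is 1024 * 1024 bytes and that the size check happens after each record is written. Both points are assumptions, not documented connector behavior.

```python
def split_by_threshold(records, threshold_mb=1):
    """Group byte records into files, rolling over after the size exceeds the threshold."""
    limit = threshold_mb * 1024 * 1024  # assumption: binary megabytes
    files, current, size = [], [], 0
    for record in records:
        current.append(record)
        size += len(record)
        if size > limit:
            # The file has exceeded the threshold, so close it and start a new one.
            files.append(current)
            current, size = [], 0
    if current:
        files.append(current)
    return files


records = [b"x" * (600 * 1024)] * 3
print([len(f) for f in split_by_threshold(records)])
# [2, 1]
```

Because the check happens after a write, a file can be slightly larger than the threshold.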
File format properties
Specify the file syntax for delimited files.
  • Type: category
Header
Select Yes if the first row of the file contains field headers and is not part of the data. If you select Yes when the connector writes data, the field names will be the first row of the output. If runtime column propagation is enabled, metadata can be obtained from the first row of the file.
  • Type: boolean
  • Default: false
Include types
Select Yes to append the data type to each field name that the connector writes in the first row of the output.
  • Type: boolean
  • Default: false
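How the Header and Include types properties could shape the first row of the output is sketched below in Python. The name:TYPE notation is a hypothetical format chosen for the example. The notation that the connector writes might be different.

```python
import csv
import io


def write_with_header(columns, rows, include_types=False):
    """Write rows as CSV text with a header row built from (name, type) pairs."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    if include_types:
        # Assumption: the data type is appended to the field name after a colon.
        header = [f"{name}:{dtype}" for name, dtype in columns]
    else:
        header = [name for name, _ in columns]
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


columns = [("id", "INTEGER"), ("name", "VARCHAR")]
print(write_with_header(columns, [[1, "Ann"]], include_types=True))
```

A reader that knows the notation can recover both the field names and the data types from the first row.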
Field delimiter
The character that separates each value from the next value, for example, a comma.
  • Type: selection
  • Default: comma
  • Values:
    • comma
    • tab
    • colon
Row delimiter
The character or characters that separate one line from another, for example, CR/LF (Carriage Return/Line Feed).
  • Type: selection
  • Default: Newline
  • Values:
    • Line feed
    • Carriage return
    • Carriage return line feed
    • Newline
Null value
The value that represents null (a missing value) in the file, for example, NULL.
  • Type: string
Escape character
The character that's used to escape other characters, for example, a backslash. Escaping is a string technique that identifies characters as being part of a string value.
  • Type: selection
  • Default: None
  • Values:
    • None
    • Double quote
    • Single quote
    • Backslash
Quote character
The character that's used to enclose string values, for example, a double quotation mark.
  • Type: selection
  • Default: None
  • Values:
    • None
    • Double quote
    • Single quote
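The interplay of the field delimiter, escape character, and quote character can be illustrated with the Python csv module. This is a sketch of the general technique, not the connector's own writer. The DELIMITERS mapping and the function name are hypothetical.

```python
import csv
import io

# Hypothetical mapping from the property values to actual characters.
DELIMITERS = {"comma": ",", "tab": "\t", "colon": ":"}


def write_delimited(rows, delimiter="comma", quotechar=None, escapechar=None):
    """Write rows as delimited text, quoting or escaping values that need it."""
    options = {"delimiter": DELIMITERS[delimiter], "lineterminator": "\n"}
    if quotechar is None:
        # No quote character: special characters must be escaped instead.
        options["quoting"] = csv.QUOTE_NONE
    else:
        options["quoting"] = csv.QUOTE_MINIMAL
        options["quotechar"] = quotechar
    if escapechar is not None:
        options["escapechar"] = escapechar
    buffer = io.StringIO()
    csv.writer(buffer, **options).writerows(rows)
    return buffer.getvalue()


rows = [["a,b", "c"]]
print(write_delimited(rows, quotechar='"'))    # value enclosed in quotes
print(write_delimited(rows, escapechar="\\"))  # delimiter escaped with a backslash
```

When a value contains the delimiter, either a quote character or an escape character is needed so that a reader can split the line correctly.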
Encoding
The appropriate character encoding for your data, for example, UTF-8.
  • Type: string
  • Default: utf-8
Decimal format
The format of decimal values, for example, #,###.##.
  • Type: string
Date format
Specify a string that defines the format for fields that have the Date data type.
  • Type: string
Time format
Specify a string that defines the format for fields that have the Time data type.
  • Type: string
Timestamp format
Specify a string that defines the format for fields that have the Timestamp data type.
  • Type: string
Parquet format properties
Properties that apply to files in Parquet format.
  • Type: category
Parquet compression codec
Select the Parquet compression algorithm to compress the data.
  • Type: selection
  • Default: None
  • Values:
    • None
    • Snappy
    • Gzip
Avro format properties
Properties that apply to files in Avro format.
  • Type: category
Avro compression codec
Select the Avro compression algorithm to compress the data.
  • Type: selection
  • Default: None
  • Values:
    • None
    • Deflate
    • Snappy
    • Bzip2
ORC format properties
Properties that apply to files in ORC format.
  • Type: category
ORC compression codec
Select the compression codec to use when writing.
  • Type: selection
  • Default: None
  • Values:
    • None
    • ZLib
    • Snappy
    • LZO
    • LZ4
Excel format properties
Properties that apply to files in Excel format.
  • Type: category
Cell range
The range of cells to retrieve from the Excel worksheet, for example, C1:F10.
  • Type: string
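To show what a cell range such as C1:F10 covers, the following Python sketch converts the range into 1-based column and row numbers. The function names are hypothetical, and the sketch only handles the simple two-corner form shown in the example.

```python
import re


def column_index(letters):
    """Convert column letters to a 1-based index (A is 1, Z is 26, AA is 27)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def parse_cell_range(cell_range):
    """Return ((first_column, first_row), (last_column, last_row))."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", cell_range)
    if not match:
        raise ValueError(f"Not a cell range: {cell_range!r}")
    col1, row1, col2, row2 = match.groups()
    return (column_index(col1), int(row1)), (column_index(col2), int(row2))


print(parse_cell_range("C1:F10"))
# ((3, 1), (6, 10))
```

The range C1:F10 therefore spans 4 columns and 10 rows.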
Excel worksheet name
Use this property to specify the name of the Excel worksheet to read.
  • Type: string
Invalid data handling
How to handle values that are not valid: fail the job, null the column, or drop the row.
  • Type: selection
  • Default: Fail
  • Values:
    • Fail
    • Column
    • Row
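The three options can be sketched in Python as follows. The sketch assumes that "null the column" means replacing only the invalid value with null and keeping the rest of the row. The function name and the use of ValueError to signal an invalid value are illustrative choices.

```python
def handle_invalid_values(rows, parse, mode="Fail"):
    """Parse every value; on invalid values fail, null the value, or drop the row."""
    result = []
    for row in rows:
        parsed, has_invalid = [], False
        for value in row:
            try:
                parsed.append(parse(value))
            except ValueError:
                if mode == "Fail":
                    raise  # stop processing, as a failing job would
                has_invalid = True
                parsed.append(None)  # placeholder for the invalid value
        if has_invalid and mode == "Row":
            continue  # drop the whole row
        result.append(parsed)
    return result


rows = [["1", "2"], ["x", "4"]]
print(handle_invalid_values(rows, int, mode="Column"))
# [[1, 2], [None, 4]]
print(handle_invalid_values(rows, int, mode="Row"))
# [[1, 2]]
```

Column keeps as much data as possible, Row keeps only fully valid rows, and Fail stops at the first invalid value.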
Row limit
Specify the maximum number of records to read from the file per node. If a value is not specified for this property, the entire file is read.
  • Type: string
Byte limit
Specify the maximum number of bytes to return. Use any of these suffixes: KB, MB, GB, or TB.
  • Type: string
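A value such as 10KB or 2 MB can be converted to a number of bytes as in the following Python sketch. It assumes binary multiples (1 KB = 1024 bytes) and accepts a plain number without a suffix as bytes. Both are assumptions, because the reference does not state how the connector interprets the suffixes.

```python
import re

# Assumption: suffixes are binary multiples.
UNITS = {"": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_byte_limit(text):
    """Convert a string such as '10KB' or '2 MB' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(KB|MB|GB|TB)?\s*", text, flags=re.IGNORECASE)
    if not match:
        raise ValueError(f"Not a byte limit: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS[(unit or "").upper()]


print(parse_byte_limit("10KB"))
# 10240
print(parse_byte_limit("2 MB"))
# 2097152
```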
Java settings
Properties for specifying JVM options.
  • Type: category
Heap size
The maximum heap size in megabytes. This property corresponds to the -Xmx command line option.
  • Type: integer
  • Default: 256
  • Minimum: 128
JVM options
Enter additional command line arguments to the Java Virtual Machine.
  • Type: string
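How the two Java settings could translate into JVM command line arguments is sketched below in Python. The function name is hypothetical, and the order in which the connector combines the heap size with the additional options is an assumption.

```python
import shlex


def jvm_arguments(heap_size_mb=256, jvm_options=""):
    """Build a JVM argument list from the heap size and the additional options."""
    if heap_size_mb < 128:
        raise ValueError("Heap size must be at least 128 MB")
    # -Xmx<n>m sets the maximum heap size in megabytes.
    arguments = [f"-Xmx{heap_size_mb}m"]
    # Split the free-form options string the way a shell would.
    arguments.extend(shlex.split(jvm_options))
    return arguments


print(jvm_arguments())
# ['-Xmx256m']
print(jvm_arguments(512, "-Dfile.encoding=UTF-8 -verbose:gc"))
# ['-Xmx512m', '-Dfile.encoding=UTF-8', '-verbose:gc']
```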