Properties reference: CloudObjectStorage Connector
This topic lists all properties that you can set to configure the stage.
Connection
- Login URL
- The URL for logging in to IBM Cloud Object Storage. Find this URL by going to https://console.ng.bluemix.net/dashboard/services, clicking the Cloud Object Storage service, and then clicking Endpoint in the left pane. Copy the value of the public endpoint that you want to use.
- Type: string
- Default: https://s3-api.us-geo.objectstorage.softlayer.net
- Use resource instance ID
- Select this option to use the resource instance ID instead of the access key.
- Type: boolean
- Default: false
- Access key
- Connecting to the IBM COS service with the S3 API requires credentials and an endpoint. Credentials consist of an Access Key and a Secret Key. Find the Access Key by going to https://console.ng.bluemix.net/dashboard/services, clicking the Cloud Object Storage service, clicking Service credentials in the left pane, and then clicking View credentials in the Actions column of the Service Credentials table. Copy the value of access_key, not including quotation marks.
- Type: string
- Secret key
- Connecting to the IBM COS service with the S3 API requires credentials and an endpoint. Credentials consist of an Access Key and a Secret Key. Find the Secret Key by going to https://console.ng.bluemix.net/dashboard/services, clicking the Cloud Object Storage service, clicking Service credentials in the left pane, and then clicking View credentials in the Actions column of the Service Credentials table. Copy the value of secret_key, not including quotation marks.
- Type: protected string
- IAM URL
- The URL that IBM Cloud Object Storage should use to authenticate the API key with Identity and Access Management (IAM).
- Type: string
- Resource instance ID
- The identifier of the resource instance that you created when you ordered IBM Cloud Object Storage. Find the resource instance ID by going to https://console.ng.bluemix.net/dashboard/services, clicking the Cloud Object Storage service, clicking Service credentials in the left pane, and then clicking View credentials in the Actions column of the Service Credentials table. Copy the value of resource_instance_id, not including quotation marks.
- Type: string
- API key
- A token that is used to call the IBM Cloud Object Storage APIs. API keys are assigned roles that grant them authorization to call certain sets of APIs. Find the API key by going to https://console.ng.bluemix.net/dashboard/services, clicking the Cloud Object Storage service, clicking Service credentials in the left pane, and then clicking View credentials in the Actions column of the Service Credentials table. Copy the value of api_key, not including quotation marks.
- Type: protected string
- Region
- Use this option to specify the geographical location of the data centers where the data should be read from or where the data should be stored.
- Type: string
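The connection properties describe two sets of credentials: an access key with a secret key, or a resource instance ID with an API key. The following Python sketch shows one way to check which values are still missing for the selected option. The Connection class, its field names, and the missing_credentials helper are hypothetical and are not part of the connector. The pairing of required fields is an assumption based on the property descriptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Connection:
    """Hypothetical container that mirrors the Connection properties."""
    login_url: str = "https://s3-api.us-geo.objectstorage.softlayer.net"
    use_resource_instance_id: bool = False
    access_key: Optional[str] = None
    secret_key: Optional[str] = None
    iam_url: Optional[str] = None
    resource_instance_id: Optional[str] = None
    api_key: Optional[str] = None
    region: Optional[str] = None


def missing_credentials(conn: Connection) -> List[str]:
    """Return the names of credential fields that are empty.

    Assumption: the resource instance ID is paired with the API key,
    and the access key is paired with the secret key.
    """
    if conn.use_resource_instance_id:
        required = {
            "resource_instance_id": conn.resource_instance_id,
            "api_key": conn.api_key,
        }
    else:
        required = {
            "access_key": conn.access_key,
            "secret_key": conn.secret_key,
        }
    return [name for name, value in required.items() if not value]
```

An empty list from missing_credentials means that every field assumed to be required for the selected option has a value.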
Usage
- Bucket
- The name of the bucket that contains the files to read or write.
- Type: string
- Create bucket
- Use this option to create the bucket with the name that is specified in the Bucket property.
- Type: boolean
- Default: false
- Storage class
- Select a storage class for the created bucket from the following list: Standard, Vault, Cold vault, Flex.
- Type: selection
- Default: Standard
- Values:
- Standard
- Vault
- Cold vault
- Flex
- Read mode
- The method for reading files
- Type: selection
- Default: Read a single file
- Values:
- Read a single file
- Read multiple files using wildcards
- Read multiple files using regex expression
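The three read modes differ in how the file name is compared with the objects in the bucket. The following Python sketch illustrates the difference between an exact name, a wildcard pattern, and a regular expression. The select_files function is hypothetical, and the use of shell-style wildcards (fnmatch) and full-string regular expression matching is an assumption; the pattern syntax that the connector accepts might differ.

```python
import fnmatch
import re


def select_files(names, read_mode, file_name):
    """Return the names that match file_name under the given read mode."""
    if read_mode == "Read a single file":
        # Exact comparison with the configured file name.
        return [n for n in names if n == file_name]
    if read_mode == "Read multiple files using wildcards":
        # Assumption: shell-style wildcards such as * and ?.
        return [n for n in names if fnmatch.fnmatchcase(n, file_name)]
    if read_mode == "Read multiple files using regex expression":
        # Assumption: the expression must match the whole name.
        pattern = re.compile(file_name)
        return [n for n in names if pattern.fullmatch(n)]
    raise ValueError(f"Unknown read mode: {read_mode}")
```

For example, under these assumptions the wildcard pattern sales_*.csv and the regular expression sales_\d{4}\.csv both select sales_2023.csv and sales_2024.csv, but not notes.txt.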
- Write mode
- Use this property to specify whether to write to the file that is named in the File name property or to delete the file that matches that name.
- Type: selection
- Default: Write
- Values:
- Write
- Delete
- File name
- Specify the name of the file
- Type: string
- Infer schema
- Use this property to infer the schema (metadata) from the input data in the file.
- Type: boolean
- Default: false
- Infer as VARCHAR
- Treat the data in all columns as VARCHARs
- Type: boolean
- Default: false
- File format
- The format of the file
- Type: selection
- Default: CSV
- Values:
- Delimited
- CSV
- Parquet
- Avro
- JSON
- Excel
- ORC
- Partitioned
- Write the file as multiple partitions
- Type: boolean
- Default: false
- Wave handling properties
- Use the properties in this category to define how the data is handled when it is streamed in waves (that is, in batches) from the upstream stage. Typically, these properties are required when a source stage is configured to send the data in waves.
- Type: category
- Append unique identifier
- Use this property to choose whether a unique identifier is appended to the file name. When this property is set to Yes, the unique identifier is appended to the file name, and a new file is written for every wave of data that is streamed into the stage. When this property is set to No, the file is overwritten on every wave.
- Type: boolean
- Default: false
- File size threshold
- Specify the threshold for the file size in megabytes. Processing nodes will start a new file each time the size exceeds the value specified in the threshold.
- Type: integer
- Default: 1
- File format properties
- Specify the file syntax for delimited files.
- Type: category
- Header
- Select Yes if the first row of the file contains field headers and is not part of the data. If you select Yes when the connector writes data, the field names will be the first row of the output. If runtime column propagation is enabled, metadata can be obtained from the first row of the file.
- Type: boolean
- Default: false
- Include types
- Select Yes to append the data type to each field name that the connector writes in the first row of the output.
- Type: boolean
- Default: false
- Field delimiter
- The character that separates each value from the next value, for example, a comma
- Type: selection
- Default: comma
- Values:
- comma
- tab
- colon
- Row delimiter
- The character or characters that separate one line from another, for example, CR/LF (Carriage Return/Line Feed)
- Type: selection
- Default: Newline
- Values:
- Line feed
- Carriage return
- Carriage return line feed
- Newline
- Null value
- The value that represents null (a missing value) in the file, for example, NULL
- Type: string
- Escape character
- The character that's used to escape other characters, for example, a backslash. Escaping is a string technique that identifies characters as being part of a string value.
- Type: selection
- Default: None
- Values:
- None
- Double quote
- Single quote
- Backslash
- Quote character
- The character that's used to enclose string values, for example, a double quotation mark
- Type: selection
- Default: None
- Values:
- None
- Double quote
- Single quote
- Encoding
- The appropriate character encoding for your data, for example, UTF-8
- Type: string
- Default: utf-8
- Decimal format
- The format of decimal values, for example, #,###.##
- Type: string
- Date format
- Specify a string that defines the format for fields that have the Date data type.
- Type: string
- Time format
- Specify a string that defines the format for fields that have the Time data type.
- Type: string
- Timestamp format
- Specify a string that defines the format for fields that have the Timestamp data type.
- Type: string
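The Header, Field delimiter, Quote character, and Null value properties together determine how a delimited file is parsed. The following Python sketch uses the standard csv module to show the effect of these settings on sample text. The read_delimited function and its parameter names are hypothetical, and the mapping of property values to csv options is an assumption, not the connector's implementation.

```python
import csv
import io

# Property values from this reference, mapped to literal characters.
DELIMITERS = {"comma": ",", "tab": "\t", "colon": ":"}
QUOTES = {"None": None, "Double quote": '"', "Single quote": "'"}


def read_delimited(text, field_delimiter="comma", quote_character="None",
                   header=False, null_value=None):
    """Parse delimited text and return (field_names, rows).

    field_names is None when header is False. Values equal to
    null_value are returned as None.
    """
    quotechar = QUOTES[quote_character]
    reader = csv.reader(
        io.StringIO(text),
        delimiter=DELIMITERS[field_delimiter],
        # csv requires some quotechar; it is ignored under QUOTE_NONE.
        quotechar=quotechar or '"',
        quoting=csv.QUOTE_MINIMAL if quotechar else csv.QUOTE_NONE,
    )
    rows = list(reader)
    names = rows.pop(0) if header and rows else None
    rows = [
        [None if null_value is not None and value == null_value else value
         for value in row]
        for row in rows
    ]
    return names, rows


names, rows = read_delimited(
    "id:name\n1:'Ann:B'\n2:NULL\n",
    field_delimiter="colon",
    quote_character="Single quote",
    header=True,
    null_value="NULL",
)
# names is ['id', 'name']; rows is [['1', 'Ann:B'], ['2', None]]
```

In this sample, the single quotation marks keep the colon inside Ann:B from being treated as a field delimiter, and the NULL marker is returned as a missing value.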
- Parquet format properties
- Properties for the Parquet file format
- Type: category
- Parquet compression codec
- Select the Parquet compression algorithm to compress the data.
- Type: selection
- Default: None
- Values:
- None
- Snappy
- Gzip
- Avro format properties
- Properties for the Avro file format
- Type: category
- Avro compression codec
- Select the Avro compression algorithm to compress the data
- Type: selection
- Default: None
- Values:
- None
- Deflate
- Snappy
- Bzip2
- ORC format properties
- Properties for the ORC file format
- Type: category
- ORC compression codec
- Select the ORC compression algorithm to compress the data when writing.
- Type: selection
- Default: None
- Values:
- None
- ZLib
- Snappy
- LZO
- LZ4
- Excel format properties
- Properties for the Excel file format
- Type: category
- Cell range
- The range of cells to retrieve from the Excel worksheet, for example, C1:F10
- Type: string
- Excel worksheet name
- Use this property to specify the name of the Excel worksheet to read.
- Type: string
- Invalid data handling
- Specify how to handle values that are not valid: fail the job (Fail), set the column to null (Column), or drop the row (Row).
- Type: selection
- Default: Fail
- Values:
- Fail
- Column
- Row
- Row limit
- Specify the maximum number of records to read from the file per node. If a value is not specified for this property, the entire file is read.
- Type: string
- Byte limit
- Specify the maximum number of bytes to return. Use any of these suffixes: KB, MB, GB, or TB.
- Type: string
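The Byte limit value combines a number with one of the suffixes KB, MB, GB, or TB. The following Python sketch converts such a string to a byte count. The parse_byte_limit function is hypothetical, and the sketch assumes binary multiples (1 KB = 1024 bytes) and whole numbers; this reference does not state which multiple the connector uses.

```python
import re

# Assumption: binary multiples. The connector might use decimal ones.
UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}


def parse_byte_limit(value: str) -> int:
    """Convert a string such as '10MB' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(KB|MB|GB|TB)\s*", value,
                         flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Expected a number followed by KB, MB, GB, or TB: {value!r}")
    number, suffix = match.groups()
    return int(number) * UNITS[suffix.upper()]
```

Under these assumptions, parse_byte_limit("10MB") returns 10485760.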
- Java settings
- Properties for specifying JVM options
- Type: category
- Heap size
- The maximum heap size of the Java Virtual Machine, in megabytes. This property corresponds to the -Xmx command-line option.
- Type: integer
- Default: 256
- Minimum: 128
- JVM options
- Enter additional command-line arguments to pass to the Java Virtual Machine.
- Type: string
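The Heap size and JVM options properties both end up as Java command-line arguments. The following Python sketch shows how the two values could be combined into an argument list. The jvm_arguments function is hypothetical, and the way the connector actually assembles the arguments is an assumption; only the correspondence between Heap size and -Xmx, the default of 256, and the minimum of 128 come from this reference.

```python
import shlex


def jvm_arguments(heap_size_mb: int = 256, jvm_options: str = "") -> list:
    """Build a JVM argument list from the Java settings properties."""
    if heap_size_mb < 128:
        raise ValueError("Heap size must be at least 128 MB")
    # -Xmx sets the maximum heap size; the m suffix means megabytes.
    arguments = [f"-Xmx{heap_size_mb}m"]
    # Split the free-form options string the way a shell would.
    arguments.extend(shlex.split(jvm_options))
    return arguments
```

For example, jvm_arguments(512, "-Dfile.encoding=UTF-8 -verbose:gc") returns ['-Xmx512m', '-Dfile.encoding=UTF-8', '-verbose:gc'].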