Properties reference: Amazon S3

This topic lists all properties that you can set to configure the stage.

Connection

For more information, see the Defining a connection to Amazon S3 topic.

Region
Specify the Amazon Web Services geographical region where the data is stored or where you want to store the data. For a list of regions, see the Amazon S3 documentation.
  • Type: string
Use credentials file
Select whether to use a file that contains the access key and secret key when the connector connects to Amazon S3.
  • Type: boolean
  • Default: true
Credentials file
Specify the path to a file that contains the access key and secret key. If a value is not specified for this property, the credentials must be specified for the Access key and Secret key properties.
  • Type: string
Access key
Specify the Amazon Web Services access key. If you specify an access key, you must also specify a secret key.
  • Type: string
Secret key
Specify the Amazon Web Services secret key. If you specify a secret key, you must also specify an access key.
  • Type: protected string
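The region, access key, and secret key are the same values that you would use for any programmatic access to Amazon S3. For reference only, a minimal boto3 sketch (outside the connector) that authenticates with equivalent values; all values shown are placeholders:

  # Minimal boto3 sketch using the same region and credential values
  # that the Region, Access key, and Secret key properties describe.
  # All values are placeholders.
  import boto3

  s3 = boto3.client(
      "s3",
      region_name="us-east-1",          # Region
      aws_access_key_id="AKIA...",      # Access key
      aws_secret_access_key="...",      # Secret key
  )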
Use proxy server
Specify whether to use a proxy server for the connection to Amazon S3.
  • Type: boolean
  • Default: false
Proxy server name
Specify the proxy server name.
  • Type: string
Proxy server port
Specify the port number where the proxy server is running.
  • Type: integer
Proxy server username
Specify the proxy server username.
  • Type: string
Proxy server password
Specify the proxy server password.
  • Type: protected string

Usage

Is reject link
Select Yes if the link is configured to carry rejected data from the source stage.
  • Type: boolean
  • Default: false
Write mode
Select Write to write a file per node, or select Delete to delete files.
  • Type: selection
  • Default: Write
  • Values:
    • Write
    • Delete
For more information, see the Configuring the Amazon S3 connector as a target topic.
Read mode
Select Read single file to read from a single file or Read multiple files to read from the files that match a specified file prefix. Select List buckets to list the buckets for your account in the specified region. Select List files to list the files for your account in the specified bucket.
  • Type: selection
  • Default: Read single file
  • Values:
    • Read single file
    • Read multiple files
    • List buckets
    • List files
For more information, see the Configuring the Amazon S3 connector as a source topic.
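The read modes correspond to ordinary Amazon S3 operations: reading one object, listing objects by prefix, listing buckets, or listing the objects in a bucket. For illustration only (this is not the connector's implementation), a boto3 sketch of the equivalent calls; bucket and key names are placeholders:

  # Illustrative boto3 calls that parallel the read modes.
  import boto3

  s3 = boto3.client("s3")

  s3.list_buckets()                                         # List buckets
  s3.list_objects_v2(Bucket="my-bucket", Prefix="data/")    # List files / Read multiple files
  s3.get_object(Bucket="my-bucket", Key="data/part1.csv")   # Read single file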
Bucket
Specify the Amazon S3 bucket to read files from.
  • Type: string
File name
Specify the name of the file to read from. If the Read mode is Read multiple files or List files, specify a file prefix. When you specify a prefix, append a / character if the prefix represents a folder.
  • Type: string
For more information, see the File name options for reading and writing partitioned data topic.
Exclude files
Specify a comma-separated list of file prefixes to exclude from the files that are read. If a prefix includes a comma, escape the comma by using a backslash (\).
  • Type: string
Include child folders
Specify whether to read files that are in child folders of the prefix that is specified for the File name property. If you exclude child folders, the prefix that is specified must include a trailing forward slash (/).
  • Type: boolean
  • Default: true
Bucket
Specify the Amazon S3 bucket to write files to or delete files from.
  • Type: string
File name
Specify the name of the file to write to.
  • Type: string
For more information, see the File name options for reading and writing partitioned data topic.
Wave handling properties
Use the properties under this category to define how data is handled when it is streamed in waves (that is, in batches) from the upstream stage. Typically, these properties are required for source stages that are configured to send data in waves.
  • Type: category
Append unique identifier
Select whether to append a unique identifier to the file name. When this property is set to Yes, the unique identifier is appended to the file name, and a new file is written for every wave of data that is streamed into the stage. When this property is set to No, the file is overwritten on every wave.
  • Type: boolean
  • Default: false
File size threshold
Specify the threshold for the file size in megabytes. Processing nodes start a new file each time the size exceeds the threshold and the wave boundary is reached. Because the file is written only on the wave boundary, the threshold is a soft limit; the actual size of the file can be higher than the threshold, depending on the size of the wave.
  • Type: integer
  • Default: 1
Create bucket
Select Yes to create the bucket that is specified in the Bucket property or the Bucket column. If a value is specified for the Region property, the bucket is created in that region.
  • Type: boolean
  • Default: false
Append unique ID
Select whether to append a unique set of characters that identifies the bucket to the name of the bucket that is created.
  • Type: boolean
  • Default: false
If file exists
Specify what the connector does when it tries to write a file that already exists. Select Overwrite file to overwrite the existing file, Do not overwrite file to keep the existing file and not write the new file, or Fail to stop the job with an error message.
  • Type: selection
  • Default: Overwrite file
  • Values:
    • Overwrite file
    • Do not overwrite file
    • Fail
Reject mode
Specify what the connector does when a record that contains invalid data is found in the source file. Select Continue to read the rest of the file, Fail to stop the job with an error message, or Reject to send the rejected data to a reject link.
  • Type: selection
  • Default: Continue
  • Values:
    • Continue
    • Fail
    • Reject
File name column
Specify the name of the column to write the source file name to.
  • Type: string
File format
Select the format of the files to read or write.
  • Type: selection
  • Default: Delimited
  • Values:
    • Delimited
    • Comma-separated value (CSV)
    • Amazon Redshift
    • Avro
    • Parquet
    • ORC
Delimited format properties
Specify the file syntax for delimited files.
  • Type: category
Record limit
Specify the maximum number of records to read from the Amazon S3 file per node. If a value is not specified for this property, the entire file is read.
  • Type: integer
  • Minimum: 0
Encoding
Specify the encoding of the Amazon S3 files to read or write, for example, UTF-8.
  • Type: string
For more information, see the File encoding topic.
Include byte order mark
Specify whether to include a byte order mark in the file when the file encoding is a Unicode encoding such as UTF-8, UTF-16, or UTF-32.
  • Type: boolean
  • Default: false
For more information, see the File encoding topic.
Record definition
Select whether the record definition is provided to the Amazon S3 connector from the source file, a delimited string, a file that contains a delimited string, or a schema file. When runtime column propagation is enabled, this metadata provides the column definitions. If a schema file is provided, the schema file overrides the values of formatting properties in the stage and the column definitions that are specified on the Columns page of the output link.
  • Type: selection
  • Default: None
  • Values:
    • None
    • Amazon S3 file
    • Delimited string
    • Delimited string in a file
    • Schema file
Definition source
If the record definition is a delimited string, enter a delimited string that specifies the names and data types of the fields. Use the format name:data_type, and separate each field with the delimiter that is specified for the Field delimiter property. If the record definition is in a delimited string file or an OSH schema file, specify the full path of the file.
  • Type: string
For more information, see the Metadata formatting options topic.
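As a hypothetical illustration only (the field names and data types are not taken from the connector documentation), a delimited string that uses the default comma field delimiter might look like the following:

  id:int32,name:string,price:double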
First row is header
Select Yes if the first row of the file contains field headers and is not part of the data. If you select Yes, when the connector writes data, the field names will be the first row of the output. If runtime column propagation is enabled, metadata can be obtained from the first row of the file.
  • Type: boolean
  • Default: false
Include data types
Select Yes to append the data type to each field name that the connector writes in the first row of the output.
  • Type: boolean
  • Default: false
Data format
Select how binary data is represented. Binary data includes data that is of integer, float, double, or binary data types. If variable length binary fields are written as binary, they are prefixed with a 4-byte integer that represents the size of the field.
  • Type: selection
  • Default: Text
  • Values:
    • Text
    • Binary
Quote binary data
Specify whether data in fields that have binary data types is enclosed in quotation marks. The quotation mark character is specified by the Quotation mark property.
  • Type: boolean
  • Default: false
Field delimiter
Specify a string or one of the following values: <NL>, <CR>, <LF>, <TAB>. The string can include Unicode escape strings in the form \uNNNN, where NNNN is the Unicode character code.
  • Type: string
  • Default: ,
For more information, see the Field delimiter and row delimiter properties topic.
Row delimiter
Specify a string or one of the following values: <NL>, <CR>, <LF>, <TAB>. The string can include Unicode escape strings in the form \uNNNN, where NNNN is the Unicode character code.
  • Type: string
  • Default: <NL>
For more information, see the Field delimiter and row delimiter properties topic.
Escape character
Specify the character to use to escape field and row delimiters. If an escape character exists in the data, the escape character is also escaped. Because escape characters require additional processing, do not specify a value for this property if you do not need to include escape characters in the data.
  • Type: string
Quotation mark
Select the quotation mark character that encloses field values.
  • Type: selection
  • Default: None
  • Values:
    • None
    • Double
    • Single
Null value
Specify the character or string that represents null values in the data. When the stage is used as a target, null values in the output file that is written to Amazon S3 are represented by the value that is specified for this property. To specify that an empty string represents a null value, specify "" (two double quotation marks).
  • Type: string
Field formats
  • Type: category
For more information, see the Formatting options for Decimal, Time, Date, and Timestamp data types topic.
Decimal format
Specify a string that defines the format for fields that have the Decimal or Numeric data type.
  • Type: string
Date format
Specify a string that defines the format for fields that have the Date data type.
  • Type: string
Time format
Specify a string that defines the format for fields that have the Time data type.
  • Type: string
Timestamp format
Specify a string that defines the format for fields that have the Timestamp data type.
  • Type: string
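For illustration, DataStage-style format strings typically look like the following; treat these as assumptions and refer to the Formatting options topic for the exact syntax that the connector accepts:

  %yyyy-%mm-%dd              (Date format)
  %hh:%nn:%ss                (Time format)
  %yyyy-%mm-%dd %hh:%nn:%ss  (Timestamp format)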
Trace file
Specify the full path to a file to contain trace information from the parser for delimited files. Because writing to a trace file requires additional processing, specify a value for this property only during job development.
  • Type: string
Avro settings
Avro configuration properties
  • Type: category
Output as JSON
Specify whether each row in the Avro file should be exported as JSON to a string column.
  • Type: boolean
  • Default: false
Avro settings
Avro configuration properties
  • Type: category
Use schema file
Specify whether to provide the Avro schema by using a schema file. No is recommended for primitive data types and Yes for complex data types.
  • Type: boolean
  • Default: false
Input as JSON
Specify whether each row in the Avro file should be imported from a JSON string.
  • Type: boolean
  • Default: false
Avro schema file
Specify the fully qualified path for a JSON file that defines the schema for the Avro file.
  • Type: string
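The schema file contains a standard Avro schema in JSON. A minimal Python sketch that writes such a file by using only the standard library; the record and field names are hypothetical:

  # Write a minimal Avro schema file. Record and field names are hypothetical.
  import json

  schema = {
      "type": "record",
      "name": "Customer",
      "fields": [
          {"name": "id", "type": "int"},
          {"name": "name", "type": "string"},
      ],
  }

  with open("customer.avsc", "w") as f:
      json.dump(schema, f, indent=2)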
Array keys
If the file format is Avro in a target stage, specify the array keys that control normalization.
  • Type: string
Avro compression codec
Specify the compression algorithm that will be used to compress the data.
  • Type: selection
  • Default: None
  • Values:
    • None
    • Deflate
    • Snappy
    • Bzip2
Parquet settings
Parquet configuration properties
  • Type: category
Temporary staging area
Specify a directory on the engine tier with write permission for the user running the job. This directory will be used to create the temporary files during the job run.
  • Type: string
Parquet settings
Parquet configuration properties
  • Type: category
Temporary staging area
Specify a directory on the engine tier with write permission for the user running the job. This directory will be used to create the temporary files during the job run.
  • Type: string
Block size
Specify the block size for the Parquet file.
  • Type: integer
  • Default: 10000000
Page size
Specify the page size for the Parquet file.
  • Type: integer
  • Default: 10000
Compression type
Specify the compression mechanism for the Parquet file.
  • Type: selection
  • Default: SNAPPY
  • Values:
    • NONE
    • SNAPPY
    • GZIP
    • LZO
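The Parquet page size and compression properties correspond to standard Parquet writer settings. For comparison only, a hedged pyarrow sketch that sets similar options when writing a Parquet file outside the connector; the connector's own Block size and Page size semantics may differ:

  # Illustrative pyarrow example of comparable Parquet writer options.
  import pyarrow as pa
  import pyarrow.parquet as pq

  table = pa.table({"id": [1, 2, 3]})
  pq.write_table(
      table,
      "example.parquet",
      compression="snappy",    # comparable to Compression type
      data_page_size=10000,    # comparable to Page size (in bytes)
  )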
ORC Settings
ORC configuration properties
  • Type: category
Temporary staging area
Specify a directory on the engine tier with write permission for the user running the job. This directory will be used to create the temporary files during the job run.
  • Type: string
ORC Settings
ORC configuration properties
  • Type: category
Temporary staging area
Specify a directory on the engine tier with write permission for the user running the job. This directory will be used to create the temporary files during the job run.
  • Type: string
Stripe Size
Specify the stripe size for the ORC file.
  • Type: integer
  • Default: 100000
Buffer Size
Specify the buffer size for the ORC file.
  • Type: integer
  • Default: 10000
Compression Kind
Specify the compression mechanism for the ORC file.
  • Type: selection
  • Default: SNAPPY
  • Values:
    • NONE
    • ZLIB
    • SNAPPY
File attributes
File attributes for the S3 files.
  • Type: category
User metadata
Specify metadata in a list of name-value pairs. Separate each name-value pair with a semicolon, for example, Topic=News;SubTopic=Sports. All characters that you specify must be in the US-ASCII character set.
  • Type: string
Server-side encryption
Select the server-side encryption method that is used to encrypt the file.
  • Type: selection
  • Default: None
  • Values:
    • None
    • AES-256
    • AWS KMS
Storage class
Specify the storage class for the file. The reduced redundancy storage class provides less redundancy for files than the standard class. For more information, see the Amazon S3 documentation.
  • Type: selection
  • Default: Standard
  • Values:
    • Standard
    • Reduced redundancy
Content type
Specify the content type of the file to write, for example, text/xml or application/x-www-form-urlencoded; charset=utf-8.
  • Type: string
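The user metadata, server-side encryption, storage class, and content type properties map to standard attributes of an Amazon S3 object. For comparison only, a boto3 sketch that sets equivalent attributes on an upload; the bucket, key, and values are placeholders:

  # Illustrative boto3 upload with the equivalent object attributes.
  import boto3

  s3 = boto3.client("s3")
  s3.put_object(
      Bucket="my-bucket",
      Key="reports/output.csv",
      Body=b"...",
      Metadata={"Topic": "News", "SubTopic": "Sports"},   # User metadata
      ServerSideEncryption="AES256",                       # Server-side encryption
      StorageClass="REDUCED_REDUNDANCY",                   # Storage class
      ContentType="text/csv",                              # Content type
  )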
Define lifecycle rules
Specify whether you want to define one or more rules for when a file is set to expire or be archived.
  • Type: boolean
  • Default: false
For more information, see the Lifecycle rules topic.
Rule scope
Specify whether to apply the rule to the file only or to all of the files in the folder that contains the file. If the connector runs in parallel, this property is ignored, and the rule is applied to all of the files in the folder.
  • Type: selection
  • Default: File
  • Values:
    • File
    • Folder
Time period format
Specify whether the lifecycle rule is based on the number of days from the date that the file is created or based on a specific date.
  • Type: selection
  • Default: Days from creation date
  • Values:
    • Days from creation date
    • Specific date
Expiration
Specify whether you want the file to expire. When a file expires, it is deleted from Amazon Web Services. You can specify the date when the file is set to expire or the number of days that the file will exist in Amazon Web Services before it is set to expire.
  • Type: boolean
  • Default: false
Duration
Specify the number of days that the file will exist in Amazon Web Services before it expires.
  • Type: integer
  • Minimum: 1
Expiration date
Specify the date when the file is set to expire in the format "YYYY-MM-DD".
  • Type: string
Archive
Specify whether to archive the file in Amazon Glacier. You can specify the date when the file is set to be archived or the number of days before the file is set to be archived.
  • Type: boolean
  • Default: false
Duration
Specify the number of days that the file will exist in Amazon S3 before it is set to be archived in Amazon Glacier.
  • Type: integer
  • Minimum: 0
Date to archive
Specify the date when the file is set to be archived in Amazon Glacier in the format "YYYY-MM-DD".
  • Type: string
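The expiration and archive properties correspond to Amazon S3 lifecycle rules. For illustration only, a boto3 sketch that defines a comparable rule; the bucket, prefix, and day counts are placeholders:

  # Illustrative boto3 lifecycle rule: archive to Amazon Glacier after 7 days,
  # expire (delete) after 30 days.
  import boto3

  s3 = boto3.client("s3")
  s3.put_bucket_lifecycle_configuration(
      Bucket="my-bucket",
      LifecycleConfiguration={
          "Rules": [
              {
                  "ID": "example-rule",
                  "Filter": {"Prefix": "reports/"},
                  "Status": "Enabled",
                  "Transitions": [{"Days": 7, "StorageClass": "GLACIER"}],
                  "Expiration": {"Days": 30},
              }
          ]
      },
  )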
Interval for progress messages
Specify the amount of data in MB that the connector writes to Amazon S3 before the connector writes a progress message to the job log. For example, if the interval is 20 MB, the connector writes a progress message to the log after the connector writes 20 MB of data, 40 MB of data, and so on. If you do not specify an interval, progress messages are not written.
  • Type: integer
  • Minimum: 1
  • Maximum: 2000
Number of parallel writers
Specify the number of writers that will write parts of the file at the same time.
  • Type: integer
  • Default: 5
  • Minimum: 1
Java
  • Type: category
Heap size (MB)
Specify the heap size for the Java Virtual Machine (JVM). Normally, the JVM default can be used, but if insufficient heap size errors occur, you might need to increase the size.
  • Type: integer
  • Minimum: 128
  • Maximum: 2047