Amazon S3 connector (DataStage)
Use the Amazon S3 connector to connect to Amazon Simple Storage Service (S3) and perform read and write operations.
Prerequisite
Create the connection. For instructions, see Connecting to a data source in DataStage® and the Amazon S3 connection.
DataStage properties
In the Stage tab Properties section, select Use DataStage properties to access properties that are specific to DataStage. These properties provide more features and more granular control of the flow execution, similar to the "optimized" connectors.
If you select Use DataStage properties and the file is CSV format, the column values must have double quotation marks around them. If any customization is needed, use the connector's File format properties to change the file's format to Delimited. Then, select the field delimiter, row delimiter, quote character, and escape character.
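To illustrate the quoting requirement above, here is a minimal sketch of producing a CSV file in which every column value is wrapped in double quotation marks. It uses Python's standard `csv` module outside of DataStage; the helper name `to_quoted_csv` and the sample rows are hypothetical, and the sketch assumes a comma field delimiter and a newline row delimiter.

```python
import csv
import io


def to_quoted_csv(rows):
    """Serialize rows to CSV text with every value in double quotes."""
    buffer = io.StringIO()
    # QUOTE_ALL wraps every field in the quote character, including numbers.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data, for illustration only.
sample = to_quoted_csv([["id", "name"], [1, "Ada"]])
print(sample)
```

If the file cannot be produced this way, the Delimited file format settings described above are the place to adjust the delimiters, quote character, and escape character instead.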
Configuring the Amazon S3 connector as a source
The available properties for the Read mode depend on whether you select Use DataStage properties.
Read mode | Procedure |
---|---|
Read single file | Specify the bucket name that contains the file, and then specify the name of the file to read. |
Read multiple files | |
List buckets | No additional configuration is required. |
List files | |
If you clear Use DataStage properties, configure the read process as follows.
Read mode | Procedure |
---|---|
Read a single file | Specify the bucket name that contains the file, and then specify the name of the file to read. |
Read binary data | Specify the bucket name that contains the file, and then specify the name of the file to read. |
Read binary data from multiple files using wildcards | Specify a wildcard character in the file name for binary data (for example, File name: test.*.gz). If you use this option, you can read multiple binary files one after another, and each file is read as a record. If you select Read a file to a row, you must provide two column names in the Output tab of the source stage: |
Read multiple files using regex expression | Specify the bucket name that contains the files. You can use a Java regular expression for the file name. Examples: |
Read multiple files using wildcards | Specify an asterisk (*) to match zero or more characters. For example, specify *.txt to match all files with the .txt extension. Specify a question mark (?) to match one character. Examples: |
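The wildcard and regular expression rules in the table can be tried out locally before you configure the stage. The following is a rough sketch in Python, not the connector's own matching code. The file names and helper functions are hypothetical. It assumes that a pattern must match the whole file name, and that the simple regular expression shown behaves the same in Python's `re` module as in Java. Note that `fnmatch` also gives special meaning to square brackets, which the connector may not.

```python
import fnmatch
import re

# Hypothetical object names, for illustration only.
names = ["test.01.gz", "test.02.gz", "sales_2023.txt", "a1.txt", "notes.md"]


def match_wildcard(pattern, candidates):
    """Return names matching a wildcard pattern ('*' = zero or more, '?' = one)."""
    return [n for n in candidates if fnmatch.fnmatchcase(n, pattern)]


def match_regex(pattern, candidates):
    """Return names that a regular expression matches in full (an assumption)."""
    compiled = re.compile(pattern)
    return [n for n in candidates if compiled.fullmatch(n)]


gz_files = match_wildcard("test.*.gz", names)      # binary-file example from the table
txt_files = match_wildcard("*.txt", names)         # all files with the .txt extension
two_char_txt = match_wildcard("??.txt", names)     # exactly two characters before .txt
regex_files = match_regex(r"test\.\d+\.gz", names)  # digits between the dots

print(gz_files, txt_files, two_char_txt, regex_files)
```

Checking a pattern against a handful of known file names this way can catch a pattern that is too broad before the job reads unintended files.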