S3 CSV parsing
Use this information to understand how the S3 Select engine parses CSV-formatted S3 objects.
You can define the CSV properties with input serialization, which uses these default values:
- Use `{\n}` for the row delimiter.
- Use `{"}` for the quote character.
- Use `{\}` for the escape character.
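As a hedged illustration, the defaults above can be written as the `InputSerialization` argument of an S3 SelectObjectContent request. This sketch only builds the keyword arguments, using the CSV input field names of that API (`RecordDelimiter`, `QuoteCharacter`, `QuoteEscapeCharacter`, `FileHeaderInfo`); no request is sent. The helper `build_select_request`, the bucket and key names, and the SQL expression are hypothetical.

```python
from typing import Any, Dict

# Default CSV input properties described above, expressed with the field
# names of the S3 SelectObjectContent CSV InputSerialization.
DEFAULT_CSV_INPUT: Dict[str, str] = {
    "RecordDelimiter": "\n",        # row delimiter
    "QuoteCharacter": '"',          # quote character
    "QuoteEscapeCharacter": "\\",   # escape character (a single backslash)
}


def build_select_request(bucket: str, key: str, expression: str,
                         file_header_info: str = "USE",
                         **csv_overrides: str) -> Dict[str, Any]:
    """Build keyword arguments for a SelectObjectContent call (hypothetical helper)."""
    csv_input = dict(DEFAULT_CSV_INPUT)  # copy so the defaults are never mutated
    csv_input["FileHeaderInfo"] = file_header_info  # USE: first row holds the schema
    csv_input.update(csv_overrides)  # e.g. FieldDelimiter=";"
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {"CSV": csv_input},
        # The API requires OutputSerialization; it is left empty here
        # (assumption) because custom output serialization is not supported.
        "OutputSerialization": {"CSV": {}},
    }


if __name__ == "__main__":
    request = build_select_request("my-bucket", "data.csv",
                                   "select * from s3object;")
    print(request["InputSerialization"])
```

With boto3 installed and an endpoint configured, the resulting dict could be passed as `client.select_object_content(**request)`.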
The csv-header-info is parsed when `USE` appears in the AWS CLI command; the header is the first row of the input object and contains the schema. Currently, output serialization and compression-type are not supported. The S3 Select engine has a CSV parser that parses S3 objects according to these rules:
- Each row ends with a row delimiter.
- The field separator separates adjacent columns.
- Successive field separators define a `NULL` column.
- The quote character overrides the field separator; that is, a field separator between quotes is treated as an ordinary character.
- The escape character disables any special character except the row delimiter.
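The parsing rules above can be sketched as a small tokenizer. This is not the S3 Select engine's parser; it is a minimal Python model that assumes a comma field separator plus the default row delimiter, quote, and escape characters. `tokenize_row` and `parse_csv` are hypothetical names, and two behaviors are assumptions the rules do not state: quote characters are kept in the token, and the escape character is dropped once it has taken effect.

```python
from typing import List, Optional

Token = Optional[str]  # None stands for a NULL column


def _finish(chars: List[str]) -> Token:
    # An empty field (successive field separators) becomes NULL.
    return "".join(chars) if chars else None


def tokenize_row(row: str, field_sep: str = ",", quote: str = '"',
                 escape: str = "\\") -> List[Token]:
    tokens: List[Token] = []
    current: List[str] = []
    in_quote = False
    i = 0
    while i < len(row):
        ch = row[i]
        if ch == escape:
            # The escape makes the next character literal; the escape itself
            # is dropped (assumption). A trailing escape is simply dropped.
            if i + 1 < len(row):
                current.append(row[i + 1])
            i += 2
            continue
        if ch == quote:
            # Quotes toggle quoting and are kept in the token (assumption).
            in_quote = not in_quote
            current.append(ch)
        elif ch == field_sep and not in_quote:
            tokens.append(_finish(current))
            current = []
        else:
            current.append(ch)
        i += 1
    tokens.append(_finish(current))
    return tokens


def parse_csv(text: str, row_delim: str = "\n", **kwargs: str) -> List[List[Token]]:
    # The row delimiter always ends a row, even inside an unclosed quote or
    # after an escape, so rows can be split before tokenizing.
    rows = text.split(row_delim)
    if rows and rows[-1] == "":
        rows.pop()  # a final delimiter ends the last row rather than starting one
    return [tokenize_row(row, **kwargs) for row in rows]
```

For example, `parse_csv(",,1,,2,")` returns `[[None, None, '1', None, '2', None]]`. Splitting on the row delimiter first is what keeps an unclosed quote from spilling into the next row.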
The following are examples of CSV parsing rules:
| Feature | Description | Input (Tokens) |
|---|---|---|
| NULL column | Successive field delimiters define a NULL column. | |
| Quote character | The quote character overrides the field delimiter. | |
| Escape character | The escape character overrides the meta-character. | `11,22,str=\"abcd\"\,str2=\"123\",last` ==> `{11}{22}{str="abcd",str2="123"}{last}` |
| Row delimiter | An unclosed quote is closed by the row delimiter at the end of the line. | |
| CSV header info | `FileHeaderInfo` tag | The `USE` value means each token on the first line is a column name; the `IGNORE` value means the first line is skipped. |
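A minimal sketch of how the `FileHeaderInfo` values in the table could be applied to rows that have already been tokenized. `apply_file_header_info` and `select_column` are hypothetical helpers, not part of any S3 API; only `USE` and `IGNORE` are handled, since those are the values described here, and any other value raises an error.

```python
from typing import List, Optional, Tuple

Row = List[Optional[str]]


def apply_file_header_info(rows: List[Row],
                           file_header_info: str) -> Tuple[Row, List[Row]]:
    """Return (column_names, data_rows) for tokenized rows (hypothetical helper)."""
    mode = file_header_info.upper()
    if mode not in ("USE", "IGNORE"):
        raise ValueError(f"unsupported FileHeaderInfo value: {file_header_info!r}")
    if not rows:
        return [], []
    header, data = rows[0], rows[1:]
    if mode == "USE":
        # USE: each token on the first line is a column name.
        return list(header), data
    # IGNORE: the first line is skipped and no column names are recorded.
    return [], data


def select_column(names: Row, data: List[Row], name: str) -> Row:
    """Return one column by header name; rows that are too short yield None."""
    idx = names.index(name)  # raises ValueError if the column does not exist
    return [row[idx] if idx < len(row) else None for row in data]
```

With `USE`, a column can then be looked up by name, for example `select_column(names, data, "name")`; with `IGNORE`, only positional access remains.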
Reference
- See Amazon's S3 Select Object Content API for more details.