S3 CSV parsing

This section describes how the S3 select engine parses CSV-formatted S3 objects.

You can define the CSV format through input serialization, which uses these default values:

Use {\n} for the row delimiter.
Use {"} for the quote character.
Use {\} for the escape character.
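
As a rough illustration, the defaults above could be written out as an input-serialization mapping. The key names follow the `CSV` input structure of the AWS SelectObjectContent request; mapping the escape character to `QuoteEscapeCharacter` and using a comma as the field delimiter are assumptions, since neither is stated here.

```python
# Sketch of a CSV input-serialization mapping using the defaults listed above.
# Assumptions: the escape character maps to "QuoteEscapeCharacter", and the
# field delimiter is a comma (it is not listed among the defaults).
csv_input_serialization = {
    "CSV": {
        "RecordDelimiter": "\n",       # row delimiter
        "QuoteCharacter": '"',         # quote character
        "QuoteEscapeCharacter": "\\",  # escape character (a single backslash)
        "FieldDelimiter": ",",         # assumed field delimiter
    }
}
```

A mapping of this shape is what a client such as boto3's `select_object_content` accepts in its `InputSerialization` argument; whether every key is honored depends on the S3 select implementation.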

The csv-header-info is parsed when USE is specified in the AWS CLI; the header is the first row of the input object and contains the schema. Currently, output serialization and compression type are not supported. The S3 select engine has a CSV parser that parses S3 objects according to the following rules:

Each row ends with a row delimiter.
The field separator separates adjacent columns.
Successive field separators define a NULL column.
The quote character overrides the field separator; that is, a field separator between quotes is treated as an ordinary character.
The escape character disables any special character except the row delimiter.
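
The rules above can be illustrated with a minimal tokenizer sketch. This is not the S3 select engine's implementation: the function name `tokenize_row` and the comma default for the field separator are assumptions made for illustration, and the row is assumed to have already been split on the row delimiter.

```python
def tokenize_row(row, field_sep=",", quote='"', escape="\\"):
    """Split one row (without its row delimiter) into tokens.

    Empty columns are returned as None to represent NULL.
    """
    tokens = []
    current = []
    in_quotes = False
    i = 0
    while i < len(row):
        ch = row[i]
        if ch == escape and i + 1 < len(row):
            # The escape character disables the special meaning of the
            # next character, which is kept literally.
            current.append(row[i + 1])
            i += 2
            continue
        if ch == quote:
            # Quotes toggle quoted mode and are kept in the token.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == field_sep and not in_quotes:
            # An unquoted field separator ends the current column.
            tokens.append("".join(current) or None)
            current = []
        else:
            # Ordinary character, including a separator inside quotes
            # and an escape character at the very end of the row.
            current.append(ch)
        i += 1
    # The end of the row closes the last column, even if a quote is open.
    tokens.append("".join(current) or None)
    return tokens


print(tokenize_row(',,1,,2,'))
print(tokenize_row('11,22,"a,b,c,d",last'))
```

Because the row is split on the row delimiter before tokenizing, an unclosed quote or a trailing escape character cannot extend a column past the end of its row.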

The following are examples of CSV parsing rules:

Table 1. CSV parsing
Feature	Description	Input ==> Tokens
`NULL`	Successive field delimiters define a NULL column.	`,,1,,2, ==> {null}{null}{1}{null}{2}{null}`
`QUOTE`	The quote character overrides the field delimiter.	`11,22,"a,b,c,d",last ==> {11}{22}{"a,b,c,d"}{last}`
`Escape`	The escape character overrides the meta-character.	`11,22,str=\"abcd\"\,str2=\"123\",last ==> {11}{22}{str="abcd",str2="123"}{last}`
`row delimiter`	When a quote is not closed, the row delimiter closes the field.	`11,22,a="str,44,55,66 ==> {11}{22}{a="str,44,55,66}`
`csv header info`	FileHeaderInfo tag	USE means that each token on the first line is a column name; IGNORE means that the first line is skipped.
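
The FileHeaderInfo behavior in the last table row can be sketched as follows. The helper name `apply_header` and its return shape are hypothetical and chosen only for illustration; the input is assumed to be a list of already tokenized rows.

```python
def apply_header(rows, file_header_info):
    """Return (column_names, data_rows) for the given FileHeaderInfo value.

    rows is a list of token lists, one list per parsed line.
    """
    if not rows:
        return None, []
    if file_header_info == "USE":
        # Each token on the first line is a column name.
        return rows[0], rows[1:]
    if file_header_info == "IGNORE":
        # The first line is skipped and no column names are defined.
        return None, rows[1:]
    # Only the two values described above are handled in this sketch.
    raise ValueError(f"unsupported FileHeaderInfo value: {file_header_info}")


rows = [["id", "name"], ["1", "alice"], ["2", "bob"]]
columns, data = apply_header(rows, "USE")
print(columns, data)
```

With USE, queries can refer to columns by the names taken from the first line; with IGNORE, the first line is dropped and the remaining rows are returned without column names.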

Reference

See Amazon’s S3 Select Object Content API for more details.