S3 CSV parsing

Use this information to learn how the S3 select CSV parser processes S3 objects.

You can define the CSV format with input serialization, which uses these default values:

  • Use {\n} for row-delimiter.

  • Use {"} for quote.

  • Use {\} for escape characters.
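As an illustration, the defaults above could be spelled out explicitly in a request. The following is a minimal sketch, assuming the request follows the AWS S3 SelectObjectContent shape; the helper name `build_select_request`, the bucket and key names, and the comma field delimiter (taken from the examples in this topic) are assumptions, not values confirmed by this page.

```python
from typing import Any, Dict


def build_select_request(
    bucket: str, key: str, expression: str, header_info: str = "USE"
) -> Dict[str, Any]:
    """Build keyword arguments for an S3 select call (hypothetical helper)."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {
            "CSV": {
                "FileHeaderInfo": header_info,  # USE or IGNORE
                "RecordDelimiter": "\n",        # default row delimiter
                "FieldDelimiter": ",",          # assumption: comma, as in the examples
                "QuoteCharacter": '"',          # default quote
                "QuoteEscapeCharacter": "\\",   # default escape character
            },
            "CompressionType": "NONE",
        },
        # The AWS request shape requires this key; it is left empty here
        # because output serialization options are not supported.
        "OutputSerialization": {"CSV": {}},
    }


# Hypothetical bucket, key, and query, for illustration only.
request = build_select_request("my-bucket", "data.csv", "select * from s3object;")
```

With boto3 installed and an endpoint configured, a dictionary like this could be passed as `client.select_object_content(**request)`; whether a given gateway honors every key is an assumption to verify against its documentation.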

The csv-header-info is parsed when USE is specified in the AWS CLI; in that case, the first row of the input object contains the schema. Currently, output serialization and compression type are not supported.

The S3 select engine has a CSV parser that parses S3 objects according to these rules:

  • Each row ends with a row-delimiter.

  • The field-separator separates the adjacent columns.

  • Successive field separators define a NULL column.

  • The quote character overrides the field separator; that is, a field separator that appears between the quotes is treated as an ordinary character.

  • The escape character disables any special character except the row delimiter.
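The rules above can be sketched as a small tokenizer. This is an illustrative approximation only, not the S3 select engine's actual parser; the function names `tokenize_row` and `parse_object` are hypothetical. It assumes that quote characters stay in the token, that the escape character itself is dropped, and that an empty column is reported as `None`.

```python
from typing import List, Optional


def tokenize_row(
    row: str, sep: str = ",", quote: str = '"', escape: str = "\\"
) -> List[Optional[str]]:
    """Split one row into tokens; empty columns become None (NULL)."""
    tokens: List[Optional[str]] = []
    current: List[str] = []
    in_quotes = False
    i = 0
    while i < len(row):
        ch = row[i]
        if ch == escape and i + 1 < len(row):
            # The escaped character loses its special meaning.
            current.append(row[i + 1])
            i += 2
            continue
        if ch == quote:
            # Quotes toggle quoted mode and are kept in the token.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == sep and not in_quotes:
            # A separator outside quotes closes the current column.
            tokens.append("".join(current) or None)
            current = []
        else:
            current.append(ch)
        i += 1
    # The end of the row closes the last column, even inside an open quote.
    tokens.append("".join(current) or None)
    return tokens


def parse_object(text: str, row_delim: str = "\n") -> List[List[Optional[str]]]:
    """Split on the row delimiter first, since nothing overrides it."""
    return [tokenize_row(row) for row in text.split(row_delim) if row]
```

For example, `tokenize_row(",,1,,2,")` yields six columns, four of them `None`, and `tokenize_row('11,22,"a,b,c,d",last')` keeps the quoted commas inside a single token.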

The following are examples of CSV parsing rules:

Table 1. CSV parsing
| Feature | Description | Input ==> Tokens |
|---|---|---|
| NULL | Successive field delimiters | ,,1,,2, ==> {null}{null}{1}{null}{2}{null} |
| QUOTE | The quote character overrides the field delimiter. | 11,22,"a,b,c,d",last ==> {11}{22}{"a,b,c,d"}{last} |
| Escape | The escape character overrides the meta-character. | 11,22,str=\"abcd\"\,str2=\"123\",last ==> {11}{22}{str="abcd",str2="123"}{last} |
| Row delimiter | If there is no closing quote, the row delimiter ends the row. | 11,22,a="str,44,55,66 ==> {11}{22}{a="str,44,55,66} |
| CSV header info | FileHeaderInfo tag | The USE value means that each token on the first line is a column name; the IGNORE value means that the first line is skipped. |
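The effect of the FileHeaderInfo values described above can be sketched as follows. This is a simplified illustration, not the engine's code: the function name `apply_file_header_info` is hypothetical, rows are assumed to be already tokenized, and treating any other value as "no header" is an assumption.

```python
from typing import List, Optional, Tuple

Row = List[Optional[str]]


def apply_file_header_info(
    rows: List[Row], mode: str
) -> Tuple[Optional[Row], List[Row]]:
    """Return (column_names, data_rows) for the given FileHeaderInfo value."""
    mode = mode.upper()
    if mode == "USE" and rows:
        # Each token on the first line is a column name.
        return rows[0], rows[1:]
    if mode == "IGNORE":
        # The first line is skipped; columns have no names.
        return None, rows[1:]
    # Assumption: any other value means every line is data.
    return None, rows


# Made-up sample rows, for illustration only.
sample = [["id", "name"], ["1", "alice"], ["2", "bob"]]
names, data = apply_file_header_info(sample, "USE")
```

With USE, `names` holds the first row and `data` holds the remaining rows; with IGNORE, the first row is dropped and no column names are returned.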