Select content from an object

A SELECT command filters the contents of an object using simple structured query language (SQL) statement. SELECT command to filter object content is only supported on structured objects which are formatted as either JSON, CSV or Apache Parquet. The filtered object content will be streamed as records and the output format supported are JSON or CSV.

SELECT command on objects will only be supported for users who have the s3:GetObject permission on the object. SELECT command will not be supported using anonymous access.

Syntax

POST https://{endpoint}/{object-name}?select&select-type=2 # path style
POST https://{bucket-name}.{endpoint}/{object-name}?select&select-type=2 # virtual host style

Payload elements

The body of the request must contain an XML block with the following schema:
Name Type Description Children Ancestor Constraint
SelectObjectContentRequest Container Root level tag for the Select Object parameters

Expression, ExpressionType, RequestProgress,InputSerialization, OutputSerialization, ScanRange

None Limit 1
Expression String The expression used to query and filter the object contents. None SelectObjectContentRequest Required
ExpressionType String Type of expression provided to query object contents (ex: SQL) None SelectObjectContentRequest Required
RequestProgress Container Specifies if periodic request progress information should be enabled Enabled SelectObjectContentRequest Limit 1
Enabled Boolean Specifies whether periodic progress frames should be sent to client None RequestProgress Limit 1.
InputSerialization Container Describes the format the object that is being queried

CompressionType,

CSV,

JSON,

Apache Parquet

SelectObjectContentRequest Limit 1.
CompressionType String Specifies if the input object is compressed and the corresponding compression format None InputSerialization Must be either NONE, GZIP or BZIP2
CSV Container Describes the serialization of a CSV-encoded object

AllowQuotedRecordDelimiter,Comment, FieldDelimiter,FileHeaderInfo, QuoteCharacter,QuoteEscapeCharacter, RecordDelimiter

InputSerialization Limit 1
AllowQuotedRecordDelimiter Boolean Specifies that CSV field values may contain quoted record delimiters and such records should be allowed. Default value is FALSE. Setting this value to TRUE may lower performance. None CSV
Comment String A single character used to indicate that a row should be ignored when the character is present at the start of that row. You can specify any character to indicate a comment line. The default character is #. None CSV
FieldDelimiter String A single character used to separate individual fields in a record. You can specify an arbitrary delimiter. The default character is , . None CSV
FileHeaderInfo String

Describes the first line of input. Valid values are:

  • NONE: First line is not a header.
  • IGNORE: First line is a header, but you can't use the header values to indicate the column in an expression. You can use column position (such as _1, _2, …) to indicate the column
  • (SELECT s._1 FROM OBJECT s).Use: First line is a header, and you can use the header value to identify a column in an expression .
  • (SELECT "name" FROM OBJECT).

The default value is NONE

None CSV Valid Values: USE (or) IGNORE (or) NONE
QuoteCharacter String A single character used for escaping when the field delimiter is part of the value. For example, if the value is a,b, Amazon S3 wraps this field value in quotation marks, as follows: " a , b". The default character is ". None CSV
QuoteEscapeCharacter String A single character used for escaping the quotation mark character inside an already escaped value. For example, the value """ a , b""" is parsed as " a , b". The default character is ". None CSV An empty string/zero characters is also accepted
RecordDelimiter String A single or two characters (up to 4 bytes) used to separate individual records in the input. Instead of the default value, you can specify an arbitrary delimiter. The default value is \n None CSV
JSON Container Specifies JSON as object’s input serialization format Type InputSerialization Limit 1
Type String JSON Type None JSON Valid Values are DOCUMENT (or) LINES
Parquet Container Specifies Parquet as object’s input serialization format None InputSerialization Limit 1
OutputSerialization Container Describes the format of the data to be returned in response. CSV, JSON SelectObjectContentRequest Limit 1
CSV Container Describes the serialization of CSV-encoded Select results. FieldDelimiter, QuoteCharacter, QuoteEscapeCharacter, QuoteFields, RecordDelimiter OutputSerialization Limit 1
JSON Container Specifies JSON as request's output serialization format. RecordDelimiter OutputSerialization Limit 1
ScanRange Container

Specifies the byte range of the object to get the records from. A record is processed when its first byte is contained by the range. This parameter is optional, but when specified, it must not be empty. See RFC 2616, Section 14.35.1 about how to specify the start and end of the range.

ScanRange may be used in the following ways:

  • <scanrange><start>50</start><end>100</end></scanrange> - process only the records starting between the bytes 50 and 100 (inclusive, counting from zero).
  • <scanrange><start>50</start></scanrange>- process only the records starting after the byte 50.
  • <scanrange><end>50</end></scanrange> - process only the records within the last 50 bytes of the file.
Start, End SelectObjectContentRequest Limit 1
Start Long Specifies the start of the byte range. This parameter is optional. The default value is 0. If only start is supplied, it means scan from that point to the end of the file. For example, <scanrange><start>50</start></scanrange> means scan from byte 50 until the end of the file. None ScanRange Valid values: non-negative integers.
End Long Specifies the end of the byte range. This parameter is optional. The default value is one less than the size of the object being queried. If only the End parameter is supplied, it is interpreted to mean scan the last N bytes of the file. For example, <scanrange><end>50</end></scanrange> means scan the last 50 bytes. None ScanRange Valid values: non-negative integers
<SelectObjectContentRequest>
   <Expression>String</Expression>
   <ExpressionType>String</ExpressionType>
   <RequestProgress>
      <Enabled>boolean</Enabled>
   </RequestProgress>
   <InputSerialization>
      <CompressionType>String</CompressionType>
      <CSV>
         <AllowQuotedRecordDelimiter>boolean</AllowQuotedRecordDelimiter>
         <Comments>String</Comments>
         <FieldDelimiter>String</FieldDelimiter>
         <FileHeaderInfo>String</FileHeaderInfo>
         <QuoteCharacter>String</QuoteCharacter>
         <QuoteEscapeCharacter>String</QuoteEscapeCharacter>
         <RecordDelimiter>String</RecordDelimiter>
      </CSV>
      <JSON>
         <Type>String</Type>
      </JSON>
      <Parquet>
      </Parquet>
   </InputSerialization>
   <OutputSerialization>
      <CSV>
         <FieldDelimiter>String</FieldDelimiter>
         <QuoteCharacter>String</QuoteCharacter>
         <QuoteEscapeCharacter>String</QuoteEscapeCharacter>
         <QuoteFields>String</QuoteFields>
         <RecordDelimiter>String</RecordDelimiter>
      </CSV>
      <JSON>
         <RecordDelimiter>String</RecordDelimiter>
      </JSON>
   </OutputSerialization>
   <ScanRange>
      <End>long</End>
      <Start>long</Start>
   </ScanRange>
</SelectObjectContentRequest>

Sample request

POST /test-select/f1855.data.1.json.original?select=null&select-type=2 HTTP/1.1
Host: 10.141.19.55
Content-Type: text/plain
X-Amz-Content-Sha256: beaead3198f7da1e70d03ab969765e0821b24fc913697e929e726aeaebf0eba3
X-Amz-Date: 20230622T162947Z
Authorization: AWS4-HMAC-SHA256 Credential=SgckncipJn85yWAyr24u/20230622/us-west-2/s3/aws4_request, SignedHeaders=content-length;content-type;host;x-amz-content-sha256;x-amz-date, Signature=2972260823e74a399334286dde50c0e26b40a7626977befc440930c5645b21ae
Content-Length: 378

<SelectObjectContentRequest>
   <Expression>SELECT * FROM S3Object as s</Expression>
   <ExpressionType>SQL</ExpressionType>
   <InputSerialization>
      <CompressionType>NONE</CompressionType>
      <JSON>
         <Type>DOCUMENT</Type>
      </JSON>
   </InputSerialization>
   <OutputSerialization>
      <JSON></JSON>
   </OutputSerialization>
</SelectObjectContentRequest>

Sample response

HTTP/1.1 200 OK
Date: Thu, 22 Jun 2023 16:29:47 GMT
X-Clv-Request-Id: 1d7c9397-0588-4717-a425-e00ae8cda02b
Server: Cleversafe/3.17.5.f1855-s3-select-7
X-Clv-S3-Version: 2.5
x-amz-request-id: 1d7c9397-0588-4717-a425-e00ae8cda02b
Transfer-Encoding: chunked