Select content from an object
A SELECT command filters the contents of an object using simple
structured query language (SQL) statement. SELECT command to filter object content
is only supported on structured objects which are formatted as either JSON, CSV or Apache Parquet.
The filtered object content will be streamed as records and the output format supported are JSON or
CSV.
SELECT command on objects will only be supported for users who have the
s3:GetObject permission on the object. SELECT command will not
be supported using anonymous access.
Syntax
POST https://{endpoint}/{object-name}?select&select-type=2 # path style
POST https://{bucket-name}.{endpoint}/{object-name}?select&select-type=2 # virtual host style
Payload elements
| Name | Type | Description | Children | Ancestor | Constraint |
|---|---|---|---|---|---|
| SelectObjectContentRequest | Container | Root level tag for the Select Object parameters |
Expression, ExpressionType, RequestProgress,InputSerialization, OutputSerialization, ScanRange |
None | Limit 1 |
| Expression | String | The expression used to query and filter the object contents. | None | SelectObjectContentRequest | Required |
| ExpressionType | String | Type of expression provided to query object contents (ex: SQL) | None | SelectObjectContentRequest | Required |
| RequestProgress | Container | Specifies if periodic request progress information should be enabled | Enabled | SelectObjectContentRequest | Limit 1 |
| Enabled | Boolean | Specifies whether periodic progress frames should be sent to client | None | RequestProgress | Limit 1. |
| InputSerialization | Container | Describes the format the object that is being queried |
CompressionType, CSV, JSON, Apache Parquet |
SelectObjectContentRequest | Limit 1. |
| CompressionType | String | Specifies if the input object is compressed and the corresponding compression format | None | InputSerialization | Must be either NONE, GZIP or BZIP2 |
| CSV | Container | Describes the serialization of a CSV-encoded object |
AllowQuotedRecordDelimiter,Comment, FieldDelimiter,FileHeaderInfo, QuoteCharacter,QuoteEscapeCharacter, RecordDelimiter |
InputSerialization | Limit 1 |
| AllowQuotedRecordDelimiter | Boolean | Specifies that CSV field values may contain quoted record delimiters and such records should be allowed. Default value is FALSE. Setting this value to TRUE may lower performance. | None | CSV | |
| Comment | String | A single character used to indicate that a row should be ignored when the character is
present at the start of that row. You can specify any character to indicate a comment line. The
default character is #. |
None | CSV | |
| FieldDelimiter | String | A single character used to separate individual fields in a record. You can specify an arbitrary delimiter. The default character is , . | None | CSV | |
| FileHeaderInfo | String |
Describes the first line of input. Valid values are:
The default value is NONE |
None | CSV | Valid Values: USE (or) IGNORE (or) NONE |
| QuoteCharacter | String | A single character used for escaping when the field delimiter is part of the value. For example, if the value is a,b, Amazon S3 wraps this field value in quotation marks, as follows: " a , b". The default character is ". | None | CSV | |
| QuoteEscapeCharacter | String | A single character used for escaping the quotation mark character inside an already escaped value. For example, the value """ a , b""" is parsed as " a , b". The default character is ". | None | CSV | An empty string/zero characters is also accepted |
| RecordDelimiter | String | A single or two characters (up to 4 bytes) used to separate individual records in the input. Instead of the default value, you can specify an arbitrary delimiter. The default value is \n | None | CSV | |
| JSON | Container | Specifies JSON as object’s input serialization format | Type | InputSerialization | Limit 1 |
| Type | String | JSON Type | None | JSON | Valid Values are DOCUMENT (or) LINES |
| Parquet | Container | Specifies Parquet as object’s input serialization format | None | InputSerialization | Limit 1 |
| OutputSerialization | Container | Describes the format of the data to be returned in response. | CSV, JSON | SelectObjectContentRequest | Limit 1 |
| CSV | Container | Describes the serialization of CSV-encoded Select results. | FieldDelimiter, QuoteCharacter, QuoteEscapeCharacter, QuoteFields, RecordDelimiter | OutputSerialization | Limit 1 |
| JSON | Container | Specifies JSON as request's output serialization format. | RecordDelimiter | OutputSerialization | Limit 1 |
| ScanRange | Container |
Specifies the byte range of the object to get the records from. A record is processed when its first byte is contained by the range. This parameter is optional, but when specified, it must not be empty. See RFC 2616, Section 14.35.1 about how to specify the start and end of the range.
|
Start, End | SelectObjectContentRequest | Limit 1 |
| Start | Long | Specifies the start of the byte range. This parameter is optional. The default value is 0. If
only start is supplied, it means scan from that point to the end of the file. For
example, <scanrange><start>50</start></scanrange> means scan from byte
50 until the end of the file. |
None | ScanRange | Valid values: non-negative integers. |
| End | Long | Specifies the end of the byte range. This parameter is optional. The default value is one
less than the size of the object being queried. If only the End parameter is supplied, it is
interpreted to mean scan the last N bytes of the file. For example,
<scanrange><end>50</end></scanrange> means scan the last 50
bytes. |
None | ScanRange | Valid values: non-negative integers |
<SelectObjectContentRequest>
<Expression>String</Expression>
<ExpressionType>String</ExpressionType>
<RequestProgress>
<Enabled>boolean</Enabled>
</RequestProgress>
<InputSerialization>
<CompressionType>String</CompressionType>
<CSV>
<AllowQuotedRecordDelimiter>boolean</AllowQuotedRecordDelimiter>
<Comments>String</Comments>
<FieldDelimiter>String</FieldDelimiter>
<FileHeaderInfo>String</FileHeaderInfo>
<QuoteCharacter>String</QuoteCharacter>
<QuoteEscapeCharacter>String</QuoteEscapeCharacter>
<RecordDelimiter>String</RecordDelimiter>
</CSV>
<JSON>
<Type>String</Type>
</JSON>
<Parquet>
</Parquet>
</InputSerialization>
<OutputSerialization>
<CSV>
<FieldDelimiter>String</FieldDelimiter>
<QuoteCharacter>String</QuoteCharacter>
<QuoteEscapeCharacter>String</QuoteEscapeCharacter>
<QuoteFields>String</QuoteFields>
<RecordDelimiter>String</RecordDelimiter>
</CSV>
<JSON>
<RecordDelimiter>String</RecordDelimiter>
</JSON>
</OutputSerialization>
<ScanRange>
<End>long</End>
<Start>long</Start>
</ScanRange>
</SelectObjectContentRequest>
Sample request
POST /test-select/f1855.data.1.json.original?select=null&select-type=2 HTTP/1.1
Host: 10.141.19.55
Content-Type: text/plain
X-Amz-Content-Sha256: beaead3198f7da1e70d03ab969765e0821b24fc913697e929e726aeaebf0eba3
X-Amz-Date: 20230622T162947Z
Authorization: AWS4-HMAC-SHA256 Credential=SgckncipJn85yWAyr24u/20230622/us-west-2/s3/aws4_request, SignedHeaders=content-length;content-type;host;x-amz-content-sha256;x-amz-date, Signature=2972260823e74a399334286dde50c0e26b40a7626977befc440930c5645b21ae
Content-Length: 378
<SelectObjectContentRequest>
<Expression>SELECT * FROM S3Object as s</Expression>
<ExpressionType>SQL</ExpressionType>
<InputSerialization>
<CompressionType>NONE</CompressionType>
<JSON>
<Type>DOCUMENT</Type>
</JSON>
</InputSerialization>
<OutputSerialization>
<JSON></JSON>
</OutputSerialization>
</SelectObjectContentRequest>
Sample response
HTTP/1.1 200 OK
Date: Thu, 22 Jun 2023 16:29:47 GMT
X-Clv-Request-Id: 1d7c9397-0588-4717-a425-e00ae8cda02b
Server: Cleversafe/3.17.5.f1855-s3-select-7
X-Clv-S3-Version: 2.5
x-amz-request-id: 1d7c9397-0588-4717-a425-e00ae8cda02b
Transfer-Encoding: chunked