SELECT (to retrieve rows)
Use the SELECT command to retrieve rows from a table or view. It returns the rows that satisfy the criteria you specify, including the WHERE condition. If you omit the WHERE condition, the SELECT command chooses from all rows.
- You can include * in the output list as a shorthand way of indicating all columns of selected
rows. You can also indicate
<table>.*as shorthand for the columns that come from a specific table. Query a table describes functions that you can use within a SELECT command. - You can use the keyword distinct to eliminate duplicate rows from the result. The all keyword (the default) returns all candidate rows, including duplicates.
Syntax
SELECT [ DISTINCT | ALL ] [ * | <col> [ AS <output_name> ]
<expression> [ AS <output_name> ] [,<expression> [ AS <output_name> ]…]
[ FROM <from_item>[,<from_item>…] ]
[ WHERE <condition> ]
[ GROUP BY <expression>[,<expression>…] ]
[ HAVING <condition>[,<condition>…] ]
[ { UNION | INTERSECT | EXCEPT | MINUS }[ DISTINCT | ALL ]
SELECT
[ ORDER BY <expression> [ ASC | DESC | USING <operator>
[NULLS {FIRST | LAST}][, …] ]
[ LIMIT { <count> | ALL } ]<table>
[ [ AS ] <alias> [ ( <column_alias_list> ) ] ] |
[ FOR SYSTEM_TIME
{ AS OF <timestamp-expression> |
BEFORE <timestamp-expr> |
BETWEEN <timestamp-expression1> AND <timestamp-expression2> |
FROM <timestamp-expression1> TO <timestamp-expression2> } ] |
( <select> ) [ AS ] <alias> [ ( <column_alias_list> ) ] |
<from_item> [ NATURAL ] <join_type> <from_item>
[ ON <join_condition> | USING ( <join_column_list> ) ]For more information about FOR SYSTEM_TIME, see Time travel query syntax and timestamps.
Inputs
The SELECT command takes the following inputs:
| Input | Description |
|---|---|
| alias | Specifies a substitute name for the preceding table name. Use an alias for brevity or to eliminate ambiguity for self-joins (where the same table is scanned multiple times). If you write an alias, you can also write a column alias list to provide substitute names for one or more columns of the table. |
| <col> | The name of a column. |
| <output_name> | The name the column is to be given in the output. You typically use this name
to label a column for display. You can also use it to refer to the column value in ORDER BY and
GROUP BY clauses. You cannot use this name in the WHERE or HAVING clauses; write out the expression instead. |
| <expression> | The name of a table column or an expression. |
| FROM | The FROM clause specifies one or more source tables for the SELECT command. If
you specify multiple sources, the result is the Cartesian product of all the rows in all the
sources. Usually, though, you add qualifying conditions to restrict the rows the system returns to
few the Cartesian product. You can parenthesize a subselect command within a FROM clause. Using a subselect command is the only way to get multiple levels of grouping, aggregation, or sorting in a single query. You must specify an alias for the subselect command. A FROM item can be a join clause, which combines two simpler from items. Use parentheses, if necessary, to determine the order of nesting. |
| <from_item> | A table reference, subselect, or join clause. For more information about FOR SYSTEM_TIME, see Time travel query syntax and timestamps. |
| WHERE | This clause has the following general
form:The boolean_expr portion of the clause can consist
of any expression that produces a Boolean value. In many cases, you use the expression as follows:
where:
|
| GROUP BY | Use this clause to divide a table into groups of rows that match one or more
values. The clause specifies a grouped table that is derived by applying the
clause:The GROUP BY clause condenses, into a single row, all selected rows that share values for the grouped columns. The system computes aggregate functions across all rows that make up each group, producing a separate value for each group (whereas without GROUP BY, an aggregate produces a single value that is computed across all the selected rows). When you include the GROUP BY clause, the SELECT command output expression or expressions cannot refer to ungrouped columns except within aggregate functions because there would be more than one possible value to return for an ungrouped column. A group
by value can be:
|
| HAVING | The optional HAVING clause has the general
form:where boolean_expr is the same as specified for the
where clause.The HAVING clause specifies a grouped table that is derived by the elimination of rows that do not satisfy the boolean_expr. The HAVING clause is different from the WHERE clause:
Each column that is referenced in boolean_expr must unambiguously reference a grouping column, unless the reference displays within an aggregate function. In a grouping select, the HAVING clause can only reference expressions that are single-valued within a group. That is, you can only reference group fields, aggregates, or single-valued expressions that are derived from group fields or aggregates (which must include constants). For example, to return grp and counts of
grps with more than four
members:
In a non-grouping select where the select is conceptually grouped by zero group fields, you can only reference aggregates or expressions that are single-valued. For example to return no rows if
there are four or fewer employees in emp, or one row with the count if there are more than four
employees in emp:
|
| <condition> | A Boolean expression that gives a result of true or false. |
| UNION | Operator causes the system to compute the collection of rows that are returned
by the queries. Eliminates duplicate rows unless you specify the ALL
keyword.where table_query specifies any select expression without an ORDER BY, FOR UPDATE, or
LIMIT clause.If you enclose a subexpression in parentheses, you can include ORDER BY and LIMIT clauses. If you do not include parentheses, the clauses are taken to apply to the result of the union, not to its right input expression. The UNION operator computes the collection (set union) of the rows that are returned by the queries involved. The two selects that represent the direct operands of the union must produce the same number of columns, and corresponding columns must be of compatible data types. The result of UNION does not contain any duplicate rows unless you specify the ALL option. The ALL option prevents elimination of duplicates. Multiple UNION operators in the same SELECT command are evaluated left to right, unless you indicate otherwise by using parentheses. |
| INTERSECT | Combines the results of two queries into a single result that comprises all the rows common to both queries. For more information, see INTERSECT operation. |
| EXCEPT or MINUS | Finds the difference between the two queries and the result comprises the rows that belong only to the first query. For more information, see EXCEPT operation. |
| ORDER BY | Use this clause to sort returned rows in the order that you specify. An ORDER
BY clause can be:
A limitation of this feature is that an ORDER BY clause that applies to the result of a union query might only specify an output column name or number, not an expression. If an ORDER BY clause is a simple name that matches both a result column name and an input column name, ORDER BY interprets it as the result column name. This is the opposite of the choice that the GROUP BY clause makes in the same situation. You can specify ascending or descending sort order:
You can specify a sorting order for null data:
|
| LIMIT | Return a subset of the rows that are produced by your query:
When you specify the LIMIT parameter, also specify the ORDER BY parameter. Otherwise, the subset of the rows is returned is indeterminate. |
| <join_column_list> | A join column list. For example, specifying USING (a,b) is equivalent to
specifying ON left_table.a = right_table.a and ON left_table.b =
right_table.b. |
| <join_condition> | A qualification condition. This condition is similar to the WHERE condition except that it only applies to the two from_items being joined in this join clause. |
| <join_type> | One of the following types of join:
|
| <select> | A SELECT command. You must enclose the subselect in parentheses, and provide an alias for
it. When you include a subselect in the FROM clause, the subselect acts as though its output were created as a temporary table during this single select command. |
| <table> | In a FROM clause, the name of a table or view. |
| CROSS JOIN and INNER JOIN | Produces a simple Cartesian product, the same as you would get if you listed
the two items at the top level of FROM. The cross join and inner join on (true) types are equivalent; no rows are removed by qualification. These join types are a notational convenience. You can accomplish the same results by using the FROM and WHERE clauses. |
| LEFT OUTER JOIN | Returns all rows in the qualified Cartesian product (that is, all combined
rows that pass its on condition), plus one copy of each row in the left table for which there was no
right row that passed the on condition. The system considers only the own on or using condition of the join to determine those rows that have matches. It applies OUTER ON or WHERE conditions afterward. |
| RIGHT OUTER JOIN | Returns all the joined rows, plus one row for each unmatched right row
(extended with nulls on the left). This is just a notational convenience. You can convert it to a left outer join by switching the left and right inputs. |
| FULL OUTER JOIN | Returns all the joined rows, plus one row for each unmatched left row
(extended with nulls on the right), plus one row for each unmatched right row (extended with nulls
on the left). For all the join types except cross join, you must write exactly one of:
|
Outputs
The SELECT command has the following outputs:
| Output | Description |
|---|---|
| ROWS | Returns the complete set of rows that result from the query. |
| COUNT | Returns the number of rows that are returned by the query. |
Privileges
You must be the admin user, the table owner, the owner of the database or schema where the table is defined, or your account must have Select privileges on the table or the Table object class (or the External Table class, if the table being selected from is an external table).
Usage
- Join the table
filmswith the tabledistributors:MYDB.SCH1(USER)=> SELECT f.title, f.did, d.name, f.date_prod, f.kind FROM distributors d, films f WHERE f.did = d.did; title |did |name |date_prod |kind --------------------------+-----+------------------+-----------+------- The Third Man |101 |British Lion |1949-12-23 |Drama The African Queen |101 |British Lion |1951-08-11 |Romantic Une Femme est une Femme |102 |Jean Luc Godard |1961-03-12 |Romantic Vertigo |103 |Paramount |1958-11-14 |Action Becket |103 |Paramount |1964-02-03 |Drama 48 Hours |103 |Paramount |1982-10-22 |Action War and Peace |104 |Mosfilm |1967-02-12 |Drama West Side Story |105 |United Artists |1961-01-03 |Musical Bananas |105 |United Artists |1971-07-13 |Comedy Yojimbo |106 |Toho |1961-06-16 |Drama There's a Girl in my Soup |107 |Columbia |1970-06-11 |Comedy Taxi Driver |107 |Columbia |1975-05-15 |Action Absence of Malice |107 |Columbia |1981-11-15 |Action Storia di una donna |108 |Westward |1970-08-15 |Romantic The King and I |109 |20th Century Fox |1956-08-11 |Musical Das Boot |110 |Bavaria Atelier |1981-11-11 |Drama - Sum the column
lenof all films and group the results by kind:MYDB.SCH1(USER)=> SELECT kind, SUM(len) AS total FROM films GROUP BY kind; kind |total ---------+------- Action |07:34 Comedy |02:58 Drama |14:28 Musical |06:42 Romantic |04:38 - Sum the column
lenof all films, group the results by kind, and display those group totals that are less than 5 hours:MYDB.SCH1(USER)=> SELECT kind, SUM(len) AS total FROM films GROUP BY kind HAVING SUM(len) < INTERVAL '5 HOUR'; kind | total ---------+------ Comedy | 02:58 Romantic | 04:38 - Sort the contents of the table
distributorsaccording to the contents of the second column (name):MYDB.SCH1(USER)=> SELECT * FROM distributors ORDER BY name; or MYDB.SCH1(USER)=> SELECT * FROM distributors ORDER BY 2; did | name ------+----------------- 109 | 20th Century Fox 110 | Bavaria Atelier 101 | British Lion 107 | Columbia 102 | Jean Luc Godard 113 | Luso Films 104 | Mosfilm 103 | Paramount 106 | Toho 105 | United Artists 111 | Walt Disney 112 | Warner Bros. 108 | Westward - Obtain the union of the tables
distributorsandactors, and restrict the results to those that begin with letter W:MYDB.SCH1(USER)=> SELECT distributors.name FROM distributors WHERE distributors.name LIKE 'W%'; UNION SELECT actors.name FROM actors WHERE actors.name LIKE 'W%'; Walt Disney Walter Matthau Warner Bros. Warren Beatty Westward Woody Allen
Because only distinct rows are wanted, the ALL keyword is omitted.
watsonx.data considerations
- Apache Iceberg tables do not have a maximum character length for the string
data type, which causes a discrepancy with the way Netezza Performance Server treats text types. In order to
compensate for this, Netezza will treat all string columns as having a length of 512 characters. If
your data has a string who's length is greater than 512 characters, the query will fail with an
overflow error.Workaround:
- Set the size of
DATALAKE_STRING_SIZEvariable to a higher value to override the default character limit of 512. - Alternatively, you can manually cast the string column to a Netezza
NVARCHAR type with the required size. For example, to cast to 4096 characters instead, run:
If any data during such a cast still exceeds 4Kb of allocated space, the rest of the data is truncated to the specified size.SELECT stringCol::NVARCHAR(4096) FROM ...
You also must take the following aspects into account:- The
DATALAKE_STRING_SIZEsession variable is applied to all columns of type string in a table. - If you want to treat one column differently from another, you can cast each column to a different size.
- The cast is possible only to NVARCHAR() type.
- The range of
DATALAKE_STRING_SIZEsession variable is[1, 16000]. - A similar behavior is also applicable to binary data type, which uses its
own designated
DATALAKE_BINARY_SIZEenvironment variable with the same default size of 512 bytes and effective range of[1, 64000].
- Set the size of
- If there are schema names in Hive Metastore that are NPS keywords, you cannot use those keywords as identifiers when using the schema. For more information, see Schema names in Hive Metastore (HMS) that are NPS reserved keywords.