SELECT (to retrieve rows)

Use the SELECT command to retrieve rows from a table or view. It returns the rows that satisfy the criteria you specify, including the WHERE condition. If you omit the WHERE condition, the SELECT command chooses from all rows.

Note: For a description of how to use the SELECT command to start a stored procedure on a Netezza Performance Server host, see the SELECT command.

The system forms output rows by computing the select output expressions for each selected row.

You can include * in the output list as a shorthand way of indicating all columns of selected rows. You can also indicate <table>.* as shorthand for the columns that come from a specific table. Query a table describes functions that you can use within a SELECT command.
You can use the keyword distinct to eliminate duplicate rows from the result. The all keyword (the default) returns all candidate rows, including duplicates.

Syntax

Syntax for using the SELECT command to retrieve rows from a table or view:

SELECT [ DISTINCT | ALL ] [ * | <col> [ AS <output_name> ]
  <expression> [ AS <output_name> ] [,<expression> [ AS <output_name> ]…]
    [ FROM <from_item>[,<from_item>…] ]
    [ WHERE <condition> ]
    [ GROUP BY <expression>[,<expression>…] ]
    [ HAVING <condition>[,<condition>…] ]
    [ { UNION | INTERSECT | EXCEPT | MINUS }[ DISTINCT | ALL ]
SELECT
    [ ORDER BY <expression> [ ASC | DESC | USING <operator>
                          [NULLS {FIRST | LAST}][, …] ]
    [ LIMIT { <count> | ALL } ]

Where <from_item> represents:

<table>
     [ [ AS ] <alias> [ ( <column_alias_list> ) ] ] |
     [ FOR SYSTEM_TIME 
       { AS OF <timestamp-expression> | 
         BEFORE <timestamp-expr> | 
         BETWEEN <timestamp-expression1> AND <timestamp-expression2> | 
         FROM <timestamp-expression1> TO <timestamp-expression2> } ] |
     ( <select> ) [ AS ] <alias> [ ( <column_alias_list> ) ] |
      <from_item> [ NATURAL ] <join_type> <from_item>
         [ ON <join_condition> | USING ( <join_column_list> ) ]

For more information about FOR SYSTEM_TIME, see Time travel query syntax and timestamps.

Inputs

The SELECT command takes the following inputs:

Table 1. SELECT inputs
Input	Description
alias	Specifies a substitute name for the preceding table name. Use an alias for brevity or to eliminate ambiguity for self-joins (where the same table is scanned multiple times). If you write an alias, you can also write a column alias list to provide substitute names for one or more columns of the table.
<col>	The name of a column.
<output_name>	The name the column is to be given in the output. You typically use this name to label a column for display. You can also use it to refer to the column value in ORDER BY and GROUP BY clauses. You cannot use this name in the WHERE or HAVING clauses; write out the expression instead.
<expression>	The name of a table column or an expression.
FROM	The FROM clause specifies one or more source tables for the SELECT command. If you specify multiple sources, the result is the Cartesian product of all the rows in all the sources. Usually, though, you add qualifying conditions to restrict the rows the system returns to few the Cartesian product. You can parenthesize a subselect command within a FROM clause. Using a subselect command is the only way to get multiple levels of grouping, aggregation, or sorting in a single query. You must specify an alias for the subselect command. A FROM item can be a join clause, which combines two simpler from items. Use parentheses, if necessary, to determine the order of nesting.
<from_item>	A table reference, subselect, or join clause. For more information about FOR SYSTEM_TIME, see Time travel query syntax and timestamps.
WHERE	This clause has the following general form: `WHERE boolean_expr` The boolean_expr portion of the clause can consist of any expression that produces a Boolean value. In many cases, you use the expression as follows: `expr cond_op expr or log_op expr` where: cond_op can be one of: =, <, <=, >, >= or <>, a conditional operator like all, any, in, like, or a locally defined operator. log_op can be one of: and, or, not. select ignores all rows for which the WHERE condition does not return true.
GROUP BY	Use this clause to divide a table into groups of rows that match one or more values. The clause specifies a grouped table that is derived by applying the clause: `GROUP BY <expression>[,<expression>…]` The GROUP BY clause condenses, into a single row, all selected rows that share values for the grouped columns. The system computes aggregate functions across all rows that make up each group, producing a separate value for each group (whereas without GROUP BY, an aggregate produces a single value that is computed across all the selected rows). When you include the GROUP BY clause, the SELECT command output expression or expressions cannot refer to ungrouped columns except within aggregate functions because there would be more than one possible value to return for an ungrouped column. A group by value can be: An input column name. The name or ordinal number of an output column (select expression). An arbitrary expression that is formed from input-column values. If there is ambiguity, the system interprets a group by name as an input-column name rather than an output column name.
HAVING	The optional HAVING clause has the general form: `HAVING boolean_expr` where boolean_expr is the same as specified for the where clause. The HAVING clause specifies a grouped table that is derived by the elimination of rows that do not satisfy the boolean_expr. The HAVING clause is different from the WHERE clause: The WHERE clause filters individual rows before application of GROUP BY. The HAVING clause filters group rows that are created by GROUP BY. Each column that is referenced in boolean_expr must unambiguously reference a grouping column, unless the reference displays within an aggregate function. In a grouping select, the HAVING clause can only reference expressions that are single-valued within a group. That is, you can only reference group fields, aggregates, or single-valued expressions that are derived from group fields or aggregates (which must include constants). For example, to return grp and counts of grps with more than four members: `SELECT grp, count(id) AS n FROM emp GROUP BY grp HAVING n > 4` In a non-grouping select where the select is conceptually grouped by zero group fields, you can only reference aggregates or expressions that are single-valued. For example to return no rows if there are four or fewer employees in emp, or one row with the count if there are more than four employees in emp: `SELECT count(id) AS n FROM emp HAVING n > 4`
<condition>	A Boolean expression that gives a result of true or false.
UNION	Operator causes the system to compute the collection of rows that are returned by the queries. Eliminates duplicate rows unless you specify the ALL keyword. `table_query UNION [ ALL ] table_query [ ORDER BY <expression> [ASC\|DESC\|USING operator] [, …] ] [ LIMIT { COUNT \| ALL } ] [ OFFSET start ]` where table_query specifies any select expression without an ORDER BY, FOR UPDATE, or LIMIT clause. If you enclose a subexpression in parentheses, you can include ORDER BY and LIMIT clauses. If you do not include parentheses, the clauses are taken to apply to the result of the union, not to its right input expression. The UNION operator computes the collection (set union) of the rows that are returned by the queries involved. The two selects that represent the direct operands of the union must produce the same number of columns, and corresponding columns must be of compatible data types. The result of UNION does not contain any duplicate rows unless you specify the ALL option. The ALL option prevents elimination of duplicates. Multiple UNION operators in the same SELECT command are evaluated left to right, unless you indicate otherwise by using parentheses.
INTERSECT	Combines the results of two queries into a single result that comprises all the rows common to both queries. For more information, see INTERSECT operation.
EXCEPT or MINUS	Finds the difference between the two queries and the result comprises the rows that belong only to the first query. For more information, see EXCEPT operation.
ORDER BY	Use this clause to sort returned rows in the order that you specify. An ORDER BY clause can be: The name or ordinal number of an output column (select expression). The ordinal number refers to the ordinal (left-to-right) position of the result column. This feature makes it possible to define an ordering based on a column that does not have a name. For example: `SELECT title, date_prod + 1 AS newlen FROM films ORDER BY newlen;` An arbitrary expression that is formed from input-column values, including fields that do not display in the select result list. For example: `SELECT name FROM distributors ORDER BY code;` If there is ambiguity, an ORDER BY name is interpreted as an output-column name. A limitation of this feature is that an ORDER BY clause that applies to the result of a union query might only specify an output column name or number, not an expression. If an ORDER BY clause is a simple name that matches both a result column name and an input column name, ORDER BY interprets it as the result column name. This is the opposite of the choice that the GROUP BY clause makes in the same situation. You can specify ascending or descending sort order: ASC Sort in ascending order. This is equivalent to using <. This is the default DESC Sort in descending order. This is equivalent to using >. You can specify a sorting order for null data: NULLS FIRST Null values come before non-null values. For an ascending sort, nulls display first in the output; for a descending sort, they display last. This is the default NULLS LAST Null values come after non-null values.
LIMIT	Return a subset of the rows that are produced by your query: <count> Return at most the specified number of rows. ALL Return all rows. This is the default. When you specify the LIMIT parameter, also specify the ORDER BY parameter. Otherwise, the subset of the rows is returned is indeterminate.
<join_column_list>	A join column list. For example, specifying `USING (a,b)` is equivalent to specifying `ON left_table.a = right_table.a` and `ON left_table.b = right_table.b`.
<join_condition>	A qualification condition. This condition is similar to the WHERE condition except that it only applies to the two from_items being joined in this join clause.
<join_type>	One of the following types of join: [INNER] JOIN LEFT [OUTER] JOIN RIGHT [OUTER] JOIN FULL [OUTER] JOIN CROSS JOIN For an inner or outer join type, you must include exactly one of the following clauses: NATURAL ON <join_condition> USING (<join_column_list>) Do not include any of these clauses for a cross join.
<select>	A SELECT command. You must enclose the subselect in parentheses, and provide an alias for it. When you include a subselect in the FROM clause, the subselect acts as though its output were created as a temporary table during this single select command.
<table>	In a FROM clause, the name of a table or view.
CROSS JOIN and INNER JOIN	Produces a simple Cartesian product, the same as you would get if you listed the two items at the top level of FROM. The cross join and inner join on (true) types are equivalent; no rows are removed by qualification. These join types are a notational convenience. You can accomplish the same results by using the FROM and WHERE clauses.
LEFT OUTER JOIN	Returns all rows in the qualified Cartesian product (that is, all combined rows that pass its on condition), plus one copy of each row in the left table for which there was no right row that passed the on condition. The system considers only the own on or using condition of the join to determine those rows that have matches. It applies OUTER ON or WHERE conditions afterward.
RIGHT OUTER JOIN	Returns all the joined rows, plus one row for each unmatched right row (extended with nulls on the left). This is just a notational convenience. You can convert it to a left outer join by switching the left and right inputs.
FULL OUTER JOIN	Returns all the joined rows, plus one row for each unmatched left row (extended with nulls on the right), plus one row for each unmatched right row (extended with nulls on the left). For all the join types except cross join, you must write exactly one of: ON <join_condition> This is the most general case. You can write any qualification expression that involves the two tables you want to join. USING <join_column_list> For example, specifying `USING (a,b)` is equivalent to specifying `ON left_table.a = right_table.a` and `ON left_table.b = right_table.b`. The term USING implies that only one of each pair of equivalent columns is to be included in the join output, not both. NATURAL Use a USING list that mentions all similarly named columns in the tables.

Outputs

The SELECT command has the following outputs:

Table 2. SELECT outputs
Output	Description
`ROWS`	Returns the complete set of rows that result from the query.
`COUNT`	Returns the number of rows that are returned by the query.

Privileges

You must be the admin user, the table owner, the owner of the database or schema where the table is defined, or your account must have Select privileges on the table or the Table object class (or the External Table class, if the table being selected from is an external table).

Usage

The following provides sample usage.

Join the table films with the table distributors:

MYDB.SCH1(USER)=> SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM distributors d, films f WHERE f.did = d.did;

   title                     |did  |name              |date_prod  |kind
   --------------------------+-----+------------------+-----------+-------
   The Third Man             |101  |British Lion      |1949-12-23 |Drama
   The African Queen         |101  |British Lion      |1951-08-11 |Romantic
   Une Femme est une Femme   |102  |Jean Luc Godard   |1961-03-12 |Romantic
   Vertigo                   |103  |Paramount         |1958-11-14 |Action
   Becket                    |103  |Paramount         |1964-02-03 |Drama
   48 Hours                  |103  |Paramount         |1982-10-22 |Action
   War and Peace             |104  |Mosfilm           |1967-02-12 |Drama
   West Side Story           |105  |United Artists    |1961-01-03 |Musical
   Bananas                   |105  |United Artists    |1971-07-13 |Comedy
   Yojimbo                   |106  |Toho              |1961-06-16 |Drama
   There's a Girl in my Soup |107  |Columbia          |1970-06-11 |Comedy
   Taxi Driver               |107  |Columbia          |1975-05-15 |Action
   Absence of Malice         |107  |Columbia          |1981-11-15 |Action
   Storia di una donna       |108  |Westward          |1970-08-15 |Romantic
   The King and I            |109  |20th Century Fox  |1956-08-11 |Musical
   Das Boot                  |110  |Bavaria Atelier   |1981-11-11 |Drama

Sum the column len of all films and group the results by kind:

MYDB.SCH1(USER)=> SELECT kind, SUM(len) AS total FROM films GROUP BY
kind;
   kind     |total
   ---------+-------
   Action   |07:34
   Comedy   |02:58
   Drama    |14:28
   Musical  |06:42
   Romantic |04:38

Sum the column len of all films, group the results by kind, and display those group totals that are less than 5 hours:

MYDB.SCH1(USER)=> SELECT kind, SUM(len) AS total FROM films GROUP BY
kind HAVING SUM(len) < INTERVAL '5 HOUR';
   kind     | total
   ---------+------
   Comedy   | 02:58
   Romantic | 04:38

Sort the contents of the table distributors according to the contents of the second column (name):

MYDB.SCH1(USER)=> SELECT * FROM distributors ORDER BY name;
or
MYDB.SCH1(USER)=> SELECT * FROM distributors ORDER BY 2;
   did   | name
   ------+-----------------
   109   | 20th Century Fox
   110   | Bavaria Atelier
   101   | British Lion
   107   | Columbia
   102   | Jean Luc Godard
   113   | Luso Films
   104   | Mosfilm
   103   | Paramount
   106   | Toho
   105   | United Artists
   111   | Walt Disney
   112   | Warner Bros.
   108   | Westward

Obtain the union of the tables distributors and actors, and restrict the results to those that begin with letter W:

MYDB.SCH1(USER)=> SELECT distributors.name
      FROM distributors
      WHERE distributors.name LIKE 'W%';
      UNION
      SELECT actors.name
      FROM actors
      WHERE actors.name LIKE 'W%';
   Walt Disney
   Walter Matthau
   Warner Bros.
   Warren Beatty
   Westward
   Woody Allen

Because only distinct rows are wanted, the ALL keyword is omitted.

watsonx.data considerations

With 11.2.2.9 release:

Apache Iceberg tables do not have a maximum character length for the string data type, which causes a discrepancy with the way Netezza Performance Server treats text types. In order to compensate for this, Netezza will treat all string columns as having a length of 512 characters. If your data has a string who's length is greater than 512 characters, the query will fail with an overflow error.
Workaround:
- Set the size of DATALAKE_STRING_SIZE variable to a higher value to override the default character limit of 512.
- Alternatively, you can manually cast the string column to a Netezza NVARCHAR type with the required size.
  For example, to cast to 4096 characters instead, run:
```
SELECT stringCol::NVARCHAR(4096) FROM ...
```
  If any data during such a cast still exceeds 4Kb of allocated space, the rest of the data is truncated to the specified size.
You also must take the following aspects into account:
- The DATALAKE_STRING_SIZE session variable is applied to all columns of type string in a table.
- If you want to treat one column differently from another, you can cast each column to a different size.
- The cast is possible only to NVARCHAR() type.
- The range of DATALAKE_STRING_SIZE session variable is [1, 16000].
- A similar behavior is also applicable to binary data type, which uses its own designated DATALAKE_BINARY_SIZE environment variable with the same default size of 512 bytes and effective range of [1, 64000].
If there are schema names in Hive Metastore that are NPS keywords, you cannot use those keywords as identifiers when using the schema. For more information, see Schema names in Hive Metastore (HMS) that are NPS reserved keywords.