OLAP specifications
On-Line Analytical Processing (OLAP) specifications provide the ability to return ranking, row numbering, and other aggregate function information as a scalar value in a query result.
OLAP-specification
An OLAP specification can be included in an expression in a select-clause or the ORDER BY clause of a select-statement. The query result to which the OLAP specification is applied is the result table of the innermost subselect that includes the OLAP specification. OLAP specifications are sometimes referred to as window functions.
An OLAP specification is not valid in a WHERE, VALUES, GROUP BY, HAVING, or SET clause, or join-condition in an ON clause of a joined table. An OLAP specification cannot be used as an argument of an aggregate function in the select-clause.
When invoking an OLAP specification, a window is specified that defines the rows over which the function is applied, and in what order.
- ordered-OLAP-specification
- Specifies OLAP operations that require a window-order-clause.
- LAG or LEAD
- Returns an expression value computed using rows prior to
or following the current row.
offset must be a positive integer or positive bigint constant. If offset is not specified, the value 1 is used.
default-value must be an expression that is castable to the type of expression. If default-value is not specified, the default value is the null value.
If IGNORE NULLS is specified, all rows where the expression value is the null value are not considered in the calculation.
- LAG
- The LAG function returns the expression value for the row at offset rows before the current row. If a window-partition-clause is specified, offset means offset rows before the current row and within the current partition.
- LEAD
- The LEAD function returns the expression value for the row at offset rows after the current row. If a window-partition-clause is specified, offset means offset rows after the current row and within the current partition.
The data type of the result is the data type of expression. The result can be null. If IGNORE NULLS is specified and all values in the window are null, the result is the null value.
- RANK or DENSE_RANK
- Specifies that the ordinal rank of a row within the window is
computed. Rows that are not distinct with respect to the ordering
within their window are assigned the same rank. The results of ranking
may be defined with or without gaps in the numbers resulting from
duplicate values.
The data type of the result is BIGINT. The result cannot be null.
- RANK
- Specifies that the rank of a row is defined as 1 plus the number of rows that strictly precede the row. Thus, if two or more rows are not distinct with respect to the ordering, then there will be one or more gaps in the sequential rank numbering.
- DENSE_RANK
- Specifies that the rank of a row is defined as 1 plus the number of preceding rows that are distinct with respect to the ordering. Therefore, there will be no gaps in the sequential rank numbering.
- NTILE
- Specifies
that the quantile rank of a row within the window is computed.
The argument must be castable to a BIGINT. If expression does not return a SMALLINT, INTEGER, or BIGINT, it is cast to BIGINT before evaluating the function. It must have a value greater than zero. expression must not contain a scalar-fullselect, a column reference, or a user-defined function reference.
The result is the quantile rank of the current row. The argument determines the number of quantiles in the result, which is determined by dividing the number of rows in the window by the value of the argument. If the number of rows within the window is not evenly divisible by the argument, each quantile will have a least n rows, and quantiles 1 through m will each have n+1 rows.- r is the number of rows in the window
- q is the value of the argument
- m = MOD (r , q )
- n = TRUNC (r , q )
The data type of the result is SMALLINT, INTEGER, or BIGINT based on the value of the argument. If the argument can be null, the result can be null. If the argument is null, the result is the null value.
- CUME_DIST
- A
cumulative distribution function that determines a percentile ranking
for each row, expressed as a decimal fraction ranging from 0 to 1.
Given the default ascending order of rows, CUME_DIST computes the
number of rows that are ranked lower than or equal to the current
row, including the current row, divided by the total number of rows
in the partition. If the window_order_clause specifies descending
order, CUME_DIST computes the number of rows that are ranked higher
than or equal to the current row divided by the total number of rows
in the partition.
For example, given default ordering, if there are ten rows in the partition and the current row ranks lower than six of those rows, the CUME_DIST result is 0.7 (six rows plus the current row = 7/10). The lowest-ranking row in a partition has a CUME_DIST value of 1.0, assuming default ascending order. If there is a single row in the partition, its CUME_DIST value is also 1.0.
The data type of the result is DECFLOAT(34). The result cannot be null.
- PERCENT_RANK
- The
PERCENT_RANK function is a distribution function that returns a relative
percentile rank of a row within an OLAP window, expressed as a value
between 0.0 - 1.0. When the number of rows in the OLAP window is greater
than 1, the result is computed as follows:
- The RANK of the current row in the OLAP window minus 1 divided by the number of rows in the OLAP window minus 1.
Otherwise, the result is 0.0.
The data type of the result is DECFLOAT(34). The result cannot be null.
- numbering-specification
- Specifies an OLAP operation that returns sequential numbers
for each row.
- ROW_NUMBER
- Specifies
that a sequential row number is computed for the row within the window
defined by the ordering, starting with 1 for the first row. If the
ORDER BY clause is not specified in the window, the row numbers are
assigned to the rows in arbitrary order, as returned by the subselect
(not according to any ORDER BY clause in the select-statement).
The data type of the result is BIGINT. The result cannot be null.
- window-partition-clause
- Defines the partition within which the OLAP operation is applied.
- PARTITION BY (partitioning-expression,...)
- Defines the partition within which the OLAP operation is applied. A partitioning-expression is an expression used in defining the partitioning of the result set. Each column name referenced in a partitioning-expression must unambiguously reference a column of the result table of the subselect that contains the OLAP specification. A partitioning-expression cannot include a scalar-fullselect or any function that is not deterministic or has an external action.
- window-order-clause
- Defines the ordering of rows within a partition that is used
to determine the value of the OLAP specification. It does not define
the ordering of the result table.
- ORDER BY (sort-key-expression,...)
- A sort-key-expression is an expression used in defining the ordering of the rows within a window partition. Each column name referenced in a sort-key-expression must unambiguously reference a column of the result table of the subselect, including the OLAP specification. A sort-key-expression cannot include a scalar-fullselect or any function that is not deterministic or that has an external action.
- ASC
- Specifies that the values of the sort-key-expression are used in ascending order.
- DESC
- Specifies that the values of the sort-key-expression are used in descending order.
- NULLS FIRST
- Specifies that the window ordering considers null values before all non-null values in the sort order.
- NULLS LAST
- Specifies that the window ordering considers null values after all non-null values in the sort order.
- ORDER OF table-designator
- Specifies that the same ordering used in table-designator should be applied to the result table of the subselect. There must be a table reference matching table-designator in the FROM clause of the subselect that specifies this clause and the table reference must identify a nested-table-expression or common-table-expression. The subselect (or fullselect) corresponding to the specified table-designator must include an ORDER BY clause that is dependent on the data. The ordering that is applied is the same as if the columns of the ORDER BY clause in the nested subselect (or fullselect) were included in the outer subselect (or fullselect), and these columns were specified in place of the ORDER OF clause.
- OLAP-aggregate-function
- Specifies a function that will compute a single value from the
OLAP window.
- FIRST_VALUE or LAST_VALUE
- Returns the first or last
value in an OLAP window.If IGNORE NULLS is specified, all rows where the expression value is the null value are not considered in the calculation.
- FIRST_VALUE
- The result is the expression value for the first row in an OLAP window.
- LAST_VALUE
- The result is the expression value for the last row in an OLAP window.
- NTH_VALUE
- Returns
the expression value for the nth row in an OLAP window.
n-expression must be an integer constant or variable with a value greater than zero.
If IGNORE NULLS is specified, all rows where the expression value is the null value are not considered in the calculation.
If FROM FIRST is specified, the nth value is computed counting forward from the beginning of the OLAP window.
If FROM LAST is specified, the nth value is computed counting backward from the end of the OLAP window.
The result is the nth value within the OLAP window as determined by n-expression.
The data type of the result is the data type of expression. The result can be null. If n-expression is null, then the result is the null value. If IGNORE NULLS is specified and all values in the window are null, the result is the null value.
FIRST_VALUE(expression) is equivalent to NTH_VALUE(expression, 1) FROM FIRST.
LAST_VALUE(expression) is equivalent to NTH_VALUE(expression, 1) FROM LAST.
- RATIO_TO_REPORT
- Returns
the ratio of an argument to the sum of the arguments in an OLAP window.
For example, the following functions are equivalent:
RATIO_TO_REPORT(expression) OVER (...) CAST(expression AS DECFLOAT(34)) / SUM(CAST(expression as DECFLOAT(34))) OVER(...)
The argument must be an expression that is castable to DECFLOAT(34). The division is performed using DECFLOAT(34).
The data type of the result is DECFLOAT(34). If the argument can be null, the result can be null; if the argument is null, the result is the null value.
- window-aggregation-group-clause
- The aggregation group of a given row is a set of rows that is defined in relation to the given row (in the ordering of the rows in the partition of the given row). window-aggregation-group-clause specifies the aggregation group. If this clause is not specified and a window-order-clause is also not specified, the aggregation group consists of all rows of the window partition. The aggregation group of all rows of the window partition can be explicitly specified using the RANGE or ROWS clauses.
Notes
Comparisons: Partitioning and ordering are performed in accordance with the comparison rules described in Assignments and comparisons.
Collating sequence: If a collating sequence other than *HEX is in effect when the statement that contains the OLAP expression is executed, and the partitioning-expressions or the sort-key-expressions are SBCS data, mixed data, or Unicode data, then the results are determined using the weighted values. The weighted values are derived by applying the collating sequence to the partitioning-expressions and the sort-key-expressions.
Column masks: If a column that is referenced in the partitioning-expression or the sort-key-expression of the OLAP specification is defined to have a column mask, the column mask is not applied.
- a distributed table,
- a table with a read trigger, or
- a logical file built over multiple physical file members.
Determinism: An OLAP specification is non-deterministic.
Syntax alternatives:
- DENSERANK can be specified in place of DENSE_RANK.
- ROWNUMBER can be specified in place of ROW_NUMBER.
- IGNORE NULLS or RESPECT NULLS can be specified as a string-constant final argument to LAG, LEAD, FIRST_VALUE, and LAST_VALUE.
Examples
- Display the ranking of employees, in order by surname, according
to their total salary (based on salary plus bonus) that have a total
salary more than $30,000:
Note that if the result is to be ordered by the ranking, then replace ORDER BY LASTNAME with:SELECT EMPNO, LASTNAME, FIRSTNME, SALARY+BONUS AS TOTAL_SALARY, RANK() OVER (ORDER BY SALARY+BONUS DESC) AS RANK_SALARY FROM EMPLOYEE WHERE SALARY+BONUS > 30000 ORDER BY LASTNAME
or:ORDER BY RANK_SALARY
ORDER BY RANK() OVER (ORDER BY SALARY+BONUS DESC)
- Rank the departments according to their average total salary:
SELECT WORKDEPT, AVG(SALARY+BONUS) AS AVG_TOTAL_SALARY, RANK() OVER (ORDER BY AVG( SALARY+BONUS) DESC) AS RANK_AVG_SAL FROM EMPLOYEE GROUP BY WORKDEPT ORDER BY RANK_AVG_SAL
- Rank the employees within a department according to their education
level. Having multiple employees with the same rank in the department
should not increase the next ranking value:
SELECT WORKDEPT, EMPNO, LASTNAME, FIRSTNME, EDLEVEL, DENSE_RANK() OVER (PARTITION BY WORKDEPT ORDER BY EDLEVEL DESC) AS RANK_EDLEVEL FROM EMPLOYEE ORDER BY WORKDEPT, LASTNAME
- Provide row numbers in the result of a query:
SELECT ROW_NUMBER() OVER (ORDER BY WORKDEPT, LASTNAME ) AS NUMBER, LASTNAME, SALARY FROM EMPLOYEE ORDER BY WORKDEPT, LASTNAME
- List the top five wage earners:
SELECT EMPNO, LASTNAME, FIRSTNME, TOTAL_SALARY, RANK_SALARY FROM (SELECT EMPNO, LASTNAME, FIRSTNME, SALARY+BONUS AS TOTAL_SALARY, RANK() OVER (ORDER BY SALARY+BONUS DESC) AS RANK_SALARY FROM EMPLOYEE) AS RANKED_EMPLOYEE WHERE RANK_SALARY < 6 ORDER BY RANK_SALARY
Note that a nested table expression was used to first compute the result, including the rankings, before the rank could be used in the WHERE clause. A common table expression could also have been used.
- Compute the top 3 stock prices for stock ABC using
NTH_VALUE:
SELECT Symbol, StockDate, Price, FIRST_VALUE(Price) OVER (PARTITION BY Symbol ORDER BY StockDate) AS FIRST_PRICE, NTH_VALUE(Price, 2) OVER (PARTITION BY Symbol ORDER BY StockDate) AS SECOND_PRICE, NTH_VALUE(Price, 3) OVER (PARTITION BY Symbol ORDER BY StockDate) AS THIRD_PRICE FROM DailyStockData WHERE StockDate BETWEEN CURRENT DATE - 1 MONTH AND CURRENT DATE AND Symbol = 'ABC'
- Compute the quartile rank using NTILE:
SELECT proc_id, total_sales, NTILE(4) OVER (ORDER BY total_sales DESC) AS Quartile FROM Sales
- Calculate the 30 day moving average for the stocks 'ABC' and 'XYZ' during
2005:
WITH V1(SYMBOL, TRADINGDATE, MOVINGAVG30DAY) AS ( SELECT SYMBOL, TRADINGDATE, AVG(CLOSINGPRICE) OVER (PARTITION BY SYMBOL ORDER BY TRADINGDATE ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) FROM DAILYSTOCKDATA WHERE SYMBOL IN ('ABC', 'XYZ') AND TRADINGDATE BETWEEN DATE('2005-01-01') - 2 MONTHS AND '2005-12-31' ) SELECT SYMBOL, TRADINGDATE, MOVINGAVG30DAY FROM V1 WHERE TRADINGDATE BETWEEN '2005-01-01' AND '2005-12-31' ORDER BY SYMBOL, TRADINGDATE
- Display the difference between each employee's salary
and the median of the salaries of that employee's department:
SELECT EMPNO, WORKDEPT, SALARY, SALARY - (MEDIAN(SALARY) OVER (PARTITION BY WORKDEPT)) FROM EMPLOYEE ORDER BY WORKDEPT
- Display the difference between each employee's salary
and the 90th percentile of salaries within that employee's department:
SELECT EMPNO, WORKDEPT, SALARY, SALARY - (PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY SALARY) OVER (PARTITION BY WORKDEPT)) FROM EMPLOYEE ORDER BY WORKDEPT
- Find the cumulative distribution and the relative
percentile rank of each employee's salary within their department.
SELECT EMPNO, WORKDEPT, SALARY, CAST(CUME_DIST() OVER (PARTITION BY WORKDEPT ORDER BY SALARY) AS DECIMAL(4,3)) AS CUME_DIST, CAST(PERCENT_RANK() OVER (PARTITION BY WORKDEPT ORDER BY SALARY) AS DECIMAL(4,3)) AS PERCENT_RANK FROM EMP ORDER BY WORKDEPT, SALARY