Window aggregation on a grouping select
This example goes through the steps for writing a SQL query that finds the top city in each region that is based on annual sales totals.
The following example ranks the cities that are based on their
sales amount:
SELECT *, RANK() OVER (ORDER BY amt DESC) AS ranking FROM sales_tbl
WHERE region = 'Northeast' or region = 'Central' ORDER BY amt DESC;
city | state | region | quarter | amt | ranking
-----------+-------+-----------+---------+------+---------
New York | NY | Northeast | 4 | 5000 | 1
New York | NY | Northeast | 3 | 4300 | 2
New York | NY | Northeast | 2 | 3700 | 3
New York | NY | Northeast | 1 | 3000 | 4
Baltimore | MD | Central | 2 | 2500 | 5
Baltimore | MD | Central | 4 | 2500 | 5
Boston | MA | Northeast | 4 | 2400 | 7
Boston | MA | Northeast | 1 | 2000 | 8
Baltimore | MD | Central | 1 | 2000 | 8
Baltimore | MD | Central | 3 | 2000 | 8
Atlanta | GA | Central | 4 | 1700 | 11
Boston | MA | Northeast | 3 | 1700 | 11
Atlanta | GA | Central | 3 | 1600 | 13
Boston | MA | Northeast | 2 | 1500 | 14
Atlanta | GA | Central | 2 | 1500 | 14
Atlanta | GA | Central | 1 | 1500 | 14
(16 rows)
The following example throws an error, as aggregates are not allowed
in the WHERE clause:
SELECT *, RANK() OVER (ORDER BY amt DESC) AS ranking FROM sales_tbl
WHERE ranking <= 2;
ERROR: Aggregates not allowed in WHERE clause
Get the top two by selecting cities with a rank of less than 2:
SELECT * FROM ( SELECT *, RANK() OVER (ORDER BY amt DESC) AS ranking
FROM sales_tbl WHERE region = 'Northeast' or region = 'Central') AS
subset WHERE ranking <= 2;
CITY | STATE | REGION | QUARTER | AMT | PROFIT_MARGIN | RANKING
---------+-------+-----------+---------+------+---------------+---------
New York | NY | Northeast | 4 | 5000 | 20 | 1
New York | NY | Northeast | 3 | 4300 | 22 | 2
(2 rows)
Now rank the cities that are based on total sales for the year.
So first compute total sales for each city for each year:
SELECT city, state, region, SUM(amt) AS yr_sales FROM sales_tbl WHERE
region = 'Northeast' or region = 'Central' GROUP BY region, state,
city;
CITY | STATE | REGION | YR_SALES
----------+-------+-----------+----------
Boston | MA | Northeast | 7600
New York | NY | Northeast | 16000
Baltimore | MD | Central | 9000
Atlanta | GA | Central | 6300
(4 rows)
Now rank the cities on total sales.
SELECT city, state, region, SUM(amt) AS yr_sales, RANK() OVER
(PARTITION BY region ORDER BY SUM(amt)) FROM sales_tbl WHERE region =
'Northeast' or region = 'Central' GROUP BY region, state, city;
CITY | STATE | REGION | YR_SALES | RANK
----------+-------+-----------+----------+------
Boston | MA | Northeast | 7600 | 1
New York | NY | Northeast | 16000 | 2
Atlanta | GA | Central | 6300 | 1
Baltimore | MD | Central | 9000 | 2
(4 rows)
When a window aggregate and a GROUP BY clause display in a single query, the GROUP BY is evaluated first, and the window aggregate function is evaluated on the result of the GROUP BY.
The following example puts it all together to find the top two
cities that are based on total sales:
SELECT * FROM (SELECT city, state, region, SUM(amt) AS yr_sales, RANK
() OVER (PARTITION BY region ORDER BY SUM(amt) DESC) AS ranking FROM
sales_tbl WHERE region = 'Northeast' or region = 'Central' GROUP BY
region, state, city) subset WHERE ranking = 1 ORDER BY yr_sales DESC;
city | state | region | yr_sales | ranking
----------+-------+-----------+----------+---------
New York | NY | Northeast | 16000 | 1
Baltimore | MD | Central | 9000 | 1
(2 rows)