How to do MEDIAN in DB2
After a quick look at Wikipedia MEDIAN I figured this can't be too hard and the result may be of interest to others.
At the beginning there is a table, preferably with some data
Now the MEDIAN is the middle value of a set of values. So in the case above that would be 30.CREATE TABLE T(c1 INT);
We can get to the middle value by counting the number of values, number them while we do that and then take the one in the middle.
OK, but what if there is an even number of rows? In that case we are supposed to take the average between the two closest values.SELECT c1
Now the median should be (18 + 30) / 2 => 24INSERT INTO T VALUES 18;
If we rerun the query above we get: 18 however.
Now one of the more powerful, and rather curious, OLAP functions is LEAD.
LEAD allows us to "shift" values from one row into another.
The expression LEAD(c1, 1) OVER(ORDER BY c1) will pick out the next (1) value of c1 in the sort order.
I dare say the goal appears to be in sight. All we need to do now is, for odd numbers of rows pick c1 as it is and for even numbers average c1 and c1next:SELECT c1, c1next
Success! Now, let's generalize the logic to support GROUP BY. We do thsi by introducing PARTITION BY into the OLAP function which tells them to do their work on a per group (partition) basis.SELECT DECODE((cnt / 2) * 2, cnt , (c1 + c1next) / 2, c1)
Now, just for completeness we check the optimizer plan and make sure it looks straight forwardALTER TABLE t ADD COLUMN c2 INTEGER DEFAULT 1;
Straight as an arrow!Access Plan: