Grouping values by using the LISTAGG function
This tutorial shows you how to use the LISTAGG function, which is used to aggregate a set of string values within a group into a single string. This tutorial introduces the LISTAGG function, provides working examples, and provides a comparison with a similar aggregate function, XMLAGG.
Software requirements and prerequisites
- FL 501 You must activate Db2® 12 function level 501 or higher by applying the
PTF for APAR PI70535.Note: In addition to applying the APAR, you must explicitly activate the function level by issuing the ACTIVATE command. For example:
-db2a ACTIVATE FUNCTION LEVEL (V12R1M501)
- You might also need to bind or rebind your application with
V12R1M501. The following example shows how to rebind DSNTEP3.DSNTEP3.(*) with
REBIND PACKAGE(DSNTEP3.DSNTEP3.(*)) APPLCOMPAT(V12R1M501)
The LISTAGG function is used to aggregate a set of string values within a group into a single string by appending the string-expression values based on the order that's specified in the 'WITHIN GROUP' clause.
- As a single-set aggregate function, LISTAGG operates on all rows and returns a single output row.
- As a group-set aggregate function, LISTAGG operates on and returns an output row for each group that's defined by GROUP BY clause.
The following diagram shows the syntax for LISTAGG:
The input, string-expression, must be a string (character, graphic, or binary), a numeric data type, or a LOB. The numeric argument and LOBs are supported via implicit casting (numeric to VARCHAR; CLOB to VARCHAR of max length of 32704; DBCLOB to VARGRAPHIC of max length of 16352; BLOB to VARBINARY of max length of 32704).
The separator can be a literal, special register, variable, or an expression that is based on literals, special registers, or variables, as long as the expression does not include a non-deterministic function or a function that takes an external action.
The result data type of LISTAGG is determined by the input data type summarized in the following table:
|Input data type||Output data type|
|CHAR(n) or VARCHAR(n)||VARCHAR|
|GRAPHIC(n) or VARGRAPHIC(n)||VARGRAPHIC|
|BINARY(n) or VARBINARY(n)||VARBINARY|
Here's a simple example: suppose we have a table called JANE.EMP that contains the employee name, employee ID, department ID, and year of birth of the employee.
DROP TABLE JANE.EMP; CREATE TABLE JANE.EMP (Dept_ID INT, E_Name VARCHAR(20), EMP_ID INT, Birth INT); INSERT INTO JANE.EMP VALUES(10, 'Jack', 0012, 1983); INSERT INTO JANE.EMP VALUES(10, 'Lily', 0015, 1990); INSERT INTO JANE.EMP VALUES(20, 'Tom', 0019, 1983); INSERT INTO JANE.EMP VALUES(20, 'Bob', 0022, 1976); INSERT INTO JANE.EMP VALUES(20, 'Frank', 0004, 1983); INSERT INTO JANE.EMP VALUES(20, 'Tom', 0014, 1985); INSERT INTO JANE.EMP VALUES(30, 'Jerry', 0028, 1991); INSERT INTO JANE.EMP VALUES(30, 'Chris', 0021, 1981); INSERT INTO JANE.EMP VALUES(30, 'Jill', 0002, 1984); INSERT INTO JANE.EMP VALUES(30, 'Jerry', 0031, 1984); INSERT INTO JANE.EMP VALUES(30, 'Allan', 0006, 1995);
Now suppose that we want to output the names of all the employees who are in the same department into a single row in ascending order according to their employee ID and birth date from this table.
One option is to first define a cursor to query the employees' names and departments from the table. We could then fetch data from the cursor and use application logic to identify employees who are in the same department and then concatenate their names. This approach is absolutely possible, but requires a lot of extra effort.
Another possible option is represented in the following SQLPL (note that # represents a SQL terminator here):
DROP TABLE RESULT# CREATE TABLE RESULT(DEPT_ID INT, EMP_NAMES VARCHAR(100))# DROP PROCEDURE AggEmpName# CREATE PROCEDURE AggEmpName () BEGIN DECLARE STMT VARCHAR(50) DEFAULT 'No name'; DECLARE PRE_Dept_ID INT; SET STMT= ''; SET PRE_Dept_ID= 0; FOR MYC CURSOR FOR SELECT E_Name, Dept_ID FROM JANE.EMP ORDER BY Dept_ID, Birth, Emp_ID DO IF PRE_Dept_ID = 0 THEN SET PRE_Dept_ID = Dept_ID; SET STMT = STMT ||E_Name||';'; ELSE IF PRE_Dept_ID = Dept_ID THEN SET STMT = STMT ||E_Name||';'; ELSE INSERT INTO RESULT VALUES (PRE_Dept_ID,STMT); SET PRE_Dept_ID = Dept_ID; SET STMT= ''; SET STMT = STMT ||E_Name||';'; END IF; END IF; END FOR; INSERT INTO RESULT VALUES (PRE_Dept_ID,STMT); END# CALL AggEmpName () # SELECT * FROM RESULT#
First, we create a RESULT table to store the result (employee names that belong to the same
department). Inside the body of SQLPL(
AggEmpName), we concatenate the employee
names for the same department in
STMT. When the department ID changes, we insert
the concatenated string to the RESULT table with the department ID. Then we reset the
STMT and start to concatenate the name for the next department. Here's our
DEPT_ID EMP_NAMES 10 Jack;Lily; 20 Bob;Frank;Tom;Tom; 30 Chris;Jill;Jerry;Jerry;Allan; 3 record(s) selected
With the LISTAGG function, this task becomes very simple. Consider the following query:
SELECT Dept_ID, LISTAGG(ALL E_Name, '; ') WITHIN GROUP (ORDER BY Birth, Emp_ID) AS Name_List FROM JANE.EMP GROUP BY Dept_ID;
The syntax is easy to understand. In each department (GROUP BY Dept_ID), we aggregate the employee names, and order by Birth and Emp_ID, as shown in the following result:
|20||Bob; Frank; Tom; Tom|
|30||Chris; Jill; Jerry; Jerry; Allan|
If you want to remove the duplicate names in each group, you can use the DISTINCT keyword, as shown in the following query:
SELECT Dept_ID, LISTAGG(DISTINCT E_Name, '; ') WITHIN GROUP (ORDER BY E_Name, Birth, Emp_ID) AS Name_List FROM JANE.EMP GROUP BY Dept_ID;
|20||Bob; Frank; Tom|
|30||Allan; Chris; Jerry; Jill|
As you might notice, it's very easy to use and maintain (when comparing with implementing your own application logic for doing so).
SELECT Dept_ID, LISTAGG(DISTINCT E_Name, '; ') WITHIN GROUP (ORDER BY Birth, Emp_ID) AS Name_List FROM JANE.EMP GROUP BY Dept_ID;
-214, ERROR: AN EXPRESSION IN THE FOLLOWING POSITION, OR STARTING WITH 1 IN THE ORDER BY CLAUSE IS NOT VALID. REASON CODE = 8 RC 8 DISTINCT is specified in the LISTAGG function and the first argument expression cannot be matched exactly with the first sort key expression of the ORDER BY specified in the WITHIN GROUP clause. This reason code occurs only when clause-type is ORDER BY.
This rule is required to avoid non-deterministic results.
Now suppose that our table contains the following data (some columns are not shown here). Notice that there are two employees named Tom, each with a different year of birth.
Jill, Tom, Bob
Jill, Bob, Tom
Without the restriction of -214, Db2 can't be sure which row (Tom 1983 or Tom 1990) is duplicated and needs to be removed.
Comparing LISTAGG and XMLAGG
If you are familiar with XML, you might already know about the aggregate function in XML called XMLAGG. XMLAGG was introduced in Db2 9 for z/OS.
To achieve the same (or similar) results, the closest we can do with XMLAGG is:
SELECT DEPT_ID, XMLAGG(XMLTEXT(E_NAME || '; ') ORDER BY Birth, EMP_ID) AS Name_List FROM JANE.EMP GROUP BY Dept_ID;
XMLAGG takes only the XML data type as input, so we need to apply XMLTEXT() on E_NAME, then concatenate with a ';' as a separator. Here's the result:
DEPT_ID NAME_LIST 10 Jack; Lily; 20 Bob; Frank; Tom; Tom; 30 Chris; Jill; Jerry; Jerry; Allan; 3 record(s) selected
- We can't eliminate duplicates inside XMLAGG. For LISTAGG, we can use the DISTINCT keyword.
- The return data type of XMLAGG is always XML. So depending on how the result is used, the output of XMLAGG might need to be cast to the data type that the application requires.
- The VARCHAR output of LISTAGG is limited to 32704. XMLLAGG doesn't have such a size limitation.
- If you pay attention to the output of XMLAGG and LISTAGG, you'll see a ; (separator) at the end of XMLAGG output(NAME_LIST), but not at the end of LISTAGG output. In the XMLAGG statement, the separator is added "manually" by using the concatenate operator (||), whereas the separator in LISTAGG is "built-in," and is not added if there are no more tokens.