Grouping values by using the LISTAGG function

This tutorial shows you how to use the LISTAGG function, which is used to aggregate a set of string values within a group into a single string. This tutorial introduces the LISTAGG function, provides working examples, and provides a comparison with a similar aggregate function, XMLAGG.

Software requirements and prerequisites

To use the LISTAGG function:

FL 501 You must activate Db2® 12 function level 501 or higher by applying the PTF for APAR PI70535.
Note: In addition to applying the APAR, you must explicitly activate the function level by issuing the ACTIVATE command. For example:
```
-db2a ACTIVATE FUNCTION LEVEL (V12R1M501)
```
You might also need to bind or rebind your application with V12R1M501. The following example shows how to rebind DSNTEP3.DSNTEP3.(*) with V12R1M501:
```
REBIND PACKAGE(DSNTEP3.DSNTEP3.(*)) APPLCOMPAT(V12R1M501)
```

About LISTAGG

The LISTAGG function is used to aggregate a set of string values within a group into a single string by appending the string-expression values based on the order that's specified in the 'WITHIN GROUP' clause.

As a single-set aggregate function, LISTAGG operates on all rows and returns a single output row.
As a group-set aggregate function, LISTAGG operates on and returns an output row for each group that's defined by GROUP BY clause.

The following diagram shows the syntax for LISTAGG:

The input, string-expression, must be a string (character, graphic, or binary), a numeric data type, or a LOB. The numeric argument and LOBs are supported via implicit casting (numeric to VARCHAR; CLOB to VARCHAR of max length of 32704; DBCLOB to VARGRAPHIC of max length of 16352; BLOB to VARBINARY of max length of 32704).

The separator can be a literal, special register, variable, or an expression that is based on literals, special registers, or variables, as long as the expression does not include a non-deterministic function or a function that takes an external action.

The result data type of LISTAGG is determined by the input data type summarized in the following table:

Input data type	Output data type
CHAR(`n`) or VARCHAR(`n`)	VARCHAR
GRAPHIC(`n`) or VARGRAPHIC(`n`)	VARGRAPHIC
BINARY(`n`) or VARBINARY(`n`)	VARBINARY

Example: Using LISTAGG to output lists of employees who are in the same departments

Suppose you have a table called EMPLS that contains the employee name, employee ID, department ID, and year of birth of the employee.

DROP TABLE EMPLS;

CREATE TABLE EMPLS
(Dept_ID INT,
  E_Name VARCHAR(20),
  EMP_ID INT,
  Birth INT);

INSERT INTO EMPLS VALUES(10, 'Jack', 0012, 1983);
INSERT INTO EMPLS VALUES(10, 'Lily', 0015, 1990);
INSERT INTO EMPLS VALUES(20, 'Tom', 0019, 1983);
INSERT INTO EMPLS VALUES(20, 'Bob', 0022, 1976);
INSERT INTO EMPLS VALUES(20, 'Frank', 0004, 1983);
INSERT INTO EMPLS VALUES(20, 'Tom', 0014, 1985);
INSERT INTO EMPLS VALUES(30, 'Jerry', 0028, 1991);
INSERT INTO EMPLS VALUES(30, 'Chris', 0021, 1981);
INSERT INTO EMPLS VALUES(30, 'Jill', 0002, 1984);
INSERT INTO EMPLS VALUES(30, 'Jerry', 0031, 1984);
INSERT INTO EMPLS VALUES(30, 'Allan', 0006, 1995);

Dept_ID	E_Name	Emp_ID	Birth
10	Jack	0012	1983
10	Lily	0015	1990
20	Tom	0019	1983
20	Bob	0022	1976
20	Frank	0004	1983
20	Tom	0014	1985
30	Jerry	0028	1991
30	Chris	0021	1981
30	Jill	0002	1984
30	Jerry	0031	1984
30	Allan	0006	1995

Now suppose that you want to output the names of all the employees who are in the same department into a single row in ascending order according to their employee ID and birth date from this table.

One option is to first define a cursor to query the employees' names and departments from the table. You could then fetch data from the cursor and use application logic to identify employees who are in the same department and then concatenate their names. This approach is absolutely possible, but requires a lot of extra effort.

Another possible option is represented in the following SQLPL (note that # represents a SQL terminator here):

DROP TABLE RESULT#
CREATE TABLE RESULT(DEPT_ID INT, EMP_NAMES VARCHAR(100))#
DROP PROCEDURE AggEmpName#
CREATE PROCEDURE AggEmpName ()
    BEGIN
       DECLARE STMT VARCHAR(50) DEFAULT 'No name'; 

       DECLARE PRE_Dept_ID INT;
       SET STMT= '';
       SET PRE_Dept_ID= 0;
    
       FOR MYC CURSOR FOR
          SELECT E_Name, Dept_ID
          FROM EMPLS
          ORDER BY Dept_ID, Birth, Emp_ID
          DO
             IF PRE_Dept_ID = 0 THEN
               SET PRE_Dept_ID = Dept_ID;
               SET STMT = STMT ||E_Name||';';
          ELSE
             IF PRE_Dept_ID = Dept_ID THEN
               SET STMT = STMT ||E_Name||';';
             ELSE
                INSERT INTO RESULT VALUES (PRE_Dept_ID,STMT);
                SET PRE_Dept_ID = Dept_ID;
                SET STMT= '';
                SET STMT = STMT ||E_Name||';';
              END IF;
          END IF;
       END FOR;
  INSERT INTO RESULT VALUES (PRE_Dept_ID,STMT);
  END# 

CALL AggEmpName () #

SELECT * FROM RESULT#

First, you create a RESULT table to store the result (employee names that belong to the same department). Inside the body of SQLPL(AggEmpName), you concatenate the employee names for the same department in STMT. When the department ID changes, you insert the concatenated string to the RESULT table with the department ID. Then you reset the STMT and start to concatenate the name for the next department. Here's our result:

DEPT_ID     EMP_NAMES

10          Jack;Lily;
20          Bob;Frank;Tom;Tom;
30          Chris;Jill;Jerry;Jerry;Allan;
   3 record(s) selected

With the LISTAGG function, this task becomes very simple. Consider the following query:

SELECT Dept_ID,
       LISTAGG(ALL E_Name, '; ')
         WITHIN GROUP (ORDER BY Birth, Emp_ID) 
        AS Name_List
  FROM EMPLS
  GROUP BY Dept_ID;

The syntax is easy to understand. In each department (GROUP BY Dept_ID), you aggregate the employee names, and order by Birth and Emp_ID, as shown in the following result:

Dept_ID	Name_List
10	Jack; Lily
20	Bob; Frank; Tom; Tom
30	Chris; Jill; Jerry; Jerry; Allan

Example: Removing duplicate employee names from each department group created by LISTAGG

If you want to remove the duplicate names in each group, you can use the DISTINCT keyword, as shown in the following query:

SELECT Dept_ID,
       LISTAGG(DISTINCT E_Name, '; ')
         WITHIN GROUP (ORDER BY E_Name, Birth, Emp_ID) 
        AS Name_List
 FROM EMPLS
 GROUP BY Dept_ID;

Dept_ID	Name_List
10	Jack; Lily
20	Bob; Frank; Tom
30	Allan; Chris; Jerry; Jill

As you might notice, it's very easy to use and maintain (when comparing with implementing your own application logic for doing so).

Tip: Don't forget to put E_NAME in the ORDER BY clause. If you forget to do so and run the SQL statement as follows...

SELECT Dept_ID,
    LISTAGG(DISTINCT  E_Name, '; ')
      WITHIN GROUP (ORDER BY  Birth, Emp_ID)
        AS Name_List
 FROM EMPLS
GROUP BY Dept_ID;

...you'll get SQL error code -214:

-214, ERROR: AN EXPRESSION IN THE FOLLOWING POSITION, OR STARTING WITH 1 IN THE ORDER BY CLAUSE IS NOT

VALID. REASON CODE = 8

RC 8 DISTINCT is specified in the LISTAGG function and the first argument expression cannot be matched exactly
with the first sort key expression of the ORDER BY specified in the WITHIN GROUP clause. This reason code occurs 
only when clause-type is ORDER BY.

This rule is required to avoid non-deterministic results.

Now suppose that our table contains the following data (some columns are not shown here). Notice that there are two employees named Tom, each with a different year of birth.

E_Name	Birth
Tom	1983
Bob	1986
Tom	1990
Jill	1980

The result of the previous query (if you remove the restriction of -214) would be non-deterministic. It could be either:

Jill, Tom, Bob

or:

Jill, Bob, Tom

Without the restriction of -214, Db2 cannot be sure which row (Tom 1983 or Tom 1990) is duplicated and needs to be removed.

Comparing LISTAGG and XMLAGG

If you are familiar with XML, you might already know about the aggregate function in XML called XMLAGG. XMLAGG was introduced in Db2 9 for z/OS.

To achieve the same (or similar) results, the closest you can do with XMLAGG is:

SELECT DEPT_ID,
       XMLAGG(XMLTEXT(E_NAME || '; ')
           ORDER BY Birth, EMP_ID)     
       AS Name_List
FROM EMPLS
GROUP BY Dept_ID;

XMLAGG takes only the XML data type as input, so you need to apply XMLTEXT() on E_NAME, then concatenate with a ';' as a separator. Here's the result:

DEPT_ID     NAME_LIST
10          Jack; Lily;
20          Bob; Frank; Tom; Tom;
30          Chris; Jill; Jerry; Jerry; Allan;       
 3 record(s) selected

In this scenario, there are several notable differences between XMLAGG and LISTAGG:

You cannot eliminate duplicates inside XMLAGG. For LISTAGG, you can use the DISTINCT keyword.
The return data type of XMLAGG is always XML. So depending on how the result is used, the output of XMLAGG might need to be cast to the data type that the application requires.
The VARCHAR output of LISTAGG is limited to 32704. XMLLAGG doesn't have such a size limitation.
If you pay attention to the output of XMLAGG and LISTAGG, you'll see a ; (separator) at the end of XMLAGG output(NAME_LIST), but not at the end of LISTAGG output. In the XMLAGG statement, the separator is added "manually" by using the concatenate operator (||), whereas the separator in LISTAGG is "built-in," and is not added if there are no more tokens.