Start of change

Grouping values by using the LISTAGG function

This tutorial shows you how to use the LISTAGG function, which is used to aggregate a set of string values within a group into a single string. This tutorial introduces the LISTAGG function, provides working examples, and provides a comparison with a similar aggregate function, XMLAGG.

Software requirements and prerequisites

To use the LISTAGG function:
  1. FL 501 You must activate Db2® 12 function level 501 or higher by applying the PTF for APAR PI70535.
    Note: In addition to applying the APAR, you must explicitly activate the function level by issuing the ACTIVATE command. For example:
    -db2a ACTIVATE FUNCTION LEVEL (V12R1M501)
  2. You might also need to bind or rebind your application with V12R1M501. The following example shows how to rebind DSNTEP3.DSNTEP3.(*) with V12R1M501:
    REBIND PACKAGE(DSNTEP3.DSNTEP3.(*)) APPLCOMPAT(V12R1M501)

About LISTAGG

The LISTAGG function is used to aggregate a set of string values within a group into a single string by appending the string-expression values based on the order that's specified in the 'WITHIN GROUP' clause.

  • As a single-set aggregate function, LISTAGG operates on all rows and returns a single output row.
  • As a group-set aggregate function, LISTAGG operates on and returns an output row for each group that's defined by GROUP BY clause.

The following diagram shows the syntax for LISTAGG:

Read syntax diagramSkip visual syntax diagramLISTAGG(ALLDISTINCT string-expression,separator)WITHIN GROUP(ORDER BY,sort-keyASCDESC)

The input, string-expression, must be a string (character, graphic, or binary), a numeric data type, or a LOB. The numeric argument and LOBs are supported via implicit casting (numeric to VARCHAR; CLOB to VARCHAR of max length of 32704; DBCLOB to VARGRAPHIC of max length of 16352; BLOB to VARBINARY of max length of 32704).

The separator can be a literal, special register, variable, or an expression that is based on literals, special registers, or variables, as long as the expression does not include a non-deterministic function or a function that takes an external action.

The result data type of LISTAGG is determined by the input data type summarized in the following table:

Input data type Output data type
CHAR(n) or VARCHAR(n) VARCHAR
GRAPHIC(n) or VARGRAPHIC(n) VARGRAPHIC
BINARY(n) or VARBINARY(n) VARBINARY

Here's a simple example: suppose we have a table called JANE.EMP that contains the employee name, employee ID, department ID, and year of birth of the employee.

DROP TABLE JANE.EMP;

CREATE TABLE JANE.EMP
(Dept_ID INT,
  E_Name VARCHAR(20),
  EMP_ID INT,
  Birth INT);

INSERT INTO JANE.EMP VALUES(10, 'Jack', 0012, 1983);
INSERT INTO JANE.EMP VALUES(10, 'Lily', 0015, 1990);
INSERT INTO JANE.EMP VALUES(20, 'Tom', 0019, 1983);
INSERT INTO JANE.EMP VALUES(20, 'Bob', 0022, 1976);
INSERT INTO JANE.EMP VALUES(20, 'Frank', 0004, 1983);
INSERT INTO JANE.EMP VALUES(20, 'Tom', 0014, 1985);
INSERT INTO JANE.EMP VALUES(30, 'Jerry', 0028, 1991);
INSERT INTO JANE.EMP VALUES(30, 'Chris', 0021, 1981);
INSERT INTO JANE.EMP VALUES(30, 'Jill', 0002, 1984);
INSERT INTO JANE.EMP VALUES(30, 'Jerry', 0031, 1984);
INSERT INTO JANE.EMP VALUES(30, 'Allan', 0006, 1995);
Dept_ID E_Name Emp_ID Birth
10 Jack 0012 1983
10 Lily 0015 1990
20 Tom 0019 1983
20 Bob 0022 1976
20 Frank 0004 1983
20 Tom 0014 1985
30 Jerry 0028 1991
30 Chris 0021 1981
30 Jill 0002 1984
30 Jerry 0031 1984
30 Allan 0006 1995

Now suppose that we want to output the names of all the employees who are in the same department into a single row in ascending order according to their employee ID and birth date from this table.

One option is to first define a cursor to query the employees' names and departments from the table. We could then fetch data from the cursor and use application logic to identify employees who are in the same department and then concatenate their names. This approach is absolutely possible, but requires a lot of extra effort.

Another possible option is represented in the following SQLPL (note that # represents a SQL terminator here):

DROP TABLE RESULT#
CREATE TABLE RESULT(DEPT_ID INT, EMP_NAMES VARCHAR(100))#
DROP PROCEDURE AggEmpName#
CREATE PROCEDURE AggEmpName ()
    BEGIN
       DECLARE STMT VARCHAR(50) DEFAULT 'No name'; 

       DECLARE PRE_Dept_ID INT;
       SET STMT= '';
       SET PRE_Dept_ID= 0;
    
       FOR MYC CURSOR FOR
          SELECT E_Name, Dept_ID
          FROM JANE.EMP
          ORDER BY Dept_ID, Birth, Emp_ID
          DO
             IF PRE_Dept_ID = 0 THEN
               SET PRE_Dept_ID = Dept_ID;
               SET STMT = STMT ||E_Name||';';
          ELSE
             IF PRE_Dept_ID = Dept_ID THEN
               SET STMT = STMT ||E_Name||';';
             ELSE
                INSERT INTO RESULT VALUES (PRE_Dept_ID,STMT);
                SET PRE_Dept_ID = Dept_ID;
                SET STMT= '';
                SET STMT = STMT ||E_Name||';';
              END IF;
          END IF;
       END FOR;
  INSERT INTO RESULT VALUES (PRE_Dept_ID,STMT);
  END# 

CALL AggEmpName () #

SELECT * FROM RESULT#

First, we create a RESULT table to store the result (employee names that belong to the same department). Inside the body of SQLPL(AggEmpName), we concatenate the employee names for the same department in STMT. When the department ID changes, we insert the concatenated string to the RESULT table with the department ID. Then we reset the STMT and start to concatenate the name for the next department. Here's our result:

DEPT_ID     EMP_NAMES

10          Jack;Lily;
20          Bob;Frank;Tom;Tom;
30          Chris;Jill;Jerry;Jerry;Allan;
   3 record(s) selected

With the LISTAGG function, this task becomes very simple. Consider the following query:

SELECT Dept_ID,
         LISTAGG(ALL E_Name, '; ')
         WITHIN GROUP (ORDER BY Birth, Emp_ID) 
         AS Name_List
FROM JANE.EMP
GROUP BY Dept_ID;

The syntax is easy to understand. In each department (GROUP BY Dept_ID), we aggregate the employee names, and order by Birth and Emp_ID, as shown in the following result:

Dept_ID Name_List
10 Jack; Lily
20 Bob; Frank; Tom; Tom
30 Chris; Jill; Jerry; Jerry; Allan

If you want to remove the duplicate names in each group, you can use the DISTINCT keyword, as shown in the following query:

SELECT Dept_ID,
       LISTAGG(DISTINCT E_Name, '; ')
         WITHIN GROUP (ORDER BY E_Name, Birth, Emp_ID) 
           AS Name_List
FROM JANE.EMP
GROUP BY Dept_ID;
Dept_ID Name_List
10 Jack; Lily
20 Bob; Frank; Tom
30 Allan; Chris; Jerry; Jill

As you might notice, it's very easy to use and maintain (when comparing with implementing your own application logic for doing so).

Tip: Don't forget to put E_NAME in the ORDER BY clause. If you forget to do so and run the SQL statement as follows...
SELECT Dept_ID,
    LISTAGG(DISTINCT  E_Name, '; ')
          WITHIN GROUP (ORDER BY  Birth, Emp_ID)
                AS Name_List
FROM JANE.EMP
GROUP BY Dept_ID;
...you'll get SQL error code -214:
-214, ERROR: AN EXPRESSION IN THE FOLLOWING POSITION, OR STARTING WITH 1 IN THE ORDER BY CLAUSE IS NOT

VALID. REASON CODE = 8

RC 8 DISTINCT is specified in the LISTAGG function and the first argument expression cannot be matched exactly
with the first sort key expression of the ORDER BY specified in the WITHIN GROUP clause. This reason code occurs 
only when clause-type is ORDER BY.

This rule is required to avoid non-deterministic results.

Now suppose that our table contains the following data (some columns are not shown here). Notice that there are two employees named Tom, each with a different year of birth.

E_Name Birth
Tom 1983
Bob 1986
Tom 1990
Jill 1980
The result of the previous query (if we remove the restriction of -214) would be non-deterministic. It could be either:
Jill, Tom, Bob
or:
Jill, Bob, Tom

Without the restriction of -214, Db2 can't be sure which row (Tom 1983 or Tom 1990) is duplicated and needs to be removed.

Comparing LISTAGG and XMLAGG

If you are familiar with XML, you might already know about the aggregate function in XML called XMLAGG. XMLAGG was introduced in Db2 9 for z/OS.

To achieve the same (or similar) results, the closest we can do with XMLAGG is:

SELECT DEPT_ID,
       XMLAGG(XMLTEXT(E_NAME || '; ')
           ORDER BY Birth, EMP_ID)     
       AS Name_List
FROM JANE.EMP
GROUP BY Dept_ID;

XMLAGG takes only the XML data type as input, so we need to apply XMLTEXT() on E_NAME, then concatenate with a ';' as a separator. Here's the result:

DEPT_ID     NAME_LIST
10          Jack; Lily;
20          Bob; Frank; Tom; Tom;
30          Chris; Jill; Jerry; Jerry; Allan;       
 3 record(s) selected
In this scenario, there are several notable differences between XMLAGG and LISTAGG:
  • We can't eliminate duplicates inside XMLAGG. For LISTAGG, we can use the DISTINCT keyword.
  • The return data type of XMLAGG is always XML. So depending on how the result is used, the output of XMLAGG might need to be cast to the data type that the application requires.
  • The VARCHAR output of LISTAGG is limited to 32704. XMLLAGG doesn't have such a size limitation.
  • If you pay attention to the output of XMLAGG and LISTAGG, you'll see a ; (separator) at the end of XMLAGG output(NAME_LIST), but not at the end of LISTAGG output. In the XMLAGG statement, the separator is added "manually" by using the concatenate operator (||), whereas the separator in LISTAGG is "built-in," and is not added if there are no more tokens.
End of change