LISTAGG: The most important function in Db2 12 for z/OS function level 501
EricRadzinski 060000F2X2 Comment (1) Visits (6912)
This post is adapted from an a
In Db2 12 for z/OS continuous delivery, new capabilities and enhancements are delivered in the service stream as soon as they are ready. The LISTAGG function, which is used to aggregate a set of string values within a group into a single string, is the most important function in Db2 12 for z/OS function level 501. In this article, we'll introduce this new function with working examples, and then we'll compare it to a similar aggregate function, XMLAGG.
Software requirements and prerequisites
To use the new LISTAGG function:
Note: In addition to applying the APAR, you must explicitly activate the function level by issuing the ACTIVATE command. For example:
-db2a ACTIVATE FUNCTION LEVEL (V12R1M501)
As mentioned above, the LISTAGG function is used to aggregate a set of string values within a group into a single string by appending the string-expression values based on the order that's specified in the 'WITHIN GROUP' clause.
The following diagram shows the syntax for LISTAGG:
The input, string-expression must be a string (character, graphic, or binary), a numeric data type, or a LOB. The numeric argument and LOBs are supported via implicit casting (numeric to VARCHAR; CLOB to VARCHAR of max length of 32704; DBCLOB to VARGRAPHIC of max length of 16352; BLOB to VARBINARY of max length of 32704).
The separator can be a literal, special register, variable, or an expression that is based on literals, special registers, or variables, as long as the expression does not include a non-deterministic function or a function that takes an external action.
The result data type of LISTAGG is determined by the input data type summarized as follows:
Here's a simple example: suppose we have a table called JANE.EMP that contains the employee name, employee ID, department ID, and year of birth of the employee.
DROP TABLE JANE.EMP;
CREATE TABLE JANE.EMP
INSERT INTO JANE.EMP VALUES(10, 'Jack', 0012, 1983);
Suppose that we'd like to output all employees' names under the same department into one row with ascending order according to their employee ID and birth date from this table. How can we achieve this?
One option is to first define a cursor to query employees' names and department from the table, then fetch data from the cursor and use application logic to find out the employee from the same department and then concatenate their name. This is absolutely doable, but might need some work.
One possible implementation can be something similar to the following SQLPL (note that # represents a SQL terminator here):
DROP TABLE RESULT#
DECLARE PRE_Dept_ID INT;
CALL AggEmpName () #
SELECT * FROM RESULT#
First, we create a RESULT table to store the result (employee names that belong to the same department). Inside the body of SQLPL(AggEmpName), we concatenate the employee names for the same department in STMT. When the department ID changes, we insert the concatenated string to the RESULT table with the department ID. Then we reset the STMT and start to concatenate the name for the next department. Here's our result:
With the LISTAGG function, this task becomes very simple.
The syntax is easy to understand. In each department (GROUP BY Dept_ID), we aggregate the employee names, and order by Birth and Emp_ID, as shown in the following result:
If you want to remove the duplicate names in each group, you can use the DISTINCT keyword, like this:
As you might notice, it's very easy to use and maintain (when comparing with implementing your own application logic for doing so).
Tip: Don't forget to put E_NAME in the ORDER BY clause. If you forget to do so and run the SQL statement as follows...
...you'll get the following SQLCODE:
-214, ERROR: AN EXPRESSION IN THE FOLLOWING POSITION, OR STARTING WITH 1 IN THE ORDER BY CLAUSE IS NOT
VALID. REASON CODE = 8
RC 8 DISTINCT is specified in the LISTAGG function and the first argument expression cannot be matched exactly with the first sort key expression of the ORDER BY specified in the WITHIN GROUP clause. This reason code occurs only when clause-type is ORDER BY.
This rule is mandatory to avoid non-deterministic results.
Now let's have a look at the following example. Suppose our table contains the following data (some columns are not shown here). Notice that there are two employees named Tom, each with a different year of birth.
The result of query 3 (if we remove the restriction of -214) will be non-deterministic. It could be either:
Without restriction -214, Db2 for z/OS can't be sure which row (Tom 1983 or Tom 1990) is duplicated and needs to be removed.
Comparing LISTAGG and XMLAGG
If you are an XML fan, you probably know that there's an aggregate function in XML called XMLAGG. XMLAGG was introduced in Db2 9 for z/OS (almost 10 years ago).
To achieve the same (or similar) results, the closest we can do with XMLAGG is:
When comparing XMLAGG and LISTAGG in this scenario, there are a few differences:
Thanks to Swetha Sheth for her comments and assistance with this article.
Jane Man is a Senior Software Engineer specializing in modern application development on Db2 for z/OS at IBM's Silicon Valley Lab.
Xiao Bo Wang is a Senior Software Engineer on the Db2 for z/OS RDS team at IBM's China Development Lab.