Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your profile (name, country/region, and company) is displayed to the public and will accompany any content you post. You may update your IBM account at any time.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

Exchange data using arrays in SQL PL

New capabilities for exchanging data between applications and stored procedures

Gustavo Arocena (gustavo@ca.ibm.com), Senior Developer, IBM Toronto Lab
Gustavo Arocena
Gustavo Arocena is a Senior Developer in the DB2 SQL Compiler area. He joined the IBM Toronto Lab in 1998 and is currently responsible for SQL Procedures and SQL Parser development. Gustavo holds a Master's degree in Computer Science from the University of Toronto in the area of database query languages.
Nailah Bissoon (nbissoon@ca.ibm.com), Performance Quality Assurance Team Leader, IBM
Nailah Bissoon photo
Nailah Ogeer Bissoon, M.Sc. is currently a member of the DB2 Performance Quality Assurance team. Nailah joined the IBM Toronto Lab in 2004 and has worked on several DB2 performance benchmark publications since then. Her tasks also include assisting customers in performance-sensitive areas and ensuring that DB2 continues to be a performance leader among other Database Management Systems.

Summary:  IBM DB2 9.5 for Linux, UNIX, and Windows introduces support for array data types in SQL Procedural Language (SQL PL). In addition to being useful as a data structure to develop algorithms, arrays are particularly useful in SQL PL as a vehicle for exchanging collections of data between applications and stored procedures and between stored procedures and relational statements. This article gives you an overview of arrays in SQL PL and includes examples that illustrate the different operations on arrays, the different ways of creating array values, and the use of arrays on the client side. You will also see a brief comparison of DB2's arrays with the VARRAY data type supported in Oracle.

[17 Mar 2011: This article was updated to include an additional example code listing, Listing 10, and a section that discusses the performance impact of arrays (see the "Performance study" section). --Ed.]

Date:  17 Mar 2011 (Published 24 Oct 2007) PDF:  A4 and Letter (92KB | 18 pages)Get Adobe® Reader®
Also available in:   Chinese

Activity:  22100 views
Comments:  

Introduction

Much of the power of the relational model comes from the fact that the relational table is the single data structure that represents homogeneous collections of data. Tables are a convenient way to represent both persistent data and query results.

However, there are circumstances where using tables to represent collections results in bulky, and possibly inefficient, code. Think, for example, of an application that calls a stored procedure with the list of items that a customer has just purchased. Without a collection type such as arrays, the application would need to store the list of items on a table prior to calling the stored procedure, have the stored procedure read the list by selecting from the table and then delete the list.

To avoid the overhead associated with using a table, the application could encode the list of items in some string format and have the procedure decode the string to extract the items. Compared to simply passing on the list represented as an array, both workarounds are cumbersome and inefficient.

DB2 9.5 introduces support for array data types in SQL PL to help with application development issues like those described above. Given that arrays are a staple in all major programming languages, the concept of array doesn't need much explanation. Instead, take a look at a simple example that illustrates arrays in SQL PL. Listing 1 shows an array data type and a procedure that uses it.


Listing 1. An array data type and a procedure that uses it
1.  CREATE TYPE phonenumbers AS VARCHAR(12) ARRAY[1000]
2.  -- Procedure find_customers searches for numbers in
3.  -- numbers_in that begin with the given area_code,
4.  -- and reports them in numbers_out.
5.  -- Phone numbers are strings of the form 416-413-9394
6.  CREATE PROCEDURE find_customers(
7.         IN numbers_in phonenumbers,
8.         IN area_code CHAR(3),
9.         OUT numbers_out phonenumbers)
10.  BEGIN
11.    DECLARE i, j, max INTEGER;
12.    SET i = 1;
13.    SET j = 1;
14.    SET numbers_out = NULL;
15.    SET max = CARDINALITY(numbers_in);
16.    WHILE i <= max DO
17.      IF substr(numbers_in[i], 1, 3) = area_code THEN
18.        SET numbers_out[j] = numbers_in[i];
19.        SET j = j + 1;
20.      END IF;
21.      SET i = i + 1;
22.    END WHILE;
23.  END

Line 1 in Listing 1 shows a create type statement that creates a type named phonenumbers. This type has a maximum cardinality of 1000 -- meaning that values of this type can have between 0 and 1000 elements, and its element type is VARCHAR(12). Procedure find_customers, takes two parameters of type phonenumbers (one IN and one OUT), and populates the OUT parameter with the input numbers that begin with a given area code. As seen in the example, array sub-indexing is performed using the familiar square bracket syntax, and the sub-index of the first element is 1. Function CARDINALITY, used in line 15, returns the number of elements in the input array. The WHILE loop scans the input array; when an element begins with the required area code, it is added to the output array.

The maximum cardinality of an array type is used by DB2 at runtime to ensure that subscripts are within bounds. In this example, DB2 alerts you to a runtime error when a sub-index used on a variable of type phoneList is less than 1 or greater than 1000. If, when defining an array type, there is no clear value for its maximum cardinality, it can be omitted. This step is shown in the listing below:

CREATE TYPE unboundedPhoneList AS VARCHAR(12) ARRAY[];

When the maximum cardinality in a type definition is omitted, the sub-indexes on values of that type can range from 1 to the maximum positive value of integer type. However, the memory allocated for an array is based on its cardinality, and not on the maximum cardinality of its type. Therefore, omitting the maximum cardinality does not have any impact on the amount of allocated memory. Memory management is discussed in greater detail in a later section of this article.


Adding elements to an array

In the example of Listing 1, new elements are always appended at the end, and therefore numbers_out's cardinality increases by 1 each time an element is added as shown in line 18. In general, however, the elements of an array can be set in any order.

When an array variable is NULL or when the element being assigned is in a position beyond the current cardinality, all the elements between the old last element and the new one are implicitly assigned the NULL value. For example, assume that variable phoneList, of type phonenumbers is currently NULL. After executing the statement

SET phoneList[10] = '416-413-9394';

the cardinality of phoneList is 10, and elements in positions 1 to 9 are NULL. Elements in positions 11 and up do not exist (note that this is not the same as being NULL). Therefore, a statement such as

SET phone = phoneList[11];

creates a runtime error, as it refers to a non-existing element.

If you now executed the statement

SET phoneList[12] = '416-413-7727';

the cardinality of phoneList would be 12, and the statement

SET phone = phoneList[11];

would set variable phone to NULL.


Using array constructors

It is not necessary to initialize array elements one at a time, as in Listing 1. When the values of all the elements in the array are known, an array can be created in a single operation with the help of array constructors. There are two types of array constructors:

  • Constructor by enumeration
  • Constructor by query

Constructors by enumeration allow you to specify the values of all the elements as comma-separated scalar values, as shown in the following two listings. The values in the following listing are all literals, which is the most common case when initializing arrays. In general, each value in the enumeration can be an arbitrary scalar SQL expression.

SET phoneList = ARRAY['416-413-9394', '416-413-7727', '416-413-6254'];

Listing 2 shows an array constructor with four elements with values given by the NULL constant, a variable, a literal and a scalar sub-query, respectively.


Listing 2. Array constructor with four elements
SET phoneList = ARRAY[NULL, phone, '416-413-7727',
                      (SELECT phone
                       FROM authors
                       WHERE author_id = 100)];

An array constructor with no elements represents an empty array (that is, an array with cardinality equal to 0). Empty arrays should not be confused with NULL arrays or with arrays containing just NULL values. Each of the statements in Listing 3 sets variable phoneList to a different value. Please pay careful attention to the comment beside each statement.


Listing 3. NULL and empty array values
SET phoneList = NULL; /* NULL value. */
SET phoneList = ARRAY[]; /* empty array */
SET phoneList = ARRAY[NULL]; /* array with one NULL element */

Another way of building an array in a single operation is by using a constructor by query, which creates an array with the values returned by a single-column sub-select, as shown in the listing below:

SET phoneList = ARRAY[SELECT phone FROM authors WHERE city = 'Toronto'];

If the sub-select returns an empty table, the result of the constructor is the empty array. And, as expected, when the number of elements specified in an array constructor exceeds the array's maximum cardinality, DB2 notifies you of an error.


Allocating memory to arrays

Arrays in SQL PL are allocated and reallocated dynamically. When the value of an array variable is NULL, there is no memory allocated for it. Contrast this to arrays in C where space for the whole array is allocated up front.

When an array variable is assigned an initial value, DB2 allocates enough memory to hold that value. Later, If the variable is assigned a new value that doesn't fit in the allocated memory, DB2 implicitly performs a reallocation.

The amount of memory allocated for a variable depends on a few factors. If the maximum possible size of the array (given its type) is not deemed too big, DB2 allocates the space for maximum size the first time a value is assigned to a variable. This guarantees that no reallocation is necessary regardless of how many elements are added to the array later on and regardless of the size of those elements.

How big is too big is something that DB2 decides based on the availability of free memory. If the maximum possible size of the array is deemed too big, DB2 only allocates enough space to hold the initial value, and possibly extra space to allow for some growth.


Trimming arrays

In the examples so far, you have seen different ways of putting arrays together. SQL PL also provides the TRIM_ARRAY function, to remove elements from arrays. The following listing shows TRIM_ARRAY being used to remove the last two elements from phoneList:

SET phoneList = TRIM_ARRAY(phoneList, 2);

The value of the second argument to TRIM_ARRAY must be a value between 1 and the cardinality of the first argument.


Turning tables into arrays

At the beginning of this article, I argued that the main motivation behind array support in SQL PL is to provide a simpler alternative to relational tables for representing relatively small collections. It is often the case, however, that the initial value for an array is obtained by extracting values from a table. Similarly, very often, the contents of an array needs to be turned into rows in a table, either to be stored or to be joined with other tables. To make such conversions between arrays and tables possible, DB2 provides the new ARRAY_AGG function and UNNEST operator.

The ARRAY_AGG function is a column function that aggregates values into an array.

Suppose you have this table:


Table 1. Customer table
CUST_IDNAMEPHONELOCATION
17Joe Cat416-305-3745Toronto
113Tom Dog905-305-3747Ajax
716Sam Bear416-305-3746Toronto
5Jill Bird905-723-1662Markham
221Kim Frog416-478-9683Toronto


Assuming you have the customers table shown in Table 1, the statement below sets variable phones to an array with all the phone numbers of customers in Toronto:

SET phones = (SELECT ARRAY_AGG(phone) FROM customers WHERE location = 'Toronto');

The above statement generates the phones in an arbitrary order. If the order is relevant, an ORDER BY clause can be specified in ARRAY_AGG, as illustrated in the listing below:

SET phones = (SELECT ARRAY_AGG(phone ORDER BY name) 
   FROM customers WHERE location = 'Toronto');
 

Note that an ORDER BY clause in the SELECT statement would not have any effect on the order of the array elements, as it would apply to the rows returned by the statement. In this example, there is only one such row.

Simple uses of ARRAY_AGG can be expressed equivalently using array constructors by query. However, the ARRAY_AGG function can also be used in statements that return more than one row and/or more than one column.

For example, the statement in Listing 4 defines a cursor that creates arrays of phones and IDs for customers in each city. The result of the statement is shown in Table 2.


Listing 4. Statement to create arrays of phones and IDs for customers in each city

DECLARE myCursor CURSOR FOR
  SELECT location,
         ARRAY_AGG(phone ORDER BY name) AS PHONES,
         ARRAY_AGG(cust_id) AS IDS
  FROM customers
  GROUP BY location

Table 2 shows the result of the statement in Listing 4.


Table 2. Statement results
LOCATIONPHONEIDs
Toronto
1: 416-305-37451: 17
2: 416-305-37462: 716
3: 416-305-37473: 221
Ajax1: 905-305-37471: 113
Markham1: 905-723-16621: 5

Turning arrays into tables

Once a collection of data has been computed, it is often necessary to use this collection in SQL statements, for example to insert the data into a table or to join it with data stored in tables. The UNNEST operator in SQL PL makes such an operation possible.

In its simplest form, UNNEST takes an array and produces a one-column table. The statement in Listing 5 selects the customer IDs for customers with a phone number in phoneList.


Listing 5. Selecting customer IDs
SELECT cust_id
FROM customers, UNNEST(phoneList) as T(phone)
WHERE customers.phone = T.phone

In its most general form, UNNEST can take several arrays as arguments, and it can produce an additional column to reflect the order imposed by the sub-indexes. For the query in Listing 6, assume that productIds and productQs are two arrays containing the ids and quantities for the line items in an order. The line item number is reflected in the array sub-index. The UNNEST in Listing 6 creates a three-column table which is inserted into the order_items table. The WITH ORDINALITY clause causes UNNEST to generate an additional column of type integer that contains the position associated with each element.


Listing 6. WITH ORDINALITY clause

INSERT INTO order_items
       (SELECT orderId, T.index, T.pid, T.pq
        FROM UNNEST(productIds, productQs)
             WITH ORDINALITY as T(pid, pq, index)

To wrap up the coverage of conversion between arrays and tables, revisit procedure phonenumbers from the first example. The version presented in Listing 1 uses a loop to process the input array and populate the output array. However, it turns out that all the logic in the procedure can be expressed as a single SQL statement that uses UNNEST and ARRAY_AGG, as shown in Listing 7:


Listing 7. Procedure find_customers revisited
1. CREATE TYPE phonenumbers AS VARCHAR(12) ARRAY[1000]
2. -- Procedure find_customers searches for numbers in
3. -- numbers_in that begin with the given area_code,
4. -- and reports them in numbers_out.
5. -- Phone numbers are strings of the form 416-413-9394
6. 
7. CREATE PROCEDURE find_customers(
8. IN numbers_in phonenumbers,
9. IN area_code CHAR(3),
10. OUT numbers_out phonenumbers)
11. BEGIN
12. SET numbers_out =
13.     (SELECT ARRAY_AGG(T.num)
14.      FROM UNNEST(numbers_in) AS T(num)
15.      WHERE substr(T.num, 1, 3) = area_code);
16. END

The last example proves that while arrays allowed us to pass a collection of values into a stored procedure efficiently and concisely, once inside the procedure, it may still be beneficial to turn the array into a table to exploit the powerful set-at-a-time semantics of SQL to process the array elements.


Using arrays on the client side

The examples so far demonstrate arrays in SQL PL. On the client side, arrays are supported in Java Database Connectivity (JDBC), command line interface (CLI) and in the Command Line Processor. Procedures with array parameters can be called from any of these interfaces. Listing 8 shows a CALL statement issued from the CLP to the procedure find_customers defined earlier, and the output generated by the CLP.


Listing 8. Calling procedures with array parameters from the CLP
db2 CALL find_customers(ARRAY['416-305-3745', '905-414-4565', '416-305-3746'], '416', ?)

  Value of output parameters
  --------------------------
  Parameter Name  : NUMBERS_OUT
  Parameter Value : [416-305-3745,
                     416-305-3746]

  Return Status = 0

Listing 9 shows a Java snippet that illustrates how to call procedure find_customers from a JDBC application.


Listing 9. Calling procedure find_customers from a JDBC application


import java.sql.*;

public class array_sample
{
  public static void main(String argv[])
  {
    com.ibm.db2.jcc.DB2Connection con = null;
    try
    {
      // Connect to the db
      Class.forName("com.ibm.db2.jcc.DB2Driver").newInstance();
      con = (com.ibm.db2.jcc.DB2Connection)
                DriverManager.getConnection("jdbc:db2:test");

      // Create an array of strings
      String[] phones=new String[10];
      phones[0]="416-305-3745";
      phones[1]="905-414-4565";
      phones[2]="416-305-3746";

      // Create an SQL ARRAY value of type "ARRAY of VARCHAR"
      java.sql.Array phonesArrayIn =con.createArrayOf("VARCHAR", phones);

      // Prepare the call statement
      CallableStatement callStmt =
            con.prepareCall("CALL find_customers(?, ?, ?)");

      // Set IN parameters
      callStmt.setArray(1, phonesArrayIn);
      callStmt.setString(2, "416");

      // Register OUT parameter
      callStmt.registerOutParameter(3, java.sql.Types.ARRAY);

      // Call the procedure
      callStmt.execute();

      // Get value of OUT parameter
      java.sql.Array phonesArrayOut = callStmt.getArray(3);
      String[] phonesOut = (String [])phonesArrayOut.getArray();

      // Print result
      System.out.println("Result:");
      for(int i = 0; i < phonesOut.length; i++)
      {
        System.out.println(phonesOut[i]);
      }

      con.close();
    }
    catch (Exception e)
    {
      e.printStackTrace();
    }
  }
}

Listing 10 shows a DB2 CLI snippet that illustrates how to call procedure find_customers from a DB2 CLI application.


Listing 10. Calling procedure find_customers from a CLI application


#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <sqlcli1.h>
#include <sql.h>
#include <sqlenv.h>

int rc = 0;

int main(int argc, char *argv[])
{
  SQLHANDLE henv; /* environment handle */
  SQLHANDLE hdbc; /* connection handle */
  SQLHSTMT stmt; /* statement handle*/

  char dbAlias[SQL_MAX_DSN_LENGTH + 1];
  char user[10];
  char pswd[10];

  if (argc == 1)
  {
    strcpy(dbAlias ,"TEST");
    strcpy(user, "");
    strcpy(pswd, "");
  }
  else 
  {
    printf("Incorrect usage \n");
    return -1; 
  }
  printf("Connecting to %s\n",dbAlias);

  SQLRETURN cliRC = SQL_SUCCESS;

  /* allocate an environment handle */
  cliRC = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &henv);
  if (cliRC != SQL_SUCCESS)
  {
    printf("\n--ERROR while allocating the environment handle.\n");
    return 1;
  }

  /* allocate a database connection handle */
  cliRC = SQLAllocHandle(SQL_HANDLE_DBC, henv, &hdbc);
  if (cliRC != SQL_SUCCESS)
  {
    printf("\n--ERROR while allocating the database handle.\n");
    return 1;
  }
  printf("\n  Connecting to %s...", dbAlias);

  /* connect to the database */
  cliRC = SQLConnect(hdbc,
                     (SQLCHAR *)dbAlias,
                     SQL_NTS,
                     (SQLCHAR *)user,
                     SQL_NTS,
                     (SQLCHAR *)pswd,
                     SQL_NTS);
  printf("  Connected to %s.\n", dbAlias);

  /* allocate a statement handle */
  cliRC = SQLAllocHandle(SQL_HANDLE_STMT, hdbc, &stmt);
  if (cliRC != SQL_SUCCESS)
  {
    printf("\n--ERROR while allocating the statement handle.\n");
    return 1;
  }

  int i=0;
  // Declare local C variables to store the input and output to the stored procedures 
  char numbers_in[1000][13]={0};
  char numbers_out[1000][13];
  char area_code[3];

  //declare IPD/APD attributes
  SQLHANDLE hIPD;
  SQLHANDLE hAPD;
  // Define the size of the arrays
  SQLLEN size = 1000;

  sprintf(area_code,"416");
  printf("%s\n",area_code);
  sprintf(numbers_in[0],"416-305-3745");
  sprintf(numbers_in[1],"905-414-4565");
  sprintf(numbers_in[2],"416-305-3746");

  //Bind input/output parameters
  if (SQLBindParameter(stmt, ++i, SQL_PARAM_INPUT, SQL_C_CHAR,
      SQL_VARCHAR, 12, 0, &numbers_in, 13, NULL)!=SQL_SUCCESS || 
      SQLBindParameter(stmt, ++i, SQL_PARAM_INPUT, SQL_C_CHAR,               
      SQL_CHAR, 3, 0, area_code, 3, NULL)!=SQL_SUCCESS || 
      SQLBindParameter(stmt, ++i, SQL_PARAM_OUTPUT, SQL_C_CHAR, 
      SQL_VARCHAR, 12, 0, &numbers_out, 13, NULL)!=SQL_SUCCESS)
  {

    printf ("Binding unsuccessful \n");
    return 1;
  }
  
  //GetStmtAttr for IPD/APDs        
if(SQLGetStmtAttr(stmt,SQL_ATTR_IMP_PARAM_DESC,&hIPD,SQL_IS_INTEGER,NULL)!=SQL_SUCCESS ||
  SQLGetStmtAttr(stmt,SQL_ATTR_APP_PARAM_DESC,&hAPD,SQL_IS_INTEGER,NULL)!=SQL_SUCCESS)
  {
     printf ("GetStmtAttr unsuccessful\n");
     return 1;
  }
   // For input array parameters, define the SQL_DESC_CARDINALITY attribute of IPD 
  if( SQLSetDescField(hIPD,1, SQL_DESC_CARDINALITY,(SQLPOINTER)1000,SQL_IS_INTEGER) 
  !=SQL_SUCCESS || 
   // For output array parameters, define the SQL_DESC_CARDINALITY field for APD 
      SQLSetDescField(hAPD,3, SQL_DESC_CARDINALITY,(SQLPOINTER)1000, SQL_IS_INTEGER) 
      !=SQL_SUCCESS ||
   // For all array parameters, define the SQL_DESC_CARDINALITY_PTR field for APD 
      SQLSetDescField(hAPD,1,SQL_DESC_CARDINALITY_PTR,&size,SQL_IS_INTEGER) 
      !=SQL_SUCCESS || 
     SQLSetDescField(hAPD,3, SQL_DESC_CARDINALITY_PTR, &size, SQL_IS_INTEGER) 
      != SQL_SUCCESS)
  {  
     printf ("SQLSetDescField unsuccessful\n");
     return 1;
  }
  if (SQLPrepare(stmt, (SQLCHAR*)"CALL find_customers(?, ?, ?)", SQL_NTS)!= SQL_SUCCESS)
  {
     printf ("SQLPrepare unsuccessful\n");
     return 1;
  }
  rc = SQLExecute(stmt);
  printf ("SQLExec: rc %d %d \n",rc,SQL_SUCCESS );

  cliRC = SQLFreeHandle(SQL_HANDLE_STMT,stmt);

  /* free the statement handle */
  cliRC = SQLFreeStmt(stmt, SQL_CLOSE);
  return 0;
}



Comparing DB2's arrays with Oracle's VARRAYs

Developers migrating applications from Oracle to DB2 will find the new array support in DB2 particularly useful, as arrays are commonly used in Oracle PL/SQL. This appendix outlines the main differences between the ARRAY data type in DB2 and Oracle's VARRAY, and how the most common operations on VARRAYs can be mapped to ARRAYs.

In DB2, variables of type array are like variables of any other SQL data types. And, like with other data types, DB2 provides some functions that take arrays as parameters and/or return arrays as results, such as CARDINALITY and TRIM_ARRAY.

In Oracle, VARRAY is an object type. VARRAYs are manipulated using methods, instead of functions. Listing 11 illustrates this difference between DB2 and Oracle with a statement that trims the last two elements from myArray and a statement that saves myArray's cardinality into myCard.


Listing 11. Trimming array elements in DB2 and in Oracle
DB2:
myArray = TRIM_ARRAY(myArray, 2);
myCard = CARDINALITY(myArray);

Oracle:
myArray.TRIM(2);
myCard = myArray. COUNT;

As you saw in Section 1, DB2 implicitly allocates memory for arrays when they are initialized with a non-NULL value or when they grow past their current size. On the other hand, Oracle requires explicit allocation operations prior to adding elements to an array. Listing 12 illustrates this difference.


Listing 12. Appending elements in DB2 and in Oracle
DB2:
myArray[CARDINALITY(myArray) + 1] = 1000;

Oracle
myArray.EXTEND;
myArray(myArray,COUNT + 1) = 1000;

Listing 12 also illustrates the different syntax for sub-indexing in DB2 (square brackets) and in Oracle (parentheses).

Oracle applications involving arrays often make use of bulk bind operations, which execute SQL DML statements repeatedly, for each of the elements in an array. In Oracle, these operations are FORALL and BULK COLLECT. DB2 doesn't provide direct equivalents to these operations, but the most common uses of FORALL and BULK COLLECT can be efficiently mapped to statements involving UNNEST and ARRAY_AGG, respectively.

The FORALL statement repeats another statement for each of the elements in an array. Listing 13 shows an example of a FORALL statement that repeats an INSERT for each of the elements in arrays departments.


Listing 13. FORALL statement
FORALL i IN depts.FIRST..depts.LAST
INSERT INTO emp(deptno) values(depts(i));

Listing 14 shows the DB2 equivalent using UNNEST.


Listing 14. Mapping FORALL to UNNEST

INSERT INTO emp (deptno)
      (SELECT T.deptno FROM UNNEST(depts))

Let's now see how Oracle's BULK COLLECT can be mapped to ARRAY_AGG. BULK COLLECT is not a statement on its own, but an optional clause that can be applied to SELECT INTO and FETCH statements.

An Oracle SELECT INTO statement that includes a BULK COLLECT clause is similar to a SELECT with ARRAY_AGG in DB2. Regular SELECT INTO statements can return only one row. When the BULK COLLECT clause is included, the statement can return multiple rows; the variable in the INTO clause must be an array, and each row is stored in an array element.

Listing 15 illustrates the use of BULK COLLECT.


Listing 15. BULK COLLECT clause


DECLARE
  names NameList;
BEGIN
  SELECT ename BULK COLLECT INTO names 
  FROM emp WHERE sal > 1000;
END;

Listing 16 shows the DB2 equivalent.


Listing 16. Mapping BULK COLLECT to ARRAY_AGG
BEGIN
  DECLARE names NameList;
 
  SET names = (SELECT ARRAY_AGG(ename) FROM emp WHERE sal > 1000);
END;

Similarly, when a FETCH statement includes the BULK COLLECT clause, the variable used in the FETCH must be an array. ORACLE implicitly repeats the FETCH until all the rows from the cursor are consumed and stores each row in an element of the array.

In addition to the differences mentioned so far, Oracle supports storing VARRAYs as a column type, and supports the creation of arrays of records and arrays of arrays. These features are not yet supported in DB2.


Performance study

To measure the performance impact of arrays, a 900GB database was used and an OLTP (Online Transactional Processing) workload was exercised. The workload consisted of approximately 60% select statements and 40% insert, update, and delete statements. Stored procedures and a DB2 CLI application was used to drive the workload. Two distinct sets of stored procedures were implemented.

For the first set of stored procedures, DB2 arrays were used when possible. In the second set of stored procedures, strings were used in place of arrays, and cursors were heavily used to fetch rows from multi-row result sets. For each row retrieved from the cursor call, the element was appended to the string along with a delimiter. A user-defined function was used to break up the string into individual elements when necessary. Listing 17 shows an example of a CALL statement issued from the CLP to procedure alternate_find_customers and the output generated by the CLP. A DB2 CLI application was implemented for each version of the stored procedure.


Listing 17. Calling procedures defined using varchars instead of arrays from the CLP
db2 CALL find_customers(ARRAY['416-305-3745;905-414-4565;416-305-3746', '416', ?)

  Value of output parameters
  --------------------------
  Parameter Name  : NUMBERS_OUT
  Parameter Value : 416-305-3745;416-305-3746

  Return Status = 0

DB2 array data types yielded significantly higher performance compared to varchar representations of arrays. We measured a ~70% improvement when taking both throughput and user CPU utilization into consideration (normalized throughput in the chart in Figure 1 below). Most of the performance improvement stems from the simplification of the stored procedures and DB2 CLI application code, namely the removal of cursor definitions and accesses, as well as removal of extra local variables and user-defined functions that are not needed when using DB2 arrays. This results in a reduction in user CPU utilization, which translates into throughput gains.


Figure 1. Normalized throughput
Bar chart comparing performance of DB2 arrays and varchar representation of arrays, with Varchar representations of arrays having a value of 100 normalized throughput, versus DB2 arrays having a value of 170.

Conclusion

This article presents an overview of the recently introduced support for array data type in DB2 V9.5. Like arrays in other programming languages, DB2 arrays can be used as an auxiliary data structure when coding algorithms in SQL PL. But more importantly, arrays are a very convenient representation for collections of data that flow between an application and stored procedures or between stored procedures.

In addition to the basic operations on arrays (such as sub-indexing, cardinality and trimming), DB2 has integrated arrays into the relational model in such a way that a result set can be turned into an array (for example, when it needs to be passed as a parameter to another stored procedure), and an array can be turned into a table. The latter conversion allows the data in the array to be joined with data in other tables, and allows the set-at-a-time semantics of SQL to be applied to collections represented as arrays.

JDBC and DB2 CLI array support is also available, and there is evidence to suggest that there can be performance gains if stored procedures and applications are updated to use the new DB2 array data types.


Acknowledgements

Thanks to Serge Rielau for reviewing an early draft of this article.


Resources

Learn

Get products and technologies

Discuss

About the authors

Gustavo Arocena

Gustavo Arocena is a Senior Developer in the DB2 SQL Compiler area. He joined the IBM Toronto Lab in 1998 and is currently responsible for SQL Procedures and SQL Parser development. Gustavo holds a Master's degree in Computer Science from the University of Toronto in the area of database query languages.

Nailah Bissoon photo

Nailah Ogeer Bissoon, M.Sc. is currently a member of the DB2 Performance Quality Assurance team. Nailah joined the IBM Toronto Lab in 2004 and has worked on several DB2 performance benchmark publications since then. Her tasks also include assisting customers in performance-sensitive areas and ensuring that DB2 continues to be a performance leader among other Database Management Systems.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your profile (name, country/region, and company) is displayed to the public and will accompany any content you post. You may update your IBM account at any time.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=264536
ArticleTitle= Exchange data using arrays in SQL PL
publish-date=03172011
author1-email=gustavo@ca.ibm.com
author1-email-cc=
author2-email=nbissoon@ca.ibm.com
author2-email-cc=