Array values

An array is a structure that contains an ordered collection of data elements, in which each element can be referenced by its index value in the collection. The cardinality of an array is the number of elements in the array. All elements in an array have the same data type.

An ordinary array has a defined upper bound on the number of elements, known as the maximum cardinality. Each element in the array is referenced by its ordinal position as the index value. If N is the number of elements in an ordinary array, the ordinal position associated with each element is an integer value greater than or equal to 1 and less than or equal to N.

Ordinary arrays are functionally equivalent to Hive arrays. For compatibility reasons, when you declare ordinary array columns in Hadoop tables, the Hive syntax for arrays is supported.
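As a sketch of the two declaration styles (the table and column names here are illustrative, not from the product documentation):

```sql
-- Illustrative only: declare an ordinary array column using either
-- the SQL syntax or the equivalent Hive syntax.
CREATE HADOOP TABLE readings_sql (
  id INT,
  temps INT ARRAY[24]    -- SQL syntax: maximum cardinality of 24
);

CREATE HADOOP TABLE readings_hive (
  id INT,
  temps ARRAY<INT>       -- Hive syntax for the same element type
);
```

In either case, elements are referenced by ordinal position, starting at 1.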

An associative array has no specific upper bound on the number of elements. Each element is referenced by its associated index value. The data type of the index value can be an integer or a character string but is the same data type for the entire array.

Associative arrays are functionally equivalent to Hive maps. For compatibility reasons, when you declare associative array columns in Hadoop tables, the Hive syntax for maps is supported.
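A comparable sketch for an associative array column, again with illustrative names, assuming a VARCHAR index type:

```sql
-- Illustrative only: an associative array column indexed by a
-- character-string key, in SQL syntax and in Hive map syntax.
CREATE HADOOP TABLE scores_sql (
  id INT,
  score_by_name INT ARRAY[VARCHAR(20)]   -- index type is VARCHAR(20)
);

CREATE HADOOP TABLE scores_hive (
  id INT,
  score_by_name MAP<VARCHAR(20), INT>    -- equivalent Hive map syntax
);
```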

The maximum cardinality of an ordinary array is not related to its physical representation, unlike the maximum cardinality of arrays in programming languages such as C. Instead, the maximum cardinality is used by the system at run time to ensure that subscripts are within bounds. The amount of memory required to represent an ordinary array value is not proportional to the maximum cardinality of its type.

The amount of memory required to represent an array value is usually proportional to its cardinality. When an array is being referenced, all of the values in the array are stored in main memory. Therefore, arrays that contain a large amount of data will consume large amounts of main memory.

The ARRAY type is not supported for multi-row insert, update, or delete operations.

Restrictions on array columns in Hadoop tables

  • There is a combined limit of 32,763 ARRAY, ROW, and user-defined type definitions (SQLSTATE 54035). Each instance of ARRAY or ROW consumes one of these definitions; for example, a column of an ARRAY of ROW types consumes two definitions. If you exceed the limit, you must drop a Hadoop table that contains a column of a ROW or ARRAY type before you can create further ARRAY or ROW types.
  • An ARRAY type value can have a maximum length of 32,672 bytes after it is read from the data source for the Hadoop table. If a value exceeds this limit, a warning is logged in the Db2 Big SQL log file ($BIGSQL_DIST_VAR/logs/bigsql.log) and a NULL value is returned for this column. For more information about evaluating the length of an ARRAY value in a Hadoop table, see In-memory length of ARRAY values retrieved from Hadoop tables.
  • The following types are not permitted as ARRAY elements (SQLSTATE 42858):
    • XML
    • LONG VARCHAR
    • LONG VARGRAPHIC
    • Structured data types
    • User-defined types
    • ARRAY
  • The first column of a Hadoop table cannot be an ARRAY data type.
  • If you define a column as an ordinary ARRAY data type with an explicit cardinality, and the value from the source data for the Hadoop table contains more elements than the declared cardinality, the value is truncated to the defined cardinality when it is read.
  • Do not use expressions that result in an ARRAY data type in the following situations (SQLSTATE 428H2):
    • A top-level SELECT list (except in the fullselect of an INSERT statement)
    • A GROUP BY clause
    • An ORDER BY clause
    • A HAVING clause
    • A basic, quantified, BETWEEN, or IN predicate
    • The right side of an assignment to a local or global variable, or a cast
  • Do not use a column that is defined as an ARRAY data type in constraints, primary keys, foreign keys, or indexes (SQLSTATE 42962).
  • When you create a column of type ARRAY in the CREATE HADOOP TABLE statement, the ARRAY data type that is implicitly created cannot be used to define other objects, such as procedures, functions, or variables. It is also not compatible with the ARRAY data types that are defined by using the CREATE TYPE statement; values cannot be assigned from one to the other (SQLSTATE 42821).
  • Hadoop tables with an ARRAY data type can be used only in a SELECT statement and in an INSERT from a fullselect, where the source expression is an ARRAY type whose definition is identical to that of the destination column: the same element type, the same index type, and, if declared, the same maximum cardinality.
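The permitted INSERT from a fullselect can be sketched as follows; the table names are illustrative, and the key point is that both ARRAY definitions match exactly:

```sql
-- Both tables declare the same element type (INTEGER) and the same
-- maximum cardinality (10), so the INSERT from a fullselect is permitted.
CREATE HADOOP TABLE source_t (id INT, vals INT ARRAY[10]);
CREATE HADOOP TABLE target_t (id INT, vals INT ARRAY[10]);

INSERT INTO target_t
  SELECT id, vals FROM source_t;
```

If the two array definitions differed in element type, index type, or declared maximum cardinality, the statement would fail because values cannot be assigned between non-identical ARRAY definitions.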

In-memory length of ARRAY values retrieved from Hadoop tables

As a guideline, the in-memory length of an ARRAY that is retrieved from a Hadoop table is 6 bytes, plus the length of its individual elements, plus a per-element overhead: 3 to 6 bytes for an ordinary ARRAY, or a number of bytes equal to the length of the key (in characters) for an associative ARRAY. For example, an ordinary INTEGER ARRAY with 100 elements requires roughly 6 + 100 × (4 + 3) = 706 bytes to 6 + 100 × (4 + 6) = 1,006 bytes. Use the following table to determine how much space each element requires based on its data type.
Table 1. Number of bytes of elements by data type

  Element data type   Number of bytes
  -----------------   ------------------------------------------------
  BOOLEAN             4
  TINYINT             4
  SMALLINT            4
  INTEGER             4
  BIGINT              8
  REAL                8
  DOUBLE or FLOAT     8
  DECIMAL(P,S)        5 + (precision + 2) / 2
  VARCHAR             5 + the actual number of characters in the string
  STRING              5 + the actual number of characters in the string
  DATE                8
  TIMESTAMP           8
The following table shows examples of the maximum cardinality that can be declared for an ARRAY of a given data type so that it fits into the limit of 32,672 bytes for ARRAY values in Hadoop tables.
Table 2. Examples of maximum cardinality for ARRAY values in Hadoop tables

  Array column definition   Maximum cardinality (N)
  -----------------------   -----------------------
  INTEGER ARRAY[N]          3377
  TIMESTAMP ARRAY[N]        2412
  DECIMAL(8,2) ARRAY[N]     2111
  VARCHAR(1024) ARRAY[N]    31