Row values

A row is a structure that contains an ordered sequence of up to 1,012 named fields, each with an associated data type, that can be referenced by their respective name. There cannot be two fields with the same name in a given row type.

ROWs are functionally equivalent to Hive STRUCTs. For compatibility reasons, the Hive syntax for STRUCT data types is supported when defining ROW columns in Hadoop tables.

Restrictions on ROW columns in Hadoop tables

  • You can define only 32,673 of ARRAY or ROW types for the Db2 Big SQL database (SQLSTATE 54035). For example, each instance of ARRAY or ROW consumes one of these entries, therefore a column of an ARRAY of ROW types consumes two entries. If you exceed the limit, you must drop a HADOOP table that contains a ROW or ARRAY defined as a column type.
  • A ROW type value can have a maximum length of 32,672 bytes after it is read from the Hadoop data source. If a value exceeds this limit, a warning is logged in the Db2 Big SQL log file ($BIGSQL_DIST_VAR/logs/bigsql.log) and a NULL value is returned for this column. For more information about evaluating the length of a ROW value in a Hadoop table, see In-memory length of ROW values retrieved from Hadoop tables.
  • The following types are not permitted as ROW fields (SQLSTATE 42858, 429C5):
    • XML
    • Structured data types
    • User-defined types.
    • ARRAY
    • ROW
  • The first column of a Hadoop table cannot be a ROW data type.
  • Do not use expressions that result in a ROW data type in the following situations (SQLSTATE 428H2):
    • A top level SELECT list (except in the fullselect of an INSERT statement).
    • A GROUP BY clause.
    • An ORDER BY clause.
    • A HAVING clause.
    • A basic, quantified, BETWEEN or IN predicate.
    • The right side of an assignment to a local or global variable, or a cast.
  • Do not use a column that is defined as a ROW data type in constraints, primary keys, foreign keys, or indexes. (SQLSTATE 42962).
  • When you create a column of type ROW in the CREATE HADOOP TABLE statement, the ROW data type that is implicitly created cannot be used to define other objects, such as procedures, functions or variables. It is also not compatible with the ROW data types that are defined by using the CREATE TYPE statement. Values cannot be assigned from one to the other (SQLSTATE 42821).
  • Hadoop tables with a ROW data type can only be used in a SELECT statement and in an INSERT from a fullselect statement, where the source expression is a ROW type with an identical definition as the destination column. It must be a ROW with the same element type, index type and, if declared, the same maximum cardinality.

In-memory length of ROW values retrieved from Hadoop tables

As a guideline, the in-memory length of a ROW that is retrieved from a Hadoop table is 6 bytes, plus the sum over each of its individual fields with the following descriptions:
  • The length of the field, based on its data type
  • The length of the field name, in characters
  • An overhead of 2 bytes per field
Use the following table to see how much space each field requires based on its data type.
Table 1. Number of bytes of elements by data type
Element data type Number of bytes
DECIMAL(P,S) 5 + (Precision + 2) / 2
VARCHAR 5 + The actual numbers of characters in the string
STRING 5 + The actual numbers of characters in the string