CREATE TABLE (HBASE) statement

The CREATE TABLE (HBASE) statement defines an HBase table for the Hadoop environment. The definition must include its name and the names and attributes of its columns. The definition can include other attributes of the table, such as its primary key or check constraints.

Invocation

This statement can be embedded in an application program or issued by using dynamic SQL statements. It is an executable statement that can be dynamically prepared only if DYNAMICRULES run behavior is in effect for the package (SQLSTATE 42509).

Authorization

The authorization ID of the statement must have either DBADM authority, or CREATETAB authority in combination with the following additional authorization:
  • One of the following privileges or authorities:
    • USE privilege on the table space
    • SYSADM authority
    • SYSCTRL authority
  • Plus one of these privileges or authorities:
    • IMPLICIT_SCHEMA authority on the database, if the implicit or explicit schema name of the table does not exist
    • CREATEIN privilege on the schema, if the schema name of the table refers to an existing schema
    • SCHEMAADM authority on the schema, if the schema name of the table refers to an existing schema
To define a foreign key, the authorization ID of the statement must have one of the following privileges for the parent table:
  • REFERENCES privilege on the table
  • REFERENCES privilege on each column of the specified parent key
  • CONTROL privilege on the table
  • SCHEMAADM authority on the schema, if the schema name of the parent table refers to an existing schema
  • DBADM authority

Syntax

In the following syntax, square brackets indicate optional clauses, braces with vertical bars indicate a choice among alternatives, and parenthesized numbers refer to the notes that follow.

CREATE [EXTERNAL] HBASE TABLE [IF NOT EXISTS] table-name element-list (1)
    [COMMENT string-constant]
    [TBLPROPERTIES ('string-constant1' = 'string-constant2', ...)]
    COLUMN MAPPING (hbase-column-mapping)
    [ADD SALT hbase-salt-options]
    [hbase-split-options]
    [DEFAULT COLUMN FAMILY OPTIONS (hbase-cfamily-opts)]
    [COLUMN FAMILY OPTIONS (hbase-cfamily hbase-cfamily-opts, ...)]
    [HBASE TABLE NAME table-name]
    [DEFAULT ENCODING hbase-encoding]
    [as-result-table]
element-list
    ( {column-definition | unique-constraint | referential-constraint}, ... )
column-definition
    column-name data-type [column-options]
data-type
    primitive-type | array-type | row-type
primitive-type (2)
    TINYINT
    SMALLINT | INT2
    INTEGER | INT | INT4
    BIGINT | INT8
    FLOAT[(integer)] | FLOAT8
    REAL | FLOAT4
    DOUBLE [PRECISION]
    CHARACTER[(integer)] | CHAR[(integer)]
    STRING (3) | VARCHAR(integer) | CHARACTER VARYING(integer) | CHAR VARYING(integer)
    TIMESTAMP[(integer)] | DATETIME[(integer)]
    DATE
    DECIMAL[(integer[, integer])] | DEC | NUMERIC | NUM
    BINARY[(integer)] | VARBINARY[(integer)]
    BOOLEAN
array-type
    {primitive-type | row-type} ARRAY[integer-constant]
    {primitive-type | row-type} ARRAY[INT | INTEGER | VARCHAR(int)]
row-type
    ROW (field-name data-type, ...)
column-options
    [COMMENT column-comment] [NULL | NOT NULL] [CONSTRAINT constraint-name constraint-choices constraint-attributes]
constraint-choices
    PRIMARY KEY | UNIQUE | references-clause
references-clause
    REFERENCES table-name (column-name, ...)
constraint-attributes
    {NOT ENFORCED [TRUSTED | NOT TRUSTED] | ENFORCED} [ENABLE QUERY OPTIMIZATION | DISABLE QUERY OPTIMIZATION]
hbase-column-mapping
    KEY MAPPED BY (column-name, ...) [hbase-map-parameters],
    hbase-column MAPPED BY (column-name, ...) [hbase-encoding], ...
hbase-map-parameters
    [FORCE KEY UNIQUE] [hbase-encoding]
hbase-encoding
    ENCODING {BINARY | delimited-row-format | USING serde-row-format | STRING}
delimited-row-format
    DELIMITED [FIELDS TERMINATED BY terminator-char [ESCAPED BY esc-char]]
        [COLLECTION ITEMS TERMINATED BY terminator-char]
        [LINES TERMINATED BY terminator-char]
        [NULL DEFINED AS null-value]
serde-row-format
    SERDE serde-class [WITH SERDEPROPERTIES ('string-constant1' = 'string-constant2', ...)]
hbase-column
    hbase-column-family : hbase-column-name
hbase-salt-options
    [(BUCKETS = salt-buckets)] [(CLASS = salt-class)] [PROPERTIES = (name = value, ...)]
hbase-split-options
    SPLITS (split-point, ...)
    SPLITALGO [split-algorithm, num-region-servers] (4)
    NUMREGION (range-start, range-end, number-of-regions) (4)
    SPLITS_FILE filepath
hbase-cfamily-opts
    {COMPRESSION | BLOOM FILTER | IN MEMORY | NO IN MEMORY | BLOCKCACHE | BLOCKSIZE | DATA_BLOCK_ENCODING | KEEP_DELETED_CELLS | VERSIONS | MIN_VERSIONS | REPLICATION_SCOPE | TTL} (hbase-cfamily-option-value)
as-result-table
    [(column-name, ...)] AS (full-select) WITH NO DATA
Notes:
  • 1 Each clause can be used only once.
  • 2 Db2 Big SQL supports the data types that are described in Data types. However, some data types are syntactically supported for compatibility with Hive but are treated as different data types within the Db2 Big SQL runtime environment. For more information, see the mapping of data types that are used in the CREATE TABLE statement.
  • 3 Because the STRING data type has no specific size associated with it, the SQL processor might assume that each value in a STRING column always contains 32 KB of data, and performance might be impacted. If performance is a concern, use types that contain a specific length. You can also set the bigsql.string.size property to default the STRING mapping to a smaller VARCHAR size. Set this property in one of two ways: run the SET HADOOP PROPERTY command, or set the value globally by updating the $BIGSQL_HOME/conf/bigsql-conf.xml configuration file.
  • 4 Do not use the SPLITALGO or the NUMREGION clause with the ADD SALT clause. SPLITALGO and NUMREGION are mutually exclusive.
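For example, the bigsql.string.size property that is mentioned in note 3 can be set at the session level with the SET HADOOP PROPERTY command (the value 4096 is an arbitrary illustration):

```sql
-- Map STRING columns to VARCHAR(4096) instead of the default VARCHAR(32672)
-- for tables that are created in this session (value chosen for illustration).
SET HADOOP PROPERTY 'bigsql.string.size'='4096';
```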

Description

EXTERNAL
Indicates that the table is already defined in HBase. In this case, Db2 Big SQL refers to that existing table. If the EXTERNAL keyword is not specified, the HBase table is automatically created by Db2 Big SQL. When Db2 Big SQL refers to an external table, it does not verify that the structure of the external table matches the table definition in Db2 Big SQL until a running query references the table.
HBASE
This keyword is required when you are defining an HBase table. You can verify that you have defined an HBase table by querying the PROPERTY column in the SYSCAT.TABLES catalog view. For example:
SELECT SUBSTRING(PROPERTY,22,1)
  FROM SYSCAT.TABLES
  WHERE TABNAME='<table-name>';
IF NOT EXISTS
Checks whether the specified table name already exists. If the table already exists, no error is returned and the table is not created.
table-name
Specifies a unique name for the new table. The table name must be a valid identifier. The table name cannot refer to an HBase table that contains an ARRAY or ROW column (SQLSTATE 428H2).
element-list
Defines the elements of a table, including the definition of columns and constraints on the table.
column-definition
Defines the attributes of a column.
column-name
Names a column of the table. The name cannot be qualified, and the same name cannot be used for more than one column of the table. The column name must be a valid identifier.
data-type
Specifies the data type of the column.
primitive-type
There are several built-in data types:
TINYINT
A 1-byte integer with a range of -128 to 127.

Db2® Big SQL automatically converts the TINYINT to a larger integer type, such as SMALLINT, INT, or BIGINT, or a floating-point type such as FLOAT or DOUBLE.

SMALLINT | INT2
A 2-byte integer with a precision of 5 digits and a range of -32,768 to 32,767.
INTEGER | INT | INT4
A 4-byte integer with a precision of 10 digits and a range of -2,147,483,648 to +2,147,483,647.
BIGINT | INT8
An 8-byte integer with a precision of 19 digits and a range of -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807.
FLOAT(integer) | FLOAT8 | REAL | FLOAT4 | DOUBLE | DOUBLE PRECISION
A single-precision or double-precision floating-point number.

A single-precision floating-point number is a 32-bit approximation of a real number. The number can be zero or can range from -3.4028234663852886e+38 to -1.1754943508222875e-38, or from 1.1754943508222875e-38 to 3.4028234663852886e+38.

A double-precision floating-point number is a 64-bit approximation of a real number. The number can be zero or can range from -1.7976931348623158e+308 to -2.2250738585072014e-308, or from 2.2250738585072014e-308 to 1.7976931348623158e+308.

A REAL number is a single-precision floating-point number. A DOUBLE number is a double-precision floating-point number.

CHARACTER(integer) | CHAR(integer)
A character string is a sequence of characters. The length of the string (integer) is the number of characters in the sequence. The default is 1.
STRING | VARCHAR(integer) | CHARACTER VARYING(integer) | CHAR VARYING(integer)
The STRING data type has no specific size associated with it. The default is VARCHAR(32672).
TIMESTAMP(integer) | DATETIME(integer)
A six- or seven-part value (year, month, day, hour, minute, second, and optional fractional seconds) that designates a date and time.

If specified, integer must be between 0 and 9 and represents the precision of fractional seconds from 0 (seconds) to 9 (nanoseconds).

DATE
A three-part value (year, month, and day) that designates a date. The range of the year part is 0001 to 9999. The range of the month part is 1 to 12. The range of the day part is 1 to x, where x depends on the month.
DECIMAL | DEC | NUMERIC | NUM (precision-integer, scale-integer)
A decimal number with defined precision and scale. The precision is the total number of digits and can range from 1 to 31. The scale is the number of digits to the right of the decimal point and can range from 0 to the precision of the number. The default precision is 10 and scale is 0.
BINARY | VARBINARY (integer)
A varying length binary value is specified as VARBINARY(length) or BINARY(length) and can be up to 32,672 bytes long. VARBINARY is an alias for BINARY, and they are functionally identical in every way. If no length is provided, the length is assumed to be 32,672.

However, it is recommended that an explicit length be specified. The Db2 Big SQL engine works most efficiently when the table definition enables a row to fit within 32 KB of memory. When the calculated row size exceeds 32 KB, some queries might see a performance degradation.

With this data type, the Java™ I/O interface is used for all data access (both insert and select operations), and the ORC file format is recommended. For more information, see File formats supported by Db2 Big SQL.

BOOLEAN
A Boolean value.
array-type | row-type
For detailed information about these types, see Array values and Row values.

Each ARRAY or ROW type can be used in one of two ways: the SQL standard version or the Hive version. The SQL standard version, which is shown in the syntax diagram, is the recommended usage. When you specify the number of elements in the array as part of the table definition, the database manager can compute the maximum size of the column, which enhances query optimization.

With this data type, the Java I/O interface is used for all data access (both insert and select operations), and the ORC file format is recommended. For more information, see File formats supported by Db2 Big SQL.

ARRAY
An ARRAY is a list that can be indexed. The elements in the array must be of the same type. Assuming that column C1 is declared as an ARRAY with two or more elements, you can access these elements by using the following notation: ...WHERE C1[2]=1.... An ARRAY of ROW types is valid.
integer-constant
For simple arrays, an integer constant that indicates the cardinality of the array, which is the number of elements in the array.
INT | INTEGER | VARCHAR (int)
The data type of the associative array index.
ROW
A ROW object contains a value for each attribute of the SQL structured type that it represents (java.sql.Struct). The ROW type is equivalent to the STRUCT type in Hive tables. The ROW type contains field definitions that contain field names and their data types. An entry is created in the SYSIBM.SYSATTRIBUTES table for each field in a ROW definition.
field-name
Specifies the name of a field for the ROW type. The name cannot be the same as another field of this ROW type (SQLSTATE 42711).
data-type
Specifies the data type of the field. The data type must be a primitive data type.
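The following sketch shows the SQL standard form of ARRAY and ROW column definitions (the table and column names are illustrative, and whether a given complex-type column can be mapped to a particular HBase column depends on the encoding; note that ARRAY and ROW columns cannot be part of the HBase row key):

```sql
-- Illustrative table with an ARRAY column and a ROW column.
-- Only the simple INT column is mapped to the row key.
CREATE HBASE TABLE employees (
  id     INT NOT NULL,
  phones VARCHAR(20) ARRAY[3],               -- ordinary array with 3 elements
  addr   ROW(street VARCHAR(40), zip INT)    -- equivalent to a Hive STRUCT
)
COLUMN MAPPING (
  KEY MAPPED BY (id),
  cf1:cq1 MAPPED BY (phones),
  cf1:cq2 MAPPED BY (addr)
);
```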
column-options
You can define column options including a comment, the nullability attribute, or constraints.
COMMENT column-comment
A comment that provides additional information about the column.
NULL | NOT NULL
NOT NULL specifies that the column will not contain null values. Because the Hadoop table is external to any local database server, Big SQL enforces nullability at query execution time (SQL5104N). The nullability attribute of a column is enforced during an INSERT or LOAD USING operation. Do not assign a null value to a column that is defined as NOT NULL (SQLSTATE 23502).
CONSTRAINT constraint-name
Specifies the name of a constraint. You can define the following constraints on a column:
PRIMARY KEY
A primary key must be unique and must have the NOT NULL attribute. A table cannot have more than one primary key.
UNIQUE
A unique key prevents duplicate values in one or more NOT NULL columns of a table. A table can have multiple unique keys.
REFERENCES table-name
Ensures that required table relationships are maintained and that data entry rules are followed. The column on which this parameter is specified becomes the foreign key to the referenced parent table table-name.
constraint-attributes
Defines attributes that are associated with primary key, unique, referential integrity, or check constraints.
NOT ENFORCED | ENFORCED
Specifies whether the constraint is enforced by the database manager during normal operations such as insert, update, or delete.
NOT ENFORCED
Specifies that the constraint is not enforced by the database manager. This is the default for Hadoop tables. A primary key or unique constraint cannot be NOT ENFORCED if there is a dependent ENFORCED referential constraint.
TRUSTED | NOT TRUSTED
Specifies whether the data can be trusted to conform to the constraint.
TRUSTED
Specifies that the data can be trusted to conform to the constraint. This is the default. Use this option only if the table data is independently known to conform to the constraint; otherwise, unpredictable results might occur.
NOT TRUSTED
Specifies that the data cannot be trusted to conform to the constraint. This option is valid only for referential integrity constraints (SQLSTATE 42613). Use this option when the data conforms to the constraint for most rows, but it is not independently known whether all current and future rows will conform. If a constraint is both not trusted and enabled for query optimization, it will not be used for any optimization that depends on the data being in complete conformity with the constraint.
ENFORCED
Although the ENFORCED keyword is available for compatibility with the syntax for local tables that are created with the CREATE TABLE statement, an error is returned when you specify this option for Hadoop tables (SQLSTATE 42858).
ENABLE QUERY OPTIMIZATION | DISABLE QUERY OPTIMIZATION
Specifies whether the constraint or functional dependency can be used for query optimization under appropriate circumstances.
ENABLE QUERY OPTIMIZATION
Specifies that the constraint is assumed to be true and can be used for query optimization. This is the default.
DISABLE QUERY OPTIMIZATION
Specifies that the constraint cannot be used for query optimization. This option cannot be specified for primary key and unique constraints (SQLSTATE 42613).
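As an illustration of these attributes (the table, column, and constraint names are hypothetical), an informational primary key on an HBase table might be declared as follows:

```sql
-- Informational constraint: NOT ENFORCED (the only option for HBase tables),
-- but still available to the optimizer through ENABLE QUERY OPTIMIZATION.
CREATE HBASE TABLE orders (
  order_id INT NOT NULL
    CONSTRAINT pk_orders PRIMARY KEY NOT ENFORCED ENABLE QUERY OPTIMIZATION,
  amount   DECIMAL(10,2)
)
COLUMN MAPPING (
  KEY MAPPED BY (order_id),
  cf1:amt MAPPED BY (amount)
);
```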
unique-constraint
Defines a unique key. This constraint does not allow duplicate values in one or more columns of the table. A table can have multiple unique keys. The columns that are specified in a unique constraint must be defined as NOT NULL.
referential-constraint
Defines a referential constraint.
CONSTRAINT constraint-name
Specifies a name for the referential constraint.
FOREIGN KEY (column-name, ...)
The foreign key of the referential constraint is composed of the identified columns. Each name in the list of column names must identify a column of the table being created, and the same column must not be identified more than once. The number of foreign key columns must be the same as the number of parent key columns, and the data types of the corresponding columns must be compatible (SQLSTATE 42830).
COMMENT string-constant
Specifies a comment that provides additional information about the table. The maximum length of string-constant is 254 bytes.
TBLPROPERTIES ('string-constant1' = 'string-constant2')
Defines job properties that can configure input or output formats, and that you can use to specify more detailed information about the table. The properties are passed as is to the underlying InputFormat and OutputFormat classes. Properties that are not implemented by these classes are ignored. Db2 Big SQL does not validate the properties.
COLUMN MAPPING hbase-column-mapping
This clause is required for all HBase tables. It specifies the mapping between columns that are defined in element-list and their stored locations. It also describes the encoding of values in HBase columns.
KEY MAPPED BY column-name
Specifies one or more columns that are to be stored in the HBase row key. If a single column is specified, that column must be defined as NOT NULL. If multiple columns are specified, some, but not all of those columns can be defined as nullable. Do not specify array-type or row-type columns.
FORCE KEY UNIQUE
Specifies that a unique value is to be automatically generated and appended to the encoded HBase row key. This clause enables duplicate rows to be stored in an HBase table. The generated value is not visible through SQL, but does consume an additional 128 bytes of storage for each row in the HBase table. This clause is effective only when the KEY MAPPED BY clause is specified.
ENCODING
Specifies how values are encoded in HBase columns.
BINARY
Data is encoded in a binary format. This is the default. Binary encoding has the following benefits:
  • It performs better during data encoding and decoding.
  • It ensures the proper collation of values when they are used in a row key.
  • It generally requires less storage than string encoding.
delimited-row-format
Data is encoded as delimited text, after which it is converted into UTF-8 during final storage.
DELIMITED
Valid values for the FIELDS TERMINATED BY, ESCAPED BY, and LINES TERMINATED BY parameters are the following items:
  • A literal single-byte character such as pipe (|) or comma (,).
  • A value of the format \nnn, where nnn is the octal value of a character. For example, \001 is the ASCII character that represents CTRL-A.
  • A value of the format \xYY, where YY is a two-digit hexadecimal value of a character. This representation is retained for compatibility with Db2 Big SQL v1; however, the literal character or octal representations are recommended.
FIELDS TERMINATED BY terminator-char
Specifies a delimiter for structured fields. You can include any single character, but the default is \001, the ASCII character that represents CTRL-A. Characters that match the field terminator character can be part of your data if you specify the ESCAPED BY clause and those characters are properly escaped.
COLLECTION ITEMS TERMINATED BY terminator-char
Specifies a delimiter for the individual items in a collection. You can include any single character, but the default is \002, the ASCII character that represents CTRL-B.
LINES TERMINATED BY terminator-char
Specifies a delimiter for data rows. The only valid delimiter is a newline character (\n). Newline characters must not exist within your data because they cannot be escaped.
NULL DEFINED AS null-value
Specifies a value to represent the null value. The default is a literal backslash character followed by an uppercase N (\N).
Warning:
  • Delimited text encoding is not recommended for use in key mapping unless the data that is being stored is already in text format. Because the string representation of numeric values does not collate (for example, '1' < '19' < '2'), Db2 Big SQL does not push down predicates on numeric data types that were encoded as delimited text.
  • Because the default null value representation is \N, if the delimiter you choose will be escaped by the same byte that is used to escape \N, the result will not be correct. To avoid this scenario, choose a different delimiter, or use the NULL DEFINED AS clause to specify a different byte to represent null values. For example, if \b is the delimiter, override the default null value representation, as shown in the following example:
    cf1:cq1 MAPPED BY (c4,c5)
      SEPARATOR '\b' ENCODING STRING NULL DEFINED AS ''
USING serde-row-format
Data is encoded by using a Hive SerDe class to encode and decode the columns that are contained in the column mapping.
Warning:
  • You cannot use these columns in a secondary index that is created by the CREATE INDEX statement.
  • Search criteria on these columns cannot be pushed into HBase for processing. This means that rows might be returned to Db2 Big SQL for filtering.
SERDE serde-class
Specifies the name of a Java class in the Db2 Big SQL server CLASSPATH that implements the Hive SerDe interface org.apache.hadoop.hive.serde2.SerDe.
WITH SERDEPROPERTIES ('string-constant1' = 'string-constant2')
Associates SerDe properties with the SerDe class. For example, if you want to eventually load HBase data into a Hive table, create the table by using this option and the hbase.columns.mapping parameter.
STRING
Using string encoding is equivalent to using delimited encoding with no other options.
hbase-column
Specifies the case-sensitive qualified name of an HBase column in which values are to be stored. The format of this name is hbase-column-family:hbase-column-name.
hbase-column-family
Specifies the column family that qualifies a target HBase column name. A column family is an internal storage column definition under which the table columns are stored. This value must be a valid identifier.
hbase-column-name
Specifies a name that identifies a column that belongs to hbase-column-family. You can specify a string literal or a valid identifier.
MAPPED BY (column-name, ...)
Specifies one or more SQL columns that are mapped to an HBase column.
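The following sketch (table and column names are hypothetical) shows a composite row key with FORCE KEY UNIQUE and a default binary-encoded column mapping:

```sql
-- Composite row key (c1, c2) with a generated uniqueness suffix,
-- which allows duplicate (c1, c2) combinations to be stored.
CREATE HBASE TABLE clicks (
  c1 VARCHAR(10) NOT NULL,
  c2 INT NOT NULL,
  c3 TIMESTAMP
)
COLUMN MAPPING (
  KEY MAPPED BY (c1, c2) FORCE KEY UNIQUE,
  cf1:cq1 MAPPED BY (c3)      -- binary encoding (the default)
);
```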
ADD SALT
Specifies that Db2 Big SQL is to add a prefix to the row key. This prefix is based on a hash function that is applied to values in the row key. This clause provides a way to scatter data that would otherwise be clustered and form hot spots of activity in the HBase region servers.
(BUCKETS=salt-buckets)
Specifies the number of buckets in which to put the prefixed row keys. The default value is the number of region servers that exist when the statement runs. The maximum value is 64.
(CLASS='class-name')

Specifies the name of the class that does the salting. The default is com.ibm.biginsights.bigsql.hbasecommon.salt.BigSqlSalt.

You can specify any class that extends the default algorithm, or you can define your own salting algorithm. Ensure that the class is available to Db2 Big SQL and that it extends com.ibm.biginsights.bigsql.hbasecommon.salt.BigSqlSalt. Complete the following steps to make the class available to Db2 Big SQL:
  1. As the bigsql user, run the following command to stop Db2 Big SQL: $BIGSQL_HOME/bin/bigsql stop.
  2. Add the JAR file that contains the salting class to the $BIGSQL_HOME/bigsql/userlib directory.
  3. Run the following command to restart Db2 Big SQL: $BIGSQL_HOME/bin/bigsql start.
PROPERTIES
Specifies the properties that can be used to configure a custom salting class. The properties are specified in the form of name/value pairs in which the name is delimited by single quotation marks. For example:
properties = ('padding'=2)
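For example, a table whose monotonically increasing key would otherwise create a region hot spot might be salted into 16 buckets (the table and column names are illustrative):

```sql
-- Spread sequentially increasing keys across 16 buckets.
CREATE HBASE TABLE events (
  ts  TIMESTAMP NOT NULL,
  msg VARCHAR(200)
)
COLUMN MAPPING (
  KEY MAPPED BY (ts),
  cf1:msg MAPPED BY (msg)
)
ADD SALT (BUCKETS = 16);
```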
hbase-split-options
Specifies that the HBase table is to be created as pre-split regions across region servers.
SPLITS (split-point1, split-point2, ..., split-pointN)
Provides a starting value for the first column that is stored in the HBase row key at which a split point begins. The total number of created regions is the number of split points plus one.

You can specify the values as string or numeric representations of a value whose data type matches that of the first column in the HBase row key. The Examples section shows how to use the SPLITS clause.
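For example, when the first key column is an INT, three split points pre-create four regions (the table and column names are illustrative):

```sql
-- Rows with id < 1000 go to region 1, 1000-1999 to region 2,
-- 2000-2999 to region 3, and id >= 3000 to region 4.
CREATE HBASE TABLE metrics (
  id  INT NOT NULL,
  val DOUBLE
)
COLUMN MAPPING (
  KEY MAPPED BY (id),
  cf1:v MAPPED BY (val)
)
SPLITS (1000, 2000, 3000);
```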

SPLITALGO [split-algorithm,num-region-servers]
Specifies a split algorithm that evenly divides the number of possible keys by the number of regions. This clause is useful when the keys are relatively uniform. Do not use NUMREGION with SPLITALGO in the same statement. Do not use ADD SALT with SPLITALGO.

You can select one of two algorithms that are included when you install Db2 Big SQL. Choose the algorithm that works best with your data.

'HexStringSplit'
Specifies a region boundary as a uniformly distributed hexadecimal value. This split algorithm uses hex strings as keys, which are easy to read but can consume more space.
'UniformSplit'
By using this class, you divide the space of possible keys evenly. This algorithm is useful when the keys are nearly uniform random bytes. The results use less space, but they might not be as readable.
NUMREGION (range-start,range-end,number-of-regions)
Specifies a split algorithm that evenly divides the number of possible keys by the number of regions. This clause is useful when the keys are relatively uniform. Do not use SPLITALGO with NUMREGION in the same statement. Do not use ADD SALT with NUMREGION.

You can specify the range start and range end values as a number or a string literal value of the first column in the HBase row key. The Examples section shows how to use the NUMREGION clause.

SPLITS_FILE filepath
Specifies a file that contains split point values, one value per line. The path must be a fully qualified distributed file system (DFS) path. The contents of this file are processed as though the SPLITS clause were specified.
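For example, the NUMREGION clause can divide a known key range evenly (the table and column names are illustrative):

```sql
-- Divide the key range 0 to 1,000,000 into 10 evenly sized regions.
CREATE HBASE TABLE readings (
  sensor_id INT NOT NULL,
  reading   DOUBLE
)
COLUMN MAPPING (
  KEY MAPPED BY (sensor_id),
  cf1:r MAPPED BY (reading)
)
NUMREGION (0, 1000000, 10);
```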
DEFAULT COLUMN FAMILY OPTIONS
This parameter specifies the HBase configuration options that are applied, by default, to all column families that are defined for the HBase table. You can define or override these options for individual column families by using the COLUMN FAMILY OPTIONS clause.
COLUMN FAMILY OPTIONS (hbase-cfamily hbase-cfamily-opts)
Specifies the HBase configuration options that are applied to a specific column family. You specify the options in the form of option name/value pairs. In the case of a Boolean option, you can omit the option value to specify true.
COMPRESSION
Specifies a compression algorithm for the column family. Valid values are NONE (default), GZ (GZIP), SNAPPY, and LZ4.
BLOOM FILTER
Bloom filters help to identify whether a specific row and column combination exists in a block without having to load it. Bloom filters use extra space but can improve lookup times and help to reduce block cache churn by loading only the blocks that are required.
NONE
Specifies that no bloom filter is maintained. This is the default.
ROW
Specifies that the hash of the row is added to the bloom filter on each key insert. Use this option when queries have predicates on row keys.
ROWCOL
Specifies that the hash of the row plus column family plus column family qualifier is added to the bloom filter on each key insert. Use this option when queries have column projections.
IN MEMORY
Specifies whether data is to be cached for as long as possible to improve performance. Valid values are true and false (default).
NO IN MEMORY
This clause is retained for compatibility with prior releases. Use the IN MEMORY clause instead.
BLOCKCACHE
Specifies whether storage memory blocks are to be cached. Block caching can improve read performance. Valid values are true and false. If block caching is enabled and some blocks have to be evicted to free up space, the column family blocks for which IN MEMORY is set to true are the last blocks to be evicted.
BLOCKSIZE
Specifies the memory block size (in bytes). This value is used to control how much data HBase is required to read from storage files during retrieval and what is cached in memory for subsequent access. The default is 64 KB.
DATA_BLOCK_ENCODING
Specifies in-cache key compression, which enables you to take advantage of sorted key redundancy in an HFile block. You can choose to store only the differences between consecutive keys.
KEEP_DELETED_CELLS
Specifies that deleted cells can be retrieved with a GET or SCAN operation, as long as these operations have a specified time range that ends before the deletion timestamp. This allows for point-in-time queries even in the presence of delete operations. Valid values are true and false. The default is false.
VERSIONS
An integer value that specifies the number of different versions of column values to store in the HBase table. The default is 1. Db2 Big SQL reads only the latest version of a column value.
MIN_VERSIONS
Specifies the minimum number of row versions to keep for each column family. The default is 0, which disables the option. If the TTL clause is set to a specific value, the data is deleted after the TTL value is passed. You can use the MIN_VERSIONS clause if you want to retain at least a few versions of the data.
REPLICATION_SCOPE
Specifies whether replication is to be enabled on the column family to minimize the effects of system failure. The default is 0, which means that the column family is not replicated. A value of 1 means that the column family is replicated.
TTL
Specifies a time to live (TTL) value (in seconds) for the column family. HBase automatically deletes all versions of the row (including the current version) that have exceeded the specified TTL value. The default is forever.
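The following sketch (table, column, and family names are hypothetical) shows default options for all column families, with an override for one family:

```sql
-- GZ compression and 3 stored versions by default;
-- cf2 additionally kept in memory with a row-level bloom filter.
CREATE HBASE TABLE logs (
  id  BIGINT NOT NULL,
  msg VARCHAR(500),
  tag VARCHAR(20)
)
COLUMN MAPPING (
  KEY MAPPED BY (id),
  cf1:msg MAPPED BY (msg),
  cf2:tag MAPPED BY (tag)
)
DEFAULT COLUMN FAMILY OPTIONS (COMPRESSION(GZ) VERSIONS(3))
COLUMN FAMILY OPTIONS (cf2 IN MEMORY BLOOM FILTER(ROW));
```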
HBASE TABLE NAME table-name
Specifies the name of the underlying HBase table. By using this clause, you can give a different name to a table in Db2 Big SQL. HBase table names are case sensitive. For example, if you specify HBASE TABLE NAME hbasetable1, but the actual HBase table name is HBaseTable1, the statement returns an error message because the specified HBase table does not exist.

If your HBase table names differ only in case, you can use this clause to differentiate them in Db2 Big SQL. For example, if your HBase table is named tab1, you can refer to that table as tab1 in Db2 Big SQL. A second HBase table named TAB1 can be referred to as tab1_2 in Db2 Big SQL by using the EXTERNAL clause.

If the HBASE TABLE NAME clause is not specified, the name of the HBase table is schema.tablename, where the schema and the table name are defined in Db2 Big SQL. These names are transformed, if necessary, to satisfy Hive rules for identifiers. For more information about identifiers in Hive, see Identifiers.

If you create an external HBase table on top of an existing HBase table by using the HBASE TABLE NAME clause, ensure that characters for user names and group names are from the set of lowercase characters a through z, the numbers 0 through 9, and the underscore character ( _ ). Names cannot begin with a number.

DEFAULT ENCODING hbase-encoding
Specifies the default data encoding that is to be used by any column mapping that does not explicitly specify one (see ENCODING). The default is binary encoding.
as-result-table
Creates a table that is based on a SELECT statement.
(column-name1, column-name2, ...)
You can optionally rename the columns in the target table.
AS (full-select)
Specifies the SELECT statement on which the target table is based.
WITH NO DATA
Specifies that the target table is to be created without data.
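The following sketch (table and column names are illustrative) creates an empty HBase table whose shape is derived from a query result:

```sql
-- Create an empty table with the columns and types of the fullselect.
CREATE HBASE TABLE top_customers
COLUMN MAPPING (
  KEY MAPPED BY (cust_id),
  cf1:total MAPPED BY (total)
)
AS (SELECT cust_id, SUM(amount) AS total
      FROM sales
     GROUP BY cust_id)
WITH NO DATA;
```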

Usage notes

  • If you create a table in a schema that does not exist, the schema is automatically created with the default name, which is your user ID.
  • Because the STRING data type has no specific size associated with it, the SQL processor might assume that each value in a STRING column always contains 32 KB of data, and performance might be impacted. If performance is a concern, use types that contain a specific length. You can also set the bigsql.string.size property to default the STRING mapping to a smaller VARCHAR size. Set this property in one of two ways: run the SET HADOOP PROPERTY command, or set the value globally by updating the $BIGSQL_HOME/conf/bigsql-conf.xml configuration file.
  • If no precision is included on the TIMESTAMP definition, the default is TIMESTAMP(9), which is compatible with the maximum precision of a TIMESTAMP value in Hive.
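For example, the STRING mapping override might be set for a session as shown in the following sketch; the value 4096 is illustrative:

```sql
-- Sketch: map STRING columns to VARCHAR(4096) by default for this session.
SET HADOOP PROPERTY 'bigsql.string.size'='4096';
```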

Restrictions

If only the /apps/hbase/data directory is encrypted, INSERT...SELECT and LOAD operations into HBase tables do not work.

The following restrictions apply to the complex data types:
  • Do not use an ARRAY, ROW, or STRUCT data type value as the row key.
  • Do not use an ARRAY, ROW, or STRUCT data type value in a constraint or key definition (SQLSTATE 42962).
  • Do not reference a table that contains a column of type ARRAY, ROW, or STRUCT in the AS clause (SQLSTATE 428H2).
  • There is a limit of 32,763 ARRAY, ROW, or STRUCT definitions per database (SQLSTATE 54035). This limit also includes any user-defined types that are created by using the CREATE TYPE statement.
  • Comparisons or assignments that involve ARRAY, ROW, or STRUCT columns are not supported, including the use of such columns as ARRAY or ROW parameters to a stored procedure.
  • Do not use an ARRAY data type as a type within an ARRAY type.
  • The result type of a column in a fullselect cannot be an ARRAY type.
  • Columns that are defined with the ARRAY or ROW data type cannot be used as the anchor object of an anchored type (SQLSTATE 428HS).

The use of scrollable cursors for HBase tables is not supported.

Examples

  1. Create an HBase table with composite keys and dense columns. In this example, the HBase table has one column family (cf1) with two columns (cq1 and cq2). The HBase row key is mapped to three SQL columns (composite key). The encoding is binary. The HBase column cf1:cq1 is mapped to two SQL columns (dense column). The encoding in this case is string with '\b' as the terminator character. The HBase column cf1:cq2 is mapped to a single SQL column. The encoding is binary.
    
    CREATE HBASE TABLE mixed_encodings (
        c1 INT,
        c2 INT,
        c3 INT,
        c4 VARCHAR(20),
        c5 VARCHAR(40),
        c6 VARCHAR(90)
      )
      COLUMN MAPPING (
        KEY MAPPED BY (c1, c2, c3),
        cf1:cq1 MAPPED BY (c4, c5) ENCODING DELIMITED FIELDS TERMINATED BY '\b',
        cf1:cq2 MAPPED BY (c6) ENCODING BINARY
      )
      DEFAULT ENCODING BINARY;
  2. Create an HBase table whose columns are all part of the primary key.
    
    CREATE HBASE TABLE allkey (
        k1 VARCHAR(10),
        k2 VARCHAR(70),
        k3 VARCHAR(30)
      )
      COLUMN MAPPING (
        KEY MAPPED BY (k1, k2, k3)
      );
  3. Create an HBase table with column options.
    
    CREATE HBASE TABLE colopt1 (
        akey VARCHAR(10),
        c1 VARCHAR(70),
        c2 VARCHAR(30)
      )
      COLUMN MAPPING (
        KEY MAPPED BY (akey),
        cf1:c1 MAPPED BY (c1),
        cf2:c2 MAPPED BY (c2)
      )
      COLUMN FAMILY OPTIONS (
        cf1 COMPRESSION(GZ) BLOOM FILTER(ROW) IN MEMORY,
        cf2 NO IN MEMORY
      );
  4. Create an HBase table by using SerDe encoding.
    
    CREATE EXTERNAL HBASE TABLE tableName (
        c1 BIGINT,
        c2 INT,
        c3 INT
      )
      COLUMN MAPPING (
        KEY MAPPED BY (c1),
        cf1:cq1 MAPPED BY (c2, c3) ENCODING USING SERDE 'com.example.SerDeClass'
          WITH SERDEPROPERTIES ('key1'='value1','key2'='value2')
      );
  5. Create an HBase table by using salting techniques to spread out any hot spots in the range. New records are split into multiple buckets, and each record goes to a different region in the cluster. This example implicitly creates four splits, which consist of the three buckets plus one.
    
    CREATE HBASE TABLE hbtable (
        c1 BIGINT,
        c2 INT,
        c3 INT
      )
      COLUMN MAPPING (
        KEY MAPPED BY (c1),
        cf1:cq1 MAPPED BY (c2, c3)
      )
      ADD SALT (
        (BUCKETS = 3)
        (CLASS = 'com.sample.SimpleSalt')
        PROPERTIES = ('padding' = '2')
      );
  6. Create an HBase table with default salt options. Db2 Big SQL uses the default salting algorithm, which defines the number of buckets as the number of region servers that exist at the time the table is created.
    
    CREATE HBASE TABLE mysalt1 (
        rowkey INT,
        c0 INT
      )
      COLUMN MAPPING (
        KEY MAPPED BY (rowkey),
        f:q MAPPED BY (c0)
      )
      ADD SALT;
  7. Create an HBase table with a combination of salting and splitting to produce seven splits (the number of explicit splits plus one). You must know your data to use the split option. Db2 Big SQL does not validate your split information.
    
    CREATE HBASE TABLE salt_and_explicit_split (
        rowkey VARCHAR(40),
        c0 INT
      )
      COLUMN MAPPING (
        KEY MAPPED BY (rowkey),
        f:q MAPPED BY (c0)
      )
      ADD SALT (
        (buckets=3)
      )
      SPLITS (
        '0a', '0n', '1a', '1n', '2a', '2n');
  8. Create an HBase table by using a compound row key with a numeric value and default binary encoding.
    
    CREATE HBASE TABLE hbtable (
        c1 INT,
        c2 INT,
        c3 INT,
        c4 VARCHAR(10),
        c5 VARCHAR(20),
        c6 VARCHAR(30)
      )
      COLUMN MAPPING (
        KEY MAPPED BY (c1, c2, c3),
        cf1:cq1 MAPPED BY (c4, c5) ENCODING DELIMITED FIELDS TERMINATED BY '\b',
        cf1:cq2 MAPPED BY (c6) ENCODING BINARY
      )
      DEFAULT ENCODING BINARY
      NUMREGION(10, 30, 5);
    The table is created with splits on C1. The NUMREGION clause defines the starting point of the range as 10, and the ending point of the range as 30, with 5 regions in which to split. The regions for hbtable are shown in the following table:
    Table 1. Example split regions
    Region    Split on C1 values
    1         Less than 10
    2         10 - 15
    3         16 - 21
    4         22 - 29
    5         30 and greater

    If C1 contains character or string values, the split might be less efficient with binary encoding. Be sure to match the encoding with the data type of the row key column for more efficient splits. For numeric row key data, specify binary encoding. For character or string row key data, specify string encoding.

  9. Create an HBase table by using a custom SerDe that enables you to read data that is stored as JSON objects, and to convert that data to a JSON string that can be understood by HBase. The content of your HBase table is derived from JSON documents.
    
    CREATE HBASE TABLE post (
        id INT,
        title STRING,
        content STRING,
        created_on TIMESTAMP,
        updated_on TIMESTAMP,
        published BOOLEAN
      )
      COLUMN MAPPING (
        KEY MAPPED BY (id),
        cf1:cq1 MAPPED BY (title, content, created_on, updated_on, published)
      )
      DEFAULT ENCODING USING SERDE 'com.ibm.biginsights.bigsql.serde.JSONSerDe';
    An example insert statement is shown in the following code snippet:
    
    INSERT INTO post
      VALUES (1, 'Hello world!', 'First post!', '2013-01-01 01:01:01.001',
        '2013-01-01 01:01:01.001', true);
  10. Create an HBase table with splits on numeric or string representations.
    
    CREATE HBASE TABLE t1 (
      c1 INT,
      c2 VARCHAR(10),
      c3 VARCHAR(20)
      )
      COLUMN MAPPING (
        KEY MAPPED BY (c1, c2),
        cf1:c1 MAPPED BY (c3)
      )
      SPLITS (1000, 2000, 3000);
    
    CREATE HBASE TABLE t2 (
      c1 DATE,
      c2 VARCHAR(10),
      c3 VARCHAR(20)
      )
      COLUMN MAPPING (
        KEY MAPPED BY (c1, c2),
        cf1:c1 MAPPED BY (c3)
      )
      SPLITS ('1970-01-01', '2000-01-01', '2010-01-01');
  11. Create an HBase table with splits on NUMREGION (numeric or string).
    
    CREATE HBASE TABLE t1 (
      c1 INT,
      c2 VARCHAR(10),
      c3 VARCHAR(20)
      )
      COLUMN MAPPING (
        KEY MAPPED BY (c1, c2),
        cf1:c1 MAPPED BY (c3)
      )
      NUMREGION (0, 10000, 20);
    
    CREATE HBASE TABLE t2 (
      c1 DATE,
      c2 VARCHAR(10),
      c3 VARCHAR(20)
      )
      COLUMN MAPPING (
        KEY MAPPED BY (c1, c2),
        cf1:c1 MAPPED BY (c3)
      )
      NUMREGION ('1970-01-01', '2100-01-01', 20);
  12. Create an HBase table that is based on another HBase table.
    
    CREATE HBASE TABLE hb1 (
      ...
      )
      ...
      AS SELECT * FROM hbcopy;
  13. Create an HBase table that uses complex data types such as STRUCT and ARRAY.
    
    CREATE HBASE TABLE complex_types (
      key VARCHAR(5),
      c1 STRUCT<f1:VARCHAR(5), f2:VARCHAR(5)>,
      c2 ARRAY<VARCHAR(5)>
      )
      COLUMN MAPPING (
        KEY MAPPED BY (key),
        f:q1 MAPPED BY (c1) ENCODING DELIMITED
          FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY '/' LINES TERMINATED BY '\n',
        f:q2 MAPPED BY (c2) ENCODING DELIMITED
          FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY '/' LINES TERMINATED BY '\n'
      );
    Suppose that the following sample data is stored in HDFS, in a file named data.csv in the /tmp directory:
    key1,f1v1/f2v1,a1v1/a2v1/a3v1
    key2,f1v2/f2v2,a1v2/a2v2/a3v2
    key3,f1v3/f2v3,a1v3/a2v3/a3v3
    You can use the LOAD HADOOP statement to populate the table with this data, as shown in the following example:
    
    LOAD HADOOP USING FILE URL '/tmp/data.csv'
      WITH SOURCE PROPERTIES ('field.delimiter'=',','collection.items.delimiter'='/')
      INTO TABLE complex_types;
    After the data is loaded, you can query the complex types as individual elements.
    
    SELECT key, c1.f1, c1.f2, c2[1], c2[2], c2[3]
      FROM complex_types;