CREATE INDEX (HBASE)

The CREATE INDEX statement creates an index on the specified HBase table.

Syntax

hbase-split-options

Notes:

¹ The TABLE keyword is deprecated and will be removed in the next release.

Description

CREATE INDEX index-name

Specifies the name of the index to be created.

ON TABLE table-name

Specifies the name of the HBase table on which the index is created. The TABLE keyword is deprecated in later releases.

column-name

Specifies the columns in the table that are to be indexed. You cannot create an index on a column that is encoded with the USING SERDE parameter.

hbase-split-options

When you specify these options, the HBase table is created as pre-split regions across region servers.

SPLITS (split-point1, split-point2, ..., split-pointN)

Provides a starting value for the first column that is stored in the HBase row key at which a split point begins. The total number of regions that are created is the number of split keys plus one.

You can specify the values as string or numeric representations of a value of the data type of the first column that is contained in the HBase row key. You can see examples of splitting in the examples section of CREATE TABLE (HBASE) statement.

SPLITS_FILE file-path

Specifies a file containing split points. The file path must be a fully qualified distributed file system (DFS) path to a file that contains split point values. In the file, there must be one value for each line. The contents of this file are treated as if those values were provided to the SPLITS clause.

SPLITALGO [split-algorithm,num-region-server]

Use this parameter to include a split algorithm that evenly divides by the value of number-of-regions, the space of possible keys. This parameter is useful when the keys are relatively uniform. This parameter is mutually exclusive with NUMREGION. Do not use SPLITALGO with NUMREGION in the same statement. Do not use this parameter with the ADD SALT keyword.

You can select one of two algorithms. Choose the algorithm that works best with your data:

'HexStringSplit': The format of a 'HexStringSplit' region boundary is any uniformly distributed hexadecimal value. This split algorithm uses hex strings as keys, which is easy to read. However, the results can take up more space.
'UniformSplit': By using this class, you divide the space of possible keys evenly. This algorithm is useful when the keys are nearly uniform random bytes. The results use less space, but they might not be as readable.

NUMREGION (range-start,range-end,number-of-regions)

Use this parameter to include a split algorithm that evenly divides the space of possible keys by the value of the number of regions. This parameter is useful when the keys are relatively uniform. This parameter is mutually exclusive with SPLITALGO. Do not use SPLITALGO with NUMREGION in the same statement. Do not use this parameter with the ADD SALT keyword.

You can specify the range-start and range-end values as a number or a string literal value of the first column in the HBase row key. The examples in CREATE TABLE (HBASE) statement show the use of NUMREGION.

Usage notes

The secondary index provides the same data retrieval benefits as a primary index. So, it is not necessary to create a secondary index by using the same elements that comprise a row key. Therefore, you can create an index on a row key column, but consider the following restrictions:
- You cannot create a secondary index with a non-composite or single Db2 Big SQL column that is mapped to an HBase row key.
  This means that you cannot create an index on a column that is mapped entirely to the key of an HBase table.
- You cannot create a secondary index with the leading part of a composite row key in the same order that they are mapped to a row key.
  This means that you cannot create an index on columns that are from the first one or more of the columns that comprise the row key.
You can create a composite index that combines columns from the composite row keys when the row key columns are in a different order. If you define an index with columns in the same order as in the row key column mapping, they would form leading part of row key.
The index cannot include columns with ARRAY or ROW or STRUCT data types.
The index cannot include columns with non-STRING data types that use STRING encoding.
When loading large amounts of data into HBase the pre-splitting option should be used to improve response time. For details, see Pre-splitting.
You cannot create a secondary index on an external HBase table.

Examples

The following example creates an invalid index on a row key column. The row key maps to a single column.
```
CREATE HBASE TABLE mytable 
  (k int, c0 string, c1 int, c2 int)
COLUMN MAPPING 
(KEY MAPPED BY (k), 
 cf:cq MAPPED BY (c0, c1, c2));
```
Assume that column (k) is the rowkey :
```
CREATE INDEX ix_k on TABLE mytable (k); 
```
The result of this CREATE INDEX is an error.
The following example, which uses the same CREATE TABLE statement as in Example 1, creates an invalid index because the row key column is the leading column, and the only row key, of a composite index
```
CREATE INDEX ix_kc0 
  ON nc_rkindex (k, c0) 
  ;
```
The following example, which uses the same CREATE TABLE statement as in Example 1, creates an invalid index because the row key column is not in the first position along with an additional column, and column k is the only column mapped by the row key.
```
CREATE INDEX ix_c1k ON TABLE nc_rkindex (c1, k);
```
By using the following CREATE statement, these next series of indexes are valid:
```
CREATE HBASE TABLE rkindex (
  k0 varchar(20), k1 int, k2 string, k3 int, 
  c0 string, c1 int, c2 int
 )
COLUMN MAPPING (
  KEY MAPPED BY (k0, k1, k2, k3), 
  cf:cq MAPPED BY (c0, c1, c2));
```
If the index is made of several columns that are mapped by a row key; but the row key mapping columns are in a different order from the index columns, then you can create an index on the row key columns.
```
CREATE INDEX ix_k0k2k3k1 ON rkindex (k0,k2,k3,k1);
```
The following example, which uses the same CREATE TABLE statement as in Example 4, creates a valid index because the row key column that is used in the index is only a small part of the set of columns that are mapped to the row key. Therefore, the row key column k0 used alone in the index is not enough to determine uniqueness.
```
CREATE INDEX ix_k0c0c1c2 ON rkindex (k0,c0,c1,c2);
```

Assume the following table definition that uses the default STRING encoding:


CREATE HBASE TABLE mytable(  
  k int, c0 string, c1 varchar(20),
  c2 char(10), c3 int, c4 float
)
COLUMN MAPPING(
  KEY MAPPED BY (k),  
  cf:cq MAPPED BY (c0, c1, c2, c3, c4)
)
DEFAULT ENCODING STRING;

You can create secondary indexes only on columns c0, c1, or c2. Therefore, the following table shows the valid and invalid CREATE INDEX statements:

Valid examples Invalid examples

Table 1. Valid and invalid CREATE INDEX statements:
Valid examples	Invalid examples
`CREATE INDEX idx1 ON mytable(c0);`	`CREATE INDEX idx4 ON mytable(c0,c3);`
`CREATE INDEX idx2 ON mytable(c1);`	`CREATE INDEX idx5 ON mytable(c3);`
`CREATE INDEX idx3 ON mytable(c0,c1);`	`CREATE INDEX idx6 ON mytable(c4,c1);`

CREATE INDEX idx1 ON mytable(c0);

CREATE INDEX idx4 ON mytable(c0,c3);

CREATE INDEX idx2 ON mytable(c1);

CREATE INDEX idx5 ON mytable(c3);

CREATE INDEX idx3 ON mytable(c0,c1);

CREATE INDEX idx6 ON mytable(c4,c1);