Hints for designing indexes
In HBase, the row key provides the same data retrieval benefits as a primary index. So, when you create a secondary index, use elements that are different from the row key.
Secondary indexes allow you to have a secondary way to read an HBase table. They provide a way to efficiently access records by means of some piece of information other than the primary key. Secondary indexes require additional cluster space and processing because the act of creating a secondary index requires both space and processing cycles to update.
The LOAD USING command can be used to bulk load data into HBase tables in the distributed file system (DFS) by using the appropriate target table properties. However, any indexes that exist prior to the LOAD USING process are not updated. Consider dropping your indexes before you issue the LOAD USING command. You can then re-create the indexes after the LOAD USING command is complete. This action ensures that map reduce is used to populate the indexes. The benefits of using the parameter hbase.load.method on your LOAD USING command exceed any inconvenience of dropping and then recreating the index.
When you insert a large amount of data into a table, Db2 Big SQL automatically uses a feature called bulk insert. This feature lets Db2 Big SQL populate tables and any indexes on those tables more efficiently.
- An external table.
- Custom encoded columns.
- Columns that are the leading part or parts of a row key.
- Using an algorithm
- The following statement creates an index by using the SPLIT algorithm. The
algorithm uses the input value of 4 to evenly divide the
space of possible keys by four
regions.
The UniformSplit algorithm is useful when the keys are nearly uniform random bytes. The results use less space, but they might not be as readable.CREATE INDEX idx1 ON TABLE hbtableX (C1) AS 'hbase' SPLITALGO [ 'UniformSplit', 4];
- Using an explicit SPLIT
- In the following example, the splits are defined explicitly. Each value is a
key on which to split. The value can be a string or a numeric value.
This statement results in five splits; the number of explicit splits plus one.CREATE INDEX idx2 ON TABLE hbtable(name,score) as 'hbase' SPLITS [5, 10, 15, 20];
- You cannot create an index on a column that is mapped entirely to the key of an HBase table.
- You cannot create an index on columns that are defined in the same order as the row key in the COLUMN MAPPING clause. Since these columns form the leading part of the row key, the data retrieval benefits are already available.
CREATE HBASE TABLE rkindex
(k0 varchar, k1 int, k2 string, k3 int,
c0 string, c1 int, c2 int)
COLUMN MAPPING (
KEY MAPPED BY (k0, k1, k2, k3),
cf:cq MAPPED BY (c0, c1, c2));
If the index is made of several columns that are mapped by a row key, but the row key
mapping columns are in a different order from the index columns, then you can create an
index on the row key columns.
CREATE INDEX ix_k0k2k3k1 ON TABLE rkindex (k0,k2,k3,k1) AS 'HBASE';
In
the next example of CREATE INDEX, based on the same table as in the previous example,
the index is valid because the row key column that is used in the index is only a small
part of the set of columns that are mapped to the row key. Therefore, the row key column
k0 used alone in the index is not enough to determine uniqueness.
CREATE INDEX ix_k0c0c1c2 ON TABLE rkindex (k0,c0,c1,c2) AS 'HBASE';