Configuring the HBase connector as a source
To configure an HBase connector stage to read or look up rows in an HBase table or view, you must specify the source table in the Connection properties and define how the source data is processed.
Procedure
- From the job design canvas, double-click the HBase Connector stage.
- Click the Properties tab, and in the Usage section, specify the settings for the read operation.
- Use the Strict type checking property to decide whether the connector strictly checks data types when it deserializes data during the read operation. When Strict type checking is set to Yes, each discrepancy between the serialized data and the column definition on the output link results in an error and stops data processing. Without strict type checking, data entries that can be parsed but are larger than expected are truncated, and the situation is only logged for future investigation. For example, if you define the field as Varchar(20) and the connector receives 30 characters, it accepts the first 20 characters and ignores the rest.
- The Use parallel read property determines whether the connector splits the read across all available player nodes to speed up processing. When this property is set to No, each player node processes the entire table.
- The Type of row keys in the target table property describes the row keys so that the connector can split the load more efficiently between player nodes. The available options are:
- Uniform byte arrays - the row keys are random byte arrays.
- Numeric strings with values greater than zero - the row keys contain only digits. The key values must be greater than zero.
- Hexadecimal strings with values greater than zero - the row keys contain hexadecimal numbers. The key values must be greater than zero.
- If HBase starts to split or merge regions during data loading, read performance might be affected. To detect whether regions were split or merged during the read, set Enable detecting region changes to Yes. This property is in the Detect regions modifications (splits & merges) property group. To control how often the check is performed, set the Detect after reading every specified number of records property.
- Click OK to save the job.
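Example: effect of Strict type checking
The following Python sketch illustrates the behavior that the procedure describes for the Strict type checking property, using the Varchar(20) example. It is not the connector's implementation. The function name fit_varchar, the exception StrictTypeError, and the logger name are hypothetical and exist only for this illustration.

```python
import logging

# Hypothetical logger name, used only in this sketch.
logger = logging.getLogger("hbase_read_sketch")


class StrictTypeError(ValueError):
    """Hypothetical error: a value does not match the column definition."""


def fit_varchar(value: str, max_len: int, strict: bool) -> str:
    """Fit a parsed string into a Varchar(max_len) column.

    strict=True  models Strict type checking set to Yes: a discrepancy is an error.
    strict=False models the non-strict case: the value is cut and the event is logged.
    """
    if len(value) <= max_len:
        return value
    if strict:
        raise StrictTypeError(
            f"value has {len(value)} characters, column allows {max_len}"
        )
    logger.warning(
        "value truncated from %d to %d characters", len(value), max_len
    )
    return value[:max_len]
```

With strict set to False, a 30-character value for a Varchar(20) column is returned as its first 20 characters and a warning is logged. With strict set to True, the same value raises an error, which corresponds to the job stopping data processing.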
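Example: why the row key type matters for parallel read
The following Python sketch suggests why the row key type can help balance a parallel read. It chooses range boundaries from the first byte of the row key only. This is an assumption made for illustration: the connector's actual splitting algorithm is not described in this topic. The function names, the key type labels, the lowercase hexadecimal digits, and the absence of leading zeros are all hypothetical.

```python
def first_byte_alphabet(key_type: str) -> list[int]:
    """Return the byte values that can start a row key of the given type.

    Assumptions of this sketch: no leading zeros, lowercase hexadecimal digits.
    """
    if key_type == "uniform_bytes":
        return list(range(256))
    if key_type == "numeric":
        return [ord(c) for c in "123456789"]
    if key_type == "hex":
        return sorted(ord(c) for c in "123456789abcdef")
    raise ValueError(f"unknown key type: {key_type}")


def split_points(key_type: str, nodes: int) -> list[bytes]:
    """Return up to nodes - 1 start keys that divide the key space into ranges."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    alphabet = first_byte_alphabet(key_type)
    points = []
    for i in range(1, nodes):
        idx = i * len(alphabet) // nodes
        points.append(bytes([alphabet[idx]]))
    # Remove duplicates that appear when there are more nodes than first bytes.
    return list(dict.fromkeys(points))
```

For example, split_points("numeric", 3) yields boundaries at the keys "4" and "7", so three player nodes would each read keys that start with three of the nine possible digits. If the same keys were treated as uniform byte arrays, most of the 256 first-byte ranges would contain no rows and the load would be uneven.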