SAP HANA Query Consumer
The SAP HANA Query Consumer origin reads from an SAP HANA database using the specified SQL query. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
The SQL query can read data from a single table or from a join of tables. The origin returns data as a map with column names and field values.
When you configure SAP HANA Query Consumer, you specify connection information and credentials that determine how the origin connects to the database. You configure the query mode, SQL query and related information to define the data returned by the database. You can call stored procedures from the SQL query.
You can enable SAP HANA split batch commands, which allow parallel execution of the query on partitioned tables. You can specify custom properties that your driver requires. And you can specify what the origin does when encountering an unsupported data type.
By default, the origin generates JDBC record header attributes and JDBC field attributes that provide additional information about each record and field. You can configure the origin to generate SAP HANA record header attributes that provide details about the connection.
The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Before you use the origin, you must install a JDBC driver.
Installing the JDBC Driver
Before you use the SAP HANA Query Consumer origin, install the JDBC driver for the database. You cannot access the database until you install the required driver.
You install the driver into the JDBC SAP HANA stage library, streamsets-datacollector-jdbc-sap-hana-lib
, which includes the origin.
To use the JDBC driver with multiple stage libraries, install the driver into each stage library associated with the stages.
For information about installing additional drivers, see Install External Libraries in the Data Collector documentation.
Offset Column and Offset Value
The SAP HANA Query Consumer origin uses an offset column and initial offset value to determine where to start reading data within a table. Include both the offset column and the offset value in the WHERE clause of the SQL query.
The offset column must be a column in the table with unique non-null values, such as a primary key or indexed column. The initial offset value is a value within the offset column where you want the origin to start reading.
When the origin performs an incremental query, you must configure the offset column and offset value. For full queries, you can optionally configure them.
Full and Incremental Mode
The SAP HANA Query Consumer origin can perform queries in two modes:
- Incremental mode
- To use incremental mode, you must select the Incremental Mode property and configure an offset column and initial offset value for the origin. When you define the SQL query, you must use the ${OFFSET} parameter to represent the offset value in the WHERE clause.
- Full mode
- To use full mode, you must clear the Incremental Mode property for the origin. You can optionally configure an offset column and initial offset value and can define any type of SQL query.
Recovery
The SAP HANA Query Consumer origin supports recovery after a deliberate or unexpected stop when it performs incremental queries. Recovery is not supported for full queries.
In incremental mode, the origin uses offset values in the offset column to determine where to continue processing after a deliberate or unexpected stop. To ensure seamless recovery in incremental mode, use a primary key or indexed column as the offset column. As the SAP HANA Query Consumer origin processes data, it tracks the offset value internally. When the pipeline stops, the origin notes where it stopped processing data. When you restart the pipeline, it continues from the last-saved offset.
When the origin performs full queries, the origin runs the full query again after you restart the pipeline.
SQL Query
The SQL query defines the data returned from the database.
You define the query in the SQL Query property
on the JDBC tab. Or, you can define the query in a runtime
resource, and then use the runtime:loadResource
function in
the SQL Query property to load the query from the resource file at runtime. For example,
you might enter the following expression for the property:
${runtime:loadResource("myquery.sql", false)}
SQL Query for Incremental Mode
When you define the SQL query for incremental mode, the SAP HANA Query Consumer origin requires a WHERE and ORDER BY clause in the query.
Use the following guidelines when you define the WHERE and ORDER BY clauses in the query:
- In the WHERE clause, include the offset column and the offset value
- The origin uses an offset column and value to determine the data that is returned. Include both in the WHERE clause of the query.
- Use the OFFSET parameter to represent the offset value
- In the WHERE clause, use ${OFFSET} to represent the offset value.
- In the ORDER BY clause, include the offset column as the first column
- To avoid returning duplicate data, use the offset column as the first column in the ORDER BY clause.
invoice
table where the ID column is the offset column. The query
returns all data where the ID is greater than the offset and orders the data by the
ID: SELECT * FROM invoice WHERE id > ${OFFSET} ORDER BY id
SQL Query for Full Mode
You can define any type of SQL query for full mode.
invoice
table:SELECT * FROM invoice
When you define the SQL query for full mode, you can optionally include the WHERE and ORDER BY clauses using the same guidelines as for incremental mode. However, using these clauses to read from large tables can cause performance issues.
JDBC Attributes
The SAP HANA Query Consumer origin generates record header attributes and field attributes that provide additional information about each record and field.
The origin receives these details from the JDBC driver.
JDBC Header Attributes
By default, the SAP HANA Query Consumer origin generates JDBC record header attributes that provide additional information about each record, such as the original data type of a field or the source tables for the record. The origin receives these details from the JDBC driver.
You can use the record:attribute
or
record:attributeOrDefault
functions to access the information
in the attributes. For more information about working with record header attributes,
see Working with Header Attributes.
JDBC header attributes include a user-defined prefix to
differentiate the JDBC header attributes from other record header attributes. By
default, the prefix is jdbc
.
You can change the prefix that the origin uses and you can configure the origin not to create JDBC header attributes with the Create JDBC Header Attributes and JDBC Header Prefix properties on the Advanced tab.
JDBC Header Attribute | Description |
---|---|
<JDBC prefix>.tables |
Provides a
comma-separated list of source tables for the fields in the
record.
Note: Not all JDBC drivers
provide this information.
|
<JDBC prefix>.<column name>.jdbcType | Provides the numeric value of the original SQL data type for each field in the record. See the Java documentation for a list of the data types that correspond to numeric values. |
<JDBC prefix>.<column name>.precision | Provides the original precision for all numeric and decimal fields. |
<JDBC prefix>.<column name>.scale | Provides the original scale for all numeric and decimal fields. |
JDBC Field Attributes
The SAP HANA Query Consumer origin generates field attributes for columns converted to the Decimal or Datetime data types in Data Collector. The attributes provide additional information about each field.
- Decimal and Numeric data types are converted to the Data Collector Decimal data type, which does not store scale and precision.
- The Timestamp data type is converted to the Data Collector Datetime data type, which does not store nanoseconds.
Data Collector Data Type | Generated Field Attribute | Description |
---|---|---|
Decimal | precision | Provides the original precision for every decimal or numeric column. |
Decimal | scale | Provides the original scale for every decimal or numeric column. |
Datetime | nanoSeconds | Provides the original nanoseconds for every timestamp column. |
You can use the record:fieldAttribute
or
record:fieldAttributeOrDefault
functions to access the information
in the attributes. For more information about working with field attributes, see Field Attributes.
SAP HANA Header Attributes
The SAP HANA Query Consumer origin can include SAP HANA connection information , such as the driver version or application name, in record header attributes. The origin receives these details from the JDBC driver.
The attributes are named SapHANA.<attribute>
.
SapHANA.APPLICATIONUSER:<user name>
SapHANA.DRIVERVERSION:2.4.76-7ca985c0cc5ea9fa063ab376dab1bf7b859dd9cc
SapHANA.APPLICATION:com.streamsets.pipeline.BootstrapMain
Use the Include SAP HANA Connection Details on the SAP HANA tab to enable generating the SAP HANA header attributes.
Event Generation
The SAP HANA Query Consumer origin can generate events that you can use in an event stream. When you enable event generation, the origin generates an event when it completes processing the data returned by the specified query. The origin also generates an event when a query completes successfully and when it fails to complete.
- With the Pipeline Finisher executor to stop the pipeline and
transition the pipeline to a Finished state when the origin completes processing
available data.
When you restart a pipeline stopped by the Pipeline Finisher executor, the origin processes data based on how you configured the origin. For example, if you configure the origin to run in incremental mode, the origin saves the offset when the executor stops the pipeline. When it restarts, the origin continues processing from the last-saved offset. In contrast, if you configure the origin to run in full mode, when you restart the pipeline, the origin uses the initial offset, if specified.
For an example, see Stopping a Pipeline After Processing All Available Data.
- With the Email executor to send a custom email
after receiving an event.
For an example, see Sending Email During Pipeline Processing.
-
With a destination to store information about completed queries.
For an example, see Preserving an Audit Trail of Events.
For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Event Record
Record Header Attribute | Description |
---|---|
sdc.event.type | Event type. Uses one of the following types:
|
sdc.event.version | Integer that indicates the version of the event record type. |
sdc.event.creation_timestamp | Epoch timestamp when the stage created the event. |
- No-more-data
- The origin generates a no-more-data event record when it completes processing all data returned by a query.
- Query success
- The origin generates a query success event record when it completes processing the data returned from a query.
- Query failure
- The origin generates a query failure event record when it fails to complete processing the data returned from a query.
Configuring an SAP HANA Query Consumer Origin
Configure an SAP HANA Query Consumer origin to read from an SAP HANA database using the specified SQL query. Before you use the origin, you must install a JDBC driver.