Properties reference: Hive connector

This topic lists all properties that you can set to configure the stage.

Connection

URL
Specify the URL string for the connection in the driver-specific format. The connection URL identifies the Hive data source in your environment to connect to. For information about the URL format that the driver supports, see your driver documentation.
  • Type: string
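For example, a connection URL for the Hive JDBC driver typically has the following form, where the host, port, and database values are placeholders for your environment:
  jdbc:hive2://<host>:<port>/<database>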
User name
Specify the user name to use to connect to the data source.
  • Type: string
Password
Specify the password to use to connect to the data source.
  • Type: protected string
Attributes
Specify any driver-specific connection attributes. Enter each attribute on a separate line in name=value, name:value, or name value format. For information about driver-specific attributes, see your driver documentation.
  • Type: string
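For example, attributes that enable SSL for a driver that supports it might look like the following lines; the attribute names here are illustrative, so use the names that your driver documentation defines:
  ssl=true
  sslTrustStore=/path/to/truststore.jks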
Hive driver type
Select the type of Hive driver to use.
  • Type: selection
  • Default: Default Hive driver
  • Values:
    • Default Hive driver
    • Default SparkSQL driver
    • JDBC configuration file
    • User defined
Driver classpath
Specify the Hive driver classpath.
  • Type: string
Driver class names
Specify the Hive driver class names.
  • Type: string
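For example, when the Hive driver type is set to User defined, the values might look like the following ones; the JAR path is illustrative, and the class name is the standard Apache Hive JDBC driver class:
  Driver classpath: /opt/hive/lib/hive-jdbc-standalone.jar
  Driver class names: org.apache.hive.jdbc.HiveDriver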

Usage

Write mode
Select the mode that you want to use to write to the data source.
  • Type: selection
  • Default: Insert
  • Values:
    • Insert
    • Update
    • Delete
    • Custom
For more information, see the Supported write modes topic.
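For example, with the Insert write mode and SQL that is generated at run time, the connector produces a statement similar to the following one, where the table name and the ORCHESTRATE column references come from the stage configuration and the input link (the names here are illustrative):
  INSERT INTO TABLE sales VALUES (ORCHESTRATE.id, ORCHESTRATE.amount)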
Generate SQL at run time
Select Yes to automatically generate the SQL statements at run time.
  • Type: boolean
  • Default: true
Table name
Enter the name of the table that you want to access in the data source.
  • Type: string
Table action
Select the action to complete before writing data to the table.
  • Type: selection
  • Default: Append
  • Values:
    • Append
    • Create
    • Replace
    • Truncate
For more information, see the Table action property topic.
Generate create table statement at run time
Select Yes to automatically generate the CREATE TABLE statement at run time. Depending on the input link column data types, the driver, and the data source, the connector might not be able to determine the corresponding native data types and produce a valid statement.
  • Type: boolean
  • Default: true
Stop the job when the create table statement fails
Select Yes to stop the job if the CREATE TABLE statement fails.
  • Type: boolean
  • Default: true
Row format
Select the Delimited or SerDe row format option to use when the table is created.
  • Type: selection
  • Default: Storage format
  • Values:
    • Storage format
    • Delimited
    • SerDe
Field terminator
Use this property to specify the field terminator to use when the table is created.
  • Type: string
Line terminator
Use this property to specify the line terminator to use when the table is created.
  • Type: string
Serde library name
Use this property to specify the SerDe library name to use when the table is created, for example, org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.
  • Type: string
Data storage format
Use this property to specify the storage format of the file that stores the data in the Hive table.
  • Type: selection
  • Default: TEXTFILE
  • Values:
    • TEXTFILE
    • SEQUENCEFILE
    • AVRO
    • PARQUET
    • RCFILE
    • ORC
Table location
Use this property to specify the location of the file that serves as storage for the Hive table.
  • Type: string
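As an illustration of how these properties map to a generated statement, a table that is created with the Delimited row format, a comma field terminator, the TEXTFILE storage format, and an explicit location corresponds to a statement similar to the following one (the table name, columns, and path are illustrative):
  CREATE TABLE sales (id INT, amount DOUBLE)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
  STORED AS TEXTFILE
  LOCATION '/user/hive/warehouse/sales'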
Create table statement
Enter the CREATE TABLE statement to run to create the target database table.
  • Type: string
Generate drop table statement at run time
Select Yes to automatically generate the DROP TABLE statement at run time.
  • Type: boolean
  • Default: true
Stop the job when drop table statement fails
Select Yes to stop the job if the DROP TABLE statement fails.
  • Type: boolean
  • Default: false
Drop table statement
Enter the DROP TABLE statement to run to drop the target database table.
  • Type: string
Generate truncate table statement at run time
Select Yes to automatically generate the TRUNCATE TABLE statement at run time.
  • Type: boolean
  • Default: true
Stop the job when truncate table statement fails
Select Yes to stop the job if the TRUNCATE TABLE statement fails.
  • Type: boolean
  • Default: true
Truncate table statement
Enter the TRUNCATE TABLE statement to run to truncate the target database table.
  • Type: string
Perform table action first
Select Yes to perform the table action first. Select No to run the Before SQL statements first.
  • Type: boolean
  • Default: true
Enable quoted identifiers
Select Yes to enclose the specified table name and column names on the links in quoting strings when SQL statements are generated. The connector queries the driver to determine the quoting string. If it fails to obtain this information from the driver, the connector uses the backtick (`) character as the quoting string. The default is No.
  • Type: boolean
  • Default: false
For more information, see the Enable quoted identifiers property topic.
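For example, with quoted identifiers enabled and the backtick as the quoting string, a generated statement encloses the table and column names as in the following fragment (the names are illustrative):
  SELECT `order id`, `amount` FROM `sales data`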
SQL
Specify the SQL statement properties.
  • Type: category
Select statement
Enter a SELECT statement or the fully qualified name of the file that contains the SELECT statement. The statement is used to read rows from the database.
  • Type: string
Read select statement from file
Select Yes to read the SELECT statement from the file that is specified in the SELECT statement property.
  • Type: boolean
  • Default: false
Insert statement
Enter an INSERT statement or the fully qualified name of the file that contains an INSERT statement. The statement is used to insert rows into the database.
  • Type: string
Read insert statement from file
Select Yes to read the INSERT statement from the file that is specified in the INSERT statement property.
  • Type: boolean
  • Default: false
Update statement
Enter an UPDATE statement or the fully qualified name of the file that contains an UPDATE statement. The statement is used to update rows in the database.
  • Type: string
Read update statement from file
Select Yes to read the UPDATE statement from the file that is specified in the UPDATE statement property.
  • Type: boolean
  • Default: false
Delete statement
Enter a DELETE statement or the fully qualified name of the file that contains a DELETE statement. The statement is used to delete rows from the database.
  • Type: string
Read delete statement from file
Select Yes to read the DELETE statement from the file that is specified in the DELETE statement property.
  • Type: boolean
  • Default: false
Custom statements
Enter the custom statements to run for each input row.
  • Type: string
Read custom statements from file
Select Yes to read the custom statements from the file that is specified in the Custom statements property.
  • Type: boolean
  • Default: false
Enable partitioned reads
Select Yes to run the statement on each processing node. The SELECT statement must contain a placeholder that matches the selected partitioned read method; the connector replaces the placeholder on each processing node to prepare that node's query.
  • Hive partition: include a WHERE clause of the form <partition-column> = [[part-value]], where <partition-column> is the partition column to query and [[part-value]] is the placeholder that is replaced on each processing node with an actual partition value. Example: SELECT c1, c2 FROM part_table WHERE pc1 = [[part-value]]
  • Modulus: use the [[mod-part-expr]] placeholder. Example: SELECT * FROM part_table WHERE [[mod-part-expr]]
  • Minimum and Maximum range: use the [[range-part-expr]] placeholder. Example: SELECT * FROM part_table WHERE [[range-part-expr]]
  • Type: boolean
  • Default: false
For more information, see the Partitioned reads topic.
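To illustrate, assuming a hypothetical table part_table that is partitioned on column pc1 with partition values 'east' and 'west', the Hive partition method might prepare queries similar to the following ones, one per processing node:
  SELECT c1, c2 FROM part_table WHERE pc1 = 'east'
  SELECT c1, c2 FROM part_table WHERE pc1 = 'west'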
Partitioned read methods
Use this property to set the type of partitioning to use when partitioned reads are enabled.
  • Type: selection
  • Default: Hive partition
  • Values:
    • Hive partition
    • Modulus
    • Minimum and Maximum range
Column name
Use this property to specify the key column for the selected partitioned read method. This column must be of the tinyint, smallint, integer, or bigint data type.
  • Type: string
Table name
Use this property to specify the table for the selected partitioned read method.
  • Type: string
Enable partitioned write
Select Yes to insert data into a partitioned table. In the insert query, ORCHESTRATE.<partitioned-column> is replaced by the column data that is passed from the input link.
  • Type: boolean
  • Default: false
For more information, see the Partitioned writes topic.
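For example, an insert statement for a hypothetical table that is partitioned on a region column might look like the following one:
  INSERT INTO TABLE sales PARTITION (region) VALUES (ORCHESTRATE.id, ORCHESTRATE.amount, ORCHESTRATE.region)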
Transaction
Specify the transaction settings.
  • Type: category
Record count
Specify the number of rows that the stage reads from or writes to the data source in a single transaction. When this property is set to 0, the transaction is committed only once on each processing node of the stage after the stage processes all the rows on that node. When rows arrive on the input link of the stage in waves, the Record count value applies to each wave separately.
  • Type: integer
  • Default: 2000
  • Minimum: 0
  • Maximum: 999999999
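For example, with the default Record count of 2000, a node that processes 5000 rows commits after rows 2000 and 4000, and commits once more for the remaining 1000 rows when processing on that node completes.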
Mark end of wave
Select Yes to generate an end-of-wave record after each wave of records, where the number of records in each wave is specified in the Record count property. When the Record count property is set to 0, the end-of-wave records are not generated.
  • Type: selection
  • Default: No
  • Values:
    • No
    • Yes
Record ordering
Determines how records must be ordered across multiple links.
  • Type: selection
  • Default: All records
  • Values:
    • All records
    • First record
    • Ordered
For more information, see the Record ordering property topic.
Key column
Specify one or more columns to be used for sorting.
  • Type: category
Column name
Specify the name of the column to be used for sorting.
  • Type: string
Sort order
Select the sorting order.
  • Type: selection
  • Default: Ascending
  • Values:
    • Ascending
    • Descending
Null order
Select the sorting order for null values. This value specifies if the null values are sorted before or after non-null values.
  • Type: selection
  • Default: Before
  • Values:
    • Before
    • After
Case sensitive
Select Yes to perform case-sensitive sorting of values.
  • Type: boolean
  • Default: false
Session
Specify the session settings.
  • Type: category
Fetch size
Specify the number of rows that the driver must try to fetch from the data source when the connector requests a single row. Fetching rows in addition to the row requested by the connector can improve performance because the driver can complete the subsequent requests for more rows from the connector locally without a need to access the data source. The default value is 0, which indicates that the driver optimizes the fetch operation based on its internal logic.
  • Type: integer
  • Default: 0
  • Minimum: 0
  • Maximum: 999999999
Batch size
Enter the number of records to include in the batch of records for each statement execution. The value 0 indicates that all input records are passed to the statements in a single batch.
  • Type: integer
  • Default: 2000
  • Minimum: 0
  • Maximum: 999999999
Drop unmatched fields
Select Yes to drop any fields from the input link for which there are no matching parameters in the statements configured for the stage. Select No to issue an error message when an unmatched field is present on the link.
  • Type: boolean
  • Default: false
For more information, see the Dropping unmatched fields topic.
Report schema mismatch
Select Yes to perform an early comparison of the column definitions on the link with the column definitions in the data source and to issue warning messages for any detected discrepancies that can result in data corruption. Depending on the environment and the usage scenario, early detection of discrepancies might not be possible, in which case error messages are reported only when the actual data corruption is detected.
  • Type: boolean
  • Default: false
For more information, see the Report schema mismatch property topic.
Default length for columns
Enter the default length for the Char, NChar, Binary, VarChar, NVarChar, and VarBinary link columns for which the Length attribute is not set.
  • Type: integer
  • Default: 200
  • Minimum: 1
  • Maximum: 999999999
Default length for long columns
Enter the default length for the LongVarChar, LongNVarChar and LongVarBinary link columns for which the Length attribute is not set.
  • Type: integer
  • Default: 20000
  • Minimum: 1
  • Maximum: 999999999
Fail on truncation
Select Yes to fail the operation when truncation is detected for the long columns (LongVarChar, LongNVarChar, and LongVarBinary). When the value of this property is set to No, the connector continues execution even when data truncation is detected.
  • Type: boolean
  • Default: true
Generate all columns as Unicode
Select Yes to always generate columns as NChar, NVarChar, and LongNVarChar columns instead of Char, VarChar, and LongVarChar columns.
  • Type: boolean
  • Default: false
Character set for non-Unicode columns
Select the character set option for the values of Char, VarChar and LongVarChar link columns for which the Extended attribute is not set to Unicode. If you select the Default option, the character set encoding of the engine host system locale is used. If you select the Custom option, you must provide the character set name to be used.
  • Type: selection
  • Default: Default
  • Values:
    • Default
    • Custom
For more information, see the Character set for non-Unicode columns property topic.
Character set name
Specify the name of the character set encoding for the values of Char, VarChar and LongVarChar link columns for which the Extended attribute is not set to Unicode.
  • Type: string
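For example, a character set name might be ISO-8859-1 or windows-1252; the set of names that is accepted depends on the driver and the engine environment.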
Keep conductor connection alive
Select Yes to keep the connection alive in the conductor process while the player processes are processing records. Select No to close the connection in the conductor process before player processes start processing records, and to connect again if necessary after the player processes complete processing the records.
  • Type: boolean
  • Default: true
Set Hive parameters
Enter the Hive statement to set the Hive parameters.
  • Type: string
For more information, see the Set Hive parameters property topic.
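For example, the following statements enable dynamic partitioning, which is typically required for writes to partitioned tables; whether you need these particular parameters depends on your Hive configuration:
  set hive.exec.dynamic.partition=true;
  set hive.exec.dynamic.partition.mode=nonstrict;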
Stop the job when unable to set Hive parameters
Select Yes to stop the job if the Hive parameters are not set.
  • Type: boolean
  • Default: false
Run before and after SQL statements
Select Yes to run SQL statements before and after data is accessed in the database.
  • Type: boolean
  • Default: false
For more information, see the Run before and after SQL statements property topic.
Before SQL statement
Enter the SQL statement or the fully qualified name of the file that contains the SQL statement to run once before any data is processed.
  • Type: string
Read Before SQL statement from file
Select Yes to read the SQL statement from the file that is specified in the Before SQL statement property.
  • Type: boolean
  • Default: false
Stop the job when Before SQL statement fails
Select Yes to stop the job if the Before SQL statement fails.
  • Type: boolean
  • Default: true
After SQL statement
Enter the SQL statement or the fully qualified name of the file that contains the SQL statement to run once after all of the data is processed.
  • Type: string
Read After SQL statement from file
Select Yes to read the SQL statement from the file that is specified in the After SQL statement property.
  • Type: boolean
  • Default: false
Stop the job when After SQL statement fails
Select Yes to stop the job if the After SQL statement fails.
  • Type: boolean
  • Default: true
Before SQL (node) statement
Enter the SQL statement or the fully qualified name of the file that contains the SQL statement to run once on each node before any data is processed on that node.
  • Type: string
Read Before SQL (node) statement from file
Select Yes to read the SQL statement from the file that is specified in the Before SQL (node) statement property.
  • Type: boolean
  • Default: false
Stop the job when Before SQL (node) statement fails
Select Yes to stop the job if the Before SQL (node) statement fails.
  • Type: boolean
  • Default: true
After SQL (node) statement
Enter the SQL statement or the fully qualified name of the file that contains the SQL statement to run once on each node after all of the data is processed on that node.
  • Type: string
Read After SQL (node) statement from file
Select Yes to read the SQL statement from the file that is specified in the After SQL (node) statement property.
  • Type: boolean
  • Default: false
Stop the job when After SQL (node) statement fails
Select Yes to stop the job if the After SQL (node) statement fails.
  • Type: boolean
  • Default: true
Java settings
Specify the customized Java settings for the stage.
  • Type: category
Heap size
Specify the maximum Java Virtual Machine heap size in megabytes.
  • Type: integer
  • Default: 256
  • Minimum: 128
JVM options
Specify the additional Java Virtual Machine options.
  • Type: string
Limit number of returned rows
Select Yes to limit the number of rows that are returned by the connector.
  • Type: boolean
  • Default: false
Limit
Enter the maximum number of rows to be returned by the connector, or by each node when partitioned reads are enabled.
  • Type: integer
  • Default: 1000
  • Minimum: 1