Properties reference: Hive connector
This topic lists all properties that you can set to configure the stage.
Connection
- URL
- Specify the connection URL string in the driver-specific format. The connection URL identifies the Hive data source in your environment to connect to. For information about the URL format that the driver supports, see your driver documentation. An example is shown after this property.
- Type: string
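For example, a connection URL for the default Hive driver typically follows the standard HiveServer2 JDBC format; the host name, port, and database here are placeholders:

```
jdbc:hive2://myhost.example.com:10000/default
```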
- User name
- Specify the user name to use to connect to the data source.
- Type: string
- Password
- Specify the password to use to connect to the data source.
- Type: protected string
- Attributes
- Specify any driver-specific connection attributes. Enter each attribute on a separate line, in name=value, name:value, or name value format, as in the example after this property. For information about driver-specific attributes, see your driver documentation.
- Type: string
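For example, the following lines set two attributes in name=value format. The attribute names are illustrations only; the attributes that your driver actually accepts are listed in your driver documentation:

```
ssl=true
transportMode=http
```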
- Hive driver type
- Select the type of Hive driver to use.
- Type: selection
- Default: Default Hive driver
- Values:
- Default Hive driver
- Default SparkSQL driver
- JDBC configuration file
- User defined
- Driver classpath
- Specify the classpath of the Hive driver.
- Type: string
- Driver class names
- Specify the fully qualified class names of the Hive driver, as in the example after this property.
- Type: string
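For example, the driver class name of the open source Apache Hive JDBC driver is:

```
org.apache.hive.jdbc.HiveDriver
```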
Usage
- Write mode
- Select the mode that you want to use to write to the data source.
- Type: selection
- Default: Insert
- Values:
- Insert
- Update
- Delete
- Custom
- Generate SQL at run time
- Select Yes to automatically generate the SQL statements at run time.
- Type: boolean
- Default: true
- Table name
- Enter the name of the table that you want to access in the data source.
- Type: string
- Table action
- Select the action to complete before writing data to the table.
- Type: selection
- Default: Append
- Values:
- Append
- Create
- Replace
- Truncate
- Generate create table statement at run time
- Select Yes to automatically generate the CREATE TABLE statement at run time. Depending on the input link column data types, the driver, and the data source, the connector might not be able to determine the corresponding native data types and produce a valid statement.
- Type: boolean
- Default: true
- Stop the job when the create table statement fails
- Select Yes to stop the job if the CREATE TABLE statement fails.
- Type: boolean
- Default: true
- Row format
- Select the row format option (Storage format, Delimited, or SerDe) to use when the table is created.
- Type: selection
- Default: Storage format
- Values:
- Storage format
- Delimited
- SerDe
- Field terminator
- Specify the field terminator to use when the table is created.
- Type: string
- Line terminator
- Specify the line terminator to use when the table is created.
- Type: string
- SerDe library name
- Specify the SerDe library class name to use when the table is created, for example org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.
- Type: string
- Data storage format
- Specify the storage format of the file that stores the data in the Hive table.
- Type: selection
- Default: TEXTFILE
- Values:
- TEXTFILE
- SEQUENCEFILE
- AVRO
- PARQUET
- RCFILE
- ORC
- Table location
- Specify the location of the file that serves as storage for the Hive table.
- Type: string
- Create table statement
- Enter the CREATE TABLE statement to run to create the target database table. A sketch is shown after this property.
- Type: string
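For reference, a hand-written CREATE TABLE statement that combines the row format, terminator, storage format, and table location options described above might look like the following sketch. The table name, columns, and location are illustrative:

```sql
CREATE TABLE sales (
  id INT,
  amount DOUBLE
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/sales';
```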
- Generate drop table statement at run time
- Select Yes to automatically generate the DROP TABLE statement at run time.
- Type: boolean
- Default: true
- Stop the job when drop table statement fails
- Select Yes to stop the job if the DROP TABLE statement fails.
- Type: boolean
- Default: false
- Drop table statement
- Enter the DROP TABLE statement to run to drop the target database table.
- Type: string
- Generate truncate table statement at run time
- Select Yes to automatically generate the TRUNCATE TABLE statement at run time.
- Type: boolean
- Default: true
- Stop the job when truncate table statement fails
- Select Yes to stop the job if the TRUNCATE TABLE statement fails.
- Type: boolean
- Default: true
- Truncate table statement
- Enter the TRUNCATE TABLE statement to run to truncate the target database table.
- Type: string
- Perform table action first
- Select Yes to perform the table action first. Select No to run the Before SQL statements first.
- Type: boolean
- Default: true
- Enable quoted identifiers
- Select Yes to enclose the specified table name and the column names on the links in quoting strings when SQL statements are generated. The connector queries the driver to determine the quoting string. If the connector cannot obtain this information from the driver, it uses the backtick (`) character as the quoting string.
- Type: boolean
- Default: false
- SQL
- Specify the SQL statement properties.
- Type: category
- Select statement
- Enter a SELECT statement or the fully qualified name of the file that contains the SELECT statement. The statement is used to read rows from the database.
- Type: string
- Read select statement from file
- Select Yes to read the SELECT statement from the file that is specified in the SELECT statement property.
- Type: boolean
- Default: false
- Insert statement
- Enter an INSERT statement or the fully qualified name of the file that contains an INSERT statement. The statement is used to insert rows into the database. A sketch is shown after this property.
- Type: string
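A minimal sketch of an INSERT statement, assuming a target table sales with columns id and amount that match the input link columns; the ORCHESTRATE.&lt;column&gt; markers are replaced with input-link values at run time:

```sql
INSERT INTO sales (id, amount)
VALUES (ORCHESTRATE.id, ORCHESTRATE.amount)
```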
- Read insert statement from file
- Select Yes to read the INSERT statement from the file that is specified in the INSERT statement property.
- Type: boolean
- Default: false
- Update statement
- Enter an UPDATE statement or the fully qualified name of the file that contains an UPDATE statement. The statement is used to update rows in the database.
- Type: string
- Read update statement from file
- Select Yes to read the UPDATE statement from the file that is specified in the UPDATE statement property.
- Type: boolean
- Default: false
- Delete statement
- Enter a DELETE statement or the fully qualified name of the file that contains a DELETE statement. The statement is used to delete rows from the database.
- Type: string
- Read delete statement from file
- Select Yes to read the DELETE statement from the file that is specified in the DELETE statement property.
- Type: boolean
- Default: false
- Custom statements
- Enter the custom statements to run for each input row.
- Type: string
- Read custom statements from file
- Select Yes to read the custom statements from the file that is specified in the Custom statements property.
- Type: boolean
- Default: false
- Enable partitioned reads
- Select Yes to run the statement on each processing node. When the partitioned read method is Hive partition, the statement must include a WHERE clause of the form <partition-column> = [[part-value]], where <partition-column> is the partition column that you want to query and [[part-value]] is a placeholder that the connector replaces on each processing node with the actual partition value. A simple example is SELECT c1, c2 FROM part_table WHERE pc1 = [[part-value]]. When the method is Modulus, use the [[mod-part-expr]] placeholder, for example SELECT * FROM part_table WHERE [[mod-part-expr]]. When the method is Minimum and Maximum range, use the [[range-part-expr]] placeholder, for example SELECT * FROM part_table WHERE [[range-part-expr]]. The three forms are shown together in the sketch after the Partitioned read methods property.
- Type: boolean
- Default: false
- Partitioned read methods
- Specify the type of partitioning to use when partitioned reads are enabled.
- Type: selection
- Default: Hive partition
- Values:
- Hive partition
- Modulus
- Minimum and Maximum range
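Shown together, the three statement forms for the partitioned read methods look like the following sketch; part_table and pc1 are the illustrative names from the description above:

```sql
-- Hive partition: [[part-value]] is replaced with each partition value
SELECT c1, c2 FROM part_table WHERE pc1 = [[part-value]];

-- Modulus: [[mod-part-expr]] expands to a modulus expression on the key column
SELECT * FROM part_table WHERE [[mod-part-expr]];

-- Minimum and Maximum range: [[range-part-expr]] expands to a range predicate on the key column
SELECT * FROM part_table WHERE [[range-part-expr]];
```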
- Column name
- Specify the key column that the selected partitioned read method uses. The column must be of the TINYINT, SMALLINT, INTEGER, or BIGINT data type.
- Type: string
- Table name
- Specify the table that the selected partitioned read method uses.
- Type: string
- Enable partitioned write
- Select Yes to insert data into a partitioned table. In the INSERT statement, ORCHESTRATE.<partitioned-column> is replaced with the column data that is passed from the input link. A sketch is shown after this property.
- Type: boolean
- Default: false
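A sketch of a partitioned-write INSERT statement, assuming a table part_table that is partitioned by column pc1 and an input link that supplies columns c1 and pc1. Dynamic-partition inserts like this one typically also require the dynamic-partition parameters shown under Set Hive parameters:

```sql
INSERT INTO TABLE part_table PARTITION (pc1)
VALUES (ORCHESTRATE.c1, ORCHESTRATE.pc1)
```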
- Transaction
- Specify the transaction settings.
- Type: category
- Record count
- Specify the number of rows that the stage reads from or writes to the data source in a single transaction. When this property is set to 0, the transaction is committed only once on each processing node of the stage after the stage processes all the rows on that node. When rows arrive on the input link of the stage in waves, the Record count value applies to each wave separately.
- Type: integer
- Default: 2000
- Minimum: 0
- Maximum: 999999999
- Mark end of wave
- Select Yes to generate an end-of-wave record after each wave of records, where the number of records in each wave is specified in the Record count property. When the Record count property is set to 0, the end-of-wave records are not generated.
- Type: selection
- Default: No
- Values:
- No
- Yes
- Record ordering
- Specify how records must be ordered across multiple links.
- Type: selection
- Default: All records
- Values:
- All records
- First record
- Ordered
- Key column
- Specify one or more columns to be used for sorting.
- Type: category
- Column name
- Specify the name of the column to be used for sorting.
- Type: string
- Sort order
- Select the sorting order.
- Type: selection
- Default: Ascending
- Values:
- Ascending
- Descending
- Null order
- Select the sorting order for null values. This value specifies if the null values are sorted before or after non-null values.
- Type: selection
- Default: Before
- Values:
- Before
- After
- Case sensitive
- Select Yes to perform case-sensitive sorting of values.
- Type: boolean
- Default: false
- Session
- Specify the session settings.
- Type: category
- Fetch size
- Specify the number of rows that the driver must try to fetch from the data source when the connector requests a single row. Fetching rows in addition to the row requested by the connector can improve performance because the driver can complete the subsequent requests for more rows from the connector locally without a need to access the data source. The default value is 0, which indicates that the driver optimizes the fetch operation based on its internal logic.
- Type: integer
- Default: 0
- Minimum: 0
- Maximum: 999999999
- Batch size
- Enter the number of records to include in the batch of records for each statement execution. The value 0 indicates that all input records are passed to the statements in a single batch.
- Type: integer
- Default: 2000
- Minimum: 0
- Maximum: 999999999
- Drop unmatched fields
- Select Yes to drop any fields from the input link for which there are no matching parameters in the statements that are configured for the stage. Select No to issue an error message when an unmatched field is present on the link.
- Type: boolean
- Default: false
- Report schema mismatch
- Select Yes to perform an early comparison of the column definitions on the link with the column definitions in the data source, and to issue warning messages for any detected discrepancies that can result in data corruption. Depending on the environment and the usage scenario, early detection of discrepancies might not be possible; in that case, the error messages are reported only when the actual data corruption is detected.
- Type: boolean
- Default: false
- Default length for columns
- Enter the default length for the Char, NChar, Binary, VarChar, NVarChar, and VarBinary link columns for which the Length attribute is not set.
- Type: integer
- Default: 200
- Minimum: 1
- Maximum: 999999999
- Default length for long columns
- Enter the default length for the LongVarChar, LongNVarChar and LongVarBinary link columns for which the Length attribute is not set.
- Type: integer
- Default: 20000
- Minimum: 1
- Maximum: 999999999
- Fail on truncation
- Select Yes to fail the operation when truncation is detected for the long columns (LongVarChar, LongNVarChar, and LongVarBinary). When this property is set to No, the connector continues execution even when data truncation is detected.
- Type: boolean
- Default: true
- Generate all columns as Unicode
- Select Yes to always generate columns as NChar, NVarChar, and LongNVarChar instead of Char, VarChar, and LongVarChar.
- Type: boolean
- Default: false
- Character set for non-Unicode columns
- Select the character set option for the values of Char, VarChar and LongVarChar link columns for which the Extended attribute is not set to Unicode. If you select the Default option, the character set encoding of the engine host system locale is used. If you select the Custom option, you must provide the character set name to be used.
- Type: selection
- Default: Default
- Values:
- Default
- Custom
- Character set name
- Specify the name of the character set encoding for the values of Char, VarChar and LongVarChar link columns for which the Extended attribute is not set to Unicode.
- Type: string
- Keep conductor connection alive
- Select Yes to keep the connection alive in the conductor process while the player processes are processing records. Select No to close the connection in the conductor process before player processes start processing records, and to connect again if necessary after the player processes complete processing the records.
- Type: boolean
- Default: true
- Set Hive parameters
- Enter the Hive statement that sets the Hive parameters, as in the example after this property.
- Type: string
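For example, the following statements set the dynamic-partition parameters that partitioned writes commonly require; set only the parameters that your job and your Hive environment need:

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
```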
- Stop the job when unable to set Hive parameters
- Select Yes to stop the job if the Hive parameters are not set.
- Type: boolean
- Default: false
- Run before and after SQL statements
- Select Yes to run SQL statements before and after data is accessed in the database.
- Type: boolean
- Default: false
- Before SQL statement
- Enter the SQL statement or the fully qualified name of the file that contains the SQL statement to run once before any data is processed.
- Type: string
- Read Before SQL statement from file
- Select Yes to read the SQL statement from the file that is specified in the Before SQL statement property.
- Type: boolean
- Default: false
- Stop the job when Before SQL statement fails
- Select Yes to stop the job if the Before SQL statement fails.
- Type: boolean
- Default: true
- After SQL statement
- Enter the SQL statement or the fully qualified name of the file that contains the SQL statement to run once after all of the data is processed.
- Type: string
- Read After SQL statement from file
- Select Yes to read the SQL statement from the file that is specified in the After SQL statement property.
- Type: boolean
- Default: false
- Stop the job when After SQL statement fails
- Select Yes to stop the job if the After SQL statement fails.
- Type: boolean
- Default: true
- Before SQL (node) statement
- Enter the SQL statement or the fully qualified name of the file that contains the SQL statement to run once on each node before any data is processed on that node.
- Type: string
- Read Before SQL (node) statement from file
- Select Yes to read the SQL statement from the file that is specified in the Before SQL (node) statement property.
- Type: boolean
- Default: false
- Stop the job when Before SQL (node) statement fails
- Select Yes to stop the job if the Before SQL (node) statement fails.
- Type: boolean
- Default: true
- After SQL (node) statement
- Enter the SQL statement or the fully qualified name of the file that contains the SQL statement to run once on each node after all of the data is processed on that node.
- Type: string
- Read After SQL (node) statement from file
- Select Yes to read the SQL statement from the file that is specified in the After SQL (node) statement property.
- Type: boolean
- Default: false
- Stop the job when After SQL (node) statement fails
- Select Yes to stop the job if the After SQL (node) statement fails.
- Type: boolean
- Default: true
- Java settings
- Specify the customized Java settings for the stage.
- Type: category
- Heap size
- Specify the maximum Java Virtual Machine heap size in megabytes.
- Type: integer
- Default: 256
- Minimum: 128
- JVM options
- Specify the additional Java Virtual Machine options.
- Type: string
- Limit number of returned rows
- Select Yes to limit the number of rows that are returned by the connector.
- Type: boolean
- Default: false
- Limit
- Enter the maximum number of rows to be returned by the connector, or by each node when partitioned reads are enabled.
- Type: integer
- Default: 1000
- Minimum: 1