Catalog
The catalog properties that can be customized are listed here.
| Property | Description | Type | Default value / Default setting | System property | Restart containers required |
|---|---|---|---|---|---|
| presto_hive_max_outstanding_splits | Maximum number of splits waiting to be served by the split source. After this limit is reached, writers stop writing new splits to the split source until workers consume some of them. A higher value increases memory usage but allows all I/O to be concentrated at one time, which can be much faster and can increase resource utilization. | Integer | | hive.max-outstanding-splits | Y |
| presto_hive_max_initial_splits | The number of splits that can be initially created for a single query. Initial splits allow better concurrency for small queries. The Hive connector creates the first hive.max-initial-splits splits with a size of hive.max-initial-split-size instead of hive.max-split-size. A higher value forces more splits to be smaller, which effectively broadens what is considered a small query. | Integer | | hive.max-initial-splits | Y |
| presto_hive_max_initial_split_size | The maximum size of each initially created split for a single query. The logic of initial splits is described in the hive.max-initial-splits property. Changing this value changes what is considered a small query: a higher value reduces parallelism for small queries, and a lower value increases concurrency for them. This is a maximum; the actual size can be lower when the end of the blocks on a single DataNode is reached. | String | 32MB | hive.max-initial-split-size | Y |
| presto_hive_max_split_size | The maximum size of each split that is created for a single query after the initial splits. The first hive.max-initial-splits splits use hive.max-initial-split-size instead of this value. | String | 64MB | hive.max-split-size | Y |
| presto_hive_split_loader_concurrency | This property specifies the level of concurrency for loading data from Hive tables using the Presto (Java) Hive connector. It controls the number of concurrent split loader threads that Presto (Java) can utilize to fetch data in parallel from the Hive source. A higher concurrency value can improve data retrieval speed and resource utilization, particularly for large Hive queries, but it can also increase system resource consumption. | Integer | | hive.split-loader-concurrency | Y |
| presto_hive_pushdown_filter_enabled | This property controls whether filter pushdown is enabled in the Presto (Java) Hive connector. Filter pushdown is a query optimization technique that allows Presto (Java) to push filtering conditions directly to the underlying Hive data source. When enabled, filtering conditions specified in SQL queries are evaluated as close to the data source as possible, reducing the amount of data that needs to be transferred to Presto (Java) for processing. | Boolean | | hive.pushdown-filter-enabled | Y |
| presto_hive_node_selection_strategy | Selects the NodeSelectionStrategy through the hive.node-selection-strategy configuration property. When SOFT_AFFINITY is selected, the scheduler makes a best effort to request the same worker to fetch the same file. | String | | hive.node-selection-strategy | Y |
| presto_hive_max_partitions_per_writers | The maximum number of partitions permitted for a single writer. | Integer | | hive.max-partitions-per-writers | |
| presto_hive_metastore_timeout | The timeout duration for requests made to the Hive metastore. Note: Applicable to watsonx.data™ version 1.1.4 and later. | String (Duration) | 10s | hive.metastore-timeout | Y |
| hive.s3.max-error-retries | The maximum number of retry attempts for S3 client operations in Hive. | Integer | 50 | hive.s3.max-error-retries | Y |
| hive_s3_connect_timeout | This property specifies the TCP connection timeout for S3 operations in Hive. | String (Duration) | 1m | hive.s3.connect-timeout | Y |
| hive_s3_socket_timeout | This property sets the maximum time allowed for reading data from the socket for S3 operations in Hive. | String (Duration) | 2m | hive.s3.socket-timeout | Y |
| hive.s3.max-connections | This property defines the maximum number of concurrent open connections permitted to S3 in Hive. | Integer | 5000 | hive.s3.max-connections | Y |
| hive.s3.max-client-retries | This property sets the maximum number of retry attempts for read operations for S3 in Hive. | Integer | 50 | hive.s3.max-client-retries | Y |
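
The behavior described for hive.max-outstanding-splits resembles a bounded buffer: producers stop adding splits when the buffer is full and resume once workers take some out. The following is a minimal sketch of that idea, not the connector's implementation. The function name `offer_splits`, the split labels, and the limit of 3 are hypothetical and chosen only for illustration.

```python
import queue


def offer_splits(buffer, splits):
    """Add splits to the buffer until it is full; return the ones that did not fit."""
    pending = list(splits)
    while pending:
        try:
            buffer.put_nowait(pending[0])
        except queue.Full:
            # The limit is reached: stop adding until a worker consumes a split.
            break
        pending.pop(0)
    return pending


# Hypothetical limit of 3 outstanding splits (the table lists no default).
buffer = queue.Queue(maxsize=3)

# Only three of the five splits fit; the rest must wait.
pending = offer_splits(buffer, ["s1", "s2", "s3", "s4", "s5"])
waiting_after_first_offer = list(pending)

# A worker consumes one split, which frees room for exactly one more.
consumed = buffer.get_nowait()
pending = offer_splits(buffer, pending)
```

A larger limit lets more splits sit in memory at once, which is the memory-versus-throughput trade-off that the property description mentions.
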
For more information about how to customize the catalog properties, see Customization.
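
To make the interaction between hive.max-initial-splits, hive.max-initial-split-size, and hive.max-split-size more concrete, the following sketch divides a single input of a given size into splits. It is a simplified model under stated assumptions: it ignores block and DataNode boundaries, it treats the input as one contiguous range, and the function name `plan_split_sizes` and the `max_initial_splits` values are hypothetical. The 32MB and 64MB sizes are the defaults from the table.

```python
MB = 1024 * 1024


def plan_split_sizes(total_bytes, max_initial_splits,
                     max_initial_split_size=32 * MB,
                     max_split_size=64 * MB):
    """Return split sizes in bytes for a simplified, single-range input."""
    sizes = []
    remaining = total_bytes
    while remaining > 0:
        # The first max_initial_splits splits use the smaller initial size.
        if len(sizes) < max_initial_splits:
            limit = max_initial_split_size
        else:
            limit = max_split_size
        size = min(limit, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


# 200 MB of input with two initial splits (hypothetical value).
few_initial = [s // MB for s in plan_split_sizes(200 * MB, max_initial_splits=2)]

# The same input with four initial splits produces more, smaller splits.
more_initial = [s // MB for s in plan_split_sizes(200 * MB, max_initial_splits=4)]
```

In this model, raising the number of initial splits increases the total split count for the same input, which is why a higher value broadens what is treated as a small query.
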