Catalog

The following catalog properties can be customized.
Table 1. watsonx.data component: Presto (Java) catalog
| Property | Description | Type | Default value | System property | Restart containers required |
| --- | --- | --- | --- | --- | --- |
| presto_hive_max_outstanding_splits | Maximum number of splits waiting to be served by the split source. Once this limit is reached, writers stop adding new splits to the split source until workers consume some of them. A higher value increases memory usage but lets more I/O happen at once, which may be much faster and may improve resource utilization. | Integer | | hive.max-outstanding-splits | Y |
| presto_hive_max_initial_splits | Number of splits that may be initially created for a single query. Initial splits allow better concurrency for small queries. The Hive connector creates the first hive.max-initial-splits splits with a size of hive.max-initial-split-size instead of hive.max-split-size. A higher value forces more splits to be small, which effectively widens the definition of a small query. | Integer | | hive.max-initial-splits | Y |
| presto_hive_max_initial_split_size | Maximum size of each initially created split for a single query. The initial-split logic is described under hive.max-initial-splits. Changing this value changes what is considered a small query: a higher value reduces parallelism for small queries, and a lower value increases their concurrency. This is a maximum; the actual size may be lower when the end of the blocks on a single DataNode is reached. | String | 32MB | hive.max-initial-split-size | Y |
| presto_hive_max_split_size | Maximum size of the splits created after the first hive.max-initial-splits splits for a single query. The initial splits use hive.max-initial-split-size instead; the initial-split logic is described under hive.max-initial-splits. | String | 64MB | hive.max-split-size | Y |
| presto_hive_split_loader_concurrency | Level of concurrency for loading data from Hive tables with the Presto (Java) Hive connector. It controls the number of concurrent split loader threads that Presto (Java) can use to fetch data in parallel from the Hive source. A higher value can improve data retrieval speed and resource utilization, particularly for large Hive queries, but can also increase system resource consumption. | Integer | | hive.split-loader-concurrency | Y |
| presto_hive_pushdown_filter_enabled | Controls whether filter pushdown is enabled in the Presto (Java) Hive connector. Filter pushdown is a query optimization technique that lets Presto (Java) push filtering conditions directly to the underlying Hive data source. When enabled, filtering conditions in SQL queries are evaluated as close to the data source as possible, which reduces the amount of data transferred to Presto (Java) for processing. | Integer | | hive.pushdown-filter-enabled | Y |
| presto_hive_node_selection_strategy | Selects the NodeSelectionStrategy through the hive.node-selection-strategy configuration property. When SOFT_AFFINITY is selected, the scheduler makes a best effort to request the same worker to fetch the same file. | String | | hive.node-selection-strategy | Y |
| presto_hive_max_partitions_per_writers | Maximum number of partitions permitted for a single writer. | Integer | | hive.max-partitions-per-writers | |
| presto_hive_metastore_timeout | Timeout duration for requests made to the Hive metastore. Note: Applies to watsonx.data™ version 1.1.4 and later. | String (Duration) | 10s | hive.metastore-timeout | Y |
| hive.s3.max-error-retries | Maximum number of retry attempts for S3 client operations in Hive. | Integer | 50 | hive.s3.max-error-retries | Y |
| hive_s3_connect_timeout | TCP connection timeout for S3 operations in Hive. | String (Duration) | 1m | hive.s3.connect-timeout | Y |
| hive_s3_socket_timeout | Maximum time allowed for reading data from the socket for S3 operations in Hive. | String (Duration) | 2m | hive.s3.socket-timeout | Y |
| hive.s3.max-connections | Maximum number of concurrent open connections permitted to S3 in Hive. | Integer | 5000 | hive.s3.max-connections | Y |
| hive.s3.max-client-retries | Maximum number of retry attempts for read operations for S3 in Hive. | Integer | 50 | hive.s3.max-client-retries | Y |
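
The interaction between hive.max-initial-splits, hive.max-initial-split-size, and hive.max-split-size can be illustrated with a simplified model. The sketch below is not the connector's implementation: it assumes a single file, ignores block and DataNode boundaries, and the helper names `parse_size` and `plan_splits` are hypothetical. The value 2 used for the initial split count is chosen only to keep the example short; the 32MB and 64MB sizes are the defaults from the table.

```python
# Simplified model of initial vs. regular split sizing (illustrative only).

def parse_size(text):
    """Convert a size string such as '32MB' into bytes (binary multiples)."""
    units = {"B": 1, "kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
    # Try two-letter suffixes before the bare 'B' suffix.
    for suffix in sorted(units, key=len, reverse=True):
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * units[suffix])
    raise ValueError(f"unrecognized size: {text!r}")


def plan_splits(file_bytes, max_initial_splits, initial_split_bytes, max_split_bytes):
    """Return the split sizes for one file under the simplified model.

    The first `max_initial_splits` splits are capped at `initial_split_bytes`;
    every later split is capped at `max_split_bytes`.
    """
    sizes = []
    remaining = file_bytes
    while remaining > 0:
        cap = initial_split_bytes if len(sizes) < max_initial_splits else max_split_bytes
        size = min(cap, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


MB = 1024 ** 2
splits = plan_splits(
    file_bytes=200 * MB,
    max_initial_splits=2,  # hypothetical value, not a documented default
    initial_split_bytes=parse_size("32MB"),
    max_split_bytes=parse_size("64MB"),
)
print([s // MB for s in splits])  # [32, 32, 64, 64, 8]
```

In this model, raising the initial split count turns more of the file into 32MB splits, which is why a higher hive.max-initial-splits value treats larger queries as small.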

For more information about how to customize the catalog properties, see Customization.
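
As a rough illustration of how the Property and System property columns relate, the sketch below maps a few customization names from the table to their system properties and checks duration values before rendering them. The `key=value` output format, the duration pattern, and the function name `render` are assumptions made for this example; the actual mechanism that watsonx.data uses to apply these values is described in Customization.

```python
import re

# Customization name -> system property, copied from a subset of the table.
SYSTEM_PROPERTY = {
    "presto_hive_max_initial_split_size": "hive.max-initial-split-size",
    "presto_hive_max_split_size": "hive.max-split-size",
    "presto_hive_metastore_timeout": "hive.metastore-timeout",
    "hive_s3_connect_timeout": "hive.s3.connect-timeout",
    "hive_s3_socket_timeout": "hive.s3.socket-timeout",
}

# Properties whose type is listed as String (Duration) in the table.
DURATION_PROPERTIES = {
    "presto_hive_metastore_timeout",
    "hive_s3_connect_timeout",
    "hive_s3_socket_timeout",
}

# Assumed duration syntax: a number followed by a unit, for example 10s or 2m.
DURATION = re.compile(r"^\d+(\.\d+)?(ns|us|ms|s|m|h|d)$")


def render(settings):
    """Validate the settings and return them as key=value lines."""
    lines = []
    for name, value in settings.items():
        if name not in SYSTEM_PROPERTY:
            raise KeyError(f"unknown customization property: {name}")
        if name in DURATION_PROPERTIES and not DURATION.match(value):
            raise ValueError(f"{name}: {value!r} is not a duration")
        lines.append(f"{SYSTEM_PROPERTY[name]}={value}")
    return lines


for line in render({"presto_hive_metastore_timeout": "10s", "hive_s3_socket_timeout": "2m"}):
    print(line)
```

Checking values up front is a design choice for the sketch: a malformed duration is rejected before any containers would need to restart.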