Catalog

The following table lists the properties that can be customized for the catalog.

Table 1. watsonx.data component: Presto (Java) catalog
Property	Description	Type	Default value / Default setting	System property	Restart containers required
`presto_hive_max_outstanding_splits`	The maximum number of splits waiting to be served by a split source. After this limit is reached, writers stop writing new splits to the split source until workers consume some of them. A higher value increases memory usage but allows all IO to be concentrated at one time, which can be much faster and can increase resource utilization.	Integer		`hive.max-outstanding-splits`	Y
`presto_hive_max_initial_splits`	This property describes how many splits can be initially created for a single query. The initial splits allow better concurrency for small queries. The Hive connector creates the first `hive.max-initial-splits` splits with a size of `hive.max-initial-split-size` instead of `hive.max-split-size`. A higher value forces more splits to have a smaller size, which effectively broadens the definition of what is considered a small query.	Integer		`hive.max-initial-splits`	Y
`presto_hive_max_initial_split_size`	This property describes the maximum size of each initially created split for a single query. The logic of initial splits is described in the `hive.max-initial-splits` property. Changing this value changes what is considered a small query. A higher value causes less parallelism for small queries. A lower value increases concurrency for them. This is a maximum size: the real size can be lower when the end of the blocks on a single DataNode is reached.	String	32MB	`hive.max-initial-split-size`	Y
`presto_hive_max_split_size`	This property describes the maximum size of a single split. After the first `hive.max-initial-splits` splits, which are limited to `hive.max-initial-split-size`, the Hive connector creates splits that are no larger than this value. A higher value produces fewer, larger splits and less parallelism. A lower value produces more splits, which can increase concurrency.	String	64MB	`hive.max-split-size`	Y
`presto_hive_split_loader_concurrency`	This property specifies the level of concurrency for loading data from Hive tables using the Presto (Java) Hive connector. It controls the number of concurrent split loader threads that Presto (Java) can utilize to fetch data in parallel from the Hive source. A higher concurrency value can improve data retrieval speed and resource utilization, particularly for large Hive queries, but it can also increase system resource consumption.	Integer		`hive.split-loader-concurrency`	Y
`presto_hive_pushdown_filter_enabled`	This property controls whether filter pushdown is enabled in the Presto (Java) Hive connector. Filter pushdown is a query optimization technique that allows Presto (Java) to push filtering conditions directly to the underlying Hive data source. When enabled, filtering conditions specified in SQL queries are evaluated as close to the data source as possible, reducing the amount of data that needs to be transferred to Presto (Java) for processing.	Boolean		`hive.pushdown-filter-enabled`	Y
`presto_hive_node_selection_strategy`	This property selects the `NodeSelectionStrategy` that the scheduler uses. When `SOFT_AFFINITY` is selected, the scheduler makes a best effort to request the same worker to fetch the same file.	String		`hive.node-selection-strategy`	Y
`presto_hive_max_partitions_per_writers`	The maximum number of partitions permitted for a single writer.	Integer		`hive.max-partitions-per-writers`
`presto_hive_metastore_timeout`	This property specifies the timeout duration for requests made to the Hive metastore. Note: It applies to watsonx.data™ version 1.1.4 and later.	String (Duration)	10s	`hive.metastore-timeout`	Y
`hive.s3.max-error-retries`	This property sets the maximum number of retry attempts for S3 client operations in Hive.	Integer	50	`hive.s3.max-error-retries`	Y
`hive_s3_connect_timeout`	This property specifies the TCP connection timeout for S3 operations in Hive.	String (Duration)	1m	`hive.s3.connect-timeout`	Y
`hive_s3_socket_timeout`	This property sets the maximum time allowed for reading data from the socket for S3 operations in Hive.	String (Duration)	2m	`hive.s3.socket-timeout`	Y
`hive.s3.max-connections`	This property defines the maximum number of concurrent open connections permitted to S3 in Hive.	Integer	5000	`hive.s3.max-connections`	Y
`hive.s3.max-client-retries`	This property sets the maximum number of retry attempts for read operations for S3 in Hive.	Integer	50	`hive.s3.max-client-retries`	Y
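
The interaction between `hive.max-initial-splits`, `hive.max-initial-split-size`, and `hive.max-split-size` described in the table can be approximated with a short calculation. The following Python sketch is illustrative only: the function name `estimate_split_count` is hypothetical, the model ignores file and block boundaries (so real split counts can differ), and the value of 200 initial splits in the usage lines is an assumption, because the table does not list a default for that property.

```python
import math

MB = 1024 * 1024


def estimate_split_count(total_bytes, max_initial_splits,
                         max_initial_split_size=32 * MB,
                         max_split_size=64 * MB):
    """Roughly estimate how many splits a scan of total_bytes produces.

    Simplified model (an assumption, not the connector's exact logic):
    the first max_initial_splits splits are capped at
    max_initial_split_size, and all remaining data is cut into splits
    capped at max_split_size.
    """
    if total_bytes <= 0:
        return 0

    # Amount of data that fits entirely into the smaller initial splits.
    initial_capacity = max_initial_splits * max_initial_split_size
    if total_bytes <= initial_capacity:
        return math.ceil(total_bytes / max_initial_split_size)

    # Everything beyond the initial splits uses the larger split size.
    remaining = total_bytes - initial_capacity
    return max_initial_splits + math.ceil(remaining / max_split_size)


# Usage: a small scan stays within the initial splits, a large one does not.
small = estimate_split_count(100 * MB, max_initial_splits=200)
large = estimate_split_count(10 * 1024 * MB, max_initial_splits=200)
print(small, large)
```

Under this model, raising `hive.max-initial-splits` increases the amount of data that is read in the smaller initial splits, which matches the table's note that a higher value broadens what counts as a small query.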

For more information about how to customize the catalog properties, see Customization.
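
To illustrate how the names in the Property column relate to the names in the System property column, the following Python sketch converts a set of overrides into `key=value` lines. It is a minimal illustration under stated assumptions: the helper `to_properties_lines` and the `SYSTEM_PROPERTY` dictionary are hypothetical, only a few rows of Table 1 are included, and the sketch does not represent the actual customization mechanism, which is described in Customization.

```python
# A few rows copied from Table 1: customization property -> system property.
SYSTEM_PROPERTY = {
    "presto_hive_max_initial_split_size": "hive.max-initial-split-size",
    "presto_hive_max_split_size": "hive.max-split-size",
    "presto_hive_metastore_timeout": "hive.metastore-timeout",
    "hive_s3_connect_timeout": "hive.s3.connect-timeout",
    "hive_s3_socket_timeout": "hive.s3.socket-timeout",
}


def to_properties_lines(overrides):
    """Translate {customization property: value} into key=value lines."""
    lines = []
    for name, value in overrides.items():
        if name not in SYSTEM_PROPERTY:
            # Reject names that are not in the (partial) mapping above.
            raise KeyError(f"unknown property: {name}")
        lines.append(f"{SYSTEM_PROPERTY[name]}={value}")
    return lines


# Usage: override two properties and print the resulting lines.
for line in to_properties_lines({
    "presto_hive_max_split_size": "128MB",
    "presto_hive_metastore_timeout": "30s",
}):
    print(line)
```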