Specifying additional customization for watsonx.data

A project administrator can customize properties for the different components in IBM® watsonx.data. As part of the post-installation procedure, you can set extra customizations beyond the defaults. The following specifications are optional and can be altered to change the component-level customizations.

watsonx.data on Red Hat® OpenShift®

The complete list of possible custom resource descriptions is listed in the following tables.
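For orientation, a few of these properties might be supplied together in the watsonx.data custom resource. The sketch below is illustrative only: the field placement and CR layout are assumptions, not the documented schema, and the values shown are the defaults from the tables. Verify against the installed custom resource before applying anything.

```yaml
# Hypothetical sketch only. The CR layout and field placement are assumptions;
# check the installed watsonx.data custom resource schema before use.
spec:
  presto_coordinator_task_concurrency: 16        # maps to config.properties.task.concurrency
  presto_coordinator_query_max_memory: "1TB"     # maps to config.properties.query.max-memory
  presto_worker_replicas: 3                      # maps to spec.replicas (small profile default)
```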

Table 1. watsonx.data component: Presto coordinator
Property Description Type Default value / Default setting System property Restart containers required
presto_coordinator_resources_limits_cpu Resource CPU limit for the Presto coordinator; the container is allowed to use only this much CPU. Kubernetes CPU Unit

small: 12

small_mincpureq: 12

medium: 12

large: 12

xlarge: 12

xxlarge: 12

resources.limits.cpu N
presto_coordinator_resources_limits_memory Resource memory limit for the Presto coordinator; the container is allowed to use only this much memory. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.

small: 100G

small_mincpureq: 100G

medium: 100G

large: 100G

xlarge: 100G

xxlarge: 100G

resources.limits.memory
N
presto_coordinator_resources_limits_ephemeral_storage This parameter sets the maximum amount of local ephemeral storage that a container in a Presto coordinator Pod can consume. Units of Bytes

small: 10G

small_mincpureq: 10G

medium: 10G

large: 10G

xlarge: 10G

xxlarge: 10G

resources.limits.ephemeral-storage N
presto_coordinator_resources_requests_cpu Resource CPU request for Presto coordinator. Kubernetes CPU Unit

small: 12

small_mincpureq: 0.005

medium: 12

large: 12

xlarge: 12

xxlarge: 12

resources.requests.cpu N
presto_coordinator_resources_requests_memory Resource memory request for Presto coordinator. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.

small: 100G

small_mincpureq: 100G

medium: 100G

large: 100G

xlarge: 100G

xxlarge: 100G

resources.requests.memory
N
presto_coordinator_resources_requests_ephemeral_storage This parameter sets the minimum/guaranteed size of local ephemeral storage that a container in a Presto coordinator Pod requests. Units of Bytes

small: 1G

small_mincpureq: 1G

medium: 1G

large: 1G

xlarge: 1G

xxlarge: 1G

resources.requests.ephemeral-storage N
presto_coordinator_jvm_Xmx Xmx specifies the maximum memory allocation pool for a Java virtual machine (JVM). - - jvm.config.Xmx Y
presto_coordinator_task_concurrency Default local concurrency for parallel operators such as joins and aggregations. Number (must be a power of two) 16 config.properties.task.concurrency Y
presto_coordinator_query_max_memory The maximum amount of user memory that a query can use across the entire cluster. Data size 1TB config.properties.query.max-memory Y
presto_coordinator_query_max_memory_per_node The maximum amount of user memory that a query can use on a worker. Data size presto_coordinator_jvm_Xmx*0.795 config.properties.query.max-memory-per-node Y
presto_coordinator_query_max_total_memory_per_node The maximum amount of user and system memory that a query can use on a worker. Data size presto_coordinator_jvm_Xmx*0.795 config.properties.query.max-total-memory-per-node Y
presto_coordinator_query_max_concurrent_queries Describes how many queries can be processed simultaneously in a single cluster node. Integer 15 config.properties.query.max-concurrent-queries Y
presto_coordinator_memory_heap_headroom_per_node The amount of memory set aside as headroom/buffer in the JVM heap for allocations that are not tracked by Presto. Data size presto_coordinator_jvm_Xmx*0.2 config.properties.query.memory.heap-headroom-per-node Y
presto_coordinator_query_max_total_memory The maximum amount of user and system memory that a query can use across the entire cluster. Data size 2TB config.properties.query.max-total-memory Y
presto_coordinator_experimental_optimized_repartitioning Improve performance of repartitioning data between stages. Boolean true experimental.optimized-repartitioning Y
presto_coordinator_experimental_pushdown_dereference_enabled Enables pushdown of dereference expressions for querying nested data. Boolean - experimental.pushdown-dereference-enabled Y
presto_coordinator_experimental_pushdown_subfields_enabled Enables pushdown of subfield expressions for querying nested data. Boolean - experimental.pushdown-subfields-enabled Y
presto_coordinator_join_max_broadcast_table_size Controls the maximum estimated size of a table that can be broadcast when using the AUTOMATIC join distribution type. Can also be set per session with the join_max_broadcast_table_size session property. Integer - join-max-broadcast-table-size Y
presto_coordinator_node_scheduler_max_pending_splits_per_task The number of outstanding splits with the standard split weight that can be queued for each worker node for a single stage of a query, even when the node is already at the limit for total number of splits. Allowing a minimum number of splits per stage is required to prevent starvation and deadlocks. This value must be smaller than node-scheduler.max-splits-per-node, will usually be increased for the same reasons, and has similar drawbacks if set too high. Integer - node-scheduler.max-pending-splits-per-task Y
presto_coordinator_node_scheduler_max_splits_per_node The target value for the total number of splits that can be running for each worker node, assuming all splits have the standard split weight. Using a higher value is recommended, if queries are submitted in large batches (for example, running a large group of reports periodically), or for connectors that produce many splits that complete quickly but do not support assigning split weight values to express that to the split scheduler. Integer - node-scheduler.max-splits-per-node Y
presto_coordinator_optimizer_prefer_partial_aggregation This property allows users to disable partial aggregations for queries that do not benefit. Boolean - optimizer.prefer-partial-aggregation Y
presto_coordinator_query_execution_policy Configures the algorithm to organize the processing of all of the stages of a query. String phased query.execution-policy Y
presto_coordinator_query_low_memory_killer_policy The policy used for selecting the query to kill when the cluster is out of memory (OOM). This property can have one of the following values: none, total-reservation, or total-reservation-on-blocked-nodes. none disables the cluster OOM killer. The value of total-reservation configures a policy that kills the query with the largest memory reservation across the cluster. The value of total-reservation-on-blocked-nodes configures a policy that kills the query using the most memory on the workers that are out of memory (blocked). String

total-reservation-on-blocked-nodes

query.low-memory-killer.policy Y
presto_coordinator_query_max_stage_count Sets a limit on the number of stages in a query. The limit can be changed with the query.max-stage-count configuration property and the query_max_stage_count session property. Integer 200 query.max-stage-count Y
presto_coordinator_query_min_schedule_split_batch_size Sets the minimum number of splits to consider for scheduling per batch. Integer - query.min-schedule-split-batch-size Y
presto_coordinator_query_stage_count_warning_threshold Specifies a per-query threshold for the number of stages. When this threshold is exceeded, a TOO_MANY_STAGES warning is raised. Integer 150 query.stage-count-warning-threshold Y
presto_coordinator_scale_writers Enable writer scaling by dynamically increasing the number of writer tasks on the cluster. Boolean - scale-writers Y
presto_coordinator_sink_max_buffer_size Buffer size for I/O writes while collecting pipeline results. A higher value can speed up I/O operations at the cost of additional memory, but it can also increase the amount of data lost when a Presto node fails, effectively slowing down I/O in an unstable environment. Integer - sink.max-buffer-size Y
presto_coordinator_experimental_max_revocable_memory_per_node The amount of revocable memory a query can use on each node. Units of Bytes - experimental.max-revocable-memory-per-node Y
presto_coordinator_experimental_reserved_pool_enabled This property allows users to enable or disable the Reserved Pool in Presto. When the General Pool is full, the OOM killer in Presto is used to increase General Pool concurrency and prevent deadlock. Boolean False experimental.reserved-pool-enabled Y
presto_coordinator_query_min_expire_age This property describes the minimum time after which the query metadata can be removed from the server. String (Duration) 120 minutes query.min-expire-age Y
presto_coordinator_enable_dynamic_filtering This property improves performance for queries with broadcast or collocated joins by adding dynamic filtering and bucket pruning support. Boolean - experimental.enable-dynamic-filtering Y
presto_coordinator_com_facebook_presto_governance This property sets the minimum log level for the logger com.facebook.presto.governance. It helps to customize the logging behavior based on the severity of log messages. String (log levels)
Note: There are four levels: DEBUG, INFO, WARN, and ERROR.
  com.facebook.presto.governance  
presto_coordinator_com_facebook_presto_governance_util This property sets the minimum log level for the logger com.facebook.presto.governance_util. It helps to customize the logging behavior based on the severity of log messages. String (log levels)
Note: There are four levels: DEBUG, INFO, WARN, and ERROR.
  com.facebook.presto.governance.util  
presto_coordinator_com_facebook_presto_dispatcher This property sets the minimum log level for the logger com.facebook.presto.dispatcher. It helps to customize the logging behavior based on the severity of log messages. String (log levels)
Note: There are four levels: DEBUG, INFO, WARN, and ERROR.
  com.facebook.presto.dispatcher  
presto_coordinator_exchange_client_threads This property helps to control the number of threads used by exchange clients in Presto to fetch data from other Presto nodes during query execution. Integer   exchange.client-threads  
presto_coordinator_exchange_http_client_max_connections This property specifies the maximum number of HTTP connections that the Exchange service can establish concurrently across all servers it interacts with. It helps to regulate the total number of simultaneous connections used by the exchange client for communication between Presto nodes. Integer   exchange.http-client.max-connections  
presto_coordinator_exchange_http_client_max_requests_queued_per_destination This property determines the maximum number of HTTP requests that can be queued for each destination server by the exchange client. Integer   exchange.http-client.max-requests-queued-per-destination  
presto_coordinator_http_server_log_max_size This property specifies the maximum file size for the log file generated by the HTTP server component. Units of Bytes   http-server.log.max-size  
presto_coordinator_http_server_log_max_history This property specifies the maximum number of log files that the HTTP server component retains before older log files are rotated out. Integer   http-server.log.max-history  
presto_coordinator_http_server_threads_max This property specifies the maximum number of threads that the HTTP server can use. Integer   http-server.threads.max  
presto_coordinator_join_max_broadcast_table_size This property allows you to specify a maximum size for replicated tables used in joins. Units of Bytes   join-max-broadcast-table-size  
presto_coordinator_log_max_history This property represents the maximum number of general application log files retained by a logging system before older logs are rotated out. Integer   log.max-history  
presto_coordinator_log_max_size The property log.max-size defines the maximum file size allowed for the general application log file. Units of Bytes   log.max-size  
presto_coordinator_node_scheduler_max_splits_per_node This property specifies the target maximum number of splits that can concurrently run on each worker node. Splits represent units of work within queries. Adjusting this property allows administrators to optimize resource utilization, especially in scenarios involving large query batches or connectors generating numerous splits.
CAUTION:
Setting presto_coordinator_node_scheduler_max_splits_per_node to too high a value might lead to inefficient memory usage and performance degradation.
Set this property such that there is always at least one split waiting to be processed, but not higher.
Integer   node-scheduler.max-splits-per-node  
presto_coordinator_optimize_nulls_in_join This property, when enabled, reduces the overhead of processing NULL values during JOIN operations. It is particularly beneficial when dealing with columns that contain a significant number of NULL values. Boolean   optimize-nulls-in-join  
presto_coordinator_optimizer_default_filter_factor_enabled This property enables the use of a default value for estimating the cost of filters in query optimization. Boolean   optimizer.default-filter-factor-enabled  
presto_coordinator_optimizer_exploit_constraints This property enables constraint optimizations for querying catalogs that support table constraints. Boolean   optimizer.exploit-constraints  
presto_coordinator_query_client_timeout This property specifies the duration for which the cluster waits without any communication from the client application (for example, the CLI) before abandoning and canceling the ongoing query or task. String (Duration)   query.client.timeout  
presto_coordinator_query_max_execution_time This property specifies the maximum allowed time for a query to be actively executing on the cluster before termination. String (Duration)   query.max-execution-time  
presto_coordinator_query_max_history This property refers to the maximum number of queries to keep in the query history to provide statistics and other information. If this value is reached, queries are removed based on age. Integer   query.max-history  
presto_coordinator_query_max_length This property specifies the maximum number of characters allowed for the SQL query text. Longer queries are not processed and are terminated with an error. Integer   query.max-length  
presto_coordinator_shutdown_grace_period This property specifies the duration of time that the system waits after receiving a shutdown request before initiating the shutdown process. During this grace period, the system continues to operate normally, allowing ongoing active tasks to complete. String (Duration)   shutdown.grace-period  
presto_coordinator_experimental_max_spill_per_node This property refers to the maximum spill space used by all queries on a single node (when the memory allocated for query processing is exceeded). Units of Bytes   experimental.max-spill-per-node  
presto_coordinator_experimental_query_max_spill_per_node This property refers to the maximum spill space used by a single query on a single node. Units of Bytes   experimental.query-max-spill-per-node  
presto_coordinator_experimental_spiller_max_used_space_threshold This property sets a threshold disk space usage ratio. If usage exceeds the threshold value, the spill path becomes ineligible for spilling. Double   experimental.spiller-max-used-space-threshold  
presto_coordinator_experimental_spiller_spill_path This property specifies a directory where spilled content is written. It can be a comma-separated list to spill simultaneously to multiple directories, which helps to use multiple drives installed in the system. (It is recommended to avoid spilling to system drives and to ensure that spill operations do not interfere with JVM operation or disk performance.) String   experimental.spiller-spill-path  
presto_coordinator_httpserver_max_request_header_size This property sets the maximum size of the request header that the HTTP server supports. Data size 16kB httpserver.max_request_header_size Y
presto_coordinator_httpserver_max_response_header_size This property sets the maximum size of the response header that the HTTP server supports. Data size 16kB httpserver.max_response_header_size Y
presto_coordinator_join_distribution_type This property specifies the type of distributed join to use. Allowed values are AUTOMATIC, PARTITIONED, and BROADCAST. String AUTOMATIC join-distribution-type  
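As a reading aid, the System property column in Table 1 corresponds to keys in Presto's config.properties file. Rendered as a fragment with the documented defaults (illustrative only, not tuning advice), a few of the coordinator entries above would look like this:

```properties
# Coordinator defaults from Table 1, shown in config.properties form.
task.concurrency=16
query.max-memory=1TB
query.max-concurrent-queries=15
query.execution-policy=phased
query.low-memory-killer.policy=total-reservation-on-blocked-nodes
join-distribution-type=AUTOMATIC
```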
Table 2. watsonx.data component: Presto worker
Property Description Type Default value / Default setting System property Restart containers required
presto_worker_replicas The number of replicas for Presto worker. Integer

small: 3

small_mincpureq: 3

medium: 9

large: 19

xlarge: 69

xxlarge: 199

spec.replicas N
presto_worker_resources_limits_cpu Resource CPU limit for the Presto worker; the container is allowed to use only this much CPU. Kubernetes CPU Unit

small: 12

small_mincpureq: 12

medium: 12

large: 12

xlarge: 12

xxlarge: 12

resources.limits.cpu N
presto_worker_resources_limits_memory Resource memory limit for the Presto worker; the container is allowed to use only this much memory. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.

small: 100G

small_mincpureq: 100G

medium: 100G

large: 100G

xlarge: 100G

xxlarge: 100G

resources.limits.memory
N
presto_worker_resources_limits_ephemeral_storage This parameter sets the maximum amount of local ephemeral storage that a container in a Presto worker Pod can consume. Units of Bytes

small: 10G

small_mincpureq: 10G

medium: 10G

large: 10G

xlarge: 10G

xxlarge: 10G

resources.limits.ephemeral-storage N
presto_worker_resources_requests_cpu Resource CPU request for Presto worker. Kubernetes CPU Unit

small: 12

small_mincpureq: 0.005

medium: 12

large: 12

xlarge: 12

xxlarge: 12

resources.requests.cpu N
presto_worker_resources_requests_memory Resource memory request for Presto worker. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.

small: 100G

small_mincpureq: 100G

medium: 100G

large: 100G

xlarge: 100G

xxlarge: 100G

resources.requests.memory
N
presto_worker_jvm_Xmx Xmx specifies the maximum memory allocation pool for a Java virtual machine (JVM). - - jvm.config.Xmx Y
presto_worker_task_concurrency Default local concurrency for parallel operators such as joins and aggregations. Number (must be a power of two) 16 config.properties.task.concurrency Y
presto_worker_query_max_memory The maximum amount of user memory that a query can use across the entire cluster. Data size 1TB config.properties.query.max-memory Y
presto_worker_query_max_memory_per_node The maximum amount of user memory that a query can use on a worker. Data size presto_worker_jvm_Xmx*0.795 config.properties.query.max-memory-per-node Y
presto_worker_query_max_total_memory_per_node The maximum amount of user and system memory that a query can use on a worker. Data size presto_worker_jvm_Xmx*0.795 config.properties.query.max-total-memory-per-node Y
presto_worker_query_max_concurrent_queries Describes how many queries can be processed simultaneously in a single cluster node. Integer 15 config.properties.query.max-concurrent-queries Y
presto_worker_memory_heap_headroom_per_node This is the amount of memory set aside as headroom/buffer in the JVM heap for allocations that are not tracked by Presto. Data size presto_worker_jvm_Xmx*0.2 config.properties.query.memory.heap-headroom-per-node Y
presto_worker_query_max_total_memory The maximum amount of user and system memory that a query can use across the entire cluster. Data size 2TB config.properties.query.max-total-memory Y
presto_worker_experimental_optimized_repartitioning Improve performance of repartitioning data between stages. Boolean true experimental.optimized-repartitioning Y
presto_worker_experimental_pushdown_dereference_enabled Enables pushdown of dereference expressions for querying nested data. Boolean - experimental.pushdown-dereference-enabled Y
presto_worker_experimental_pushdown_subfields_enabled Enables pushdown of subfield expressions for querying nested data. Boolean - experimental.pushdown-subfields-enabled Y
presto_worker_join_max_broadcast_table_size Controls the maximum estimated size of a table that can be broadcast when using the AUTOMATIC join distribution type. Can also be set per session with the join_max_broadcast_table_size session property. Integer - join-max-broadcast-table-size Y
presto_worker_node_scheduler_max_pending_splits_per_task The number of outstanding splits with the standard split weight that can be queued for each worker node for a single stage of a query, even when the node is already at the limit for total number of splits. Allowing a minimum number of splits per stage is required to prevent starvation and deadlocks. This value must be smaller than node-scheduler.max-splits-per-node, will usually be increased for the same reasons, and has similar drawbacks if set too high. Integer - node-scheduler.max-pending-splits-per-task Y
presto_worker_node_scheduler_max_splits_per_node The target value for the total number of splits that can be running for each worker node, assuming all splits have the standard split weight. Using a higher value is recommended, if queries are submitted in large batches (for example, running a large group of reports periodically), or for connectors that produce many splits that complete quickly but do not support assigning split weight values to express that to the split scheduler. Integer - node-scheduler.max-splits-per-node Y
presto_worker_optimizer_prefer_partial_aggregation This property allows users to disable partial aggregations for queries that do not benefit. Boolean - optimizer.prefer-partial-aggregation Y
presto_worker_query_execution_policy Configures the algorithm to organize the processing of all of the stages of a query. String phased query.execution-policy Y
presto_worker_query_low_memory_killer_policy The policy used for selecting the query to kill when the cluster is out of memory (OOM). This property can have one of the following values: none, total-reservation, or total-reservation-on-blocked-nodes. none disables the cluster OOM killer. The value of total-reservation configures a policy that kills the query with the largest memory reservation across the cluster. The value of total-reservation-on-blocked-nodes configures a policy that kills the query using the most memory on the workers that are out of memory (blocked). String

total-reservation-on-blocked-nodes

query.low-memory-killer.policy Y
presto_worker_query_max_stage_count Sets a limit on the number of stages in a query. The limit can be changed with the query.max-stage-count configuration property and the query_max_stage_count session property. Integer 200 query.max-stage-count Y
presto_worker_query_min_schedule_split_batch_size Sets the minimum number of splits to consider for scheduling per batch. Integer - query.min-schedule-split-batch-size Y
presto_worker_query_stage_count_warning_threshold Specifies a per-query threshold for the number of stages. When this threshold is exceeded, a TOO_MANY_STAGES warning is raised. Integer 150 query.stage-count-warning-threshold Y
presto_worker_scale_writers Enable writer scaling by dynamically increasing the number of writer tasks on the cluster. Boolean - scale-writers Y
presto_worker_sink_max_buffer_size Buffer size for I/O writes while collecting pipeline results. A higher value can speed up I/O operations at the cost of additional memory, but it can also increase the amount of data lost when a Presto node fails, effectively slowing down I/O in an unstable environment. Integer - sink.max-buffer-size Y
presto_worker_experimental_max_revocable_memory_per_node The amount of revocable memory a query can use on each node. Units of Bytes - experimental.max-revocable-memory-per-node Y
presto_worker_experimental_reserved_pool_enabled This property allows users to enable or disable the Reserved Pool in Presto. When the General Pool is full, the OOM killer in Presto is used to increase General Pool concurrency and prevent deadlock. Boolean False experimental.reserved-pool-enabled Y
presto_worker_query_min_expire_age This property describes the minimum time after which the query metadata can be removed from the server. String (Duration) 120 minutes query.min-expire-age Y
presto_worker_enable_dynamic_filtering This property improves performance for queries with broadcast or collocated joins by adding dynamic filtering and bucket pruning support. Boolean - experimental.enable-dynamic-filtering Y
presto_worker_exchange_client_threads This property helps to control the number of threads used by exchange clients in Presto to fetch data from other Presto nodes during query execution. Integer   exchange.client-threads  
presto_worker_exchange_http_client_max_connections This property specifies the maximum number of HTTP connections that the Exchange service can establish concurrently across all servers it interacts with. It helps to regulate the total number of simultaneous connections used by the exchange client for communication between Presto nodes. Integer   exchange.http-client.max-connections  
presto_worker_exchange_http_client_max_requests_queued_per_destination This property determines the maximum number of HTTP requests that can be queued for each destination server by the exchange client. Integer   exchange.http-client.max-requests-queued-per-destination  
presto_worker_http_server_log_max_size This property specifies the maximum file size for the log file generated by the HTTP server component. Units of Bytes   http-server.log.max-size  
presto_worker_http_server_log_max_history This property specifies the maximum number of log files that the HTTP server component retains before older log files are rotated out. Integer   http-server.log.max-history  
presto_worker_http_server_threads_max This property specifies the maximum number of threads that the HTTP server can use. Integer   http-server.threads.max  
presto_worker_join_max_broadcast_table_size This property allows you to specify a maximum size for replicated tables used in joins. Units of Bytes   join-max-broadcast-table-size  
presto_worker_log_max_history This property represents the maximum number of general application log files retained by a logging system before older logs are rotated out. Integer   log.max-history  
presto_worker_log_max_size The property log.max-size defines the maximum file size allowed for the general application log file. Units of Bytes   log.max-size  
presto_worker_node_scheduler_max_splits_per_node This property specifies the target maximum number of splits that can concurrently run on each worker node. Splits represent units of work within queries. Adjusting this property allows administrators to optimize resource utilization, especially in scenarios involving large query batches or connectors generating numerous splits.
CAUTION:
Setting presto_worker_node_scheduler_max_splits_per_node to too high a value might lead to inefficient memory usage and performance degradation.
Ideally, it should be set such that there is always at least one split waiting to be processed, but not higher.
Integer   node-scheduler.max-splits-per-node  
presto_worker_optimize_nulls_in_join This property, when enabled, reduces the overhead of processing NULL values during JOIN operations. It is particularly beneficial when dealing with columns that contain a significant number of NULL values. Boolean   optimize-nulls-in-join  
presto_worker_optimizer_default_filter_factor_enabled This property enables the use of a default value for estimating the cost of filters in query optimization. Boolean   optimizer.default-filter-factor-enabled  
presto_worker_optimizer_exploit_constraints This property enables constraint optimizations for querying catalogs that support table constraints. Boolean   optimizer.exploit-constraints  
presto_worker_query_client_timeout This property specifies the duration for which the cluster waits without any communication from the client application (for example, the CLI) before abandoning and canceling the ongoing query or task. String (Duration)   query.client.timeout  
presto_worker_query_max_execution_time This property specifies the maximum allowed time for a query to be actively executing on the cluster, before it is terminated. String (Duration)   query.max-execution-time  
presto_worker_query_max_history This property refers to the maximum number of queries to keep in the query history to provide statistics and other information. If this value is reached, queries are removed based on age. Integer   query.max-history  
presto_worker_query_max_length The maximum number of characters allowed for the SQL query text. Longer queries are not processed and are terminated with an error. Integer   query.max-length  
presto_worker_shutdown_grace_period This property specifies the duration of time that the system waits after receiving a shutdown request before initiating the shutdown process. During this grace period, the system continues to operate normally, allowing ongoing active tasks to complete. String (Duration)   shutdown.grace-period  
presto_worker_experimental_max_spill_per_node This property refers to the maximum spill space to be used by all queries on a single node (when the memory allocated for query processing is exceeded). Units of Bytes   experimental.max-spill-per-node  
presto_worker_experimental_query_max_spill_per_node This property refers to the maximum spill space to be used by a single query on a single node. Units of Bytes   experimental.query-max-spill-per-node  
presto_worker_experimental_spiller_max_used_space_threshold This property sets a threshold disk space usage ratio. If usage exceeds this value, the spill path is not eligible for spilling. Double   experimental.spiller-max-used-space-threshold  
presto_worker_experimental_spiller_spill_path This property specifies a directory where spilled content is written. It can be a comma-separated list to spill simultaneously to multiple directories, which helps to use multiple drives installed in the system. (It is recommended to avoid spilling to system drives and to ensure that spill operations do not interfere with JVM operation or disk performance.) String   experimental.spiller-spill-path  
presto_worker_resources_requests_ephemeral_storage This parameter sets the minimum/guaranteed size of local ephemeral storage that a container in a Presto worker Pod requests. Units of Bytes

small: 1G

small_mincpureq: 1G

medium: 1G

large: 1G

xlarge: 1G

xxlarge: 1G

resources.requests.ephemeral-storage N
presto_worker_httpserver_max_request_header_size This property sets the maximum size of the request header that the HTTP server supports. Data size 16kB httpserver.max_request_header_size Y
presto_worker_httpserver_max_response_header_size This property sets the maximum size of the response header that the HTTP server supports. Data size 16kB httpserver.max_response_header_size Y
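Several of the memory defaults in Tables 1 and 2 are derived from the JVM heap size: query.max-memory-per-node and query.max-total-memory-per-node default to Xmx*0.795, and the heap headroom defaults to Xmx*0.2. The following Python sketch just reproduces that arithmetic for a hypothetical heap size; it is not an official sizing tool.

```python
def derived_memory_defaults(jvm_xmx_gb: float) -> dict:
    """Reproduce the derived per-node memory defaults from the tables
    for a given JVM heap size (Xmx, in GB)."""
    return {
        # Both per-node query memory limits default to Xmx * 0.795.
        "query.max-memory-per-node": round(jvm_xmx_gb * 0.795, 2),
        "query.max-total-memory-per-node": round(jvm_xmx_gb * 0.795, 2),
        # Heap headroom defaults to Xmx * 0.2.
        "query.memory.heap-headroom-per-node": round(jvm_xmx_gb * 0.2, 2),
    }

# Hypothetical 64 GB worker heap:
defaults = derived_memory_defaults(64)
print(defaults["query.max-memory-per-node"])            # 50.88
print(defaults["query.memory.heap-headroom-per-node"])  # 12.8
```

This only illustrates how the table's derived defaults relate to the configured Xmx; the actual values are computed by the operator from presto_worker_jvm_Xmx.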
Table 3. watsonx.data component: Presto singlenode
Property Description Type Default value / Default setting System property Restart containers required
presto_singlenode_resources_limits_cpu Resource CPU limit for Presto singlenode; the container is allowed to use only this much CPU. Kubernetes CPU Unit

small: 3

small_mincpureq: 3

medium: 6

large: 9

xlarge: 12

xxlarge: 12

resources.limits.cpu N
presto_singlenode_resources_limits_memory Resource Memory limit for Presto singlenode, container is allowed to use only this much memory. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.

small: 24G

small_mincpureq: 24G

medium: 48G

large: 72G

xlarge: 96G

xxlarge: 96G

resources.limits.memory N
presto_singlenode_resources_limits_ephemeral_storage This parameter sets the maximum amount of local ephemeral storage that a container in a Presto singlenode Pod can consume. Units of Bytes

small: 10G

small_mincpureq: 10G

medium: 10G

large: 10G

xlarge: 10G

xxlarge: 10G

resources.limits.ephemeral-storage N
presto_singlenode_resources_requests_cpu Resource CPU request for Presto single node. Kubernetes CPU Unit

small: 3

small_mincpureq: 0.005

medium: 6

large: 9

xlarge: 12

xxlarge: 12

resources.requests.cpu N
presto_singlenode_resources_requests_memory Resource memory request for Presto singlenode. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.

small: 24G

small_mincpureq: 24G

medium: 48G

large: 72G

xlarge: 96G

xxlarge: 96G

resources.requests.memory N
presto_singlenode_resources_requests_ephemeral_storage This parameter sets the minimum/guaranteed amount of local ephemeral storage for a container in a Presto singlenode Pod. Units of Bytes

small: 500Mi

small_mincpureq: 500Mi

medium: 1G

large: 1G

xlarge: 1G

xxlarge: 1G

resources.requests.ephemeral-storage N
presto_singlenode_jvm_Xmx Xmx specifies the maximum memory allocation pool for a Java virtual machine (JVM). - - jvm.config.Xmx Y
presto_singlenode_task_concurrency Default local concurrency for parallel operators such as joins and aggregations. Number (must be a power of two) - config.properties.task.concurrency Y
presto_singlenode_query_max_memory The maximum amount of user memory that a query can use across the entire cluster. Data size 1TB config.properties.query.max-memory Y
presto_singlenode_query_max_memory_per_node The maximum amount of user memory that a query can use on a worker. Data size presto_singlenode_jvm_Xmx*0.795 config.properties.query.max-memory-per-node Y
presto_singlenode_query_max_total_memory_per_node The maximum amount of user and system memory that a query can use on a worker. Data size presto_singlenode_jvm_Xmx*0.795 config.properties.query.max-total-memory-per-node Y
presto_singlenode_query_max_concurrent_queries Describes how many queries can be processed simultaneously in a single cluster node. Integer - config.properties.query.max-concurrent-queries Y
presto_singlenode_memory_heap_headroom_per_node This is the amount of memory set aside as headroom (buffer) in the JVM heap for allocations that are not tracked by Presto. Data size presto_singlenode_jvm_Xmx*0.2 config.properties.query.memory.heap-headroom-per-node Y
presto_singlenode_query_max_total_memory The maximum amount of user and system memory that a query can use across the entire cluster. Data size 2TB config.properties.query.max-total-memory Y
presto_singlenode_experimental_optimized_repartitioning Improves performance of repartitioning data between stages. Boolean true experimental.optimized-repartitioning Y
presto_singlenode_experimental_pushdown_dereference_enabled Adds support for pushdown of dereference expressions for querying nested data. Boolean - experimental.pushdown-dereference-enabled Y
presto_singlenode_experimental_pushdown_subfields_enabled Adds support for pushdown of subfields expressions for querying nested data. Boolean - experimental.pushdown-subfields-enabled Y
presto_singlenode_join_max_broadcast_table_size Controls the maximum estimated size of a table that can be broadcast when using the AUTOMATIC join distribution type, through the join-max-broadcast-table-size configuration property and the join_max_broadcast_table_size session property. Units of Bytes - join-max-broadcast-table-size Y
presto_singlenode_node_scheduler_max_pending_splits_per_task The number of outstanding splits with the standard split weight that can be queued for each singlenode node for a single stage of a query, even when the node is already at the limit for total number of splits. Allowing a minimum number of splits per stage is required to prevent starvation and deadlocks. This value must be smaller than node-scheduler.max-splits-per-node, will usually be increased for the same reasons, and has similar drawbacks if set too high. Integer - node-scheduler.max-pending-splits-per-task Y
presto_singlenode_node_scheduler_max_splits_per_node The target value for the total number of splits that can be running for each worker node, assuming all splits have the standard split weight. Using a higher value is recommended, if queries are submitted in large batches (for example, running a large group of reports periodically), or for connectors that produce many splits that complete quickly but do not support assigning split weight values to express that to the split scheduler. Integer - node-scheduler.max-splits-per-node Y
presto_singlenode_optimizer_prefer_partial_aggregation This property allows users to disable partial aggregation for queries that do not benefit from it. Boolean - optimizer.prefer-partial-aggregation Y
presto_singlenode_query_execution_policy Configures the algorithm to organize the processing of all of the stages of a query. String phased query.execution-policy Y
presto_singlenode_query_low_memory_killer_policy The policy used for selecting the query to kill when the cluster is out of memory (OOM). This property can have one of the following values: none, total-reservation, or total-reservation-on-blocked-nodes. none disables the cluster OOM killer. The value of total-reservation configures a policy that kills the query with the largest memory reservation across the cluster. The value of total-reservation-on-blocked-nodes configures a policy that kills the query using the most memory on the workers that are out of memory (blocked). String total-reservation-on-blocked-nodes query.low-memory-killer.policy Y
presto_singlenode_query_max_stage_count Sets a limit on the number of stages in a query. The limit can be changed with the query.max-stage-count configuration property and the query_max_stage_count session property. Integer 200 query.max-stage-count Y
presto_singlenode_query_min_schedule_split_batch_size Sets the minimum number of splits to consider for scheduling per batch. Integer - query.min-schedule-split-batch-size Y
presto_singlenode_query_stage_count_warning_threshold Specifies a per-query threshold for the number of stages. When this threshold is exceeded, a TOO_MANY_STAGES warning is raised. Integer 150 query.stage-count-warning-threshold Y
presto_singlenode_scale_writers Enable writer scaling by dynamically increasing the number of writer tasks on the cluster. Boolean - scale-writers Y
presto_singlenode_sink_max_buffer_size Buffer size for I/O writes while collecting pipeline results. A higher value can increase the speed of I/O operations at the cost of additional memory. A higher value can also increase the amount of data lost when a Presto node fails, effectively slowing down I/O in an unstable environment. Integer - sink.max-buffer-size Y
presto_singlenode_experimental_max_revocable_memory_per_node The amount of revocable memory a query can use on each node. Units of Bytes - experimental.max-revocable-memory-per-node Y
presto_singlenode_experimental_reserved_pool_enabled This property allows users to enable or disable the Reserved Pool in Presto. When the General Pool is full, Presto uses its OOM killer to increase General Pool concurrency and prevent deadlock. Boolean false experimental.reserved-pool-enabled Y
presto_singlenode_query_min_expire_age This property describes the minimum time after which you can remove the query metadata from the server. String 120 minutes query.min-expire-age Y
presto_singlenode_enable_dynamic_filtering This property improves performance for queries with broadcast or collocated joins by adding dynamic filtering and bucket pruning support. Boolean - experimental.enable-dynamic-filtering Y
presto_singlenode_exchange_client_threads This property helps to control the number of threads used by exchange clients in Presto to fetch data from other Presto nodes during query execution. Integer   exchange.client-threads  
presto_singlenode_exchange_http_client_max_connections This property sets the maximum number of concurrent HTTP connections that the exchange client can open to other nodes. Integer   exchange.http-client.max-connections  
presto_singlenode_exchange_http_client_max_requests_queued_per_destination This property determines the maximum number of HTTP requests that can be queued for each destination server by the exchange client. Integer   exchange.http-client.max-requests-queued-per-destination  
presto_singlenode_http_server_log_max_size This property specifies the maximum file size for the log file generated by the HTTP server component. Units of Bytes   http-server.log.max-size  
presto_singlenode_http_server_log_max_history This property specifies the maximum number of log files that the HTTP server component retains before rotating out old log content. Integer   http-server.log.max-history  
presto_singlenode_http_server_threads_max This property sets the maximum number of threads used by the HTTP server. Integer   http-server.threads.max  
presto_singlenode_join_max_broadcast_table_size This property specifies a maximum size for replicated tables used in joins. Units of Bytes   join-max-broadcast-table-size  
presto_singlenode_log_max_history This property represents the maximum number of general application log files retained by a logging system before older logs are rotated out. Integer   log.max-history  
presto_singlenode_log_max_size The property log.max-size defines the maximum file size allowed for the general application log file. Units of Bytes   log.max-size  
presto_singlenode_node_scheduler_max_splits_per_node This property specifies the target maximum number of splits that can concurrently run on each worker node. Splits represent units of work within queries. Adjusting this property allows administrators to optimize resource utilization, especially in scenarios involving large query batches or connectors generating numerous splits.
CAUTION:
Setting presto_singlenode_node_scheduler_max_splits_per_node to too high a value might lead to inefficient memory usage and performance degradation.
Ideally, it should be set such that there is always at least one split waiting to be processed, but not higher.
Integer   node-scheduler.max-splits-per-node  
presto_singlenode_optimize_nulls_in_join This property when enabled reduces the overhead of processing NULL values during JOIN operations, particularly beneficial when dealing with columns containing a significant number of NULLs. Boolean   optimize-nulls-in-join  
presto_singlenode_optimizer_default_filter_factor_enabled This property enables the use of a default value for estimating the cost of filters in query optimization. Boolean   optimizer.default-filter-factor-enabled  
presto_singlenode_optimizer_exploit_constraints This property enables constraint optimizations for querying catalogs that support table constraints. Boolean   optimizer.exploit-constraints  
presto_singlenode_query_client_timeout This property specifies the duration for which the cluster waits without any communication from the client application, such as the CLI, before abandoning and canceling the ongoing query or work. String (Duration)   query.client.timeout  
presto_singlenode_query_max_execution_time This property specifies the maximum allowed time for a query to be actively executing on the cluster, before it is terminated. String (Duration)   query.max-execution-time  
presto_singlenode_query_max_history This property refers to the maximum number of queries to keep in the query history to provide statistics and other information. If this amount is reached, queries are removed based on age. Integer   query.max-history  
presto_singlenode_query_max_length The maximum number of characters allowed for the SQL query text. Longer queries are not processed and are terminated with an error. Integer   query.max-length  
presto_singlenode_shutdown_grace_period This property specifies the duration of time that the system waits after receiving a shutdown request before initiating the shutdown process. During this grace period, the system continues to operate normally, allowing ongoing active tasks to complete. String (Duration)   shutdown.grace-period  
presto_singlenode_experimental_max_spill_per_node This property refers to the maximum spill space to be used by all queries on a single node (when the memory allocated for query processing is exceeded). Units of Bytes   experimental.max-spill-per-node  
presto_singlenode_experimental_query_max_spill_per_node This property refers to the maximum spill space to be used by a single query on a single node. Units of Bytes   experimental.query-max-spill-per-node  
presto_singlenode_experimental_spiller_max_used_space_threshold This property sets a threshold disk space usage ratio; if usage exceeds this value, the spill path is no longer eligible for spilling. Double   experimental.spiller-max-used-space-threshold  
presto_singlenode_experimental_spiller_spill_path This property specifies a directory where spilled content is written. It can be a comma-separated list of directories to spill to simultaneously, which helps use multiple drives installed in the system. (It is recommended to avoid spilling to system drives and to ensure that spill operations do not interfere with JVM operation or disk performance.) String   experimental.spiller-spill-path  
presto_singlenode_httpserver_max_request_header_size This property is used to set the maximum size of the request header that the HTTP server supports. Data size 16kB httpserver.max_request_header_size Y
presto_singlenode_httpserver_max_response_header_size This property is used to set the maximum size of the response header that the HTTP server supports. Data size 16kB httpserver.max_response_header_size Y
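The per-node memory defaults in Table 3 are derived from the JVM heap size (Xmx). Assuming an illustrative Xmx of 20G — a placeholder, not a product default — the derived properties would be:

```properties
# jvm.config: -Xmx20G (illustrative value)
query.max-memory-per-node=15.9GB          # Xmx * 0.795
query.max-total-memory-per-node=15.9GB    # Xmx * 0.795
query.memory.heap-headroom-per-node=4GB   # Xmx * 0.2
```

If you override presto_singlenode_jvm_Xmx, keep the per-node memory properties consistent with these ratios so that query memory plus headroom does not exceed the heap.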
Table 4. watsonx.data component: Postgres
Property Description Type System property Restart containers required
postgres_replicas The number of replicas for Postgres. Integer spec.replicas N
postgres_resources_limits_cpu Resource CPU limit for Postgres. Container is allowed to use only this much CPU. Kubernetes CPU Unit resources.limits.cpu N
postgres_resources_limits_memory Resource Memory limit for Postgres. Container is allowed to use only this much memory. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.
resources.limits.memory N
postgres_resources_requests_cpu Resource CPU request for Postgres. Kubernetes CPU Unit resources.requests.cpu N
postgres_resources_requests_memory Resource memory request for Postgres. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.
resources.requests.memory N
Table 5. watsonx.data component: MinIO
Property Description Type System property Restart containers required
minio_resources_limits_cpu Resource CPU limit for MinIO. Container is allowed to use only this much CPU. Kubernetes CPU Unit resources.limits.cpu N
minio_resources_limits_memory Resource Memory limit for MinIO. A container is allowed to use only this much memory. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.
resources.limits.memory N
minio_resources_requests_cpu Resource CPU request for MinIO. Kubernetes CPU Unit resources.requests.cpu N
minio_resources_requests_memory Resource memory request for MinIO. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.
resources.requests.memory N
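The CPU and memory properties in Tables 4 and 5 follow standard Kubernetes resource conventions. A generic Kubernetes resources stanza — not a watsonx.data-specific manifest, and with placeholder values — illustrates the units:

```yaml
resources:
  requests:
    cpu: "500m"    # 500 millicores = 0.5 CPU
    memory: "4G"   # decimal (1000-based) unit; "4Gi" would be binary (1024-based)
  limits:
    cpu: "2"       # whole CPUs
    memory: "8G"
```

Requests are the guaranteed minimum used for scheduling; limits are the hard ceiling the container cannot exceed.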
Table 6. watsonx.data component: Hive
Property Description Type Default value / Default setting System property Restart containers required
hive_replicas The number of replicas for Hive. Integer   spec.replicas N
hive_resources_limits_cpu Resource CPU limit for Hive, container is allowed to use only this much CPU. Kubernetes CPU Unit   resources.limits.cpu N
hive_resources_limits_memory Resource Memory limit for Hive. A container is allowed to use only this much memory. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.
  resources.limits.memory N
hive_resources_requests_cpu Resource CPU request for Hive. Kubernetes CPU Unit   resources.requests.cpu N
hive_resources_requests_memory Resource memory request for Hive. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.
  resources.requests.memory N
Table 7. watsonx.data component: Presto catalog
Property Description Type Default value / Default setting System property Restart containers required
presto_hive_max_outstanding_splits Limit on the number of splits waiting to be served by the split source. After this limit is reached, writers stop writing new splits to the split source until some of them are consumed by workers. A higher value increases memory usage but allows I/O to be concentrated, which can be much faster and increase resource utilization. Integer   hive.max-outstanding-splits Y
presto_hive_max_initial_splits This property describes how many splits can be initially created for a single query. The initial splits are created to allow better concurrency for small queries. The Hive connector creates the first hive.max-initial-splits splits with a size of hive.max-initial-split-size instead of hive.max-split-size. A higher value forces more splits to have a smaller size, effectively broadening what is considered a small query. Integer   hive.max-initial-splits Y
presto_hive_max_initial_split_size This property describes the maximum size of each of the initially created splits for a single query. The logic of initial splits is described in the hive.max-initial-splits property. Changing this value changes what is considered a small query: a higher value reduces parallelism for small queries, while a lower value increases it. This is a maximum size; the real size can be lower when the end of the blocks in a single DataNode is reached. Integer   hive.max-initial-split-size Y
presto_hive_max_split_size This property describes the maximum size of splits created after the first hive.max-initial-splits splits of a query. A lower value increases concurrency by creating more, smaller splits, at the cost of additional scheduling overhead. Integer   hive.max-split-size Y
presto_hive_split_loader_concurrency This property specifies the level of concurrency for loading data from Hive tables using the Presto Hive connector. It controls the number of concurrent split loader threads that Presto can utilize to fetch data in parallel from the Hive source. A higher concurrency value can improve data retrieval speed and resource utilization, particularly for large Hive queries, but it can also increase system resource consumption. Integer   hive.split-loader-concurrency Y
presto_hive_pushdown_filter_enabled This property controls whether filter pushdown is enabled in the Presto Hive connector. Filter pushdown is a query optimization technique that allows Presto to push filtering conditions directly to the underlying Hive data source. When enabled, filtering conditions specified in SQL queries are evaluated as close to the data source as possible, reducing the amount of data that needs to be transferred to Presto for processing. Boolean   hive.pushdown-filter-enabled Y
presto_hive_node_selection_strategy Chooses the NodeSelectionStrategy through the hive.node-selection-strategy configuration property. When SOFT_AFFINITY is selected, the scheduler makes a best effort to request the same worker to fetch the same file. String   hive.node-selection-strategy Y
presto_hive_max_partitions_per_writers The maximum number of partitions permitted for a single writer. Integer   hive.max-partitions-per-writers   Y
presto_hive_metastore_timeout This parameter specifies the timeout duration for requests made to the Hive metastore.
Note: It is applicable for version 1.1.4 and later of watsonx.data.
String (Duration) hive.metastore-timeout 10s Y
hive_s3_max_error_retries This property allows to set the maximum number of retry attempts for S3 client operations in Hive. Integer hive.s3.max-error-retries 50 Y
hive_s3_connect_timeout This property specifies the TCP connection timeout for S3 operations in Hive. String (Duration) hive.s3.connect-timeout 1m Y
hive_s3_socket_timeout This property sets the maximum time allowed for reading data from the socket for S3 operations in Hive. String (Duration) hive.s3.socket-timeout 2m Y
hive_s3_max_connections This property defines the maximum number of concurrent open connections permitted to S3 in Hive. Integer hive.s3.max-connections 5000 Y
hive_s3_max_client_retries This property sets the maximum number of retry attempts for read operations for S3 in Hive. Integer hive.s3.max-client-retries 50 Y
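The Hive S3 and metastore properties above translate to catalog configuration entries. Using the default values documented in Table 7:

```properties
hive.metastore-timeout=10s
hive.s3.max-error-retries=50
hive.s3.connect-timeout=1m
hive.s3.socket-timeout=2m
hive.s3.max-connections=5000
hive.s3.max-client-retries=50
```

Raising the timeouts or retry counts can help with slow or unreliable object storage, at the cost of queries taking longer to fail.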
Table 8. watsonx.data component: Engine
Property Description Type System property Restart containers required  
presto_qhmm_enable This parameter enables (or disables) the Query History Monitoring and Management (QHMM) service in an engine. Boolean      

You can customize the watsonx.data components by following the instructions in Customizing watsonx.data components.