Specifying additional customization for watsonx.data

A project administrator can customize properties for the different components in IBM® watsonx.data. As part of the post-installation procedure, you can set extra customizations beyond the defaults. The following specifications are optional and can be altered to change the component-level customizations.

watsonx.data on Red Hat® OpenShift®

The complete list of possible custom resource descriptions is listed in the following tables.
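For orientation, a few of these properties might be supplied together in the watsonx.data custom resource. The sketch below is illustrative only: the field placement and CR layout are assumptions, not the documented schema, and the values shown are the defaults from the tables. Verify against the installed custom resource before applying anything.

```yaml
# Hypothetical sketch only. The CR layout and field placement are assumptions;
# check the installed watsonx.data custom resource schema before use.
spec:
  presto_coordinator_task_concurrency: 16        # maps to config.properties.task.concurrency
  presto_coordinator_query_max_memory: "1TB"     # maps to config.properties.query.max-memory
  presto_worker_replicas: 3                      # maps to spec.replicas (small profile default)
```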

Table 1. watsonx.data component: Presto coordinator
Property Description Type Default value / Default setting System property Restart containers required
presto_coordinator_resources_limits_cpu Resource CPU limit for the Presto coordinator; the container is allowed to use only this much CPU. Kubernetes CPU Unit

small: 12

small_mincpureq: 12

medium: 12

large: 12

xlarge: 12

xxlarge: 12

resources.limits.cpu N
presto_coordinator_resources_limits_memory Resource memory limit for the Presto coordinator; the container is allowed to use only this much memory. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.

small: 100G

small_mincpureq: 100G

medium: 100G

large: 100G

xlarge: 100G

xxlarge: 100G

resources.limits.memory
N
presto_coordinator_resources_limits_ephemeral_storage This parameter sets the maximum amount of local ephemeral storage that a container in a Presto coordinator Pod can consume. Units of Bytes

small: 10G

small_mincpureq: 10G

medium: 10G

large: 10G

xlarge: 10G

xxlarge: 10G

resources.limits.ephemeral-storage N
presto_coordinator_resources_requests_cpu Resource CPU request for Presto coordinator. Kubernetes CPU Unit

small: 12

small_mincpureq: 0.005

medium: 12

large: 12

xlarge: 12

xxlarge: 12

resources.requests.cpu N
presto_coordinator_resources_requests_memory Resource memory request for Presto coordinator. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.

small: 100G

small_mincpureq: 100G

medium: 100G

large: 100G

xlarge: 100G

xxlarge: 100G

resources.requests.memory
N
presto_coordinator_resources_requests_ephemeral_storage This parameter sets the minimum/guaranteed size of local ephemeral storage that a container in a Presto coordinator Pod requests. Units of Bytes

small: 1G

small_mincpureq: 1G

medium: 1G

large: 1G

xlarge: 1G

xxlarge: 1G

resources.requests.ephemeral-storage N
presto_coordinator_jvm_Xmx Xmx specifies the maximum memory allocation pool for a Java virtual machine (JVM). - - jvm.config.Xmx Y
presto_coordinator_task_concurrency Default local concurrency for parallel operators such as joins and aggregations. Number (must be a power of two) 16 config.properties.task.concurrency Y
presto_coordinator_query_max_memory The maximum amount of user memory that a query can use across the entire cluster. Data size 1TB config.properties.query.max-memory Y
presto_coordinator_query_max_memory_per_node The maximum amount of user memory that a query can use on a worker. Data size presto_coordinator_jvm_Xmx*0.795 config.properties.query.max-memory-per-node Y
presto_coordinator_query_max_total_memory_per_node The maximum amount of user and system memory that a query can use on a worker. Data size presto_coordinator_jvm_Xmx*0.795 config.properties.query.max-total-memory-per-node Y
presto_coordinator_query_max_concurrent_queries Describes how many queries can be processed simultaneously in a single cluster node. Integer 15 config.properties.query.max-concurrent-queries Y
presto_coordinator_memory_heap_headroom_per_node The amount of memory set aside as headroom/buffer in the JVM heap for allocations that are not tracked by Presto. Data size presto_coordinator_jvm_Xmx*0.2 config.properties.query.memory.heap-headroom-per-node Y
presto_coordinator_query_max_total_memory The maximum amount of user and system memory that a query can use across the entire cluster. Data size 2TB config.properties.query.max-total-memory Y
presto_coordinator_experimental_optimized_repartitioning Improve performance of repartitioning data between stages. Boolean true experimental.optimized-repartitioning Y
presto_coordinator_experimental_pushdown_dereference_enabled Enables pushdown of dereference expressions for querying nested data. Boolean - experimental.pushdown-dereference-enabled Y
presto_coordinator_experimental_pushdown_subfields_enabled Enables pushdown of subfield expressions for querying nested data. Boolean - experimental.pushdown-subfields-enabled Y
presto_coordinator_join_max_broadcast_table_size Controls the maximum estimated size of a table that can be broadcast when using the AUTOMATIC join distribution type. Can also be set per session with the join_max_broadcast_table_size session property. Integer - join-max-broadcast-table-size Y
presto_coordinator_node_scheduler_max_pending_splits_per_task The number of outstanding splits with the standard split weight that can be queued for each worker node for a single stage of a query, even when the node is already at the limit for total number of splits. Allowing a minimum number of splits per stage is required to prevent starvation and deadlocks. This value must be smaller than node-scheduler.max-splits-per-node, will usually be increased for the same reasons, and has similar drawbacks if set too high. Integer - node-scheduler.max-pending-splits-per-task Y
presto_coordinator_node_scheduler_max_splits_per_node The target value for the total number of splits that can be running for each worker node, assuming all splits have the standard split weight. Using a higher value is recommended, if queries are submitted in large batches (for example, running a large group of reports periodically), or for connectors that produce many splits that complete quickly but do not support assigning split weight values to express that to the split scheduler. Integer - node-scheduler.max-splits-per-node Y
presto_coordinator_optimizer_prefer_partial_aggregation This property allows users to disable partial aggregations for queries that do not benefit. Boolean - optimizer.prefer-partial-aggregation Y
presto_coordinator_query_execution_policy Configures the algorithm to organize the processing of all of the stages of a query. String phased query.execution-policy Y
presto_coordinator_query_low_memory_killer_policy The policy used for selecting the query to kill when the cluster is out of memory (OOM). This property can have one of the following values: none, total-reservation, or total-reservation-on-blocked-nodes. none disables the cluster OOM killer. The value of total-reservation configures a policy that kills the query with the largest memory reservation across the cluster. The value of total-reservation-on-blocked-nodes configures a policy that kills the query using the most memory on the workers that are out of memory (blocked). String

total-reservation-on-blocked-nodes

query.low-memory-killer.policy Y
presto_coordinator_query_max_stage_count Sets a limit on the number of stages in a query. The limit can be changed with the query.max-stage-count configuration property and the query_max_stage_count session property. Integer 200 query.max-stage-count Y
presto_coordinator_query_min_schedule_split_batch_size Sets the minimum number of splits to consider for scheduling per batch. Integer - query.min-schedule-split-batch-size Y
presto_coordinator_query_stage_count_warning_threshold Specifies a per-query threshold for the number of stages. When this threshold is exceeded, a TOO_MANY_STAGES warning is raised. Integer 150 query.stage-count-warning-threshold Y
presto_coordinator_scale_writers Enable writer scaling by dynamically increasing the number of writer tasks on the cluster. Boolean - scale-writers Y
presto_coordinator_sink_max_buffer_size Buffer size for I/O writes while collecting pipeline results. A higher value can speed up I/O operations at the cost of additional memory, but it can also increase the amount of data lost when a Presto node fails, effectively slowing down I/O in an unstable environment. Integer - sink.max-buffer-size Y
presto_coordinator_experimental_max_revocable_memory_per_node The amount of revocable memory a query can use on each node. Units of Bytes - experimental.max-revocable-memory-per-node Y
presto_coordinator_experimental_reserved_pool_enabled This property allows users to enable or disable the Reserved Pool in Presto. When the General Pool is full, the OOM killer in Presto is used to increase General Pool concurrency and prevent deadlock. Boolean False experimental.reserved-pool-enabled Y
presto_coordinator_query_min_expire_age This property describes the minimum time after which the query metadata can be removed from the server. String (Duration) 120 minutes query.min-expire-age Y
presto_coordinator_enable_dynamic_filtering This property improves performance for queries with broadcast or collocated joins by adding dynamic filtering and bucket pruning support. Boolean - experimental.enable-dynamic-filtering Y
presto_coordinator_com_facebook_presto_governance This property sets the minimum log level for the logger com.facebook.presto.governance. It helps to customize the logging behavior based on the severity of log messages. String (log levels)
Note: There are four levels: DEBUG, INFO, WARN, and ERROR.
  com.facebook.presto.governance  
presto_coordinator_com_facebook_presto_governance_util This property sets the minimum log level for the logger com.facebook.presto.governance_util. It helps to customize the logging behavior based on the severity of log messages. String (log levels)
Note: There are four levels: DEBUG, INFO, WARN, and ERROR.
  com.facebook.presto.governance.util  
presto_coordinator_com_facebook_presto_dispatcher This property sets the minimum log level for the logger com.facebook.presto.dispatcher. It helps to customize the logging behavior based on the severity of log messages. String (log levels)
Note: There are four levels: DEBUG, INFO, WARN, and ERROR.
  com.facebook.presto.dispatcher  
presto_coordinator_exchange_client_threads This property helps to control the number of threads used by exchange clients in Presto to fetch data from other Presto nodes during query execution. Integer   exchange.client-threads  
presto_coordinator_exchange_http_client_max_connections This property specifies the maximum number of HTTP connections that the Exchange service can establish concurrently across all servers it interacts with. It helps to regulate the total number of simultaneous connections used by the exchange client for communication between Presto nodes. Integer   exchange.http-client.max-connections  
presto_coordinator_exchange_http_client_max_requests_queued_per_destination This property determines the maximum number of HTTP requests that can be queued for each destination server by the exchange client. Integer   exchange.http-client.max-requests-queued-per-destination  
presto_coordinator_http_server_log_max_size This property specifies the maximum file size for the log file generated by the HTTP server component. Units of Bytes   http-server.log.max-size  
presto_coordinator_http_server_log_max_history This property specifies the maximum number of log files that the HTTP server component retains before older log files are rotated out. Integer   http-server.log.max-history  
presto_coordinator_http_server_threads_max This property specifies the maximum number of threads that the HTTP server can use. Integer   http-server.threads.max  
presto_coordinator_join_max_broadcast_table_size This property allows you to specify a maximum size for replicated tables used in joins. Units of Bytes   join-max-broadcast-table-size  
presto_coordinator_log_max_history This property represents the maximum number of general application log files retained by a logging system before older logs are rotated out. Integer   log.max-history  
presto_coordinator_log_max_size The property log.max-size defines the maximum file size allowed for the general application log file. Units of Bytes   log.max-size  
presto_coordinator_node_scheduler_max_splits_per_node This property specifies the target maximum number of splits that can concurrently run on each worker node. Splits represent units of work within queries. Adjusting this property allows administrators to optimize resource utilization, especially in scenarios involving large query batches or connectors generating numerous splits.
CAUTION:
Setting presto_coordinator_node_scheduler_max_splits_per_node to too high a value might lead to inefficient memory usage and performance degradation.
Set this property such that there is always at least one split waiting to be processed, but not higher.
Integer   node-scheduler.max-splits-per-node  
presto_coordinator_optimize_nulls_in_join This property, when enabled, reduces the overhead of processing NULL values during JOIN operations. It is particularly beneficial when dealing with columns that contain a significant number of NULL values. Boolean   optimize-nulls-in-join  
presto_coordinator_optimizer_default_filter_factor_enabled This property enables the use of a default value for estimating the cost of filters in query optimization. Boolean   optimizer.default-filter-factor-enabled  
presto_coordinator_optimizer_exploit_constraints This property enables constraint optimizations for querying catalogs that support table constraints. Boolean   optimizer.exploit-constraints  
presto_coordinator_query_client_timeout This property specifies the duration for which the cluster waits without any communication from the client application (for example, the CLI) before abandoning and canceling the ongoing query or task. String (Duration)   query.client.timeout  
presto_coordinator_query_max_execution_time This property specifies the maximum allowed time for a query to be actively executing on the cluster before termination. String (Duration)   query.max-execution-time  
presto_coordinator_query_max_history This property refers to the maximum number of queries to keep in the query history to provide statistics and other information. If this value is reached, queries are removed based on age. Integer   query.max-history  
presto_coordinator_query_max_length This property specifies the maximum number of characters allowed for the SQL query text. Longer queries are not processed and are terminated with an error. Integer   query.max-length  
presto_coordinator_shutdown_grace_period This property specifies the duration of time that the system waits after receiving a shutdown request before initiating the shutdown process. During this grace period, the system continues to operate normally, allowing ongoing active tasks to complete. String (Duration)   shutdown.grace-period  
presto_coordinator_experimental_max_spill_per_node This property refers to the maximum spill space used by all queries on a single node (when the memory allocated for query processing is exceeded). Units of Bytes   experimental.max-spill-per-node  
presto_coordinator_experimental_query_max_spill_per_node This property refers to the maximum spill space used by a single query on a single node. Units of Bytes   experimental.query-max-spill-per-node  
presto_coordinator_experimental_spiller_max_used_space_threshold This property sets a threshold disk space usage ratio. If usage exceeds the threshold value, the spill path becomes ineligible for spilling. Double   experimental.spiller-max-used-space-threshold  
presto_coordinator_experimental_spiller_spill_path This property specifies a directory where spilled content is written. It can be a comma-separated list to spill simultaneously to multiple directories, which helps to use multiple drives installed in the system. (It is recommended to avoid spilling to system drives and to ensure that spill operations do not interfere with JVM operation or disk performance.) String   experimental.spiller-spill-path  
presto_coordinator_httpserver_max_request_header_size This property sets the maximum size of the request header that the HTTP server supports. Data size 16kB httpserver.max_request_header_size Y
presto_coordinator_httpserver_max_response_header_size This property sets the maximum size of the response header that the HTTP server supports. Data size 16kB httpserver.max_response_header_size Y
presto_coordinator_join_distribution_type This property specifies the type of distributed join to use. Allowed values are AUTOMATIC, PARTITIONED, and BROADCAST. String AUTOMATIC join-distribution-type  
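As a reading aid, the System property column in Table 1 corresponds to keys in Presto's config.properties file. Rendered as a fragment with the documented defaults (illustrative only, not tuning advice), a few of the coordinator entries above would look like this:

```properties
# Coordinator defaults from Table 1, shown in config.properties form.
task.concurrency=16
query.max-memory=1TB
query.max-concurrent-queries=15
query.execution-policy=phased
query.low-memory-killer.policy=total-reservation-on-blocked-nodes
join-distribution-type=AUTOMATIC
```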
Table 2. watsonx.data component: Presto worker
Property Description Type Default value / Default setting System property Restart containers required
presto_worker_replicas The number of replicas for Presto worker. Integer

small: 3

small_mincpureq: 3

medium: 9

large: 19

xlarge: 69

xxlarge: 199

spec.replicas N
presto_worker_resources_limits_cpu Resource CPU limit for the Presto worker; the container is allowed to use only this much CPU. Kubernetes CPU Unit

small: 12

small_mincpureq: 12

medium: 12

large: 12

xlarge: 12

xxlarge: 12

resources.limits.cpu N
presto_worker_resources_limits_memory Resource memory limit for the Presto worker; the container is allowed to use only this much memory. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.

small: 100G

small_mincpureq: 100G

medium: 100G

large: 100G

xlarge: 100G

xxlarge: 100G

resources.limits.memory
N
presto_worker_resources_limits_ephemeral_storage This parameter sets the maximum amount of local ephemeral storage that a container in a Presto worker Pod can consume. Units of Bytes

small: 10G

small_mincpureq: 10G

medium: 10G

large: 10G

xlarge: 10G

xxlarge: 10G

resources.limits.ephemeral-storage N
presto_worker_resources_requests_cpu Resource CPU request for Presto worker. Kubernetes CPU Unit

small: 12

small_mincpureq: 0.005

medium: 12

large: 12

xlarge: 12

xxlarge: 12

resources.requests.cpu N
presto_worker_resources_requests_memory Resource memory request for Presto worker. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.

small: 100G

small_mincpureq: 100G

medium: 100G

large: 100G

xlarge: 100G

xxlarge: 100G

resources.requests.memory
N
presto_worker_jvm_Xmx Xmx specifies the maximum memory allocation pool for a Java virtual machine (JVM). - - jvm.config.Xmx Y
presto_worker_task_concurrency Default local concurrency for parallel operators such as joins and aggregations. Number (must be a power of two) 16 config.properties.task.concurrency Y
presto_worker_query_max_memory The maximum amount of user memory that a query can use across the entire cluster. Data size 1TB config.properties.query.max-memory Y
presto_worker_query_max_memory_per_node The maximum amount of user memory that a query can use on a worker. Data size presto_worker_jvm_Xmx*0.795 config.properties.query.max-memory-per-node Y
presto_worker_query_max_total_memory_per_node The maximum amount of user and system memory that a query can use on a worker. Data size presto_worker_jvm_Xmx*0.795 config.properties.query.max-total-memory-per-node Y
presto_worker_query_max_concurrent_queries Describes how many queries can be processed simultaneously in a single cluster node. Integer 15 config.properties.query.max-concurrent-queries Y
presto_worker_memory_heap_headroom_per_node This is the amount of memory set aside as headroom/buffer in the JVM heap for allocations that are not tracked by Presto. Data size presto_worker_jvm_Xmx*0.2 config.properties.query.memory.heap-headroom-per-node Y
presto_worker_query_max_total_memory The maximum amount of user and system memory that a query can use across the entire cluster. Data size 2TB config.properties.query.max-total-memory Y
presto_worker_experimental_optimized_repartitioning Improve performance of repartitioning data between stages. Boolean true experimental.optimized-repartitioning Y
presto_worker_experimental_pushdown_dereference_enabled Enables pushdown of dereference expressions for querying nested data. Boolean - experimental.pushdown-dereference-enabled Y
presto_worker_experimental_pushdown_subfields_enabled Enables pushdown of subfield expressions for querying nested data. Boolean - experimental.pushdown-subfields-enabled Y
presto_worker_join_max_broadcast_table_size Controls the maximum estimated size of a table that can be broadcast when using the AUTOMATIC join distribution type. Can also be set per session with the join_max_broadcast_table_size session property. Integer - join-max-broadcast-table-size Y
presto_worker_node_scheduler_max_pending_splits_per_task The number of outstanding splits with the standard split weight that can be queued for each worker node for a single stage of a query, even when the node is already at the limit for total number of splits. Allowing a minimum number of splits per stage is required to prevent starvation and deadlocks. This value must be smaller than node-scheduler.max-splits-per-node, will usually be increased for the same reasons, and has similar drawbacks if set too high. Integer - node-scheduler.max-pending-splits-per-task Y
presto_worker_node_scheduler_max_splits_per_node The target value for the total number of splits that can be running for each worker node, assuming all splits have the standard split weight. Using a higher value is recommended, if queries are submitted in large batches (for example, running a large group of reports periodically), or for connectors that produce many splits that complete quickly but do not support assigning split weight values to express that to the split scheduler. Integer - node-scheduler.max-splits-per-node Y
presto_worker_optimizer_prefer_partial_aggregation This property allows users to disable partial aggregations for queries that do not benefit. Boolean - optimizer.prefer-partial-aggregation Y
presto_worker_query_execution_policy Configures the algorithm to organize the processing of all of the stages of a query. String phased query.execution-policy Y
presto_worker_query_low_memory_killer_policy The policy used for selecting the query to kill when the cluster is out of memory (OOM). This property can have one of the following values: none, total-reservation, or total-reservation-on-blocked-nodes. none disables the cluster OOM killer. The value of total-reservation configures a policy that kills the query with the largest memory reservation across the cluster. The value of total-reservation-on-blocked-nodes configures a policy that kills the query using the most memory on the workers that are out of memory (blocked). String

total-reservation-on-blocked-nodes

query.low-memory-killer.policy Y
presto_worker_query_max_stage_count Sets a limit on the number of stages in a query. The limit can be changed with the query.max-stage-count configuration property and the query_max_stage_count session property. Integer 200 query.max-stage-count Y
presto_worker_query_min_schedule_split_batch_size Sets the minimum number of splits to consider for scheduling per batch. Integer - query.min-schedule-split-batch-size Y
presto_worker_query_stage_count_warning_threshold Specifies a per-query threshold for the number of stages. When this threshold is exceeded, a TOO_MANY_STAGES warning is raised. Integer 150 query.stage-count-warning-threshold Y
presto_worker_scale_writers Enable writer scaling by dynamically increasing the number of writer tasks on the cluster. Boolean - scale-writers Y
presto_worker_sink_max_buffer_size Buffer size for I/O writes while collecting pipeline results. A higher value can speed up I/O operations at the cost of additional memory, but it can also increase the amount of data lost when a Presto node fails, effectively slowing down I/O in an unstable environment. Integer - sink.max-buffer-size Y
presto_worker_experimental_max_revocable_memory_per_node The amount of revocable memory a query can use on each node. Units of Bytes - experimental.max-revocable-memory-per-node Y
presto_worker_experimental_reserved_pool_enabled This property allows users to enable or disable the Reserved Pool in Presto. When the General Pool is full, the OOM killer in Presto is used to increase General Pool concurrency and prevent deadlock. Boolean False experimental.reserved-pool-enabled Y
presto_worker_query_min_expire_age This property describes the minimum time after which the query metadata can be removed from the server. String (Duration) 120 minutes query.min-expire-age Y
presto_worker_enable_dynamic_filtering This property improves performance for queries with broadcast or collocated joins by adding dynamic filtering and bucket pruning support. Boolean - experimental.enable-dynamic-filtering Y
presto_worker_exchange_client_threads This property helps to control the number of threads used by exchange clients in Presto to fetch data from other Presto nodes during query execution. Integer   exchange.client-threads  
presto_worker_exchange_http_client_max_connections This property specifies the maximum number of HTTP connections that the Exchange service can establish concurrently across all servers it interacts with. It helps to regulate the total number of simultaneous connections used by the exchange client for communication between Presto nodes. Integer   exchange.http-client.max-connections  
presto_worker_exchange_http_client_max_requests_queued_per_destination This property determines the maximum number of HTTP requests that can be queued for each destination server by the exchange client. Integer   exchange.http-client.max-requests-queued-per-destination  
presto_worker_http_server_log_max_size This property specifies the maximum file size for the log file generated by the HTTP server component. Units of Bytes   http-server.log.max-size  
presto_worker_http_server_log_max_history This property specifies the maximum number of log files that the HTTP server component retains before older log files are rotated out. Integer   http-server.log.max-history  
presto_worker_http_server_threads_max This property specifies the maximum number of threads that the HTTP server can use. Integer   http-server.threads.max  
presto_worker_join_max_broadcast_table_size This property allows you to specify a maximum size for replicated tables used in joins. Units of Bytes   join-max-broadcast-table-size  
presto_worker_log_max_history This property represents the maximum number of general application log files retained by a logging system before older logs are rotated out. Integer   log.max-history  
presto_worker_log_max_size The property log.max-size defines the maximum file size allowed for the general application log file. Units of Bytes   log.max-size  
presto_worker_node_scheduler_max_splits_per_node This property specifies the target maximum number of splits that can concurrently run on each worker node. Splits represent units of work within queries. Adjusting this property allows administrators to optimize resource utilization, especially in scenarios involving large query batches or connectors generating numerous splits.
CAUTION:
Setting presto_worker_node_scheduler_max_splits_per_node to too high a value might lead to inefficient memory usage and performance degradation.
Ideally, it should be set such that there is always at least one split waiting to be processed, but not higher.
Integer   node-scheduler.max-splits-per-node  
presto_worker_optimize_nulls_in_join This property, when enabled, reduces the overhead of processing NULL values during JOIN operations. It is particularly beneficial when dealing with columns that contain a significant number of NULL values. Boolean   optimize-nulls-in-join  
presto_worker_optimizer_default_filter_factor_enabled This property enables the use of a default value for estimating the cost of filters in query optimization. Boolean   optimizer.default-filter-factor-enabled  
presto_worker_optimizer_exploit_constraints This property enables constraint optimizations for querying catalogs that support table constraints. Boolean   optimizer.exploit-constraints  
presto_worker_query_client_timeout This property specifies the duration for which the cluster waits without any communication from the client application (for example, the CLI) before abandoning and canceling the ongoing query or task. String (Duration)   query.client.timeout  
presto_worker_query_max_execution_time This property specifies the maximum allowed time for a query to be actively executing on the cluster, before it is terminated. String (Duration)   query.max-execution-time  
presto_worker_query_max_history This property refers to the maximum number of queries to keep in the query history to provide statistics and other information. If this value is reached, queries are removed based on age. Integer   query.max-history  
presto_worker_query_max_length The maximum number of characters allowed for the SQL query text. Longer queries are not processed and are terminated with an error. Integer   query.max-length  
presto_worker_shutdown_grace_period This property specifies the duration of time that the system waits after receiving a shutdown request before initiating the shutdown process. During this grace period, the system continues to operate normally, allowing ongoing active tasks to complete. String (Duration)   shutdown.grace-period  
presto_worker_experimental_max_spill_per_node This property refers to the maximum spill space to be used by all queries on a single node (when the memory allocated for query processing is exceeded). Units of Bytes   experimental.max-spill-per-node  
presto_worker_experimental_query_max_spill_per_node This property refers to the maximum spill space to be used by a single query on a single node. Units of Bytes   experimental.query-max-spill-per-node  
presto_worker_experimental_spiller_max_used_space_threshold This property sets a threshold disk space usage ratio. If usage exceeds this value, the spill path is not eligible for spilling. Double   experimental.spiller-max-used-space-threshold  
presto_worker_experimental_spiller_spill_path This property specifies a directory where spilled content is written. It can be a comma-separated list to spill simultaneously to multiple directories, which helps to use multiple drives installed in the system. (It is recommended to avoid spilling to system drives and to ensure that spill operations do not interfere with JVM operation or disk performance.) String   experimental.spiller-spill-path  
presto_worker_resources_requests_ephemeral_storage This parameter sets the minimum/guaranteed size of local ephemeral storage that a container in a Presto worker Pod requests. Units of Bytes

small: 1G

small_mincpureq: 1G

medium: 1G

large: 1G

xlarge: 1G

xxlarge: 1G

resources.requests.ephemeral-storage N
presto_worker_httpserver_max_request_header_size This property sets the maximum size of the request header that the HTTP server supports. Data size 16kB httpserver.max_request_header_size Y
presto_worker_httpserver_max_response_header_size This property sets the maximum size of the response header that the HTTP server supports. Data size 16kB httpserver.max_response_header_size Y
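Several of the memory defaults in Tables 1 and 2 are derived from the JVM heap size: query.max-memory-per-node and query.max-total-memory-per-node default to Xmx*0.795, and the heap headroom defaults to Xmx*0.2. The following Python sketch just reproduces that arithmetic for a hypothetical heap size; it is not an official sizing tool.

```python
def derived_memory_defaults(jvm_xmx_gb: float) -> dict:
    """Reproduce the derived per-node memory defaults from the tables
    for a given JVM heap size (Xmx, in GB)."""
    return {
        # Both per-node query memory limits default to Xmx * 0.795.
        "query.max-memory-per-node": round(jvm_xmx_gb * 0.795, 2),
        "query.max-total-memory-per-node": round(jvm_xmx_gb * 0.795, 2),
        # Heap headroom defaults to Xmx * 0.2.
        "query.memory.heap-headroom-per-node": round(jvm_xmx_gb * 0.2, 2),
    }

# Hypothetical 64 GB worker heap:
defaults = derived_memory_defaults(64)
print(defaults["query.max-memory-per-node"])            # 50.88
print(defaults["query.memory.heap-headroom-per-node"])  # 12.8
```

This only illustrates how the table's derived defaults relate to the configured Xmx; the actual values are computed by the operator from presto_worker_jvm_Xmx.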
Table 3. watsonx.data component: Presto singlenode
Property Description Type Default value / Default setting System property Restart containers required
presto_singlenode_resources_limits_cpu Resource CPU limit for Presto singlenode; the container is allowed to use only this much CPU. Kubernetes CPU Unit

small: 3

small_mincpureq: 3

medium: 6

large: 9

xlarge: 12

xxlarge: 12

resources.limits.cpu N
presto_singlenode_resources_limits_memory Resource Memory limit for Presto singlenode, container is allowed to use only this much memory. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.

small: 24G

small_mincpureq: 24G

medium: 48G

large: 72G

xlarge: 96G

xxlarge: 96G

resources.limits.memory N
presto_singlenode_resources_limits_ephemeral_storage This parameter sets the maximum amount of local ephemeral storage that a container in a Presto singlenode Pod can consume. Units of Bytes

small: 10G

small_mincpureq: 10G

medium: 10G

large: 10G

xlarge: 10G

xxlarge: 10G

resources.limits.ephemeral-storage N
presto_singlenode_resources_requests_cpu Resource CPU request for Presto single node. Kubernetes CPU Unit

small: 3

small_mincpureq: 0.005

medium: 6

large: 9

xlarge: 12

xxlarge: 12

resources.requests.cpu N
presto_singlenode_resources_requests_memory Resource memory request for Presto singlenode. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.

small: 24G

small_mincpureq: 24G

medium: 48G

large: 72G

xlarge: 96G

xxlarge: 96G

resources.requests.memory N
presto_singlenode_resources_requests_ephemeral_storage This parameter sets the minimum/guaranteed amount of local ephemeral storage for a container in a Presto singlenode Pod. Units of Bytes

small: 500Mi

small_mincpureq: 500Mi

medium: 1G

large: 1G

xlarge: 1G

xxlarge: 1G

resources.requests.ephemeral-storage N
presto_singlenode_jvm_Xmx Xmx specifies the maximum memory allocation pool for a Java virtual machine (JVM). - - jvm.config.Xmx Y
presto_singlenode_task_concurrency Default local concurrency for parallel operators such as joins and aggregations. Number (must be a power of two) - config.properties.task.concurrency Y
presto_singlenode_query_max_memory The maximum amount of user memory that a query can use across the entire cluster. Data size 1TB config.properties.query.max-memory Y
presto_singlenode_query_max_memory_per_node The maximum amount of user memory that a query can use on a worker. Data size presto_singlenode_jvm_Xmx*0.795 config.properties.query.max-memory-per-node Y
presto_singlenode_query_max_total_memory_per_node The maximum amount of user and system memory that a query can use on a worker. Data size presto_singlenode_jvm_Xmx*0.795 config.properties.query.max-total-memory-per-node Y
presto_singlenode_query_max_concurrent_queries Describes how many queries can be processed simultaneously in a single cluster node. Integer - config.properties.query.max-concurrent-queries Y
presto_singlenode_memory_heap_headroom_per_node This is the amount of memory set aside as headroom (buffer) in the JVM heap for allocations that are not tracked by Presto. Data size presto_singlenode_jvm_Xmx*0.2 config.properties.query.memory.heap-headroom-per-node Y
presto_singlenode_query_max_total_memory The maximum amount of user and system memory that a query can use across the entire cluster. Data size 2TB config.properties.query.max-total-memory Y
presto_singlenode_experimental_optimized_repartitioning Improves performance of repartitioning data between stages. Boolean true experimental.optimized-repartitioning Y
presto_singlenode_experimental_pushdown_dereference_enabled Adds support for pushdown of dereference expressions for querying nested data. Boolean - experimental.pushdown-dereference-enabled Y
presto_singlenode_experimental_pushdown_subfields_enabled Adds support for pushdown of subfields expressions for querying nested data. Boolean - experimental.pushdown-subfields-enabled Y
presto_singlenode_join_max_broadcast_table_size Controls the maximum estimated size of a table that can be broadcast when using the AUTOMATIC join distribution type, through the join-max-broadcast-table-size configuration property and the join_max_broadcast_table_size session property. Units of Bytes - join-max-broadcast-table-size Y
presto_singlenode_node_scheduler_max_pending_splits_per_task The number of outstanding splits with the standard split weight that can be queued for each singlenode node for a single stage of a query, even when the node is already at the limit for total number of splits. Allowing a minimum number of splits per stage is required to prevent starvation and deadlocks. This value must be smaller than node-scheduler.max-splits-per-node, will usually be increased for the same reasons, and has similar drawbacks if set too high. Integer - node-scheduler.max-pending-splits-per-task Y
presto_singlenode_node_scheduler_max_splits_per_node The target value for the total number of splits that can be running for each worker node, assuming all splits have the standard split weight. Using a higher value is recommended, if queries are submitted in large batches (for example, running a large group of reports periodically), or for connectors that produce many splits that complete quickly but do not support assigning split weight values to express that to the split scheduler. Integer - node-scheduler.max-splits-per-node Y
presto_singlenode_optimizer_prefer_partial_aggregation This property allows users to disable partial aggregation for queries that do not benefit from it. Boolean - optimizer.prefer-partial-aggregation Y
presto_singlenode_query_execution_policy Configures the algorithm to organize the processing of all of the stages of a query. String phased query.execution-policy Y
presto_singlenode_query_low_memory_killer_policy The policy used for selecting the query to kill when the cluster is out of memory (OOM). This property can have one of the following values: none, total-reservation, or total-reservation-on-blocked-nodes. none disables the cluster OOM killer. The value of total-reservation configures a policy that kills the query with the largest memory reservation across the cluster. The value of total-reservation-on-blocked-nodes configures a policy that kills the query using the most memory on the workers that are out of memory (blocked). String total-reservation-on-blocked-nodes query.low-memory-killer.policy Y
presto_singlenode_query_max_stage_count Sets a limit on the number of stages in a query. The limit can be changed with the query.max-stage-count configuration property and the query_max_stage_count session property. Integer 200 query.max-stage-count Y
presto_singlenode_query_min_schedule_split_batch_size Sets the minimum number of splits to consider for scheduling per batch. Integer - query.min-schedule-split-batch-size Y
presto_singlenode_query_stage_count_warning_threshold Specifies a per-query threshold for the number of stages. When this threshold is exceeded, a TOO_MANY_STAGES warning is raised. Integer 150 query.stage-count-warning-threshold Y
presto_singlenode_scale_writers Enable writer scaling by dynamically increasing the number of writer tasks on the cluster. Boolean - scale-writers Y
presto_singlenode_sink_max_buffer_size Buffer size for I/O writes while collecting pipeline results. A higher value can increase the speed of I/O operations at the cost of additional memory. A higher value can also increase the amount of data lost when a Presto node fails, effectively slowing down I/O in an unstable environment. Integer - sink.max-buffer-size Y
presto_singlenode_experimental_max_revocable_memory_per_node The amount of revocable memory a query can use on each node. Units of Bytes - experimental.max-revocable-memory-per-node Y
presto_singlenode_experimental_reserved_pool_enabled This property allows users to enable or disable the Reserved Pool in Presto. When the General Pool is full, Presto uses its OOM killer to increase General Pool concurrency and prevent deadlock. Boolean false experimental.reserved-pool-enabled Y
presto_singlenode_query_min_expire_age This property describes the minimum time after which you can remove the query metadata from the server. String 120 minutes query.min-expire-age Y
presto_singlenode_enable_dynamic_filtering This property improves performance for queries with broadcast or collocated joins by adding dynamic filtering and bucket pruning support. Boolean - experimental.enable-dynamic-filtering Y
presto_singlenode_exchange_client_threads This property helps to control the number of threads used by exchange clients in Presto to fetch data from other Presto nodes during query execution. Integer   exchange.client-threads  
presto_singlenode_exchange_http_client_max_connections This property sets the maximum number of concurrent HTTP connections that the exchange client can open to other nodes. Integer   exchange.http-client.max-connections  
presto_singlenode_exchange_http_client_max_requests_queued_per_destination This property determines the maximum number of HTTP requests that can be queued for each destination server by the exchange client. Integer   exchange.http-client.max-requests-queued-per-destination  
presto_singlenode_http_server_log_max_size This property specifies the maximum file size for the log file generated by the HTTP server component. Units of Bytes   http-server.log.max-size  
presto_singlenode_http_server_log_max_history This property specifies the maximum number of log files that the HTTP server component retains before rotating out old log content. Integer   http-server.log.max-history  
presto_singlenode_http_server_threads_max This property sets the maximum number of threads used by the HTTP server. Integer   http-server.threads.max  
presto_singlenode_join_max_broadcast_table_size This property specifies a maximum size for replicated tables used in joins. Units of Bytes   join-max-broadcast-table-size  
presto_singlenode_log_max_history This property represents the maximum number of general application log files retained by a logging system before older logs are rotated out. Integer   log.max-history  
presto_singlenode_log_max_size The property log.max-size defines the maximum file size allowed for the general application log file. Units of Bytes   log.max-size  
presto_singlenode_node_scheduler_max_splits_per_node This property specifies the target maximum number of splits that can concurrently run on each worker node. Splits represent units of work within queries. Adjusting this property allows administrators to optimize resource utilization, especially in scenarios involving large query batches or connectors generating numerous splits.
CAUTION:
Setting presto_singlenode_node_scheduler_max_splits_per_node to too high a value might lead to inefficient memory usage and performance degradation.
Ideally, it should be set such that there is always at least one split waiting to be processed, but not higher.
Integer   node-scheduler.max-splits-per-node  
presto_singlenode_optimize_nulls_in_join This property when enabled reduces the overhead of processing NULL values during JOIN operations, particularly beneficial when dealing with columns containing a significant number of NULLs. Boolean   optimize-nulls-in-join  
presto_singlenode_optimizer_default_filter_factor_enabled This property enables the use of a default value for estimating the cost of filters in query optimization. Boolean   optimizer.default-filter-factor-enabled  
presto_singlenode_optimizer_exploit_constraints This property enables constraint optimizations for querying catalogs that support table constraints. Boolean   optimizer.exploit-constraints  
presto_singlenode_query_client_timeout This property specifies the duration for which the cluster waits without any communication from the client application, such as the CLI, before abandoning and canceling the ongoing query or work. String (Duration)   query.client.timeout  
presto_singlenode_query_max_execution_time This property specifies the maximum allowed time for a query to be actively executing on the cluster, before it is terminated. String (Duration)   query.max-execution-time  
presto_singlenode_query_max_history This property refers to the maximum number of queries to keep in the query history to provide statistics and other information. If this amount is reached, queries are removed based on age. Integer   query.max-history  
presto_singlenode_query_max_length The maximum number of characters allowed for the SQL query text. Longer queries are not processed and are terminated with an error. Integer   query.max-length  
presto_singlenode_shutdown_grace_period This property specifies the duration of time that the system waits after receiving a shutdown request before initiating the shutdown process. During this grace period, the system continues to operate normally, allowing ongoing active tasks to complete. String (Duration)   shutdown.grace-period  
presto_singlenode_experimental_max_spill_per_node This property refers to the maximum spill space to be used by all queries on a single node (when the memory allocated for query processing is exceeded). Units of Bytes   experimental.max-spill-per-node  
presto_singlenode_experimental_query_max_spill_per_node This property refers to the maximum spill space to be used by a single query on a single node. Units of Bytes   experimental.query-max-spill-per-node  
presto_singlenode_experimental_spiller_max_used_space_threshold This property sets a threshold disk space usage ratio; if usage exceeds this value, the spill path is no longer eligible for spilling. Double   experimental.spiller-max-used-space-threshold  
presto_singlenode_experimental_spiller_spill_path This property specifies a directory where spilled content is written. It can be a comma-separated list of directories to spill to simultaneously, which helps use multiple drives installed in the system. (It is recommended to avoid spilling to system drives and to ensure that spill operations do not interfere with JVM operation or disk performance.) String   experimental.spiller-spill-path  
presto_singlenode_httpserver_max_request_header_size This property is used to set the maximum size of the request header that the HTTP server supports. Data size 16kB httpserver.max_request_header_size Y
presto_singlenode_httpserver_max_response_header_size This property is used to set the maximum size of the response header that the HTTP server supports. Data size 16kB httpserver.max_response_header_size Y
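The per-node memory defaults in Table 3 are derived from the JVM heap size (Xmx). Assuming an illustrative Xmx of 20G — a placeholder, not a product default — the derived properties would be:

```properties
# jvm.config: -Xmx20G (illustrative value)
query.max-memory-per-node=15.9GB          # Xmx * 0.795
query.max-total-memory-per-node=15.9GB    # Xmx * 0.795
query.memory.heap-headroom-per-node=4GB   # Xmx * 0.2
```

If you override presto_singlenode_jvm_Xmx, keep the per-node memory properties consistent with these ratios so that query memory plus headroom does not exceed the heap.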
Table 4. watsonx.data component: Postgres
Property Description Type System property Restart containers required
postgres_replicas The number of replicas for Postgres. Integer spec.replicas N
postgres_resources_limits_cpu Resource CPU limit for Postgres. Container is allowed to use only this much CPU. Kubernetes CPU Unit resources.limits.cpu N
postgres_resources_limits_memory Resource Memory limit for Postgres. Container is allowed to use only this much memory. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.
resources.limits.memory N
postgres_resources_requests_cpu Resource CPU request for Postgres. Kubernetes CPU Unit resources.requests.cpu N
postgres_resources_requests_memory Resource memory request for Postgres. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.
resources.requests.memory N
Table 5. watsonx.data component: MinIO
Property Description Type System property Restart containers required
minio_resources_limits_cpu Resource CPU limit for MinIO. Container is allowed to use only this much CPU. Kubernetes CPU Unit resources.limits.cpu N
minio_resources_limits_memory Resource Memory limit for MinIO. A container is allowed to use only this much memory. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.
resources.limits.memory N
minio_resources_requests_cpu Resource CPU request for MinIO. Kubernetes CPU Unit resources.requests.cpu N
minio_resources_requests_memory Resource memory request for MinIO. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.
resources.requests.memory N
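The CPU and memory properties in Tables 4 and 5 follow standard Kubernetes resource conventions. A generic Kubernetes resources stanza — not a watsonx.data-specific manifest, and with placeholder values — illustrates the units:

```yaml
resources:
  requests:
    cpu: "500m"    # 500 millicores = 0.5 CPU
    memory: "4G"   # decimal (1000-based) unit; "4Gi" would be binary (1024-based)
  limits:
    cpu: "2"       # whole CPUs
    memory: "8G"
```

Requests are the guaranteed minimum used for scheduling; limits are the hard ceiling the container cannot exceed.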
Table 6. watsonx.data component: Hive
Property Description Type Default value / Default setting System property Restart containers required
hive_replicas The number of replicas for Hive. Integer   spec.replicas N
hive_resources_limits_cpu Resource CPU limit for Hive, container is allowed to use only this much CPU. Kubernetes CPU Unit   resources.limits.cpu N
hive_resources_limits_memory Resource Memory limit for Hive. A container is allowed to use only this much memory. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.
  resources.limits.memory N
hive_resources_requests_cpu Resource CPU request for Hive. Kubernetes CPU Unit   resources.requests.cpu N
hive_resources_requests_memory Resource memory request for Hive. Units of Bytes
Note: For more information about the memory unit, see Memory resource units.
  resources.requests.memory N
Table 7. watsonx.data component: Presto catalog
Property Description Type Default value / Default setting System property Restart containers required
presto_hive_max_outstanding_splits Limit on the number of splits waiting to be served by the split source. After this limit is reached, writers stop writing new splits to the split source until some of them are consumed by workers. A higher value increases memory usage but allows I/O to be concentrated, which can be much faster and increase resource utilization. Integer   hive.max-outstanding-splits Y
presto_hive_max_initial_splits This property describes how many splits can be initially created for a single query. The initial splits are created to allow better concurrency for small queries. The Hive connector creates the first hive.max-initial-splits splits with a size of hive.max-initial-split-size instead of hive.max-split-size. A higher value forces more splits to have a smaller size, effectively broadening what is considered a small query. Integer   hive.max-initial-splits Y
presto_hive_max_initial_split_size This property describes the maximum size of each of the initially created splits for a single query. The logic of initial splits is described in the hive.max-initial-splits property. Changing this value changes what is considered a small query: a higher value reduces parallelism for small queries, while a lower value increases it. This is a maximum size; the real size can be lower when the end of the blocks in a single DataNode is reached. Integer   hive.max-initial-split-size Y
presto_hive_max_split_size This property describes the maximum size of splits created after the first hive.max-initial-splits splits of a query. A lower value increases concurrency by creating more, smaller splits, at the cost of additional scheduling overhead. Integer   hive.max-split-size Y
presto_hive_split_loader_concurrency This property specifies the level of concurrency for loading data from Hive tables using the Presto Hive connector. It controls the number of concurrent split loader threads that Presto can utilize to fetch data in parallel from the Hive source. A higher concurrency value can improve data retrieval speed and resource utilization, particularly for large Hive queries, but it can also increase system resource consumption. Integer   hive.split-loader-concurrency Y
presto_hive_pushdown_filter_enabled This property controls whether filter pushdown is enabled in the Presto Hive connector. Filter pushdown is a query optimization technique that allows Presto to push filtering conditions directly to the underlying Hive data source. When enabled, filtering conditions specified in SQL queries are evaluated as close to the data source as possible, reducing the amount of data that needs to be transferred to Presto for processing. Boolean   hive.pushdown-filter-enabled Y
presto_hive_node_selection_strategy Chooses the NodeSelectionStrategy through the hive.node-selection-strategy configuration property. When SOFT_AFFINITY is selected, the scheduler makes a best effort to request the same worker to fetch the same file. String   hive.node-selection-strategy Y
presto_hive_max_partitions_per_writers The maximum number of partitions permitted for a single writer. Integer   hive.max-partitions-per-writers   Y
presto_hive_metastore_timeout This parameter specifies the timeout duration for requests made to the Hive metastore.
Note: It is applicable for version 1.1.4 and later of watsonx.data.
String (Duration) hive.metastore-timeout 10s Y
hive_s3_max_error_retries This property allows to set the maximum number of retry attempts for S3 client operations in Hive. Integer hive.s3.max-error-retries 50 Y
hive_s3_connect_timeout This property specifies the TCP connection timeout for S3 operations in Hive. String (Duration) hive.s3.connect-timeout 1m Y
hive_s3_socket_timeout This property sets the maximum time allowed for reading data from the socket for S3 operations in Hive. String (Duration) hive.s3.socket-timeout 2m Y
hive_s3_max_connections This property defines the maximum number of concurrent open connections permitted to S3 in Hive. Integer hive.s3.max-connections 5000 Y
hive_s3_max_client_retries This property sets the maximum number of retry attempts for read operations for S3 in Hive. Integer hive.s3.max-client-retries 50 Y
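The Hive S3 and metastore properties above translate to catalog configuration entries. Using the default values documented in Table 7:

```properties
hive.metastore-timeout=10s
hive.s3.max-error-retries=50
hive.s3.connect-timeout=1m
hive.s3.socket-timeout=2m
hive.s3.max-connections=5000
hive.s3.max-client-retries=50
```

Raising the timeouts or retry counts can help with slow or unreliable object storage, at the cost of queries taking longer to fail.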
Table 8. watsonx.data component: Engine
Property Description Type System property Restart containers required  
presto_qhmm_enable This parameter enables (or disables) the Query History Monitoring and Management (QHMM) service in an engine. Boolean      

You can customize the watsonx.data components by following the instructions in Customizing watsonx.data components.