Specifying additional customization for watsonx.data
A project administrator can specify extra customization of properties for different components in IBM® watsonx.data. You can set customizations other than the defaults as part of the post-install procedure. The following specifications are optional and can be altered to change component-level customizations.
watsonx.data on Red Hat® OpenShift®
The complete list of possible custom resource properties is provided in the following tables.
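For example, a property from the tables below can be overridden in the watsonx.data custom resource. The resource kind, name, and exact spec path in this sketch are illustrative assumptions, not the documented schema; verify them against the CRD installed on your cluster:

```yaml
# Hypothetical sketch only: the kind, name, and field placement are
# assumptions -- check the actual custom resource on your cluster.
apiVersion: watsonxdata.ibm.com/v1
kind: wxdengine
metadata:
  name: lakehouse
spec:
  presto_coordinator_resources_limits_cpu: "12"
  presto_coordinator_resources_limits_memory: 100G
  presto_worker_replicas: 3
```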
Property | Description | Type | Default value / Default setting | System property | Restart containers required |
---|---|---|---|---|---|
presto_coordinator_resources_limits_cpu |
Resource CPU limit for the Presto coordinator; the container is allowed to use only this much CPU. | Kubernetes CPU Unit |
small: 12 small_mincpureq: 12 medium: 12 large: 12 xlarge: 12 xxlarge: 12 |
resources.limits.cpu |
N |
presto_coordinator_resources_limits_memory |
Resource memory limit for the Presto coordinator; the container is allowed to use only this much memory. | Units of Bytes Note: For more information about the memory unit, see Memory resource units.
|
small: 100G small_mincpureq: 100G medium: 100G large: 100G xlarge: 100G xxlarge: 100G |
resources.limits.memory Note: For more information about the memory unit, see
Memory resource units.
|
N |
presto_coordinator_resources_limits_ephemeral_storage |
This parameter sets the maximum amount of local ephemeral storage that a container in a Presto coordinator Pod can consume. | Units of Bytes |
small: 10G small_mincpureq: 10G medium: 10G large: 10G xlarge: 10G xxlarge: 10G |
resources.limits.ephemeral-storage |
N |
presto_coordinator_resources_requests_cpu |
Resource CPU request for Presto coordinator. | Kubernetes CPU Unit |
small: 12 small_mincpureq: 0.005 medium: 12 large: 12 xlarge: 12 xxlarge: 12 |
resources.requests.cpu |
N |
presto_coordinator_resources_requests_memory |
Resource memory request for Presto coordinator. | Units of Bytes Note: For more information about the memory unit, see Memory resource units.
|
small: 100G small_mincpureq: 100G medium: 100G large: 100G xlarge: 100G xxlarge: 100G |
resources.requests.memory
Note: For more information about the memory unit, see Memory resource units.
|
N |
presto_coordinator_resources_requests_ephemeral_storage |
This parameter sets the minimum/guaranteed size of local ephemeral storage that a container in a Presto coordinator Pod requests. | Units of Bytes |
small: 1G small_mincpureq: 1G medium: 1G large: 1G xlarge: 1G xxlarge: 1G |
resources.requests.ephemeral-storage |
N |
presto_coordinator_jvm_Xmx |
Xmx specifies the maximum memory allocation pool for a Java virtual machine (JVM). | - | - | jvm.config.Xmx |
Y |
presto_coordinator_task_concurrency |
Default local concurrency for parallel operators such as joins and aggregations. | Number (must be a power of two) | 16 | config.properties.task.concurrency |
Y |
presto_coordinator_query_max_memory |
The maximum amount of user memory that a query can use across the entire cluster. | Data size | 1TB | config.properties.query.max-memory |
Y |
presto_coordinator_query_max_memory_per_node |
The maximum amount of user memory that a query can use on a worker. | Data size | presto_coordinator_jvm_Xmx*0.795 | config.properties.query.max-memory-per-node |
Y |
presto_coordinator_query_max_total_memory_per_node |
The maximum amount of user and system memory that a query can use on a worker. | Data size | presto_coordinator_jvm_Xmx*0.795 | config.properties.query.max-total-memory-per-node |
Y |
presto_coordinator_query_max_concurrent_queries |
Describes how many queries can be processed simultaneously in a single cluster node. | Integer | 15 | config.properties.query.max-concurrent-queries |
Y |
presto_coordinator_memory_heap_headroom_per_node |
The amount of memory set aside as headroom/buffer in the JVM heap for allocations that are not tracked by Presto. | Data size | presto_coordinator_jvm_Xmx*0.2 | config.properties.query.memory.heap-headroom-per-node |
Y |
presto_coordinator_query_max_total_memory |
The maximum amount of user and system memory that a query can use across the entire cluster. | Data size | 2TB | config.properties.query.max-total-memory |
Y |
presto_coordinator_experimental_optimized_repartitioning |
Improves performance of repartitioning data between stages. | Boolean | true |
experimental.optimized-repartitioning |
Y |
presto_coordinator_experimental_pushdown_dereference_enabled |
Adds support for pushdown of dereference expressions for querying nested data. | Boolean | - | experimental.pushdown-dereference-enabled |
Y |
presto_coordinator_experimental_pushdown_subfields_enabled |
Adds support for pushdown of subfields expressions for querying nested data. | Boolean | - | experimental.pushdown-subfields-enabled |
Y |
presto_coordinator_join_max_broadcast_table_size |
Adds the join-max-broadcast-table-size configuration property and the join_max_broadcast_table_size session property to control the maximum estimated size of a table that can be broadcast when using the AUTOMATIC join distribution type. |
Integer | - | join-max-broadcast-table-size |
Y |
presto_coordinator_node_scheduler_max_pending_splits_per_task |
The number of outstanding splits with the standard split weight that can be queued for each
worker node for a single stage of a query, even when the node is already at the limit for total
number of splits. Allowing a minimum number of splits per stage is required to prevent starvation
and deadlocks. This value must be smaller than node-scheduler.max-splits-per-node,
will usually be increased for the same reasons, and has similar drawbacks if set too high. |
Integer | - | node-scheduler.max-pending-splits-per-task |
Y |
presto_coordinator_node_scheduler_max_splits_per_node |
The target value for the total number of splits that can be running for each worker node, assuming all splits have the standard split weight. Using a higher value is recommended if queries are submitted in large batches (for example, running a large group of reports periodically), or for connectors that produce many splits that complete quickly but do not support assigning split weight values to express that to the split scheduler. | Integer | - | node-scheduler.max-splits-per-node |
Y |
presto_coordinator_optimizer_prefer_partial_aggregation |
This property allows users to disable partial aggregations for queries that do not benefit from them. | Boolean | - | optimizer.prefer-partial-aggregation |
Y |
presto_coordinator_query_execution_policy |
Configures the algorithm to organize the processing of all of the stages of a query. | String | phased |
query.execution-policy |
Y |
presto_coordinator_query_low_memory_killer_policy |
The policy used for selecting the query to kill when the cluster is out of memory (OOM). This property can have one of the following values: none, total-reservation, or total-reservation-on-blocked-nodes. none disables the cluster OOM killer. total-reservation kills the query with the largest memory reservation across the cluster. total-reservation-on-blocked-nodes kills the query using the most memory on the workers that are out of memory (blocked). |
String |
|
query.low-memory-killer.policy |
Y |
presto_coordinator_query_max_stage_count |
Limits the number of stages allowed in a query. The limit can be changed with the query.max-stage-count configuration property and the query_max_stage_count session property. |
Integer | 200 | query.max-stage-count |
Y |
presto_coordinator_query_min_schedule_split_batch_size |
Sets the minimum number of splits to consider for scheduling per batch. |
Integer | - | query.min-schedule-split-batch-size |
Y |
presto_coordinator_query_stage_count_warning_threshold |
Specifies a per-query threshold for the number of stages. When this threshold is exceeded, a TOO_MANY_STAGES warning is raised. |
Integer | 150 | query.stage-count-warning-threshold |
Y |
presto_coordinator_scale_writers |
Enables writer scaling by dynamically increasing the number of writer tasks on the cluster. | Boolean | - | scale-writers |
Y |
presto_coordinator_sink_max_buffer_size |
Buffer size for IO writes while collecting pipeline results. A higher value may increase the speed of IO operations at the cost of additional memory, but may also increase the amount of data lost when a Presto node fails, effectively slowing down IO in an unstable environment. | Integer | - | sink.max-buffer-size |
Y |
presto_coordinator_experimental_max_revocable_memory_per_node |
The amount of revocable memory a query can use on each node. | Units of Bytes | - | experimental.max-revocable-memory-per-node |
Y |
presto_coordinator_experimental_reserved_pool_enabled |
This property allows users to enable or disable the Reserved Pool in Presto. When the General Pool is full, the OOM killer in Presto is used to relieve General Pool memory pressure and prevent deadlock. | Boolean | False | experimental.reserved-pool-enabled |
Y |
presto_coordinator_query_min_expire_age |
This property describes the minimum time after which you can remove the query metadata from the server. | String (Duration) | 120 minutes | query.min-expire-age |
Y |
presto_coordinator_enable_dynamic_filtering |
This property improves performance for queries with broadcast or collocated joins by adding dynamic filtering and bucket pruning support. | Boolean | - | experimental.enable-dynamic-filtering |
Y |
presto_coordinator_com_facebook_presto_governance |
This property sets the minimum log level for the logger
com.facebook.presto.governance . It helps to customize the logging behavior based on
the severity of log messages. |
String (log levels) Note: There are four levels: DEBUG, INFO, WARN, and ERROR |
com.facebook.presto.governance |
||
presto_coordinator_com_facebook_presto_governance_util |
This property sets the minimum log level for the logger
com.facebook.presto.governance_util . It helps to customize the logging behavior
based on the severity of log messages. |
String (log levels) Note: There are four levels: DEBUG, INFO, WARN, and ERROR |
com.facebook.presto.governance.util |
||
presto_coordinator_com_facebook_presto_dispatcher |
This property sets the minimum log level for the logger
com.facebook.presto.dispatcher . It helps to customize the logging behavior based on
the severity of log messages. |
String (log levels) Note: There are four levels: DEBUG, INFO, WARN, and ERROR |
com.facebook.presto.dispatcher |
||
presto_coordinator_exchange_client_threads |
This property controls the number of threads used by exchange clients in Presto to fetch data from other Presto nodes during query execution. | Integer | exchange.client-threads |
||
presto_coordinator_exchange_http_client_max_connections |
This property specifies the maximum number of HTTP connections that the Exchange service can establish concurrently across all servers it interacts with. It helps to regulate the total number of simultaneous connections used by the exchange client for communication between Presto nodes. | Integer | exchange.http-client.max-connections |
||
presto_coordinator_exchange_http_client_max_requests_queued_per_destination |
This property determines the maximum number of HTTP requests that can be queued for each destination server by the exchange client. | Integer | exchange.http-client.max-requests-queued-per-destination |
||
presto_coordinator_http_server_log_max_size |
This property specifies the maximum file size for the log file generated by the HTTP server component. | Units of Bytes | http-server.log.max-size |
||
presto_coordinator_http_server_log_max_history |
The property specifies the maximum number of log files that the HTTP server component retains before rotating old log content. | Integer | http-server.log.max-history |
||
presto_coordinator_http_server_threads_max |
This property specifies the maximum number of threads for the HTTP server. | Integer | http-server.threads.max |
|||
presto_coordinator_join_max_broadcast_table_size |
This property specifies a maximum size for replicated tables used in joins. | Units of Bytes | join-max-broadcast-table-size |
||
presto_coordinator_log_max_history |
This property represents the maximum number of general application log files retained by a logging system before older logs are rotated out. | Integer | log.max-history |
||
presto_coordinator_log_max_size |
The property log.max-size defines the maximum file size allowed for the
general application log file. |
Units of Bytes | log.max-size |
||
presto_coordinator_node_scheduler_max_splits_per_node |
This property specifies the target maximum number of splits that can concurrently run on each
worker node. Splits represent units of work within queries. Adjusting this property allows
administrators to optimize resource utilization, especially in scenarios involving large query
batches or connectors generating numerous splits. CAUTION: Setting presto_coordinator_node_scheduler_max_splits_per_node to too high a value might lead to inefficient memory usage and performance degradation. Set this property such that there is always at least one split waiting to be processed, but not higher. |
Integer | node-scheduler.max-splits-per-node |
||
presto_coordinator_optimize_nulls_in_join |
This property, when enabled, reduces the overhead of processing NULL values during JOIN operations, which is particularly beneficial when dealing with columns containing a significant number of NULL values. |
Boolean | optimize-nulls-in-join |
||
presto_coordinator_optimizer_default_filter_factor_enabled |
This property enables the use of a default value for estimating the cost of filters in query optimization. | Boolean | optimizer.default-filter-factor-enabled |
||
presto_coordinator_optimizer_exploit_constraints |
This property enables constraint optimizations for querying catalogs that support table constraints. | Boolean | optimizer.exploit-constraints |
||
presto_coordinator_query_client_timeout |
This property specifies the duration for which the cluster waits without any communication from the client application (for example, the CLI) before abandoning and canceling the ongoing query or task. | String (Duration) | query.client.timeout |
||
presto_coordinator_query_max_execution_time |
This property specifies the maximum allowed time for a query to be actively executing on the cluster before termination. | String (Duration) | query.max-execution-time |
||
presto_coordinator_query_max_history |
This property refers to the maximum number of queries to keep in the query history to provide statistics and other information. If this value is reached, queries are removed based on age. | Integer | query.max-history |
||
presto_coordinator_query_max_length |
This property specifies the maximum number of characters allowed for the SQL query text. Longer queries are not processed and are terminated with an error. | Integer | query.max-length |
||
presto_coordinator_shutdown_grace_period |
This property specifies the duration of time that the system waits after receiving a shutdown request before initiating the shutdown process. During this grace period, the system continues to operate normally, allowing ongoing active tasks to complete. | String (Duration) | shutdown.grace-period |
||
presto_coordinator_experimental_max_spill_per_node |
This property refers to the maximum spill space used by all queries on a single node (when the memory allocated for query processing is exceeded). | Units of Bytes | experimental.max-spill-per-node |
||
presto_coordinator_experimental_query_max_spill_per_node |
This property refers to the maximum spill space used by a single query on a single node. | Units of Bytes | experimental.query-max-spill-per-node |
||
presto_coordinator_experimental_spiller_max_used_space_threshold |
This property sets a threshold disk space usage ratio. If the usage exceeds the threshold value, the spill path becomes ineligible for spilling. | Double | experimental.spiller-max-used-space-threshold |
||
presto_coordinator_experimental_spiller_spill_path |
This property specifies a directory where spilled content is written. It can be a comma separated list to spill simultaneously to multiple directories, which helps to use multiple drives installed in the system. (It is recommended to avoid spilling to system drives and to ensure that spill operations do not interfere with the JVM operation or disk performance.) | String | experimental.spiller-spill-path |
||
presto_coordinator_httpserver_max_request_header_size |
This property is used to set the maximum size of the request header that the HTTP server supports. |
Data size | 16kB | httpserver.max_request_header_size |
Y |
presto_coordinator_httpserver_max_response_header_size |
This property is used to set the maximum size of the response header that the HTTP server supports. |
Data size | 16kB | httpserver.max_response_header_size |
Y |
presto_coordinator_join_distribution_type |
This property specifies the type of distributed join to use. Allowed values include AUTOMATIC, PARTITIONED, and BROADCAST. |
String | AUTOMATIC |
join-distribution-type |
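Several memory defaults in the coordinator table are expressed relative to the JVM heap: the per-node query memory limits default to presto_coordinator_jvm_Xmx*0.795 and the heap headroom to presto_coordinator_jvm_Xmx*0.2. The arithmetic can be sketched as follows; the 100G heap size is an arbitrary example for illustration, not a documented default:

```python
def derived_memory_defaults(xmx_gb: float) -> dict:
    """Compute the Xmx-derived defaults listed in the coordinator table.

    The 0.795 and 0.2 multipliers come straight from the table; the
    heap size is whatever jvm.config.Xmx is set to for the coordinator.
    """
    return {
        "query.max-memory-per-node": round(xmx_gb * 0.795, 2),
        "query.max-total-memory-per-node": round(xmx_gb * 0.795, 2),
        "query.memory.heap-headroom-per-node": round(xmx_gb * 0.2, 2),
    }

# With a 100G heap, each node allows 79.5G of user memory per query
# and reserves 20G of headroom for allocations not tracked by Presto.
print(derived_memory_defaults(100))
```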
Property | Description | Type | Default value / Default setting | System property | Restart containers required |
---|---|---|---|---|---|
presto_worker_replicas |
The number of replicas for Presto worker. | Integer |
small: 3 small_mincpureq: 3 medium: 9 large: 19 xlarge: 69 xxlarge: 199 |
spec.replicas |
N |
presto_worker_resources_limits_cpu |
Resource CPU limit for the Presto worker; the container is allowed to use only this much CPU. | Kubernetes CPU Unit |
small: 12 small_mincpureq: 12 medium: 12 large: 12 xlarge: 12 xxlarge: 12 |
resources.limits.cpu |
N |
presto_worker_resources_limits_memory |
Resource memory limit for the Presto worker; the container is allowed to use only this much memory. | Units of Bytes Note: For more information about the memory unit, see Memory resource units.
|
small: 100G small_mincpureq: 100G medium: 100G large: 100G xlarge: 100G xxlarge: 100G |
resources.limits.memory Note: For more information about the memory unit, see
Memory resource units.
|
N |
presto_worker_resources_limits_ephemeral_storage |
This parameter sets the maximum amount of local ephemeral storage that a container in a Presto worker Pod can consume. | Units of Bytes |
small: 10G small_mincpureq: 10G medium: 10G large: 10G xlarge: 10G xxlarge: 10G |
resources.limits.ephemeral-storage |
N |
presto_worker_resources_requests_cpu |
Resource CPU request for Presto worker. | Kubernetes CPU Unit |
small: 12 small_mincpureq: 0.005 medium: 12 large: 12 xlarge: 12 xxlarge: 12 |
resources.requests.cpu |
N |
presto_worker_resources_requests_memory |
Resource memory request for Presto worker. | Units of Bytes Note: For more information about the memory unit, see Memory resource units.
|
small: 100G small_mincpureq: 100G medium: 100G large: 100G xlarge: 100G xxlarge: 100G |
resources.requests.memory
Note: For more information about the memory unit, see Memory resource units.
|
N |
presto_worker_jvm_Xmx |
Xmx specifies the maximum memory allocation pool for a Java virtual machine
(JVM). |
- | - | jvm.config.Xmx |
Y |
presto_worker_task_concurrency |
Default local concurrency for parallel operators such as joins and aggregations. | Number (must be a power of two) | 16 | config.properties.task.concurrency |
Y |
presto_worker_query_max_memory |
The maximum amount of user memory that a query can use across the entire cluster. | Data size | 1TB | config.properties.query.max-memory |
Y |
presto_worker_query_max_memory_per_node |
The maximum amount of user memory that a query can use on a worker. | Data size | presto_worker_jvm_Xmx*0.795 | config.properties.query.max-memory-per-node |
Y |
presto_worker_query_max_total_memory_per_node |
The maximum amount of user and system memory that a query can use on a worker. | Data size | presto_worker_jvm_Xmx*0.795 | config.properties.query.max-total-memory-per-node |
Y |
presto_worker_query_max_concurrent_queries |
Describes how many queries can be processed simultaneously in a single cluster node. | Integer | 15 | config.properties.query.max-concurrent-queries |
Y |
presto_worker_memory_heap_headroom_per_node |
This is the amount of memory set aside as headroom/buffer in the JVM heap for allocations that are not tracked by Presto. | Data size | presto_worker_jvm_Xmx*0.2 | config.properties.query.memory.heap-headroom-per-node |
Y |
presto_worker_query_max_total_memory |
The maximum amount of user and system memory that a query can use across the entire cluster. | Data size | 2TB | config.properties.query.max-total-memory |
Y |
presto_worker_experimental_optimized_repartitioning |
Improves performance of repartitioning data between stages. | Boolean | true |
experimental.optimized-repartitioning |
Y |
presto_worker_experimental_pushdown_dereference_enabled |
Adds support for pushdown of dereference expressions for querying nested data. | Boolean | - | experimental.pushdown-dereference-enabled |
Y |
presto_worker_experimental_pushdown_subfields_enabled |
Adds support for pushdown of subfields expressions for querying nested data. | Boolean | - | experimental.pushdown-subfields-enabled |
Y |
presto_worker_join_max_broadcast_table_size |
Adds the join-max-broadcast-table-size configuration property and the join_max_broadcast_table_size session property to control the maximum estimated size of a table that can be broadcast when using the AUTOMATIC join distribution type. |
Integer | - | join-max-broadcast-table-size |
Y |
presto_worker_node_scheduler_max_pending_splits_per_task |
The number of outstanding splits with the standard split weight that can be queued for each
worker node for a single stage of a query, even when the node is already at the limit for total
number of splits. Allowing a minimum number of splits per stage is required to prevent starvation
and deadlocks. This value must be smaller than node-scheduler.max-splits-per-node,
will usually be increased for the same reasons, and has similar drawbacks if set too high. |
Integer | - | node-scheduler.max-pending-splits-per-task |
Y |
presto_worker_node_scheduler_max_splits_per_node |
The target value for the total number of splits that can be running for each worker node, assuming all splits have the standard split weight. Using a higher value is recommended if queries are submitted in large batches (for example, running a large group of reports periodically), or for connectors that produce many splits that complete quickly but do not support assigning split weight values to express that to the split scheduler. | Integer | - | node-scheduler.max-splits-per-node |
Y |
presto_worker_optimizer_prefer_partial_aggregation |
This property allows users to disable partial aggregations for queries that do not benefit from them. | Boolean | - | optimizer.prefer-partial-aggregation |
Y |
presto_worker_query_execution_policy |
Configures the algorithm to organize the processing of all of the stages of a query. | String | phased |
query.execution-policy |
Y |
presto_worker_query_low_memory_killer_policy |
The policy used for selecting the query to kill when the cluster is out of memory (OOM). This property can have one of the following values: none, total-reservation, or total-reservation-on-blocked-nodes. none disables the cluster OOM killer. total-reservation kills the query with the largest memory reservation across the cluster. total-reservation-on-blocked-nodes kills the query using the most memory on the workers that are out of memory (blocked). |
String |
|
query.low-memory-killer.policy |
Y |
presto_worker_query_max_stage_count |
Limits the number of stages allowed in a query. The limit can be changed with the query.max-stage-count configuration property and the query_max_stage_count session property. |
Integer | 200 | query.max-stage-count |
Y |
presto_worker_query_min_schedule_split_batch_size |
Sets the minimum number of splits to consider for scheduling per batch. |
Integer | - | query.min-schedule-split-batch-size |
Y |
presto_worker_query_stage_count_warning_threshold |
Specifies a per-query threshold for the number of stages. When this threshold is exceeded, a TOO_MANY_STAGES warning is raised. |
Integer | 150 | query.stage-count-warning-threshold |
Y |
presto_worker_scale_writers |
Enables writer scaling by dynamically increasing the number of writer tasks on the cluster. | Boolean | - | scale-writers |
Y |
presto_worker_sink_max_buffer_size |
Buffer size for IO writes while collecting pipeline results. A higher value may increase the speed of IO operations at the cost of additional memory, but may also increase the amount of data lost when a Presto node fails, effectively slowing down IO in an unstable environment. | Integer | - | sink.max-buffer-size |
Y |
presto_worker_experimental_max_revocable_memory_per_node |
The amount of revocable memory a query can use on each node. | Units of Bytes | - | experimental.max-revocable-memory-per-node |
Y |
presto_worker_experimental_reserved_pool_enabled |
This property allows users to enable or disable the Reserved Pool in Presto. When the General Pool is full, the OOM killer in Presto is used to relieve General Pool memory pressure and prevent deadlock. | Boolean | False | experimental.reserved-pool-enabled |
Y |
presto_worker_query_min_expire_age |
This property describes the minimum time after which you can remove the query metadata from the server. | String (Duration) | 120 minutes | query.min-expire-age |
Y |
presto_worker_enable_dynamic_filtering |
This property improves performance for queries with broadcast or collocated joins by adding dynamic filtering and bucket pruning support. | Boolean | - | experimental.enable-dynamic-filtering |
Y |
presto_worker_exchange_client_threads |
This property controls the number of threads used by exchange clients in Presto to fetch data from other Presto nodes during query execution. | Integer | exchange.client-threads |
||
presto_worker_exchange_http_client_max_connections |
This property specifies the maximum number of HTTP connections that the Exchange service can establish concurrently across all servers it interacts with. It helps to regulate the total number of simultaneous connections used by the exchange client for communication between Presto nodes. | Integer | exchange.http-client.max-connections |
||
presto_worker_exchange_http_client_max_requests_queued_per_destination |
This property determines the maximum number of HTTP requests that can be queued for each destination server by the exchange client. | Integer | exchange.http-client.max-requests-queued-per-destination |
||
presto_worker_http_server_log_max_size |
This property specifies the maximum file size for the log file generated by the HTTP server component. | Units of Bytes | http-server.log.max-size |
||
presto_worker_http_server_log_max_history |
The property specifies the maximum number of log files that the HTTP server component retains before rotating old log content. | Integer | http-server.log.max-history |
||
presto_worker_http_server_threads_max |
This property specifies the maximum number of threads for the HTTP server. | Integer | http-server.threads.max |
|||
presto_worker_join_max_broadcast_table_size |
This property specifies a maximum size for replicated tables used in joins. | Units of Bytes | join-max-broadcast-table-size |
||
presto_worker_log_max_history |
This property represents the maximum number of general application log files retained by a logging system before older logs are rotated out. | Integer | log.max-history |
||
presto_worker_log_max_size |
The property log.max-size defines the maximum file size allowed for the
general application log file. |
Units of Bytes | log.max-size |
||
presto_worker_node_scheduler_max_splits_per_node |
This property specifies the target maximum number of splits that can concurrently run on each
worker node. Splits represent units of work within queries. Adjusting this property allows
administrators to optimize resource utilization, especially in scenarios involving large query
batches or connectors generating numerous splits. CAUTION: Setting presto_worker_node_scheduler_max_splits_per_node to too high a value might lead to inefficient memory usage and performance degradation. Ideally, it should be set such that there is always at least one split waiting to be processed, but not higher. |
Integer | node-scheduler.max-splits-per-node |
||
presto_worker_optimize_nulls_in_join |
This property, when enabled, reduces the overhead of processing NULL values during JOIN operations, which is particularly beneficial when dealing with columns containing a significant number of NULL values. |
Boolean | optimize-nulls-in-join |
||
presto_worker_optimizer_default_filter_factor_enabled |
This property enables the use of a default value for estimating the cost of filters in query optimization. | Boolean | optimizer.default-filter-factor-enabled |
||
presto_worker_optimizer_exploit_constraints |
This property enables constraint optimizations for querying catalogs that support table constraints. | Boolean | optimizer.exploit-constraints |
||
presto_worker_query_client_timeout |
This property specifies the duration for which the cluster waits without any communication from the client application (for example, the CLI) before abandoning and canceling the ongoing query or work. | String (Duration) | query.client.timeout |
||
presto_worker_query_max_execution_time |
This property specifies the maximum allowed time for a query to be actively executing on the cluster before it is terminated. | String (Duration) | query.max-execution-time |
||
presto_worker_query_max_history |
This property refers to the maximum number of queries to keep in the query history to provide statistics and other information. If this value is reached, queries are removed based on age. | Integer | query.max-history |
||
presto_worker_query_max_length |
The maximum number of characters allowed for the SQL query text. Longer queries are not processed and are terminated with an error. | Integer | query.max-length |
||
presto_worker_shutdown_grace_period |
This property specifies the duration of time that the system waits after receiving a shutdown request before initiating the shutdown process. During this grace period, the system continues to operate normally, allowing ongoing active tasks to complete. | String (Duration) | shutdown.grace-period |
||
presto_worker_experimental_max_spill_per_node |
This property refers to the maximum spill space to be used by all queries on a single node (when the memory allocated for query processing is exceeded). | Units of Bytes | experimental.max-spill-per-node |
||
presto_worker_experimental_query_max_spill_per_node |
This property refers to the maximum spill space to be used by a single query on a single node. | Units of Bytes | experimental.query-max-spill-per-node |
||
presto_worker_experimental_spiller_max_used_space_threshold |
This property sets a threshold disk space usage ratio. If the usage exceeds this value, the spill path becomes ineligible for spilling. | Double | experimental.spiller-max-used-space-threshold |
||
presto_worker_experimental_spiller_spill_path |
This property specifies a directory where spilled content is written. It can be a comma separated list to spill simultaneously to multiple directories, which helps to use multiple drives installed in the system. (It is recommended to avoid spilling to system drives and to ensure that spill operations do not interfere with the JVM operation or disk performance.) | String | experimental.spiller-spill-path |
||
presto_worker_resources_requests_ephemeral_storage |
This parameter sets the minimum/guaranteed size of local ephemeral storage that a container in a Presto worker Pod requests. | Units of Bytes |
small: 1G small_mincpureq: 1G medium: 1G large: 1G xlarge: 1G xxlarge: 1G |
resources.requests.ephemeral-storage |
N |
presto_worker_httpserver_max_request_header_size |
This property is used to set the maximum size of the request header that the HTTP server supports. |
Data size | 16kB | httpserver.max_request_header_size |
Y |
presto_worker_httpserver_max_response_header_size |
This property is used to set the maximum size of the response header that
the HTTP server supports. |
Data size | 16kB | httpserver.max_response_header_size |
Y |
Property | Description | Type | Default value / Default setting | System property | Restart containers required |
---|---|---|---|---|---|
presto_singlenode_resources_limits_cpu |
Resource CPU limit for Presto singlenode. The container is allowed to use only this much CPU. | Kubernetes CPU Unit |
small: 3 small_mincpureq: 3 medium: 6 large: 9 xlarge: 12 xxlarge: 12 |
resources.limits.cpu |
N |
presto_singlenode_resources_limits_memory |
Resource Memory limit for Presto singlenode. The container is allowed to use only this much memory. | Units of Bytes Note: For more information about the memory unit, see Memory resource units.
|
small: 24G small_mincpureq: 24G medium: 48G large: 72G xlarge: 96G xxlarge: 96G |
resources.limits.memory |
N |
presto_singlenode_resources_limits_ephemeral_storage |
This parameter sets the maximum amount of local ephemeral storage that a container in a Presto singlenode Pod can consume. | Units of Bytes |
small: 10G small_mincpureq: 10G medium: 10G large: 10G xlarge: 10G xxlarge: 10G |
resources.limits.ephemeral-storage |
N |
presto_singlenode_resources_requests_cpu |
Resource CPU request for Presto singlenode. | Kubernetes CPU Unit |
small: 3 small_mincpureq: 0.005 medium: 6 large: 9 xlarge: 12 xxlarge: 12 |
resources.requests.cpu |
N |
presto_singlenode_resources_requests_memory |
Resource memory request for Presto singlenode. | Units of Bytes Note: For more information about the memory unit, see Memory resource units.
|
small: 24G small_mincpureq: 24G medium: 48G large: 72G xlarge: 96G xxlarge: 96G |
resources.requests.memory |
N |
presto_singlenode_resources_requests_ephemeral_storage |
This parameter sets the minimum/guaranteed amount of local ephemeral storage for a container in a Presto singlenode Pod. | Units of Bytes |
small: 500Mi small_mincpureq: 500Mi medium: 1G large: 1G xlarge: 1G xxlarge: 1G |
resources.request.ephemeral-storage |
N |
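The "Units of Bytes" values in these tables follow the Kubernetes quantity convention, in which decimal suffixes (K, M, G) and binary suffixes (Ki, Mi, Gi) denote different sizes, so 500Mi and 500M are not the same amount. The following parser is an illustrative sketch of that distinction and is not part of watsonx.data.

```python
# Decimal (K, M, G, T) and binary (Ki, Mi, Gi, Ti) suffixes as used by
# Kubernetes resource quantities such as "24G" or "500Mi".
_SUFFIXES = {
    "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}

def quantity_to_bytes(value: str) -> int:
    """Convert a Kubernetes-style quantity string to a number of bytes."""
    # Try two-character suffixes ("Gi") before one-character ones ("G").
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if value.endswith(suffix):
            return int(float(value[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(value)  # plain number of bytes

# quantity_to_bytes("500Mi") -> 524288000; quantity_to_bytes("24G") -> 24000000000
```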
presto_singlenode_jvm_Xmx |
Xmx specifies the maximum memory allocation pool for a Java virtual machine (JVM). | - | - | jvm.config.Xmx |
Y |
presto_singlenode_task_concurrency |
Default local concurrency for parallel operators such as joins and aggregations. | Integer (must be a power of two) | - | config.properties.task.concurrency |
Y |
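Because task.concurrency must be a power of two, a quick bitwise check can validate a candidate value before it is applied. This helper is an illustrative sketch, not part of watsonx.data.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ... - the values task.concurrency accepts."""
    # A power of two has exactly one bit set, so n & (n - 1) clears it to zero.
    return n > 0 and (n & (n - 1)) == 0

# is_power_of_two(16) -> True; is_power_of_two(12) -> False
```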
presto_singlenode_query_max_memory |
The maximum amount of user memory that a query can use across the entire cluster. | Data size | 1TB | config.properties.query.max-memory |
Y |
presto_singlenode_query_max_memory_per_node |
The maximum amount of user memory that a query can use on a worker. | Data size | presto_singlenode_jvm_Xmx*0.795 | config.properties.query.max-memory-per-node |
Y |
presto_singlenode_query_max_total_memory_per_node |
The maximum amount of user and system memory that a query can use on a worker. | Data size | presto_singlenode_jvm_Xmx*0.795 | config.properties.query.max-total-memory-per-node |
Y |
presto_singlenode_query_max_concurrent_queries |
Describes how many queries can be processed simultaneously in a single cluster node. | Integer | - | config.properties.query.max-concurrent-queries |
Y |
presto_singlenode_memory_heap_headroom_per_node |
This is the amount of memory set aside as headroom/buffer in the JVM heap for allocations that are not tracked by Presto. | Data size | presto_singlenode_jvm_Xmx*0.2 | config.properties.query.memory.heap-headroom-per-node |
Y |
presto_singlenode_query_max_total_memory |
The maximum amount of user and system memory that a query can use across the entire cluster. | Data size | 2TB | config.properties.query.max-total-memory |
Y |
presto_singlenode_experimental_optimized_repartitioning |
Improve performance of repartitioning data between stages. | Boolean | true |
experimental.optimized-repartitioning |
Y |
presto_singlenode_experimental_pushdown_dereference_enabled |
Add support for pushdown of dereference expressions for querying nested data. | Boolean | - | experimental.pushdown-dereference-enabled |
Y |
presto_singlenode_experimental_pushdown_subfields_enabled |
Add support for pushdown of subfields expressions for querying nested data. | Boolean | - | experimental.pushdown-subfields-enabled |
Y |
presto_singlenode_join_max_broadcast_table_size |
Add join-max-broadcast-table-size configuration property and
join_max_broadcast_table_size session property to control the maximum estimated
size of a table that can be broadcast when using AUTOMATIC join distribution type. |
Integer | - | join-max-broadcast-table-size |
Y |
presto_singlenode_node_scheduler_max_pending_splits_per_task |
The number of outstanding splits with the standard split weight that can be queued for each
singlenode node for a single stage of a query, even when the node is already at the limit for total
number of splits. Allowing a minimum number of splits per stage is required to prevent starvation
and deadlocks. This value must be smaller than node-scheduler.max-splits-per-node ,
will usually be increased for the same reasons, and has similar drawbacks if set too high. |
Integer | - | node-scheduler.max-pending-splits-per-task |
Y |
presto_singlenode_node_scheduler_max_splits_per_node |
The target value for the total number of splits that can be running for each worker node, assuming all splits have the standard split weight. Using a higher value is recommended, if queries are submitted in large batches (for example, running a large group of reports periodically), or for connectors that produce many splits that complete quickly but do not support assigning split weight values to express that to the split scheduler. | Integer | - | node-scheduler.max-splits-per-node |
Y |
presto_singlenode_optimizer_prefer_partial_aggregation |
This property allows users to disable partial aggregations for queries that do not benefit from them. | Boolean | - | optimizer.prefer-partial-aggregation |
Y |
presto_singlenode_query_execution_policy |
Configures the algorithm to organize the processing of all of the stages of a query. | String | phased |
query.execution-policy |
Y |
presto_singlenode_query_low_memory_killer_policy |
The policy used for selecting the query to kill when the cluster is out of memory (OOM). This
property can have one of the following values: none ,
total-reservation , or total-reservation-on-blocked-nodes . none
disables the cluster OOM killer. The value of total-reservation configures a policy that kills the
query with the largest memory reservation across the cluster. The value of
total-reservation-on-blocked-nodes configures a policy that kills the query using
the most memory on the workers that are out of memory (blocked). |
String |
|
query.low-memory-killer.policy |
Y |
presto_singlenode_query_max_stage_count |
Sets a limit on the number of stages in a query. This limit can be changed with
the query.max-stage-count configuration property and the
query_max_stage_count session property. |
Integer | 200 | query.max-stage-count |
Y |
presto_singlenode_query_min_schedule_split_batch_size |
Add query.min-schedule-split-batch-size config flag to set the minimum
number of splits to consider for scheduling per batch. |
Integer | - | query.min-schedule-split-batch-size |
Y |
presto_singlenode_query_stage_count_warning_threshold |
Add a config option (query.stage-count-warning-threshold ) to specify a
per-query threshold for the number of stages. When this threshold is exceeded, a
TOO_MANY_STAGES warning is raised. |
Integer | 150 | query.stage-count-warning-threshold |
Y |
presto_singlenode_scale_writers |
Enable writer scaling by dynamically increasing the number of writer tasks on the cluster. | Boolean | - | scale-writers |
Y |
presto_singlenode_sink_max_buffer_size |
Buffer size for I/O writes while collecting pipeline results. A higher value may increase the speed of I/O operations at the cost of additional memory. A higher value may also increase the amount of data lost when a Presto node fails, effectively slowing down I/O in an unstable environment. | Integer | - | sink.max-buffer-size |
Y |
presto_singlenode_experimental_max_revocable_memory_per_node |
The amount of revocable memory a query can use on each node. | Units of Bytes | - | experimental.max-revocable-memory-per-node |
Y |
presto_singlenode_experimental_reserved_pool_enabled |
This property allows users to enable or disable the Reserved Pool in Presto. When the General Pool is full, Presto uses its OOM killer to increase General Pool concurrency and prevent deadlocks. | Boolean | false | experimental.reserved-pool-enabled |
Y |
presto_singlenode_query_min_expire_age |
This property describes the minimum time after which you can remove the query metadata from the server. | String | 120 minutes | query.min-expire-age |
Y |
presto_singlenode_enable_dynamic_filtering |
This property improves performance for queries with broadcast or collocated joins by adding dynamic filtering and bucket pruning support. | Boolean | - | experimental.enable-dynamic-filtering |
Y |
presto_singlenode_exchange_client_threads |
This property helps to control the number of threads used by exchange clients in Presto to fetch data from other Presto nodes during query execution. | Integer | exchange.client-threads |
||
presto_singlenode_exchange_http_client_max_connections |
This property sets the maximum number of concurrent HTTP connections that the exchange client can open. | Integer | exchange.http-client.max-connections |
||
presto_singlenode_exchange_http_client_max_requests_queued_per_destination |
This property determines the maximum number of HTTP requests that can be
queued for each destination server by the exchange client. |
Integer | exchange.http-client.max-requests-queued-per-destination |
||
presto_singlenode_http_server_log_max_size |
This property specifies the maximum file size for the log file generated by the
HTTP server component. |
Units of Bytes | http-server.log.max-size |
||
presto_singlenode_http_server_log_max_history |
This property specifies the maximum number of log files that the HTTP server
component retains before rotating old log content. |
Integer | http-server.log.max-history |
||
presto_singlenode_http_server_threads_max |
This property sets the maximum number of threads that the HTTP server can use. | Integer | http-server.threads.max |
||
presto_singlenode_join_max_broadcast_table_size |
This property allows you to specify the maximum size of replicated tables used in joins. | Units of Bytes | join-max-broadcast-table-size |
||
presto_singlenode_log_max_history |
This property represents the maximum number of general application log files retained by a logging system before older logs are rotated out. | Integer | log.max-history |
||
presto_singlenode_log_max_size |
The property log.max-size defines the maximum file size allowed for the
general application log file. |
Units of Bytes | log.max-size |
||
presto_singlenode_node_scheduler_max_splits_per_node |
This property specifies the target maximum number of splits that can concurrently run on each
worker node. Splits represent units of work within queries. Adjusting this property allows
administrators to optimize resource utilization, especially in scenarios involving large query
batches or connectors generating numerous splits. Ideally, it should be set such that there is
always at least one split waiting to be processed, but not higher. CAUTION: Setting
presto_singlenode_node_scheduler_max_splits_per_node to too high a value might lead
to inefficient memory usage and performance degradation. |
Integer | node-scheduler.max-splits-per-node |
||
presto_singlenode_optimize_nulls_in_join |
This property, when enabled, reduces the overhead of processing NULL values during JOIN operations. It is particularly beneficial when dealing with columns that contain a significant number of NULLs. | Boolean | optimize-nulls-in-join |
||
presto_singlenode_optimizer_default_filter_factor_enabled |
This property enables the use of a default value for estimating the cost of filters in query optimization. | Boolean | optimizer.default-filter-factor-enabled |
||
presto_singlenode_optimizer_exploit_constraints |
This property enables constraint optimizations for querying catalogs that support table constraints. | Boolean | optimizer.exploit-constraints |
||
presto_singlenode_query_client_timeout |
This property specifies the duration for which the cluster waits without any communication from the client application, such as the CLI, before abandoning and canceling the ongoing query or work. | String (Duration) | query.client.timeout |
||
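Properties typed String (Duration), such as query.client.timeout above, take a number followed by a time unit (for example, 10s, 1m, 2m). The sketch below parses a common subset of those units; Presto accepts more units than shown, and the function is illustrative rather than part of watsonx.data.

```python
import re

# Common duration units mapped to seconds (subset; Presto also accepts
# units such as ms, h, and d).
_UNITS = {"s": 1, "m": 60, "h": 3600}

def duration_to_seconds(value: str) -> float:
    """Parse duration strings like '10s', '1m', or '2m' into seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([smh])", value.strip())
    if not match:
        raise ValueError(f"unsupported duration: {value!r}")
    number, unit = match.groups()
    return float(number) * _UNITS[unit]

# duration_to_seconds("2m") -> 120.0
```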
presto_singlenode_query_max_execution_time |
This property specifies the maximum allowed time for a query to be actively executing on the cluster, before it is terminated. | String (Duration) | query.max-execution-time |
||
presto_singlenode_query_max_history |
This property refers to the maximum number of queries to keep in the query history to provide statistics and other information. If this amount is reached, queries are removed based on age. | Integer | query.max-history |
||
presto_singlenode_query_max_length |
The maximum number of characters allowed for the SQL query text. Longer queries are not processed and are terminated with an error. | Integer | query.max-length |
||
presto_singlenode_shutdown_grace_period |
This property specifies the duration of time that the system waits after receiving a shutdown request before initiating the shutdown process. During this grace period, the system continues to operate normally, allowing ongoing active tasks to complete. | String (Duration) | shutdown.grace-period |
||
presto_singlenode_experimental_max_spill_per_node |
This property refers to the maximum spill space to be used by all queries on a single node (when the memory allocated for query processing is exceeded). | Units of Bytes | experimental.max-spill-per-node |
||
presto_singlenode_experimental_query_max_spill_per_node |
This property refers to the maximum spill space to be used by a single query on a single node. | Units of Bytes | experimental.query-max-spill-per-node |
||
presto_singlenode_experimental_spiller_max_used_space_threshold |
This property sets a threshold disk space usage ratio. If usage exceeds this value, the spill path is not eligible for spilling. | Double | experimental.spiller-max-used-space-threshold |
||
presto_singlenode_experimental_spiller_spill_path |
This property specifies a directory where spilled content is written. It can be a comma-separated list to spill simultaneously to multiple directories, which helps to use multiple drives installed in the system. (It is recommended to avoid spilling to system drives and to ensure that spill operations do not interfere with JVM operation or disk performance.) | String | experimental.spiller-spill-path |
||
presto_singlenode_httpserver_max_request_header_size |
This property is used to set the maximum size of the request header that
the HTTP server supports. |
Data size | 16kB | httpserver.max_request_header_size |
Y |
presto_singlenode_httpserver_max_response_header_size |
This property is used to set the maximum size of the response header that
the HTTP server supports. |
Data size | 16kB | httpserver.max_response_header_size |
Y |
Property | Description | Type | System property | Restart containers required |
---|---|---|---|---|
postgres_replicas |
The number of replicas for Postgres. | Integer | spec.replicas |
N |
postgres_resources_limits_cpu |
Resource CPU limit for Postgres. Container is allowed to use only this much CPU. | Kubernetes CPU Unit | resources.limits.cpu |
N |
postgres_resources_limits_memory |
Resource Memory limit for Postgres. Container is allowed to use only this much memory. | Units of Bytes Note: For more information about the memory unit, see Memory resource units.
|
resources.limits.memory |
N |
postgres_resources_requests_cpu |
Resource CPU request for Postgres. | Kubernetes CPU Unit | resources.requests.cpu |
N |
postgres_resources_requests_memory |
Resource memory request for Postgres. | Units of Bytes Note: For more information about the memory unit, see Memory resource units.
|
resources.requests.memory |
N |
Property | Description | Type | System property | Restart containers required |
---|---|---|---|---|
minio_resources_limits_cpu |
Resource CPU limit for MinIO. Container is allowed to use only this much CPU. | Kubernetes CPU Unit | resources.limits.cpu |
N |
minio_resources_limits_memory |
Resource Memory limit for MinIO. A container is allowed to use only this much memory. | Units of Bytes Note: For more information about the memory unit, see Memory resource units.
|
resources.limits.memory |
N |
minio_resources_requests_cpu |
Resource CPU request for MinIO. | Kubernetes CPU Unit | resources.requests.cpu |
N |
minio_resources_requests_memory |
Resource memory request for MinIO. | Units of Bytes Note: For more information about the memory unit, see Memory resource units.
|
resources.requests.memory |
N |
Property | Description | Type | Default value / Default setting | System property | Restart containers required |
---|---|---|---|---|---|
hive_replicas |
The number of replicas for Hive. | Integer | spec.replicas |
N | |
hive_resources_limits_cpu |
Resource CPU limit for Hive. The container is allowed to use only this much CPU. | Kubernetes CPU Unit | resources.limits.cpu |
N | |
hive_resources_limits_memory |
Resource Memory limit for Hive. A container is allowed to use only this much memory. | Units of Bytes Note: For more information about the memory unit, see Memory resource units.
|
resources.limits.memory |
N | |
hive_resources_requests_cpu |
Resource CPU request for Hive. | Kubernetes CPU Unit | resources.requests.cpu |
N | |
hive_resources_requests_memory |
Resource memory request for Hive. | Units of Bytes Note: For more information about the memory unit, see Memory resource units.
|
resources.requests.memory |
N |
Property | Description | Type | Default value / Default setting | System property | Default value / Default setting | Restart containers required |
---|---|---|---|---|---|---|
presto_hive_max_outstanding_splits |
Limit on the number of splits waiting to be served by the split source. After reaching this limit, writers stop writing new splits to the split source until some of them are used by workers. A higher value increases memory usage but allows all I/O to be concentrated at one time, which may be much faster and increase resource utilization. | Integer | hive.max-outstanding-splits |
Y | ||
presto_hive_max_initial_splits |
This property describes how many splits may be initially created for a single query. The
initial splits are created to allow better concurrency for small queries. The Hive connector
creates the first hive.max-initial-splits splits with a size of
hive.max-initial-split-size instead of hive.max-split-size . Setting
this value higher forces more splits to have a smaller size, effectively broadening the
definition of what is considered a small query. |
Integer | hive.max-initial-splits |
Y | ||
presto_hive_max_initial_split_size |
This property describes the maximum size of each of the initially created splits for a single
query. The logic of initial splits is described in the hive.max-initial-splits
property. Changing this value changes what is considered a small query. A higher value causes
smaller parallelism for small queries; a lower value increases concurrency for them. This is a
maximum size, as the real size may be lower when the end of the blocks in a single DataNode is
reached. |
Integer | hive.max-initial-split-size |
Y | ||
presto_hive_max_split_size |
This property describes the maximum size of the splits created after the first
hive.max-initial-splits splits (see the hive.max-initial-splits
property). A smaller value creates more splits and increases parallelism, at the cost of
higher scheduling overhead. |
Integer | hive.max-split-size |
Y | ||
presto_hive_split_loader_concurrency |
This property specifies the level of concurrency for loading data from Hive tables using the Presto Hive connector. It controls the number of concurrent split loader threads that Presto can utilize to fetch data in parallel from the Hive source. A higher concurrency value can improve data retrieval speed and resource utilization, particularly for large Hive queries, but it can also increase system resource consumption. | Integer | hive.split-loader-concurrency |
Y | ||
presto_hive_pushdown_filter_enabled |
This property controls whether filter pushdown is enabled in the Presto Hive connector. Filter pushdown is a query optimization technique that allows Presto to push filtering conditions directly to the underlying Hive data source. When enabled, filtering conditions specified in SQL queries are evaluated as close to the data source as possible, reducing the amount of data that needs to be transferred to Presto for processing. | Boolean | hive.pushdown-filter-enabled |
Y | ||
presto_hive_node_selection_strategy |
Add configuration property hive.node-selection-strategy to choose
NodeSelectionStrategy . When SOFT_AFFINITY is selected, scheduler
will make the best effort to request the same worker to fetch the same file. |
String | hive.node-selection-strategy |
Y | ||
presto_hive_max_partitions_per_writers |
The maximum number of partitions permitted for a single writer. | Integer | hive.max-partitions-per-writers |
Y | ||
presto_hive_metastore_timeout |
This parameter specifies the timeout duration for requests made to the Hive metastore. Note:
It is applicable for version 1.1.4 and later of watsonx.data.
|
String (Duration) | hive.metastore-timeout |
10s | Y | |
hive_s3_max_error_retries |
This property allows you to set the maximum number of retry attempts for S3 client operations in Hive. | Integer | hive.s3.max-error-retries |
50 | Y | |
hive_s3_connect_timeout |
This property specifies the TCP connection timeout for S3 operations in Hive. | String (Duration) | hive.s3.connect-timeout |
1m | Y | |
hive_s3_socket_timeout |
This property sets the maximum time allowed for reading data from the socket for S3 operations in Hive. | String (Duration) | hive.s3.socket-timeout |
2m | Y | |
hive_s3_max_connections |
This property defines the maximum number of concurrent open connections permitted to S3 in Hive. | Integer | hive.s3.max-connections |
5000 | Y | |
hive_s3_max_client_retries |
This property sets the maximum number of retry attempts for read operations for S3 in Hive. | Integer | hive.s3.max-client-retries |
50 | Y |
Property | Description | Type | System property | Restart containers required | |
---|---|---|---|---|---|
presto_qhmm_enable |
This parameter enables (or disables) the Query History Monitoring and Management (QHMM) service in an engine. | Boolean |
You can customize the watsonx.data components by following the instructions in Customizing watsonx.data components.