To provide optimized queries, Query optimizer manager pulls data about table
definitions and Hive and Iceberg statistics to synchronize with Metadata Service in IBM®
watsonx.data. You can select the specific Hive and Iceberg
table that must be available for Query optimizer manager. It is recommended to generate
Hive and Iceberg statistics and label columns for primary and foreign keys to get the best
results.
Activating Query optimizer manager automatically synchronizes metadata for
catalogs that are connected to Presto (C++) engines. However, you will need to run the following
steps if:
- Metadata for inaccessible or corrupted catalogs or schemas during deployment are missing.
- Significant changes are made to a table.
- New tables are introduced after the initial sync operation.
- An intermittent issue is preventing tables from being synced during the automatic syncing
process upon activation.
To sync tables from watsonx.data, the
following items are required:
Note: The statistics collection and synchronization steps in this topic can now be executed using
the enhanced feature
Managing statistical updates from the
Optimizer dashboard, which enables advanced query performance enhancements and optimization
capabilities across multiple catalogs.
- Verify that all expected tables are synced by following the procedure in Verifying table sync in watsonx.data.
- Identify the list of Hive and Iceberg tables in watsonx.data that you require for Query
optimizer manager.
- Identify columns as primary and foreign keys in the Hive and Iceberg tables.
ANALYZE Hive and Iceberg tables in Presto (C++) to generate Hive and Iceberg
statistics.
- Only users with administrator privilege is allowed to run
ExecuteWxdQueryOptimizer command as a security enhancement feature.
- If the session parameter
is_query_rewriter_plugin_enabled is set to
false, you will not be able to execute the
ExecuteWxdQueryOptimizer commands.
- Log in to watsonx.data.
- Go to Query workspace.
- Run the
ANALYZE command from the watsonx.data web console for the tables that you want
to sync to generate the statistics (Statistics is the number of rows, column name, data_size, row
count, and more).
ANALYZE catalog_name.schema_name.table_name ;
- For watsonx.data, Hive and Iceberg
tables are managed using distinct metastore server types. Depending on your application needs, you
must register metastore servers for Hive, Iceberg, or both.
- Registering a Hive catalog in watsonx.data:
ExecuteWxdQueryOptimizer 'CALL SYSHADOOP.REGISTER_EXT_METASTORE('<CATALOG_NAME>','type=watsonx-data-hive, /
uri=https://<THRIFT_URL>/mds/thrift,use.SSL=true,auth.mode=PLAIN, /
ssl.cert=/secrets/external/ibm-lh-tls-secret/ca.crt, /
auth.plain.credentials=<USERNAME>:<PASSWORD>', ?, ?)';
For example:
ExecuteWxdQueryOptimizer 'CALL SYSHADOOP.REGISTER_EXT_METASTORE('hive_data','type=watsonx-data-hive, /
uri=https://ibm-lh-lakehouse-mds-thrift-svc.cpd-instance.svc.cluster.local:8381/mds/thrift,use.SSL=true,auth.mode=PLAIN, /
ssl.cert=/secrets/external/ibm-lh-tls-secret/ca.crt, /
auth.plain.credentials=username:password', ?, ?)';
type - The type of metastore to which you are connecting. For the metastore
managing Hive tables, the value is: watsonx-data-hive.
<CATALOG_NAME> - as shown on the Infrastructure
Manager page (case sensitive).
<THRIFT_URL> - The URI of the watsonx.data
MDS thrift server. It must start with
https://.
- MDS credentials
(
<Username> and <Password>) in
auth.plain.credentials - Must be created on the watsonx.data side. See Connecting to watsonx.data on OpenShift. If the metastore
requires PLAIN authentication, the credentials must be specified in the format
username:password or ibmlhapikey_<username>:apikey. The
password is stored securely in a software keystore.
auth.mode - If the metastore requires authentication, indicates the mode of
authentication to use. The auth.mode must be set to PLAIN.
use.SSL - It must be true if the metastore requires an SSL
connection.
<MDS_certificate_file_path> - This must be provided as a file on the db2u
container as a certificate to validate the SSL connection. It is not necessary to pass a certificate
if the SSL connection is established using a certificate issued by a well-known CA such as DigiCert
or VeriSign. By default, the MDS certificates
are available under the /secrets/external/ibm-lh-tls-secret/ca.crt path in
Query optimizer manager.
- Registering an Iceberg catalog from watsonx.data:
ExecuteWxdQueryOptimizer 'CALL SYSHADOOP.REGISTER_EXT_METASTORE('<CATALOG_NAME>','type=iceberg-rest, /
catalog.name=<CATALOG_NAME>,uri=https://<REST_URL>/mds/iceberg,auth.mode=basic, /
ssl.cert=/secrets/external/ibm-lh-tls-secret/ca.crt, /
auth.plain.credentials=<USERNAME>:<PASSWORD>', ?, ?)';
For example:
ExecuteWxdQueryOptimizer 'CALL SYSHADOOP.REGISTER_EXT_METASTORE('iceberg_data','type=iceberg-rest, /
catalog.name=iceberg_rest,uri=https://ibm-lh-lakehouse-mds-rest-svc.cpd-instance.svc.cluster.local:8180/mds/iceberg,auth.mode=basic, /
ssl.cert=/secrets/external/ibm-lh-tls-secret/ca.crt, /
auth.plain.credentials=username:password', ?, ?)';
type - The type of metastore to which you are connecting. For the metastore
managing Iceberg tables, the value is: iceberg-rest.
<CATALOG_NAME> - as shown on the Infrastructure
Manager page (case sensitive).
<REST_URL> - The URI of the watsonx.data Iceberg REST MDS server. The URI must start with
https:// and contain the base path to the REST API catalog. For watsonx.data, the base path is
/mds/iceberg.
- watsonx.data credentials
(
<Username> and <Password>) in
auth.plain.credentials - Must be created on the watsonx.data side. See Connecting to watsonx.data on OpenShift. If the metastore
requires PLAIN authentication, the credentials must be specified in the format
username:password or ibmlhapikey_<username>:apikey. The
password is stored securely in a software keystore.
auth.mode - If the metastore requires authentication, indicates the mode of
authentication to use. The auth.mode must be set to basic.
use.SSL - It must be true if the metastore requires an SSL
connection.
MDS_certificate_file_path> - This must be provided as a file on the db2u
container as a certificate to validate the SSL connection. It is not necessary to pass a certificate
if the SSL connection is established using a certificate issued by a well-known CA such as DigiCert
or VeriSign. By default, the MDS certificates
are available under the /secrets/external/ibm-lh-tls-secret/ca.crt path in
Query optimizer manager.
- Run the following command to synchronize the tables for each schema in the catalog:
ExecuteWxdQueryOptimizer 'CALL SYSHADOOP.EXT_METASTORE_SYNC('CATALOG_NAME', 'SCHEMA_NAME', '.*', 'SYNC MODE', 'CONTINUE', 'OAAS')';
<CATALOG_NAME>: The name of the catalog where the tables to synchronize
belong to.
<SCHEMA_NAME>: The name of the schema where the tables to synchronize belong
to.
<SYNC MODE>: SKIP is a sync mode indicating the objects
that are already defined should be skipped. REPLACE is another sync mode used to
update the object if it has been modified since last synced.
<CONTINUE>: The error is logged, but processing continues if multiple tables
are to be imported.
Note: When synchronization is completed, the output displays the list of synced tables. The total count of synced tables must be double the number of tables within the catalog or schema. This is because, each table are synced two times. Once from the external metastore to the local metastore, and then from the local metastore to the Db2 catalog.
- Identify the list of catalogs and schemas in watsonx.data that you require for Query
optimizer manager.
Provide an SQL file to define the constraints for the Query optimizer
manager to use. In the SQL file, identify primary keys, foreign keys, and not null
columns where applicable for each table in your data set.
For example, if you have the following three tables with the given columns:
Employees (EmployeeID, FirstName, LastName, Department, and Salary), Departments (DepartmentID
and DepartmentName), and EmployeeDepartmentMapping (MappingID, EmployeeID, and DepartmentID).
Run the ALTER table command to define the constraints:
-- NOT NULL
ExecuteWxdQueryOptimizer 'ALTER TABLE "catalog_name".schema_name.Employees ALTER COLUMN FirstName SET NOT NULL ALTER COLUMN LasttName SET NOT NULL ALTER COLUMN Salary SET NOT NULL ALTER COLUMN EmployeeID SET NOT NULL';
ExecuteWxdQueryOptimizer 'ALTER TABLE "catalog_name".schema_name.Departments ALTER COLUMN DepartmentName SET NOT NULL ALTER COLUMN DepartmentID SET NOT NULL ';
ExecuteWxdQueryOptimizer 'ALTER TABLE "catalog_name".schema_name.EmployeeDepartmentMapping ALTER COLUMN MappingID SET NOT NULL ';
-- PRIMARY KEY
ExecuteWxdQueryOptimizer 'ALTER TABLE "catalog_name".schema_name.Employees ADD PRIMARY KEY (EmployeeID) NOT ENFORCED';
ExecuteWxdQueryOptimizer 'ALTER TABLE "catalog_name".schema_name.Departments ADD PRIMARY KEY (DepartmentID) NOT ENFORCED';
ExecuteWxdQueryOptimizer 'ALTER TABLE "catalog_name".schema_name.EmployeeDepartmentMapping ADD PRIMARY KEY (MappingID) NOT ENFORCED';
-- FOREIGN KEY
ExecuteWxdQueryOptimizer 'ALTER TABLE "catalog_name".schema_name.EmployeeDepartmentMapping ADD FOREIGN KEY (EmployeeID) REFERENCES "catalog_name".schema_name.Employees(EmployeeID) NOT ENFORCED';
ExecuteWxdQueryOptimizer 'ALTER TABLE "catalog_name".schema_name.EmployeeDepartmentMapping ADD FOREIGN KEY (DepartmentID) REFERENCES "catalog_name".schema_name.Departments(DepartmentID) NOT ENFORCED';