Table Optimizer

Table Optimizer is an automated service in watsonx.data that maintains Iceberg table health and accelerates query performance. It works behind the scenes to keep your data organized and optimized, reducing manual effort while improving efficiency.

With Table Optimizer, you gain accelerated queries with shorter scan times and improved response speed, cost efficiency through reduced storage and compute usage, and enhanced scalability to maintain consistent performance as data volumes grow.

Features:

Consolidates small files to minimize I/O overhead
Refines metadata for quicker query planning
Restructures data layouts for optimal access
Operates on flexible schedules with minimal manual input

Before you begin

Ensure that you have IBM watsonx.data Premium on IBM Software Hub.
From the navigation menu, use the application switcher to select the watsonx.data context.

Configure Table Optimizer

You can configure Table Optimizer at three levels:

Catalog level – Applies to all schemas listed under the selected Iceberg catalog.
Schema level – Applies only to the selected schema within the Iceberg catalog.
Table level – Applies only to the selected table within a schema.

Configure Table Optimizer at catalog level

Configure Table Optimizer at schema level

When you configure a schema individually, the system lists it under the Custom schemas tab in the Catalog-level Optimizer Configuration page.

In the Workbench page, navigate to your watsonx.data instance > Presto > User catalogs > your Iceberg catalog > Schemas > your schema. The schema overview page opens.
In the Health section for the schema, click Optimizer Configuration.

Note: The Health section shows the number of unhealthy schemas.
Select your Spark engine from the Engine tab.
Click the Configurations tab. The page displays configuration options such as:
- Compaction
- Expire Snapshots
- Remove Orphan Files
- Partition Evolution
Note:
For detailed information about each configuration option and its parameters, see Table Optimizer configuration options.
You can use one of the following options to configure the table optimizer:
- To inherit all catalog configurations, complete the following steps:
  1. Click the toggle switch to inherit all catalog-level configurations and apply them to the schema.
  2. Click Save and Apply. The system creates Spark jobs at the default interval with the prefix System_Table_Analyzer_Job_<catalog name>. These jobs are generated within 30 minutes.
- To customize configurations, complete the following steps:
  
  Note: The following steps are common for all configuration options.
  1. Click Edit configuration icon for any configuration.
  2. Configure settings based on your workload:
    - Choose Automatic runs:
      - Evaluation only: Runs evaluation and reports issues without modifying data.
      - Evaluation and optimization: Runs evaluation and applies the recommended optimization automatically.
    - Set the Repeat job interval (Minutes, Hour, Day, Week, Month).
    - To customize the optimization behavior, expand Optional Settings and modify the parameters based on your workload.
      
      Note: If you are configuring Partition Evolution, specify the path where the queries are stored.
    - Click Save.
  3. Click Save and Apply. Customizing configurations creates Spark jobs with the prefix System_Table_Analyzer_Job_<catalog name>. These jobs are generated within 30 minutes.

To do optimization after selecting Evaluation only

If you select Evaluation only as the configuration option at both the catalog and schema levels and want to optimize, complete the following steps:

In the schema overview page, click Tables tab.
You can see all tables in the schema along with their Health status and type, click Optimize all tables for bulk optimization or select an individual table and click Optimize. The system creates Spark jobs at the default interval with the prefix System_Table_Analyzer_Job_<catalog name>. These jobs are generated within 30 minutes.

Configure Table Optimizer at table level

Note:

Table level jobs are one‑time jobs and cannot be scheduled.

In the Workbench page, navigate to your watsonx.data instance > Presto > User catalogs > your Iceberg catalog > Schemas > your schema > Schema tables > your table. The table overview page opens.
Click Health tab.
The page lists optimization options for that table:
- Compaction
- Expire Snapshots
- Remove Orphan Files
- Partition Evolution
Note:
For detailed information about each configuration option and its parameters, see Table Optimizer configuration options.
Click Optimize to run optimization immediately. The system creates Spark jobs with the prefix System_Table_Analyzer_Job_<catalog name>.

View and edit Table Optimizer Spark job

Note:

Important Considerations for Table Optimizer Jobs

System-Managed Jobs: Table optimizer jobs are automatically managed by the system. Do not edit job details, except for notification settings.
Project scoped: Table optimizer jobs are scoped to individual projects. Jobs scheduled within a project are only visible in that project. Deleting a project permanently removes all associated optimization jobs.
Avoid disruptions: Deleting a job or modifying its name will interrupt scheduled execution and may cause optimization failures.
Prerequisites: Before scheduling optimization for a catalog, verify that all schemas in the catalog contain at least one table.

Access your Project.
Click Assets tab.
Select the required Table Optimizer Spark job and click on it. The Job Details page opens.
To configure notifications, select the job and click Edit configuration icon (on top right side of the page).
Note: You can only configure notification for Table Optimizer Spark jobs.
The Edit Spark job page opens.
Edit the required details.
Click Save.