Auto-analyze

You can use the auto-analyze feature to automatically run an ANALYZE TABLE statement on tables when it is determined to be necessary.

Auto-analyze only considers a table for analyzing when that table is involved in a scan. If it determines a table needs analyzing, it is then added to the auto-analyze queue.

When more than 50% of the table data is new, or the table has never been analyzed, the auto-analyze process runs the ANALYZE TABLE statement. Auto-analyze uses a 10% sampling of the table data. If you have a very large table, you might want to manually run the ANALYZE TABLE statement, specifying a lower percentage of sampling.

At Db2® Warehouse SaaS installation time, auto-analyze is enabled to run every 10 minutes with the default cron schedule. If you need a different schedule, you can modify the default cron schedule by using a DB2® Administrative Task Scheduler (ATS) procedure named SYSPROC.ADMIN_TASK_UPDATE to update the ATS task named BIGSQL_CHECK_ANALYZE. For more information about setting a task cron schedule, see UNIX cron format.

An analyze task is also added to the ATS when auto-analyze detects that an ANALYZE TABLE statement should be run. These analyze tasks are scheduled to run immediately and only once.

To disable auto-analyze, either modify the cron schedule for the task or remove the ATS task (named BIGSQL_CHECK_ANALYZE) that checks for tables that are to be analyzed.

Db2 procedures for auto-analyze

Note: SYSHADOOP.BIGSQL_RUN_ANALYZE and SYSHADOOP.BIGSQL_CHECK_ANALYZE are internal procedures that are intended to be used only by auto-analyze.
SYSHADOOP.BIGSQL_RUN_ANALYZE
An analyze task calls this procedure to run the ANALYZE TABLE statement.
SYSHADOOP.BIGSQL_CHECK_ANALYZE
A scheduled task calls this procedure to determine whether any tables should be analyzed. This procedure has one parameter:
SYSHADOOP.BIGSQL_CHECK_ANALYZE(maxConcurrentTasks)
maxConcurrentTasks
Specifies the maximum number of concurrently running tasks. The value is an integer that is used to override the setting of the biginsights.stats.auto.analyze.concurrent.max property.
SYSHADOOP.BIGSQL_AUTO_ANALYZE_STATUS
A Db2 user calls this procedure to check the status of auto-analyze tasks. For details, see BIGSQL_AUTO_ANALYZE_STATUS stored procedure.

Configuration properties for auto-analyze

Use the following properties to set the behavior of auto-analyze.
biginsights.stats.auto.analyze.concurrent.max
Specifies the maximum number of analyze tasks that can be fetched at one time from the scheduler auto-analyze queue. This is equivalent to the number of concurrent ANALYZE TABLE statements that can be started by auto-analyze. The number should be kept low to prevent too much resource from being used by auto-analyze, which could impact the performance of other concurrent operations. The default is 1.
biginsights.stats.auto.analyze.newdata.min
Specifies the minimum percentage of new data that must be added to the table before an auto-analyze operation is triggered. The value must be an integer greater than 0. The default is 50, which means that at least 50% of the table data must be new before auto-analyze determines that the table needs to be analyzed.
biginsights.stats.auto.analyze.task.retention.time
Specifies how long completed analyze tasks and their histories are to be kept before they are purged. When histories are purged, the status of completed tasks is no longer available. Valid values are 1WEEK, 1MONTH, FOREVER, or NONE. The default is 1MONTH. When set to NONE, analyze tasks are deleted immediately after completion. When set to FOREVER, completed analyze tasks and their histories are never deleted. You can use the SYSPROC.ADMIN_TASK_REMOVE procedure to periodically purge completed analyze tasks manually.

Using multiple schedules

There can be multiple tasks with different schedules for calling the SYSHADOOP.BIGSQL_CHECK_ANALYZE procedure, or tasks with different maximum concurrent task settings. These tasks must be added by the Db2 administrator. For examples, see Use multiple schedules.

Examples

  • Have auto-analyze run once every day at midnight:
    CALL SYSPROC.ADMIN_TASK_UPDATE('BIGSQL_CHECK_ANALYZE', NULL, NULL, NULL,'0 0 * * *', NULL, NULL);
  • Disable the auto-analyze task so that it does not run:
    CALL SYSPROC.ADMIN_TASK_UPDATE('BIGSQL_CHECK_ANALYZE', NULL, NULL, 0, NULL, NULL, NULL);
  • Disable auto-analyze by removing the ATS task that checks the tables (Db2 administrator only):
    CALL SYSPROC.ADMIN_TASK_REMOVE('BIGSQL_CHECK_ANALYZE', NULL);
  • Enable auto-analyze by adding an ATS task that checks the tables (Db2 administrator only):
    CALL SYSPROC.ADMIN_TASK_ADD('BIGSQL_CHECK_ANALYZE', NULL, NULL, NULL, '0,10,20,30,40,50 * * * *', 'SYSHADOOP', 'BIGSQL_CHECK_ANALYZE', 'VALUES(0)', NULL, NULL);
  • Disable auto-analyze by updating the cron schedule of the ATS task (Db2 administrator only):
    CALL SYSPROC.ADMIN_TASK_UPDATE('BIGSQL_CHECK_ANALYZE', NULL, NULL, 0, NULL, NULL, NULL);
  • Enable auto-analyze by updating the cron schedule of the ATS task (Db2 administrator only). This example sets the cron schedule to check every 10 minutes:
    CALL SYSPROC.ADMIN_TASK_UPDATE('BIGSQL_CHECK_ANALYZE', NULL, NULL, NULL, '0,10,20,30,40,50 * * * *', NULL, NULL);
  • Manually purge completed analyze tasks. In this example, the name of the ATS analyze task is Analyze 1458912222 UIUSER.COUNTRY:
    SYSPROC.ADMIN_TASK_REMOVE('Analyze 1458912222 UIUSER.COUNTRY', NULL);
  • Use multiple schedules. The following examples show how to add two SYSHADOOP.BIGSQL_CHECK_ANALYZE tasks with different schedules and different maximum concurrent task settings. The first task (for daytime) runs every 15 minutes from 7:00-18:59 and allows a maximum of one concurrent analyze task. The second task (for nighttime) runs every 30 minutes from 00:00-6:59 and from 19:00-12:59, allowing a maximum of five concurrent analyze tasks.
    Daytime schedule
    CALL SYSPROC.ADMIN_TASK_ADD('BIGSQL_CHECK_ANALYZE days max 1', NULL, NULL, NULL, '0,15,30,45 7-18 * * *', 'SYSHADOOP', 'BIGSQL_CHECK_ANALYZE', 'VALUES(1)', NULL, NULL);
    Nighttime schedule
    CALL  SYSPROC.ADMIN_TASK_ADD('BIGSQL_CHECK_ANALYZE nights max 5',NULL, NULL,  NULL, '0,30 0-6,19-23 * * *', 'SYSHADOOP','BIGSQL_CHECK_ANALYZE', 'VALUES(5)',NULL,NULL);