Auto-analyze
You can use the auto-analyze feature to automatically run an ANALYZE TABLE statement on tables when it is determined to be necessary.
Auto-analyze only considers a table for analyzing when that table is involved in a scan. If it determines a table needs analyzing, it is then added to the auto-analyze queue.
When more than 50% of the table data is new, or the table has never been analyzed, the auto-analyze process runs the ANALYZE TABLE statement. Auto-analyze uses a 10% sampling of the table data. If you have a very large table, you might want to manually run the ANALYZE TABLE statement, specifying a lower percentage of sampling.
At Db2® Warehouse SaaS installation time, auto-analyze is enabled to run every 10 minutes with the default cron schedule. If you need a different schedule, you can modify the default cron schedule by using a DB2® Administrative Task Scheduler (ATS) procedure named SYSPROC.ADMIN_TASK_UPDATE to update the ATS task named BIGSQL_CHECK_ANALYZE. For more information about setting a task cron schedule, see UNIX cron format.
An analyze task is also added to the ATS when auto-analyze detects that an ANALYZE TABLE statement should be run. These analyze tasks are scheduled to run immediately and only once.
To disable auto-analyze, either modify the cron schedule for the task or remove the ATS task (named BIGSQL_CHECK_ANALYZE) that checks for tables that are to be analyzed.
Db2 procedures for auto-analyze
- SYSHADOOP.BIGSQL_RUN_ANALYZE
- An analyze task calls this procedure to run the ANALYZE TABLE statement.
- SYSHADOOP.BIGSQL_CHECK_ANALYZE
- A scheduled task calls this procedure to determine whether any tables should be analyzed. This
procedure has one parameter:
SYSHADOOP.BIGSQL_CHECK_ANALYZE(maxConcurrentTasks)- maxConcurrentTasks
- Specifies the maximum number of concurrently running tasks. The value is an integer that is used to override the setting of the biginsights.stats.auto.analyze.concurrent.max property.
- SYSHADOOP.BIGSQL_AUTO_ANALYZE_STATUS
- A Db2 user calls this procedure to check the status of auto-analyze tasks. For details, see BIGSQL_AUTO_ANALYZE_STATUS stored procedure.
Configuration properties for auto-analyze
- biginsights.stats.auto.analyze.concurrent.max
- Specifies the maximum number of analyze tasks that can be fetched at one time from the scheduler auto-analyze queue. This is equivalent to the number of concurrent ANALYZE TABLE statements that can be started by auto-analyze. The number should be kept low to prevent too much resource from being used by auto-analyze, which could impact the performance of other concurrent operations. The default is 1.
- biginsights.stats.auto.analyze.newdata.min
- Specifies the minimum percentage of new data that must be added to the table before an auto-analyze operation is triggered. The value must be an integer greater than 0. The default is 50, which means that at least 50% of the table data must be new before auto-analyze determines that the table needs to be analyzed.
- biginsights.stats.auto.analyze.task.retention.time
- Specifies how long completed analyze tasks and their histories are to be kept before they are purged. When histories are purged, the status of completed tasks is no longer available. Valid values are 1WEEK, 1MONTH, FOREVER, or NONE. The default is 1MONTH. When set to NONE, analyze tasks are deleted immediately after completion. When set to FOREVER, completed analyze tasks and their histories are never deleted. You can use the SYSPROC.ADMIN_TASK_REMOVE procedure to periodically purge completed analyze tasks manually.
Using multiple schedules
There can be multiple tasks with different schedules for calling the SYSHADOOP.BIGSQL_CHECK_ANALYZE procedure, or tasks with different maximum concurrent task settings. These tasks must be added by the Db2 administrator. For examples, see Use multiple schedules.
Examples
- Have auto-analyze run once every day at
midnight:
CALL SYSPROC.ADMIN_TASK_UPDATE('BIGSQL_CHECK_ANALYZE', NULL, NULL, NULL,'0 0 * * *', NULL, NULL); - Disable the auto-analyze task so that it does not
run:
CALL SYSPROC.ADMIN_TASK_UPDATE('BIGSQL_CHECK_ANALYZE', NULL, NULL, 0, NULL, NULL, NULL); - Disable auto-analyze by removing the ATS task that checks the tables (Db2 administrator
only):
CALL SYSPROC.ADMIN_TASK_REMOVE('BIGSQL_CHECK_ANALYZE', NULL); - Enable auto-analyze by adding an ATS task that checks the tables (Db2 administrator
only):
CALL SYSPROC.ADMIN_TASK_ADD('BIGSQL_CHECK_ANALYZE', NULL, NULL, NULL, '0,10,20,30,40,50 * * * *', 'SYSHADOOP', 'BIGSQL_CHECK_ANALYZE', 'VALUES(0)', NULL, NULL); - Disable auto-analyze by updating the cron schedule of the ATS task (Db2 administrator
only):
CALL SYSPROC.ADMIN_TASK_UPDATE('BIGSQL_CHECK_ANALYZE', NULL, NULL, 0, NULL, NULL, NULL); - Enable auto-analyze by updating the cron schedule of the ATS task (Db2 administrator only). This example sets the cron schedule to check every 10
minutes:
CALL SYSPROC.ADMIN_TASK_UPDATE('BIGSQL_CHECK_ANALYZE', NULL, NULL, NULL, '0,10,20,30,40,50 * * * *', NULL, NULL); - Manually purge completed analyze tasks. In this example, the name of the ATS analyze task is
Analyze 1458912222 UIUSER.COUNTRY:SYSPROC.ADMIN_TASK_REMOVE('Analyze 1458912222 UIUSER.COUNTRY', NULL); - Use multiple schedules. The following examples show how to add two
SYSHADOOP.BIGSQL_CHECK_ANALYZE tasks with different schedules and different maximum concurrent task
settings. The first task (for daytime) runs every 15 minutes from 7:00-18:59 and allows a maximum of
one concurrent analyze task. The second task (for nighttime) runs every 30 minutes from 00:00-6:59
and from 19:00-12:59, allowing a maximum of five concurrent analyze tasks.
- Daytime schedule
-
CALL SYSPROC.ADMIN_TASK_ADD('BIGSQL_CHECK_ANALYZE days max 1', NULL, NULL, NULL, '0,15,30,45 7-18 * * *', 'SYSHADOOP', 'BIGSQL_CHECK_ANALYZE', 'VALUES(1)', NULL, NULL); - Nighttime schedule
-
CALL SYSPROC.ADMIN_TASK_ADD('BIGSQL_CHECK_ANALYZE nights max 5',NULL, NULL, NULL, '0,30 0-6,19-23 * * *', 'SYSHADOOP','BIGSQL_CHECK_ANALYZE', 'VALUES(5)',NULL,NULL);