Auto-analyze
You can use the auto-analyze feature to automatically determine if a table should be analyzed. If you add new data, or if you have never run ANALYZE on a table, then auto-analyze schedules to run an ANALYZE command.
Running auto-analyze
When more than 50% of the data is new, or the table has never been analyzed, then auto-analyze detects and runs an analyze. For Big SQL HBase tables, if a major compaction has been done since the last analyze, then auto-analyze detects and runs an analyze.
Use the DB2 Administrative Task Scheduler (ATS) to schedule and run tasks that execute an ANALYZE command. Analyze tasks are added to ATS when auto-analyze detects that an ANALYZE command should be run. These analyze tasks are scheduled to run once only and immediately.
There is also one task that is scheduled to run every 10 minutes indefinitely to check if there are tables that need to be analyzed. This task is added during the installation of Big SQL. If you need a different schedule, modify the default cron schedule. Use use multiple schedules as shown in Using multiple schedules for checking analyze.
Modifying the cron schedule for auto-analyze
- Example that will run once every day at midnight:
-
CALL SYSPROC.ADMIN_TASK_UPDATE( 'BIGSQL_CHECK_ANALYZE',NULL, NULL, NULL, '0 0 * * *',NULL,NULL); - Example that will disable the task so that it does not run:
-
CALL SYSPROC.ADMIN_TASK_UPDATE( 'BIGSQL_CHECK_ANALYZE',NULL, NULL, 0, NULL,NULL,NULL);
Using multiple schedules for checking analyze
There can be multiple tasks with different schedules for calling the BIGSQL_CHECK_ANALYZE procedure and with different maximum concurrent task settings. This task must be added as the bigsql administrator user only.
- Daytime schedule
-
CALL SYSPROC.ADMIN_TASK_ADD(' BIGSQL_CHECK_ANALYZE days max 1',NULL, NULL, NULL, '0,15,30,45 7-18 * * *','SYSHADOOP', 'BIGSQL_CHECK_ANALYZE','VALUES(1)',NULL,NULL); - Nighttime schedule
-
CALL SYSPROC.ADMIN_TASK_ADD( 'BIGSQL_CHECK_ANALYZE nights max 5',NULL, NULL, NULL, '0,30 0-6,19-23 * * *', 'SYSHADOOP','BIGSQL_CHECK_ANALYZE', 'VALUES(5)',NULL,NULL);
Big SQL procedures for auto-analyze
- SYSHADOOP.BIGSQL_RUN_ANALYZE
- This procedure is called by the analyze task to run an ANALYZE command. This procedure is an internal procedure that is intended to be used only by auto-analyze.
- SYSHADOOP.BIGSQL_CHECK_ANALYZE
- This procedure is called by a scheduled task that checks if there are tables that should be
analyzed. This procedure is an internal procedure that is intended to be used only by auto-analyze.
This procedure has one parameter:
SYSHADOOP.BIGSQL_CHECK_ANALYZE(maxConcurrentTasks)- maxConcurrentTasks
- The value is an integer that is used to override the property setting of biginsights.stats.auto.analyze.concurrent.max. When the value is set to an integer larger then zero (0), it is used as the maximum number of concurrent tasks.
- SYSHADOOP.BIGSQL_AUTO_ANALYZE_STATUS
- This procedure is intended to be used by the Big SQL administrator to check the status of
auto-analyze tasks. Successful analyze tasks show a STATUS of COMPLETED and SQLCODE 0. Tasks that
are waiting to run contain NULL values for STATUS and END_TIME. A task that is running shows a
STATUS of RUNNING. If an error was encountered the SQLCODE and MESSAGE show the reason the task
failed. The procedure signature has 2 parameters:
SYSHADOOP.BIGSQL_AUTO_ANALYZE_STATUS(schema, table)- schema
- When not NULL the results are filtered to show tables in this schema only. This parameter can be NULL to not filter the results. A wild card % can be used.
- table
- When not NULL the results are filtered to show tables matching this name only. This parameter can be NULL to not filter the results. A wild card % can be used.
CALL SYSHADOOP.BIGSQL_AUTO_ANALYZE_STATUS(NULL, NULL);
+------------+--------+---------------+----------+------------+------------+---------+----------+-------------------------+
| TASKNAME | TASKID | ANALYZE_INPUT | STATUS | BEGIN_TIME | END_TIME | SQLCODE | SQLSTATE | MESSAGE |
+------------+--------+---------------+----------+------------+------------+---------+----------+-------------------------+
| Analyze | 102 | VALUES('BIGSQ | COMPLETE | 2016-03-25 | 2016-03-25 | 0 | | SQL0000W |
| | | | | | | | | Statement processing |
| 1458949355 | | L','HCOUNTRY_ | | 16:50:34.1 | 16:50:46.8 | | | was successful. |
| 682 | | FILE',10, | | 12 | 31 | | | |
| BIGSQL.HCO | | NULL) | | | | | | |
| UNTRY_FILE | | | | | | | | |
|-------------------------------------------------------------------------------------------------------------------------|
| Analyze | 83 | VALUES('TEST' | COMPLETE | 2016-03-24 | 2016-03-24 | 0 | | SQL0000W |
| | | | | | | | | Statement processing |
| 1458912345 | | ,'GENDATA_1YR | | 18:09:18.3 | 18:10:23.7 | | | was successful. |
| TEST.GENDA | | ',10,NULL) | | 84 | 58 | | | |
| TA_1YR | | | | | | | | |
|-------------------------------------------------------------------------------------------------------------------------|
| Analyze | 42 | VALUES('UIUSE | COMPLETE | 2016-03-17 | 2016-03-17 | 0 | | SQL0000W |
| | | | | | | | | Statement processing |
| 1458912222 | | R','T_UIUSER' | | 11:17:39.2 | 11:18:37.9 | | | was successful. |
| UIUSER.T_U | | ,10,NULL) | | 81 | 45 | | | |
| IUSER | | | | | | | | |
|-------------------------------------------------------------------------------------------------------------------------|
| Analyze | 84 | VALUES('TEST' | COMPLETE | 2016-03-24 | 2016-03-24 | -4302 | 58040 | SQL4302N Procedure or |
| 1458987654 | | ,'BAD',10,NUL | | 19:04:19.0 | 19:15:52.3 | | | user-defined function |
| TEST.BAD | | L) | | 45 | 47 | | |"SYSHADOOP.BIGSQL_RUN_A",|
| | | | | | | | | specific name |
| | | | | | | | | "BIGSQL_RUN_ANALYZE" |
| | | | | | | | |aborted with |
| | | | | | | | | an exception |
| | | | | | | | | "[BSL-0-48470644E] |
| | | | | | | | | Error run". |
+------------+--------+---------------+----------+------------+------------+---------+----------+-------------------------+You can also see the status and history of the analyze tasks by querying the ATS views SYSTOOLS.ADMIN_TASK_STATUS and SYSTOOLS.ADMIN_TASK_LIST. These views show all ATS tasks. The BIGSQL_AUTO_ANALYZE_STATUS procedure shows auto-analyze tasks only.
For more information about viewing tasks and status, see ADMIN_TASK_LIST and ADMIN_TASK_STATUS.
Configuration properties for auto-analyze
- biginsights.stats.auto.analyze.concurrent.max
- The default value is 1. This property limits the maximum number of analyze tasks that can be in the task queue at a time. When the scheduler is asked for tables to be analyzed it checks how many tasks are already in the queue running or waiting and limits the amount returned so the total does not exceed this maximum. Any remaining tables are returned on subsequent requests to the scheduler to prevent too many ANALYZE command from being started concurrently.
- biginsights.stats.auto.analyze.task.retention.time
- The default value is 1MONTH. This property controls the housekeeping purge of old tasks that have completed. When old tasks are deleted, the history is also deleted. The available values are 1MONTH, 1WEEK, FOREVER, or NONE.When set to FOREVER, the completed analyze tasks and history are not deleted. When set to NONE, the completed analyze tasks are deleted. When set to 1MONTH or 1WEEK, tasks older then 1 month or 1 week are deleted. Otherwise, the user can use the SYSPROC.ADMIN_TASK_REMOVE procedure to manually purge the old tasks periodically.
- biginsights.stats.auto.analyze.newdata.min
- The default is 50. It is the minimum percentage of new data that is added to the table to cause auto-analyze to detect that the table needs to be analyzed. If less then this amount of data has been added to the table, then auto-analyze does not run an ANALYZE command. The value must be an integer larger then 0.
Disable and enable auto-analyze
At the time of installation, auto-analyze is enabled by default with the default cron schedule. To disable auto-analyze, either modify the cron schedule of the task or remove the ATS task that checks for tables to be analyzed.
- Example of how to disable by removing the task – run as the bigsql administrator user only:
- To disable auto-analyze, remove the ATS task that checks for tables to be
analyzed.
CALL SYSPROC.ADMIN_TASK_REMOVE( 'BIGSQL_CHECK_ANALYZE',NULL); - Example of how to enable by adding the task – run as the bigsql administrator user only:
- To enable auto-analyze, add the ATS task that checks for tables to be analyzed.
CALL SYSPROC.ADMIN_TASK_ADD( 'BIGSQL_CHECK_ANALYZE',NULL, NULL, NULL, '0,10,20,30,40,50 * * * *', 'SYSHADOOP','BIGSQL_CHECK_ANALYZE', 'VALUES(0)',NULL,NULL); - Example of how to disable by changing the cron schedule – run as the bigsql administrator user only:
- To disable auto-analyze, update the schedule of the ATS
task.
CALL SYSPROC.ADMIN_TASK_UPDATE('BIGSQL_CHECK_ANALYZE',NULL, NULL, 0, NULL,NULL,NULL); - Example of how to enable by changing the cron schedule – run as the bigsql administrator user only:
- To enable auto-analyze, update the schedule of the ATS
task.
CALL SYSPROC.ADMIN_TASK_UPDATE('BIGSQL_CHECK_ANALYZE', NULL, NULL, NULL, '0,10,20,30,40,50 * * * *', NULL, NULL); - Example of how to manually remove old completed analyze tasks:
-
SYSPROC.ADMIN_TASK_REMOVE( 'Analyze 1458912222 UIUSER.COUNTRY', NULL)
Impersonation and auto-analyze
When you enable impersonation for Big SQL, then the ANALYZE command runs as the table owner. Otherwise, the ANALYZE command runs as the bigsql administration user. The ATS tasks are always run as the bigsql administration user.
Auto-analyze after running LOAD
If you want to modify the auto-analyze after LOAD feature, then set the biginsights.stats.auto.analyze.post.load configuration property to ONCE, DEFERRED, NEVER, or ALWAYS in the bigsql-conf.xml file, or set a session property by using SET HADOOP PROPERTY. For more information about the auto analyze feature of ANALYZE, see ../../com.ibm.swg.im.infosphere.biginsights.biga.doc/doc/bsql_analyze.html#reference_ppv_gns_ll__auto_load .