Auto-analyze

You can use the auto-analyze feature to automatically determine if a table should be analyzed. If you add new data, or if you have never run ANALYZE on a table, then auto-analyze schedules to run an ANALYZE command.

Running auto-analyze

When more than 50% of the data is new, or the table has never been analyzed, then auto-analyze detects and runs an analyze. For Big SQL HBase tables, if a major compaction has been done since the last analyze, then auto-analyze detects and runs an analyze.

Use the DB2 Administrative Task Scheduler (ATS) to schedule and run tasks that execute an ANALYZE command. Analyze tasks are added to ATS when auto-analyze detects that an ANALYZE command should be run. These analyze tasks are scheduled to run once only and immediately.

There is also one task that is scheduled to run every 10 minutes indefinitely to check if there are tables that need to be analyzed. This task is added during the installation of Big SQL. If you need a different schedule, modify the default cron schedule. Use use multiple schedules as shown in Using multiple schedules for checking analyze.

Modifying the cron schedule for auto-analyze

The task that checks for tables to be analyzed has a default cron schedule of '0,10,20,30,40,50 * * * *'. This mean that the task runs every 10 minutes indefinitely. Change the cron schedule by using the ATS procedure, SYSPROC.ADMIN_TASK_UPDATE.
Example that will run once every day at midnight:
 

CALL SYSPROC.ADMIN_TASK_UPDATE(
  'BIGSQL_CHECK_ANALYZE',NULL, NULL, 
   NULL, '0 0 * * *',NULL,NULL);
Example that will disable the task so that it does not run:

CALL SYSPROC.ADMIN_TASK_UPDATE(
  'BIGSQL_CHECK_ANALYZE',NULL, NULL, 
  0, NULL,NULL,NULL);
For more information about setting a task cron schedule, see UNIX cron format.

Using multiple schedules for checking analyze

There can be multiple tasks with different schedules for calling the BIGSQL_CHECK_ANALYZE procedure and with different maximum concurrent task settings. This task must be added as the bigsql administrator user only.

The following examples show how to add 2 BIGSQL_CHECK_ANALYZE tasks with different schedules and different maximum concurrent tasks. In these examples, one task is for daytime, and it runs from 7:00-18:59, every 15 minutes with a maximum of 1 concurrent analyze task allowed. The other task is for nighttime, and it runs from 00:00-6:59 and 19:00-12:59, every half hour with a maximum of 5 concurrent analyze task allowed.
Daytime schedule

CALL SYSPROC.ADMIN_TASK_ADD('
  BIGSQL_CHECK_ANALYZE days max 1',NULL,  NULL, NULL,
      '0,15,30,45 7-18 * * *','SYSHADOOP',
  'BIGSQL_CHECK_ANALYZE','VALUES(1)',NULL,NULL); 
Nighttime schedule

CALL  SYSPROC.ADMIN_TASK_ADD(
  'BIGSQL_CHECK_ANALYZE nights max 5',NULL, NULL,  NULL,
      '0,30 0-6,19-23 * * *', 
  'SYSHADOOP','BIGSQL_CHECK_ANALYZE',
  'VALUES(5)',NULL,NULL);

Big SQL procedures for auto-analyze

SYSHADOOP.BIGSQL_RUN_ANALYZE
This procedure is called by the analyze task to run an ANALYZE command. This procedure is an internal procedure that is intended to be used only by auto-analyze.
SYSHADOOP.BIGSQL_CHECK_ANALYZE
This procedure is called by a scheduled task that checks if there are tables that should be analyzed. This procedure is an internal procedure that is intended to be used only by auto-analyze.
This procedure has one parameter:
SYSHADOOP.BIGSQL_CHECK_ANALYZE(maxConcurrentTasks)
maxConcurrentTasks
The value is an integer that is used to override the property setting of biginsights.stats.auto.analyze.concurrent.max. When the value is set to an integer larger then zero (0), it is used as the maximum number of concurrent tasks.
SYSHADOOP.BIGSQL_AUTO_ANALYZE_STATUS
This procedure is intended to be used by the Big SQL administrator to check the status of auto-analyze tasks. Successful analyze tasks show a STATUS of COMPLETED and SQLCODE 0. Tasks that are waiting to run contain NULL values for STATUS and END_TIME. A task that is running shows a STATUS of RUNNING. If an error was encountered the SQLCODE and MESSAGE show the reason the task failed.
The procedure signature has 2 parameters:
SYSHADOOP.BIGSQL_AUTO_ANALYZE_STATUS(schema, table)
schema
When not NULL the results are filtered to show tables in this schema only. This parameter can be NULL to not filter the results. A wild card % can be used.
table
When not NULL the results are filtered to show tables matching this name only. This parameter can be NULL to not filter the results. A wild card % can be used.
An example that shows all:

CALL SYSHADOOP.BIGSQL_AUTO_ANALYZE_STATUS(NULL, NULL);

+------------+--------+---------------+----------+------------+------------+---------+----------+-------------------------+
| TASKNAME   | TASKID | ANALYZE_INPUT | STATUS   | BEGIN_TIME | END_TIME   | SQLCODE | SQLSTATE | MESSAGE                 |
+------------+--------+---------------+----------+------------+------------+---------+----------+-------------------------+
| Analyze    |    102 | VALUES('BIGSQ | COMPLETE | 2016-03-25 | 2016-03-25 |       0 |          | SQL0000W                |  
|            |        |               |          |            |            |         |          | Statement processing    |
| 1458949355 |        | L','HCOUNTRY_ |          | 16:50:34.1 | 16:50:46.8 |         |          | was successful.         |
| 682        |        | FILE',10,     |          | 12         | 31         |         |          |                         |
| BIGSQL.HCO |        | NULL)         |          |            |            |         |          |                         |
| UNTRY_FILE |        |               |          |            |            |         |          |                         |
|-------------------------------------------------------------------------------------------------------------------------| 
| Analyze    |     83 | VALUES('TEST' | COMPLETE | 2016-03-24 | 2016-03-24 |       0 |          | SQL0000W                |              
|            |        |               |          |            |            |         |          | Statement processing    |
| 1458912345 |        | ,'GENDATA_1YR |          | 18:09:18.3 | 18:10:23.7 |         |          | was successful.         |
| TEST.GENDA |        | ',10,NULL)    |          | 84         | 58         |         |          |                         |
| TA_1YR     |        |               |          |            |            |         |          |                         |
|-------------------------------------------------------------------------------------------------------------------------| 
| Analyze    |     42 | VALUES('UIUSE | COMPLETE | 2016-03-17 | 2016-03-17 |       0 |          | SQL0000W                |
|            |        |               |          |            |            |         |          | Statement processing    |
| 1458912222 |        | R','T_UIUSER' |          | 11:17:39.2 | 11:18:37.9 |         |          | was successful.         |
| UIUSER.T_U |        | ,10,NULL)     |          | 81         | 45         |         |          |                         |
| IUSER      |        |               |          |            |            |         |          |                         |
|-------------------------------------------------------------------------------------------------------------------------| 
| Analyze    |     84 | VALUES('TEST' | COMPLETE | 2016-03-24 | 2016-03-24 |   -4302 |   58040  | SQL4302N  Procedure or  |
| 1458987654 |        | ,'BAD',10,NUL |          | 19:04:19.0 | 19:15:52.3 |         |          | user-defined function   |
| TEST.BAD   |        | L)            |          | 45         | 47         |         |          |"SYSHADOOP.BIGSQL_RUN_A",|
|            |        |               |          |            |            |         |          | specific name           |
|            |        |               |          |            |            |         |          | "BIGSQL_RUN_ANALYZE"    |
|            |        |               |          |            |            |         |          |aborted with             |
|            |        |               |          |            |            |         |          | an exception            |
|            |        |               |          |            |            |         |          | "[BSL-0-48470644E]      |
|            |        |               |          |            |            |         |          | Error run".             |
+------------+--------+---------------+----------+------------+------------+---------+----------+-------------------------+

You can also see the status and history of the analyze tasks by querying the ATS views SYSTOOLS.ADMIN_TASK_STATUS and SYSTOOLS.ADMIN_TASK_LIST. These views show all ATS tasks. The BIGSQL_AUTO_ANALYZE_STATUS procedure shows auto-analyze tasks only.

For more information about viewing tasks and status, see ADMIN_TASK_LIST and ADMIN_TASK_STATUS.

Configuration properties for auto-analyze

Use these properties to set the behavior of auto-analyze.
biginsights.stats.auto.analyze.concurrent.max
The default value is 1. This property limits the maximum number of analyze tasks that can be in the task queue at a time. When the scheduler is asked for tables to be analyzed it checks how many tasks are already in the queue running or waiting and limits the amount returned so the total does not exceed this maximum. Any remaining tables are returned on subsequent requests to the scheduler to prevent too many ANALYZE command from being started concurrently.
biginsights.stats.auto.analyze.task.retention.time
The default value is 1MONTH. This property controls the housekeeping purge of old tasks that have completed. When old tasks are deleted, the history is also deleted. The available values are 1MONTH, 1WEEK, FOREVER, or NONE.When set to FOREVER, the completed analyze tasks and history are not deleted. When set to NONE, the completed analyze tasks are deleted. When set to 1MONTH or 1WEEK, tasks older then 1 month or 1 week are deleted. Otherwise, the user can use the SYSPROC.ADMIN_TASK_REMOVE procedure to manually purge the old tasks periodically.
biginsights.stats.auto.analyze.newdata.min
The default is 50. It is the minimum percentage of new data that is added to the table to cause auto-analyze to detect that the table needs to be analyzed. If less then this amount of data has been added to the table, then auto-analyze does not run an ANALYZE command. The value must be an integer larger then 0.

Disable and enable auto-analyze

At the time of installation, auto-analyze is enabled by default with the default cron schedule. To disable auto-analyze, either modify the cron schedule of the task or remove the ATS task that checks for tables to be analyzed.

CLOUD environment: You might not be able to add or remove the tasks as a bigsql user. In that case, always modify the cron schedule of the task with biadmin user and never remove the task. The task must run as bigsql user.
Examples
Example of how to disable by removing the task – run as the bigsql administrator user only:
To disable auto-analyze, remove the ATS task that checks for tables to be analyzed.

CALL SYSPROC.ADMIN_TASK_REMOVE(
  'BIGSQL_CHECK_ANALYZE',NULL);
Example of how to enable by adding the task – run as the bigsql administrator user only:
To enable auto-analyze, add the ATS task that checks for tables to be analyzed.

CALL SYSPROC.ADMIN_TASK_ADD(
  'BIGSQL_CHECK_ANALYZE',NULL, NULL, 
  NULL, '0,10,20,30,40,50 * * * *', 
  'SYSHADOOP','BIGSQL_CHECK_ANALYZE',
  'VALUES(0)',NULL,NULL);
Example of how to disable by changing the cron schedule – run as the bigsql administrator user only:
To disable auto-analyze, update the schedule of the ATS task.
CALL SYSPROC.ADMIN_TASK_UPDATE('BIGSQL_CHECK_ANALYZE',NULL, NULL, 0, NULL,NULL,NULL);
Example of how to enable by changing the cron schedule – run as the bigsql administrator user only:
To enable auto-analyze, update the schedule of the ATS task.
CALL SYSPROC.ADMIN_TASK_UPDATE('BIGSQL_CHECK_ANALYZE', NULL, NULL, NULL, '0,10,20,30,40,50 * * * *',  NULL, NULL);
Example of how to manually remove old completed analyze tasks:

SYSPROC.ADMIN_TASK_REMOVE(
  'Analyze 1458912222 UIUSER.COUNTRY', NULL)

Impersonation and auto-analyze

When you enable impersonation for Big SQL, then the ANALYZE command runs as the table owner. Otherwise, the ANALYZE command runs as the bigsql administration user. The ATS tasks are always run as the bigsql administration user.

Auto-analyze after running LOAD

If you want to modify the auto-analyze after LOAD feature, then set the biginsights.stats.auto.analyze.post.load configuration property to ONCE, DEFERRED, NEVER, or ALWAYS in the bigsql-conf.xml file, or set a session property by using SET HADOOP PROPERTY. For more information about the auto analyze feature of ANALYZE, see ../../com.ibm.swg.im.infosphere.biginsights.biga.doc/doc/bsql_analyze.html#reference_ppv_gns_ll__auto_load .