Db2 Big SQL scheduler

The Db2 Big SQL scheduler is a service that assists in the compilation and execution of queries that involve Datalake tables. The scheduler provides access to the metadata for Datalake tables that are involved in a scan. This metadata includes table descriptors with details such as column types, column lengths, partition information, and input format.

The scheduler evenly distributes the scan of a Datalake table across your cluster and also performs partition elimination and data skipping. It uses system resources efficiently to maximize throughput and minimize response time. It also assigns splits to readers and commits data insertions. The Db2 Big SQL I/O engine uses the information from the scheduler to do the actual reading and writing of the data.
Note: Do not confuse the Db2 Big SQL scheduler with the built-in Db2 Administrative Task Scheduler; there is no relationship between the two.

Scheduler cache

To improve performance, the Db2 Big SQL scheduler caches the responses generated by the Db2 Big SQL Hive metastore; by default, these responses remain cached for 20 minutes. You can adjust the cache time by setting the scheduler.tableMetaDataCache.timeToLive parameter to an integer that represents the required cache time, in milliseconds. You can also change the expiration behavior through the scheduler.tableMetaDataCache.expirePolicy parameter. The default policy is expireAfterAccess, which resets the timer every time a cache item is accessed before it expires, so the lifespan of frequently accessed items is extended. The alternative policy is expireAfterWrite, where the timer starts counting down when a cache item is first written into the cache. You can check the current value of these parameters by using the GET_DATALAKE_CONFIG table function:
select * from table(syshadoop.get_datalake_config('BIGSQL'));
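For example, to extend the cache lifetime to 30 minutes and have the timer start when items are written rather than when they are accessed, you can set both parameters with the SET_DATALAKE_CONFIG stored procedure. The values in this sketch are illustrative only:
-- 30 minutes expressed in milliseconds (illustrative value)
call syshadoop.set_datalake_config('BIGSQL', 'scheduler.tableMetaDataCache.timeToLive', '1800000');
-- Start the expiration timer when an item is written, not when it is accessed
call syshadoop.set_datalake_config('BIGSQL', 'scheduler.tableMetaDataCache.expirePolicy', 'expireAfterWrite');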
In some cases you might need to disable the scheduler cache, such as when the metadata in the scheduler must stay synchronized with the metastore. To disable the scheduler cache, set scheduler.tableMetaDataCache.timeToLive to 0 by using the SET_DATALAKE_CONFIG stored procedure:
call syshadoop.set_datalake_config('BIGSQL', 'scheduler.tableMetaDataCache.timeToLive', '0');
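To re-enable the cache later, set the property back to a nonzero value. For example, assuming the 20-minute default described above, the corresponding value is 1200000 milliseconds:
call syshadoop.set_datalake_config('BIGSQL', 'scheduler.tableMetaDataCache.timeToLive', '1200000');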
You can also disable the scheduler cache selectively for specific tables by setting a table property. Use the ALTER DATALAKE TABLE statement to exclude a particular table from the scheduler cache, as in the following example:
ALTER DATALAKE TABLE schema1.table1 SET TBLPROPERTIES('scheduler.cache.exclusion'='true');
This operation takes effect immediately.
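To have the table use the scheduler cache again later, you can set the same table property back to false. The schema and table names here are the same example names as above:
ALTER DATALAKE TABLE schema1.table1 SET TBLPROPERTIES('scheduler.cache.exclusion'='false');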

Error logs

When the scheduler receives or generates an exception, an SQL5105N error is produced with the name of the component that issued the exception. The error includes a log entry identifier, a three-part name that identifies which log file to refer to for a particular error message. The identifier consists of three parts, as in the following example: SCL-NNN-XXXYYYZZZ. You can use the SYSHADOOP.LOG_ENTRY table function to get log file information for a particular log entry identifier. Alternatively, you can manually search the $BIGSQL_DIST_VAR/logs/bigsql-sched.log file.
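For example, assuming that SCL-NNN-XXXYYYZZZ stands in for an actual identifier reported with an SQL5105N error, and that the LOG_ENTRY table function accepts the identifier as its single argument, a lookup might look like this:
select * from table(syshadoop.log_entry('SCL-NNN-XXXYYYZZZ'));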

If the Db2 Big SQL I/O engine invokes the Db2 Big SQL scheduler and receives an exception, the log entry identifier might be specific to the Db2 Big SQL I/O engine. Therefore, you might need to search the logs that are specific to the readers and writers.

For more information about log entry identifiers and the locations of the log files, see Db2 Big SQL log files.

Scheduler configuration

The following properties are the core properties that must be defined in the scheduler configuration file. You can view the current property values by using the GET_DATALAKE_CONFIG table function, or change them by using the SET_DATALAKE_CONFIG stored procedure:
scheduler.java.opts
The Java options for the scheduler JVM. You can specify any JVM option. The following example sets the initial heap size to 512 MB and the maximum heap size to 2 GB:
call syshadoop.set_datalake_config('BIGSQL', 'scheduler.java.opts', '-Xms512M -Xmx2G');
In addition, the scheduler has the following timeout configuration properties that you can tune:
scheduler.client.request.timeout
During normal workload execution, the Db2 engine components in Db2 Big SQL make requests to the scheduler and then wait for a response. Because the scheduler runs as a separate service from the Db2 engine, this timeout enables the Db2 engine to exit early and fail if the scheduler takes too long to respond. The DDL processor component also uses this property when it sends metadata refresh requests to the scheduler. The default value of this property is 120 seconds. For an example of setting the scheduler timeout properties, see the statements that follow these property descriptions.
scheduler.client.request.IUDEnd.timeout
This property is analogous to the scheduler.client.request.timeout property, except that it applies only to INSERT, UPDATE, and DELETE (IUD) queries. Because of the nature of IUD workloads, the scheduler might take longer for these types of operations, so this property provides more granular timeout control depending on the type of query. In other words, you can allow more time for IUD queries than scheduler.client.request.timeout allows, without affecting the existing timeouts for non-IUD queries. The default value of this property is 600 seconds.
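Assuming that the timeout properties are changed in the same way as the other scheduler configuration properties, that is, with the SET_DATALAKE_CONFIG stored procedure, and that the values are specified in seconds (consistent with the defaults described above), an example adjustment might look like this; the values are illustrative only:
-- Allow up to 300 seconds for non-IUD scheduler requests (illustrative value)
call syshadoop.set_datalake_config('BIGSQL', 'scheduler.client.request.timeout', '300');
-- Allow up to 900 seconds for IUD scheduler requests (illustrative value)
call syshadoop.set_datalake_config('BIGSQL', 'scheduler.client.request.IUDEnd.timeout', '900');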
Note: After changing the scheduler timeout properties, it is important to restart the entire Db2 service, not just the Db2 Big SQL scheduler component, because these properties are client-level properties and must be propagated to all of the worker nodes.