Db2 Big SQL scheduler
The scheduler is a service that helps in the execution of queries. It communicates with all of the components in Db2® Big SQL, including the Hive metastore, the HDFS NameNode, the Db2 Big SQL readers and writers, and the Db2 Big SQL engine.
The scheduler provides access to the metadata for Hadoop tables that are involved in a scan. This metadata includes table descriptors with details such as column types, column lengths, partition information, and input format. It evenly distributes the scan of a Hadoop table in your cluster. The scheduler also does partition elimination. It uses system resources efficiently to maximize throughput and minimize response time.
The scheduler assigns splits to readers and commits data insertion. The Db2 Big SQL readers (also called the Db2 Big SQL readers and writers) use the information from the scheduler to do the actual reading of HDFS data.
$BIGSQL_HOME/bin/bigsql status -scheduler
Big SQL is unable to process queries while the scheduler is restarting.
$BIGSQL_HOME/bin/bigsql stop -scheduler $BIGSQL_HOME/bin/bigsql start -scheduler
- The name of the host on which the scheduler is to run. This host is typically the master node. Unless you have a single node environment, the name cannot be localhost.
- The port where the scheduler gets administrator commands such as stop or start. The default value is 7054. You can specify any valid port number.
- The port where the scheduler gets requests for scheduling work. The default value is 7053. You can specify any valid port number.
- The Java options for the scheduler JVM. You can specify
any JVM option. The following example sets the starting memory to 512 MB and the maximum memory to 2
- Every logical Db2 Big SQL worker has to connect to the Db2 Big SQL scheduler at the start of a scan to get a list of HDFS
splits that has to be read for the table involved in the query. The total number of threads used by
the scheduler, configured through the
scheduler.maxWorkerThreadsproperty in bigsql-conf.xml, is 1024 by default. Depending on the complexity of the query, the number of tables and sub-queries, and the number of table scans done, the number of requests from the Db2 Big SQL workers to scheduler can add up. On clusters configured with many logical worker nodes, featuring complex queries running in parallel, the value for the
scheduler.maxWorkerThreadsmight need to be increased beyond the default.The number of threads in use can be monitored by periodically running a command such as the following on the host on which the Db2 Big SQL head node resides:
ps -eLf | grep -c bigsql.scheduler.server
If in doubt, there is no harm in increasing the value of the
scheduler.maxWorkerThreadsproperty by a factor of two or more, as the scheduler will automatically taper down all the way to a minimum of eight threads, as defined in the
scheduler.minWorkerThreadsproperty, when there is no longer any SQL workload running. Doubling the value of the
scheduler.maxWorkerThreadsproperty is a good starting point if you determine the default value is likely to be exceeded.
In addition, the scheduler has the following three timeout configuration properties that you can tune:
- This property controls the duration of time that the scheduler server will wait for a client to connect and send its communication before timing out. The default value of this property is 60 minutes. Typically, the value of this property does not need to be modified.
- Throughout normal workload execution, the Db2 engine components in Db2 Big SQL make requests to the scheduler and then wait for a response. Because the scheduler is running as a different service than the Db2 engine, there is a timeout that enables the Db2 engine to exit early and fail if the scheduler is taking too long to respond. The DDL processor component also uses this property when sending metadata refresh requests to the scheduler. The default value of this property is 120 seconds.
- This property is an identical concept as scheduler.client.request.timeout, except that this property is uniquely for insert/update/delete (IUD) queries only. Due to the nature of IUD workloads, the scheduler might take somewhat longer for these types of operations. Therefore, this configuration property provides more granular timeout control depending on the type of query. In other words, it is possible to give more time for IUD queries with respect to scheduler.client.request.timeout, without having to influence the existing timeouts for non-IUD queries. The default value of this property is 600 seconds.
The scheduler.client.request.timeout and scheduler.client.request.IUDEnd.timeout properties might need to be adjusted when there are many files or many partitions involved. Adjustments might be needed because of the appreciable amount of time it takes to load the scheduler cache with the table metadata, including individual partitions and files metadata. In this case, review the diagnostics in the /var/ibm/bigsql/logs/bigsql-sched-recurring-diag-info.log file on the Db2 Big SQL head node host to help determine an appropriate value for these properties. The bigsql-sched-recurring-diag-info.log file shows various useful diagnostic information, including the time spent in the scheduler for various functions recorded in both a "top (5) max elapsed times:" and also a, "elapsed-time-range-in-millis and frequency-of-calls-in-that-range" histogram. The elapsed times are in milliseconds.
There are two functions in particular that are related to scheduler client timeouts, requestScanMetadata and registerQueryNew. The following example shows the elapsed timings recorded in the bigsql-sched-recurring-diag-info.log file for the requestScanMetadata function:
[requestScanMetadata] top (5) max elapsed times: time= 603484; info= [TableSchema(schName:<schema>, tblName:<tablename>, impersonationID:bigsql)] time= 590778; info= [TableSchema(schName:<schema>, tblName:<tablename>, impersonationID:bigsql)] time= 448659; info= [TableSchema(schName:<schema>, tblName:<tablename>, impersonationID:bigsql)] time= 437044; info= [TableSchema(schName:<schema>, tblName:<tablename>, impersonationID:bigsql)] time= 413360; info= [TableSchema(schName:<schema>, tblName:<tablename>, impersonationID:bigsql)] elapsed-time-range-in-millis and frequency-of-calls-in-that-range: range= 0-10; freq= 10956 range= 10-100; freq= 5553 range= 100-1000; freq= 165 range= 1000-10000; freq= 48 range= 10000-100000; freq= 21 range= 100000-1000000; freq= 13 range= 1000000-9223372036854775807; freq= 0
Choose values for the scheduler.client.request.IUDEnd.timeout and scheduler.client.request.timeout properties to be greater than the longest elapsed times recorded under the registerQueryNew and requestScanMetadata functions. In the example above, the largest elapsed time recorded is 603484 milliseconds. Therefore, the value chosen for scheduler.client.request.timeout and scheduler.client.request.IUDEnd.timeout is 720000 (12 minutes).
- After changing the schedule timeout properties, it is important to restart the entire Db2 Big SQL service, and not just the Db2 Big SQL scheduler component, as these properties are client-level properties and must be propagated to all the worker nodes.
- It is also important to review how many files and partitions are present in any Hadoop table that is exhibiting a longer than average elapsed time in the registerQueryNew and requestScanMetadata operations, as per the bigsql-sched-recurring-diag-info.log file. Having too many small files in a table is generally discouraged by the Hadoop community. As a general rule of thumb, it is recommended that no individual table have more than a few hundred thousand files or a few thousand partitions.
Reading files in HDFS subdirectories
In this example, with the scheduler.read.subdirectories parameter set to true, both text1.txt and text2.txt are read.
[root@bdsup006 ~]# hadoop fs -ls -R /tmp/testtable/ drwxr-xr-x - bigsql hdfs 0 2015-12-09 22:29 /tmp/testtable/subtab -rw-r--r-- 3 bigsql hdfs 6 2015-12-09 22:29 /tmp/testtable/subtab/text2.txt -rw-r--r-- 3 bigsql hdfs 6 2015-12-09 22:28 /tmp/testtable/text1.txt
If you do not want subdirectories to be read, set this option to false, and restart the Db2 Big SQL service. In this case, only text1.txt is read.
Scheduler cache configuration
- If you require longer or shorter cache times, change the value of the scheduler.tableMetaDataCache.timeToLive parameter to an integer that represents the required cache time, in milliseconds.
- If you require synchrony with the Hive metastore and the HDFS NameNode metadata, disable the scheduler cache by setting the scheduler.tableMetaDataCache.timeToLive parameter to 0.
- If you require some tables to leverage the scheduler cache and other tables to maintain synchrony with the Hive metastore and the HDFS NameNode metadata, you can disable the scheduler cache for the latter.
- You can also manipulate cache times by changing the scheduler.tableMetaDataCache.expirePolicy parameter. The default value is expireAfterAccess, which resets the timer every time a cache item is accessed before it has expired. The alternative is expireAfterWrite, which means that the scheduler.tableMetaDataCache.timeToLive timer starts counting down when cache items are first written into the cache. The expireAfterAccess value is a method for extending the lifespan of metadata in the cache for more frequently accessed tables.
Disabling the scheduler cache for a set of tables
- Disable the scheduler cache based on a regular expression match. The
scheduler.cache.exclusion.regexps property in the Db2 Big SQL configuration file accepts a comma-separated list of regular
expressions. This list must have an even number of elements. Each pair of regular expressions in the
list is matched against table schemas and names, and all matching tables are excluded from the
scheduler cache. For example, to match all table names that have a prefix of
uncached, you could use the regular expression pair
.*,uncached.*, which would match the following tables:
Be sure to restart the scheduler after any configuration change.
- Disable the scheduler cache based on a table property. You can use the ALTER TABLE statement to
disable the scheduler cache for a particular table. For
This operation takes effect immediately.
ALTER TABLE schema1.table1 SET TBLPROPERTIES('scheduler.cache.exclusion'='true');
When the scheduler receives or generates an exception, an error (SQL5105N) is produced with the
name of the component that issued the exception. The error includes a log entry
identifier, a three-part name that enables you to identify which file to refer to for a
particular error message. The identifier consists of three parts:
SCL-NNN-XXXYYYZZZ. You can use the SYSHADOOP.LOG_ENTRY table function to get log file information for a particular log entry
identifier. Alternatively, you can manually search the
If the reader or writer invokes the Db2 Big SQL scheduler and receives an exception, the log entry identifier might be specific to the Db2 Big SQL readers and writers. Therefore, you might need to search the logs that are specific to the readers and writers.
For details about log entry identifiers and the locations of the log files, see Db2 Big SQL log management on CDP.