Analyze Troubleshooting tips – what to do and look for when things go wrong.
Analyze v2 in Big SQL v4.2 has made tremendous strides in terms of performance, reliability, and capability. Some of these capabilities include:
- ability to run multiple concurrent Analyze jobs
- ability to Analyze hundreds of columns at a time
- ability to preserve statistics when Analyzing new columns only
- ability to Analyze only a sample of a table
- ability to automatically Analyze tables
Analyze v2 achieved some of these gains by shedding its reliance on Hive, MapReduce and YARN, making it more versatile. However, with added capability and versatility comes added complexity, so problems can still arise. Here’s how to troubleshoot Analyze when things go wrong.
Is the Cluster Healthy?
If Analyze is performing slower than normal/expected, or errors suddenly start occurring, this is the first thing you’ll want to confirm. Analyze performance is directly related to the number of active Big SQL worker nodes in the cluster – more is better. Analyze will still work if one or a few Big SQL worker nodes are down (albeit with reduced performance). However, Analyze – and most other Big SQL operations – will not work if the head node is down, so verify that it is up.
Along with this, confirm that the following services are healthy: HDFS, Hive, and HBase (when Analyzing HBase tables). Oozie and ZooKeeper are indirectly required by Big SQL/Hadoop (there may be others), so check on those too. Finally, ensure there are no hardware issues, such as failed disks or network problems.
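As a quick HDFS health check, the DataNode summary printed by `hdfs dfsadmin -report` can be scanned for dead nodes. The sketch below parses a captured sample of that report (the exact wording can vary between Hadoop versions, so treat the pattern as an assumption to adjust):

```shell
# Count dead DataNodes from `hdfs dfsadmin -report` output.
# A captured sample is used here; on a live cluster, substitute:
#   report=$(hdfs dfsadmin -report)
report="Live datanodes (3):
Dead datanodes (1):"

# Extract the number inside "Dead datanodes (N):"
dead=$(printf '%s\n' "$report" | sed -n 's/^Dead datanodes (\([0-9]*\)).*/\1/p')
echo "dead datanodes: ${dead:-0}"
```

A non-zero count here is a strong hint that slow or failing Analyze jobs are a cluster problem, not an Analyze problem.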
Does the User Have Authorization and Permission?
Users need appropriate authorization to execute Analyze. You should also confirm that the executing user has the appropriate permissions on the directories and files under the table’s directory in HDFS.
In some setups, security measures such as Kerberos will revoke write permissions on the table’s directory. In such instances, Analyze may not work with the default settings. To work around this, set the biginsights.stats.use.homedir property to true. With this setting in place, Analyze will write to the executing user’s home directory in HDFS, instead of the table’s directory.
Do SELECTs Work?
The vast majority of Analyze’s work is done in the Java Reader, and this work is a lot like a simple Select query. To quickly check if the problem is indeed with Analyze, do the following:
1. ANALYZE TABLE tab1 COMPUTE STATISTICS FOR ALL COLUMNS; ← failing statement
2. SET DFS_EXTERNAL_INPUT_LIBRARY = 'JAVA'; ← use the Java Reader for queries
3. SELECT * FROM tab1 FETCH FIRST 2 ROWS ONLY; ← does this succeed?
4. SELECT * FROM tab1 WHERE nonpartcol=missing-value; ← does this succeed?
5. SET DFS_EXTERNAL_INPUT_LIBRARY = NULL; ← restore the default Reader
If #3 or #4 fails, the problem is likely not directly due to Analyze. For #4, specify any non-partitioning column (nonpartcol) of the table (tab1) and supply a value that does not exist (missing-value) in that column’s data. The intention here is to scan all the data without returning any results. Note: using a partitioning column in #4 will trigger partition elimination, so the data would not actually be scanned.
What Does the Log Say?
Ring-ding-ding-ding … no, it should never say that. But in the error message of the failed Analyze execution, there should be something like: Log entry identifier: "[BSL-0-493b55a1e]".
- BSL means Big SQL Log (bigsql.log); similarly, SCL stands for Scheduler Log (bigsql-sched.log)
- -0- means node 0, the head node; -1- is worker node 1, -2- is worker node 2, etc.
- 493b55a1e is the unique hexadecimal identifier for this error
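The decoding above is mechanical, so it can be scripted. This sketch splits a log entry identifier into its three parts using plain POSIX parameter expansion:

```shell
# Decode a Big SQL log entry identifier such as "BSL-0-493b55a1e".
id="BSL-0-493b55a1e"

prefix=${id%%-*}     # BSL or SCL
rest=${id#*-}
node=${rest%%-*}     # 0 = head node, 1 = worker node 1, ...
err=${rest#*-}       # unique hexadecimal identifier

case "$prefix" in
  BSL) logfile="bigsql.log" ;;
  SCL) logfile="bigsql-sched.log" ;;
  *)   logfile="unknown" ;;
esac

echo "search $logfile on node $node for $err"
```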
Log files can be found in $BIGSQL_DIST_VAR/logs/ of the indicated node. See Logs and their Locations for more information. Alternatively, log entries can be retrieved by issuing the following query:
SELECT * FROM TABLE(syshadoop.log_entry('BSL-0-493b55a1e'))
The log entry is usually an ERROR or sometimes WARN message, with a stack trace resembling:
...
at com.ibm.biginsights.biga.udf.stats.AnalyzeTool.analyze(AnalyzeTool.java:##)
at com.ibm.biginsights.biga.udf.BIGSQL_DDL.performAnalyze(BIGSQL_DDL.java:##)
at com.ibm.biginsights.biga.udf.BIGSQL_DDL.doAnalyzeStatement(BIGSQL_DDL.java:##)
at com.ibm.biginsights.biga.udf.BIGSQL_DDL.processDDL(BIGSQL_DDL.java:##)
Worker Node Indicator
Sometimes the BSL message shows node 0, but the error actually took place on a worker node. To determine if this is true, look for the following in the stack trace of the error message (unfortunately, this won’t tell you which worker node):
...
at com.ibm.db2.jcc.am.ip.execute(Unknown Source)
at com.ibm.biginsights.biga.udf.stats.AnalyzeCatalog.exec(AnalyzeCatalog.java:##)
at com.ibm.biginsights.biga.udf.stats.AnalyzeDB2.doAnalyze(AnalyzeDB2.java:##)
OR .AnalyzeDB2.doPartitionAnalyze(AnalyzeDB2.java:##)
Rollover
Depending on the workload and logging level, log files may roll over. Therefore, the message you are looking for may instead be in bigsql.log.1 or another rolled copy. If/when contacting IBM support for help, you’ll likely see faster resolution if the above (and below) information is provided, as it can be used to quickly identify what and where the issue is.
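When logs have rolled over, a quick way to find the right file is to grep the log entry identifier across every rolled copy. The sketch below builds a throwaway demo directory so it runs anywhere; on a real node you would point grep at $BIGSQL_DIST_VAR/logs/ instead:

```shell
# Find which rolled copy of bigsql.log contains a given log entry id.
# A temp directory with fake logs stands in for $BIGSQL_DIST_VAR/logs/.
logdir=$(mktemp -d)
touch "$logdir/bigsql.log" "$logdir/bigsql.log.1"
echo "2017-01-01 ERROR [BSL-0-493b55a1e] analyze failed" > "$logdir/bigsql.log.2"

# -l prints only the names of files that contain a match
match=$(grep -l 'BSL-0-493b55a1e' "$logdir"/bigsql.log*)
echo "$match"
```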
Out-of-Memory
On rare occasions, the error may indicate an out-of-memory condition. In such cases, it is helpful to locate the corresponding heapdump.*.phd file generated on the affected node(s), which IBM support can use to help diagnose the issue.
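Heapdump files can be gathered with a simple find. The sketch below demonstrates the pattern against a temp directory with a fabricated dump file; on a real node, search the Big SQL working and log directories of the affected node instead:

```shell
# Locate IBM JVM heapdump files (heapdump.*.phd) to send to IBM support.
# A temp directory with a fake dump stands in for the node's directories.
dir=$(mktemp -d)
touch "$dir/heapdump.20170101.120000.12345.0001.phd"

found=$(find "$dir" -name 'heapdump.*.phd')
echo "$found"
```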
Debugging is your Friend
Sometimes the error message indicates an issue that is data dependent. However, it is oftentimes impractical to determine what data in/about the table is causing the issue. If you wish to be ultra-helpful (and I know you do, because you’re such an awesome person), then DEBUG logging can usually help deduce the issue. With debug logging, a significant amount of Analyze information is written out to the log file, which can help tremendously in identifying issues. To turn on debugging for Analyze on a particular node, append the following to the node’s $BIGSQL_HOME/conf/log4j.properties file:
- log4j.logger.com.ibm.biginsights.biga=DEBUG
- log4j.logger.com.ibm.biginsights.bigsql=DEBUG
Save the file, restart Big SQL and re-execute the failing Analyze statement. The debugging information will then be written out to the bigsql.log file on the node(s). It is generally a good idea to perform debug logging on the head node, in addition to any worker nodes that are indicated as problematic. Don’t forget to back up the original log4j.properties file(s), and restore them when done.
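The backup-and-append steps above can be scripted. This sketch operates on a temporary copy so it is safe to run anywhere; on a real node, point conf at the node’s $BIGSQL_HOME/conf/log4j.properties:

```shell
# Enable DEBUG logging for Analyze in log4j.properties, keeping a backup.
conf=$(mktemp)           # stand-in for $BIGSQL_HOME/conf/log4j.properties
echo "log4j.rootLogger=WARN" > "$conf"

cp "$conf" "$conf.bak"   # backup to restore when debugging is done

cat >> "$conf" <<'EOF'
log4j.logger.com.ibm.biginsights.biga=DEBUG
log4j.logger.com.ibm.biginsights.bigsql=DEBUG
EOF
```

Restoring afterwards is just `cp "$conf.bak" "$conf"` followed by another Big SQL restart.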
Debugging Without Restarting Big SQL
The logging level can also be changed without having to restart Big SQL, by executing the following against Big SQL:
- /*_!log('com.ibm.biginsights.biga', 'DEBUG')!_*/ create schema foo;
- /*_!log('com.ibm.biginsights.bigsql', 'DEBUG')!_*/ create schema foo;
Don’t worry, create schema foo won’t actually execute. This changes the logging level system-wide, so don’t forget to undo the change by setting the above loggers back to the default logging level (WARN).
These are the most common things that need to be checked when investigating issues with Analyze. Of course, it doesn’t hurt to also verify that the correct syntax and property settings are being used.
Additional Information
Visit the Knowledge Center for more details on the ANALYZE command.
To get started, download the IBM Open Platform with Apache Hadoop.
Visit the Knowledge Center for installation and configuration information.