How to diagnose and correct LOAD HADOOP problems
The Db2® Big SQL LOAD HADOOP statement is a powerful tool to load data from relational databases or delimited files that are in an external or local file systems. This guide to diagnosing and correcting problems that you might encounter while using the LOAD statement can help you to further expand your use of LOAD.
For detailed information about this statement, see LOAD HADOOP statement.
Finding error information in the bigsql.log file
When the LOAD statement fails, the SQL message that is returned can contain a Log Entry Identifier, such as [BSL-n-xxxxxxx]. This identifier marks the location in the bigsql.log file where more error information can be found. The n position in the identifier, is the Db2 Big SQL node where the file exists. The default local file location is /var/ibm/bigsql/logs/bigsql.log.
SELECT * FROM TABLE(SYSHADOOP.LOG_ENTRY('BSL-0-22b40517b'));
Finding error information in the logs of MapReduce jobs
The mapreduce-server is the node where the MapReduce History server is installed.
- Application Master
- The MapReduce Job xxxx page contains a section called Application Master. In
this section, select the link to logs. Expand the Click
Here link to see the entire file.
This log file might contain errors when starting, committing, or cleaning up the job. It can also have Out of Memory or other resource errors.
- The MapReduce Job xxxx page contains a section with column Task Type. Select
the map link, which shows the map tasks that were run. Select each
map task link to see the task attempts. Each task attempt contains a
logs link. Open the log and expand by selecting the Click
Here link to see the entire file.
These log files might have rejected record information, problems related to accessing the source data, or problems related to writing to the target table.
- The MapReduce Job xxxx page contains a section with column Task Type.
Select the reduce link, which shows the reduce tasks that were run. Not all
LOAD jobs have reduce tasks, so there might not be any tasks listed.
Select each reduce task link to see the task attempts. Each task attempt contains a logs link. Open the log and expand by selecting the Click Here link to see the entire file.
These log files might have error information about writing to the partitions of the target table.
LOAD might complete successfully and return SQL CODE 5108, but some of the rows were rejected. LOAD might stop and fail with SQL CODE 5109, if the number of rejected records exceeds the maximum set with option max.rejected.records.
- Mismatch in source and target data types
- You might need to use a different data type for your target table.
- For a database table source, you can CAST the source column to the intended target data type.
- Consult the JDBC driver documentation for the type of database source to determine the native data type mapping to JDBC.
- Null value for a column which does not allow null values
- For a file source, you can use option rejected.records.dir to write the rejected records to a file. You can then inspect the rejected records file, make the needed corrections, and retry the LOAD statement.
- Overflow of decimal value
- You might need to specify the target column with a different precision and scale to match the source data.
- You can use the decimal.overvlow option to round values to match the target precision and scale.
- Multi-row source file
- If the source file contains records that span multiple lines, then you must use the allow.multiline.record option set to a true value.
- Incorrect field delimiter which can cause a record to not get parsed
- Check the source file and verify that it contains the same field delimiter that you specified in the field.delimiter option.
- If the field delimiter is a non-printable character, then you can specify it as an escaped character (\t) or the octal value (\001).
Source file data corrections
- Column names on the top line.
- Too many fields.
- Fields in the wrong order.
- Multi-line records.
- Non-standard date or timestamp formats.
- Decimal values that overflow the target precision or scale.
- Place holder values for NULL.
- date.time.format - useful for date and timestamp formats.
- replace.with.null and replace.string.with.null - placeholder strings for NULL values.
- skip.lines.count - skip the top lines that do not have data.
- allow.multiline.record - allows records to span multiple lines in the file.
- decimal.overflow - round decimal values to fit the target precision and scale.
- field.indexes - reorder the fields in the source file to match the target column.
- ignore.extra.fields - useful for when the number of source fields do not match the number of target columns.