Big SQL memory error during LOAD

Big SQL generates a memory error during a LOAD operation when the size of the input data exceeds the memory available for the LOAD.

Symptoms

The Big SQL LOAD operation fails with a memory exception: OutOfMemoryError.

Resolving the problem

The amount of memory required by a LOAD depends on several factors: the number of partitions in the target table, the size of the input data set, and the number of map and reduce tasks (either specified in the LOAD statement or derived from the number of splits in the data source). To work around this problem, split the input data set into two or more smaller sets and load each set separately.
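For example, if the original LOAD reads one large directory, the input can be staged as two or more smaller directories and appended to the same table in separate statements. The following is a minimal sketch only; the paths, delimiter, schema, and table name are placeholders and should be replaced with your own:

  -- Sketch: load the same table from two smaller input sets instead of one
  -- large one. Paths, delimiter, and table name are illustrative.
  LOAD HADOOP USING FILE URL '/user/bigsql/staging/sales_part1'
    WITH SOURCE PROPERTIES ('field.delimiter' = ',')
    INTO TABLE myschema.sales APPEND;

  LOAD HADOOP USING FILE URL '/user/bigsql/staging/sales_part2'
    WITH SOURCE PROPERTIES ('field.delimiter' = ',')
    INTO TABLE myschema.sales APPEND;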
You can also inspect the following property values to determine whether memory can be further tuned; a sketch of adjusting some of them for the LOAD session follows the list:
  • YARN containers memory set in Ambari > YARN > Configs
  • Map task memory: mapreduce.map.java.opts
  • Heap allocated for Hadoop: HADOOP_HEAPSIZE
  • Memory allocated for data buffered during read operations: io.file.buffer.size
  • File sort buffer memory: mapreduce.task.io.sort.mb
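
The YARN container memory and HADOOP_HEAPSIZE settings are changed through Ambari (or the cluster's Hadoop environment configuration) and require the affected services to be restarted. The MapReduce and I/O properties can, on many configurations, be overridden for the current session before running the LOAD. The sketch below assumes the session is allowed to override these properties; the values are examples only and should be sized to the memory actually available in the YARN containers on your cluster:

  -- Sketch: override Hadoop properties for the current session before a LOAD.
  -- Values are illustrative; size them to the available YARN container memory.
  SET HADOOP PROPERTY 'mapreduce.map.java.opts' = '-Xmx2048m';
  SET HADOOP PROPERTY 'io.file.buffer.size' = '131072';
  SET HADOOP PROPERTY 'mapreduce.task.io.sort.mb' = '256';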