Big SQL readers and writers

The Big SQL readers and writers form the Big SQL layer that serves as the interface between the relational engine and the external storage layer. To support all data formats that are commonly used in Hadoop, Big SQL has two external table interfaces: one that uses native (C++) readers and writers, and one that uses Java™.

The C++ interface offers better performance but handles only a subset of data formats, including SEQUENCEFILE, RCFILE, AVRO, PARQUET, and TEXTFILE. By using the Java I/O interface, however, Big SQL can read and write data that is encoded in any format that Hive handles; formats such as ORC, and user-defined formats that require custom Hive SerDes, are processed through the Java interface.
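
For example, the file format that is specified in the STORED AS clause of a CREATE HADOOP TABLE statement determines which interface handles a table's I/O. The following sketch is illustrative only; the table and column names are hypothetical:

    -- Parquet is in the subset that the native C++ readers and writers handle
    CREATE HADOOP TABLE sales_parquet (
        id     INT,
        amount DECIMAL(10,2)
    )
    STORED AS PARQUET;

    -- ORC is outside the C++ subset, so I/O for this table goes
    -- through the Java I/O interface
    CREATE HADOOP TABLE sales_orc (
        id     INT,
        amount DECIMAL(10,2)
    )
    STORED AS ORC;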

The Java I/O interface is used in the following scenarios:
  • When the ORC data file format is being used
  • For data file formats that require a custom Hive SerDe
  • With the Analyze utility
  • With table columns that are defined with complex types (ARRAY, STRUCT, MAP), as DATE STORED AS DATE, or as BINARY or VARBINARY. In these cases, the ORC file format is recommended for optimal performance (see the example after this list).
  • With tables that are partitioned on an expression (partition-expression) that evaluates to a DATE value
  • When object storage is being used with Big SQL tables
  • When Big SQL is used with data that is stored in HBase tables
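
The following sketch shows a table definition that is routed through the Java I/O interface because it combines complex types with a DATE STORED AS DATE column. It assumes Hive-style complex-type syntax, and the table and column names are hypothetical:

    -- Complex types (ARRAY, MAP) and DATE STORED AS DATE both require the
    -- Java I/O interface; ORC is the recommended format for these columns
    CREATE HADOOP TABLE events (
        id         INT,
        tags       ARRAY<VARCHAR(20)>,
        attributes MAP<VARCHAR(20), VARCHAR(40)>,
        event_date DATE STORED AS DATE
    )
    STORED AS ORC;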

The readers and writers are independent services that run in the fenced-mode processes (FMPs) of the database. The Big SQL I/O libraries are installed as part of IBM Big SQL.