IBM Support

PI92937: SELECT ON PARQUET TABLE MAY FAIL WITH 'COULD NOT READ DATA PAGE BECAUSE PAGE HEADER EXCEEDED MAXIMUM SIZE OF 8.00 MB'


APAR status

  • Closed as fixed if next.

Error description

  • A SELECT from a Parquet table with large data values may fail
    with:
    SQL5105N The statement failed because a Big SQL component
    encountered an error. Component receiving the error:
    "BIGSQL Native IO". Component returning the error:
    "BIGSQL Native IO". Log entry identifier: "[NRL-003-27b174de ]".
    SQLSTATE=58040
    The native reader log shows:
    E1205 17:20:29.406167 4611 bi-dfs-reader.cc:494]
    [NRL-003-27b174de ] SQL CODE -5105: Error calling GetNext on
    Scan node (couldn't deserialize thrift msg: No more data to read.
    ParquetScanner: could not read data page because page header
    exceeded maximum size of 8.00 MB)
    The cause is that the C++ reader limits the Parquet page header
    to a fixed maximum of 8 MB, so any attempt to read a page with a
    larger header (for example, because of a large string column)
    fails.
    The fix makes the maximum header size configurable, with a
    default of 8 MB (8388608 bytes). It adds support for a new
    configuration parameter, dfsio.max_page_header_size, which must
    be added to the bigsql-conf.xml file on all nodes, after which
    Big SQL must be restarted:
    <property>
        <name>dfsio.max_page_header_size</name>
        <value>8388608</value>
    </property>
    Once this property is present on all nodes, changes made on the
    head node are propagated to the workers when Big SQL is
    restarted, so the maximum value can then be set as needed.
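
The property edit described above can be scripted. The following is a minimal sketch, not an IBM-supplied tool: it assumes bigsql-conf.xml is a Hadoop-style file whose root element holds <property> entries with <name> and <value> children, as in the snippet above. The function names mib_to_bytes and set_max_page_header_size are hypothetical helpers introduced for illustration, and the file location must be supplied by the caller.

```python
import xml.etree.ElementTree as ET

# Property name taken from the APAR text.
PROPERTY_NAME = "dfsio.max_page_header_size"


def mib_to_bytes(mib: int) -> int:
    """Convert a size in MB (2**20 bytes) to bytes; 8 gives the default 8388608."""
    return mib * 1024 * 1024


def set_max_page_header_size(conf_path: str, size_bytes: int) -> None:
    """Add the property to the file, or update its value if already present.

    Assumption: the root element directly contains <property> entries.
    """
    tree = ET.parse(conf_path)
    root = tree.getroot()
    for prop in root.findall("property"):
        if prop.findtext("name") == PROPERTY_NAME:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = str(size_bytes)
            break
    else:
        # Property not found: append a new entry.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = PROPERTY_NAME
        ET.SubElement(prop, "value").text = str(size_bytes)
    tree.write(conf_path, encoding="utf-8", xml_declaration=True)
```

For example, set_max_page_header_size(path, mib_to_bytes(16)) would raise the limit to 16 MB, where path points at the local copy of bigsql-conf.xml and 16 is only an illustrative value. ElementTree does not preserve XML comments by default, so back up the file first; Big SQL still has to be restarted afterward, as described above.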
    

Local fix

  • As a workaround, use the Java reader instead, although this may
    have performance implications:
    db2 "SET DFS_EXTERNAL_INPUT_LIBRARY = 'JAVA'"
    <run select>
    db2 "SET DFS_EXTERNAL_INPUT_LIBRARY = NULL"
    

Problem summary

  • See the error description above.
    

Problem conclusion

Temporary fix

Comments

APAR Information

  • APAR number

    PI92937

  • Reported component name

    INFO BIGINSIGHT

  • Reported component ID

    5725C0900

  • Reported release

    425

  • Status

    CLOSED FIN

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2018-01-25

  • Closed date

    2020-09-09

  • Last modified date

    2020-09-09

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels

  • Product: IBM Db2 Big SQL
  • Version: 425
  • Platform: Platform Independent
  • Line of Business: Data and AI
  • Business Unit: IBM Software w/o TPS

Document Information

Modified date:
10 September 2020