IBM Support

Technical Service Bulletin (2022-640) - Apache Hive job fails with large partitioned tables

Flashes (Alerts)


Abstract

Queries against large partitioned tables may encounter a “TTransportException: MaxMessageSize reached” exception from the Apache Thrift library during the query compilation phase.

Content

This is likely to happen when the table has thousands of partitions and the query either contains no partition-pruning filter conditions or contains such filters but they still select a large number of partitions.
This issue is fixed by HIVE-26633.

Components affected: 

  • Apache Hive

Products affected: 

  • Cloudera Data Platform (CDP) Private Cloud Base 7.1.8

Releases affected: 

  • CDP Private Cloud Base 7.1.8

Severity:

  • High

Impact: 

  • Users upgrading from a lower version of CDP Private Cloud Base to 7.1.8 may experience a regression: certain queries against large partitioned tables can fail.

Action required:

Option 1 - Workaround:
  • Modify the query by adding partition filters that select fewer partitions.
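
As an illustration of the workaround, the following minimal Python sketch builds a HiveQL query restricted to a narrow range of partition values. The table name `sales`, the partition column `dt`, and the helper `build_pruned_query` are hypothetical and do not come from this bulletin; how narrow the range must be depends on the table.

```python
# Hypothetical helper: builds a HiveQL SELECT that limits the scan to a
# closed range of partition values, so fewer partitions are selected
# during query compilation.
# Illustrative only: values are interpolated directly, so pass trusted input.
def build_pruned_query(table, partition_column, start, end, columns="*"):
    return (
        f"SELECT {columns} FROM {table} "
        f"WHERE {partition_column} >= '{start}' "
        f"AND {partition_column} <= '{end}'"
    )


# Assumed example: a table 'sales' partitioned by a date string column 'dt'.
query = build_pruned_query("sales", "dt", "2022-12-01", "2022-12-07")
print(query)
```

Running several such range-limited queries in place of one unfiltered query keeps the number of partitions selected per query small.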
Option 2 - Hotfix:
  • Apply 7.1.8 Cumulative Hotfix (CHF) 2 on the CDP Private Cloud Base 7.1.8 installation.
  • CHF 2 introduces a new configuration option, hive.thrift.client.max.message.size. This option sets the Apache Thrift client configuration for maximum message size. Its default value is 1 GB. If necessary, this value can be increased up to a maximum of 2147483648 bytes (or 2 GB).
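
The limits described above can be checked before a new value is applied. The following minimal Python sketch validates a proposed size against the 2 GB maximum stated in this bulletin and renders it as a standard Hadoop-style property entry. The helper name `render_property` and the 1.5 GB example value are assumptions for illustration; this bulletin does not state where the property should be set or whether a unit suffix is accepted, so the sketch uses a plain byte count.

```python
PROPERTY = "hive.thrift.client.max.message.size"
DEFAULT_MAX_MESSAGE_SIZE = 1024 ** 3   # 1 GB default, per this bulletin
UPPER_LIMIT = 2147483648               # 2 GB maximum, per this bulletin


# Hypothetical helper: rejects values outside the documented range and
# returns a Hadoop-style <property> entry for the accepted value.
def render_property(size_bytes):
    if not 0 < size_bytes <= UPPER_LIMIT:
        raise ValueError(
            f"{PROPERTY} must be between 1 and {UPPER_LIMIT} bytes, "
            f"got {size_bytes}"
        )
    return (
        "<property>\n"
        f"  <name>{PROPERTY}</name>\n"
        f"  <value>{size_bytes}</value>\n"
        "</property>"
    )


# Assumed example value: 1.5 GB, between the default and the maximum.
print(render_property(3 * 1024 ** 3 // 2))
```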

Addressed in release/refresh/patch:   

  • CDP Private Cloud Base 7.1.8 CHF 2


Document Information

Modified date:
27 December 2022

UID

ibm16851713