Creating and inserting Parquet v1 tables in Presto (Java)

This topic describes how to control the Parquet version that Presto (Java) uses when ingesting data. By default, Presto (Java) writes data in Parquet v2 format for Iceberg tables and in ORC format for Hive tables. This default is a problem if the tables must also be read from Presto (C++), because Presto (C++) can read only tables in Parquet v1 format.

watsonx.data on IBM Software Hub

About this task

Presto (C++) engines support only the Parquet v1 format and cannot read Parquet v2 tables created by Presto (Java). Therefore, you must set the session property <catalog_name>.parquet_writer_version to PARQUET_1_0 before ingesting data with a Presto (Java) engine.

Procedure

  1. Log in to the Presto (Java) CLI. For instructions on connecting to a Presto (Java) server from the Presto (Java) CLI, see Connecting to a Presto (Java) server.
    bin/presto-cli.sh --catalog <catalog_name>
  2. Set the session property before ingesting data into a table:
    set session <catalog_name>.parquet_writer_version = 'PARQUET_1_0';
  3. Run the following command to create a table by using CREATE TABLE AS SELECT (CTAS):
    create table <catalog_name>.<schema_name>.<table_name> as (select * from tpch.tiny.customer limit 10);
    Note: Setting the format_version property during CREATE TABLE does not influence the Parquet version. You must set the session property before ingesting data, including when you insert into an existing Parquet v2 table.
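
The procedure above can be sketched as a single Presto (Java) CLI session. This is a minimal illustration rather than a definitive script. Replace the placeholders with your own names. The show session check and the insert statement are additions that do not appear in the steps above. <existing_table_name> is a hypothetical placeholder for a table that is assumed to exist already with columns matching tpch.tiny.customer.

    -- Write Parquet v1 files for the remainder of this session.
    set session <catalog_name>.parquet_writer_version = 'PARQUET_1_0';

    -- Optional check (not part of the original steps): list the session
    -- properties and their current values to confirm that the setting took effect.
    show session;

    -- Create a new table with CTAS. The data files are written as Parquet v1.
    create table <catalog_name>.<schema_name>.<table_name> as (select * from tpch.tiny.customer limit 10);

    -- Insert into an existing table (assumed to exist with a matching schema).
    -- The session property set above also applies to this write.
    insert into <catalog_name>.<schema_name>.<existing_table_name> select * from tpch.tiny.customer limit 10;

A session property applies only to the session in which it is set, so run the set session statement again in every new CLI session before ingesting data.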