Querying Cloudera tables using Presto engine

After setting up the Cloudera integration, you can query Hive tables stored in Cloudera HDFS from watsonx.data using the Presto engine. This guide covers query operations and examples.

Before you begin

For information about setting up the integration, see Setting up Cloudera integration with Presto engine.

Ensure that you have completed the setup process:

  • HDFS storage component is configured in watsonx.data
  • Catalog is created and associated with Presto engine
  • Tables are created in Cloudera Hue editor

Procedure

After creating tables in Cloudera, they will be automatically available in watsonx.data once the catalog is configured.

  1. Navigate to Query Workspace in watsonx.data.
  2. Select your configured catalog.
  3. Expand the schema to see available tables.
  4. Query all schemas from the catalog:
    SHOW SCHEMAS FROM <catalog_name>;

    Example output:

    Schema
    --------------------------
    default
    employee_db
    sales_db
    information_schema
  5. Query tables in a specific schema:
    SHOW TABLES FROM <catalog_name>.<schema_name>;

    Example output:

    Table
    -----------------
    employee
    department
    salary_history
  6. View the structure of a table:
    DESCRIBE <catalog_name>.<schema_name>.<table_name>;

    Example output:

    Column       | Type                  | Extra | Comment
    -------------|-----------------------|-------|--------
    id           | integer               |       |
    name         | varchar               |       |
    department   | varchar               |       |
    salary       | decimal(10,2)         |       |
  7. Query data from Hive tables:
    SELECT * FROM <catalog_name>.<schema_name>.<table_name> LIMIT 10;

    Example output:

    +---------------+--------------+------------+-----------------+
    | id            | name         | department | salary          |
    +---------------+--------------+------------+-----------------+
    | 1             | John Doe     | IT         | 75000.00        |
    | 2             | Jane Smith   | HR         | 65000.00        |
    | 3             | Bob Johnson  | Finance    | 80000.00        |
    +---------------+--------------+------------+-----------------+
  8. Count rows in a table:
    SELECT COUNT(*) FROM <catalog_name>.<schema_name>.<table_name>;
  9. Perform complex queries with filtering and aggregation:
    SELECT
        department,
        COUNT(*) as employee_count,
        AVG(salary) as avg_salary
    FROM <catalog_name>.<schema_name>.<table_name>
    WHERE salary > 60000
    GROUP BY department
    ORDER BY avg_salary DESC;

Results

You can now query Hive tables from Cloudera HDFS using Presto. The queries execute directly on the data in HDFS without copying data into watsonx.data.

What to do next

Related information