Querying Cloudera tables using Presto engine

After setting up the Cloudera integration, you can query Hive tables stored in Cloudera HDFS from watsonx.data using the Presto engine. This guide covers query operations and examples.

Before you begin

For information about setting up the integration, see Setting up Cloudera integration with Presto engine.

Ensure that you have completed the setup process:

HDFS storage component is configured in watsonx.data
Catalog is created and associated with Presto engine
Tables are created in Cloudera Hue editor

Procedure

After creating tables in Cloudera, they will be automatically available in watsonx.data once the catalog is configured.

Navigate to Query Workspace in watsonx.data.
Select your configured catalog.
Expand the schema to see available tables.

Query all schemas from the catalog:

SHOW SCHEMAS FROM <catalog_name>;

Example output:

Schema
--------------------------
default
employee_db
sales_db
information_schema

Query tables in a specific schema:

SHOW TABLES FROM <catalog_name>.<schema_name>;

Example output:

Table
-----------------
employee
department
salary_history

View the structure of a table:

DESCRIBE <catalog_name>.<schema_name>.<table_name>;

Example output:

Column       | Type                  | Extra | Comment
-------------|-----------------------|-------|--------
id           | integer               |       |
name         | varchar               |       |
department   | varchar               |       |
salary       | decimal(10,2)         |       |

Query data from Hive tables:

SELECT * FROM <catalog_name>.<schema_name>.<table_name> LIMIT 10;

Example output:

+---------------+--------------+------------+-----------------+
| id            | name         | department | salary          |
+---------------+--------------+------------+-----------------+
| 1             | John Doe     | IT         | 75000.00        |
| 2             | Jane Smith   | HR         | 65000.00        |
| 3             | Bob Johnson  | Finance    | 80000.00        |
+---------------+--------------+------------+-----------------+

Count rows in a table:

SELECT COUNT(*) FROM <catalog_name>.<schema_name>.<table_name>;

Perform complex queries with filtering and aggregation:

SELECT
    department,
    COUNT(*) as employee_count,
    AVG(salary) as avg_salary
FROM <catalog_name>.<schema_name>.<table_name>
WHERE salary > 60000
GROUP BY department
ORDER BY avg_salary DESC;

Results

You can now query Hive tables from Cloudera HDFS using Presto. The queries execute directly on the data in HDFS without copying data into watsonx.data.

What to do next

Related information