Connecting to a Presto server

Presto CLI provides a terminal-based interactive shell to run queries. You can connect to the Presto server either through Presto CLI installed as part of the ibm-lh-client package or through Presto CLI installed separately.

To connect to a Presto Server from a client program or CLI, the following items are required:

  • Hostname and port for the Presto server or workstation where the watsonx.data™ developer or software is installed.
  • Certificates served by the Presto server to establish trust.
  • Authorized user credentials to access the Presto server.
Note: It is important to connect to the Presto server that uses its hostname and not its IP address. Because the TLS certificate that is served by the Presto Server is associated with a fully qualified host and domain-name (FQDN). Client programs, typically, cannot establish trust by using the IP address. With OpenShift® Ingress in particular, DNS entries, based on hostnames, play an important role in routing to the intended Kubernetes Service.
Tip: To confirm there is network access from your client workstation that needs to connect a Presto server you can test the access by using one of the following commands:
 curl -ki https://<presto-hostname>:<presto-portnumber>
nc -v <presto-hostname> <presto-portnumber>
  • The Presto server exposes a HTTPs (TCP) port. Therefore, you can use any convenient HTTP or TCP-based utility to ensure that the Presto server can be accessed from your network.
  • When you run the curl command, the server may return an HTTP/1.1 401 Unauthorized response. This is expected as the server is secured by authentication.
  • When you run the nc command, the server returns a success message.

In watsonx.data, you can connect to the Presto server in multiple ways based on the platform and utilities you are using. See the following sections for more details:

Important: You must specify the location when creating schema by using CLI. For example,
location = s3a://<storage-name>/

Using presto-cli executable (remote) - software

This section provides instructions to connect the Presto server in watsonx.data software edition from a remote presto-cli.

Before you get started, ensure the following:
  • The Red Hat® OpenShift Project namespace administrator or cluster administrator has exposed a route to the Presto (Java) server. The administrators should refer to Exposing secure route to Presto server for instructions.

  • You have received the hostname and port (by default 443) associated with the specific Red Hat OpenShift route from the administrator.

  1. Import the certificate of the Presto server to establish trust. You can import the certificate either by using the cert-mgmt utility or manually. Based on your setup, complete one of the following steps:

    Note: Presto servers in watsonx.data use TLS for securing access from clients. Sometimes, the Presto server may be associated with a Self-signed certificate instead of a certificate authority (CA) signed certificate. In such cases, the client programs that connect via the HTTPs (TLS) protocol must be configured to trust the self-signed certificate configured in the Presto Server.
    1. If you are connecting from the watsonx.data utilities, go to bin/cert-mgmt and run the following command:
      bin/cert-mgmt --op=add --host=<hostname of the Presto server> --port=443
      You see the following output:
      /mnt/infra/tls/truststore.jks updated
      	/mnt/infra/tls/cabundle.crt updated
      Note: The ibm-lh-client installation includes utilities and packages to connect to Presto engines in watsonx.data software deployments. The client installation includes the bin/cert-mgmt executable that simplifies fetching the certificate that is used with a Presto server. It stores such certificates in a Java (JKS) format truststore for Java based clients and also in PEM format for use with nonjava clients.

      The presto-cli and presto-run watsonx.data (ibm-lh-client) utilities are Java-based and automatically use this JKS truststore. JDBC programs (such as those run via the bin/dev-sandbox utility) must be configured to explicitly use this JKS truststore. The cabundle.crt must be used by non-Java programs, such as Python scripts.

    2. If you are connecting from programs or utilities that run outside the watsonx.data client installation, do one of the following:

      i. From your client machine, point or copy the truststore or certificate file present in the ibm-lh-client installation directory. The certificates are at ibm-lh-client/localstorage/volumes/infra/tls/. In this location, you can find:

      truststore.jks- can be used for Java/JDBC based applications.

      cabundle.crt - can be used for non-Java applications.

      aliases/ - is a subdirectory with individual certificates. For example, aliases/<Presto server hostname>:443.crt refers to the certificate for a specific Presto server only.

      ii. Manually import certificates into the truststore. For instructions, see Importing self-signed certificates from a Presto server to a Java truststore.

  2. Authenticate with the Presto server.

    ./presto --server https://<route_name> --truststore-path /<directory>/<certfilename/truststore>.jks --truststore-password changeit --catalog iceberg_data --user admin --password
    Note:
    • The default out-of-the-box user for watsonx.data software is ‘admin’ with a generated password that is accessible by the ibm-lakehouse-manage get-cpd-instance-details command after installation.

    • -—truststore-path and -—truststore-password arguments are optional if the certificates were imported into the default JDK/JRE trust store already

    The Administrator of the watsonx.data instance can configure different Identity Providers to authenticate users or grant them appropriate Admin or User Roles. See the “Managing Users” topic for more details.

    In addition to directly using their credentials as username and password properties to connect to the Presto server, users can choose an API Key as well.

  3. When connecting to the watsonx.data Presto servers, the credential properties to set would then be:

    username: username, password: <password>

    Or

    username: ibmlhapikey, password:<base64 of username:apikey>

    Or

    username: ibmlhtoken, password:<platform token> or < instance token >

Using Java/JDBC – software

This section provides instructions to connect to a Presto server from a Java JDBC application.

Before you get started, ensure the following:
  • the Red Hat OpenShift Project namespace administrator or cluster administrator has exposed the route to the Presto server. The administrators should refer to Exposing secure route to Presto server for instructions.

  • you have the received the hostname and port (by default 443) associated with that specific Red Hat OpenShift route from the administrator.

To connect with watsonx.data Presto server from a Java JDBC application, complete the following steps:

  1. If required, complete the instructions in Using presto-cli executable (remote) - software to get the hostname, port, server certificate, authentication, and test the connectivity.

  2. Download and install the presto-jdbc-0.286.jar (or later) on the client machine.

  3. Add presto-jdbc-0.286.jar (or later) to the class path of your Java application.

  4. Create a Java application by using JDBC interface. Following is a sample JDBC code.

    Note:
    • presto_url - Identifies the jdbc URL to the Presto server

    • SSLTrustStorePath and SSLTrustStorePassword properties are set to identify the location of the truststore and its password (if necessary).

    • Values for user and password properties are the watsonx.data software edition username and password.

    import java.sql.*;
    import java.util.Properties;
    
    public class PrestoJdbcSample
    {
    
        public static void main(String[] args) throws Exception
        {
    
            /**
             * example of fetching the location and credentials needed to connect, from environment variables
             **/
            String username = System.getenv("ENG_USERNAME");
            String password = System.getenv("ENG_PASSWORD");
            String hostname = System.getenv("ENG_HOST");
            String portnumber = System.getenv("ENG_PORT");
    
            String presto_url = "jdbc:presto://" + hostname + ":" + portnumber;
    
            String ts_location =  System.getenv("TRUSTSTORE");
                // example: “ibm-lh-client/localstorage/volumes/infra/tls/truststore.jks";
    
            String ts_password =  System.getenv("TRUSTSTORE_PASSWORD");
            // example: "changeit";
    
            Connection connection = null;
            Statement statement = null;
            ResultSet resultSet = null;
    
            try
            {
                /** load the Presto JDBC Driver class **/
                String driverClass = "com.facebook.presto.jdbc.PrestoDriver";
                Class.forName(driverClass);
    
                /** Set the connection properties**/
                Properties properties = new Properties();
                properties.setProperty("user", username);
                properties.setProperty("password", password);
                properties.setProperty("SSL", "true");
    
                /**
                 * identify where the java truststore is located.
                 * skip if the default JDK truststore already has the Presto Server's certificate
                 * or if the Presto server has signed certificates
                 **/
                properties.setProperty("SSLTrustStorePath", ts_location);
                properties.setProperty("SSLTrustStorePassword", ts_password);
    
                /** Connect **/
                connection = DriverManager.getConnection(presto_url, properties);
                /** Issue a Query **/
                String query = "SELECT * FROM tpch.tiny.customer LIMIT 10";
                statement = connection.createStatement();
                resultSet = statement.executeQuery(query);
    
                /** iterate through the results **/
                while (resultSet.next())
                {
                    String phone = resultSet.getString("phone");
                    String name = resultSet.getString("name");
                    System.out.println("phone = " + phone + ", name = " + name);
                }
            }
            catch (Exception e)
            {
                e.printStackTrace();
            }
            finally
            {
                /** clean up at the end always **/
                if (resultSet != null)
                {
                    resultSet.close();
                }
                if (statement != null)
                {
                    statement.close();
                }
                if (connection != null)
                {
                    connection.close();
                }
            }
    
        }
    }
    

Using Python scripts – software

This section provides instructions to connect to a Presto server from a Python client.

Before you get started, ensure the following:
  • the Red Hat OpenShift Project namespace administrator or cluster administrator has exposed the route to the Presto Server. The administrators should refer to Exposing secure route to Presto server for instructions.

  • you have the received the hostname and port (by default 443) associated with that specific Red Hat OpenShift route from the administrator.

To connect with watsonx.data Presto server from a Python client, complete the following steps:

  1. Install python 3.x (3.10 or later recommended) and pip3 on your client workstation.

  2. Install common python modules to connect to Presto.

    pip3 install SQLAlchemy 'pyhive[presto]' presto-python-client
  3. If required, complete the instructions in Using presto-cli executable (remote) - software to get the hostname, port, server certificate, authentication, and test the connectivity.

  4. Start the sandbox container for the registered Presto engine.

    ibm-lh-client/bin/dev-sandbox --engine=demo-b
  5. In the bash prompt, install the prestodb module.

    export HOME=/tmp
    pip3 install SQLAlchemy 'pyhive[presto]' presto-python-client
    Important: Steps 4 and 5 are required only if you are not using the Prestodb module in ibm-lh-client. Also, currently due to an issue with the Prestodb module in ibm-lh-client, you must complete steps 4 and 5.
  6. Use the DBAPI interface to query the Presto server. Following is a sample Python script:

    Note:
    • See how the "ENG_" environment variables are examples of how the relevant Presto server locations can be used with the dbapi.connect function call.

    • An example of how to point to the location of the Presto server’s certificate file is represented by the cert_location variable.

    • The conn._http_session.verify = cert _location property is used to point to the certificate location (for self-signed cases only).

    import os
    import prestodb
    
    username=os.environ["ENG_USERNAME"]
    password=os.environ["ENG_PASSWORD"]
    hostname=os.environ["ENG_HOST"]
    portnumber=os.environ["ENG_PORT"]
    cert_location='./ibm-lh-client/localstorage/volumes/infra/tls/aliases/' + hostname + ':' + portnumber + '.crt'
    
    with prestodb.dbapi.connect(
    host=hostname,
    port=portnumber,
    user=username,
    catalog='tpch',
    schema='tiny',
    http_scheme='https',
    auth=prestodb.auth.BasicAuthentication(username,password)
    ) as conn:
      conn._http_session.verify = cert_location
      cur = conn.cursor()
      cur.execute('select * from tpch.tiny.customer limit 10')
      rows = cur.fetchall()
      print(rows)