Integrating with watsonx.data stand-alone (Analytics Engine powered by Apache Spark)

You can integrate Analytics Engine powered by Apache Spark with watsonx.data.

Before you begin

Before you can configure an Analytics Engine powered by Apache Spark instance for watsonx.data, ensure that you have provisioned both a watsonx.data instance and an Analytics Engine powered by Apache Spark instance, and that you have administrator access to the Cloud Pak for Data cluster.

Procedure

To configure an Analytics Engine powered by Spark instance for watsonx.data:

  1. Provision a Cloud Pak for Data storage volume to persist the SSL/TLS certificates for the watsonx.data Hive Metastore (HMS) server. For more information, see Managing storage volumes.

    a. Create a NodePort service. See Accessing Hive Metastore (HMS) using NodePort.
    b. Import the self-signed certificate into a Java truststore. See Importing self-signed certificates from a Hive metastore server to a Java truststore.
    c. Upload the resulting truststore.jks file to the storage volume in Cloud Pak for Data.
  2. Generate an access token to set the Analytics Engine powered by Apache Spark instance default configuration. See Generating an API authorization token.
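    As a sketch of this step (assuming the ZenApiKey scheme, where the token is the base64 encoding of `<username>:<platform_api_key>`), the token can be generated from the command line. The user name and API key below are example values only:

    ```shell
    # Sketch (assumes the ZenApiKey scheme): the token is the base64 encoding
    # of "<username>:<platform_api_key>". The values here are placeholders.
    CPD_USERNAME="admin"
    CPD_API_KEY="apikey123"   # replace with your platform API key
    ACCESS_TOKEN=$(printf '%s:%s' "$CPD_USERNAME" "$CPD_API_KEY" | base64 | tr -d '\n')
    echo "$ACCESS_TOKEN"
    ```

    The resulting value is what the curl command in the next step passes in the Authorization: ZenApiKey header.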

  3. Run the API to set the instance default configuration:

    curl -X PATCH --location \
      --header "Authorization: ZenApiKey <ACCESS_TOKEN>" \
      --header "Accept: application/json" \
      --header "Content-Type: application/merge-patch+json" \
      --data '{
        <CONFIGURATION_DETAILS>
      }' \
      "https://<CloudPakforData_URL>/v4/analytics_engines/<INSTANCE_ID>/default_configs"

  4. CONFIGURATION_DETAILS: Copy the following configuration details and replace the placeholder values:

    • <infrastructure_node_ip>:<nodeport>: The IP address of the infrastructure node for the watsonx.data instance and the NodePort of the HMS service that you created in step 1.
    • <hms-user-from-watsonx.data>: The Cloud Pak for Data cluster admin user, or a user with metastore admin access (for example, admin or metastoreadmin). To grant metastore admin access to a user, see Managing access to the Hive Metastore.
    • <hms-password-from-watsonx.data>: The Cloud Pak for Data cluster admin password.
    • <truststore-password>: The password of the HMS truststore.jks file that you created in step 1.
    • <watsonx-data-hms-certificate-volume>: The name of the storage volume that you created in step 1.

    {
      "spark.sql.catalogImplementation": "hive",
      "spark.driver.extraClassPath": "/opt/ibm/connectors/iceberg-lakehouse/iceberg-3.3.2-1.2.1-hms-4.0.0-shaded.jar",
      "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
      "spark.sql.iceberg.vectorization.enabled": "false",
      "spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
      "spark.sql.catalog.lakehouse.type": "hive",
      "spark.sql.catalog.lakehouse.uri": "thrift://<infrastructure_node_ip>:<nodeport>",
      "spark.hive.metastore.client.auth.mode": "PLAIN",
      "spark.hive.metastore.client.plain.username": "<hms-user-from-watsonx.data>",
      "spark.hive.metastore.client.plain.password": "<hms-password-from-watsonx.data>",
      "spark.hive.metastore.use.SSL": "true",
      "spark.hive.metastore.truststore.type": "JKS",
      "spark.hive.metastore.truststore.path": "file:///<watsonx-data-hms-certificate-volume>/truststore.jks",
      "spark.hive.metastore.truststore.password": "<truststore-password>"
    }
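    As a sketch of the substitution (all values below are example placeholders, not real endpoints or credentials), the configuration can be written to a file with the real values filled in, and then passed to the PATCH request with --data @default_configs.json:

    ```shell
    # Sketch with placeholder values: write the configuration to a file so it
    # can be passed to curl with --data @default_configs.json instead of inline.
    HMS_URI="thrift://10.0.0.1:30831"     # <infrastructure_node_ip>:<nodeport> (example)
    HMS_USER="metastoreadmin"             # <hms-user-from-watsonx.data> (example)
    HMS_PASSWORD="change-me"              # <hms-password-from-watsonx.data> (example)
    TRUSTSTORE_PASSWORD="change-me-too"   # <truststore-password> (example)
    VOLUME="wxd-hms-certs"                # <watsonx-data-hms-certificate-volume> (example)

    cat > default_configs.json <<EOF
    {
      "spark.sql.catalogImplementation": "hive",
      "spark.driver.extraClassPath": "/opt/ibm/connectors/iceberg-lakehouse/iceberg-3.3.2-1.2.1-hms-4.0.0-shaded.jar",
      "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
      "spark.sql.iceberg.vectorization.enabled": "false",
      "spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
      "spark.sql.catalog.lakehouse.type": "hive",
      "spark.sql.catalog.lakehouse.uri": "${HMS_URI}",
      "spark.hive.metastore.client.auth.mode": "PLAIN",
      "spark.hive.metastore.client.plain.username": "${HMS_USER}",
      "spark.hive.metastore.client.plain.password": "${HMS_PASSWORD}",
      "spark.hive.metastore.use.SSL": "true",
      "spark.hive.metastore.truststore.type": "JKS",
      "spark.hive.metastore.truststore.path": "file:///${VOLUME}/truststore.jks",
      "spark.hive.metastore.truststore.password": "${TRUSTSTORE_PASSWORD}"
    }
    EOF
    ```

    After the PATCH succeeds, you can confirm the applied defaults by issuing a GET request against the same default_configs endpoint (assuming your Analytics Engine API version supports it).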
    

Parent topic: Configuring an Analytics Engine powered by Apache Spark instance for watsonx.data