Integrating with watsonx.data on Cloud Pak for Data (Analytics Engine powered by Apache Spark)

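You can integrate Analytics Engine powered by Apache Spark with watsonx.data on Cloud Pak for Data so that Spark applications can work with Iceberg tables that are registered in the watsonx.data Hive Metastore.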

Before you begin

Before you can configure an Analytics Engine powered by Apache Spark instance for watsonx.data, you must:

Configuring an Analytics Engine powered by Apache Spark instance for watsonx.data

To configure an Analytics Engine powered by Apache Spark instance for watsonx.data:

  1. Configure your Analytics Engine powered by Apache Spark instance with your watsonx.data instance:

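    1. Generate an API authorization token and store it in the TOKEN environment variable, which the command in the next step uses to set the default configuration of the Analytics Engine powered by Apache Spark instance. See Generating an API authorization token.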

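    2. Call the API to set the instance default configuration. Replace <CloudPakforData_URL> with the host name of your Cloud Pak for Data cluster (without the https:// prefix) and <INSTANCE_ID> with the ID of your Analytics Engine powered by Apache Spark instance:

      curl -X PATCH --location --header "Authorization: ZenApiKey ${TOKEN}" --header "Accept: application/json" --header "Content-Type: application/merge-patch+json" --data '{
      <CONFIGURATION_DETAILS>
      }' "https://<CloudPakforData_URL>/v4/analytics_engines/<INSTANCE_ID>/default_configs"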
      
  2. For <CONFIGURATION_DETAILS>, copy the key-value pairs from the following configuration (everything between the outer braces, because the command already supplies them) and substitute these values:

    • <hms-thrift-endpoint-from-watsonx.data>: The Hive Metastore (HMS) Thrift endpoint, which you obtain from the watsonx.data instance.
    • <hms-user-from-watsonx.data>: The Cloud Pak for Data cluster admin user or a user with metastore admin access (for example, admin or metastoreadmin). To grant metastore admin access to a user, see Managing access to the Hive Metastore.
    • <hms-password-from-watsonx.data>: The Cloud Pak for Data cluster admin password.

    {
      "spark.sql.catalogImplementation": "hive",
      "spark.driver.extraClassPath": "/opt/ibm/connectors/iceberg-lakehouse/iceberg-3.3.2-1.2.1-hms-4.0.0-shaded.jar",
      "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
      "spark.sql.iceberg.vectorization.enabled": "false",
      "spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
      "spark.sql.catalog.lakehouse.type": "hive",
      "spark.sql.catalog.lakehouse.uri": "thrift://<hms-thrift-endpoint-from-watsonx.data>",
      "spark.hive.metastore.client.auth.mode": "PLAIN",
      "spark.hive.metastore.client.plain.username": "<hms-user-from-watsonx.data>",
      "spark.hive.metastore.client.plain.password": "<hms-password-from-watsonx.data>",
      "spark.hive.metastore.use.SSL": "true",
      "spark.hive.metastore.truststore.type": "JKS",
      "spark.hive.metastore.truststore.path": "file:///opt/ibm/jdk/lib/security/cacerts",
      "spark.hive.metastore.truststore.password": "changeit"
    }
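The substitution can be scripted so that the configuration is not edited by hand. The following is a minimal sketch, not an IBM-provided tool: it writes the complete configuration object above to a file and fills the three placeholders from environment variables. HMS_ENDPOINT, HMS_USER, HMS_PASSWORD, and CONFIG_FILE are hypothetical names, and hms.example.com:9083 is only an illustrative value (9083 is the conventional Hive Metastore Thrift port; use the endpoint that your watsonx.data instance reports).

```shell
#!/usr/bin/env bash
# Sketch: write the default configuration to a JSON file with the
# placeholders filled in. HMS_ENDPOINT, HMS_USER, HMS_PASSWORD and
# CONFIG_FILE are hypothetical variable names; the defaults are examples only.
set -euo pipefail

HMS_ENDPOINT="${HMS_ENDPOINT:-hms.example.com:9083}"   # host:port, no thrift:// prefix
HMS_USER="${HMS_USER:-admin}"                          # for example, admin or metastoreadmin
HMS_PASSWORD="${HMS_PASSWORD:-replace-with-password}"
CONFIG_FILE="${CONFIG_FILE:-hms-config.json}"

# The file holds a password, so create it readable by its owner only.
umask 077

# The unquoted EOF delimiter lets the shell expand ${...} inside the heredoc.
# Values are inserted without JSON escaping: they must not contain " or \.
cat > "${CONFIG_FILE}" <<EOF
{
  "spark.sql.catalogImplementation": "hive",
  "spark.driver.extraClassPath": "/opt/ibm/connectors/iceberg-lakehouse/iceberg-3.3.2-1.2.1-hms-4.0.0-shaded.jar",
  "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
  "spark.sql.iceberg.vectorization.enabled": "false",
  "spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
  "spark.sql.catalog.lakehouse.type": "hive",
  "spark.sql.catalog.lakehouse.uri": "thrift://${HMS_ENDPOINT}",
  "spark.hive.metastore.client.auth.mode": "PLAIN",
  "spark.hive.metastore.client.plain.username": "${HMS_USER}",
  "spark.hive.metastore.client.plain.password": "${HMS_PASSWORD}",
  "spark.hive.metastore.use.SSL": "true",
  "spark.hive.metastore.truststore.type": "JKS",
  "spark.hive.metastore.truststore.path": "file:///opt/ibm/jdk/lib/security/cacerts",
  "spark.hive.metastore.truststore.password": "changeit"
}
EOF
```

The file keeps the outer braces, so it is the complete request body rather than the inner key-value pairs that the inline --data '{...}' form expects.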
    
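The command in the second substep sends the token with the ZenApiKey scheme. Assuming that a ZenApiKey is the Base64 encoding of <username>:<api_key> (confirm this against Generating an API authorization token for your release), a minimal sketch for building the TOKEN variable looks like this. CPD_USERNAME and CPD_API_KEY are hypothetical variable names.

```shell
#!/usr/bin/env bash
# Sketch: build the value used in "Authorization: ZenApiKey ${TOKEN}".
# Assumption: a ZenApiKey is base64("<username>:<api_key>").
# CPD_USERNAME and CPD_API_KEY are hypothetical variable names.
set -euo pipefail

CPD_USERNAME="${CPD_USERNAME:-admin}"
CPD_API_KEY="${CPD_API_KEY:-replace-with-your-api-key}"

# printf (unlike echo) adds no trailing newline, which would otherwise be
# encoded into the token; tr removes any line wrapping added by base64.
TOKEN="$(printf '%s:%s' "${CPD_USERNAME}" "${CPD_API_KEY}" | base64 | tr -d '\n')"
export TOKEN
```

The sketch deliberately does not print the token, because anyone who sees it can call the API as that user.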

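The PATCH request can also be wrapped in a small shell function. This is a sketch under stated assumptions: CPD_HOST, INSTANCE_ID, CONFIG_FILE, ENDPOINT, and patch_default_configs are hypothetical names, TOKEN is assumed to already hold your authorization token, and CONFIG_FILE is assumed to contain the complete configuration object above (including the outer braces) with the placeholders substituted. The headers and the /v4/analytics_engines/<INSTANCE_ID>/default_configs path are taken from the curl command in the procedure.

```shell
#!/usr/bin/env bash
# Sketch: send the default-configuration PATCH request.
# CPD_HOST, INSTANCE_ID, CONFIG_FILE, ENDPOINT and patch_default_configs are
# hypothetical names; TOKEN must already hold the authorization token.
set -euo pipefail

CPD_HOST="${CPD_HOST:-cpd.example.com}"                 # Cloud Pak for Data host, no scheme
INSTANCE_ID="${INSTANCE_ID:-replace-with-instance-id}"  # Spark instance ID
CONFIG_FILE="${CONFIG_FILE:-hms-config.json}"           # complete JSON object

ENDPOINT="https://${CPD_HOST}/v4/analytics_engines/${INSTANCE_ID}/default_configs"

patch_default_configs() {
  # --fail: exit with a non-zero status on HTTP errors (400 and above).
  # --data-binary @file: send the file unchanged, so the JSON needs no shell quoting.
  curl --fail -X PATCH --location \
    --header "Authorization: ZenApiKey ${TOKEN}" \
    --header "Accept: application/json" \
    --header "Content-Type: application/merge-patch+json" \
    --data-binary "@${CONFIG_FILE}" \
    "${ENDPOINT}"
}
```

Run patch_default_configs once TOKEN is set and the configuration file exists. Reading the body from a file avoids the single-quote escaping that the inline --data '{...}' form needs, for example when the password itself contains a quote.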