Configuring an Analytics Engine

You can configure an IBM Analytics Engine instance to connect to an IBM® watsonx.data instance by setting the watsonx.data and Spark-related configurations as the default configuration for the IBM Analytics Engine instance.

Prerequisites

Ensure that the following instances are up and running:
  • watsonx.data instance
  • Analytics Engine serverless instance

Ensure that you have the Admin or Metastore Admin privilege to submit Spark jobs to watsonx.data. For more information, see Managing access to metastore.

Ensure that a bucket is associated with the Hive Metastore (HMS). For more information, see Adding a bucket-catalog pair.

You can configure an Analytics Engine instance with default settings in one of the following ways:

Configuring an Analytics Engine instance by using IBM Cloud console

To configure your Analytics Engine instance from the IBM Cloud Resource list, complete the following steps:
  1. Log in to your IBM Cloud account.
  2. Access the IBM Cloud Resource list.
  3. Search for your Analytics Engine instance and click the instance to see the details.
  4. Click Manage > Configuration to view the configuration.
  5. In the Default Spark configuration section, click Edit.
  6. Add the following configuration to the Default Spark configuration section.
    {
    "spark.sql.catalogImplementation": "hive",
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.iceberg.vectorization.enabled": "false",
    "spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.lakehouse.type": "hive",
    "spark.sql.catalog.lakehouse.uri": "<public_IP_address>:<nodeport>",
    "spark.hive.metastore.client.auth.mode": "PLAIN",
    "spark.hive.metastore.client.plain.username": "<metastore_admin_user>",
    "spark.hive.metastore.client.plain.password": "<metastore_admin_password>",
    "spark.hive.metastore.use.SSL": "true",
    "spark.hive.metastore.truststore.type": "JKS",
    "spark.hive.metastore.truststore.path": "file:///home/spark/shared/user-libs/wxd-library/custom/truststore.jks",
    "spark.hive.metastore.truststore.password": "<truststore_password>"
    }

    Parameters:

    <public_IP_address>:<nodeport> - Public IP address and node port of the watsonx.data instance.

    <metastore_admin_user> - watsonx.data cluster metastore-admin user.

    To submit Spark jobs to watsonx.data, you must have the Admin or Metastore Admin privilege in watsonx.data. The Metastore Admin role is preferred. For more information, see Managing access to metastore.

    <metastore_admin_password> - watsonx.data cluster metastore-admin password.

    <truststore_password> - truststore.jks password.
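If the same settings are reused across several instances, generating the JSON can be less error-prone than editing it by hand. The following is a minimal sketch in Python, not an IBM-provided tool. The helper name build_default_config and the example values are assumptions made for illustration; the keys and fixed values are copied from the configuration above. The check for unfilled placeholders is a simple heuristic that looks for text such as <nodeport> in any value.

```python
import json
import re

# Settings that stay the same for every instance (copied from the configuration above).
STATIC_SETTINGS = {
    "spark.sql.catalogImplementation": "hive",
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.iceberg.vectorization.enabled": "false",
    "spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.lakehouse.type": "hive",
    "spark.hive.metastore.client.auth.mode": "PLAIN",
    "spark.hive.metastore.use.SSL": "true",
    "spark.hive.metastore.truststore.type": "JKS",
    "spark.hive.metastore.truststore.path": "file:///home/spark/shared/user-libs/wxd-library/custom/truststore.jks",
}

# Heuristic: matches template markers such as <nodeport> that were never filled in.
UNFILLED = re.compile(r"<[A-Za-z_]+>")


def build_default_config(metastore_uri, username, password, truststore_password):
    """Hypothetical helper: return the default Spark configuration as a dict."""
    config = dict(STATIC_SETTINGS)
    config.update({
        "spark.sql.catalog.lakehouse.uri": metastore_uri,
        "spark.hive.metastore.client.plain.username": username,
        "spark.hive.metastore.client.plain.password": password,
        "spark.hive.metastore.truststore.password": truststore_password,
    })
    # Refuse to produce a configuration that still contains a template marker.
    leftover = sorted(key for key, value in config.items() if UNFILLED.search(value))
    if leftover:
        raise ValueError("unfilled placeholder in: " + ", ".join(leftover))
    return config


if __name__ == "__main__":
    # Example values only; 203.0.113.10 is a reserved documentation address.
    example = build_default_config(
        "203.0.113.10:31000",
        "example-admin",
        "example-password",
        "example-truststore-password",
    )
    print(json.dumps(example, indent=2))
```

The printed JSON can be pasted into the Default Spark configuration section. Using json.dumps rather than string formatting means that quotation marks or backslashes in a password are escaped correctly.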

Configuring an Analytics Engine instance by using Analytics Engine API

To configure your IBM Analytics Engine instance from the Analytics Engine API, complete the following steps:
  1. Generate an IAM token to connect to the Analytics Engine API. For more information about how to generate an IAM token, see IAM token.
  2. Run the API to set instance default configuration:
    curl -X PATCH --location --header "Authorization: Bearer {IAM_TOKEN}" --header "Accept: application/json" --header "Content-Type: application/merge-patch+json" --data '{
    <CONFIGURATION_DETAILS>
    }' "{BASE_URL}/v3/analytics_engines/{INSTANCE_ID}/default_configs"
    Parameters:

    IAM_TOKEN - The IAM token that you generated for the Analytics Engine API.

    INSTANCE_ID - The Analytics Engine instance ID. For more information, see Obtaining the service endpoints.

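    CONFIGURATION_DETAILS - The following key-value pairs. The curl command already supplies the enclosing braces, so paste only the lines between them: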
    {
    "spark.sql.catalogImplementation": "hive",
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.iceberg.vectorization.enabled": "false",
    "spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.lakehouse.type": "hive",
    "spark.sql.catalog.lakehouse.uri": "<public_IP_address>:<nodeport>",
    "spark.hive.metastore.client.auth.mode": "PLAIN",
    "spark.hive.metastore.client.plain.username": "<metastore_admin_user>",
    "spark.hive.metastore.client.plain.password": "<metastore_admin_password>",
    "spark.hive.metastore.use.SSL": "true",
    "spark.hive.metastore.truststore.type": "JKS",
    "spark.hive.metastore.truststore.path": "file:///home/spark/shared/user-libs/wxd-library/custom/truststore.jks",
    "spark.hive.metastore.truststore.password": "<truststore_password>"
    }
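The same PATCH request can be assembled programmatically. The following is a minimal sketch that uses only the Python standard library; the helper name build_patch_request is an assumption made for illustration, and the URL path and headers are taken from the curl command above. The sketch builds the request without sending it, so the URL and body can be inspected first.

```python
import json
import urllib.request


def build_patch_request(base_url, instance_id, iam_token, config):
    """Hypothetical helper: build (but do not send) the default_configs PATCH request."""
    url = f"{base_url.rstrip('/')}/v3/analytics_engines/{instance_id}/default_configs"
    # The body is a single JSON object of Spark settings.
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {iam_token}",
            "Accept": "application/json",
            "Content-Type": "application/merge-patch+json",
        },
    )


if __name__ == "__main__":
    # Example values only; replace them with your own endpoint, ID, token, and settings.
    request = build_patch_request(
        "https://example.invalid",
        "example-instance-id",
        "example-iam-token",
        {"spark.sql.catalogImplementation": "hive"},
    )
    print(request.get_method(), request.full_url)
    # To send the request, pass it to urllib.request.urlopen(request).
```

Serializing the whole dictionary with json.dumps produces exactly one enclosing pair of braces, which avoids the nesting mistake that is easy to make when pasting JSON into a shell command.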

Configuring an Analytics Engine instance by using Analytics Engine CLI

To specify the configuration settings for your IBM Analytics Engine instance from the CLI, run the following command:
ibmcloud analytics-engine-v3 instance default-configs-update [--id INSTANCE_ID] --body BODY
Parameters:
  • INSTANCE_ID - The Analytics Engine instance ID. For more information, see Obtaining the service endpoints.

  • BODY - Copy and paste the following configuration information:
    {
    "spark.sql.catalogImplementation": "hive",
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.iceberg.vectorization.enabled": "false",
    "spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.lakehouse.type": "hive",
    "spark.sql.catalog.lakehouse.uri": "<public_IP_address>:<nodeport>",
    "spark.hive.metastore.client.auth.mode": "PLAIN",
    "spark.hive.metastore.client.plain.username": "<metastore_admin_user>",
    "spark.hive.metastore.client.plain.password": "<metastore_admin_password>",
    "spark.hive.metastore.use.SSL": "true",
    "spark.hive.metastore.truststore.type": "JKS",
    "spark.hive.metastore.truststore.path": "file:///home/spark/shared/user-libs/wxd-library/custom/truststore.jks",
    "spark.hive.metastore.truststore.password": "<truststore_password>"
    }
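Passing a multi-line JSON document as the BODY argument is sensitive to shell quoting. The following is a minimal sketch in Python that builds the argument list for the command shown above; the helper name build_cli_command is an assumption made for illustration. Running the resulting command assumes that the ibmcloud CLI and its Analytics Engine plug-in are installed and that you are logged in.

```python
import json
import shlex


def build_cli_command(instance_id, config):
    """Hypothetical helper: return the CLI invocation as an argument list."""
    # Serialize the settings to a compact, single-line JSON string for --body.
    body = json.dumps(config)
    return [
        "ibmcloud", "analytics-engine-v3", "instance", "default-configs-update",
        "--id", instance_id,
        "--body", body,
    ]


if __name__ == "__main__":
    # Example values only; replace them with your own instance ID and settings.
    command = build_cli_command(
        "example-instance-id",
        {"spark.sql.catalogImplementation": "hive"},
    )
    # shlex.join quotes each argument so that the line can be pasted into a POSIX shell.
    print(shlex.join(command))
    # To run it directly instead: subprocess.run(command, check=True)
```

Passing the list to subprocess.run avoids shell quoting altogether, because each element is handed to the CLI as a separate argument.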