Configuring Analytics Engine

You can configure an IBM Analytics Engine instance to connect to an IBM® watsonx.data instance. To do so, set the watsonx.data and Spark-related configuration as the default configuration for the IBM Analytics Engine instance.

watsonx.data on IBM Software Hub

You can configure an Analytics Engine instance with default settings in one of the following ways:

Configuring an Analytics Engine instance by using IBM Cloud console

To configure your Analytics Engine instance from the IBM Cloud Resource list, complete the following steps:
  1. Log in to your IBM Cloud account.
  2. Access the IBM Cloud Resource list.
  3. Find your Analytics Engine instance and click the instance to see the details.
  4. Click Manage > Configuration to view the configuration.
  5. In the Default Spark configuration section, click Edit.
  6. Add the following configuration to the Default Spark configuration section:
    {
    "spark.sql.catalogImplementation": "hive",
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.iceberg.vectorization.enabled": "false",
    "spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.lakehouse.type": "hive",
    "spark.sql.catalog.lakehouse.uri": "<public_IP_address>:<nodeport>",
    "spark.hive.metastore.client.auth.mode": "PLAIN",
    "spark.hive.metastore.client.plain.username": "<metastore_admin_user>",
    "spark.hive.metastore.client.plain.password": "<metastore_admin_password>",
    "spark.hive.metastore.use.SSL": "true",
    "spark.hive.metastore.truststore.type": "JKS",
    "spark.hive.metastore.truststore.path": "file:///home/spark/shared/user-libs/wxd-library/custom/truststore.jks",
    "spark.hive.metastore.truststore.password": "<truststore_password>"
    }
    Parameter values:

    <public_IP_address>:<nodeport> - The public IP address and NodePort of the watsonx.data instance.

    <metastore_admin_user> - The watsonx.data cluster metastore-admin user.

    Note:

    To submit Spark jobs to watsonx.data, you must have the Admin or Metastore admin privilege in watsonx.data. Metastore admin is preferred. For more information, see Managing access to metastore.

    <metastore_admin_password> - The password of the watsonx.data cluster metastore-admin user.

    <truststore_password> - The password for truststore.jks.
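As a rough sketch of how the parameter values map into the configuration, the following shell snippet writes the JSON to a file from environment variables. The variable names (WXD_METASTORE_URI and so on), the CHANGE_ME defaults, and the default_configs.json file name are assumptions made for illustration and are not part of the product. Passwords that contain quotation marks or backslashes would need JSON escaping, which this sketch does not attempt.

```shell
# Hypothetical variable names; export real values before running, or edit the defaults.
WXD_METASTORE_URI="${WXD_METASTORE_URI:-CHANGE_ME_HOST:CHANGE_ME_PORT}"
WXD_METASTORE_USER="${WXD_METASTORE_USER:-CHANGE_ME_USER}"
WXD_METASTORE_PASSWORD="${WXD_METASTORE_PASSWORD:-CHANGE_ME_PASSWORD}"
WXD_TRUSTSTORE_PASSWORD="${WXD_TRUSTSTORE_PASSWORD:-CHANGE_ME_TRUSTSTORE_PASSWORD}"
CONFIG_FILE="${CONFIG_FILE:-default_configs.json}"   # hypothetical output file name

# Write the default Spark configuration, substituting the four instance-specific values.
cat > "$CONFIG_FILE" <<EOF
{
  "spark.sql.catalogImplementation": "hive",
  "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
  "spark.sql.iceberg.vectorization.enabled": "false",
  "spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
  "spark.sql.catalog.lakehouse.type": "hive",
  "spark.sql.catalog.lakehouse.uri": "${WXD_METASTORE_URI}",
  "spark.hive.metastore.client.auth.mode": "PLAIN",
  "spark.hive.metastore.client.plain.username": "${WXD_METASTORE_USER}",
  "spark.hive.metastore.client.plain.password": "${WXD_METASTORE_PASSWORD}",
  "spark.hive.metastore.use.SSL": "true",
  "spark.hive.metastore.truststore.type": "JKS",
  "spark.hive.metastore.truststore.path": "file:///home/spark/shared/user-libs/wxd-library/custom/truststore.jks",
  "spark.hive.metastore.truststore.password": "${WXD_TRUSTSTORE_PASSWORD}"
}
EOF

# Warn if any default value was left in place.
if grep -q 'CHANGE_ME' "$CONFIG_FILE"; then
  echo "warning: $CONFIG_FILE still contains CHANGE_ME values" >&2
fi
```

The contents of the generated file can then be pasted into the Default Spark configuration section. Because the file holds credentials in plain text, it is worth restricting its permissions or deleting it afterwards.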

Configuring an Analytics Engine instance by using Analytics Engine API

To configure your IBM Analytics Engine instance from the Analytics Engine API, complete the following steps:
  1. Generate an IAM token to connect to the Analytics Engine API. For more information about how to generate an IAM token, see Granting permissions to users.
  2. Call the API to set the instance default configuration:
    curl -X PATCH --location --header "Authorization: Bearer {IAM_TOKEN}" --header "Accept: application/json" --header "Content-Type: application/merge-patch+json" --data '{
    <CONFIGURATION_DETAILS>
    }' "{BASE_URL}/v3/analytics_engines/{INSTANCE_ID}/default_configs"
    Parameter values:

    IAM_TOKEN - The API token generated for the Analytics Engine API.

    INSTANCE_ID - The Analytics Engine instance ID. For more information, see Obtaining the service endpoints using the IBM Cloud CLI.

    CONFIGURATION_DETAILS - The following configuration. The command already supplies the enclosing braces, so send only one pair of braces in the request body.
    {
    "spark.sql.catalogImplementation": "hive",
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.iceberg.vectorization.enabled": "false",
    "spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.lakehouse.type": "hive",
    "spark.sql.catalog.lakehouse.uri": "<public_IP_address>:<nodeport>",
    "spark.hive.metastore.client.auth.mode": "PLAIN",
    "spark.hive.metastore.client.plain.username": "<metastore_admin_user>",
    "spark.hive.metastore.client.plain.password": "<metastore_admin_password>",
    "spark.hive.metastore.use.SSL": "true",
    "spark.hive.metastore.truststore.type": "JKS",
    "spark.hive.metastore.truststore.path": "file:///home/spark/shared/user-libs/wxd-library/custom/truststore.jks",
    "spark.hive.metastore.truststore.password": "<truststore_password>"
    }
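Pasting a long JSON body inline makes quoting mistakes easy, so one possible variation is to keep the body in a file and let curl read it. The sketch below assumes that BASE_URL, INSTANCE_ID, and IAM_TOKEN are set as shell variables; the function name patch_default_configs, the DRY_RUN switch, and the file argument are hypothetical and exist only for this illustration.

```shell
# Sketch: send the PATCH request with the JSON body read from a file.
# Usage (hypothetical): patch_default_configs default_configs.json
patch_default_configs() {
  body_file="$1"
  url="${BASE_URL}/v3/analytics_engines/${INSTANCE_ID}/default_configs"

  # With DRY_RUN=1, print the request target instead of calling the API.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'PATCH %s\n' "$url"
    return 0
  fi

  # --data @FILE tells curl to read the request body from FILE.
  curl -X PATCH --location \
    --header "Authorization: Bearer ${IAM_TOKEN}" \
    --header "Accept: application/json" \
    --header "Content-Type: application/merge-patch+json" \
    --data @"$body_file" \
    "$url"
}
```

Running the function once with DRY_RUN=1 is a cheap way to confirm that the URL is assembled as expected before any credentials are sent.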

Configuring an Analytics Engine instance by using Analytics Engine CLI

To specify the configuration settings for your IBM Analytics Engine instance from the CLI, complete the following steps:

Run the following command:
ibmcloud analytics-engine-v3 instance default-configs-update [--id INSTANCE_ID] --body BODY
Parameter values:
  • INSTANCE_ID - The Analytics Engine instance ID. For more information, see Obtaining the service endpoints.

  • BODY - Copy and paste the following configuration information:
    {
    "spark.sql.catalogImplementation": "hive",
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.iceberg.vectorization.enabled": "false",
    "spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.lakehouse.type": "hive",
    "spark.sql.catalog.lakehouse.uri": "<public_IP_address>:<nodeport>",
    "spark.hive.metastore.client.auth.mode": "PLAIN",
    "spark.hive.metastore.client.plain.username": "<metastore_admin_user>",
    "spark.hive.metastore.client.plain.password": "<metastore_admin_password>",
    "spark.hive.metastore.use.SSL": "true",
    "spark.hive.metastore.truststore.type": "JKS",
    "spark.hive.metastore.truststore.path": "file:///home/spark/shared/user-libs/wxd-library/custom/truststore.jks",
    "spark.hive.metastore.truststore.password": "<truststore_password>"
    }
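The same idea can be applied to the CLI: rather than pasting the JSON on the command line, the body can be read from a file and passed as the --body value. This is only a sketch; the function name update_default_configs_cli and its two arguments are hypothetical, and it assumes the ibmcloud CLI with the analytics-engine-v3 plug-in is installed and logged in.

```shell
# Sketch: pass the contents of a JSON file as the --body value.
# Usage (hypothetical): update_default_configs_cli "$INSTANCE_ID" default_configs.json
update_default_configs_cli() {
  instance_id="$1"
  body_file="$2"

  # Command substitution inside double quotes keeps the JSON as a single argument.
  ibmcloud analytics-engine-v3 instance default-configs-update \
    --id "$instance_id" \
    --body "$(cat "$body_file")"
}
```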

After you configure Analytics Engine, you can submit the Spark application. For more information, see Run Spark use case.