Configuring an Analytics Engine
You can configure an IBM Analytics Engine instance to connect to an IBM® watsonx.data instance by setting the watsonx.data and Spark-related configuration as the default configuration for the IBM Analytics Engine instance.
Prerequisites
Ensure that the following instances are up and running:
- watsonx.data instance
- Analytics Engine serverless instance
Ensure that you have the Admin or Metastore Admin privilege to submit Spark jobs to watsonx.data. For more information, see Managing access to metastore.
Ensure that a bucket is associated with Hive Metastore (HMS). For more information, see Adding a bucket-catalog pair.
Configuring an Analytics Engine instance by using IBM Cloud console
- Log in to your IBM Cloud account.
- Access the IBM Cloud Resource list.
- Search for your Analytics Engine instance and click the instance to see the details.
- Click to view the configuration.
- In the Default Spark configuration section, click Edit.
- Add the following configuration to the Default Spark configuration section.
    {
      "spark.sql.catalogImplementation": "hive",
      "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
      "spark.sql.iceberg.vectorization.enabled": "false",
      "spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
      "spark.sql.catalog.lakehouse.type": "hive",
      "spark.sql.catalog.lakehouse.uri": "<public_IP_address>:<nodeport>",
      "spark.hive.metastore.client.auth.mode": "PLAIN",
      "spark.hive.metastore.client.plain.username": "<metastore_admin_user>",
      "spark.hive.metastore.client.plain.password": "<metastore_admin_password>",
      "spark.hive.metastore.use.SSL": "true",
      "spark.hive.metastore.truststore.type": "JKS",
      "spark.hive.metastore.truststore.path": "file:///home/spark/shared/user-libs/wxd-library/custom/truststore.jks",
      "spark.hive.metastore.truststore.password": "<truststore_password>"
    }
Parameters:
- <public_IP_address>:<nodeport>: The public IP address and NodePort of the watsonx.data instance.
- <metastore_admin_user>: The watsonx.data cluster metastore-admin user. To submit Spark jobs to watsonx.data, you must have the Admin or Metastore Admin privilege in watsonx.data. The Metastore Admin role is preferred. For more information, see Managing access to metastore.
- <metastore_admin_password>: The watsonx.data cluster metastore-admin password.
- <truststore_password>: The password for the truststore.jks file.
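The configuration can also be assembled programmatically, which makes it harder to leave a placeholder unfilled before pasting it into the console. The following is a minimal Python sketch and not part of the product. The helper names build_spark_config and find_unfilled_placeholders are hypothetical, and the sample address, port, and credentials are made-up values for illustration only.

```python
import json
import re

# Truststore path copied from the configuration in this topic.
TRUSTSTORE_PATH = "file:///home/spark/shared/user-libs/wxd-library/custom/truststore.jks"


def build_spark_config(hms_uri, username, password, truststore_password):
    """Return the default Spark configuration as a dict (hypothetical helper)."""
    return {
        "spark.sql.catalogImplementation": "hive",
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        "spark.sql.iceberg.vectorization.enabled": "false",
        "spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
        "spark.sql.catalog.lakehouse.type": "hive",
        "spark.sql.catalog.lakehouse.uri": hms_uri,
        "spark.hive.metastore.client.auth.mode": "PLAIN",
        "spark.hive.metastore.client.plain.username": username,
        "spark.hive.metastore.client.plain.password": password,
        "spark.hive.metastore.use.SSL": "true",
        "spark.hive.metastore.truststore.type": "JKS",
        "spark.hive.metastore.truststore.path": TRUSTSTORE_PATH,
        "spark.hive.metastore.truststore.password": truststore_password,
    }


def find_unfilled_placeholders(config):
    """Return the keys whose values still contain <placeholder> text."""
    pattern = re.compile(r"<[^<>]+>")
    return [key for key, value in config.items() if pattern.search(value)]


# Made-up sample values; replace them with the values for your instance.
config = build_spark_config("198.51.100.7:30000", "admin_user", "admin_password", "jks_password")
leftover = find_unfilled_placeholders(config)
if leftover:
    raise ValueError(f"Unfilled placeholders in: {leftover}")
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the Default Spark configuration section. Avoid committing real passwords to source control; reading them from environment variables is one option.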
Configuring an Analytics Engine instance by using Analytics Engine API
- Generate an IAM token to connect to the Analytics Engine API. For more information about how to generate an IAM token, see IAM token.
- Run the following API call to set the instance default configuration:
    curl -X PATCH --location \
      --header "Authorization: Bearer {IAM_TOKEN}" \
      --header "Accept: application/json" \
      --header "Content-Type: application/merge-patch+json" \
      --data '<CONFIGURATION_DETAILS>' \
      "{BASE_URL}/v3/analytics_engines/{INSTANCE_ID}/default_configs"
Parameters:
- IAM_TOKEN: The IAM token generated for the Analytics Engine API.
- INSTANCE_ID: The Analytics Engine instance ID. For more information, see Obtaining the service endpoints.
- CONFIGURATION_DETAILS: The following configuration, passed as a JSON object:
    {
      "spark.sql.catalogImplementation": "hive",
      "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
      "spark.sql.iceberg.vectorization.enabled": "false",
      "spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
      "spark.sql.catalog.lakehouse.type": "hive",
      "spark.sql.catalog.lakehouse.uri": "<public_IP_address>:<nodeport>",
      "spark.hive.metastore.client.auth.mode": "PLAIN",
      "spark.hive.metastore.client.plain.username": "<metastore_admin_user>",
      "spark.hive.metastore.client.plain.password": "<metastore_admin_password>",
      "spark.hive.metastore.use.SSL": "true",
      "spark.hive.metastore.truststore.type": "JKS",
      "spark.hive.metastore.truststore.path": "file:///home/spark/shared/user-libs/wxd-library/custom/truststore.jks",
      "spark.hive.metastore.truststore.password": "<truststore_password>"
    }
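The same PATCH call can be issued from a script. The following is a minimal Python sketch that uses only the standard library and mirrors the URL path and headers of the curl command in this topic. The function name build_default_config_request is hypothetical, and the base URL, instance ID, token, and one-key configuration are made-up values; the sketch builds the request without sending it.

```python
import json
import urllib.request


def build_default_config_request(base_url, instance_id, iam_token, config):
    """Build (but do not send) the PATCH request that sets the default configuration."""
    url = f"{base_url.rstrip('/')}/v3/analytics_engines/{instance_id}/default_configs"
    return urllib.request.Request(
        url,
        data=json.dumps(config).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {iam_token}",
            "Accept": "application/json",
            "Content-Type": "application/merge-patch+json",
        },
        method="PATCH",
    )


# Made-up values for illustration; substitute your own endpoint, ID, and token.
request = build_default_config_request(
    base_url="https://example.invalid",
    instance_id="my-instance-id",
    iam_token="my-iam-token",
    config={"spark.sql.catalogImplementation": "hive"},
)
print(request.get_method(), request.full_url)

# To send the request, uncomment the following lines:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the URL and body before anything reaches the service.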
Configuring an Analytics Engine instance by using Analytics Engine CLI
To specify the configuration settings for your IBM Analytics Engine instance from the CLI, run the following command:
    ibmcloud analytics-engine-v3 instance default-configs-update [--id INSTANCE_ID] --body BODY
Parameters:
- INSTANCE_ID: The Analytics Engine instance ID. For more information, see Obtaining the service endpoints.
- BODY: Copy and paste the following configuration information:
    {
      "spark.sql.catalogImplementation": "hive",
      "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
      "spark.sql.iceberg.vectorization.enabled": "false",
      "spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
      "spark.sql.catalog.lakehouse.type": "hive",
      "spark.sql.catalog.lakehouse.uri": "<public_IP_address>:<nodeport>",
      "spark.hive.metastore.client.auth.mode": "PLAIN",
      "spark.hive.metastore.client.plain.username": "<metastore_admin_user>",
      "spark.hive.metastore.client.plain.password": "<metastore_admin_password>",
      "spark.hive.metastore.use.SSL": "true",
      "spark.hive.metastore.truststore.type": "JKS",
      "spark.hive.metastore.truststore.path": "file:///home/spark/shared/user-libs/wxd-library/custom/truststore.jks",
      "spark.hive.metastore.truststore.password": "<truststore_password>"
    }
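Passing a multi-line JSON body on the command line is prone to quoting mistakes. The following is a minimal Python sketch that builds the argument list for the command in this topic, so the body is serialized once and passed as a single argument. The function name build_cli_command is hypothetical, and the instance ID and one-key configuration are made-up values.

```python
import json
import shlex


def build_cli_command(instance_id, config):
    """Return the ibmcloud argument list that updates the default configuration."""
    return [
        "ibmcloud", "analytics-engine-v3", "instance", "default-configs-update",
        "--id", instance_id,
        "--body", json.dumps(config),
    ]


# Made-up values for illustration; substitute your own instance ID and configuration.
command = build_cli_command("my-instance-id", {"spark.sql.catalogImplementation": "hive"})

# Print a shell-safe form of the command for review.
print(shlex.join(command))

# To run it, assuming the ibmcloud CLI and its plug-in are installed and logged in:
# import subprocess
# subprocess.run(command, check=True)
```

Passing the list to subprocess.run avoids shell quoting entirely, because each element reaches the CLI as one argument.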