Using an external metastore for Spark SQL
Spark SQL uses a Hive metastore to store metadata about the tables, columns, and partitions used by your applications. By default, the database that powers this metastore is an embedded Derby instance that ships with the Spark cluster. You can externalize the metastore database to an IBM Cloud Data Engine (formerly SQL Query) instance.
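As a minimal sketch of what pointing Spark SQL at an external metastore looks like, the PySpark session below uses the standard Hive metastore connection properties. The JDBC URL, driver class, and credentials are placeholders; the actual values for an IBM Cloud Data Engine instance come from that service's connection details.

```python
from pyspark.sql import SparkSession

# Sketch: configure Spark SQL to use an external Hive metastore database.
# All <...> values are placeholders; substitute the connection details of
# your IBM Cloud Data Engine (formerly SQL Query) instance.
spark = (
    SparkSession.builder
    .appName("external-metastore-example")
    .config("spark.sql.catalogImplementation", "hive")
    .config("spark.hadoop.javax.jdo.option.ConnectionURL",
            "jdbc:<database-url>")              # placeholder JDBC URL
    .config("spark.hadoop.javax.jdo.option.ConnectionDriverName",
            "<jdbc-driver-class>")              # placeholder driver class
    .config("spark.hadoop.javax.jdo.option.ConnectionUserName", "<user>")
    .config("spark.hadoop.javax.jdo.option.ConnectionPassword", "<password>")
    .enableHiveSupport()
    .getOrCreate()
)
```

With this configuration in place, any table registered through `saveAsTable` or `CREATE TABLE` is recorded in the external metastore rather than in the cluster-local Derby database.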
Placing your metadata outside of the Spark cluster enables you to reference the same tables from different applications across your IBM Analytics Engine powered by Apache Spark instances. This persists both data and metadata and lets you work with that data seamlessly across different Spark workloads.
Enabling Analytics Engine powered by Apache Spark to access an external metastore
To enable access to an external metastore:
- Create an IBM Cloud Data Engine (formerly SQL Query) instance to store the metadata.
- Configure IBM Analytics Engine powered by Apache Spark to work with the database instance.
- Create a table in one Spark application and then access it from another Spark application, as shown in the sketch after this list.
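The following sketch illustrates the final step, assuming both applications run with the external-metastore configuration shown earlier. The `demo_db` database and `users` table names are hypothetical, chosen only for illustration.

```python
from pyspark.sql import SparkSession

# Both applications are assumed to be configured against the same
# external metastore; all names below are illustrative.

# --- Application 1: create and register a table in the shared metastore ---
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.sql("CREATE DATABASE IF NOT EXISTS demo_db")  # hypothetical database
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
df.write.mode("overwrite").saveAsTable("demo_db.users")  # hypothetical table

# --- Application 2: a separate Spark job, possibly on another cluster ---
# Because the metastore lives outside any single cluster, the table can be
# resolved by name even though it was created by a different application.
spark.sql("SELECT name FROM demo_db.users WHERE id = 1").show()
```

Since the table definition lives in the external metastore, the second application needs no knowledge of how or where the first application created it; it simply queries the table by name.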
Parent topic: Getting started with Spark applications