Adding a Spark engine

Using IBM® watsonx.data, you can add Spark engines.

You can either provision a native Spark engine or register an external Spark engine. A native Spark engine is a compute engine that resides within watsonx.data. External Spark engines run in an environment that is separate from the one where watsonx.data is available.

watsonx.data on IBM Software Hub

About this task

To add a Spark engine, complete the following steps.

Procedure

  1. Log in to the watsonx.data console.
  2. From the navigation menu, select Infrastructure Manager.
  3. To add a Spark engine, click Add component and click Next.
  4. On the Add component page, in the Engines section, select IBM Spark.
  5. On the Add component - IBM Spark page, configure the following details:
    Important: Co-located and self-managed Spark engines are deprecated in the 2.0.2 release and will not be available from the 2.0.3 release onwards. Use the native Spark engine for Spark use cases. To start using the native Spark engine, see Native Spark engine.
    Field: Description
    Display name: Enter a name for the compute engine.
    Registration mode: Based on your requirement, select one of the following options:
      • Create a native Spark engine: The native Spark engine is a compute engine that resides within watsonx.data. If you select this option, see Provisioning native Spark engine for the provisioning steps.
      • Register an external Spark engine: The Spark and watsonx.data instances are located in different clusters. For example, your Spark instance is provisioned on IBM Cloud, and watsonx.data is installed on your computer.
      • Register a co-located Spark engine (deprecated): The Spark and watsonx.data instances are located in the same cluster.
    Instance: If you selected Register a co-located Spark engine (deprecated) as the Registration mode, select the Spark instance that is co-located with watsonx.data from the list. If you do not have an instance, click Create one to create it.
    Management method: If you selected Register an external Spark engine as the Registration mode, select the appropriate management method:
      • Fully-managed: The Spark instance is owned and managed by IBM Cloud.
      • Self-managed (deprecated): The instance is an IBM Analytics Engine Spark instance on a Cloud Pak for Data cluster.
    Instance API endpoint: If you selected Register an external Spark engine as the Registration mode and Fully-managed as the Management method, enter the IBM Analytics Engine instance endpoint. For more information, see Retrieving service endpoints.
    API key: If you selected Register an external Spark engine as the Registration mode and Fully-managed as the Management method, enter the API key.
    Spark jobs V4 endpoint: If you selected Register an external Spark engine as the Registration mode and Self-managed (deprecated) as the Management method, enter the endpoint details of the self-managed IBM Analytics Engine instance.
    ZenApiKey: If you selected Register an external Spark engine as the Registration mode and Self-managed (deprecated) as the Management method, enter the API details for the self-managed instance.
  6. Click Create. The engine is provisioned and is displayed on the Infrastructure Manager page.
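
Which fields you fill in at step 5 depends on the Registration mode and, for external engines, on the Management method. The following Python sketch summarizes those dependencies as a lookup. It is illustrative only: the function name `required_fields` and the mode and method strings are hypothetical labels chosen for this example, not part of any watsonx.data API. The field names are copied from the field list in step 5.

```python
from typing import List, Optional

# Hypothetical labels for the selections in step 5 (not official identifiers).
NATIVE = "native"
EXTERNAL = "external"
CO_LOCATED = "co-located"        # deprecated registration mode
FULLY_MANAGED = "fully-managed"
SELF_MANAGED = "self-managed"    # deprecated management method


def required_fields(registration_mode: str,
                    management_method: Optional[str] = None) -> List[str]:
    """Return the step 5 fields that apply to the given selections."""
    fields = ["Display name", "Registration mode"]

    if registration_mode == NATIVE:
        # Further native engine settings are covered in
        # "Provisioning native Spark engine" and are not modeled here.
        return fields

    if registration_mode == CO_LOCATED:
        return fields + ["Instance"]

    if registration_mode == EXTERNAL:
        fields.append("Management method")
        if management_method == FULLY_MANAGED:
            return fields + ["Instance API endpoint", "API key"]
        if management_method == SELF_MANAGED:
            return fields + ["Spark jobs V4 endpoint", "ZenApiKey"]
        raise ValueError("an external engine needs a management method")

    raise ValueError(f"unknown registration mode: {registration_mode!r}")


# Example: fields for an external, fully-managed engine.
print(required_fields(EXTERNAL, FULLY_MANAGED))
```

A checklist like this can help when you gather endpoints and API keys before you open the console, because the form shows only the fields that match your selections.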