Extending analytics using Spark (Analytics Engine powered by Apache Spark)

You can use Analytics Engine powered by Apache Spark as a compute engine to run analytical and machine learning jobs.

Service: The IBM Analytics Engine powered by Apache Spark service is not available by default. An administrator must install this service on the IBM Cloud Pak for Data platform. To determine whether the service is installed, open the Services catalog and check whether the service is enabled.

Each time you submit a job, a dedicated Spark cluster is created for the job. You can specify the size of the Spark driver, the size of the executor, and the number of executors for the job. This enables you to achieve predictable and consistent performance.

When a job completes, the cluster is automatically cleaned up so that the resources are available for other jobs. The service also includes interfaces that enable you to analyze the performance of your Spark applications and debug problems.
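
As a rough illustration of how driver and executor sizing is expressed when you submit a job, the following Python sketch posts a submission payload to the Spark jobs API. It is a minimal sketch only: the endpoint path, instance ID, token, application path, and top-level payload fields are placeholders, and the exact URL and payload schema for your release are documented in the Spark jobs API reference. The spark.driver.* and spark.executor.* properties are standard Apache Spark configuration settings.

    import requests

    # Placeholder values: replace with your Cloud Pak for Data host, a valid
    # bearer token, and the Spark jobs endpoint of your service instance.
    CPD_HOST = "https://<cpd-host>"
    JOBS_ENDPOINT = f"{CPD_HOST}/v4/analytics_engines/<instance-id>/spark_applications"
    TOKEN = "<bearer-token>"

    # Sizing is expressed through standard Apache Spark properties:
    # driver memory and cores, executor memory and cores, and executor count.
    payload = {
        "application_details": {
            "application": "/myapp/wordcount.py",      # illustrative application file
            "arguments": ["/myapp/data/input.txt"],    # illustrative input path
            "conf": {
                "spark.driver.memory": "4g",
                "spark.driver.cores": "1",
                "spark.executor.memory": "4g",
                "spark.executor.cores": "1",
                "spark.executor.instances": "2",
            },
        }
    }

    response = requests.post(
        JOBS_ENDPOINT,
        json=payload,
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    print(response.status_code, response.json())

Because the cluster is created per job with exactly the sizes you request, the same payload submitted repeatedly should behave consistently from run to run.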

In IBM Cloud Pak for Data, you can run Spark workloads in two ways:

  • In a notebook that runs in a Spark environment in a project in Watson Studio
  • Outside Watson Studio, in an IBM Analytics Engine powered by Apache Spark instance using Spark job APIs

Spark environments in projects

If you have the Watson Studio service installed, the IBM Analytics Engine powered by Apache Spark service automatically adds a set of default Spark environment templates to projects. You can also create custom Spark environment templates in a project.

You can see Spark environment templates under Templates on the Environments page on the Manage tab of your project.

For more details, see Spark environments.
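
As a quick orientation, a notebook cell running in a Spark environment works against a SparkSession. The sketch below assumes the environment exposes (or lets you create) a SparkSession via the standard getOrCreate() call; the sample data is purely illustrative.

    from pyspark.sql import SparkSession

    # In a Spark environment a SparkSession is typically pre-created;
    # getOrCreate() reuses it if present and creates one otherwise.
    spark = SparkSession.builder.getOrCreate()

    # Illustrative DataFrame work; replace with your own project data.
    df = spark.createDataFrame(
        [("sensor-1", 21.4), ("sensor-2", 19.8), ("sensor-1", 22.1)],
        ["device", "temperature"],
    )
    df.groupBy("device").avg("temperature").show()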

Spark APIs

If you don't have Watson Studio installed, you can run Spark workloads directly in IBM Analytics Engine powered by Apache Spark using Spark job APIs.

You can run the following types of workloads with the Spark job APIs:

  • Spark applications that run Spark SQL
  • Data transformation jobs
  • Data science jobs
  • Machine learning jobs

See Getting started with Spark applications.
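
A workload submitted through the Spark job APIs is simply a self-contained Spark application. The following Python script is a minimal sketch of a Spark SQL application of the kind you could submit as a job; the input path, view name, and column names are illustrative.

    from pyspark.sql import SparkSession

    def main():
        spark = SparkSession.builder.appName("sales-summary").getOrCreate()

        # Illustrative input path; in practice this points at a storage
        # location that the Spark instance can reach.
        sales = (
            spark.read.option("header", "true")
            .option("inferSchema", "true")
            .csv("/myapp/data/sales.csv")
        )

        # Run Spark SQL against a temporary view of the data.
        sales.createOrReplaceTempView("sales")
        summary = spark.sql(
            "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
        )
        summary.show()

        spark.stop()

    if __name__ == "__main__":
        main()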

Parent topic: Analyzing data and building models