watsonx.data Spark engine

The watsonx.data Spark engine is one of the native Spark engines in IBM watsonx.data. It can process large-scale data, transform data, and run analytical workloads.

You can use the watsonx.data Spark engine to perform big data analytics. With the native Spark engine, you can fully manage the engine configuration, environment variables, and parameters, manage access to Spark engines, and run applications that involve complex analytical operations by using the watsonx.data UI and REST API endpoints.

To provision a watsonx.data Spark engine, see Provisioning.

Required permissions
To create a watsonx.data Spark engine, you must have the Admin role.

Supported Spark versions

IBM® watsonx.data supports the following Spark runtime versions to run Spark workloads.
Name                Status      Release date  End-of-support date  Supported languages
Apache Spark 3.4.4  Deprecated  June 2023     June 2026            Python 3.11, Scala 2.12
Apache Spark 3.5.4  Supported   Feb 2025      Feb 2028             Python 3.11, Scala 2.12
Apache Spark 4.0.0  Supported   Aug 2025      Aug 2028             Python 3.11, Scala 2.13


The following examples show sample payloads for submitting a Spark runtime in different languages.
  • Payload for submitting a Spark runtime with Python 3.11:

    {
      "application_details": {
        "application": "<your_application_file_path>",
        "arguments": ["<your_application_arguments>"],
        "conf": {
          "spark.app.name": "MyRuntime",
          "spark.eventLog.enabled": "true"
        },
        "env": {
          "RUNTIME_PYTHON_ENV": "python311"
        }
      }
    }
  • Payload for submitting a Spark Scala runtime:

    {
      "application_details": {
        "application": "/opt/ibm/spark/examples/jars/spark-examples*.jar",
        "arguments": ["1"],
        "class": "org.apache.spark.examples.SparkPi",
        "conf": {
          "spark.app.name": "MyRuntime",
          "spark.eventLog.enabled": "true",
          "spark.driver.memory": "4G",
          "spark.driver.cores": 1,
          "spark.executor.memory": "4G",
          "spark.executor.cores": 1,
          "ae.spark.executor.count": 1
        },
        "env": {
          "SAMPLE_ENV_KEY": "SAMPLE_VALUE"
        }
      }
    }
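
The two sample payloads share the same shape, so they can be assembled programmatically. The following is a minimal Python sketch that assumes only the payload structure shown in the samples. The helper name build_payload is hypothetical and is not part of any watsonx.data SDK. The sketch only builds and serializes the JSON body; it does not call the REST API, because the endpoint and authentication details are not covered in this section.

```python
import json


def build_payload(application, arguments, conf=None, env=None, main_class=None):
    """Assemble a payload shaped like the sample payloads (hypothetical helper)."""
    details = {
        "application": application,
        "arguments": list(arguments),
    }
    # Scala/Java applications name their entry point with "class";
    # the Python sample omits this key, so it is added only when given.
    if main_class is not None:
        details["class"] = main_class
    details["conf"] = dict(conf or {})
    details["env"] = dict(env or {})
    return {"application_details": details}


# Python 3.11 payload; the angle-bracket values are placeholders to replace.
python_payload = build_payload(
    application="<your_application_file_path>",
    arguments=["<your_application_arguments>"],
    conf={"spark.app.name": "MyRuntime", "spark.eventLog.enabled": "true"},
    env={"RUNTIME_PYTHON_ENV": "python311"},
)

# Scala payload that mirrors the SparkPi sample.
scala_payload = build_payload(
    application="/opt/ibm/spark/examples/jars/spark-examples*.jar",
    arguments=["1"],
    main_class="org.apache.spark.examples.SparkPi",
    conf={
        "spark.app.name": "MyRuntime",
        "spark.eventLog.enabled": "true",
        "spark.driver.memory": "4G",
        "spark.driver.cores": 1,
        "spark.executor.memory": "4G",
        "spark.executor.cores": 1,
        "ae.spark.executor.count": 1,
    },
    env={"SAMPLE_ENV_KEY": "SAMPLE_VALUE"},
)

# Serialize to the JSON text that would be sent as the request body.
python_body = json.dumps(python_payload, indent=2)
scala_body = json.dumps(scala_payload, indent=2)
print(python_body)
print(scala_body)
```

Building the payload from a dictionary and serializing it with json.dumps avoids hand-written quoting mistakes and keeps the optional "class" key out of Python submissions.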