Submitting Spark application from Console
You can submit a Spark application that is written in Python, Java, or Scala from the watsonx.data Console.
In the watsonx.data Console, you can specify the Spark application details, the Spark version, hardware and volume details, Spark environment properties, and Spark configurations.
Applies to:
Spark engine
Apache Gluten accelerated Spark engine
watsonx.data on IBM Software Hub
Before you begin
Procedure
- Log in to the watsonx.data cluster. Go to the Infrastructure manager page.
- Click the name of the Spark engine (from either the list or the topology view). The Engine information window opens.
- In the Applications tab, click the Create application button. The Submit Spark application page opens.
On this page, select one of the following tabs, depending on how you want to submit the Spark application.
- Select the Inputs tab. Configure the following details:
Application type
Select one of the following options:
  - Python: Select this option if your Spark application is written in Python.
  - Java or Scala: Select this option if your Spark application is written in Java or Scala.
Application path
Specify the path to your application. This is a mandatory field. Your application must be available in a storage bucket or in a mounted IBM Software Hub storage volume.
Example: s3a://<application-bucket-name>/iceberg.py
Application name
Specify a name for the application.
Arguments
Use the Add argument button to specify all arguments required by the application.
Spark version
Enter the Spark version for running your application. For the watsonx.data Spark engine, see Supported Spark versions. For the Apache Gluten accelerated Spark engine, see Supported Spark versions.
Spark configuration properties
Specify the Spark properties as comma-separated key-value pairs ("<property_name>": "<property_value>"). For more information about the different properties, see Properties.
Spark environment properties
Specify the Spark environment properties as key=value pairs. For more information about the different properties, see Environment properties.
Hardware configuration
Specify the number of CPU cores (Driver and Executor) and the memory that is required for the workload.
Volume
Specify the details of the IBM Software Hub storage volume to mount for your Spark application. Click the Add volume link to add the details of each volume.
Mount Path
The path on the Spark application nodes where the volume is mounted. Files in the volume that your application script references must use paths relative to the mount path.
Source sub path
To mount only a specific directory in the volume, specify the path here.
Dependencies
Specify the paths to the files and the names of the packages that your application script or JAR requires.
Import from payload
Click this link to automatically populate all fields under the Inputs tab from the payload that you already specified in the Payload tab.
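The value formats described for the Inputs tab can be illustrated with a short Python sketch. It is not part of the product. The helper names (`format_spark_conf`, `format_spark_env`, `path_in_application`) and the environment key `SAMPLE_ENV_KEY` are hypothetical. The sketch also assumes that Source sub path mounts only that directory of the volume at the Mount Path, so verify the resulting paths against your own volume.

```python
import json
import posixpath


def format_spark_conf(props):
    """Render properties as "<property_name>": "<property_value>" pairs separated by commas."""
    return ", ".join(
        f"{json.dumps(str(name))}: {json.dumps(str(value))}"
        for name, value in props.items()
    )


def format_spark_env(env):
    """Render environment properties as key=value pairs, one string per entry."""
    return [f"{key}={value}" for key, value in env.items()]


def path_in_application(mount_path, path_in_volume, source_sub_path=""):
    """Return the path under which a file in the volume is visible to the application.

    Assumption: when a source sub path is given, only that directory of the
    volume is mounted, and it appears directly at mount_path.
    """
    file_path = posixpath.join("/", path_in_volume)
    mounted_root = posixpath.join("/", source_sub_path)
    relative = posixpath.relpath(file_path, mounted_root)
    if relative == ".." or relative.startswith("../"):
        raise ValueError(f"{path_in_volume!r} is outside sub path {source_sub_path!r}")
    return posixpath.normpath(posixpath.join(mount_path, relative))


if __name__ == "__main__":
    # spark.executor.memory and spark.executor.cores are standard Spark properties.
    print(format_spark_conf({"spark.executor.memory": "4G", "spark.executor.cores": "2"}))
    # SAMPLE_ENV_KEY is a placeholder, not a documented environment property.
    print(format_spark_env({"SAMPLE_ENV_KEY": "sample-value"}))
    # Whole volume mounted at /myapp.
    print(path_in_application("/myapp", "scripts/iceberg.py"))
    # Only the scripts directory mounted at /myapp (assumed sub path behavior).
    print(path_in_application("/myapp", "scripts/iceberg.py", "scripts"))
```

The last two calls show why the Application path depends on both volume fields: the same file resolves to a different path once a sub path is mounted.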
- Select the Payload tab.
In the Application payload field, specify the application payload JSON that can be accepted by the Spark engine application creation REST API endpoint. You can either manually write the payload here or click the Import from inputs link to automatically build the JSON from the details provided in the Inputs tab.
Sample payload when your Spark application resides in an IBM Software Hub storage volume. When using the v2 API, set the <api_version> parameter to v2; for the v3 API, set it to v3.
Important: Depending on your Spark application scenario, different cURL commands may be required. For more information about the supported scenarios and commands, see Submitting Spark application by using REST API.
curl --request POST \
  --url https://<cpd_host_name>/lakehouse/api/<api_version>/spark_engines/<spark_engine_id>/applications \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --header 'LhInstanceId: <instance_id>' \
  --data '{
    "application_details": {
      "application": "/myapp/<python file name>"
    },
    "volumes": [
      {
        "name": "cpd-instance::my-vol-1",
        "mount_path": "/myapp"
      }
    ]
  }'
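The sample request for the storage volume scenario can also be assembled in Python, which may help when you generate the payload for the Payload tab from a script. The following is a minimal sketch that uses only the standard library. It mirrors the URL, headers, and payload fields of the sample cURL command. The function name `build_submit_request` and all argument values in the demo are hypothetical, and the sketch builds the request without sending it.

```python
import json
import urllib.request


def build_submit_request(cpd_host_name, api_version, spark_engine_id, token,
                         instance_id, application, volume_name, mount_path):
    """Build (but do not send) the POST request from the sample cURL command."""
    if api_version not in ("v2", "v3"):
        raise ValueError("api_version must be 'v2' or 'v3'")

    # Same fields as the sample payload: the application path and one mounted volume.
    payload = {
        "application_details": {"application": application},
        "volumes": [{"name": volume_name, "mount_path": mount_path}],
    }
    url = (
        f"https://{cpd_host_name}/lakehouse/api/{api_version}"
        f"/spark_engines/{spark_engine_id}/applications"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "LhInstanceId": instance_id,
        },
    )


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    request = build_submit_request(
        cpd_host_name="cpd.example.com",
        api_version="v3",
        spark_engine_id="spark123",
        token="<token>",
        instance_id="<instance_id>",
        application="/myapp/app.py",
        volume_name="cpd-instance::my-vol-1",
        mount_path="/myapp",
    )
    print(request.full_url)
    # The JSON body can be pasted into the Application payload field.
    print(json.dumps(json.loads(request.data), indent=2))
```

Passing the returned object to `urllib.request.urlopen` would submit the application. The sketch stops before that step so that it can be run without access to a cluster.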
- Click Submit application. The application is submitted and is listed under the Applications tab.