You can submit a Spark application that is written in Python, Java, or Scala from the watsonx.data Console. You can specify the application details, the Spark version to use, hardware and volume details, Spark environment properties, and Spark configuration properties in the watsonx.data Console.
Before you begin
Your Spark application must be available in storage that the Spark engine can access.
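For context, the following is a minimal sketch of the kind of Python application this procedure submits. The application name and the sample data are hypothetical placeholders; nothing in the script is specific to watsonx.data.

```python
# A minimal Spark application sketch; the application name and the
# sample data below are hypothetical placeholders.
from pyspark.sql import SparkSession

def main():
    # The Spark engine supplies the session configuration at submission time.
    spark = SparkSession.builder.appName("sample-app").getOrCreate()

    # Build a tiny in-memory DataFrame and print its row count so that
    # the application produces visible output in the driver logs.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    print("row count:", df.count())

    spark.stop()

if __name__ == "__main__":
    main()
```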
Procedure
- Log in to the watsonx.data cluster and go to the Infrastructure manager page.
- Click the name of the Spark engine (from either the list or the topology view). The engine information window opens.
- In the Applications tab, click the Create application button. The Submit Spark application page opens. On this page, select one of the following tabs based on the method by which you want to submit the Spark application.
- Select the Inputs tab. Configure the following details:
| Field | Description |
| --- | --- |
| Application type | Select Python if your Spark application is written in Python, or Java or Scala if it is written in Java or Scala. |
| Application path | The path to your application. This field is mandatory. Your application must be available in storage or in a mounted IBM Software Hub storage volume. Example: s3a://<application-bucket-name>/iceberg.py |
| Application name | A name for the application. |
| Arguments | Use the Add argument button to specify each argument that the application requires. |
| Spark version | The Spark version for running your application. Spark 3.4 and 3.5 are available. |
| Spark configuration properties | The Spark properties as key-value pairs ("<property_name>": "<property_value>") separated by commas (see the example payload after this procedure). For more information about the different properties, see Properties. |
| Spark environment properties | The Spark environment properties as key=value pairs (see the example payload after this procedure). For more information about the different properties, see Environment properties. |
| Hardware configuration | The number of CPU cores (driver and executor) and the memory that the workload requires. |
| Volume | The details of the IBM Software Hub storage volumes to mount for your Spark application. Click the Add volume link to add the details of each volume. |
| Mount Path | The path on the Spark application nodes where the volume is mounted. Files in the volume that your application script references must use paths relative to the mount path. |
| Source sub path | To mount only a specific directory in the volume, specify its path here. |
| Dependencies | The paths to files and the names of packages that your application script or JAR requires. |
| Import from payload | Click this link to automatically fill in all fields under the Inputs tab if you already specified the payload in the Payload tab. |
- Select the Payload tab. In the Application payload field, specify the application payload JSON that the Spark engine application creation REST API endpoint accepts (a sketch of such a payload follows this procedure). You can either write the payload manually or click the Import from inputs link to build the JSON automatically from the details that you provided in the Inputs tab.
- Click Submit application. The application is submitted and is listed under the Applications tab.
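For reference, the following is a minimal sketch of an application payload. The application_details structure shown here is an assumption based on a common Spark application submission format; verify the exact field names against the REST API reference for your engine. The bucket name, argument, property values, and environment variable are placeholders.

```json
{
  "application_details": {
    "application": "s3a://<application-bucket-name>/iceberg.py",
    "arguments": ["--date=2024-01-01"],
    "conf": {
      "spark.executor.cores": "1",
      "spark.executor.memory": "4g"
    },
    "env": {
      "APP_LOG_LEVEL": "INFO"
    }
  }
}
```

In this sketch, the conf object carries the same key-value pairs as the Spark configuration properties field, and the env object carries the Spark environment properties, so a payload written here can also populate the Inputs tab through the Import from payload link.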