Submitting Spark application from Console

You can submit a Spark application that is written in Python, Java, or Scala from the watsonx.data Console.

In the Console, you can specify the Spark application details, the Spark version to use, hardware and volume details, Spark environment properties, and Spark configurations.

Applies to:

Spark engine

Apache Gluten accelerated Spark engine

watsonx.data on IBM Software Hub

Before you begin

Your Spark application must be available in accessible storage.

Procedure

  1. Log in to the watsonx.data cluster. Go to the Infrastructure manager page.
  2. Click the name of the Spark engine (from either the list view or the topology view). The Engine information window opens.
  3. In the Applications tab, click the Create application button. The Submit Spark application page opens.

    On this page, select one of the following tabs, depending on how you want to submit the Spark application.

    • Select the Inputs tab. Configure the following details:
      Application type

      Select one of the following options:
      • Python: Select this option if your Spark application is written in Python.
      • Java or Scala: Select this option if your Spark application is written in Java or Scala.

      Application path

      Specify the path to your application. This field is mandatory.

      Your application must be available in storage or in a mounted IBM Software Hub storage volume.

      Example: s3a://<application-bucket-name>/iceberg.py

      Application name

      Specify a name for the application.

      Arguments

      Use the Add argument button to specify all arguments that the application requires.

      Spark version

      Enter the Spark version for running your application. If you are using the watsonx.data Spark engine, see Supported Spark versions. If you are using the Apache Gluten accelerated Spark engine, see Supported Spark versions.

      Spark configuration properties

      Specify the Spark properties as comma-separated key-value pairs in the form "<property_name>": "<property_value>". For more information about the different properties, see Properties.

      Spark environment properties

      Specify the Spark environment properties as key=value pairs. For more information about the different properties, see Environment properties.

      Hardware configuration

      Specify the number of CPU cores (Driver and Executor) and the amount of memory that the workload requires.

      Volume

      Specify the details of the IBM Software Hub volume to mount for your Spark application. Click the Add volume link to add the details of each volume.

      Mount Path

      The path on the Spark application nodes where the volume is mounted. Files in the volume that your application script references must use paths relative to the mount path.

      Source sub path

      To mount only a specific directory in the volume, specify the path here.

      Dependencies

      Specify the paths to the files and the names of the packages that your application script or JAR requires.

      Import from payload

      Click this link to automatically populate all fields under the Inputs tab from the payload that you already specified in the Payload tab.

    • Select the Payload tab.

      In the Application payload field, enter the application payload JSON that the Spark engine application creation REST API endpoint accepts. You can write the payload manually or click the Import from inputs link to build the JSON automatically from the details provided in the Inputs tab.

      Sample cURL command with the payload for a Spark application that resides in an IBM Software Hub storage volume. To use the v2 API, set the <api_version> parameter to v2; to use the v3 API, set it to v3.

      Important:

      Depending on your Spark application scenario, different cURL commands may be required. For more information about the supported scenarios and commands, see Submitting Spark application by using REST API.

      curl --request POST \
        --url https://<cpd_host_name>/lakehouse/api/<api_version>/spark_engines/<spark_engine_id>/applications \
        --header 'Authorization: Bearer <token>' \
        --header 'Content-Type: application/json' \
        --header 'LhInstanceId: <instance_id>' \
        --data '{
        "application_details": {
          "application": "/myapp/<python file name>"
        },
        "volumes": [
          {
            "name": "cpd-instance::my-vol-1",
            "mount_path": "/myapp"
          }
        ]
      }'
  4. Click Submit application. The application is submitted and is listed under the Applications tab.
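
The payload shown in the Payload tab can also be assembled programmatically before you paste it into the Console. The following is a minimal sketch, not an official client: it only builds the JSON body and the request pieces that appear in the sample cURL command, and it sends nothing. The function names (build_payload, build_request) and the sample values (host name, engine ID, file name) are hypothetical; the field names, headers, and URL layout are taken from the sample above.

```python
import json
import posixpath


def build_payload(mount_path, app_file, volume_name):
    """Return the application payload as a dict, mirroring the sample payload."""
    # The application path is built relative to the volume's mount path,
    # for example /myapp + iceberg.py -> /myapp/iceberg.py.
    application = posixpath.join(mount_path, app_file.lstrip("/"))
    return {
        "application_details": {"application": application},
        "volumes": [{"name": volume_name, "mount_path": mount_path}],
    }


def build_request(host, api_version, engine_id, token, instance_id, payload):
    """Return (url, headers, body) for the POST request. Nothing is sent."""
    if api_version not in ("v2", "v3"):
        raise ValueError("api_version must be 'v2' or 'v3'")
    url = (
        f"https://{host}/lakehouse/api/{api_version}"
        f"/spark_engines/{engine_id}/applications"
    )
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "LhInstanceId": instance_id,
    }
    return url, headers, json.dumps(payload, indent=2)


# Hypothetical values, for illustration only.
payload = build_payload("/myapp", "iceberg.py", "cpd-instance::my-vol-1")
url, headers, body = build_request(
    "cpd.example.com", "v3", "spark123", "<token>", "<instance_id>", payload
)
print(body)  # JSON text that can be pasted into the Application payload field
```

Keeping the payload construction separate from the request details makes it possible to reuse the same JSON in the Payload tab or in a REST call, whichever submission method you choose.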