Persisting Spark applications

You can choose how to persist your Spark application job files. You can save those files:

  • As an asset in a deployment space
  • In an S3-compatible Object Storage bucket, such as IBM Cloud Object Storage
  • In a service volume instance on IBM Cloud Pak for Data

Persisting Spark applications in a deployment space

You can persist Spark applications in a deployment space only if the Spark advanced features are enabled. See Using advanced features.

Follow these steps to persist Spark applications as an asset in a deployment space:

  1. Get the deployment space name from the service instance details page. See Managing Analytics Engine powered by Apache Spark instances.
  2. From the Navigation menu on the IBM Cloud Pak for Data web user interface, click Deployments and select your space.
  3. From the Assets page of the space, upload your Spark application.
  4. Run the application as a persisted asset. Use the following Spark job payload as an example:
  {
      "application_details": {
          "application": "/home/spark/space/assets/data_asset/<spark_application_name>",
          "application_arguments": [""],
          "class": "<main_class>"
      }
  }
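
One way to run the payload is to POST it to the Spark jobs REST endpoint of your Analytics Engine instance. The following Python sketch is only an illustration: the jobs endpoint URL and the access token are placeholders that you must replace with the values for your instance.

  import json
  import requests

  # Placeholders (assumptions): replace with the Spark jobs endpoint and
  # access token of your Analytics Engine powered by Apache Spark instance.
  JOBS_ENDPOINT = "https://<cpd_host>/<spark_jobs_api_path>"
  ACCESS_TOKEN = "<access_token>"

  # Payload from step 4: run the application that was uploaded to the space.
  payload = {
      "application_details": {
          "application": "/home/spark/space/assets/data_asset/<spark_application_name>",
          "application_arguments": [""],
          "class": "<main_class>"
      }
  }

  response = requests.post(
      JOBS_ENDPOINT,
      headers={
          "Authorization": f"Bearer {ACCESS_TOKEN}",
          "Content-Type": "application/json",
      },
      data=json.dumps(payload),
  )
  print(response.status_code, response.text)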

Persisting Spark applications in Object Storage

The application job files can be stored in an S3-compatible Object Storage bucket. The following steps describe how to do this for an IBM Cloud Object Storage bucket.

Follow these steps to persist a Spark application in IBM Cloud Object Storage:

  1. Upload the application job file (<OBJECT_NAME>) to an IBM Cloud Object Storage bucket (<BUCKET_NAME>) in an IBM Cloud Object Storage service (<COS_SERVICE_NAME>). A sample upload script is shown after these steps.
  2. Ensure that the following Spark environment properties are passed in the payload:

     "spark.hadoop.fs.cos.<COS_SERVICE_NAME>.endpoint":"<COS_ENDPOINT>"
     "spark.hadoop.fs.cos.<COS_SERVICE_NAME>.secret.key":"<COS_SECRET_KEY>"
     "spark.hadoop.fs.cos.<COS_SERVICE_NAME>.access.key":”<COS_ACCESS_KEY>"
    
  3. Run the application persisted in IBM Cloud Object Storage. Use the following Spark job payload as an example:
  {  
    "application_details": {
      "application": "cos://<BUCKET_NAME>.<COS_SERVICE_NAME>/<OBJECT_NAME>",
      "application_arguments": ["cos://<BUCKET_NAME>.<COS_SERVICE_NAME>/<OBJECT_NAME>"],
      "class": "<main_class>",
      "conf": {
        "spark.app.name": "MyJob",
        "spark.hadoop.fs.cos.<COS_SERVICE_NAME>.endpoint": "<COS_ENDPOINT>",
        "spark.hadoop.fs.cos.<COS_SERVICE_NAME>.secret.key": "<COS_SECRET_KEY>",
        "spark.hadoop.fs.cos.<COS_SERVICE_NAME>.access.key": "<COS_ACCESS_KEY>"
      }
    }
  }
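
Because IBM Cloud Object Storage is S3-compatible, one way to perform the upload in step 1 is with an S3 client library such as boto3, using the same endpoint and HMAC credentials that the payload references. The sketch below is one option, not the required method; the IBM Cloud console or the IBM COS SDK work equally well, and the local file name is a hypothetical example.

  import boto3

  # Placeholders: the same endpoint and HMAC keys that appear in the Spark payload.
  COS_ENDPOINT = "https://<COS_ENDPOINT>"
  COS_ACCESS_KEY = "<COS_ACCESS_KEY>"
  COS_SECRET_KEY = "<COS_SECRET_KEY>"

  # S3-compatible client pointed at the IBM Cloud Object Storage endpoint.
  s3 = boto3.client(
      "s3",
      endpoint_url=COS_ENDPOINT,
      aws_access_key_id=COS_ACCESS_KEY,
      aws_secret_access_key=COS_SECRET_KEY,
  )

  # Upload the local application job file (hypothetical name) as <OBJECT_NAME>
  # in the bucket <BUCKET_NAME>.
  s3.upload_file("my_spark_app.jar", "<BUCKET_NAME>", "<OBJECT_NAME>")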

Persisting Spark applications in a service volume instance

You can persist Spark application job files in a service volume instance that uses any of the supported IBM Cloud Pak for Data storage types:

  • NFS storage
  • Portworx
  • OCS

To learn how to use a volume instance to create directories and add your application files, see Managing persistent volume instances with the Volumes API.

The following example shows a Spark job payload for an application that was uploaded to the customApps directory in the vol1 volume, which is mounted as /myapp on the Spark cluster.

{
  "application_details": {
      "application": "/myapp/<spark_application>",
      "application_arguments": [""],
      "conf": {
        "spark.app.name": "JSFVT",
        "spark.executor.extraClassPath": "/myapp/*",
        "spark.driver.extraClassPath": "/myapp/*"
      }   
  },
  "template_id": "<template_id>",
  "volumes": [{
      "name": "vol1",
      "mount_path": "/myapp",
      "source_sub_path": "customApps"
      }, {
      "name": "vol2",
      "source_sub_path": "",
      "mount_path": "/data"
  }]
}
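
As with the other examples, you can submit this payload to the Spark jobs endpoint of your instance. The Python sketch below only builds and posts the payload shown above; the endpoint URL and access token are placeholders, and the comments highlight how the vol1 mount maps to the application path.

  import requests

  # Placeholders (assumptions): replace with your instance's Spark jobs endpoint and token.
  JOBS_ENDPOINT = "https://<cpd_host>/<spark_jobs_api_path>"
  ACCESS_TOKEN = "<access_token>"

  payload = {
      "application_details": {
          # vol1/customApps is mounted as /myapp, so the application and the
          # extra class path entries are resolved under /myapp at run time.
          "application": "/myapp/<spark_application>",
          "application_arguments": [""],
          "conf": {
              "spark.app.name": "JSFVT",
              "spark.executor.extraClassPath": "/myapp/*",
              "spark.driver.extraClassPath": "/myapp/*"
          }
      },
      "template_id": "<template_id>",
      "volumes": [
          {"name": "vol1", "mount_path": "/myapp", "source_sub_path": "customApps"},
          {"name": "vol2", "mount_path": "/data", "source_sub_path": ""}
      ]
  }

  response = requests.post(
      JOBS_ENDPOINT,
      headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
      json=payload,
  )
  print(response.status_code, response.text)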