Submitting a Spark application with GPU RDD

You can submit an application to the instance group that uses GPU Resilient Distributed Dataset (RDD), which supports adaptive GPU scheduling.

Before you begin

  • You must be a cluster or consumer administrator, consumer user, or have the Spark Applications Submit permission to submit Spark applications to an instance group.
  • You must have started the instance group that allocates GPUs for its applications. See Starting Spark instance groups.
  • You must have a Spark application to submit to the GPU instance group.
    • Your Spark application can use the Python, Scala, or R RDD API to create a new GPU RDD. For more information on the APIs, see GPU RDD sample and API examples. For a Python sample on IBM® Cloud, see conductor-gpu-sample.
    • Alternatively, copy the wordcount_gpu.py sample, which uses the Python RDD API to create a new RDD whose tasks run on GPU slots. This sample is recommended when your cluster is installed on a shared file system, such as IBM Spectrum Scale. When you use the wordcount_gpu.py sample, complete these steps:
      1. Save the sample to the mounted file system, for example: /gpfs/conductorFS.
      2. Create a sample_input subdirectory to save input data in the file system, for example: /gpfs/conductorFS/sample_input.
    Note: If you enabled adaptive scheduling, the SPARK_EGO_WORKLOAD_TYPE environment variable is set internally when the task runs to indicate the workload type (GPU or CPU). You can use this variable to define different GPU and CPU processing paths in your application's task logic. For example:
    import os

    def feature_extractor(path):
        # SPARK_EGO_WORKLOAD_TYPE is set by the scheduler when the task runs
        if os.environ.get("SPARK_EGO_WORKLOAD_TYPE") == "GPU":
            feature = runGPULogical()
        else:
            feature = runCPULogical()
        return feature

    sc.parallelize(...).gpu().map(lambda path: feature_extractor(path)).collect()
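
For orientation, the following is a minimal sketch of a wordcount-style script that uses the .gpu() RDD extension shown above. It is not the shipped wordcount_gpu.py sample; the argument handling and application name are illustrative assumptions.

  # Minimal sketch (not the shipped wordcount_gpu.py sample): a word count
  # whose tasks are marked to run on GPU slots through the .gpu() RDD
  # extension. Argument handling and the app name are illustrative.
  import sys
  from pyspark import SparkContext

  if __name__ == "__main__":
      input_dir, output_dir = sys.argv[1], sys.argv[2]
      sc = SparkContext(appName="WordCountGPU")
      counts = (sc.textFile(input_dir)
                  .gpu()  # request GPU slots for this RDD's tasks
                  .flatMap(lambda line: line.split())
                  .map(lambda word: (word, 1))
                  .reduceByKey(lambda a, b: a + b))
      counts.saveAsTextFile(output_dir)
      sc.stop()

When the script is submitted through the procedure below, the input and output directories are passed as the two application arguments, as in the /shared_mount example in step 4.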

About this task

You can write a Spark application in Python, Scala, or R that uses the Resilient Distributed Dataset (RDD) API to create a new GPU RDD whose tasks run on GPU resources in your cluster.

When you submit a Spark application as a batch application to the instance group, you can do the following:
  • Set the spark.ego.gpu.mode Spark parameter (or the SPARK_EGO_GPU_MODE environment variable), which specifies exclusive or default GPU mode so that Spark executors start on GPUs that are configured with the requested mode.
  • Submit the wordcount_gpu.py sample.
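
If you prefer to set the GPU mode in application code rather than on the submission command, a SparkConf sketch such as the following could work. It assumes that spark.ego.gpu.mode is honored when set programmatically, as ordinary Spark configuration properties are:

  # Sketch: set the GPU mode from application code through SparkConf.
  # Assumes spark.ego.gpu.mode is honored when set programmatically,
  # like other Spark configuration properties.
  from pyspark import SparkConf, SparkContext

  conf = (SparkConf()
          .setAppName("GPUApp")                     # hypothetical app name
          .set("spark.ego.gpu.mode", "exclusive"))  # or "default"
  sc = SparkContext(conf=conf)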

Procedure

  1. From the cluster management console, click My Applications & Notebooks.
  2. Click Run Application.

    To schedule application submission, click Schedule Application. For more information, see Scheduling Spark batch application submission to an instance group.

  3. Click Change master to change the default Spark master URL of the instance group to the Spark master that you created for GPUs.
  4. Enter the GPU Spark on EGO parameters that you want to configure in Other options. For example:
    • To specify default or exclusive GPU mode so that Spark executors start on GPUs configured with the requested mode, enter one of:
      --conf spark.ego.gpu.mode=default
      --conf spark.ego.gpu.mode=exclusive
    • To submit the wordcount_gpu.py sample from a directory on a shared file system, enter:
      /shared_mount/wordcount_gpu.py /shared_mount/sample_input /shared_mount/sample_output
      where shared_mount specifies the mounted file system (for example: /gpfs/conductorFS). A combined example that sets the GPU mode and submits the sample follows this procedure.
  5. Click Submit.
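
Putting step 4 together, a complete Other options entry that runs the sample in exclusive GPU mode might look like the following line (the paths assume the /gpfs/conductorFS mount from the earlier steps):

  --conf spark.ego.gpu.mode=exclusive /gpfs/conductorFS/wordcount_gpu.py /gpfs/conductorFS/sample_input /gpfs/conductorFS/sample_output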

Results

When the Spark application is submitted, GPU and CPU resources are allocated to the application.

What to do next

Monitor the batch application that is associated with the instance group. See Monitoring Spark applications.

You can drill down from the Spark master web UI to monitor task details. If you enabled adaptive scheduling, you can also use the Workload Type column in the task list to check whether a task ran on a GPU or a CPU host.