Submitting an Apache Spark application

Package your project as a Spark application JAR file, and then submit the application to YARN by using the spark-submit command.

Procedure

  1. In Scala IDE, in the Package Explorer tab, right-click the package and click Export.
  2. In the Export window, expand Java, select JAR file, and then save the package as a JAR file (for example, datalake.jar).
  3. Upload the application JAR file and the dependency JAR files to the /datalake directory on an IBM Open Platform management node, and then verify that the files are there:
    [root@iopmgmt1 /]# cd /datalake
    [root@iopmgmt1 datalake]# ls
    commons-csv-1.2.jar  datalake.jar  spark-csv_2.10-1.3.0.jar
    
  4. Grant the hdfs user access to the files. The following command gives all users read, write, and execute permission on /datalake and everything in it:
    [root@iopmgmt1 datalake]# chmod -R 777 /datalake/
  5. Switch to the hdfs user:
    [root@iopmgmt1 datalake]# su hdfs
    [hdfs@iopmgmt1 datalake]$
    
  6. Submit the Spark application:
    [hdfs@iopmgmt1 root]$ spark-submit --master yarn-client \
    --jars /datalake/spark-csv_2.10-1.3.0.jar,/datalake/commons-csv-1.2.jar \
    --class datalake.spark.EventProcessing /datalake/datalake.jar
  7. View the result in the Spark console.
    Figure 1. Spark console result: the console output shows the summary, IncomingEventCode, and Temperature columns.
  8. List the output folder to see the new files that the application created, and then use the hdfs dfs -cat command to preview the result:
    [hdfs@iopmgmt1 root]$ hdfs dfs -ls /outputdata/
    Found 2 items
    -rw-r--r--   3 hdfs    hdfs        216 2016-04-21 14:56 /outputdata/eventaggregation.csv
    drwxr-xr-x   - hdfs    hdfs          0 2016-04-21 14:56 /outputdata/temp.csv
    [hdfs@iopmgmt1 root]$ hdfs dfs -ls /outputdata/temp.csv
    Found 3 items
    -rw-r--r--   3 hdfs hdfs          0 2016-04-21 14:56 /outputdata/temp.csv/_SUCCESS
    -rw-r--r--   3 hdfs hdfs         99 2016-04-21 14:56 /outputdata/temp.csv/part-00000
    -rw-r--r--   3 hdfs hdfs        117 2016-04-21 14:56 /outputdata/temp.csv/part-00001
    [hdfs@iopmgmt1 root]$ hdfs dfs -cat /outputdata/eventaggregation.csv
    summary,IncomingEventCode,Temperature
    count,362821,362821
    mean,9420.645629663111,25.91079134013243
    summary,IncomingEventCode,Temperature
    stddev,6138.230667617183,7.416425065531075
    min,1,-2.22728
    max,24150,101.355341
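
Step 3 only lists the /datalake directory. A minimal shell sketch like the following could confirm that every JAR file the spark-submit command refers to is present before you submit. The function name check_jars_present is hypothetical; the directory and file names are the ones shown in the listing in step 3.

```shell
# Hypothetical helper: report any expected JAR file that is missing from a
# directory. Returns 0 only if every named file exists as a regular file.
# Usage: check_jars_present DIR FILE...
check_jars_present() {
  check_dir=$1
  shift
  check_status=0
  for check_name in "$@"; do
    if [ ! -f "$check_dir/$check_name" ]; then
      printf 'missing: %s/%s\n' "$check_dir" "$check_name" >&2
      check_status=1
    fi
  done
  return "$check_status"
}

# Example use on the management node, with the names from step 3:
#   check_jars_present /datalake datalake.jar spark-csv_2.10-1.3.0.jar commons-csv-1.2.jar
```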
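
In the chmod command in step 4, each octal digit of 777 is the sum of read (4), write (2), and execute (1) permission for the owner, the group, and all other users, so 777 gives every user full access to /datalake. The same arithmetic explains the listing in step 8: ignoring the leading file-type character, rw-r--r-- is 644 and rwxr-xr-x is 755. The following POSIX shell sketch decodes a three-digit mode into that notation; mode_to_symbolic is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: convert an octal permission mode such as 777 into the
# rwx notation that ls -l and hdfs dfs -ls print (without the file-type
# character). Written for POSIX sh, so it also runs under bash.
mode_to_symbolic() {
  mts_mode=$1
  mts_out=""
  while [ -n "$mts_mode" ]; do
    mts_rest=${mts_mode#?}              # all but the first digit
    mts_digit=${mts_mode%"$mts_rest"}   # the first digit
    mts_mode=$mts_rest
    # Each digit is read (4) + write (2) + execute (1).
    if [ $(( $mts_digit & 4 )) -ne 0 ]; then mts_out="${mts_out}r"; else mts_out="${mts_out}-"; fi
    if [ $(( $mts_digit & 2 )) -ne 0 ]; then mts_out="${mts_out}w"; else mts_out="${mts_out}-"; fi
    if [ $(( $mts_digit & 1 )) -ne 0 ]; then mts_out="${mts_out}x"; else mts_out="${mts_out}-"; fi
  done
  printf '%s\n' "$mts_out"
}

mode_to_symbolic 777   # prints rwxrwxrwx
mode_to_symbolic 644   # prints rw-r--r--
```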
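
The --jars option in step 6 takes a single comma-separated list of paths with no spaces around the commas. If the set of dependencies grows, a small shell function could build that list from every JAR file in /datalake except the application JAR. This sketch assumes that every other JAR file in the directory is a dependency; jars_except_app is a hypothetical name.

```shell
# Hypothetical helper: print a comma-separated list of every *.jar file in a
# directory except the application JAR, in the form that --jars expects.
# Usage: jars_except_app DIR APP_JAR_NAME
jars_except_app() {
  jea_dir=$1
  jea_app=$2
  jea_list=""
  for jea_path in "$jea_dir"/*.jar; do
    [ -e "$jea_path" ] || continue                  # no matches: skip the literal pattern
    [ "${jea_path##*/}" = "$jea_app" ] && continue  # leave out the application JAR
    jea_list="${jea_list:+$jea_list,}$jea_path"     # join with commas, no spaces
  done
  printf '%s\n' "$jea_list"
}

# Example use, equivalent to the command in step 6 (dependency order may differ):
#   spark-submit --master yarn-client \
#     --jars "$(jars_except_app /datalake datalake.jar)" \
#     --class datalake.spark.EventProcessing /datalake/datalake.jar
```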
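
The rows in eventaggregation.csv (count, mean, stddev, min, and max under a summary column) match the layout that Spark's DataFrame describe() method produces, although the application source is not shown here, so that is an assumption. To sanity-check such numbers on a small local sample, a minimal awk sketch can compute the same five statistics for one numeric column of a CSV file with a header row. It uses the sample standard deviation (dividing by n - 1); which form the application reports is not stated in this procedure. The function name describe_column is hypothetical.

```shell
# Hypothetical helper: print count, mean, stddev (sample), min, and max for one
# numeric column of a local CSV file whose first line is a header.
# Usage: describe_column FILE COLUMN_NUMBER
describe_column() {
  awk -F, -v col="$2" '
    NR == 1 { next }                       # skip the header row
    {
      x = $col + 0
      n++
      if (n == 1 || x < min) min = x
      if (n == 1 || x > max) max = x
      # Welford update: running mean and sum of squared deviations
      d = x - mean
      mean += d / n
      m2 += d * (x - mean)
    }
    END {
      if (n == 0) exit 1
      sd = (n > 1) ? sqrt(m2 / (n - 1)) : 0
      print "count," n
      print "mean," mean
      print "stddev," sd
      print "min," min
      print "max," max
    }
  ' "$1"
}
```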
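
The header row appears twice in eventaggregation.csv. The sizes in the listing suggest why: the two part files in /outputdata/temp.csv are 99 and 117 bytes, eventaggregation.csv is 216 bytes (99 + 117), and each part appears to start with its own header row, so the file looks like a plain concatenation of the parts. If you want a single header, a sketch like the following merges local copies of the part files and keeps only the first header. It assumes that every part starts with the same header line; merge_csv_parts is a hypothetical name.

```shell
# Hypothetical helper: concatenate CSV part files into OUT, keeping the header
# line of the first part and dropping the first line of every later part.
# Usage: merge_csv_parts OUT PART...
merge_csv_parts() {
  mcp_out=$1
  shift
  : > "$mcp_out"                            # create or truncate the output file
  mcp_first=1
  for mcp_part in "$@"; do
    if [ "$mcp_first" -eq 1 ]; then
      cat "$mcp_part" >> "$mcp_out"         # first part: keep its header
      mcp_first=0
    else
      tail -n +2 "$mcp_part" >> "$mcp_out"  # later parts: skip the header line
    fi
  done
}

# Example use, after copying the output directory out of HDFS
# (the part-* pattern leaves out the empty _SUCCESS marker file):
#   hdfs dfs -get /outputdata/temp.csv ./temp.csv
#   merge_csv_parts eventaggregation.csv ./temp.csv/part-*
```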