Submitting an Apache Spark application
You package the project as a Spark application JAR file, and then you submit the application with the spark-submit command.
Procedure
- In Scala IDE, in the Package Explorer tab, right-click the package and click Export.
- Save the package as a JAR file: in the Export window, click Java, and then click JAR file.
- Use the following commands to upload the application and dependency JAR files to the IBM Open Platform management nodes:
  [root@iopmgmt1 /]# cd /datalake
  [root@iopmgmt1 datalake]# ls
  commons-csv-1.2.jar  datalake.jar  spark-csv_2.10-1.3.0.jar
- Use the following command to grant permissions to the hdfs user:
  [root@iopmgmt1 datalake]# chmod -R 777 /datalake/
- Use the following command to switch to the hdfs user:
  [root@iopmgmt1 datalake]# su hdfs
  [hdfs@iopmgmt1 datalake]$
- Use the following command to submit the Spark application (a sketch of what the submitted class might contain follows this procedure):
  [hdfs@iopmgmt1 root]$ spark-submit --master yarn-client \
    --jars /datalake/spark-csv_2.10-1.3.0.jar,/datalake/commons-csv-1.2.jar \
    --class datalake.spark.EventProcessing /datalake/datalake.jar
- View the result from the Spark console:
  Figure 1. Spark console result
- Check the output folder to see the new files that were created there. Use the cat command to preview the result (a sketch of how the part files might have been merged into the single CSV file also follows this procedure):
  [hdfs@iopmgmt1 root]$ hdfs dfs -ls /outputdata/
  Found 2 items
  -rw-r--r--   3 hdfs hdfs   216 2016-04-21 14:56 /outputdata/eventaggregation.csv
  drwxr-xr-x   - hdfs hdfs     0 2016-04-21 14:56 /outputdata/temp.csv
  [hdfs@iopmgmt1 root]$ hdfs dfs -ls /outputdata/temp.csv
  Found 3 items
  -rw-r--r--   3 hdfs hdfs     0 2016-04-21 14:56 /outputdata/temp.csv/_SUCCESS
  -rw-r--r--   3 hdfs hdfs    99 2016-04-21 14:56 /outputdata/temp.csv/part-00000
  -rw-r--r--   3 hdfs hdfs   117 2016-04-21 14:56 /outputdata/temp.csv/part-00001
  [hdfs@iopmgmt1 root]$ hdfs dfs -cat /outputdata/eventaggregation.csv
  summary,IncomingEventCode,Temperature
  count,362821,362821
  mean,9420.645629663111,25.91079134013243
  summary,IncomingEventCode,Temperature
  stddev,6138.230667617183,7.416425065531075
  min,1,-2.22728
  max,24150,101.355341
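For reference, the following is a minimal sketch of what a driver class such as datalake.spark.EventProcessing might look like, assuming it reads CSV event data with the spark-csv package and writes summary statistics for the IncomingEventCode and Temperature columns. The input path, the schema options, and the use of describe() are assumptions based on the output shown above, not the actual application code.

  package datalake.spark

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  // Hypothetical sketch only: the real EventProcessing class is not shown in this topic.
  object EventProcessing {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("EventProcessing")
      val sc = new SparkContext(conf)
      val sqlContext = new SQLContext(sc)

      // Read the raw event data with the spark-csv package
      // (the input path /datalake/events.csv is an assumed example).
      val events = sqlContext.read
        .format("com.databricks.spark.csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load("/datalake/events.csv")

      // Compute count, mean, stddev, min, and max for the numeric columns,
      // which matches the shape of the eventaggregation.csv output.
      val summary = events.describe("IncomingEventCode", "Temperature")

      // Write the aggregated result to HDFS as CSV part files.
      summary.write
        .format("com.databricks.spark.csv")
        .option("header", "true")
        .save("/outputdata/temp.csv")

      sc.stop()
    }
  }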
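The listing above shows both a temp.csv directory of part files and a single eventaggregation.csv file, which suggests that the part files are merged into one CSV file after the write completes. The following is a minimal sketch of one way to do that with Hadoop's FileUtil.copyMerge; the MergeOutput object name and the merge step itself are assumptions, and the repeated summary header row in eventaggregation.csv is consistent with a plain concatenation of the two part files.

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

  // Hypothetical post-processing step (not shown in this topic): concatenate the
  // part files in /outputdata/temp.csv into /outputdata/eventaggregation.csv.
  object MergeOutput {
    def main(args: Array[String]): Unit = {
      val conf = new Configuration()
      val fs = FileSystem.get(conf)
      FileUtil.copyMerge(
        fs, new Path("/outputdata/temp.csv"),             // source directory of part files
        fs, new Path("/outputdata/eventaggregation.csv"), // merged destination file
        false,                                            // keep the source directory
        conf, null)                                       // no separator string between files
    }
  }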