Custom Spark Streaming Adapter For Reading Weather Data From Weather Company API (IBM Bluemix) - Hadoop Dev

Technical Blog Post


Many real-world pricing models use weather as one of their dimensions, since weather patterns can influence pricing decisions for commodities. But how do you obtain weather data in real time, either to use in Spark analytics or to persist for offline BI analysis?

In this article I discuss the IBM Weather Company API on Bluemix and how to read data from the API for real-time analytics in Spark.

IBM Bluemix offers a free subscription plan for the Weather Company API.

The TWC API exposes various REST URLs that can be used to access weather data for a geo location, postal code, and so on. More details about the API can be found on the Bluemix documentation site.

Reading the JSON string in Spark Streaming

Spark Streaming does not have out-of-the-box support for reading JSON from a remote REST API, so users must create a custom adapter. A custom adapter extends the Receiver class from Spark Streaming and overrides its onStart() and onStop() methods.
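The work the receiver's thread ultimately performs is a plain HTTP GET that returns a JSON string, which the receiver then hands to Spark via store(). As a minimal sketch in plain Java (the class and method names here are illustrative assumptions, not the repo's actual code), that fetch step could look like:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WeatherFetcher {

    // Read the whole response body into one string (newlines dropped,
    // which is fine for a single-line JSON payload like TWC's).
    public static String readBody(BufferedReader reader) throws IOException {
        StringBuilder body = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            body.append(line);
        }
        return body.toString();
    }

    // Perform a single GET against the TWC REST URL and return the JSON body.
    // A custom Receiver would call this in a loop from the thread it starts
    // in onStart(), passing each result to store().
    public static String fetchJson(String apiUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(apiUrl).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            return readBody(reader);
        } finally {
            conn.disconnect();
        }
    }
}
```

The receiver's onStop() would then simply signal the polling thread to exit; Spark takes care of restarting the receiver on failure.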

Once the adapter is created, we use it in the JavaStreamingContext as shown in the following snippet:

  JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, Durations.seconds(20));
  JavaReceiverInputDStream<String> lines = ssc.receiverStream(new WcReceiver(args[0]));

Here WcReceiver is my custom HTTP adapter that reads the weather JSON data. The complete project is available in my Git repo.

The JSON data is then persisted to HDFS so that external Big SQL or Hive tables can be mounted on it and used as a dimension table for analytics.

Steps to run the project

    Compile the program using the following Maven command:
 mvn compile package
    Copy the generated jar file from the target folder to the Spark cluster
    To test the program, launch it with spark-submit in local mode. The program accepts two parameters, the weather data API URL and the HDFS output folder, as shown below:
     spark-submit --class "spark.wc.AnalyzeWeather" --master local[4] wc-0.0.1-SNAPSHOT.jar  https://<username>:<password> /tmp/output  

The following is an example URI and JSON response from the Weather Company API for Ottawa, ON, Canada:

    https://<username>:<password>
    {"metadata":{"language":"en-US","transaction_id":"1489641873294:-51413653","version":"1","latitude":45.42,"longitude":75.69,"expire_time_gmt":1489644900,"status_code":200},"observation":{"key":"36821","class":"observation","expire_time_gmt":1489644900,"obs_id":"36821","obs_name":"Bakanas","valid_time_gmt":1489633200,"day_ind":"D","temp":28,...
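Before mounting Hive or Big SQL tables on the raw files, it can be handy to sanity-check individual fields in the persisted JSON. A hypothetical helper like the one below (a rough sketch; the Java SE standard library has no JSON parser, and production code should use a real one) scans for a numeric field such as temp or latitude from the sample observation above:

```java
public class ObservationProbe {

    // Extract the numeric value of a field such as "temp" from a raw TWC
    // observation JSON string. A simple character scan is enough for a quick
    // sanity check; it stops at the first non-numeric character.
    public static String numericField(String json, String field) {
        String needle = "\"" + field + "\":";
        int start = json.indexOf(needle);
        if (start < 0) {
            return null; // field not present
        }
        start += needle.length();
        int end = start;
        while (end < json.length() && "-.0123456789".indexOf(json.charAt(end)) >= 0) {
            end++;
        }
        return json.substring(start, end);
    }

    public static void main(String[] args) {
        String sample = "{\"metadata\":{\"latitude\":45.42,\"longitude\":75.69,"
                + "\"status_code\":200},\"observation\":{\"temp\":28}}";
        System.out.println(numericField(sample, "temp"));      // 28
        System.out.println(numericField(sample, "latitude"));  // 45.42
    }
}
```

This kind of spot check catches malformed records early, before a downstream table scan fails on them.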
