Many real-world pricing models use weather as one of their dimensions, because weather patterns can influence the pricing of commodities. But how do you obtain weather data in real time, either to use it in Spark analytics or to persist it for offline BI analysis?
In this article I discuss the IBM Weather Company API on Bluemix and how to read its data for real-time analytics in Spark.
IBM Bluemix offers a free subscription plan for the Weather Company API in its catalog: https://console.ng.bluemix.net/catalog/
The Weather Company (TWC) API provides various REST URLs for accessing weather data by geolocation, postal code, and so on. More details about the API can be found on the Bluemix documentation site.
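As a rough illustration of the geolocation-style URLs, the sketch below builds an hourly-forecast URL from a latitude and longitude. The host, path layout, and the units/language parameters are copied from the example URL used in the spark-submit step of this article; the class and method names are hypothetical, credentials are left out, and other endpoints (postal code lookups, for instance) are not covered.

```java
import java.util.Locale;

// Hypothetical helper: builds a TWC geocode hourly-forecast URL that follows
// the path layout of the example URL in this article. No credentials included.
final class TwcUrls {
    private static final String BASE =
            "https://twcservice.mybluemix.net:443/api/weather/v1/geocode/";

    private TwcUrls() {}

    // Coordinates are formatted with two decimals and Locale.ROOT, so the
    // decimal separator is always '.', whatever the JVM's default locale is.
    // South latitudes and west longitudes are negative.
    static String hourlyForecastUrl(double lat, double lon, String units, String language) {
        return String.format(Locale.ROOT,
                "%s%.2f/%.2f/forecast/hourly/48hour.json?units=%s&language=%s",
                BASE, lat, lon, units, language);
    }
}
```

For example, hourlyForecastUrl(45.42, -75.69, "m", "en-US") targets Ottawa with metric units; forgetting the minus sign on a western longitude silently points the request at a different part of the world.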
Reading the JSON string in Spark Streaming
Spark Streaming has no out-of-the-box support for reading JSON from a remote REST API, so you need to write a custom adapter (a custom receiver). A custom receiver extends the Receiver class from Spark Streaming and overrides its onStart() and onStop() methods; each record it receives is handed to Spark by calling store().
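Below is a minimal sketch of such a receiver, assuming the Spark Streaming 2.x Java API. It is not the author's WcReceiver: the class name PollingJsonReceiver, the poll interval, and the storage level are illustrative choices. It polls a URL on a background thread (onStart() must return quickly), hands each response body to Spark with store(), and asks Spark to restart it on I/O errors. Because HttpURLConnection does not send credentials embedded in a URL such as https://user:pass@host/..., the sketch turns the URL's user-info into a Basic Authorization header itself; it assumes the user name and password contain no percent-encoded characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

// Hypothetical receiver: polls a REST URL and stores each JSON response body
// as one String record in the stream.
class PollingJsonReceiver extends Receiver<String> {
    private final String url;
    private final long pollIntervalMs;

    PollingJsonReceiver(String url, long pollIntervalMs) {
        super(StorageLevel.MEMORY_AND_DISK_2());
        this.url = url;
        this.pollIntervalMs = pollIntervalMs;
    }

    @Override
    public void onStart() {
        // onStart() must not block, so polling runs on its own thread.
        new Thread(this::poll, "twc-poller").start();
    }

    @Override
    public void onStop() {
        // Nothing to clean up: the polling loop exits once isStopped() is true.
    }

    private void poll() {
        try {
            while (!isStopped()) {
                store(fetch(url));
                Thread.sleep(pollIntervalMs);
            }
        } catch (IOException e) {
            // Ask Spark to stop this receiver and start it again after a delay.
            restart("Error polling " + url, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns "Basic <base64(user:password)>" for the given user-info, or null if there is none.
    static String basicAuthHeader(String userInfo) {
        if (userInfo == null) {
            return null;
        }
        return "Basic " + Base64.getEncoder()
                .encodeToString(userInfo.getBytes(StandardCharsets.UTF_8));
    }

    private static String fetch(String address) throws IOException {
        URL u = new URL(address);
        HttpURLConnection conn = (HttpURLConnection) u.openConnection();
        try {
            String auth = basicAuthHeader(u.getUserInfo());
            if (auth != null) {
                conn.setRequestProperty("Authorization", auth);
            }
            conn.setRequestProperty("Accept", "application/json");
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " from " + u.getHost());
            }
            // Read the whole response body as UTF-8 text.
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

A receiver like this would be registered exactly like the WcReceiver in the snippet that follows, for example ssc.receiverStream(new PollingJsonReceiver(url, 20_000L)). Polling once per batch interval is a simple choice; a slower poll simply leaves some batches empty.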
Once the adapter is created, register it with the JavaStreamingContext as shown in the following snippet:
JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, Durations.seconds(20));
JavaReceiverInputDStream<String> lines = ssc.receiverStream(new WcReceiver(args));
Here WcReceiver is my custom HTTP adapter for reading the weather JSON data. The complete project is available in my Git repo.
The JSON data is then persisted to HDFS, so that external BigSQL or Hive tables can be defined on top of it and used as dimension tables for analytics.
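One way to persist each batch is sketched below, assuming a JavaDStream<String> of raw JSON documents (such as the lines stream above) and the Spark 2.x foreachRDD overload that also passes the batch Time. The weather-<batch time> directory naming and the class WeatherSink are hypothetical, not the project's. Empty batches are skipped so that no empty directory is written for intervals without data, and each document is flattened onto one line because line-oriented readers such as Hive's JSON SerDe expect one JSON record per line.

```java
import org.apache.spark.streaming.api.java.JavaDStream;

// Hypothetical sink: writes every non-empty batch of JSON documents to its own
// HDFS directory, one document per line.
final class WeatherSink {
    private WeatherSink() {}

    static void persist(JavaDStream<String> json, String outputDir) {
        json.foreachRDD((rdd, time) -> {
            // Skip empty batches so no empty output directories are created.
            if (!rdd.isEmpty()) {
                rdd.map(WeatherSink::toSingleLine)
                   .saveAsTextFile(batchDir(outputDir, time.milliseconds()));
            }
        });
    }

    // Raw CR/LF characters cannot appear inside JSON strings, so replacing them
    // with spaces keeps the document valid while putting it on a single line.
    static String toSingleLine(String json) {
        return json.replace('\r', ' ').replace('\n', ' ');
    }

    // Output directory for one batch, e.g. "/tmp/output/weather-1000".
    static String batchDir(String outputDir, long batchTimeMs) {
        return outputDir + "/weather-" + batchTimeMs;
    }
}
```

A call such as WeatherSink.persist(lines, args[1]) would go before ssc.start(). Writing one directory per batch keeps every file immutable once written, which suits external tables whose location is the parent output folder.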
Steps to run the project
- Compile and package the program using the following Maven command:
mvn compile package
- Copy the generated JAR file from the target folder to the Spark cluster.
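- To test the program, launch it with spark-submit in local mode. The program accepts two parameters, the Weather data API URL and the HDFS output folder, as shown below. Quote the URL so that the shell does not treat the & in its query string as a command separator.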
spark-submit --class "spark.wc.AnalyzeWeather" --master local wc-0.0.1-SNAPSHOT.jar 'https://<username>:<password>@twcservice.mybluemix.net:443/api/weather/v1/geocode/45.42/-75.69/forecast/hourly/48hour.json?units=m&language=en-US' /tmp/output
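A driver can fail fast when the two parameters do not arrive as expected. The sketch below is a hypothetical helper (JobArgs is not part of the published project): it requires exactly two arguments and an http or https URL. It also catches a common launch mistake: if the URL is left unquoted, the shell ends the command at the &, so the program receives only a truncated URL and no output folder.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical argument holder for: <weather data API URL> <HDFS output folder>
final class JobArgs {
    final URL apiUrl;
    final String outputDir;

    private JobArgs(URL apiUrl, String outputDir) {
        this.apiUrl = apiUrl;
        this.outputDir = outputDir;
    }

    static JobArgs parse(String[] args) {
        if (args.length != 2) {
            // A URL split by the shell at '&' usually shows up here as one argument.
            throw new IllegalArgumentException(
                    "Usage: AnalyzeWeather <weather API URL> <HDFS output folder>, got "
                    + args.length + " argument(s)");
        }
        try {
            URL url = new URL(args[0]);
            String protocol = url.getProtocol();
            if (!protocol.equals("https") && !protocol.equals("http")) {
                throw new IllegalArgumentException("Unsupported protocol: " + protocol);
            }
            return new JobArgs(url, args[1]);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + args[0], e);
        }
    }
}
```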
The following is an example URI and JSON response from the Weather Company API for Ottawa, ON, Canada: