April 18, 2016 | Written by: James Thomas
Categorized: Data Analytics | How-tos
With over 500 million tweets sent per day, Twitter provides an amazing data source that has been used for everything from detecting earthquakes to predicting flu outbreaks. IBM Bluemix provides access to this “firehose” through the IBM Insights For Twitter service, allowing developers to build their own analytics solutions on top of this data stream. Developers can run search queries to match historical or new tweets, and the results are enriched with additional content, such as sentiment, derived using deep natural language processing algorithms.
However, building applications that can process Twitter data streams can be challenging due to the unpredictable nature of the data source.
In 2014, the World Cup final generated 35 million tweets during the ninety minutes, peaking at six hundred thousand tweets per minute (“Brazil’s World Cup thrashing breaks Twitter records”, BBC News).
Trending topics instantaneously appear, generating huge amounts of traffic before disappearing almost as fast.
In this series of blog posts we’re going to walk through building a scalable architecture for processing “real-time” Twitter streams. Using IBM Bluemix, IBM Insights For Twitter, Apache Kafka and Cloudant, we’ll build a processing pipeline using a series of microservices rather than a monolithic application. We’ll look at designing the architecture to scale automatically in response to fluctuating load and at how to handle failures without losing work. Node.js and React.js will be used to build the frontend web application to display the “live” analysis results.
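To make “handling failures without losing work” a little more concrete before the series gets into the real code, here is a minimal sketch of the idea in Python. It is not taken from Match Tracker: the `InMemoryQueue` class, the `run_worker` function and the `max_attempts` parameter are all hypothetical names, and the in-memory queue only stands in for a real message broker such as Apache Kafka. The point it illustrates is that a worker puts a message back on the queue when processing fails, rather than dropping it.

```python
from collections import deque


class InMemoryQueue:
    """Hypothetical stand-in for a message broker; not a Kafka client."""

    def __init__(self, messages):
        self._pending = deque(messages)

    def receive(self):
        # Return the next message, or None when the queue is empty.
        return self._pending.popleft() if self._pending else None

    def requeue(self, message):
        # Put a failed message at the back of the queue for another attempt.
        self._pending.append(message)


def run_worker(queue, handler, max_attempts=3):
    """Drain the queue, retrying failed messages up to max_attempts times."""
    processed, dead = [], []
    attempts = {}
    while (message := queue.receive()) is not None:
        attempts[message] = attempts.get(message, 0) + 1
        try:
            processed.append(handler(message))
        except Exception:
            if attempts[message] < max_attempts:
                queue.requeue(message)  # keep the work for a later retry
            else:
                dead.append(message)    # give up, but keep a record of it
    return processed, dead


def make_flaky_handler():
    """Build a handler that fails once for one message, then succeeds."""
    failed_once = set()

    def handler(text):
        if text == "goal!" and text not in failed_once:
            failed_once.add(text)
            raise RuntimeError("simulated transient failure")
        return text.upper()

    return handler


processed, dead = run_worker(
    InMemoryQueue(["kick-off", "goal!", "full time"]),
    make_flaky_handler(),
)
print(processed)  # ['KICK-OFF', 'FULL TIME', 'GOAL!']
print(dead)       # []
```

The failed message is processed last because it goes to the back of the queue, so this approach trades strict ordering for not losing work. A real broker offers more robust ways to achieve the same goal.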
What Twitter data will we be analyzing?
Twitter has become an essential “second screen” for many football fans. Millions of messages with fans’ views are sent during matches. Looking at tweets about the matches can be an easier and faster way to understand how the teams and players are performing.
Could we build an application to perform this analysis automatically?
Such an application would need to process all the tweets sent about matches, use natural language algorithms to calculate sentiment, and display the results in “real-time”. Match Tracker is an open-source demo application that does just that!
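As a rough illustration of the kind of analysis described above, the following Python sketch tallies tweet sentiment per team. The field names (`team`, `sentiment`) and the sentiment labels are assumptions made for this example. They are not the response format of IBM Insights For Twitter, and this is not how Match Tracker itself is implemented.

```python
from collections import Counter, defaultdict


def tally_sentiment(tweets):
    """Count sentiment labels per team from already-enriched tweets."""
    totals = defaultdict(Counter)
    for tweet in tweets:
        totals[tweet["team"]][tweet["sentiment"]] += 1
    return totals


def net_score(counts):
    """Positive minus negative tweets; neutral tweets are ignored."""
    return counts["positive"] - counts["negative"]


# Hypothetical sample data, invented for illustration only.
sample_tweets = [
    {"team": "home", "sentiment": "positive"},
    {"team": "home", "sentiment": "positive"},
    {"team": "home", "sentiment": "negative"},
    {"team": "away", "sentiment": "negative"},
    {"team": "away", "sentiment": "neutral"},
]

totals = tally_sentiment(sample_tweets)
for team in sorted(totals):
    print(team, net_score(totals[team]))
# away -1
# home 1
```

In a streaming setting the same tally would be updated incrementally as each tweet arrives, rather than computed over a fixed list.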
Over the coming weeks, we’ll be walking through the code and architecture to help you understand how to build similar applications. If you can’t wait, the full source code and installation instructions are now available in the IBM-Bluemix/match_tracker project on GitHub.
Open Match Tracker
If you want to see the application running live, click the button above and see the results.