Analyzing Twitter trends in real time with Apache Kafka and microservices


With over 500 million tweets sent per day, Twitter provides an amazing data source that has been used for everything from detecting earthquakes to predicting flu outbreaks. IBM Bluemix provides access to this “firehose” through the IBM Insights For Twitter service, allowing developers to build their own analytics solutions on top of this data stream. Developers can run search queries to match historical or new tweets, and the results are enriched with additional content, such as sentiment, derived using deep natural language processing algorithms.

However, building applications that can process Twitter data streams can be challenging due to the unpredictable nature of the data source.

In 2014, Brazil’s World Cup semi-final against Germany generated 35 million tweets during the ninety minutes, peaking at nearly six hundred thousand tweets per minute (BBC News, “Brazil’s World Cup thrashing breaks Twitter records”).

Trending topics appear almost instantly, generating huge amounts of traffic before disappearing almost as fast.

In this series of blog posts we’re going to walk through building a scalable architecture for processing “real-time” Twitter streams. Using IBM Bluemix, IBM Insights For Twitter, Apache Kafka and Cloudant, we’ll build a processing pipeline from a series of microservices rather than a monolithic application. We’ll look at designing the architecture to scale automatically in response to fluctuating load and to handle failures without losing work. Node.js and React.js will be used to build the frontend web application that displays the “live” analysis results.
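The “handle failures without losing work” idea can be illustrated without a real Kafka cluster. The following is a minimal sketch, not the Match Tracker code: `PartitionLog` and `runConsumer` are hypothetical names for an in-memory stand-in of one Kafka partition and a consumer that commits its offset only after a message has been processed. A worker that crashes mid-message therefore picks that message up again when it restarts.

```javascript
// Hypothetical in-memory stand-in for a single Kafka topic partition.
// This is NOT the Kafka client API; it only models offsets and commits.
class PartitionLog {
  constructor() {
    this.messages = [];         // append-only log of messages
    this.committed = new Map(); // consumer group -> next offset to read
  }

  append(value) {
    this.messages.push(value);
    return this.messages.length - 1; // offset assigned to the new message
  }

  committedOffset(group) {
    return this.committed.has(group) ? this.committed.get(group) : 0;
  }

  commit(group, offset) {
    this.committed.set(group, offset);
  }
}

// Read from the group's last committed offset and commit only after each
// message is handled. If `handle` throws, the offset is not advanced, so
// the same message is delivered again when the consumer restarts.
function runConsumer(log, group, handle) {
  let offset = log.committedOffset(group);
  while (offset < log.messages.length) {
    handle(log.messages[offset]); // may throw to simulate a worker crash
    offset += 1;
    log.commit(group, offset);    // commit after successful processing
  }
}

// Usage: the worker crashes on its first attempt at "tweet-2".
const log = new PartitionLog();
["tweet-1", "tweet-2", "tweet-3"].forEach((t) => log.append(t));

const processed = [];
let crashOnce = true;
try {
  runConsumer(log, "sentiment-workers", (msg) => {
    if (msg === "tweet-2" && crashOnce) {
      crashOnce = false;
      throw new Error("worker crashed");
    }
    processed.push(msg);
  });
} catch (err) {
  // The crash left "tweet-2" uncommitted.
}

// Restarting the consumer resumes from the committed offset.
runConsumer(log, "sentiment-workers", (msg) => processed.push(msg));
console.log(processed); // [ 'tweet-1', 'tweet-2', 'tweet-3' ]
```

Because the offset is committed after processing, this gives at-least-once delivery: a worker that crashes after handling a message but before committing it will see that message again, so message handlers should be idempotent.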

What Twitter data will we be analyzing?

Twitter has become an essential “second screen” for many football fans. Millions of messages with fans’ views are sent during matches. Looking at tweets about the matches can be an easier and faster way to understand how the teams and players are performing.

Could we build an application to perform this analysis automatically?

That means processing all the tweets sent about a match, using natural language processing algorithms to calculate their sentiment, and then displaying the results in “real-time”. Match Tracker is an open-source demo application that does just that!
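To make the analysis step concrete, here is a minimal sketch of one way to turn a stream of sentiment-scored tweets into per-match counts a live chart could display. It assumes each tweet has already been enriched with a sentiment label; the object shape, the `aggregateSentiment` function and the match identifiers are hypothetical and do not reflect the Insights For Twitter schema or the Match Tracker implementation.

```javascript
// Assumed (hypothetical) shape of an enriched tweet for this sketch:
// { match: string, timestamp: milliseconds,
//   sentiment: "positive" | "negative" | "neutral" }
const LABELS = ["positive", "negative", "neutral"];
const MINUTE_MS = 60 * 1000;

// Count tweets by sentiment label for each match, in one-minute buckets
// keyed by the bucket's start time in milliseconds.
function aggregateSentiment(tweets) {
  const result = {}; // match -> bucket start -> { positive, negative, neutral }
  for (const { match, timestamp, sentiment } of tweets) {
    if (!LABELS.includes(sentiment)) continue; // skip unknown labels
    const bucket = Math.floor(timestamp / MINUTE_MS) * MINUTE_MS;
    if (!result[match]) result[match] = {};
    if (!result[match][bucket]) {
      result[match][bucket] = { positive: 0, negative: 0, neutral: 0 };
    }
    result[match][bucket][sentiment] += 1;
  }
  return result;
}

// Usage with made-up sample data (timestamps start at 0 for simplicity).
const tweets = [
  { match: "BRA-GER", timestamp: 0, sentiment: "negative" },
  { match: "BRA-GER", timestamp: 30000, sentiment: "negative" },
  { match: "BRA-GER", timestamp: 61000, sentiment: "positive" },
  { match: "ARG-NED", timestamp: 5000, sentiment: "neutral" },
];
const agg = aggregateSentiment(tweets);
console.log(agg["BRA-GER"][0]);     // { positive: 0, negative: 2, neutral: 0 }
console.log(agg["BRA-GER"][60000]); // { positive: 1, negative: 0, neutral: 0 }
```

Bucketing by minute keeps the output size bounded by match length rather than by tweet volume, which matters when a single goal can trigger a sudden spike in traffic.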

Match Tracker

Over the coming weeks, we’ll be walking through the code and architecture to help you understand how to build similar applications. If you can’t wait, the full source code and installation instructions are now available in the IBM-Bluemix/match_tracker project on GitHub.

