Summary of Euro 2012
Grzegorz Puchawski
To summarize the analysis performed during Euro 2012, we prepared a graphic with the most interesting results.
The number of tweets is highest during the games and drops significantly afterwards. An especially high number of tweets could be observed during the opening game and the final. Before and after the games, the number of tweets related to Euro 2012 was much lower. Also, when we look at detailed information for a few games (the three games of the Polish team and the semifinal and final games), we can see that each goal is correlated with a peak of comments about the team.
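A minimal sketch of how goal-related peaks could be found in a tweet stream, written in Python. It assumes tweets are available as `datetime` timestamps and that goal times are known. The function names, the 10-minute baseline window and the 3-standard-deviation threshold are illustrative choices, not the method used in the study.

```python
from collections import Counter
from datetime import timedelta
from statistics import mean, pstdev


def tweets_per_minute(timestamps):
    """Return a dense list of (minute, tweet_count) pairs, including empty minutes."""
    buckets = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    if not buckets:
        return []
    series, minute, last = [], min(buckets), max(buckets)
    while minute <= last:
        series.append((minute, buckets.get(minute, 0)))
        minute += timedelta(minutes=1)
    return series


def find_peaks(series, window=10, k=3.0):
    """Flag minutes whose count exceeds the mean of the previous `window`
    minutes by more than k standard deviations (spread floored at 1 tweet)."""
    peaks = []
    for i, (minute, count) in enumerate(series):
        history = [c for _, c in series[max(0, i - window):i]]
        if len(history) < 2:
            continue  # not enough baseline yet
        threshold = mean(history) + k * max(pstdev(history), 1.0)
        if count > threshold:
            peaks.append(minute)
    return peaks


def goals_with_peaks(goal_times, peaks, lag=timedelta(minutes=3)):
    """Return the goals that are followed by a peak within `lag`."""
    return [g for g in goal_times if any(g <= p <= g + lag for p in peaks)]
```

A goal counts as matched when a peak starts within three minutes of it; widening `lag` tolerates slower reactions at the cost of more accidental matches.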
This leads to the observation that tweets can be treated as a heart-rate monitor of Euro 2012, expressing the current sentiment towards the teams during each game. In our case this relates to a very popular sporting event, the European Football Championship, but the same approach can be applied in other areas.
The most popular language used during Euro 2012 was English. Many people who declared another language in their profile settings nevertheless tweeted in English. We also observed a significant number of tweets in Spanish and French.
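One simple way to compare the declared and the actual language of tweets is to count stop words from each candidate language. The sketch below assumes this stop-word-overlap heuristic and uses tiny, hypothetical word lists; it is not necessarily how language identification was done in the study.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word lists; real lists are much longer.
STOP_WORDS = {
    "en": {"the", "and", "is", "to", "of", "what", "a", "in", "for"},
    "es": {"el", "los", "y", "es", "que", "por", "un", "una", "del"},
    "fr": {"le", "les", "et", "est", "des", "pour", "une", "du", "au"},
}


def detect_language(text):
    """Return the language whose stop words occur most often, or None."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(word in stops for word in words)
              for lang, stops in STOP_WORDS.items()}
    best = max(scores, key=scores.get)  # ties go to the first language listed
    return best if scores[best] > 0 else None


def language_mismatches(tweets):
    """Count (declared, detected) pairs for tweets whose detected language
    differs from the language declared in the user's settings."""
    mismatches = Counter()
    for declared, text in tweets:
        detected = detect_language(text)
        if detected is not None and detected != declared:
            mismatches[(declared, detected)] += 1
    return mismatches


sample = [("pl", "What a goal, the crowd is going crazy"), ("en", "The final is tonight")]
print(language_mismatches(sample))  # Counter({('pl', 'en'): 1})
```

Stop-word counting is crude for texts as short as tweets; character n-gram models are the more usual choice for production language identification.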
We also analyzed the predictions contained in tweets sent before each game, this time for all games. By comparing the predictions with the actual results, we calculated the likelihood that tweets reported the correct outcome; these values are presented in the table. Once again, the exact score of the final between Spain and Italy (4-0) was very hard to predict, especially since many fans were hoping that Italy would win.
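The likelihood of a correct prediction can be expressed as the fraction of pre-match tweets whose predicted outcome (or exact score) matches the actual result. A sketch, assuming predictions have already been extracted from tweets as `(first_team_goals, second_team_goals)` pairs; the extraction step is not shown, and the sample predictions below are made up for illustration.

```python
def outcome(first_goals, second_goals):
    """Classify a score as a win for the first team, the second team, or a draw."""
    if first_goals > second_goals:
        return "first"
    if first_goals < second_goals:
        return "second"
    return "draw"


def prediction_accuracy(predictions, actual):
    """Return (share with the correct outcome, share with the exact score)."""
    if not predictions:
        raise ValueError("no predictions for this game")
    n = len(predictions)
    correct_outcome = sum(outcome(*p) == outcome(*actual) for p in predictions)
    exact_score = sum(tuple(p) == tuple(actual) for p in predictions)
    return correct_outcome / n, exact_score / n


# Made-up predictions for the final, Spain listed first; the real result was 4-0.
print(prediction_accuracy([(2, 1), (1, 0), (0, 1), (1, 1), (4, 0)], (4, 0)))
# (0.6, 0.2)
```

The gap between the two numbers is what makes exact scores such as 4-0 hard to "predict": many tweets get the winner right while very few name the precise score.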
At the bottom, you can see a chart showing how fans wrote about the coaches. The x-axis shows the number of tweets on a logarithmic scale, while the y-axis shows the Sentiment Index (positive/negative) for each coach. You can see how the Sentiment Index changed over time: three points represent the accumulated sentiment after the group stage, after the quarter-finals and after the final.
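The post does not define the Sentiment Index precisely. The sketch below assumes a common normalized form, (positive - negative) / (positive + negative), computed over all tweets about a coach up to each checkpoint, and assumes every tweet has already been classified as positive (+1) or negative (-1). The checkpoint dates are the ends of the Euro 2012 group stage (June 19), quarter-finals (June 24) and final (July 1).

```python
from collections import defaultdict
from datetime import date

# End of the group stage, the quarter-finals and the final of Euro 2012.
CHECKPOINTS = [date(2012, 6, 19), date(2012, 6, 24), date(2012, 7, 1)]


def sentiment_index(positive, negative):
    """Assumed definition: -1 (all negative) to +1 (all positive), 0 if no tweets."""
    total = positive + negative
    return (positive - negative) / total if total else 0.0


def accumulated_sentiment(tweets, checkpoints=CHECKPOINTS):
    """tweets: iterable of (day, coach, polarity) with polarity +1 or -1.
    Returns {coach: [(checkpoint, tweet_count, index), ...]} using all
    tweets up to and including each checkpoint day."""
    tweets = list(tweets)  # iterated once per coach and checkpoint
    result = defaultdict(list)
    for coach in sorted({coach for _, coach, _ in tweets}):
        for checkpoint in checkpoints:
            polarities = [p for day, c, p in tweets if c == coach and day <= checkpoint]
            positive = sum(1 for p in polarities if p > 0)
            negative = sum(1 for p in polarities if p < 0)
            result[coach].append(
                (checkpoint, len(polarities), sentiment_index(positive, negative)))
    return dict(result)
```

Plotting `tweet_count` on a logarithmic x-axis against the index on the y-axis, and joining each coach's three points, gives a chart of the kind described above.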
Interestingly, the most positive sentiment was observed for Cesare Prandelli (ITA), who earned a lot of applause after the game with Germany. Negative sentiment can be observed for many coaches, and the lowest was observed for Laurent Blanc (FRA).
Finally, we would like to present an interesting observation about the "life" of a tweet. In the bottom right, you can see how fast the most popular tweets spread among users and how long they stay popular (that is, how long users keep retweeting them). Very popular tweets reach a large group of people very quickly at the beginning; over time, the number of receivers grows much more slowly. The retweet patterns show a huge peak right after a tweet is born, reflecting people's great interest in sharing the news. With time, however, the tweet's popularity fades, as the news is no longer new.
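The two curves described above, retweet activity over time and cumulative reach, can both be derived from a tweet's retweet timestamps. A sketch, assuming one-hour buckets and assuming each retweet adds exactly one receiver (a simplification: in practice a retweet also reaches the retweeter's followers).

```python
from datetime import timedelta
from itertools import accumulate


def retweet_curves(posted_at, retweet_times, bucket=timedelta(hours=1)):
    """Return (retweets per bucket, cumulative retweets) since posting.

    Retweets timestamped before the original tweet are ignored.
    """
    offsets = sorted(t - posted_at for t in retweet_times if t >= posted_at)
    if not offsets:
        return [], []
    counts = [0] * (offsets[-1] // bucket + 1)  # timedelta // timedelta -> int
    for offset in offsets:
        counts[offset // bucket] += 1
    return counts, list(accumulate(counts))
```

The first list shows the early peak and its decay; the second flattens out once the tweet stops being shared, matching the slowing growth in the number of receivers.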
All of the analyses during Euro 2012 were performed on data retrieved from Twitter. All tweets were collected between June 1st and July 3rd, 2012, thanks to the Distributed Information Retrieval system implemented within the Netezza Lab. All of the information was parsed and loaded into an IBM Netezza 1000 (Netezza Data Warehouse Appliance). After that we used several functions offered by INZA (IBM Netezza Analytics), such as:
- we used the INZA Map Reduce framework to perform text analysis of tweets (e.g. removing stop words, reducing words to their base form with a stemmer, calculating word occurrence statistics, identifying the language of each tweet, etc.),
- we used various analytical algorithms offered by INZA to classify tweets and to perform statistical analyses,
- we also heavily used R for additional analysis and visualization.
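The text-processing steps in the first item follow the map/shuffle/reduce pattern. The single-process Python sketch below mimics those phases for stop-word removal, stemming and word counting. It does not use the INZA Map Reduce API, and both the stop-word list and the suffix-stripping `crude_stem` function are hypothetical stand-ins for a real stop-word list and a real stemmer (such as Porter's).

```python
import re
from collections import defaultdict
from itertools import chain

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to", "in", "for", "on"}


def crude_stem(word):
    """Hypothetical suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def map_tweet(tweet):
    """Map phase: emit (stem, 1) for every non-stop word in a tweet."""
    for word in re.findall(r"[a-z]+", tweet.lower()):
        if word not in STOP_WORDS:
            yield crude_stem(word), 1


def shuffle(pairs):
    """Shuffle phase: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_counts(tweets):
    return reduce_counts(shuffle(chain.from_iterable(map(map_tweet, tweets))))


print(word_counts(["Spain is scoring goals", "The goal of the night", "Italy scored"]))
# {'spain': 1, 'scor': 2, 'goal': 2, 'night': 1, 'italy': 1}
```

Stems such as "scor" need not be dictionary words; what matters is that "scoring" and "scored" are counted together. In a distributed setting the map phase runs in parallel over partitions of the tweets, and the shuffle routes each key to a single reducer.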
Finally, we developed several new algorithms that helped us extract and analyze all of this information.