This blog is about sharing best practices, reports, case studies and other useful information related to BigData and Petascale analytics. We will focus on the IBM BigData platform: IBM Netezza, IBM InfoSphere Streams and BigInsights, IBM SPSS, R, etc. This blog will be maintained by multiple authors from the Warsaw Netezza department at IBM Poland Software Labs, who share a passion and excitement about big data, data warehouses, databases, analytics, and possible new ways to work with data. We would like to talk about ideas, everyday... [More]
Below we present a use case showing how IBM Netezza 1000 and the statistical package R may be used to explore data from social media. In this example we use short text messages from Twitter that refer to the two teams that were fighting for the Premier League title, namely Manchester United and Manchester City. The tweets were uploaded to the IBM Netezza 1000 database, and then sentiment analysis and data aggregation were done in R, which is embedded in the database. Below we present the final visualization and its description: Final round of... [More]
Recently, two of my colleagues (Grzegorz Kokosinski and Krzysztof Zarzycki) were at GTC 2012 (GPU Technology Conference) in San Jose, California. They gave a talk about a couple of techniques to speed up compute-heavy Dynamic Programming algorithms on the GPU (Graphics Processing Unit), especially in the area of DNA sequences. Problem definition: given a reference sequence, how do we find the one most similar to it in a large database? The sequences are millions of characters long, and their similarity is calculated with a (quadratic) DP algorithm,... [More]
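To make the problem definition concrete, here is a minimal CPU sketch in Python. The excerpt does not say which similarity measure the talk used, so this assumes plain Levenshtein edit distance as a stand-in for "a quadratic DP algorithm"; the function names `edit_distance` and `most_similar` are hypothetical, and nothing here reflects the GPU techniques from the talk.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the classic O(len(a) * len(b)) DP recurrence.

    Only two rows of the DP table are kept in memory at a time.
    """
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def most_similar(reference: str, database: list[str]) -> str:
    """Return the database sequence with the smallest distance to reference."""
    return min(database, key=lambda seq: edit_distance(reference, seq))


if __name__ == "__main__":
    db = ["AAAA", "ACGA", "TTTT"]
    print(most_similar("ACGT", db))
```

Every cell of the table depends on three neighbours, which is why the cost grows quadratically with sequence length and why sequences that are millions of characters long make the computation so heavy.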
One of the previous posts was dedicated to sentiment analysis performed on messages (micro-blogs) published during the final round of the English Premier League. This year (2012) in June, Poland and Ukraine are the host countries of the UEFA European Football Championship, Euro 2012. Because of that, we thought it would be interesting to show some additional examples of analysis performed on information retrieved from micro-blogging services during this event. Thanks to IBM technology (the IBM BigData platform) we would like to show insight about... [More]
This post presents one of the available analyses related to UEFA EURO 2012, the European Football Championship. The attached graphic has the form of a word cloud. The processed input data comes from posts published on a micro-blog site. The time frame for this data is a few days before the beginning of the UEFA European Football Championship. The data collection process is configured to collect only posts related to the UEFA EURO 2012 event. The analysis below presents the most used words in micro-blogs. The size of every word is... [More]
We analyzed sentiment toward Poland and Russia before today's match. Here are charts showing how this sentiment changed over the last few days. The first chart presents the Sentiment Index aggregated by day, and the second shows the data aggregated by minute. In addition, we tried to analyze tweets before the match between Poland and Russia and extract information about the score predicted by users. 58% say "Russia beats Poland" and 42% say "Poland beats Russia".
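The day-level and minute-level aggregation can be sketched as follows. This is only an illustration: the post does not define the Sentiment Index, so the sketch assumes each tweet carries a score of -1, 0 or +1 and that the index for a time bucket is the mean score. The function name `aggregate_sentiment` and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_sentiment(tweets, bucket_format):
    """Average per-tweet scores inside each time bucket.

    tweets is an iterable of (timestamp, score) pairs; bucket_format is a
    strftime pattern that decides the bucket width (day, minute, ...).
    """
    buckets = defaultdict(list)
    for timestamp, score in tweets:
        buckets[timestamp.strftime(bucket_format)].append(score)
    # Sorting the keys keeps the buckets in chronological order.
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


# Hypothetical sample data: (time the tweet was posted, assumed score).
tweets = [
    (datetime(2012, 6, 11, 20, 15, 5), 1),
    (datetime(2012, 6, 11, 20, 15, 40), -1),
    (datetime(2012, 6, 11, 20, 16, 10), 1),
    (datetime(2012, 6, 12, 9, 0, 0), 1),
]

by_day = aggregate_sentiment(tweets, "%Y-%m-%d")
by_minute = aggregate_sentiment(tweets, "%Y-%m-%d %H:%M")

if __name__ == "__main__":
    print(by_day)
    print(by_minute)
```

The same function serves both charts; only the bucket pattern changes.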
We were monitoring tweets posted during yesterday's Poland - Russia match (and shortly afterwards). The player received most positively by users was Yuri Zhirkov. He was praised twice as often as Robert Lewandowski and four times as often as Przemyslaw Tyton. This trio was followed by Andrey Arshavin, Lukasz Piszczek and Alan Dzagoev, who got similar opinions from users. Interestingly, both goal scorers (Alan Dzagoev and Kuba Blaszczykowski) were not rated nearly as well as the top 3. Rafal Murawski, Sebastian Boenisch... [More]
Once more, before the Czech Republic - Poland match, we analyzed tweets sent by football fans. We analyzed information from between June 13th and June 16th, 12:00 CET. Below you can see a cloud of the results most frequently mentioned in tweets. The size of every score is calculated and scaled according to the number of occurrences. CZECH REPUBLIC - POLAND: 58% say "Poland beats Czech", the Czech Republic will win according to 24% of people, and every 5th person predicts a draw. The most frequent score in tweets for Czech Republic - Poland is 1-2; the second is 1-1.
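A score cloud of this kind could be built roughly as sketched below. The post does not describe how scores were extracted or how sizes were scaled, so both the regular expression and the linear scaling between a minimum and maximum font size are assumptions; `count_scores`, `font_sizes` and the sample tweets are hypothetical.

```python
import re
from collections import Counter

# Assumed pattern: two single-digit numbers separated by "-" or ":".
SCORE_RE = re.compile(r"\b(\d)\s*[-:]\s*(\d)\b")


def count_scores(tweets):
    """Count how often each predicted score (for example "1-2") is mentioned."""
    counts = Counter()
    for text in tweets:
        for first, second in SCORE_RE.findall(text):
            counts[f"{first}-{second}"] += 1
    return counts


def font_sizes(counts, min_pt=10, max_pt=48):
    """Scale occurrence counts linearly into the range [min_pt, max_pt]."""
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    span = (highest - lowest) or 1  # avoid division by zero when all counts match
    return {
        score: min_pt + (n - lowest) * (max_pt - min_pt) / span
        for score, n in counts.items()
    }


if __name__ == "__main__":
    sample = ["Czech 1-2 Poland!", "I say 1-2", "1:1 draw", "2 - 0 easy"]
    counts = count_scores(sample)
    print(counts.most_common())
    print(font_sizes(counts))
```

Linear scaling is the simplest choice; a logarithmic scale is a common alternative when a few scores dominate the counts.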
The chart below presents a cumulative team sentiment index. Additional vertical lines mark matches that have taken place within each group. Note: The analyzed data covers the period from June 1st to June 16th, 2012, 3:00 p.m.
The following charts show the Team Sentiment Index within each group. In addition, the graphs show the number of microblogs sent by fans. Note: The analyzed data covers the period from June 1st to June 16th, 2012, 3:00 p.m.
The coach most frequently mentioned in tweets related to Euro 2012 is Roy Hodgson (England), with 30% of all messages. The next in order have less than half of that: Laurent Blanc (France) with 13.2% and Franciszek Smuda (Poland) with 12.6%. After Saturday's games, we also verified how sentiment changes over time for the coaches in group A.
For sure, due to Euro 2012 the visibility of Poland is much greater than a month ago, especially in the minds of football fans. But if many people are talking about Poland, then the question is: what are they talking about? Having a set of messages from microblogs, we filter them in order to present the contexts of the following phrases: 'Poland is', 'Poland will', 'Poland have' or 'Poland must'. Although there are many tweets and retweets with the information that the ending 'ski' is quite popular in Poland, there are other interesting and... [More]
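The phrase filter described above might look like the following sketch. It assumes a case-insensitive match and treats the rest of the message after the phrase as its context; the post does not say how much context was kept, and `phrase_contexts` and the sample messages are hypothetical.

```python
import re

# The four phrases named in the post; everything after the phrase is
# captured as its context (an assumption about what "context" means here).
PHRASE_RE = re.compile(
    r"\b(Poland (?:is|will|have|must))\b\s*(.*)",
    re.IGNORECASE,
)


def phrase_contexts(messages):
    """Return (phrase, context) pairs for messages containing a target phrase."""
    results = []
    for text in messages:
        match = PHRASE_RE.search(text)
        if match:
            results.append((match.group(1).lower(), match.group(2).strip()))
    return results


if __name__ == "__main__":
    sample = [
        "Poland is a great host",
        "Going to Warsaw",
        "I think poland will win tonight!",
    ]
    for phrase, context in phrase_contexts(sample):
        print(f"{phrase!r}: {context!r}")
```

The word boundaries keep the filter from matching longer words that merely start with one of the verbs.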
The last game in the group phase was England vs Ukraine. It was an interesting game, especially after the "ghost goal" for Ukraine. The term "ghost goal" was used by microblog users and refers to something that looks like a goal but does not count in the referee's opinion. Since disallowing this goal might look unfair to Ukraine, some rumors about possible riots were posted, along with discussions about the definitive and final decision of the line referee and so on. Below you can trace the frequency of occurrence of the following words: 'rooney', 'goal', 'referee', 'error',... [More]
I have selected several of the most retweeted tweets. The overall winner is "RT @NiallOfficial: Wish i was in poland, t see the boys in green this week!" with over 130 000 000 retweets. Among the most retweeted posts, the majority concerned supporting the teams or expressing admiration or criticism for the players. Another popular kind of tweet was the one with a hint of irony or humor, for instance: "RT @Charles_HRH: Sweden have no ikea how to play football. #Euro2012". Also, it came as a surprise that in the group of most popular tweets... [More]
To summarize the Group Phase of Euro 2012, we also analyzed the Coach Sentiment Index. The size of the dots represents the number of tweets created by fans. The additional reports below also present detailed information about the number of tweets related to each of the coaches:
While waiting for the semi-finals, we performed an exploratory analysis of Twitter data that refers to the group stage and quarter-finals of Euro 2012. The report below presents this data from different perspectives. A few comments: as you may notice, the number of tweets is highest during the games, and after that it drops significantly. This leads to the observation that tweets can be treated as a heart rate monitor of Euro 2012, expressing the current sentiment toward the teams during each game. The number of tweets is very high for almost every game.... [More]
50% of Twitter users predicting the score of today's game between Spain and Italy predict that Spain will be the winner of Euro 2012. 39% indicate that Italy will be the winner, and only 11% expect a draw. Here is an analysis of the exact scores mentioned in tweets: 2-1 (18%), 1-2 (18%), 1-0 (11%), 0-1 (9%), 1-3 (7%), 2-0 (7%), 1-1 (7%), 3-1 (6%), 2-0 (6%).
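One way to turn exact-score counts into win/draw shares is sketched below. The counts are made up for illustration and are not the figures from this post; the sketch also assumes that the first number in a score always belongs to the first team, which may not hold for real tweets. The name `outcome_shares` is hypothetical.

```python
def outcome_shares(score_counts, first_team, second_team):
    """Convert counts of exact scores into percentage shares per outcome.

    score_counts maps strings such as "2-1" to the number of tweets that
    mention them; the first number is assumed to belong to first_team.
    """
    totals = {first_team: 0, second_team: 0, "draw": 0}
    for score, n in score_counts.items():
        first_goals, second_goals = (int(part) for part in score.split("-"))
        if first_goals > second_goals:
            totals[first_team] += n
        elif first_goals < second_goals:
            totals[second_team] += n
        else:
            totals["draw"] += n
    grand_total = sum(totals.values())
    if grand_total == 0:
        return totals
    return {key: round(100 * value / grand_total) for key, value in totals.items()}


if __name__ == "__main__":
    # Hypothetical counts, not taken from the analysis above.
    counts = {"2-1": 5, "1-0": 1, "1-2": 3, "1-1": 1}
    print(outcome_shares(counts, "Spain", "Italy"))
```

Because each share is rounded independently, the three percentages may not add up to exactly 100.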
This time we focus on how far a tweet can go. Before, in this post, we analysed how long a popular tweet lives. Here we want to show the range of people a tweet can reach and how it changes over time. For eight tweets selected from the top 50 most popular tweets, we have prepared a chart showing how many people can see them. This shows that a tweet can reach a big group of people very fast in the beginning, and with time the increase in the number of receivers is not that rapid. There seem to be two different schemas,... [More]
The play-off phase of the EURO 2012 tournament was widely commented on by Twitter users. As we can see, the goals are depicted in the peaks of the graphs. Also, tweets seem to show who was controlling the game at the time. In two games there were penalties (Italy - England, Spain - Portugal). At the end of these two graphs we can see that the shoot-out winner gained a boost in popularity.
To summarize the analysis performed during Euro 2012, we prepared a graphic with the most interesting results. The number of tweets is highest during the games, and after that it drops significantly. An especially high number of tweets could be observed during the opening and final games. Before and after the games, the number of tweets related to Euro 2012 was much lower. Also, when we look at detailed information for a few games (three games of the Polish team and games from the semifinal and final phases), we can see that a goal is correlated with... [More]
The Euro 2012 championship in Poland and Ukraine has ended. On this blog we have created a lot of entries related to football, but this entry is different. This one is related to the Polish host cities. From millions of tweets harvested during Euro we would like to squeeze out information about what football fans say about Poland and the Polish host cities. The infographic under consideration was created with the use of Revolution R and the IBM Netezza 1000 analytics suite. The full-size version of the poster is available here. Let's have a look... [More]
Here is a short summary of some additional types of analyses one can perform on the data from microblogs. Unlike with the Euro Championship, for the London Summer Olympics we were able to collect a substantial amount of geo-tagged data, probably because more people allow the use of geo-location on their mobile devices. The graphic below summarizes some of the interesting outcomes. The first circle presents the Olympic Park of London with tweets marked as black dots. Even without the map underneath, the location of the Olympic Stadium can be... [More]