Posted in: Big Data Analytics, IBM Research-India

Cricket Fans Score Big with Data

All-rounder James Faulkner was scoring well before his double wicket maiden that clinched Australia’s 2015 Cricket World Cup finals win over New Zealand. He was scoring with data, or maybe more appropriately, with #ScoreWithData, IBM’s social media insight into players, teams, matches, brands, cities, and fans during the Cricket World Cup – which is played every four years and rivals FIFA’s World Cup in popularity. By the end of the six-week-long event played across Australia and New Zealand, Faulkner’s 30 percent “buzz” of 1 million tweets made him the online MVP, well before he earned player of the match versus co-host New Zealand.

IBM BigInsights, using our Social Data Accelerator plugin, scanned Twitter for all things “Cricket World Cup” to analyze sentiment about teams, rivalries, players, play on the field, and other factors to give fans a new dimension to the 14 countries and 49 matches. We used between 700 and 800 keywords per match, ranging from obvious ones like names of players, referees, and stadiums, to cricket-specific technology like “spidercam” and “UDRS.” We also tracked several hashtags like #cwc15 and #INDvPAK, and followed Twitter handles of popular cricket players, sports journalists covering the Cup, retired players and cricket organizations. Every hour of every day of the Cup, from February 14 to March 29, we ingested relevant tweets and analyzed about 100,000 tweets per match on average, reaching a peak of 1 million during the semi-final and final matches.
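To make the keyword-and-hashtag tracking concrete, here is a minimal Python sketch of that kind of filter. It is not the BigInsights or Social Data Accelerator implementation; the watch lists, the tokenizer, and the function names `is_relevant` and `filter_tweets` are assumptions made for illustration only.

```python
import re

# Hypothetical watch lists; the real per-match lists held 700 to 800 keywords.
KEYWORDS = {"spidercam", "udrs", "faulkner"}
HASHTAGS = {"#cwc15", "#indvpak"}

# Matches plain words and hashtags (assumed tokenization, not IBM's).
TOKEN_RE = re.compile(r"#?\w+")


def is_relevant(tweet_text):
    """Return True if the tweet mentions any tracked keyword or hashtag."""
    tokens = {token.lower() for token in TOKEN_RE.findall(tweet_text)}
    return bool(tokens & KEYWORDS or tokens & HASHTAGS)


def filter_tweets(tweets):
    """Keep only the tweets that match the watch lists, preserving order."""
    return [text for text in tweets if is_relevant(text)]
```

A single-token match like this would miss multi-word terms such as full player names, so a production filter would need phrase matching as well.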

The Research team in India used advanced data curation and integration capabilities, coupled with text mining and natural language processing, to continuously deliver insights through the @scorewithdata Twitter handle and to CNN-IBN, a leading English news TV channel tracking the Cup. We performed fine-grained temporal analytics around short-lived in-game action like boundaries, sixes and wickets to identify which events generated the most social media attention.
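One simple way to picture this kind of temporal analysis is to count the tweets that arrive in a short window after each in-game event and rank the events by that count. The sketch below assumes a two-minute window and timestamped inputs; the window length and the function names are illustrative assumptions, not details of the team's pipeline.

```python
from datetime import datetime, timedelta


def tweets_after_event(event_time, tweet_times, window_seconds=120):
    """Count tweets posted in [event_time, event_time + window)."""
    end = event_time + timedelta(seconds=window_seconds)
    return sum(1 for t in tweet_times if event_time <= t < end)


def rank_events(events, tweet_times, window_seconds=120):
    """Rank (label, time) events by tweet volume in the window after each one.

    Returns a list of (label, count) pairs, highest count first.
    """
    counts = [
        (label, tweets_after_event(when, tweet_times, window_seconds))
        for label, when in events
    ]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)
```

Called with a list of events such as `[("six", t1), ("wicket", t2)]` and the timestamps of the ingested tweets, `rank_events` returns the events ordered by how much chatter followed each one.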

In addition to temporal analytics and social sentiment analytics, we ranked celebrity tweets during the course of the match using an Influencer Index, a metric that helped predict which tweet would generate the most retweets. We routinely generated interesting insights too, such as the buzz around the retirement “farewell” for Pakistan’s Misbah-ul-Haq and Shahid Afridi, which generated more than four times as much online chatter as the farewells for Sri Lanka’s Kumar Sangakkara and Mahela Jayawardene. Controversial umpiring also drew its share of unwanted attention: on-field umpire Aleem Dar became the most-talked-about official during the quarter-final week (with 61 percent of Cup social media chatter) due to controversial “excessive height” and “no ball” decisions in the India-Bangladesh match.

This analysis gave media agency Ogilvy and CNN-IBN splashy graphics to show during matches, like when India’s batsman Virat Kohli had the most buzz in the match against Australia at 46 percent, although 14 percent of it was negative (also the highest among players in that match). It was a clear reflection of his sub-par performance in the eyes of a billion of his countrymen. We also shared more specific statistics about in-game actions, for example, New Zealand’s Grant Elliott hitting the game-winning “towering six” off South Africa’s Dale Steyn. This event generated more tweets than any other boundary, six or wicket in that match, as it signaled the end of a nail-biting thriller!
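Buzz figures like the ones above can be read as two ratios: a player's share of all player mentions in a match, and the share of that player's mentions that carry negative sentiment. The following sketch computes both from already-labeled mentions. That reading of "buzz," the input format, and the name `buzz_report` are assumptions for illustration; the post does not spell out the exact formula used.

```python
from collections import Counter


def buzz_report(mentions):
    """Summarize buzz per player.

    mentions: list of (player, sentiment) pairs, where sentiment is
    "positive", "negative" or "neutral" (labels assumed to come from an
    upstream sentiment classifier).
    Returns {player: {"buzz_pct": ..., "negative_pct": ...}}.
    """
    total = len(mentions)
    per_player = Counter(player for player, _ in mentions)
    negative = Counter(
        player for player, sentiment in mentions if sentiment == "negative"
    )
    report = {}
    for player, count in per_player.items():
        report[player] = {
            # Share of all player mentions in the match.
            "buzz_pct": 100.0 * count / total,
            # Share of this player's own mentions that are negative.
            "negative_pct": 100.0 * negative[player] / count,
        }
    return report
```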

Sports: The Perfect Data Generator for Sentiment Analysis

More data, especially as opinionated natural language, means more machine learning opportunities. IBM measured “Brazil’s passion for soccer” last year, and served up social sentiment analysis during Wimbledon, too. Real-time analysis gives fans a closer connection to their favorite teams and players. In this new multimedia world of the second screen, they can rub digital elbows with celebrities, like TV personality Harsha Bhogle during India’s match with Australia.

And they can see how their reactions stack up against others, such as the fact that, despite the India-Bangladesh match being the most popular of the four quarter-finals, the bowler and the all-rounder with the most buzz were neither Indian nor Bangladeshi, but Pakistani (Wahab Riaz and Shahid Afridi respectively).

IBM wants to do even more with social media data and sentiment analysis. Improving the ability to understand human language will help machines provide better information for everything from a doctor’s medical diagnosis to a meteorologist’s weather forecast.

Mitesh Vasa

IBM Research