Data Mining Patterns Derived from SMS Mobile Data
Christian Karasiewicz
This blog post is contributed by John N. Ryan. John Ryan has worked in the SMS communication sector since 2001 within an Irish company, Go2mobile Solutions.
I was asked to write a blog entry on identifying patterns within mobile data. Since my background is in SMS (text messages), this post looks at finding nuggets of information within a text messaging corpus. As a point of interest, all the images produced for this entry were generated with IBM’s excellent ManyEyes visualisation tool (www-958.ibm.com).
SMS generates huge volumes of data due to its popularity. This is especially the case in Ireland, where over 2.9 billion messages were sent in the fourth quarter of 2012 (Comreg, 2013). There are therefore potentially valuable trends for businesses to uncover through data mining, for example by analysing customer feedback and reviews sent by SMS. Data analytics could tell a business whether its customers are satisfied or displeased with a service.
SMS has featured in various types of data analytic research, some of which is briefly listed below to give you a flavour of what has been accomplished in this area:
In a nutshell, as these previous research examples show, effective patterns can be discerned and text messaging content can be summarised. Text mining methods include determining polarity (for example, whether a text was positive or negative) and presenting the information visually using images such as Wordclouds.
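To make the idea of polarity concrete, here is a minimal Python sketch of lexicon-based scoring. It is not the method used in the research mentioned above or by any particular package; the word lists, the function name `polarity` and the sample messages are all made up for illustration, and a real study would use a published sentiment lexicon or a trained classifier.

```python
import re

# Hypothetical mini-lexicons, invented for this example only.
POSITIVE = {"good", "great", "happy", "love", "thanks", "excellent"}
NEGATIVE = {"bad", "late", "poor", "hate", "sorry", "terrible"}


def polarity(message: str) -> str:
    """Label a message by counting positive and negative lexicon hits."""
    words = re.findall(r"[a-z']+", message.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Made-up messages, not taken from the SMS corpus.
    for text in ["Great service, thanks!", "Delivery was late and support was poor"]:
        print(text, "->", polarity(text))
```

Counting hits like this ignores negation ("not good") and sarcasm, so treat it as a starting point rather than a finished classifier.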
If you do not have access to corporate warehoused SMS data for your research, obtaining a publicly available SMS corpus can be challenging due to privacy concerns (Chen et al.). However, there is a solution: Chen et al. generated an SMS dataset that is freely available to download from http
Wordcloud visualisations summarise this SMS corpus by displaying the more frequently used words (also called “terms”) in larger font sizes than less used terms, as represented in Figure 1. Removing some of those key terms (as in Figure 2) allows time intervals to emerge clearly as a trend: time, late, now, morning, later. This gives you a flavour of the context of the corpus, which would be especially beneficial when compiling an SMS feedback survey, for example, as the most significantly used words make the most emphasised concerns and issues visible at a glance.
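The counting that sits behind a word cloud can be sketched in a few lines of Python. This is only an illustration of the idea, not how ManyEyes works internally; the function name `term_frequencies`, the sample messages and the list of excluded terms are assumptions made up for this example.

```python
import re
from collections import Counter


def term_frequencies(messages, exclude=frozenset()):
    """Count how often each term appears, skipping any excluded terms."""
    counts = Counter()
    for message in messages:
        for term in re.findall(r"[a-z0-9']+", message.lower()):
            if term not in exclude:
                counts[term] += 1
    return counts


# Made-up messages standing in for a real SMS corpus.
sample = [
    "Running late, see you later",
    "Are you free now?",
    "See you in the morning, not too late",
]

# Counts over every term, comparable to the first word cloud.
all_terms = term_frequencies(sample)

# Counts after removing some dominant terms, comparable to the second one.
removed = {"you", "see", "the", "in", "are", "not", "too"}
filtered = term_frequencies(sample, exclude=removed)

print(all_terms.most_common(3))
print(filtered.most_common(3))
```

A word cloud then maps each count to a font size, so dropping the dominant terms lets the next tier of words come through.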
Emotional polarity can be deduced from the sentiment analysis package (Jurka, 2012) that is used within R (htt
Figure 1 SMS Corpus
Figure 2 SMS Corpus with some key terms removed
There are many other visualisation methods that could be used to mine the content further, such as defining PhraseNets (Figure 3) and Word Trees (where emoticons could be analysed as in Figure 4). Six emotions (Surprise, Anger, Joy, Sadness, Disgust and Fear) could also be scored within the sentiment package in R. Another beneficial text mining package within R is called tm (Feinerer et al., 2008); it should be noted that tm is required when using the previously mentioned sentiment package.
Table 1 SMS Corpus
Figure 3 PhraseNet SMS Corpus
Figure 4 Word Tree SMS Corpus - in this case using Emoticon
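A phrase net links pairs of words that are joined by a connecting word, for example "X and Y". As a rough Python sketch of that idea (not the ManyEyes implementation), the pairs can be counted as follows; the function name `phrase_pairs`, the default connector and the sample messages are assumptions for illustration.

```python
import re
from collections import Counter


def phrase_pairs(messages, connector="and"):
    """Count (left, right) word pairs that are joined by the connector word."""
    pairs = Counter()
    for message in messages:
        tokens = re.findall(r"[a-z']+", message.lower())
        # Slide over every run of three consecutive tokens.
        for left, middle, right in zip(tokens, tokens[1:], tokens[2:]):
            if middle == connector:
                pairs[(left, right)] += 1
    return pairs


# Made-up messages, not taken from the SMS corpus.
sample = [
    "Tea and biscuits later?",
    "Bring tea and biscuits",
    "Mum and dad say hi",
]

pairs = phrase_pairs(sample)
print(pairs.most_common())
```

Because punctuation is discarded during tokenising, a pair can span a sentence boundary; a stricter version would split messages into sentences first.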
In conclusion, while this is a very brief overview, it does provide a flavour of what is possible. To recap, there are extremely powerful tools for data analysis such as RapidMiner and R, and excellent visualisation tools such as IBM’s ManyEyes, which in fact has a new Version 2 currently in beta. You should definitely spend some time trying these out.
Some issues to watch out for when analysing SMS include the possible need for specialised dictionaries to handle abbreviated SMS text (these dictionaries translate abbreviations, such as gr8t to great) and emoticons (such as :) to happy/smile). These should help you to recover as many patterns as possible. An extra benefit of SMS is that, because a message is limited to 160 characters, a mining model derived for analysing it could potentially also be used for reviewing social media platforms such as Twitter.
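As a minimal sketch of such a dictionary step, the Python below replaces abbreviations and emoticons before any mining is done. The lookup tables are tiny, hypothetical examples (only gr8t and the smile emoticon come from the text above); a real project would load a much larger specialised dictionary.

```python
# Hypothetical lookup tables for illustration only.
ABBREVIATIONS = {"gr8t": "great", "gr8": "great", "u": "you", "2moro": "tomorrow"}
EMOTICONS = {":)": "happy", ":-)": "happy", ":(": "sad", ":-(": "sad"}


def normalise(message: str) -> str:
    """Expand SMS abbreviations and emoticons into plain words."""
    tokens = []
    for token in message.split():
        # Check emoticons first, before punctuation is stripped.
        if token in EMOTICONS:
            tokens.append(EMOTICONS[token])
            continue
        # Lowercase and drop surrounding punctuation so "2moro," matches.
        core = token.lower().strip(".,!?")
        if core:
            tokens.append(ABBREVIATIONS.get(core, core))
    return " ".join(tokens)


if __name__ == "__main__":
    print(normalise("The service was gr8t :)"))
```

The output is lowercased and loses its punctuation, which is usually acceptable as input to term counting or polarity scoring.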
If you wish to review any of the previously mentioned references, the reports and research papers are listed below: