A well-accepted theory in psychology, marketing, and other fields holds that human language reflects personality, thinking style, social connections, and emotional states. The frequency with which we use certain categories of words can provide clues to these characteristics. Several researchers have found that variations in word usage in writings such as blogs, essays, and tweets can predict aspects of personality (Fast & Funder, 2008; Gill et al., 2009; Golbeck et al., 2011; Hirsh & Peterson, 2009; Yarkoni, 2010).
IBM conducted a set of studies to understand whether personality characteristics inferred from social media data can predict people's behavior and preferences. IBM found that people with specific personality characteristics responded and re-tweeted in higher numbers in information-collection and -spreading tasks. For example, people who score high on excitement-seeking are more likely to respond, while those who score high on cautiousness are less likely to do so (Mahmud et al., 2013). Similarly, people who score high on modesty, openness, and friendliness are more likely to spread information (Lee et al., 2014).
IBM also found that people with high openness and low emotional range (neuroticism) as inferred from social media language responded more favorably (for example, by clicking an advertisement link or following an account), results that have been corroborated with survey-based, ground-truth checking. For example, targeting the top 10 percent of users in terms of high openness and low emotional range resulted in increases in click rate from 6.8 percent to 11.3 percent and in follow rate from 4.7 percent to 8.8 percent.
Multiple recent studies reported similar results for characteristics that were computed from social media data. One recent study with retail store data found that people who score high in orderliness, self-discipline, and cautiousness and low in immoderation are 40 percent more likely to respond to coupons than the random population. A second study found that people with specific values showed specific reading interests (Hsieh et al., 2014). For example, people with a higher self-transcendence value demonstrated an interest in reading articles about the environment, and people with a higher self-enhancement value showed an interest in reading articles about work. A third study of more than 600 Twitter users found that a person's personality characteristics can predict their brand preference with 65 percent accuracy.
The following sections expand upon these high-level findings to describe the research and development behind the Personality Insights service. For more information about studies that apply the service to tangible scenarios, see The service in action.
For the Personality Insights service, IBM developed models to infer scores for Big Five dimensions and facets, Needs, and Values from textual information. The models reported by the service are based on research in the fields of psychology, psycholinguistics, and marketing:
Big Five is one of the best studied of the personality models developed by psychologists (Costa & McCrae, 1992, and Norman, 1963). It is the most widely used personality model to describe how a person generally engages with the world. The service computes the five dimensions and thirty facets of the model. The dimensions are often referred to by the mnemonic OCEAN, where O stands for Openness, C for Conscientiousness, E for Extraversion, A for Agreeableness, and N for Neuroticism. (Because the term Neuroticism can have a specific clinical meaning, the service presents such insights under the more generally applicable heading Emotional Range.)
Needs are an important aspect of human behavior. Research literature suggests that several types of human needs are universal and directly influence consumer behavior (Kotler & Armstrong, 2013, and Ford, 2005). The twelve categories of needs that are reported by the service are described in marketing literature as desires that a person hopes to fulfill when considering a product or service.
Values convey what is most important to an individual. They are "desirable, trans-situational goals, varying in importance, that serve as guiding principles in people's lives" (Schwartz, 2006). Schwartz summarizes five features that are common to all values: (1) values are beliefs; (2) values are a motivational construct; (3) values transcend specific actions and situations; (4) values guide the selection or evaluation of actions, policies, people, and events; and (5) values vary by relative importance and can be ranked accordingly. The service computes the five basic human values proposed by Schwartz and validated in more than twenty countries (Schwartz, 1992).
The Personality Insights service infers personality characteristics from textual information based on an open-vocabulary approach. This method reflects the latest trend in the research about personality inference (Schwartz et al., 2013, and Plank & Hovy, 2015).
The service first tokenizes the input text to develop a representation in an n-dimensional space. The service uses an open-source word-embedding technique called GloVe to obtain a vector representation for the words in the input text (Pennington et al., 2014). It then feeds this representation to a machine-learning algorithm that infers a personality profile with Big Five, Needs, and Values characteristics. To train the algorithm, the service uses scores obtained from surveys conducted among thousands of users along with data from their Twitter feeds.
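As a rough illustration of this pipeline, the following sketch (an assumption for clarity, not IBM's actual implementation) averages GloVe-style word vectors into a single document vector, the kind of representation that would then be fed to the trained machine-learning algorithm:

```python
import numpy as np

# Hypothetical sketch: represent a document as the mean of its word
# embeddings. A real pipeline would load pretrained GloVe vectors
# and pass the resulting vector to a trained prediction model.

def embed_text(text, glove, dim=50):
    """Average the embeddings of the in-vocabulary tokens."""
    tokens = text.lower().split()
    vectors = [glove[t] for t in tokens if t in glove]
    if not vectors:
        return np.zeros(dim)  # no known words: fall back to a zero vector
    return np.mean(vectors, axis=0)

# Toy two-word embedding table standing in for real GloVe vectors
glove = {"hello": np.ones(50), "world": np.full(50, 3.0)}
doc_vec = embed_text("Hello world", glove)
print(doc_vec[0])  # mean of 1.0 and 3.0 -> 2.0
```

Averaging is only the simplest way to pool word vectors into a document vector; the key point is that the open-vocabulary approach operates on learned embeddings rather than on fixed dictionary categories.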
IBM developed the models for all supported languages in an identical way. The models were developed independent of user demographics such as age, gender, or culture. In the future, IBM might develop models that are specific to different demographic categories.
Earlier versions of the service used the Linguistic Inquiry and Word Count (LIWC) psycholinguistic dictionary with its machine-learning model. However, the open-vocabulary approach just described outperforms the LIWC-based model. For more information about the service's precision for each language in terms of average Mean Absolute Error (MAE) and correlation, see How precise is the service. For general guidelines about providing input text to achieve the most accurate results, see Guidelines for providing input text.
IBM conducted a validation study to understand the accuracy of the service's approach to inferring a personality profile, comparing scores that were derived by its models with survey-based scores for Twitter users. To establish ground truth, participants took four sets of standard psychometric tests:
50-item Big Five derived from the International Personality Item Pool (IPIP)
120-item Facet derived from the IPIP Neuroticism, Extraversion & Openness (IPIP-NEO)
52-item fundamental Needs developed by IBM
26-item basic Values developed by Schwartz
IBM collected survey responses and Twitter feeds from between 1500 and 2000 participants for all characteristics and languages. Based on these results, IBM determined the average MAE and the correlation between inferred and actual scores for the different categories of personality characteristics. These results place the service at the cutting edge of personality inference from textual data as indicated by Schwartz et al. (2013) and Plank and Hovy (2015).
The following table shows the results for each language; each cell reports average MAE / average correlation:

| Characteristics | English | Spanish | Japanese | Arabic |
|---|---|---|---|---|
| Big Five dimensions | 0.12 / 0.33 | 0.10 / 0.35 | 0.11 / 0.27 | 0.09 / 0.17 |
| Big Five facets | 0.12 / 0.28 | 0.12 / 0.21 | 0.12 / 0.22 | 0.12 / 0.14 |
| Needs | 0.11 / 0.22 | 0.12 / 0.24 | 0.11 / 0.25 | 0.11 / 0.13 |
| Values | 0.11 / 0.24 | 0.11 / 0.19 | 0.11 / 0.19 | 0.10 / 0.14 |
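The two metrics reported above can be computed straightforwardly; the following sketch uses illustrative scores on a 0-to-1 scale, not IBM's data:

```python
import numpy as np

def mae_and_correlation(inferred, actual):
    """Average MAE and Pearson correlation between model-inferred
    scores and survey-based (ground-truth) scores."""
    inferred = np.asarray(inferred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    mae = np.mean(np.abs(inferred - actual))
    corr = np.corrcoef(inferred, actual)[0, 1]
    return mae, corr

# Illustrative scores for one characteristic across five users
inferred = [0.62, 0.55, 0.71, 0.40, 0.58]
actual   = [0.70, 0.50, 0.80, 0.35, 0.60]
mae, corr = mae_and_correlation(inferred, actual)
print(round(mae, 3))  # 0.058
```

A low MAE alone is not sufficient (a model that always predicts the population mean can have a low MAE); the correlation measures whether inferred scores actually track individual differences, which is why both figures are reported.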
To compute the percentile scores, IBM collected a very large data set of Twitter users (one million users for English, 100,000 users for each of Arabic and Japanese, and 80,000 users for Spanish) and computed their personality portraits. IBM then compared the raw scores of each computed profile to the distribution of profiles from those data sets to determine the percentiles.
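A percentile score of this kind can be derived by locating a raw score within the sorted distribution of population scores; the following sketch uses a toy ten-user distribution in place of the large per-language samples:

```python
from bisect import bisect_left

def percentile_score(raw_score, population_scores):
    """Fraction of the reference population scoring below raw_score."""
    scores = sorted(population_scores)
    return bisect_left(scores, raw_score) / len(scores)

# Toy reference distribution standing in for the per-language samples
population = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
print(percentile_score(0.65, population))  # 0.6 -> 60th percentile
```
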
For Arabic input, the service is unable to produce meaningful percentiles and raw scores for a number of personality characteristics. For more information, see Limitations for Arabic input.
The relationship between personality and purchasing behavior has been studied across a variety of products and services:
Chen (2007), while testing preferences concerning organic foods, indicated that an individual's personality characteristics play an important role in establishing personal food-choice criteria.
Schlegelmilch et al. (1996) explored the relationship between personality variables and pro-environmental purchasing behavior. The authors showed that consumers' overall environmental consciousness has a positive impact on green purchasing decisions.
Hymbaugh and Garrett (2007) investigated the relationship between personality and skydiving and found that people who score high in adventurousness and excitement-seeking generally indulge in skydiving. (For more information, see Risk profiling.)
Applying these known relations between consumption behaviors and personality is challenging: Most of these works used personality data derived from surveys, and their models are not publicly available. IBM therefore decided to learn these consumption preference models directly. When training the models, IBM used personality scores returned from the Personality Insights service as features. As a result, when you apply these models to calculate a user's personality characteristics with the service, the predictions are likely to be more accurate.
The Personality Insights service infers consumption preferences based on the results of its personality profile for the author of the input text. From existing literature, IBM identified 104 consumption preferences that have proved to be correlated with personality. These include preferences related to shopping, movies, music, and other categories. IBM then created a psychometric survey to assess an individual's inclination for each consumption behavior.
IBM obtained responses to its survey from about 600 individuals for whom it also had Twitter data (more than 200 self-authored tweets for each user). IBM submitted the tweets to the service to gather a personality profile for each individual. It then built a classifier for each consumption preference, where the input feature set was the personality information.
For inclusion with the service, IBM selected only those consumption preferences for which personality-based classification performed at least 9 percent better than random classification. Of the original 104 preferences, 42 satisfied this criterion and are exposed as consumption preferences by the service.
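The selection step can be sketched as follows, interpreting the threshold as nine percentage points above a random baseline; the preference names and accuracies here are hypothetical, not IBM's measured values:

```python
# Keep only preferences whose personality-based classifier beats
# random classification by at least 9 percentage points.
RANDOM_BASELINE = 0.5   # random accuracy for a binary preference
MIN_IMPROVEMENT = 0.09  # "at least 9 percent better than random"

# Hypothetical per-preference classification accuracies
classifier_accuracy = {
    "likes_action_movies": 0.63,
    "prefers_online_shopping": 0.57,
    "likes_country_music": 0.52,
}

exposed = [
    name for name, acc in classifier_accuracy.items()
    if acc - RANDOM_BASELINE >= MIN_IMPROVEMENT
]
print(exposed)  # ['likes_action_movies']
```

Filtering on improvement over a random baseline guards against exposing preferences that personality simply does not predict, even if a classifier can be trained for them.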
When developing the Personality Insights service, IBM relied on personality surveys to establish ground-truth data for personality inference. Ground truth refers to the factual data obtained through direct observation rather than through inference. A typical measure of accuracy for any machine-learning model is to compare the scores inferred by the model with ground-truth data; the previous sections describe how IBM used surveys to validate the accuracy of the service.
The following notes clarify the use of personality surveys and survey-based personality estimation:
Personality surveys are long and time-consuming to complete. The results are therefore constrained by the number of Twitter users who were willing and available to participate in IBM's study. IBM plans to conduct validation studies with more users, as well as with users of other online media such as email, blogs, and forums.
Survey-based personality estimation is based on self-reporting, which might not always be a true reflection of one's personality: Some users might give noisy answers to such surveys. To reduce the noise, IBM filtered survey responses by including attention-checking questions and by discarding surveys that were completed too quickly.
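A minimal sketch of such filtering, assuming hypothetical field names and an arbitrary minimum completion time:

```python
# Discard survey responses that fail an attention-check question or
# were completed implausibly fast. The 120-second threshold and the
# record structure are illustrative assumptions.
MIN_SECONDS = 120

responses = [
    {"answers": [4, 2, 5], "attention_check_passed": True,  "seconds": 540},
    {"answers": [3, 3, 3], "attention_check_passed": False, "seconds": 430},
    {"answers": [5, 1, 4], "attention_check_passed": True,  "seconds": 45},
]

clean = [
    r for r in responses
    if r["attention_check_passed"] and r["seconds"] >= MIN_SECONDS
]
print(len(clean))  # 1: only the first response passes both checks
```
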
While the correlation between inferred and survey-based scores is both positive and significant, inferred scores might not always match survey-based results for an individual. Researchers outside of IBM have also conducted experiments to compare how well inferred scores match those obtained from surveys, and none reported a fully consistent match:
Golbeck et al. (2011) reported an error rate of 10 to 18 percent when matching inferred scores with survey-based scores.
Sumner et al. (2012) reported approximately 65-percent accuracy for personality prediction.
Mairesse and Walker (2006) reported 60- to 70-percent accuracy for Big Five personality prediction.
In general, it is widely accepted in research literature that self-reported scores from personality surveys do not always fully match scores that are inferred from text. What is more important, however, is that IBM found that characteristics inferred from text can reliably predict a variety of real-world behavior.