Output: Understanding a profile

When you use the profile method to analyze content, you can select the format of the profile that the service sends in response: JSON or Comma-Separated Values (CSV). Both formats return the same requested data based on the parameters of the call.

The following sections describe the content of a response in both formats. For more information about calling the profile method, see Input: Requesting a profile. For complete details about the method, see the API reference.

JSON response content

The service returns the results of its analysis as a JSON Profile object by default or when you specify application/json with the Accept header of a request. The scope of the JSON output depends on the parameters you specified with the request and on whether the input text represents timestamped data, such as the text associated with a Twitter feed. The Profile object has the following fields:

  • word_count (integer) provides the number of words from the input content that were used to generate the profile. This can be less than the number of words in the input if the request submitted a large amount of content. If the number of words fails to meet a minimum threshold, the output also includes a word_count_message field that provides additional guidance.
  • processed_language (string) describes the language model that the service used to process the input: ar (Arabic), en (English), es (Spanish), or ja (Japanese).
  • personality is a recursive array of TraitTreeNode objects that describes the Big Five dimensions and facets inferred from the input text.
  • needs is an array of TraitTreeNode objects that describes the Needs inferred from the input text.
  • values is an array of TraitTreeNode objects that describes the Values inferred from the input text.
  • behavior is an array of BehaviorNode objects that describes the distribution of the content over the days of the week and the hours of the day. The service returns the field only for JSON input that is timestamped.
  • consumption_preferences is an array of ConsumptionPreferencesCategoryNode objects that provides results for each category of consumption preferences. The elements of the array provide information for the individual preferences of that category. The service returns the field only if the consumption_preferences query parameter of the request is set to true.
  • warnings is an array of Warning objects that provides messages associated with the input text. The array is empty if the input generated no warnings.

The following example output shows the high-level structure of a Profile object. The input content is timestamped JSON, so the response includes the behavior field. The request also asks for consumption preferences, so the response includes the consumption_preferences field.

{
  "word_count": 15223,
  "processed_language": "en",
  "personality": [
     . . .
  ],
  "needs": [
     . . .
  ],
  "values": [
     . . .
  ],
  "behavior": [
     . . .
  ],
  "consumption_preferences": [
     . . .
  ],
  "warnings": []
}
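As a rough illustration of how a client might consume this structure, the following Python sketch loads a Profile object with the standard json module and reports which of the conditional fields described above a given response contains. The function name summarize_profile and the abbreviated sample document are hypothetical; only the field names come from the list above.

```python
import json

# Fields that the service returns only under certain conditions.
CONDITIONAL_FIELDS = ("word_count_message", "behavior", "consumption_preferences")


def summarize_profile(profile_json: str) -> dict:
    """Return a short summary of a Profile object (hypothetical helper)."""
    profile = json.loads(profile_json)
    return {
        "word_count": profile["word_count"],
        "language": profile["processed_language"],
        # Number of top-level TraitTreeNode objects in each array.
        "dimensions": len(profile.get("personality", [])),
        "needs": len(profile.get("needs", [])),
        "values": len(profile.get("values", [])),
        # Which conditional fields this particular response contains.
        "conditional_fields": [f for f in CONDITIONAL_FIELDS if f in profile],
        "warnings": len(profile.get("warnings", [])),
    }


# Abbreviated version of the example above, with the arrays left empty.
sample = """{
  "word_count": 15223,
  "processed_language": "en",
  "personality": [], "needs": [], "values": [],
  "behavior": [], "consumption_preferences": [],
  "warnings": []
}"""
summary = summarize_profile(sample)
print(summary["conditional_fields"])  # ['behavior', 'consumption_preferences']
```

Checking for field presence, rather than assuming a fixed shape, keeps the same client code working whether or not the input was timestamped or consumption preferences were requested.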

Personality characteristics output

The Profile object always includes personality, needs, and values fields for all types of input. Each of these fields contains an array of TraitTreeNode objects that describe the characteristics of that type. For Needs and Values, the array has a single level. For Big Five characteristics, the top-level array describes the dimensions, and each dimension has a second-level children array that describes its facets. Each TraitTreeNode object has the following fields:

  • trait_id (string) is the unique ID of the characteristic to which the results pertain:

    • big5_characteristic for Big Five personality dimensions
    • facet_characteristic for Big Five personality facets
    • need_characteristic for Needs
    • value_characteristic for Values
  • name (string) is the user-visible name of the characteristic.

  • category (string) is the category of the characteristic:

    • personality for Big Five personality characteristics
    • needs for Needs
    • values for Values
  • percentile (double) is the normalized percentile score for the characteristic. For more information, see Percentiles for personality characteristics.

  • raw_score (double) is the raw score for the characteristic. The field is returned only if you request raw scores by setting the raw_scores query parameter to true. For more information, see Raw scores for personality characteristics.

  • children is an array of TraitTreeNode objects that provides more detailed results for the facets of each Big Five dimension as inferred from the input text. The array is returned only for Big Five dimensions.

The following example output shows snippets of the output for the Big Five, Needs, and Values characteristics. As described, only the Big Five characteristics have an array of children for their respective facets.

{
  . . .
  "personality": [
    {
      "trait_id": "big5_openness",
      "name": "Openness",
      "category": "personality",
      "percentile": 0.8011555009553,
      "raw_score": 0.77565404255038,
      "children": [
        {
          "trait_id": "facet_adventurousness",
          "name": "Adventurousness",
          "category": "personality",
          "percentile": 0.89755869047319,
          "raw_score": 0.54990704031219
        },
        . . .
      ]
    },
    {
      "trait_id": "big5_conscientiousness",
      "name": "Conscientiousness",
      "category": "personality",
      "percentile": 0.81001753184176,
      "raw_score": 0.66899984888815,
      "children": [
        {
          "trait_id": "facet_achievement_striving",
          "name": "Achievement striving",
          "category": "personality",
          "percentile": 0.84613299226628,
          "raw_score": 0.74240118454888
        },
        . . .
      ]
    },
    {
      "trait_id": "big5_extraversion",
      "name": "Extraversion",
      "category": "personality",
      "percentile": 0.64980796071382,
      "raw_score": 0.56817738781166,
      "children": [
        {
          "trait_id": "facet_activity_level",
          "name": "Activity level",
          "category": "personality",
          "percentile": 0.88220584913965,
          "raw_score": 0.60106995926143
        },
        . . .
      ]
    },
    {
      "trait_id": "big5_agreeableness",
      "name": "Agreeableness",
      "category": "personality",
      "percentile": 0.94786124793821,
      "raw_score": 0.80677815631809,
      "children": [
        {
          "trait_id": "facet_altruism",
          "name": "Altruism",
          "category": "personality",
          "percentile": 0.99241983824205,
          "raw_score": 0.79028406290747
        },
        . . .
      ]
    },
    {
      "trait_id": "big5_neuroticism",
      "name": "Emotional range",
      "category": "personality",
      "percentile": 0.5008224041628,
      "raw_score": 0.46748200007024,
      "children": [
        {
          "trait_id": "facet_anger",
          "name": "Fiery",
          "category": "personality",
          "percentile": 0.17640022058508,
          "raw_score": 0.48490315691802
        },
        . . .
      ]
    }
  ],
  "needs": [
    {
      "trait_id": "need_challenge",
      "name": "Challenge",
      "category": "needs",
      "percentile": 0.67362332054511,
      "raw_score": 0.75196348037675
    },
    . . .
  ],
  "values": [
    {
      "trait_id": "value_conservation",
      "name": "Conservation",
      "category": "values",
      "percentile": 0.89268222856139,
      "raw_score": 0.72135308187423
    },
    . . .
  ],
  . . .
}
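Because Big Five dimensions nest their facets in a children array while Needs and Values are flat, a client usually has to walk the tree to reach every characteristic. The following Python sketch flattens the personality, needs, and values arrays into a single mapping from trait_id to percentile. The function names are hypothetical, and the sample is a trimmed copy of the output above.

```python
from typing import Iterable, Iterator


def iter_traits(nodes: Iterable[dict]) -> Iterator[dict]:
    """Yield every TraitTreeNode, descending into Big Five children."""
    for node in nodes:
        yield node
        # Only Big Five dimensions carry a "children" array of facets.
        yield from iter_traits(node.get("children", []))


def percentiles_by_id(profile: dict) -> dict:
    """Map each trait_id in personality, needs, and values to its percentile."""
    result = {}
    for field in ("personality", "needs", "values"):
        for node in iter_traits(profile.get(field, [])):
            result[node["trait_id"]] = node["percentile"]
    return result


profile = {
    "personality": [
        {"trait_id": "big5_openness", "percentile": 0.8011555009553,
         "children": [{"trait_id": "facet_adventurousness",
                       "percentile": 0.89755869047319}]},
    ],
    "needs": [{"trait_id": "need_challenge", "percentile": 0.67362332054511}],
    "values": [{"trait_id": "value_conservation", "percentile": 0.89268222856139}],
}
scores = percentiles_by_id(profile)
print(scores["facet_adventurousness"])  # 0.89755869047319
```

The same traversal works for raw scores if you read node.get("raw_score") instead, since that field is present only when raw scores were requested.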

Behavioral output

If the input to the service is JSON that has timestamps for the individual content items, the Profile object includes a behavior field. The field includes a BehaviorNode object for each day of the week and hour of the day.

  • trait_id (string) is the unique ID of the characteristic to which the results pertain:

    • behavior_day for days of the week (for example, behavior_sunday).
    • behavior_hour for hours of the day (for example, behavior_0000).
  • name (string) is the user-visible name of the characteristic.

  • category (string) is the category of the characteristic, which is always behavior.

  • percentage (double) is the percentage of content items that occurred during that day of the week or hour of the day. For more information, see Percentages for behavioral characteristics.

The following output shows snippets of the behavioral output for temporal characteristics.

{
  . . .
  "behavior": [
    {
      "trait_id": "behavior_sunday",
      "name": "Sunday",
      "category": "behavior",
      "percentage": 0.21392532795156
    },
    {
      "trait_id": "behavior_monday",
      "name": "Monday",
      "category": "behavior",
      "percentage": 0.42583249243189
    },
    . . .
    {
      "trait_id": "behavior_saturday",
      "name": "Saturday",
      "category": "behavior",
      "percentage": 0.077699293642785
    },
    {
      "trait_id": "behavior_0000",
      "name": "0:00 am",
      "category": "behavior",
      "percentage": 0.4561049445005
    },
    {
      "trait_id": "behavior_0100",
      "name": "1:00 am",
      "category": "behavior",
      "percentage": 0.12209889001009
    },
    . . .
    {
      "trait_id": "behavior_2300",
      "name": "11:00 pm",
      "category": "behavior",
      "percentage": 0.12310797174571
    }
  ],
  . . .
}
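The behavior array mixes day-of-week and hour-of-day nodes in a single list. One way to separate them, sketched below in Python, is to recognize hour nodes by the four-digit suffix of their trait_id (behavior_0000 through behavior_2300) and treat everything else as a day. The helper name split_behavior is hypothetical, and the sample reuses values from the output above.

```python
import re

# Hour nodes use IDs behavior_0000 .. behavior_2300.
HOUR_ID = re.compile(r"behavior_(\d{2})00")


def split_behavior(behavior: list) -> tuple:
    """Separate BehaviorNode entries into day-of-week and hour-of-day maps."""
    days, hours = {}, {}
    for node in behavior:
        match = HOUR_ID.fullmatch(node["trait_id"])
        if match:
            # behavior_0000 -> hour 0, behavior_2300 -> hour 23
            hours[int(match.group(1))] = node["percentage"]
        else:
            # behavior_sunday .. behavior_saturday, keyed by display name
            days[node["name"]] = node["percentage"]
    return days, hours


behavior = [
    {"trait_id": "behavior_sunday", "name": "Sunday", "percentage": 0.21392532795156},
    {"trait_id": "behavior_monday", "name": "Monday", "percentage": 0.42583249243189},
    {"trait_id": "behavior_0000", "name": "0:00 am", "percentage": 0.4561049445005},
    {"trait_id": "behavior_0100", "name": "1:00 am", "percentage": 0.12209889001009},
]
days, hours = split_behavior(behavior)
print(max(days, key=days.get), max(hours, key=hours.get))  # Monday 0
```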

Consumption preferences output

If the consumption_preferences query parameter is set to true, the Profile object includes a consumption_preferences field. The field includes a ConsumptionPreferencesCategoryNode object for each category of preferences.

  • consumption_preference_category_id (string) is the unique ID of the consumption preferences category to which the results pertain in the form consumption_preferences_category.
  • name (string) is the user-visible name of the consumption preferences category.
  • consumption_preferences is an array of ConsumptionPreferencesNode objects that provides results for the individual preferences of the category.

Each individual preference in a category is described by a ConsumptionPreferencesNode object. Some categories have only a single preference; others have many more.

  • consumption_preference_id (string) is the unique ID of the consumption preference to which the results pertain in the form consumption_preferences_preference.
  • name (string) is the user-visible name of the consumption preference.
  • score (double) is a score that indicates the author's likelihood of preferring the item. For more information, see Scores for consumption preferences.

The following output shows snippets of the output for consumption preferences.

{
  . . .
  "consumption_preferences": [
    {
      "consumption_preference_category_id": "consumption_preferences_shopping",
      "name": "Purchasing Preferences",
      "consumption_preferences": [
        {
          "consumption_preference_id": "consumption_preferences_automobile_ownership_cost",
          "name": "Prefers automobile ownership cost",
          "score": 0
        },
        . . .
      ]
    },
    {
      "consumption_preference_category_id": "consumption_preferences_health_and_activity",
      "name": "Health & Activity Preferences",
      "consumption_preferences": [
        {
          "consumption_preference_id": "consumption_preferences_eat_out",
          "name": "Prefers to eat out",
          "score": 1
        },
        . . .
      ]
    },
    . . .
    {
      "consumption_preference_category_id": "consumption_preferences_volunteering",
      "name": "Volunteering Preferences",
      "consumption_preferences": [
        {
          "consumption_preference_id": "consumption_preferences_volunteer",
          "name": "Have volunteering experience",
          "score": 0
        }
      ]
    }
  ],
  . . .
}
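Consumption preferences are nested one level deep: categories contain individual preferences. The following Python sketch flattens them into a mapping from consumption_preference_id to score and then picks out the preferences scored 1.0. The helper name preferences_by_id is hypothetical; the sample is trimmed from the output above. Because the service omits the field unless consumption preferences were requested, the sketch falls back to an empty list.

```python
def preferences_by_id(profile: dict) -> dict:
    """Flatten ConsumptionPreferencesCategoryNode objects into id -> score."""
    scores = {}
    # The field is absent unless consumption_preferences=true was requested.
    for category in profile.get("consumption_preferences", []):
        for pref in category["consumption_preferences"]:
            scores[pref["consumption_preference_id"]] = pref["score"]
    return scores


profile = {"consumption_preferences": [
    {"consumption_preference_category_id": "consumption_preferences_shopping",
     "name": "Purchasing Preferences",
     "consumption_preferences": [
         {"consumption_preference_id": "consumption_preferences_automobile_ownership_cost",
          "name": "Prefers automobile ownership cost", "score": 0}]},
    {"consumption_preference_category_id": "consumption_preferences_health_and_activity",
     "name": "Health & Activity Preferences",
     "consumption_preferences": [
         {"consumption_preference_id": "consumption_preferences_eat_out",
          "name": "Prefers to eat out", "score": 1}]},
]}
scores = preferences_by_id(profile)
likely = [pid for pid, score in scores.items() if score == 1.0]
print(likely)  # ['consumption_preferences_eat_out']
```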

CSV response content

The service returns the results of its analysis in CSV format when you specify text/csv with the Accept header of the request. CSV output provides information that is similar to the information provided by JSON output. As with JSON, the information in the CSV output depends on whether the input represents timestamped data and whether the user requests raw scores and consumption preferences.

Unlike JSON, however, CSV output is returned as a flat sequence of columns in a fixed order; the number of columns depends only on the parameters of the request. The first row of the output consists of optional column labels, which are included only if you set the csv_headers query parameter of the request to true. The row that contains the results of the analysis is always present: it is the second row when labels are included and the only row otherwise.

The following sections list and briefly describe all columns of the CSV output in the exact order in which they appear in the results. To make the information easier to follow, the tables describe the columns by logical grouping, including the number of columns in each group and their optional labels. Other than the word count, all numeric data are returned as double values.
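To make the conversion rules concrete, the following Python sketch parses CSV output with the standard csv module, reading labels from the first row when csv_headers was true or from a caller-supplied list otherwise. It converts word_count to an integer, keeps processed_language as a string, and treats every other column as a double. The function name and the four-column sample (with values borrowed from the JSON example) are hypothetical; real output has many more columns.

```python
import csv
import io
from typing import List, Optional


def parse_profile_csv(text: str, labels: Optional[List[str]] = None) -> dict:
    """Parse CSV profile output into a dict keyed by column label."""
    rows = list(csv.reader(io.StringIO(text)))
    if labels is None:
        # csv_headers=true: the first row holds the labels.
        labels, values = rows[0], rows[1]
    else:
        # csv_headers=false: the only row holds the results.
        values = rows[0]
    if len(labels) != len(values):
        raise ValueError(f"expected {len(labels)} columns, got {len(values)}")
    parsed = {}
    for label, value in zip(labels, values):
        if label == "word_count":
            parsed[label] = int(value)
        elif label == "processed_language":
            parsed[label] = value
        else:
            # All other numeric data are doubles.
            parsed[label] = float(value)
    return parsed


sample = (
    "big5_agreeableness,facet_altruism,word_count,processed_language\n"
    "0.94786124793821,0.99241983824205,15223,en\n"
)
profile = parse_profile_csv(sample)
print(profile["word_count"], profile["processed_language"])  # 15223 en
```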

Basic characteristics and metadata

The following columns are always present in the CSV output for all requests.

Table 1. CSV columns for basic characteristics and metadata
Each entry lists the column grouping and its number of columns, followed by the optional column labels in order and a description of the values.

  • Big Five Agreeableness percentiles (7 columns): big5_agreeableness, facet_altruism, facet_cooperation, facet_modesty, facet_morality, facet_sympathy, facet_trust. Normalized percentile score for the author of the text with respect to the named dimension or facet.
  • Big Five Conscientiousness percentiles (7 columns): big5_conscientiousness, facet_achievement_striving, facet_cautiousness, facet_dutifulness, facet_orderliness, facet_self_discipline, facet_self_efficacy. Normalized percentile score for the author of the text with respect to the named dimension or facet.
  • Big Five Extraversion percentiles (7 columns): big5_extraversion, facet_activity_level, facet_assertiveness, facet_cheerfulness, facet_excitement_seeking, facet_friendliness, facet_gregariousness. Normalized percentile score for the author of the text with respect to the named dimension or facet.
  • Big Five Emotional range percentiles (7 columns): big5_neuroticism, facet_anger, facet_anxiety, facet_depression, facet_immoderation, facet_self_consciousness, facet_vulnerability. Normalized percentile score for the author of the text with respect to the named dimension or facet.
  • Big Five Openness percentiles (7 columns): big5_openness, facet_adventurousness, facet_artistic_interests, facet_emotionality, facet_imagination, facet_intellect, facet_liberalism. Normalized percentile score for the author of the text with respect to the named dimension or facet.
  • Needs percentiles (12 columns): need_liberty, need_ideal, need_love, need_practicality, need_self_expression, need_stability, need_structure, need_challenge, need_closeness, need_curiosity, need_excitement, need_harmony. Normalized percentile score for the author of the text with respect to the named need.
  • Values percentiles (5 columns): value_conservation, value_hedonism, value_openness_to_change, value_self_enhancement, value_self_transcendence. Normalized percentile score for the author of the text with respect to the named value.
  • Days of the week percentages (7 columns): behavior_sunday, behavior_monday, behavior_tuesday, behavior_wednesday, behavior_thursday, behavior_friday, behavior_saturday. If the input text is timestamped, the percentage of the input associated with each day of the week; if the input is not timestamped, the percentages are all 0.0.
  • Hours of the day percentages (24 columns): behavior_0000 through behavior_2300. If the input text is timestamped, the percentage of the input associated with each hour of the day; if the input is not timestamped, the percentages are all 0.0.
  • Word count and language (2 columns): word_count, processed_language. An integer that indicates the number of words from the input text that the service used to generate the profile, and a two-letter identifier for the language model that the service used to analyze the text.

Raw scores

The following columns are present only if you request raw scores by setting the raw_scores query parameter to true.

Table 2. CSV columns for raw scores
Each entry lists the column grouping and its number of columns, followed by the optional column labels in order and a description of the values.

  • Big Five Agreeableness raw scores (7 columns): big5_agreeableness_raw, facet_altruism_raw, facet_cooperation_raw, facet_modesty_raw, facet_morality_raw, facet_sympathy_raw, facet_trust_raw. Raw score for the author of the text with respect to the named dimension or facet.
  • Big Five Conscientiousness raw scores (7 columns): big5_conscientiousness_raw, facet_achievement_striving_raw, facet_cautiousness_raw, facet_dutifulness_raw, facet_orderliness_raw, facet_self_discipline_raw, facet_self_efficacy_raw. Raw score for the author of the text with respect to the named dimension or facet.
  • Big Five Extraversion raw scores (7 columns): big5_extraversion_raw, facet_activity_level_raw, facet_assertiveness_raw, facet_cheerfulness_raw, facet_excitement_seeking_raw, facet_friendliness_raw, facet_gregariousness_raw. Raw score for the author of the text with respect to the named dimension or facet.
  • Big Five Emotional range raw scores (7 columns): big5_neuroticism_raw, facet_anger_raw, facet_anxiety_raw, facet_depression_raw, facet_immoderation_raw, facet_self_consciousness_raw, facet_vulnerability_raw. Raw score for the author of the text with respect to the named dimension or facet.
  • Big Five Openness raw scores (7 columns): big5_openness_raw, facet_adventurousness_raw, facet_artistic_interests_raw, facet_emotionality_raw, facet_imagination_raw, facet_intellect_raw, facet_liberalism_raw. Raw score for the author of the text with respect to the named dimension or facet.
  • Needs raw scores (12 columns): need_liberty_raw, need_ideal_raw, need_love_raw, need_practicality_raw, need_self_expression_raw, need_stability_raw, need_structure_raw, need_challenge_raw, need_closeness_raw, need_curiosity_raw, need_excitement_raw, need_harmony_raw. Raw score for the author of the text with respect to the named need.
  • Values raw scores (5 columns): value_conservation_raw, value_hedonism_raw, value_openness_to_change_raw, value_self_enhancement_raw, value_self_transcendence_raw. Raw score for the author of the text with respect to the named value.
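When csv_headers is false, a client has to know the column order in advance. The following Python sketch rebuilds the labels of Table 1 and, optionally, Table 2 from the groupings above (the five Big Five blocks, Needs, Values, days, hours, then word count and language, with raw-score columns repeating the trait labels plus a _raw suffix). The constant and function names are hypothetical, and the consumption preference columns described next are deliberately not generated here; when requested, they follow these columns.

```python
# Big Five dimensions in CSV order, each with its facets in CSV order.
BIG5 = {
    "agreeableness": ["altruism", "cooperation", "modesty", "morality", "sympathy", "trust"],
    "conscientiousness": ["achievement_striving", "cautiousness", "dutifulness",
                          "orderliness", "self_discipline", "self_efficacy"],
    "extraversion": ["activity_level", "assertiveness", "cheerfulness",
                     "excitement_seeking", "friendliness", "gregariousness"],
    "neuroticism": ["anger", "anxiety", "depression", "immoderation",
                    "self_consciousness", "vulnerability"],
    "openness": ["adventurousness", "artistic_interests", "emotionality",
                 "imagination", "intellect", "liberalism"],
}
NEEDS = ["liberty", "ideal", "love", "practicality", "self_expression", "stability",
         "structure", "challenge", "closeness", "curiosity", "excitement", "harmony"]
VALUES = ["conservation", "hedonism", "openness_to_change", "self_enhancement",
          "self_transcendence"]
DAYS = ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday"]


def trait_labels() -> list:
    """Labels of the 52 personality columns, in CSV order."""
    labels = []
    for dimension, facets in BIG5.items():  # dicts preserve insertion order
        labels.append(f"big5_{dimension}")
        labels.extend(f"facet_{facet}" for facet in facets)
    labels.extend(f"need_{need}" for need in NEEDS)
    labels.extend(f"value_{value}" for value in VALUES)
    return labels


def expected_columns(raw_scores: bool = False) -> list:
    """Column labels for Table 1, plus Table 2 when raw_scores is true."""
    columns = trait_labels()
    columns += [f"behavior_{day}" for day in DAYS]
    columns += [f"behavior_{hour:02d}00" for hour in range(24)]  # 0000 .. 2300
    columns += ["word_count", "processed_language"]
    if raw_scores:
        columns += [f"{label}_raw" for label in trait_labels()]
    return columns


print(len(expected_columns()), len(expected_columns(raw_scores=True)))  # 85 137
```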

Consumption preferences

The following columns are present only if you request consumption preferences by setting the consumption_preferences query parameter to true. In all cases, the column reports the likelihood that the author of the text prefers the named consumption topic.

Table 3. CSV columns for consumption preferences
Each entry lists the column grouping and its number of columns, followed by the optional column labels in order.

  • Purchasing preferences category scores (12 columns): consumption_preferences_spur_of_moment, consumption_preferences_credit_card_payment, consumption_preferences_influence_brand_name, consumption_preferences_influence_utility, consumption_preferences_online_ads, consumption_preferences_social_media, consumption_preferences_family_members, consumption_preferences_clothes_quality, consumption_preferences_clothes_style, consumption_preferences_clothes_comfort, consumption_preferences_automobile_ownership_cost, consumption_preferences_automobile_safety
  • Music preferences category scores (9 columns): consumption_preferences_music_rap, consumption_preferences_music_country, consumption_preferences_music_r_b, consumption_preferences_music_hip_hop, consumption_preferences_music_live_event, consumption_preferences_music_playing, consumption_preferences_music_latin, consumption_preferences_music_rock, consumption_preferences_music_classical
  • Health and activity preferences category scores (3 columns): consumption_preferences_gym_membership, consumption_preferences_outdoor, consumption_preferences_eat_out
  • Movie preferences category scores (10 columns): consumption_preferences_movie_romance, consumption_preferences_movie_adventure, consumption_preferences_movie_horror, consumption_preferences_movie_musical, consumption_preferences_movie_historical, consumption_preferences_movie_science_fiction, consumption_preferences_movie_war, consumption_preferences_movie_drama, consumption_preferences_movie_action, consumption_preferences_movie_documentary
  • Reading preferences category scores (5 columns): consumption_preferences_read_frequency, consumption_preferences_books_entertainment_magazines, consumption_preferences_books_non_fiction, consumption_preferences_books_financial_investing, consumption_preferences_books_autobiographies
  • Volunteering preferences category scores (1 column): consumption_preferences_volunteer
  • Environmental concern preferences category scores (1 column): consumption_preferences_concerned_environment
  • Entrepreneurship preferences category scores (1 column): consumption_preferences_start_business

Interpreting the numeric results

The Personality Insights service returns numeric results for each of the personality and behavioral characteristics and for each consumption preference. The values differ in the information they provide, as described in the following sections.

Note: For Arabic input, the service is unable to produce meaningful percentiles and raw scores for a number of personality characteristics. For more information, see Limitations for Arabic input.

Percentiles for personality characteristics

For each request, the service always reports a normalized score as a percentile for each Big Five, Values, and Needs personality characteristic. Normalized scores represent a percentile ranking for each characteristic based on qualities inferred from the input text. The service computes normalized scores by comparing the raw score for the author's text with results from a sample population. The service reports each percentile as a double in the range of 0 to 1.

For example, a percentile of 0.64980796071382 for the personality characteristic big5_extraversion indicates that the author of the text scored in the 65th percentile for that characteristic. The author's writing exhibits the tendency to a greater extent than roughly 65 percent of the sample population and to a lesser extent than the remaining 35 percent. The precision of the percentile depends on the number of words that were submitted as input with the request; for more information, see Guidelines for providing sufficient input.

Note: No mathematical relationship exists between the percentiles reported for Big Five dimensions and facets. The service calculates the normalized percentile for each dimension and facet independently, based on correlations between survey participants' scores for that dimension or facet and the words they use. Therefore, even though facets provide finer-grained descriptions of dimensions, combining the scores for the six facets of a dimension (for example, by averaging them) does not necessarily yield the percentile for that dimension. The same is true of raw scores.

Raw scores for personality characteristics

If you specify true for the raw_scores query parameter of the request, the service reports a raw_score for each personality characteristic. Raw scores represent the score for the specific characteristic based solely on the author's text and the model for that characteristic, without comparing the results to a sample population. Raw scores can be interpreted as the scores the author would receive from taking a personality test.

The service reports each raw score as a double in the range of 0 to 1. A higher score generally indicates a greater likelihood that the author has that characteristic. However, raw scores must be considered in aggregate: in practice, the range of values might be much narrower than 0 to 1, so an individual score must be interpreted in the context of the overall scores and their range. In general, though, a raw score of 0.56817738781166 for the personality characteristic big5_extraversion, for instance, indicates that the author would likely have achieved this score on a personality test. Compare this raw score with the normalized percentile of 0.64980796071382 reported for the same author and characteristic in the previous section.

The service makes raw scores available for users who want to apply a custom normalization for a specific scenario or who do not require a comparison with a sample population. Users who want to know how the author's characteristics compare with a large sample population can use the normalized scores. Users who want to derive their own normalized percentile scores from the raw data can compare the raw scores against a different sample population and apply a different approach to normalization.

To normalize a raw score to a percentile for a specific characteristic, compare the raw score with a sample population for which the mean and the standard deviation for the characteristic are known. For example, IBM conducted studies to gather data from a large sample population of Twitter users. IBM computed the users' scores for each of the personality characteristics and then established the mean and standard deviation for each characteristic. To compute the percentile score for a raw score inferred from its analysis of input text, the service uses the mean and standard deviation derived from its sample Twitter population for that characteristic.
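The description above says only that the service compares a raw score with the mean and standard deviation of a sample population; it does not spell out the exact formula. The following Python sketch shows one common approach for a custom normalization, assuming that the characteristic is normally distributed in your own sample population, and uses the normal cumulative distribution function from the standard statistics module. The function name and the population statistics are hypothetical, not IBM's published values.

```python
from statistics import NormalDist


def raw_to_percentile(raw_score: float, mean: float, stdev: float) -> float:
    """Percentile of raw_score, ASSUMING a normal population with these stats."""
    return NormalDist(mu=mean, sigma=stdev).cdf(raw_score)


# Hypothetical statistics for one characteristic in a custom sample population.
population_mean, population_stdev = 0.55, 0.05

print(round(raw_to_percentile(0.55, population_mean, population_stdev), 2))  # 0.5
print(round(raw_to_percentile(0.60, population_mean, population_stdev), 2))  # 0.84
```

A raw score equal to the population mean lands at the 50th percentile, and a score one standard deviation above it lands near the 84th. If your population is visibly skewed, an empirical percentile (the fraction of population scores below the raw score) may be a better choice than the normal model.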

Percentages for behavioral characteristics

If you submit JSON data whose content items have timestamps, the service reports a percentage for each behavioral characteristic. Behavioral characteristics identify the temporal distribution of the input. Each percentage indicates the proportion of the content items that were created during that day of the week or hour of the day. For example, a percentage of 0.4561049445005 for the behavioral characteristic behavior_0000 means that roughly 46 percent of the content items were created between midnight and 1:00 a.m.
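To show what these percentages mean arithmetically, the following Python sketch computes the same kind of distribution from a list of timestamps: each value is the number of items in a bucket divided by the total number of items. It assumes timestamps in milliseconds since the epoch and buckets them in UTC; how the service itself handles time zones is not described here, so treat this as an approximation. The function and constant names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone

# Indexed by datetime.weekday(), where Monday == 0.
DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def behavior_percentages(timestamps_ms: list) -> dict:
    """Fraction of items per day of week and hour of day (UTC assumed)."""
    total = len(timestamps_ms)
    if total == 0:
        raise ValueError("need at least one timestamp")
    days, hours = Counter(), Counter()
    for ts in timestamps_ms:
        moment = datetime.fromtimestamp(ts / 1000, tz=timezone.utc)
        days[DAY_NAMES[moment.weekday()]] += 1
        hours[moment.hour] += 1
    result = {f"behavior_{day}": days[day] / total for day in DAY_NAMES}
    result.update({f"behavior_{h:02d}00": hours[h] / total for h in range(24)})
    return result


# 1970-01-01 was a Thursday: three items fall on Thursday, one on Friday.
stamps = [0, 1_800_000, 3_600_000, 86_400_000]
p = behavior_percentages(stamps)
print(p["behavior_thursday"], p["behavior_0000"])  # 0.75 0.75
```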

Scores for consumption preferences

If you specify true for the consumption_preferences query parameter of the request, the service reports consumption preferences that include a score for each preference. The service derives the score from the personality characteristics that it infers from the input text. The score is a double that indicates how likely the author of the text is to prefer the item. It is an indication of preference, not a normalized percentage.

For some preferences, the score is one of the following three values:

  • 0.0: The author is very unlikely to prefer the item. For some preferences, you can interpret the value as meaning that the author has a very low level of interest.
  • 0.5: The author is neutral with respect to the item. For some preferences, the value can mean that the author has a medium level of interest.
  • 1.0: The author is very likely to prefer the item. For some preferences, the value indicates a high level of interest.

For other preferences, the score represents a binary value. The author of the input text is either unlikely (0.0) or likely (1.0) to have an interest in the item, or the author has either a low or high level of interest. In some cases, the score can represent a simple yes or no response (for instance, is the author likely to have experience volunteering for social causes).
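Because consumption preference scores take only the discrete values described above, a client can map them directly to labels instead of treating them as continuous. The following Python sketch does that and rejects any other value. The label words are hypothetical and deliberately coarse; as noted above, the exact meaning of each value (likelihood, level of interest, or a yes or no answer) varies by preference, and binary preferences only ever produce 0.0 or 1.0.

```python
# Illustrative wording only; the meaning of each value varies by preference.
SCORE_LABELS = {0.0: "unlikely", 0.5: "neutral", 1.0: "likely"}


def describe_preference(score: float) -> str:
    """Map a consumption preference score to a coarse, hypothetical label."""
    try:
        return SCORE_LABELS[score]
    except KeyError:
        raise ValueError(f"unexpected consumption preference score: {score}") from None


print([describe_preference(s) for s in (0, 0.5, 1)])  # ['unlikely', 'neutral', 'likely']
```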

For a complete list of all preferences by category and the range of their results, see Consumption preferences.

Limitations for Arabic input

For Arabic input, the service's models are unable to produce meaningful results for a subset of personality characteristics. For the following characteristics, normalized percentile scores are always 0.5 and raw scores are always the mean of the original distribution. Do not rely on the results for these characteristics as part of the personality profile of the author.

  • Big Five dimensions: Emotional range

  • Big Five facets:

    • Agreeableness: Altruism, Cooperation, Modesty, and Trust
    • Conscientiousness: Achievement striving and Dutifulness
    • Extraversion: Cheerfulness and Friendliness (Outgoing)
    • Emotional range: Anger (Fiery), Prone to worry, Immoderation, and Self-consciousness
    • Openness: Adventurousness, Imagination, and Intellect
  • Needs: Ideal, Liberty, Love, Practicality, and Structure

  • Values: Self-enhancement
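A client that processes Arabic input may want to drop these characteristics before presenting a profile. The following Python sketch does so by trait_id. The IDs are mapped from the names listed above using the labels in the CSV tables; the mapping of "Prone to worry" to facet_anxiety and "Friendliness (Outgoing)" to facet_friendliness is an assumption based on those tables. The constant, the function name, and the sample percentile 0.42 are hypothetical.

```python
# trait_id values for the characteristics listed above (assumed mapping).
ARABIC_UNRELIABLE = frozenset({
    "big5_neuroticism",
    "facet_altruism", "facet_cooperation", "facet_modesty", "facet_trust",
    "facet_achievement_striving", "facet_dutifulness",
    "facet_cheerfulness", "facet_friendliness",
    "facet_anger", "facet_anxiety", "facet_immoderation", "facet_self_consciousness",
    "facet_adventurousness", "facet_imagination", "facet_intellect",
    "need_ideal", "need_liberty", "need_love", "need_practicality", "need_structure",
    "value_self_enhancement",
})


def reliable_percentiles(profile: dict) -> dict:
    """Percentiles by trait_id, omitting traits the Arabic model cannot score."""
    is_arabic = profile.get("processed_language") == "ar"
    skip = ARABIC_UNRELIABLE if is_arabic else frozenset()
    stack = [node for field in ("personality", "needs", "values")
             for node in profile.get(field, [])]
    result = {}
    while stack:
        node = stack.pop()
        stack.extend(node.get("children", []))  # visit Big Five facets too
        if node["trait_id"] not in skip:
            result[node["trait_id"]] = node["percentile"]
    return result


profile = {
    "processed_language": "ar",
    "personality": [{"trait_id": "big5_neuroticism", "percentile": 0.5,
                     "children": [{"trait_id": "facet_vulnerability", "percentile": 0.42}]}],
    "needs": [{"trait_id": "need_love", "percentile": 0.5}],
    "values": [],
}
print(reliable_percentiles(profile))  # {'facet_vulnerability': 0.42}
```

Filtering only when processed_language is ar leaves profiles for other languages untouched.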