Input: Requesting a profile

To analyze content, you use the HTTP POST request method to call the profile method of the Personality Insights service. You can pass the service a maximum of 20 MB of content to be analyzed via the body of the request, but the service requires far less input to produce an accurate personality profile; for more information, see Guidelines for providing sufficient input.

The profile method includes parameters that let you specify the type of content to be passed to and returned by the service, as well as the language of each type of content. The service always returns a profile that provides insight into the personality characteristics of the author of the input text. You can also request that the service return raw scores, which are not compared with a sample population, and consumption preferences, which indicate the types of products, services, and activities the author is likely to prefer.

The following sections describe the parameters of the profile method. For information about the results of a request, see Output: Understanding a profile. For detailed information about the profile method, see the API reference.

Specifying request and response formats

You can use the Content-Type and Accept header parameters to indicate the format of the content you are passing to the method and the format of the service's response. You can use any combination of supported formats for the request and response.

Table 1. Specifying request and response formats
Format | Argument | Supported by Content-Type | Supported by Accept
Plain text | text/plain | Yes (default). The service processes plain text without modification. | No
HTML | text/html | Yes. The service strips tags from the content before processing it. | No
JSON | application/json | Yes. Content must conform to the model defined by the ContentListContainer object, which contains an array of ContentItem objects. | Yes (default). The service returns its results as a Profile object.
CSV | text/csv | No | Yes. By default, the service returns a single row of numeric results. Set the optional csv_headers query parameter to true to request headers for each column of the output.

Specifying JSON input

To pass JSON input content, you use the ContentListContainer object. The object includes an array of ContentItem objects, each of which contains an element of the content. The only required field of the object is content, which provides the text to be analyzed. Other optional fields let you specify the following:

  • id (string) is a unique ID for the content item.
  • created (integer) is a UNIX timestamp that indicates when the content item was created.
  • updated (integer) is a UNIX timestamp that indicates when the content item was last updated.
  • contenttype (string) indicates the type of the content item, text/plain or text/html.
  • language (string) indicates the language of the content item: ar (Arabic), en (English), es (Spanish), or ja (Japanese). See Specifying request and response languages.
  • parentid (string) is the id of the content item's parent item.
  • reply (boolean) indicates whether the content item is a reply to another item.
  • forward (boolean) indicates whether the content item is a forward or copy of another item.

JSON input is well suited for content from Twitter or other social networks that consist of multiple conversations or posts. Instead of concatenating all of the author's text into a single string, you can use JSON to submit the data as it exists. This has the further advantage of letting the service know which pieces of text are related.
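As a minimal sketch of the JSON input described above, the following Python builds a request body from a list of posts. The top-level key contentItems, the helper name build_content_list, and the sample posts are assumptions made for illustration; consult the API reference for the exact ContentListContainer model.

```python
import json


def build_content_list(posts, language="en"):
    """Build a ContentListContainer-style dict from (id, text, created) tuples.

    Assumption: the array of content items lives under the key "contentItems".
    """
    items = []
    for post_id, text, created in posts:
        items.append({
            "content": text,              # the only required field
            "id": post_id,                # unique ID for the content item
            "created": created,           # UNIX timestamp
            "contenttype": "text/plain",
            "language": language,
        })
    return {"contentItems": items}


# Hypothetical sample data: two short posts by the same author.
posts = [
    ("post-1", "Looking forward to the conference next week.", 1476921600),
    ("post-2", "Great talks today, learned a lot.", 1477526400),
]

# JSON content is sent as UTF-8, so encode the serialized body explicitly.
body = json.dumps(build_content_list(posts), ensure_ascii=False).encode("utf-8")
```

Keeping each post as its own item, rather than joining the text into one string, preserves the per-item metadata that the optional fields carry.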

Specifying the character set

By default, the service uses the following character sets for input content:

  • For plain text and HTML content, the service uses the ISO-8859-1 character set (Latin-1, a superset of ASCII), which is the default defined by the HTTP version 1.1 specification.
  • For JSON content, the service effectively always uses the Unicode Transformation Format (UTF)-8 character set per Section 8.1 of the Internet Engineering Task Force (IETF) Request for Comments (RFC) 7159.

When submitting plain text or HTML content, include the charset parameter with the Content-Type header to indicate the character encoding of the input text. The following example specifies UTF-8 character encoding:

Content-Type: text/plain;charset=utf-8

By using the charset parameter, you can avoid potential problems associated with non-ASCII or non-printable characters. If you pass UTF-8 data without specifying the character set, the service interprets the bytes as ISO-8859-1, so special characters can produce incorrect results or HTTP 4nn or 5nn errors.
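The following Python sketch illustrates why the charset matters. It does not call the service; it only shows what happens when UTF-8 bytes are read as ISO-8859-1, which is the default for plain text and HTML. The helper name content_type is hypothetical.

```python
text = "café"
utf8_bytes = text.encode("utf-8")

# With charset=utf-8 declared, a receiver decodes the bytes correctly.
correct = utf8_bytes.decode("utf-8")

# With no charset declared, a receiver that falls back to ISO-8859-1
# reads the two-byte sequence for "é" as two separate characters.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)  # cafÃ©


def content_type(media_type, charset="utf-8"):
    """Return a Content-Type header value that states the charset explicitly."""
    return f"{media_type};charset={charset}"


print(content_type("text/plain"))  # text/plain;charset=utf-8
```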

Using cURL

When using cURL, similar issues can occur if you pass the data to the service with the wrong option. To preserve any UTF-8 encoding for the content, always pass the content via the --data-binary option of the curl command. If you use the --data option, curl treats the content as ASCII and can alter it before sending it (for example, it strips carriage returns and newlines when it reads the content from a file), which can cause problems for data encoded in UTF-8.

For example, the following curl command correctly uses the --data-binary option to post the contents of the specified file exactly as they exist, with no additional processing. The command uses the charset parameter with the Content-Type header and explicitly requests the default JSON response format.

curl -X POST --user {username}:{password} \
--header "Content-Type: text/plain;charset=utf-8" \
--header "Accept: application/json" \
--data-binary @<filename> \
"https://gateway.watsonplatform.net/personality-insights/api/v3/profile?version=2016-10-20"

For additional examples of calling the service with different request and response formats, see Getting started.
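For readers who call the service from code rather than from the command line, the following is a rough Python equivalent of the curl command, using only the standard library. It is a sketch, not an official client: the function name build_profile_request is hypothetical, the credentials are placeholders, and the request is built but not sent. It assumes HTTP basic authentication, which is what curl's --user option uses by default, and it includes the required version query parameter.

```python
import base64
import urllib.request

URL = "https://gateway.watsonplatform.net/personality-insights/api/v3/profile"


def build_profile_request(raw_bytes, username, password, version="2016-10-20"):
    """Build (but do not send) a POST request that carries raw_bytes unmodified."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{URL}?version={version}",
        data=raw_bytes,  # bytes are sent as-is, like curl's --data-binary
        method="POST",
        headers={
            "Content-Type": "text/plain;charset=utf-8",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# In practice, read the input file in binary mode so that newlines and
# multi-byte characters reach the service unchanged:
#     with open("profile.txt", "rb") as f:
#         request = build_profile_request(f.read(), "{username}", "{password}")
#     response = urllib.request.urlopen(request)  # this line sends the request
request = build_profile_request("Ünïcödé text\n".encode("utf-8"), "user", "pass")
```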

Specifying request and response languages

You can use the Content-Language and Accept-Language header parameters to indicate the language of the input content and the language of the service's response. You can use any combination of supported languages for the request and response. If you do not indicate a language, the service uses its English-trained models for its analysis and English for its results. The following table lists the supported input and output languages and identifies the arguments that you use with the language-related parameters.

Table 2. Specifying request and response languages
Language | Argument | Supported by Content-Language | Supported by Accept-Language
Arabic | ar | Yes | Yes
English | en | Yes | Yes
Japanese | ja | Yes | Yes
Spanish | es | Yes | Yes
Brazilian Portuguese | pt-br | No | Yes
French | fr | No | Yes
German | de | No | Yes
Italian | it | No | Yes
Korean | ko | No | Yes
Simplified Chinese | zh-cn | No | Yes
Traditional Chinese | zh-tw | No | Yes

Submit all text in the same language; do not mix multiple languages in the same request. For two-character language arguments, the service treats regional variants as their parent language; for example, it interprets en-US as en.
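A client can check its language arguments against Table 2 before it sends a request. The following Python sketch shows one way to do so. The function names are hypothetical, and reducing a regional variant such as en-US to its two-character parent is a client-side approximation of the behavior described above, not the service's own logic.

```python
# Languages from Table 2.
INPUT_LANGUAGES = {"ar", "en", "ja", "es"}
OUTPUT_LANGUAGES = INPUT_LANGUAGES | {"pt-br", "fr", "de", "it", "ko", "zh-cn", "zh-tw"}


def _normalize(tag, supported):
    """Return the supported argument for tag, or raise ValueError."""
    tag = tag.lower()
    if tag in supported:
        return tag
    parent = tag.split("-")[0]  # for example, en-US becomes en
    if parent in supported:
        return parent
    raise ValueError(f"unsupported language: {tag}")


def language_headers(content_language="en", accept_language="en"):
    """Return validated Content-Language and Accept-Language headers."""
    return {
        "Content-Language": _normalize(content_language, INPUT_LANGUAGES),
        "Accept-Language": _normalize(accept_language, OUTPUT_LANGUAGES),
    }


headers = language_headers("en-US", "ko")
```

Calling language_headers("fr", "en") raises ValueError, because French is supported for responses but not for input.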

Specifying a language for JSON content

For plain text and HTML input, the Content-Language header is the only way to specify the language. For JSON input, you can also specify the language of each individual content item by using the language parameter of the ContentItem object. However, a language specified with the Content-Language header overrides a language specified for a content item; the service ignores content items that specify a different language.

Omit the Content-Language header to base the language solely on the specification of the content items. The service uses the most prevalent language from among the content items, which yields the best possible results. It counts the number of content items for each language and selects the language with the highest frequency. If multiple languages have the same maximum frequency, the service uses the language that reaches that value first. Again, the service ignores content items that specify a different language.
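The selection rule for content items can be sketched in Python as follows. This is an illustration of the behavior described above, not the service's implementation. It reads "reaches that value first" as the first language whose running count hits the maximum, and it assumes that items with no language field are left out of the count but are not discarded. The function name and sample items are hypothetical.

```python
from collections import Counter


def prevalent_language(content_items):
    """Return the most frequent language; ties go to the first to reach the maximum."""
    languages = [item["language"] for item in content_items if "language" in item]
    if not languages:
        return None
    top = max(Counter(languages).values())
    running = Counter()
    for lang in languages:
        running[lang] += 1
        if running[lang] == top:
            return lang


items = [
    {"content": "Hola", "language": "es"},
    {"content": "Hello", "language": "en"},
    {"content": "Hi again", "language": "en"},
    {"content": "Buenos días", "language": "es"},
]

selected = prevalent_language(items)  # both languages occur twice; "en" gets there first

# Items that specify a different language are ignored.
kept = [item for item in items if item.get("language", selected) == selected]
```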

Language considerations

Consider the following when submitting input in English or Arabic:

  • For English, results are based on US cultural norms. If you analyze English text from a different culture, you might need to adjust the results accordingly.
  • For Arabic, the service can trim the amount of input text for performance reasons. At a certain threshold, the accuracy of the results for Arabic does not improve with more words. If the service trims Arabic input, it returns a warning message to inform you that it reduced the amount of input text that it used for the profile.

For information about using translated text, see Inferring personality from translated text.

Requesting raw scores

The service always returns normalized scores for each personality characteristic (Big Five dimension and facet, Need, and Value) as part of its response. The service can also report a raw_score for each characteristic if you set the raw_scores query parameter to true. Raw scores represent the scores for the characteristics based solely on the author's text and the model for that characteristic, without comparing the results to a sample population. For more information about using raw scores, see Raw scores for personality characteristics.

Requesting consumption preferences

The service always returns results for the personality models. When you set the consumption_preferences query parameter to true, the service also returns scores for a variety of consumption preferences based on the personality characteristics it infers from the input text. These results indicate the author's tendency to prefer different products, services, and activities. Businesses can use the results to better understand the author's inclinations and to personalize communications and offers for the author.

For more information about the different consumption preferences, see Consumption preferences. For information about interpreting the numeric results for a preference, see Scores for consumption preferences.
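Both optional results are requested through query parameters on the same URL. The following Python sketch assembles such a URL with the standard library. The function name profile_url is hypothetical, and the version parameter is included because every call to the profile method requires it.

```python
from urllib.parse import urlencode

BASE_URL = "https://gateway.watsonplatform.net/personality-insights/api/v3/profile"


def profile_url(version, raw_scores=False, consumption_preferences=False):
    """Return the profile URL with the requested optional results enabled."""
    params = {"version": version}
    if raw_scores:
        params["raw_scores"] = "true"
    if consumption_preferences:
        params["consumption_preferences"] = "true"
    return f"{BASE_URL}?{urlencode(params)}"


url = profile_url("2016-10-20", raw_scores=True, consumption_preferences=True)
print(url)
```

Leaving both flags at their defaults yields a URL that carries only the version parameter, so the service returns just the personality results.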

Specifying the interface version

All calls to the profile method must include the version query parameter to indicate the version of the service's API and response format that you want to use. You specify the version as a date of the form YYYY-MM-DD; for example, specify 2016-10-20 for October 20, 2016. The parameter allows the service to update its API and response format for new versions without breaking existing clients.

The date that you specify does not need to match a version of the service exactly; the service uses the version that is no later than the date you provide. If you specify a date that is earlier than the initial release of version 3, the service uses version 3 of the API. If you specify a date that is in the future or otherwise later than the most recent version, the service uses the latest version.
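The resolution rules above can be modeled with a short Python sketch. The release dates in RELEASES are hypothetical placeholders, since this section does not list the real ones, and the function illustrates the described behavior rather than the service's own code.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical release dates, oldest first. The first entry stands in for
# the initial release of version 3.
RELEASES = [date(2016, 10, 19), date(2017, 3, 1), date(2017, 10, 13)]


def resolve_version(requested, releases=RELEASES):
    """Return the latest release dated on or before requested.

    A date earlier than every release falls back to the first release;
    a date later than every release resolves to the latest one.
    """
    index = bisect_right(releases, requested)
    return releases[max(index - 1, 0)]


# The version parameter uses the form YYYY-MM-DD.
resolved = resolve_version(date.fromisoformat("2016-10-20"))
```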