Unicode Mode (Python)

When IBM® SPSS® Statistics is in Unicode mode (controlled by the UNICODE subcommand of the SET command), the following conversions are performed automatically when strings are passed to or received from the functions in the spss module:

  • Strings received by Python from IBM SPSS Statistics are converted from UTF-8 to Python Unicode, which is UTF-16.
  • Strings passed from Python to IBM SPSS Statistics are converted from UTF-16 to UTF-8.
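The two conversions above can be illustrated outside IBM SPSS Statistics. The following is a minimal sketch in plain Python 3 that does not use the spss module; the helper names from_spss and to_spss are hypothetical and exist only for this illustration.

```python
# Illustration only: mimics the two conversions without the spss module.
# from_spss and to_spss are hypothetical names, not part of any IBM API.

def from_spss(raw_utf8):
    """Decode UTF-8 bytes into a Python Unicode string."""
    return raw_utf8.decode("utf-8")

def to_spss(text):
    """Encode a Python Unicode string as UTF-8 bytes."""
    return text.encode("utf-8")

raw = b"\xc3\xa2bc"          # UTF-8 bytes: a-circumflex (2 bytes), then "b" and "c"
text = from_spss(raw)        # Unicode string of three characters
round_trip = to_spss(text)   # the same four UTF-8 bytes again
```

Note that the character count (3) and the UTF-8 byte count (4) differ, which matters wherever a length limit is expressed in bytes.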

Note: Changing the locale or the Unicode setting during an OMS request may result in incorrectly transcoded text.

Command Syntax Files

Special care must be taken when working in Unicode mode with command syntax files. Specifically, Python string literals used in command syntax files need to be explicitly expressed as UTF-16 strings. This is best done by using the u() function from the spssaux module (installed with IBM SPSS Statistics - Essentials for Python). The function has the following behavior:

  • If IBM SPSS Statistics is in Unicode mode, the input string is converted to UTF-16.
  • If IBM SPSS Statistics is not in Unicode mode, the input string is returned unchanged.

Note: If the string literals in a command syntax file consist only of plain roman characters (7-bit ASCII), the u() function is not needed.
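The behavior described above can be sketched in plain Python 3 as follows. This is not the spssaux implementation: u_sketch is a hypothetical stand-in, the unicode_mode argument replaces the real check of the IBM SPSS Statistics mode, and the literal is assumed to be stored as UTF-8 bytes.

```python
# Hypothetical stand-in for spssaux.u, for illustration only.
# unicode_mode is an assumed flag; the real function queries SPSS itself.

def u_sketch(value, unicode_mode):
    """Convert a byte string to Unicode only when unicode_mode is True."""
    if unicode_mode and isinstance(value, bytes):
        return value.decode("utf-8")   # assumes the bytes are UTF-8
    return value                       # otherwise return the input unchanged

literal = b"\xc3\xa2bc"                                # non-ASCII literal as UTF-8 bytes
converted = u_sketch(literal, unicode_mode=True)       # Unicode string, 3 characters
unchanged = u_sketch(literal, unicode_mode=False)      # same object as literal

# A 7-bit ASCII literal has the same characters before and after
# conversion, which is why such literals need no special handling.
ascii_converted = u_sketch(b"abc", unicode_mode=True)
```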

The following example demonstrates some of this behavior and the usage of the u() function.

set unicode on locale=english.
BEGIN PROGRAM.
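import spss, spssaux
from spssaux import u
# Non-ASCII literal; in Unicode mode it must be converted before use.
literal = "âbc"
# The label prints, then the raw literal fails, so the except branch runs.
try:
    print "literal without conversion:", literal
except Exception:
    print "can't print literal"
# u() converts the literal first, so this print succeeds.
try:
    print "literal converted to utf-16:", u(literal)
except Exception:
    print "can't print literal"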
END PROGRAM.

Following are the results:

literal without conversion: can't print literal 
literal converted to utf-16: âbc

Truncating Unicode Strings

When working in Unicode mode, use the truncatestring function from the spssaux module (installed with IBM SPSS Statistics - Essentials for Python) to correctly truncate a string to a specified maximum length in bytes. This is especially useful for truncating strings to be used as IBM SPSS Statistics variable names, which have a maximum allowed length of 64 bytes.

The truncatestring function takes two arguments: the string to truncate and the maximum number of bytes, which is optional and defaults to 64. For example:

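import spss, spssaux
# Placeholder value; replace it with the string you need to shorten.
string = "a_long_candidate_variable_name"
newstring = spssaux.truncatestring(string,8)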
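To show why truncation by bytes differs from truncation by characters, here is a minimal sketch in plain Python 3. It is not the spssaux code: truncate_utf8 is a hypothetical helper that assumes the byte limit applies to the UTF-8 encoding and that the cut must fall on a character boundary.

```python
# Hypothetical helper for illustration; not the spssaux implementation.

def truncate_utf8(text, max_bytes=64):
    """Keep whole characters while their total UTF-8 size fits in max_bytes."""
    kept = []
    used = 0
    for ch in text:
        size = len(ch.encode("utf-8"))   # UTF-8 byte size of this character
        if used + size > max_bytes:
            break                        # adding it would exceed the limit
        kept.append(ch)
        used += size
    return "".join(kept)

# Five 2-byte characters followed by two 1-byte characters (12 bytes in total).
name = "\u00e2" * 5 + "bc"
short = truncate_utf8(name, 8)   # four 2-byte characters fit in 8 bytes
odd = truncate_utf8(name, 7)     # only three fit; a character is never split
```

Slicing the same string with name[:8] would keep all seven characters, which occupy 12 bytes in UTF-8 and would exceed an 8-byte limit.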