Unicode Mode (Python)
When IBM® SPSS® Statistics is in Unicode mode (controlled by the UNICODE
subcommand of the SET command), the following conversions are
automatically done when passing and receiving strings through the
functions available with the spss module:
- Strings received by Python from IBM SPSS Statistics are converted from UTF-8 to Python Unicode, which is UTF-16.
- Strings passed from Python to IBM SPSS Statistics are converted from UTF-16 to UTF-8.
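The spss module performs these conversions internally. As a rough illustration only, the two directions can be reproduced with Python's built-in codecs. The variable names below are made up for this sketch and are not part of the spss module.

```python
# Illustration only: mimics the two conversions with built-in codecs.
# None of these names come from the spss module.

# An a-circumflex followed by "bc", as UTF-8 bytes (4 bytes in total).
utf8_from_statistics = b"\xc3\xa2bc"

# Receiving direction: UTF-8 bytes become a Python Unicode string.
received = utf8_from_statistics.decode("utf-8")

# Passing direction: the Unicode string is encoded back to UTF-8.
passed_back = received.encode("utf-8")

print(len(utf8_from_statistics))  # length in bytes
print(len(received))              # length in characters
```

The byte length and the character length differ as soon as a string contains non-ASCII characters, which is why byte-based limits need care in Unicode mode.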
Note: Changing the locale and/or the unicode setting during an OMS
request may result in incorrectly transcoded text.
Command Syntax Files
Special care must be taken when working in Unicode mode with command
syntax files. Specifically, Python string literals used in command
syntax files need to be explicitly expressed as UTF-16 strings. This
is best done by using the u() function from the spssaux module
(installed with IBM SPSS Statistics - Essentials for Python).
The function has the following behavior:
- If IBM SPSS Statistics is in Unicode mode, the input string is converted to UTF-16.
- If IBM SPSS Statistics is not in Unicode mode, the input string is returned unchanged.
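The two cases above can be sketched in plain Python. This is a hypothetical stand-in, not the spssaux implementation. The name to_unicode_if_needed and the explicit unicode_mode flag are assumptions made so the sketch runs without IBM SPSS Statistics (the real u() function takes only the string). The sketch also assumes the literal's bytes are UTF-8 encoded.

```python
def to_unicode_if_needed(literal, unicode_mode):
    """Hypothetical stand-in for the behavior described above.

    This is not spssaux.u; the mode is passed in explicitly here
    instead of being read from IBM SPSS Statistics.
    """
    if unicode_mode and isinstance(literal, bytes):
        # Assumption: the byte string is UTF-8 encoded.
        return literal.decode("utf-8")
    # Not in Unicode mode (or already Unicode): return unchanged.
    return literal


raw = b"\xc3\xa2bc"  # a-circumflex + "bc" as UTF-8 bytes
converted = to_unicode_if_needed(raw, True)
unchanged = to_unicode_if_needed(raw, False)
```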
Note: If the string literals in a command syntax file consist only
of plain Roman characters (7-bit ASCII), the u() function is not needed.
The following example demonstrates some of this behavior and the
usage of the u() function.
set unicode on locale=english.
BEGIN PROGRAM.
import spss, spssaux
from spssaux import u
literal = "âbc"
try:
    print "literal without conversion:", literal
except:
    print "can't print literal"
try:
    print "literal converted to utf-16:", u(literal)
except:
    print "can't print literal"
END PROGRAM.
Following are the results:
literal without conversion: can't print literal
literal converted to utf-16: âbc
Truncating Unicode Strings
When working in Unicode mode, use the truncatestring function
from the spssaux module (installed with IBM SPSS Statistics - Essentials for Python)
to correctly truncate a string to a specified maximum length in bytes.
This is especially useful for truncating strings to be used as IBM SPSS Statistics variable
names, which have a maximum allowed length of 64 bytes.
The truncatestring function takes two arguments: the string to
truncate and the maximum number of bytes, which is optional
and defaults to 64. For example:
import spss, spssaux
# string holds the text to truncate; 8 is the maximum length in bytes
newstring = spssaux.truncatestring(string, 8)
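To show why a byte limit needs special handling, here is a minimal sketch of byte-aware truncation in plain Python. It is not the spssaux implementation. The function name truncate_utf8 and its parameter names are hypothetical, and it assumes the byte limit applies to the UTF-8 encoding of the string. The point is that the cut must not fall in the middle of a multibyte character.

```python
def truncate_utf8(text, max_bytes=64):
    """Hypothetical helper: truncate text to at most max_bytes UTF-8 bytes.

    Not spssaux.truncatestring; a sketch of the same idea.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Cutting the byte string may split a multibyte character.
    # Decoding with "ignore" drops that incomplete trailing sequence.
    return encoded[:max_bytes].decode("utf-8", "ignore")


name = u"ab\xe2c"  # "ab", a-circumflex (2 bytes in UTF-8), "c"
short3 = truncate_utf8(name, 3)  # the cut would split the 2-byte character
short4 = truncate_utf8(name, 4)  # the 2-byte character fits completely
```

A plain slice such as name[:3] counts characters rather than bytes, so it can return a string whose UTF-8 form is longer than the limit.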