Using non-ASCII characters

Using non-ASCII characters in Python requires explicitly encoding and decoding strings to and from Unicode. In IBM® SPSS® Modeler, Python scripts are assumed to be encoded in UTF-8, a standard Unicode encoding that supports non-ASCII characters. The following script compiles because SPSS Modeler sets the Python compiler to UTF-8.

Scripting example showing Japanese characters. The node that is created has an incorrect label.

However, the resulting node will have an incorrect label.

Figure 1. Node label containing non-ASCII characters, displayed incorrectly

The label is incorrect because Python has converted the string literal itself to an ASCII string.
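As a rough illustration of how a label can become garbled, the following sketch runs in any standard Python interpreter and does not use SPSS Modeler. It takes the UTF-8 bytes of three Japanese characters (an illustrative label, not the one from the original script) and decodes them in two ways. Using Latin-1 as the single-byte interpretation is an assumption made for the demonstration; the exact conversion that happens inside SPSS Modeler may differ.

```python
# UTF-8 bytes for three Japanese characters (illustrative label only).
raw = b"\xe3\x83\x86\xe3\x82\xb9\xe3\x83\x88"

# Decoding as UTF-8 recovers the three intended characters.
text = raw.decode("utf-8")

# Decoding one byte per character (Latin-1 is an assumed stand-in for a
# non-Unicode interpretation) yields nine unrelated characters instead.
garbled = raw.decode("latin-1")

# Nine bytes, three real characters, nine garbled characters.
print(len(raw), len(text), len(garbled))
```

The same nine bytes give either three readable characters or nine meaningless ones, depending only on whether they are treated as Unicode text.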

Python lets you specify a Unicode string literal by prefixing the string literal with the character u:

Scripting example showing Japanese characters. The node that is created has the correct label.

This creates a Unicode string, and the label appears correctly.

Figure 2. Node label containing non-ASCII characters, displayed correctly
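The following is a minimal sketch of passing a u-prefixed label to a node-creation call. It is not the script from the original example. The helper create_labeled_node and the class FakeStream are hypothetical names introduced here so that the sketch runs outside SPSS Modeler. The createAt(node_type, label, x, y) call, the "variablefile" node type, and the coordinates are assumptions about the Modeler scripting API and should be checked against the product documentation.

```python
# -*- coding: utf-8 -*-

def create_labeled_node(stream, label):
    """Create a node with the given label on the supplied stream.

    Assumes the stream offers createAt(node_type, label, x, y); the node
    type and coordinates are illustrative values only.
    """
    return stream.createAt("variablefile", label, 96, 64)


class FakeStream(object):
    """Hypothetical stand-in for a Modeler stream, used only for this sketch."""

    def createAt(self, node_type, label, x, y):
        # Record the arguments instead of creating a real node.
        return {"type": node_type, "label": label, "x": x, "y": y}


# The u prefix makes this a Unicode string literal of three characters.
label = u"テスト"
node = create_labeled_node(FakeStream(), label)
print(len(node["label"]))
```

Inside SPSS Modeler, a real stream object would take the place of FakeStream, and the Unicode label would be passed to it unchanged.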

Using Python and Unicode is a large topic that is beyond the scope of this document. Many books and online resources cover it in detail.