UNICODE Subcommand

SET UNICODE NO|YES controls the default behavior for determining the encoding for reading and writing data files and syntax files.

NO. Use the current locale setting to determine the encoding for reading and writing data and command syntax files. This is referred to as code page mode. The alias is OFF. For information on the current locale setting, see LOCALE Subcommand (SET command).

YES. Use Unicode encoding (UTF-8) for reading and writing data and command syntax files. This is referred to as Unicode mode. The alias is ON. This is the default.

  • You can change the UNICODE setting only when there are no open data sources.
  • The UNICODE setting persists across sessions and remains in effect until it is explicitly changed.

Unicode mode and Unicode files have several important implications:

  • Data and syntax files saved in Unicode encoding should not be used in releases prior to 16.0.
  • When code page data files are read in Unicode mode, the defined width of all string variables is tripled. You can use ALTER TYPE to automatically adjust the width of all string variables.
  • The GET command determines the file encoding for IBM® SPSS® Statistics data files from the file itself, regardless of the current mode setting (and defined string variable widths in code page files are tripled in Unicode mode). See the topic GET for more information.
  • For syntax files, the encoding is changed after execution of the block of commands that includes SET UNICODE. For example, if you are currently in code page mode, you must run SET UNICODE=YES separately from subsequent commands that contain Unicode characters not recognized by the local encoding in effect prior to switching to Unicode.
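
The tripling of string widths noted above is easier to picture with a small byte-count comparison. The following Python sketch is illustrative only and is not part of IBM SPSS Statistics. It assumes that string widths are counted in bytes and that tripling allows for characters that occupy one byte in a code page but up to three bytes in UTF-8; the source does not state this rationale. The function name byte_widths and the choice of Windows-1252 as the code page are hypothetical.

```python
# Illustration only: compare how many bytes the same text occupies in a
# single-byte code page (Windows-1252, chosen as an example) and in UTF-8.
# Assumption: string widths are measured in bytes, so a wider encoding
# needs a wider variable to hold the same characters.

def byte_widths(text, code_page="cp1252"):
    """Return (bytes in the code page, bytes in UTF-8) for the given text."""
    return len(text.encode(code_page)), len(text.encode("utf-8"))

# Escapes are used so the example does not depend on the file's own encoding.
samples = ["abc", "caf\u00e9", "\u20ac5"]  # plain ASCII, e-acute, euro sign
widths = {s: byte_widths(s) for s in samples}

for s, (cp_len, utf8_len) in widths.items():
    print(f"{ascii(s)}: {cp_len} bytes in cp1252, {utf8_len} bytes in UTF-8")
```

In this sketch, plain ASCII text takes the same number of bytes in both encodings, while the accented letter and the euro sign take two and three bytes respectively in UTF-8. None of the samples needs more than three times its code page width, which is consistent with tripling as an upper bound and with using ALTER TYPE afterward to trim widths that turn out to be larger than the data requires.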