ENCODING Subcommand (SAVE TRANSLATE command)

The ENCODING subcommand specifies the character encoding for SAS, Stata, tab-delimited text, and CSV data files.

  • The ENCODING subcommand is valid only with TYPE=SAS, TYPE=STATA, TYPE=TAB, and TYPE=CSV.
  • The subcommand name is preceded by a slash and followed by an optional equals sign and a quoted value.
  • For SAS and Stata files, the quoted value can be LOCALE, SYSTEM, or one of the values in the Encoding column of the Character Encoding table (Table 1). For SAS 9 files, the value can also be UTF8.
  • For tab-delimited text files and CSV files, the quoted value can be LOCALE, UTF8, UTF16, UTF16BE, UTF16LE, a numeric Windows code page value (for example, '1252'), or an IANA code page value (for example, 'iso8859-1' or 'cp1252').
  • For SAS 9, tab-delimited text, and CSV files, the default is UTF8 in Unicode mode and LOCALE in code page mode. For Stata and earlier releases of SAS, the default is always LOCALE.
  • The ENCODING setting also applies to the value labels file specified on the optional VALFILE subcommand for TYPE=SAS.

Example

SAVE TRANSLATE
  /OUTFILE='/data/sasdata.sas7bdat'
  /VALFILE='/data/saslabels.sas'
  /TYPE=SAS /VERSION=7 /PLATFORM=WINDOWS
  /ENCODING='Windows-1252'.
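
As a further illustration, the following minimal sketch applies the subcommand to a CSV file with one of the values listed above for text files. The file path is hypothetical, and the FIELDNAMES and REPLACE subcommands are included only to make the command complete; ENCODING does not require them.

* Hypothetical output path. UTF8 is already the default in Unicode mode.
SAVE TRANSLATE
  /OUTFILE='/data/survey.csv'
  /TYPE=CSV
  /ENCODING='UTF8'
  /FIELDNAMES
  /REPLACE.

Stating the encoding explicitly gives the same result whether the session runs in Unicode mode or code page mode, because the default differs between the two.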

BOM Keyword

By default, files encoded in any of the UTF formats include a byte order mark (BOM). Some applications cannot interpret the byte order mark. You can use the BOM keyword to suppress the byte order mark.

BOM=YES
Include the byte order mark in UTF files. This option is the default.
BOM=NO
Do not include the byte order mark in UTF files.

Character encoding values for SAS and Stata

Table 1. Character Encoding
Character Set                   Encoding
IBM® SPSS® Statistics Locale    Locale
Operating System Locale         System
Western                         ISO-8859-1
Western                         ISO-8859-15
Western                         IBM850
Western                         Windows-1252
Celtic                          ISO-8859-14
Greek                           ISO-8859-7
Greek                           Windows-1253
Nordic                          ISO-8859-10
Baltic                          Windows-1257
Central European                IBM852
Central European                ISO-8859-2
Cyrillic                        IBM855
Cyrillic                        ISO-8859-5
Cyrillic                        Windows-1251
Cyrillic/Russian                CP-866
Chinese Simplified              GBK
Chinese Simplified              ISO-2022-CN
Chinese Traditional             Big5
Chinese Traditional             EUC-TW
Japanese                        EUC-JP
Japanese                        ISO-2022-JP
Japanese                        Shift-JIS
Korean                          EUC-KR
Thai                            Windows-874
Turkish                         IBM857
Turkish                         ISO-8859-9
Arabic                          Windows-1256
Arabic                          IBM864
Hebrew                          ISO-8859-8
Hebrew                          Windows-1255
Hebrew                          IBM862