ENCODING Subcommand (SAVE TRANSLATE command)

The ENCODING subcommand specifies the character encoding for SAS, Stata, tab-delimited text, and CSV data files.

The ENCODING subcommand is valid only with TYPE=SAS, TYPE=STATA, TYPE=TAB, and TYPE=CSV.
The subcommand name is preceded by a slash and followed by an optional equals sign and a quoted value.
For SAS and Stata files, the quoted value can be LOCALE, SYSTEM, or one of the values in the Encoding column of the Character Encoding table. For SAS 9 files, the value can also be UTF8.
For tab-delimited text files and CSV files, the quoted value can be LOCALE, UTF8, UTF16, UTF16BE, UTF16LE, a numeric Windows code page value (for example, '1252'), or an IANA code page value (for example, 'iso8859-1' or 'cp1252').
For SAS 9, tab-delimited text, and CSV files, the default is UTF8 in Unicode mode and LOCALE in code page mode. For Stata and earlier releases of SAS, the default is always LOCALE.
The ENCODING setting also applies to the value labels file specified on the optional VALFILE subcommand for TYPE=SAS.
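
The default rules above can be restated as a small lookup. The following Python sketch is only an illustration of the rules as described in this section; the function name default_encoding and its parameters are hypothetical and are not part of IBM SPSS Statistics.

```python
from typing import Optional


def default_encoding(file_type: str, unicode_mode: bool,
                     sas_version: Optional[object] = None) -> str:
    """Return the default ENCODING value described in the text above.

    Hypothetical helper: file_type mirrors the TYPE keyword, unicode_mode is
    True in Unicode mode and False in code page mode, and sas_version mirrors
    the VERSION value used with TYPE=SAS.
    """
    file_type = file_type.upper()
    if file_type not in {"SAS", "STATA", "TAB", "CSV"}:
        # ENCODING is valid only with these four file types.
        raise ValueError("ENCODING is not valid with TYPE=" + file_type)

    # SAS 9, tab-delimited text, and CSV files default to UTF8 in Unicode mode.
    utf8_by_default = file_type in {"TAB", "CSV"} or (
        file_type == "SAS" and str(sas_version) == "9"
    )
    if utf8_by_default and unicode_mode:
        return "UTF8"

    # Stata, earlier SAS releases, and everything in code page mode use LOCALE.
    return "LOCALE"
```

For example, default_encoding("SAS", True, 7) returns LOCALE because the file is not a SAS 9 file, while default_encoding("CSV", True) returns UTF8.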

Example

SAVE TRANSLATE
  /OUTFILE='/data/sasdata.sas7bdat'
  /VALFILE='/data/saslabels.sas'
  /TYPE=SAS /VERSION=7 /PLATFORM=WINDOWS
  /ENCODING='Windows-1252'.

BOM Keyword

By default, files encoded in any of the UTF formats include a byte order mark (BOM). Some applications cannot interpret the byte order mark. You can use the BOM keyword to suppress the byte order mark.

BOM=YES: Include the byte order mark in UTF files. This option is the default.

BOM=NO: Do not include the byte order mark in UTF files.
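
To show what the byte order mark looks like to an application that reads the file, the following Python sketch uses only the standard codecs module. It does not describe how IBM SPSS Statistics writes files; the sample text and the helper names has_utf8_bom and strip_utf_bom are assumptions made for illustration, and only the UTF-8 and UTF-16 marks are handled.

```python
import codecs

# Hypothetical CSV content standing in for an exported file.
text = "id,name\n1,Åsa\n"

# Python's "utf-8-sig" codec prepends the UTF-8 byte order mark;
# the plain "utf-8" codec does not.
with_bom = text.encode("utf-8-sig")
without_bom = text.encode("utf-8")


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 or UTF-16 byte order mark, if one is present."""
    for bom in (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        if data.startswith(bom):
            return data[len(bom):]
    return data
```

An application that treats the first bytes as data would see the three bytes EF BB BF at the start of the first field name, which is the kind of problem that suppressing the byte order mark avoids.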

Character encoding values for SAS and Stata

Table 1. Character Encoding
Character Set	Encoding
IBM® SPSS® Statistics Locale	Locale
Operating System Locale	System
Western	ISO-8859-1
Western	ISO-8859-15
Western	IBM850
Western	Windows-1252
Celtic	ISO-8859-14
Greek	ISO-8859-7
Greek	Windows-1253
Nordic	ISO-8859-10
Baltic	Windows-1257
Central European	IBM852
Central European	ISO-8859-2
Cyrillic	IBM855
Cyrillic	ISO-8859-5
Cyrillic	Windows-1251
Cyrillic/Russian	CP-866
Chinese Simplified	GBK
Chinese Simplified	ISO-2022-CN
Chinese Traditional	Big5
Chinese Traditional	EUC-TW
Japanese	EUC-JP
Japanese	ISO-2022-JP
Japanese	Shift-JIS
Korean	EUC-KR
Thai	Windows-874
Turkish	IBM857
Turkish	ISO-8859-9
Arabic	Windows-1256
Arabic	IBM864
Hebrew	ISO-8859-8
Hebrew	Windows-1255
Hebrew	IBM862