ENCODING Subcommand (SAVE TRANSLATE command)
The ENCODING subcommand specifies the character encoding for SAS, Stata, tab-delimited text, and CSV data files.
- The ENCODING subcommand is valid only with TYPE=SAS, TYPE=STATA, TYPE=TAB, and TYPE=CSV.
- The subcommand name is preceded by a slash and followed by an optional equals sign and a quoted value.
- For SAS and Stata files, the quoted value can be LOCALE or SYSTEM or one of the values in the Encoding column of the table "Character encoding values for SAS and Stata". For SAS 9 files, the value can also be UTF8.
- For tab-delimited text files and CSV files, the quoted value can be LOCALE, UTF8, UTF16, UTF16BE, UTF16LE, a numeric Windows code page value (for example, '1252'), or an IANA code page value (for example, 'iso8859-1' or 'cp1252').
- For SAS 9, tab-delimited text, and CSV files, the default is UTF8 in Unicode mode and LOCALE in code page mode. For Stata and earlier releases of SAS, the default is always LOCALE.
- The ENCODING setting also applies to the value labels file specified on the optional VALFILE subcommand for TYPE=SAS.
Example
SAVE TRANSLATE
/OUTFILE='/data/sasdata.sas7bdat'
/VALFILE='/data/saslabels.sas'
/TYPE=SAS /VERSION=7 /PLATFORM=WINDOWS
/ENCODING='Windows-1252'.
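
The default rules listed above can be hard to keep straight when a script generates SAVE TRANSLATE syntax for several file types. The following is a minimal Python sketch of those rules only. It is not part of IBM SPSS Statistics, and the function name `default_encoding` and its parameters are hypothetical names chosen for illustration.

```python
# Hypothetical helper: mirrors the documented ENCODING defaults.
# Nothing here calls SPSS Statistics; it only encodes the rules as logic.

VALID_TYPES = {"SAS", "STATA", "TAB", "CSV"}


def default_encoding(file_type, unicode_mode, sas_version=None):
    """Return the default ENCODING value for a given TYPE.

    file_type    -- 'SAS', 'STATA', 'TAB', or 'CSV'
    unicode_mode -- True for Unicode mode, False for code page mode
    sas_version  -- SAS release as an int (assumed); used only for SAS files
    """
    file_type = file_type.upper()
    if file_type not in VALID_TYPES:
        raise ValueError(f"ENCODING is not valid with TYPE={file_type}")

    # SAS 9, tab-delimited text, and CSV follow the session mode.
    follows_mode = file_type in {"TAB", "CSV"} or (
        file_type == "SAS" and sas_version == 9
    )
    if follows_mode and unicode_mode:
        return "UTF8"

    # Stata, earlier SAS releases, and code page mode all default to LOCALE.
    return "LOCALE"
```

For example, `default_encoding("CSV", unicode_mode=True)` yields `"UTF8"`, while `default_encoding("STATA", unicode_mode=True)` yields `"LOCALE"`.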
BOM Keyword
By default, files encoded in any of the UTF formats include a byte order mark (BOM). Some applications cannot interpret the byte order mark. You can use the BOM keyword to suppress the byte order mark.
- BOM=YES
- Include the byte order mark in UTF files. This option is the default.
- BOM=NO
- Do not include the byte order mark in UTF files.
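
To check whether an exported text or CSV file actually starts with a byte order mark, the leading bytes can be inspected outside SPSS Statistics. The following is a minimal Python sketch using only the standard `codecs` module. The function names `detect_bom` and `strip_bom` are hypothetical, and the sketch covers only the UTF-8 and UTF-16 marks that correspond to the encodings named in this section.

```python
import codecs

# Byte order marks for the UTF formats mentioned in this section.
BOMS = (
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
)


def detect_bom(data):
    """Return the UTF format whose BOM starts the byte string, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def strip_bom(data):
    """Return the byte string without a leading BOM, if one is present."""
    for bom, _name in BOMS:
        if data.startswith(bom):
            return data[len(bom):]
    return data


# Illustration with in-memory data rather than a real exported file.
with_bom = "id,name\n".encode("utf-8-sig")   # UTF-8 with a BOM
without_bom = "id,name\n".encode("utf-8")    # UTF-8 without a BOM
```

Here `detect_bom(with_bom)` reports a UTF-8 mark and `detect_bom(without_bom)` returns `None`. For a real file, the same check can be applied to the first few bytes read in binary mode.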
Character encoding values for SAS and Stata
Character Set | Encoding |
---|---|
IBM® SPSS® Statistics Locale | Locale |
Operating System Locale | System |
Western | ISO-8859-1 |
Western | ISO-8859-15 |
Western | IBM850 |
Western | Windows-1252 |
Celtic | ISO-8859-14 |
Greek | ISO-8859-7 |
Greek | Windows-1253 |
Nordic | ISO-8859-10 |
Baltic | Windows-1257 |
Central European | IBM852 |
Central European | ISO-8859-2 |
Cyrillic | IBM855 |
Cyrillic | ISO-8859-5 |
Cyrillic | Windows-1251 |
Cyrillic/Russian | CP-866 |
Chinese Simplified | GBK |
Chinese Simplified | ISO-2022-CN |
Chinese Traditional | Big5 |
Chinese Traditional | EUC-TW |
Japanese | EUC-JP |
Japanese | ISO-2022-JP |
Japanese | Shift-JIS |
Korean | EUC-KR |
Thai | Windows-874 |
Turkish | IBM857 |
Turkish | ISO-8859-9 |
Arabic | Windows-1256 |
Arabic | IBM864 |
Hebrew | ISO-8859-8 |
Hebrew | Windows-1255 |
Hebrew | IBM862 |