ENCODING Subcommand (SAVE TRANSLATE command)
The ENCODING subcommand specifies the character encoding for SAS, Stata, tab-delimited text, and CSV data files.
- The ENCODING subcommand is only valid with TYPE=SAS, TYPE=STATA, TYPE=TAB, and TYPE=CSV.
- The subcommand name is preceded by a slash and followed by an optional equals sign and a quoted value.
- For SAS and Stata files, the quoted value can be LOCALE, SYSTEM, or one of the values in the Encoding column of the Character encoding values for SAS and Stata table. For SAS 9 files, the value can also be UTF8.
- For tab-delimited text files and CSV files, the quoted value can be LOCALE, UTF8, UTF16, UTF16BE, UTF16LE, a numeric Windows code page value (for example, '1252'), or an IANA code page value (for example, 'iso8859-1' or 'cp1252').
- For SAS 9, tab-delimited text, and CSV files, the default is UTF8 in Unicode mode and LOCALE in code page mode. For Stata files and for SAS releases earlier than SAS 9, the default is always LOCALE.
- The ENCODING setting also applies to the value labels file specified on the optional VALFILE subcommand for TYPE=SAS.
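The default rules above depend on three things: the file type, the SAS release, and whether the session runs in Unicode or code page mode. The following Python sketch encodes those rules as a decision function. `default_encoding` and its parameters are hypothetical names used only for illustration; this is not an SPSS API, and the sketch requires the caller to pass the SAS release instead of guessing it.

```python
from typing import Optional

# File types for which the ENCODING subcommand is valid.
VALID_TYPES = {"SAS", "STATA", "TAB", "CSV"}


def default_encoding(file_type: str, sas_version: Optional[int] = None,
                     unicode_mode: bool = True) -> str:
    """Hypothetical helper: the ENCODING value used when none is specified."""
    ftype = file_type.upper()
    if ftype not in VALID_TYPES:
        # ENCODING is only valid with TYPE=SAS, STATA, TAB, or CSV.
        raise ValueError(f"ENCODING is not valid with TYPE={ftype}")
    if ftype == "SAS" and sas_version is None:
        # This sketch does not assume a SAS release; the caller must supply it.
        raise ValueError("sas_version is required for TYPE=SAS")
    # SAS 9, tab-delimited text, and CSV follow the Unicode/code page mode.
    if ftype in {"TAB", "CSV"} or (ftype == "SAS" and sas_version >= 9):
        return "UTF8" if unicode_mode else "LOCALE"
    # Stata and SAS releases before SAS 9 always default to LOCALE.
    return "LOCALE"


if __name__ == "__main__":
    print(default_encoding("CSV"))                      # UTF8
    print(default_encoding("CSV", unicode_mode=False))  # LOCALE
    print(default_encoding("SAS", sas_version=7))       # LOCALE
    print(default_encoding("STATA"))                    # LOCALE
```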
Example
SAVE TRANSLATE
/OUTFILE='/data/sasdata.sas7bdat'
/VALFILE='/data/saslabels.sas'
/TYPE=SAS /VERSION=7 /PLATFORM=WINDOWS
/ENCODING='Windows-1252'.
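The example above writes the SAS data and value labels files in Windows-1252. To show why the choice matters, the following Python sketch encodes the same string with several of the encodings named in this section and prints the resulting bytes. The mapping from SPSS encoding values to Python codec names (for example, 'Windows-1252' to `cp1252`) is an assumption made for this illustration; SPSS itself is not involved.

```python
# Python codec names assumed to correspond to some ENCODING values above.
CODECS = {
    "Windows-1252": "cp1252",
    "ISO-8859-1": "iso8859-1",
    "UTF8": "utf-8",
    "UTF16LE": "utf-16-le",
    "UTF16BE": "utf-16-be",
}

SAMPLE = "café €"


def encode_all(text: str) -> dict:
    """Encode text with each codec; None marks a character the codec lacks."""
    results = {}
    for label, codec in CODECS.items():
        try:
            results[label] = text.encode(codec)
        except UnicodeEncodeError:
            results[label] = None  # e.g. ISO-8859-1 has no euro sign
    return results


if __name__ == "__main__":
    for label, data in encode_all(SAMPLE).items():
        shown = data.hex(" ") if data is not None else "cannot encode"
        print(f"{label:13} {shown}")
    # Reading UTF-8 bytes as Windows-1252 garbles the non-ASCII characters.
    print(SAMPLE.encode("utf-8").decode("cp1252"))
```

The same characters produce different byte sequences under each encoding, and decoding with the wrong one silently garbles them. This is why the encoding used to write a file has to match the one the receiving application expects.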
BOM Keyword
By default, files encoded in any of the UTF formats include a byte order mark (BOM). Some applications cannot interpret the byte order mark. You can use the BOM keyword to suppress the byte order mark.
- BOM=YES: Include the byte order mark in UTF files. This is the default.
- BOM=NO: Do not include the byte order mark in UTF files.
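The byte order mark is a short byte sequence at the start of a file: EF BB BF for UTF-8, FF FE for little-endian UTF-16, and FE FF for big-endian UTF-16. The following Python sketch shows the bytes that BOM=YES and BOM=NO would be expected to produce. It also shows why a reader that does not recognize the mark can misread the first field name. `encode_csv_text` is a hypothetical helper, and the Python codec names stand in for the UTF8, UTF16LE, and UTF16BE values. This is not how SPSS writes files internally.

```python
import codecs

# Byte order marks for the UTF encodings, keyed by Python codec name.
BOMS = {
    "utf-8": codecs.BOM_UTF8,          # EF BB BF
    "utf-16-le": codecs.BOM_UTF16_LE,  # FF FE
    "utf-16-be": codecs.BOM_UTF16_BE,  # FE FF
}


def encode_csv_text(text: str, codec: str, bom: bool = True) -> bytes:
    """Hypothetical helper: encode text, prefixing a BOM when bom is True."""
    if codec not in BOMS:
        raise ValueError(f"no BOM defined for {codec!r} in this sketch")
    body = text.encode(codec)
    return BOMS[codec] + body if bom else body


if __name__ == "__main__":
    text = "id,name\n1,José\n"
    with_bom = encode_csv_text(text, "utf-8", bom=True)      # like BOM=YES
    without_bom = encode_csv_text(text, "utf-8", bom=False)  # like BOM=NO
    print(with_bom[:6].hex(" "), "|", without_bom[:3].hex(" "))
    # A plain UTF-8 decoder keeps the BOM as U+FEFF, so the first column
    # name becomes "\ufeffid" instead of "id".
    print(repr(with_bom.decode("utf-8").split(",")[0]))
    # A BOM-aware decoder ("utf-8-sig") strips the mark.
    print(repr(with_bom.decode("utf-8-sig").split(",")[0]))
```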
Character encoding values for SAS and Stata
| Character Set | Encoding |
|---|---|
| IBM® SPSS® Statistics Locale | Locale |
| Operating System Locale | System |
| Western | ISO-8859-1 |
| Western | ISO-8859-15 |
| Western | IBM850 |
| Western | Windows-1252 |
| Celtic | ISO-8859-14 |
| Greek | ISO-8859-7 |
| Greek | Windows-1253 |
| Nordic | ISO-8859-10 |
| Baltic | Windows-1257 |
| Central European | IBM852 |
| Central European | ISO-8859-2 |
| Cyrillic | IBM855 |
| Cyrillic | ISO-8859-5 |
| Cyrillic | Windows-1251 |
| Cyrillic/Russian | CP-866 |
| Chinese Simplified | GBK |
| Chinese Simplified | ISO-2022-CN |
| Chinese Traditional | Big5 |
| Chinese Traditional | EUC-TW |
| Japanese | EUC-JP |
| Japanese | ISO-2022-JP |
| Japanese | Shift-JIS |
| Korean | EUC-KR |
| Thai | Windows-874 |
| Turkish | IBM857 |
| Turkish | ISO-8859-9 |
| Arabic | Windows-1256 |
| Arabic | IBM864 |
| Hebrew | ISO-8859-8 |
| Hebrew | Windows-1255 |
| Hebrew | IBM862 |