ENCODING Subcommand (SAVE TRANSLATE command)
The ENCODING subcommand specifies the character
encoding for SAS, Stata, tab-delimited text, and CSV data files.
- The ENCODING subcommand is valid only with TYPE=SAS, TYPE=STATA, TYPE=TAB, and TYPE=CSV.
- The subcommand name is preceded by a slash and followed by an optional equals sign and a quoted value.
- For SAS and Stata files, the quoted value can be LOCALE or SYSTEM or one of the values in the Encoding column of the character encoding table for SAS and Stata. For SAS 9 files, the value can also be UTF8.
- For tab-delimited text files and CSV files, the quoted value can be LOCALE, UTF8, UTF16, UTF16BE, UTF16LE, a numeric Windows code page value (for example, '1252'), or an IANA code page value (for example, 'iso8859-1' or 'cp1252').
- For SAS 9, tab-delimited text, and CSV files, the default is UTF8 in Unicode mode and LOCALE in code page mode. For Stata and earlier releases of SAS, the default is always LOCALE.
- The ENCODING setting also applies to the value labels file specified on the optional VALFILE subcommand for TYPE=SAS.
Example
SAVE TRANSLATE
/OUTFILE='/data/sasdata.sas7bdat'
/VALFILE='/data/saslabels.sas'
/TYPE=SAS /VERSION=7 /PLATFORM=WINDOWS
/ENCODING='Windows-1252'.
BOM Keyword
By default, files encoded in
any of the UTF formats include a byte order mark (BOM). Some applications
cannot interpret the byte order mark. You can use the BOM keyword
to suppress the byte order mark.
- BOM=YES: Include the byte order mark in UTF files. This option is the default.
- BOM=NO: Do not include the byte order mark in UTF files.
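To see what the BOM keyword changes in practice, you can inspect the first bytes of an exported tab-delimited or CSV file outside of IBM SPSS Statistics. The following Python sketch is illustrative only and is not part of the product: the helper names detect_bom and strip_bom are hypothetical, and the labels simply reuse the encoding values named in this section. It checks only the UTF-8 and UTF-16 marks.

```python
import codecs

# Byte order marks for the UTF encodings named in this section.
# Assumption: UTF-32 is not checked, so a UTF-32 little-endian file
# would be reported here as UTF16LE.
_BOMS = [
    (codecs.BOM_UTF8, "UTF8"),
    (codecs.BOM_UTF16_BE, "UTF16BE"),
    (codecs.BOM_UTF16_LE, "UTF16LE"),
]


def detect_bom(data: bytes):
    """Return the label of the BOM that starts `data`, or None if there is none."""
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    return None


def strip_bom(data: bytes) -> bytes:
    """Return `data` without a leading UTF-8 or UTF-16 BOM, if one is present."""
    for bom, _label in _BOMS:
        if data.startswith(bom):
            return data[len(bom):]
    return data


# Simulated file contents: the same text encoded with and without a BOM.
with_bom = "id,name\n".encode("utf-8-sig")   # UTF-8 with a leading BOM
without_bom = "id,name\n".encode("utf-8")    # UTF-8 with no BOM

print(detect_bom(with_bom))
print(detect_bom(without_bom))
print(strip_bom(with_bom) == without_bom)
```

If a downstream application fails on a file that this check reports as starting with a byte order mark, exporting again with BOM=NO is the likely remedy.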
Character encoding values for SAS and Stata
| Character Set | Encoding |
|---|---|
| IBM® SPSS® Statistics Locale | Locale |
| Operating System Locale | System |
| Western | ISO-8859-1 |
| Western | ISO-8859-15 |
| Western | IBM850 |
| Western | Windows-1252 |
| Celtic | ISO-8859-14 |
| Greek | ISO-8859-7 |
| Greek | Windows-1253 |
| Nordic | ISO-8859-10 |
| Baltic | Windows-1257 |
| Central European | IBM852 |
| Central European | ISO-8859-2 |
| Cyrillic | IBM855 |
| Cyrillic | ISO-8859-5 |
| Cyrillic | Windows-1251 |
| Cyrillic/Russian | CP-866 |
| Chinese Simplified | GBK |
| Chinese Simplified | ISO-2022-CN |
| Chinese Traditional | Big5 |
| Chinese Traditional | EUC-TW |
| Japanese | EUC-JP |
| Japanese | ISO-2022-JP |
| Japanese | Shift-JIS |
| Korean | EUC-KR |
| Thai | Windows-874 |
| Turkish | IBM857 |
| Turkish | ISO-8859-9 |
| Arabic | Windows-1256 |
| Arabic | IBM864 |
| Hebrew | ISO-8859-8 |
| Hebrew | Windows-1255 |
| Hebrew | IBM862 |