Character Set Encoding in Syntax Files
The character set encoding of a syntax file can be either Unicode or code page encoding. A Unicode file can contain characters from many different character sets. Code page files are restricted to characters supported in a specific language or locale. For example, a code page file in a western European encoding cannot contain Japanese or Chinese characters.
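As a rough illustration of that restriction, the following Python sketch checks whether a piece of text can be represented in a given encoding. The helper name `can_encode` is hypothetical, and Python's `cp1252` codec is used here only as a stand-in for a western European code page; none of this is part of the product.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        # At least one character has no representation in this code page.
        return False


# Western European text fits in a western European code page.
print(can_encode("café", "cp1252"))
# Japanese characters do not, but they fit in Unicode UTF-8.
print(can_encode("日本語", "cp1252"))
print(can_encode("日本語", "utf-8"))
```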
Reading syntax files
To read syntax files correctly, the syntax editor needs to know the character encoding of the file.
- Files with a Unicode UTF-8 byte order mark are read as Unicode UTF-8, regardless of any encoding selection you make. The byte order mark is at the beginning of the file, but it is not displayed.
- By default, files without any encoding information are read as Unicode UTF-8 in Unicode mode or the current locale character encoding in code page mode. To override the default behavior, select Unicode (UTF-8) or Local Encoding.
- As Declared is enabled if the syntax file contains a code page encoding identifier at the top of the file. Starting with release 23, a comment is automatically inserted in syntax files that are saved in code page encoding. For example, the first line in the file could be:
  * Encoding: en_US.windows-1252.
  If you select As Declared, that encoding is used to read the file.
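The reading rules above can be sketched in Python as a small decision function. This is only an approximation of the described behavior, not the editor's actual implementation. The function names, the `selection` values (`"default"`, `"unicode"`, `"local"`, `"declared"`), and the rule that the codec name is the part of the identifier after the last period are all assumptions made for illustration.

```python
import codecs
import re
from typing import Optional

# Matches a first-line comment such as "* Encoding: en_US.windows-1252."
DECLARATION = re.compile(rb"^\*\s*Encoding:\s*([\w.\-]+?)\.?\s*$")


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the identifier from a first-line encoding comment, or None."""
    first_line = data.split(b"\n", 1)[0].rstrip(b"\r")
    match = DECLARATION.match(first_line)
    return match.group(1).decode("ascii") if match else None


def choose_encoding(data: bytes, unicode_mode: bool, locale_encoding: str,
                    selection: str = "default") -> str:
    """Pick the encoding used to read a syntax file (hypothetical sketch)."""
    # A UTF-8 byte order mark wins over any selection.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this Python codec also strips the mark on decode
    if selection == "unicode":
        return "utf-8"
    if selection == "local":
        return locale_encoding
    if selection == "declared":
        identifier = declared_encoding(data)
        if identifier is None:
            raise ValueError("file has no encoding declaration")
        # Assumption: the codec name follows the last period in the identifier.
        return identifier.rsplit(".", 1)[-1]
    # Default: UTF-8 in Unicode mode, the locale encoding in code page mode.
    return "utf-8" if unicode_mode else locale_encoding


sample = b"* Encoding: en_US.windows-1252.\nFREQUENCIES VARIABLES=age.\n"
print(choose_encoding(sample, unicode_mode=True, locale_encoding="cp1252",
                      selection="declared"))
```

Checking for the byte order mark first mirrors the rule that it takes precedence over any encoding selection.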
Saving syntax files
By default, syntax files are saved as Unicode UTF-8 in Unicode mode or the current locale character encoding in code page mode. To override the default behavior, select Unicode (UTF-8) or Local Encoding in the Save Syntax As dialog.
- If you save a new syntax file or save the file in a different encoding, a comment is inserted at the top of the file that identifies the encoding. If an encoding comment is already present, it is replaced.
- If you save a syntax file and then save it again without closing it, it is saved in the same encoding.
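The insert-or-replace behavior of the encoding comment can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's code: the function name `with_encoding_comment` is hypothetical, and the comment format is taken from the example `* Encoding: en_US.windows-1252.` shown earlier.

```python
import re

# An encoding comment on the very first line, including its line ending.
ENCODING_COMMENT = re.compile(r"^\*\s*Encoding:.*(?:\r?\n|$)")


def with_encoding_comment(syntax: str, identifier: str) -> str:
    """Return syntax text whose first line declares the given encoding."""
    comment = f"* Encoding: {identifier}.\n"
    if ENCODING_COMMENT.match(syntax):
        # An encoding comment is already present, so replace it.
        return ENCODING_COMMENT.sub(lambda _: comment, syntax, count=1)
    # Otherwise insert the comment at the top of the file.
    return comment + syntax


text = "FREQUENCIES VARIABLES=age.\n"
saved = with_encoding_comment(text, "en_US.windows-1252")
print(saved)
```

Calling the function again on `saved` with a different identifier replaces the first line rather than adding a second comment.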