Byte Order Mark
Unicode in the 16-bit UTF-16 form has no prescribed byte order (endianness) for interchange, so communicating processes must determine the byte order of the data correctly. To aid in this, the character U+FEFF ZERO WIDTH NO-BREAK SPACE can be used as a Byte Order Mark (BOM). When it is read with the wrong byte order, it evaluates to U+FFFE, which Unicode defines as a noncharacter, so the mismatch is easy to detect.
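The byte swap described above can be reproduced in a few lines. This is only an illustrative Python sketch, not part of any Netezza tool; the variable names are made up for the example.

```python
import struct

BOM = 0xFEFF  # U+FEFF, used as the Byte Order Mark

# Serialize U+FEFF as one 16-bit big-endian code unit: bytes FE FF.
big_endian_bytes = struct.pack(">H", BOM)

# Read the same two bytes back with each byte order.
read_correctly = struct.unpack(">H", big_endian_bytes)[0]
read_swapped = struct.unpack("<H", big_endian_bytes)[0]

print(f"bytes written:         {big_endian_bytes.hex()}")
print(f"read as big-endian:    U+{read_correctly:04X}")
print(f"read as little-endian: U+{read_swapped:04X}")
```

A reader that sees U+FFFE as the first code unit can therefore conclude that it assumed the wrong byte order and swap the bytes of every following code unit.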
Some applications, primarily on Windows systems, write a BOM to the start of a UTF-8 encoded file. In UTF-8, the BOM is the byte sequence EF BB BF. Because UTF-8 is a byte-oriented encoding, it has no endian issues, so the BOM carries no byte-order information there. Netezza Performance Server does not load the BOM code point; you can use the -bom switch of nzconvert to remove an initial BOM code point:
nzconvert -f utf8 -t utf8 -bom -df input_file -of output_file
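For readers who want to see what removing an initial UTF-8 BOM amounts to at the byte level, here is a minimal Python sketch. It is not how nzconvert is implemented; `strip_utf8_bom` and the sample data are hypothetical names chosen for the example.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A small UTF-8 payload that starts with a BOM.
sample = codecs.BOM_UTF8 + "id,name\n".encode("utf-8")
cleaned = strip_utf8_bom(sample)

print(sample[:3].hex())  # the three BOM bytes
print(cleaned)
```

Only a BOM at the very start of the data is removed; data without a BOM passes through unchanged.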
- UTF16: As input, Netezza Performance Server checks for a BOM to determine endianness; if no BOM is present, it interprets the input as big-endian. As output, it writes a BOM and uses the native endianness of the machine. When converting from UTF-16 to any other encoding, such as UTF-8, the BOM is removed.
- UTF16le: As input, Netezza Performance Server interprets all input as little-endian. As output, it writes little-endian data without a BOM. Any BOM in the input is treated as data and converted along with the rest of the input, for example to UTF-8.
- UTF16be: As input, Netezza Performance Server interprets all input as big-endian. As output, it writes big-endian data without a BOM. Any BOM in the input is treated as data and converted along with the rest of the input, for example to UTF-8.
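The input rules for the three encoding names can be sketched as follows. This is an illustrative Python model of the behavior described above, not Netezza code; the function `decode_utf16` is hypothetical, and Python's own codec names (`utf-16-le`, `utf-16-be`) are used only to do the actual decoding.

```python
import codecs


def decode_utf16(data: bytes, encoding: str = "UTF16") -> str:
    """Decode UTF-16 bytes following the input rules listed above."""
    name = encoding.upper()
    if name == "UTF16LE":
        # Always little-endian; a leading BOM stays in the text as U+FEFF.
        return data.decode("utf-16-le")
    if name == "UTF16BE":
        # Always big-endian; a leading BOM stays in the text as U+FEFF.
        return data.decode("utf-16-be")
    if name != "UTF16":
        raise ValueError(f"unsupported encoding name: {encoding}")
    # UTF16: a BOM selects the byte order and is removed.
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    # No BOM: assume big-endian.
    return data.decode("utf-16-be")


le_with_bom = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

print(repr(decode_utf16(le_with_bom)))             # BOM consumed
print(repr(decode_utf16(le_with_bom, "UTF16LE")))  # BOM kept as data
```

With UTF16le, the retained U+FEFF is carried into the target encoding, so converting the result to UTF-8 yields the bytes EF BB BF at the start of the output.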