Codesets and translation
All text that exists in the Python interpreter is represented as UTF-8. IBM® Open
Enterprise SDK for Python supports explicit conversion of text through both
the built-in codecs library and the provided EBCDIC package. For additional
information about the codecs module, see codecs in the official Python
documentation.
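As a minimal sketch of explicit conversion with the built-in codecs module, the following looks up a codec and round-trips a string through it. The code page cp037 is an assumption made for portability, because it ships with standard CPython; substitute cp1047 where that codec is available.

```python
import codecs

# Assumption: cp037 is used because standard CPython ships it.
# Substitute "cp1047" on installations that provide that codec.
EBCDIC = "cp037"

# codecs.lookup() raises LookupError if the codec name is unknown.
info = codecs.lookup(EBCDIC)

# Convert str -> EBCDIC bytes, then EBCDIC bytes -> str.
encoded = codecs.encode("Python", EBCDIC)
decoded = codecs.decode(encoded, EBCDIC)

print(info.name)
print(encoded)
print(decoded)
```

Checking the codec with codecs.lookup() first makes a missing code page fail early instead of in the middle of a conversion.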
- If a file or pipe is untagged, IBM Open Enterprise SDK for Python attempts to automatically determine the encoding and run the source file only in read mode. In binary mode, no attempt is made to determine the encoding.
- If a file or pipe is tagged, IBM Open Enterprise SDK for Python attempts to decode it by using the tagged encoding. In binary mode, no attempt is made to determine the encoding.
- If the encoding parameter is specified during the open operation, IBM Open Enterprise SDK for Python ignores the tagged encoding of the source and uses the specified encoding. For more details about tagging behavior, see Tagging behaviors.
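The difference between an explicit encoding and binary mode can be sketched with generic Python file I/O. This example does not exercise file tagging or automatic detection, which are specific to z/OS. The file name demo.txt is hypothetical, and cp037 is an assumed stand-in for cp1047.

```python
import os
import tempfile

# Assumption: cp037 stands in for cp1047 so the sketch runs on standard CPython.
EBCDIC = "cp037"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name

    # Text mode with an explicit encoding: str is converted to EBCDIC on write.
    with open(path, mode="w", encoding=EBCDIC) as f:
        f.write("hello world")

    # Text mode with the same encoding: bytes are decoded back to str on read.
    with open(path, mode="r", encoding=EBCDIC) as f:
        text = f.read()

    # Binary mode: no decoding takes place, so the raw EBCDIC bytes come back.
    with open(path, mode="rb") as f:
        raw = f.read()

print(text)
print(raw)
```

The text-mode read returns the original string, while the binary read returns bytes that differ from the ASCII spelling of the same text.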
Note that although the source file might be EBCDIC, all I/O continues to
be in UTF-8 unless explicit conversions are performed.
For more information about supported codesets, see Supported codesets. For
more information about tagging behaviors, see Tagging behaviors. For more
information about Automatic EBCDIC detection and decoding for bytes.decode(), see
Automatic EBCDIC detection and decoding for bytes.decode().
Examples
>>> # 'w+' truncates the file on open, so write first and rewind before reading
>>> f = open('./test', mode='w+', encoding='cp1047')
>>> f.write('hello world\n')
12
>>> f.seek(0)
0
>>> lines = f.readlines()
>>> for line in lines:
...     print(line, end='')
...
hello world
>>> f.close()

>>> # Convert the internal string into a bytes object that holds the EBCDIC character values
>>> s = "Hello World".encode("cp1047")
>>> print(s)
b'\xc8\x85\x93\x93\x96@\xe6\x96\x99\x93\x84'
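The reverse direction can be sketched in the same way: decode the EBCDIC bytes back into a string, then re-encode them as UTF-8. The code page cp037 is assumed here because standard CPython ships it; it maps the letters and the space in this string to the same byte values shown above.

```python
# Assumption: cp037 stands in for cp1047; both map these letters and the
# space character to the same byte values.
EBCDIC = "cp037"

# The EBCDIC bytes printed in the example above.
ebcdic_bytes = b"\xc8\x85\x93\x93\x96@\xe6\x96\x99\x93\x84"

# bytes -> str: explicit decoding from EBCDIC.
text = ebcdic_bytes.decode(EBCDIC)

# str -> bytes: explicit encoding to UTF-8 for comparison.
utf8_bytes = text.encode("utf-8")

print(text)
print(utf8_bytes)
```

The two bytes objects represent the same text but share no byte values except by coincidence, which is why the conversion has to be explicit.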