Handling Unicode data on z/TPF

  

The XML, JSON, and DFDL parsers have been updated by PJ43545 and PJ43302 to support Unicode data.  Previously, an XML document had to be translated to either the EBCDIC 500 or the EBCDIC 1047 code page before the tpf_doc* APIs could be used, even though many Unicode characters have no mapping in these code pages.  For JSON, Unicode characters could be read from a JSON document, but there was no way to add Unicode data when creating one.  For DFDL, only EBCDIC 1047 strings were supported.

 

With PJ43545, the tpf_doc* APIs support XML documents that are encoded in UTF-EBCDIC.  UTF-EBCDIC provides a mapping for every Unicode character and is compatible with EBCDIC, much as UTF-8 is compatible with ASCII.  Documents can be translated to UTF-EBCDIC by using the tpf_convert_unicode_to_ebcdic API.  Unicode data can also be translated to UTF-EBCDIC so that it can be added to an XML document.
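
The UTF-8 side of that analogy can be illustrated with a minimal sketch in C.  It does not use any z/TPF API and it does not implement UTF-EBCDIC, which uses a different byte layout.  It only shows the property being compared: code points below U+0080 encode as the identical single ASCII byte, and every other code point becomes a multi-byte sequence.  The function name utf8_encode is hypothetical and chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 (illustration only; this is
 * not a z/TPF API and not UTF-EBCDIC).
 * Returns the number of bytes written (1 to 4), or 0 when cp is a
 * surrogate or lies above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                       /* ASCII range: byte is unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2-byte sequence */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {    /* surrogates are not characters */
        return 0;
    }
    if (cp < 0x10000) {                    /* 3-byte sequence */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4-byte sequence */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

For example, U+0041 ("A") produces the single byte 0x41, while U+20AC (the euro sign) produces the three bytes 0xE2 0x82 0xAC.  UTF-EBCDIC is designed to give EBCDIC the same kind of pass-through for its own single-byte characters.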

 

With PJ43302, the tpf_doc* APIs support the TYPE_RAW_TEXT data type for JSON.  To add Unicode data to a JSON document, specify TYPE_RAW_TEXT for the XML_DATA_TYPE parameter on the extended tpf_doc* APIs.  Because JSON documents are Unicode encoded, this allows Unicode data to be added without translation from EBCDIC.
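
The reason that no translation is needed can be sketched in C.  This is not how the tpf_doc* APIs or TYPE_RAW_TEXT are implemented, and it calls no z/TPF API.  It is a hypothetical helper, json_quote_utf8, that assumes an ASCII-based compiler (for example, on a workstation) and UTF-8 input.  It shows that UTF-8 bytes can be placed in a JSON string as they are; only the quotation mark, the backslash, and control characters need escaping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write src as a JSON string literal (with surrounding quotes) into dst.
 * Bytes >= 0x80, that is, UTF-8 multi-byte sequences, are copied unchanged.
 * Returns the length written, excluding the terminating NUL, or 0 when
 * dst is too small.  Hypothetical helper, not a z/TPF API. */
static size_t json_quote_utf8(const char *src, char *dst, size_t cap)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    if (cap < 3) {                         /* need room for "" and NUL */
        return 0;
    }
    dst[n++] = '"';

    for (const unsigned char *p = (const unsigned char *)src; *p != 0; ++p) {
        char piece[7];
        size_t len;

        if (*p == '"' || *p == '\\') {     /* must be escaped in JSON */
            piece[0] = '\\';
            piece[1] = (char)*p;
            len = 2;
        } else if (*p < 0x20) {            /* control characters: \u00XX */
            len = (size_t)snprintf(piece, sizeof piece, "\\u%04x", (unsigned)*p);
        } else {                           /* ASCII and UTF-8 bytes pass through */
            piece[0] = (char)*p;
            len = 1;
        }

        if (n + len + 2 > cap) {           /* keep room for closing quote and NUL */
            return 0;
        }
        memcpy(dst + n, piece, len);
        n += len;
    }

    dst[n++] = '"';
    dst[n] = '\0';
    return n;
}
```

Given the UTF-8 bytes for "café", the helper emits the same bytes between quotation marks.  The two bytes that encode "é" are not altered, which is the behavior that raw text relies on.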

 

Also with PJ43302, the DFDL "encoding" attribute is supported on z/TPF for the "x-ibm-1047-s390", "ibm500", and "utf-8" encodings.  The tpf_dfdl_parseData API can be used to incorporate UTF-8 string data when creating the XML or JSON infonode structures that are used with the tpf_doc* APIs.