IBM Support

Some XML entities are interpreted as 2 characters by the Xerces parser.

Troubleshooting


Problem

Some XML entities are interpreted as 2 characters by the Xerces parser.

Symptom

XML input that contains certain character entities is rejected because the Xerces parser counts each of those characters as 2 characters.

Cause

The incorrect length returned for certain characters is a limitation of the Xerces parser that occurs with surrogate characters.

The XML parser converts all XML data to an internal 16-bit UCS format (UTF-16). Nearly all characters fit into a single 2-byte code unit. However, characters outside the Basic Multilingual Plane (BMP) need 4 bytes, stored as a pair of 2-byte code units called surrogates. The parser counts each surrogate as a separate character, so such a character is reported with a length of 2.
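The surrogate mechanism described above can be illustrated with a minimal sketch. It follows the standard UTF-16 encoding rules and is not the Xerces implementation; the function names `surrogate_pair` and `utf16_code_units` are hypothetical and chosen only for this illustration.

```python
def surrogate_pair(code_point: int):
    """Split a code point above U+FFFF into its (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is inside the BMP or out of range")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def utf16_code_units(text: str) -> int:
    """Count the 2-byte code units needed to store text as UTF-16."""
    # Encode without a byte-order mark, then divide the byte count by 2.
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    bold_a = chr(0x1D400)  # MATHEMATICAL BOLD CAPITAL A
    print([hex(unit) for unit in surrogate_pair(0x1D400)])
    print(len(bold_a), utf16_code_units(bold_a))
```

A length check that counts code units sees 2 for this character, while a check that counts code points sees 1, which is consistent with the length reported by the parser.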

Environment

For example:

Using IBM Transformation Extender (ITX) with &#119808; as XML input.
The character reference &#119808; is rejected.
This reference represents U+1D400, MATHEMATICAL BOLD CAPITAL A. Its UTF-8 encoding is F0 9D 90 80.
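The values quoted in this example can be checked with a short sketch that uses only the Python standard library. The helper name `describe_reference` is hypothetical, and the sketch assumes the reference is written with its terminating semicolon.

```python
import html


def describe_reference(reference: str):
    """Resolve a numeric character reference and report its code point and UTF-8 bytes."""
    char = html.unescape(reference)
    if len(char) != 1:
        raise ValueError("expected a reference to a single character")
    return {
        "code_point": f"U+{ord(char):04X}",
        "utf8": char.encode("utf-8").hex(" "),
    }


if __name__ == "__main__":
    print(describe_reference("&#119808;"))
```

The decimal value 119808 equals hexadecimal 1D400, which lies above U+FFFF and therefore outside the Basic Multilingual Plane.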

Diagnosing The Problem

The following error is recorded in the TX XML trace log file.


Error (-1), "XMLParser: Input XML data is invalid."
SAXParseException, Error [line: 29186 column: 28] Datatype error: Type:InvalidDatatypeValueException,
Message:Value '' with length '2' exceeds maximum length facet of '1' .
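When the reported line and column are hard to match against the input, it may help to scan the XML for characters outside the Basic Multilingual Plane before it is passed to ITX. The following is a minimal sketch, not an ITX or Xerces facility; the function name `find_supplementary` is hypothetical, and the columns it reports count Python characters, so they may differ from the columns in the parser message.

```python
import re

# Matches decimal (&#119808;) and hexadecimal (&#x1D400;) character references.
_CHAR_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def find_supplementary(xml_text: str):
    """Return (line, column, code_point) for each character outside the BMP."""
    hits = []
    for line_no, line in enumerate(xml_text.splitlines(), start=1):
        # Literal characters above U+FFFF.
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0xFFFF:
                hits.append((line_no, col, ord(ch)))
        # Numeric character references that resolve above U+FFFF.
        for match in _CHAR_REF.finditer(line):
            hex_digits, dec_digits = match.groups()
            code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
            if code_point > 0xFFFF:
                hits.append((line_no, match.start() + 1, code_point))
    return sorted(hits)


if __name__ == "__main__":
    sample = "<a>\n<b>&#119808;</b>\n<c>&#65;</c>"
    for line_no, col, code_point in find_supplementary(sample):
        print(f"line {line_no}, column {col}: U+{code_point:04X}")
```

Each hit is a character that UTF-16 stores as a surrogate pair, so it is a candidate for the length error shown above when the target element has a maximum length facet.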

Resolving The Problem

The incorrect length returned for certain characters is a limitation of the Xerces parser that occurs with surrogate characters.


Document Information

More support for:
IBM Transformation Extender

Software version:
8.3.0.0, 8.3.0.1, 8.3.0.2, 8.3.0.3, 8.3.0.4, 8.3.0.5, 8.3.0.6, 8.4.0.0, 8.4.0.1, 8.4.0.2, 8.4.0.3, 8.4.0.4, 8.4.0.5, 8.4.1.0, 8.4.1.1, 8.4.1.2, 8.4.1.3, 9.0.0.0

Operating system(s):
AIX, HP-UX, Linux, Solaris, Windows, z/OS

Document number:
545121

Modified date:
16 June 2018

UID

swg21979084