Troubleshooting
Problem
Some XML entities are interpreted as 2 characters by the Xerces parser.
Symptom
A single character outside the Basic Multilingual Plane, such as 𝐀, is counted as 2 characters, so XML validation fails against a maximum length facet of 1.
Cause
The incorrect length returned for certain characters is a limitation of the Xerces parser that occurs with surrogate characters.
The XML parser converts all XML data to an internal 16-bit Unicode (UTF-16) format. Nearly all characters fit into a single 2-byte unit. However, some characters need 4 bytes and are stored as a pair of 2-byte units called surrogates. Surrogates are used to address characters outside the Basic Multilingual Plane (BMP). Because the parser counts each unit of the pair separately, one such character is reported with a length of 2.
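The surrogate arithmetic described above can be sketched in a few lines of Python. This is only an illustration of how UTF-16 represents a character outside the BMP, not the Xerces implementation; the function names are hypothetical.

```python
def utf16_code_units(text):
    """Return how many 16-bit code units are needed to store text as UTF-16."""
    return len(text.encode("utf-16-be")) // 2


def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is inside the BMP or out of range")
    offset = code_point - 0x10000
    # The top 10 bits select the high surrogate, the low 10 bits the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


bold_a = chr(0x1D400)  # MATHEMATICAL BOLD CAPITAL A
high, low = surrogate_pair(0x1D400)
print(f"{high:04X} {low:04X}")        # D835 DC00
print(len(bold_a))                    # 1 code point
print(utf16_code_units(bold_a))       # 2 UTF-16 code units
```

A length check that counts UTF-16 code units sees 2 for this character, while a check that counts code points sees 1.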
Environment
For example:
Using IBM Transformation Extender (ITX) with the character reference &#x1D400; (𝐀) as XML input.
The code &#x1D400; is rejected.
This code represents MATHEMATICAL BOLD CAPITAL A (U+1D400). Its UTF-8 encoding is F0 9D 90 80.
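The byte sequence quoted above can be checked with a short Python sketch. It assumes the input refers to the character as the numeric reference &#x1D400; and uses the standard library's html.unescape only as a convenient way to resolve that reference; the helper name utf8_hex is hypothetical.

```python
import html

BOLD_A = chr(0x1D400)  # MATHEMATICAL BOLD CAPITAL A


def utf8_hex(text):
    """Return the UTF-8 bytes of text as space-separated hex pairs."""
    return text.encode("utf-8").hex(" ")


# Resolve the numeric character reference to the character it names.
resolved = html.unescape("&#x1D400;")

print(resolved == BOLD_A)   # True
print(utf8_hex(resolved))   # f0 9d 90 80
```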
Diagnosing The Problem
The following error appears in the ITX XML trace log file:
Error (-1), "XMLParser: Input XML data is invalid."
SAXParseException, Error [line: 29186 column: 28] Datatype error: Type:InvalidDatatypeValueException,
Message:Value '' with length '2' exceeds maximum length facet of '1' .
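To confirm that a surrogate character is behind an error like this one, it can help to list every character outside the BMP in the input. The following Python sketch is one way to do that, under the assumption that the input is available as decoded text; find_supplementary is a hypothetical helper and not part of ITX or Xerces. Columns are counted in Python characters, so they may not match the column reported by the parser.

```python
import re

# Numeric character references such as &#x1D400; or &#119808;
_CHAR_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def find_supplementary(xml_text):
    """Return sorted (line, column, code_point) tuples for non-BMP characters.

    Both literal characters and numeric character references are reported.
    Lines and columns are 1-based.
    """
    hits = []
    for line_no, line in enumerate(xml_text.splitlines(), start=1):
        # Literal characters above U+FFFF.
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0xFFFF:
                hits.append((line_no, col, ord(ch)))
        # Character references that resolve to a code point above U+FFFF.
        for match in _CHAR_REF.finditer(line):
            hex_digits, dec_digits = match.groups()
            cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
            if cp > 0xFFFF:
                hits.append((line_no, match.start() + 1, cp))
    return sorted(hits)


sample = "<a>\n<b>&#x1D400;</b>\n<c>\U0001D400</c>\n"
for line_no, col, cp in find_supplementary(sample):
    print(f"line {line_no}, column {col}: U+{cp:04X}")
```

Any position it reports that falls on an element or attribute with a length restriction is a candidate for the failure shown in the trace.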
Resolving The Problem
This behavior is a limitation of the Xerces parser: it returns an incorrect length for surrogate characters.
Document Information
More support for: IBM Transformation Extender
Software version: 8.3.0.0, 8.3.0.1, 8.3.0.2, 8.3.0.3, 8.3.0.4, 8.3.0.5, 8.3.0.6, 8.4.0.0, 8.4.0.1, 8.4.0.2, 8.4.0.3, 8.4.0.4, 8.4.0.5, 8.4.1.0, 8.4.1.1, 8.4.1.2, 8.4.1.3, 9.0.0.0
Operating system(s): AIX, HP-UX, Linux, Solaris, Windows, z/OS
Document number: 545121
Modified date: 16 June 2018
UID: swg21979084