Back in 2010 I gave the WSTE webcast "WebSphere DataPower SOA Appliances and XSLT (Part 2 of 2) - Tips and Tricks".
On slide 7 of that webcast I provided solutions for converting Non-XML input from legacy systems to XML by
- wrapping with a <wrapper> element
- replacing all bytes in the ASCII control character range (0x00-0x1F) with the space character (0x20)
- prepending an XML declaration with ISO-8859-1 encoding to deal easily with bytes in the range 0x80-0xFF
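The three steps can be sketched outside DataPower as well. The following Python is only an illustration of the idea, not the slide 7 stylesheet itself; the function name `wrap_non_xml` and the sample record are made up for this example. It covers exactly the three steps listed above and nothing more, so markup characters in the input are not treated specially here.

```python
# Illustrative sketch only: names and sample data are hypothetical.
XML_DECL = b'<?xml version="1.0" encoding="ISO-8859-1"?>'


def wrap_non_xml(data: bytes) -> bytes:
    """Apply the three listed steps to arbitrary input bytes."""
    # Step 2: replace every byte in the control range 0x00-0x1F by a space.
    cleaned = bytes(0x20 if b < 0x20 else b for b in data)
    # Steps 1 and 3: wrap with <wrapper> and prepend the XML declaration.
    # ISO-8859-1 maps every byte 0x80-0xFF to exactly one character,
    # so those bytes can be passed through unchanged.
    return XML_DECL + b"<wrapper>" + cleaned + b"</wrapper>"


if __name__ == "__main__":
    legacy = b"REC01\x00\x01NAME\tM\xfcller\r\n"
    # Printed as a bytes literal to stay independent of the terminal encoding.
    print(wrap_non_xml(legacy))
```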
Last July I wrote the blog posting "Doing recursion right" on how I had to improve my smtp blog posting because I had not followed my own "Doing recursion right" slide 5 from the above webcast. Better testing would have uncovered the problem in the first place.
Now to the topic of this blog posting: in last Saturday's developerWorks DataPower forum posting "Non-xml to XML convertion issue", Vishnu identified an issue.
Again, better testing of the above-mentioned slide 7 stylesheets would have resulted in correct stylesheets in the first place.
The whole purpose of stylesheet conversion-wrapper2.xsl was to convert (arbitrary) Non-XML to XML.
While the output looks like XML, it is not always well-formed XML, and Vishnu identified the root cause:
ampersand and less-than characters were not escaped, and whenever they occur in the input, the result is not well-formed.
The simple fix is to have the recursive stylesheet replace the 0x26 byte (ampersand) and the 0x3C byte (less than) with their escaped representations:
$ diff conversion-wrapper2.xsl conversion-wrapper2a.xsl
15a16,17
> <xsl:when test="$char='26'">26616D703B</xsl:when>
> <xsl:when test="$char='3c'">266C743B</xsl:when>
$
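To make the two added lines easier to read: 26616D703B is the hex encoding of "&amp;" and 266C743B is the hex encoding of "&lt;". The following Python sketch mimics the per-byte mapping on a hexBinary string. It is an iterative illustration, not the recursive stylesheet; the names `ESCAPES`, `map_hex_pair` and `convert_hex` are hypothetical, and lowercasing each hex pair before comparing is an assumption made here for simplicity.

```python
# Illustrative sketch only: an iterative stand-in for the recursive template.
ESCAPES = {
    "26": "26616D703B",  # '&' becomes the bytes of "&amp;"
    "3c": "266C743B",    # '<' becomes the bytes of "&lt;"
}


def map_hex_pair(pair: str) -> str:
    """Map one two-digit hex pair to its replacement hex string."""
    pair = pair.lower()  # assumption: compare case-insensitively
    if int(pair, 16) < 0x20:
        return "20"  # control characters become a space
    return ESCAPES.get(pair, pair)


def convert_hex(hex_string: str) -> str:
    """Apply map_hex_pair to every byte of a hexBinary string."""
    pairs = (hex_string[i:i + 2] for i in range(0, len(hex_string), 2))
    return "".join(map_hex_pair(p) for p in pairs)


if __name__ == "__main__":
    data = b"a<b & c\n"
    print(bytes.fromhex(convert_hex(data.hex())))
```

Only the ampersand and the less-than character need this treatment in element content; all other bytes at or above 0x20 pass through unchanged.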
You can find the old and new versions of the stylesheet, as well as the included hexBinary.ffd, attached to this forum posting.
The files gen.xsl, test.table and identity.xsl are attached there as well.
The test that should have been made back in 2010 is simply to pass test.table (which contains all bytes from 0x00-0xFF) to the conversion and then check that valid XML results. Since I once had problems with "tidy -q -xml" telling me (incorrectly) that a Non-XML file was XML, I just do XML validation by running the file to test through an identity transform.
$ xsltproc identity.xsl all.xml
<?xml version="1.0"?>
<wrapper>
!"#$%&amp;'()*+,-./0123456789:;&lt;=&gt;?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
</wrapper>
$
This test can be done by any XSLT processor (xsltproc above).
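The same kind of check can be sketched with any XML parser instead of an XSLT processor. The following Python is a rough stand-in for the whole test under stated assumptions: `convert` is a hypothetical re-implementation of the conversion (not the DataPower stylesheet), `bytes(range(256))` plays the role of test.table, and the standard library parser plays the role of the identity transform.

```python
# Illustrative sketch only: convert() is a stand-in, not conversion-wrapper2a.xsl.
import xml.etree.ElementTree as ET

XML_DECL = b'<?xml version="1.0" encoding="ISO-8859-1"?>'
ESCAPES = {0x26: b"&amp;", 0x3C: b"&lt;"}


def convert(data: bytes, escape: bool = True) -> bytes:
    """Wrap arbitrary bytes as XML; escape=False mimics the old behavior."""
    out = bytearray()
    for b in data:
        if b < 0x20:
            out += b" "          # control characters become a space
        elif escape and b in ESCAPES:
            out += ESCAPES[b]    # '&' and '<' get their escaped form
        else:
            out.append(b)
    return XML_DECL + b"<wrapper>" + bytes(out) + b"</wrapper>"


def is_well_formed(xml_bytes: bytes) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    table = bytes(range(256))  # all bytes 0x00-0xFF, like test.table
    print("without escaping:", is_well_formed(convert(table, escape=False)))
    print("with escaping:   ", is_well_formed(convert(table)))
```

Feeding all 256 byte values through the conversion is what makes this test useful: any single byte that breaks the output is found at once.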
There are arbitrarily many ways to create the Non-XML sample file test.table; stylesheet gen.xsl is the (DataPower) XSLT way of doing that.
Of course it does not have to be done by a stylesheet at all, but it can:
Hermann<myXsltBlog/> <myXsltTweets/> <myCE/>