Topic
  • 10 replies
  • Latest Post - ‏2012-10-09T21:38:34Z by HermannSW
Lakh
Lakh
22 Posts

Pinned topic dp:parse error on utf-16 encoding messages

‏2012-10-05T15:31:22Z |
Hi,

We parse the decoded XML from an HTTP POST into XML with the dp:parse() function, using the code below:

<!DOCTYPE xsl:stylesheet [
<!ENTITY argXML "/request/args/arg">
]>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:dp="http://www.datapower.com/extensions"
xmlns:regexp="http://exslt.org/regular-expressions"
extension-element-prefixes="dp"
>
<xsl:output omit-xml-declaration="yes" />

<xsl:template match="/">
<xsl:copy-of
select="dp:parse(regexp:replace(&argXML;, '&amp;', 'g', '&amp;amp;'))"/>
</xsl:template>

</xsl:stylesheet>

This has worked fine so far, because we only received UTF-8 encoded messages.

Now we are receiving UTF-16 encoded messages, and parsing fails with the error below:
dp:parse() error: illegal character 0xe3 at offset 0 of dp:parse
Updated on 2012-10-09T21:38:34Z at 2012-10-09T21:38:34Z by HermannSW
  • HermannSW
    HermannSW
    4725 Posts

    Re: dp:parse error on utf-16 encoding messages

    ‏2012-10-05T19:10:14Z  
    The regexp:replace() indicates that you do not receive XML, but have to convert what you get.
    Please attach a UTF-16 sample file that produces the parsing problem here; do not paste it into a posting.

     
    Hermann<myXsltBlog/> <myXsltTweets/>
  • Lakh
    Lakh
    22 Posts

    Re: dp:parse error on utf-16 encoding messages

    ‏2012-10-08T01:34:04Z  
    Hi HermannSW,

    I am attaching a sample XML file that gives the same error in the dp:parse function.
  • HermannSW
    HermannSW
    4725 Posts

    Re: dp:parse error on utf-16 encoding messages

    ‏2012-10-08T08:10:10Z  
    The file you attached is Non-XML.
    It claims to be UTF-16 encoded by the xml-declaration, but it is not:
    
    $ od -Ax -tcx1 dpparse_error.xml | head
    000000   <   ?   x   m   l       v   e   r   s   i   o   n   =   '   1
            3c  3f  78  6d  6c  20  76  65  72  73  69  6f  6e  3d  27  31
    000010   .   0   '       e   n   c   o   d   i   n   g   =   '   u   t
            2e  30  27  20  65  6e  63  6f  64  69  6e  67  3d  27  75  74
    000020   f   -   8   '   ?   >      \r  \n   <   T   E   S   T   >  \r
            66  2d  38  27  3f  3e  20  0d  0a  3c  54  45  53  54  3e  0d
    000030  \n   <   S   U   B   >   T   h   i   s       t   o       t   e
            0a  3c  53  55  42  3e  54  68  69  73  20  74  6f  20  74  65
    000040   s   t       d   p       p   a   r   s   e       o   n       u
            73  74  20  64  70  20  70  61  72  73  65  20  6f  6e  20  75
    $
    


    This is what your file looks like when converted to UTF-16:

    $ od -tcx1 d16.xml | head
    0000000  \0   <  \0   ?  \0   x  \0   m  \0   l  \0      \0   v  \0   e
             00  3c  00  3f  00  78  00  6d  00  6c  00  20  00  76  00  65
    0000020  \0   r  \0   s  \0   i  \0   o  \0   n  \0   =  \0   "  \0   1
             00  72  00  73  00  69  00  6f  00  6e  00  3d  00  22  00  31
    0000040  \0   .  \0   0  \0   "  \0      \0   e  \0   n  \0   c  \0   o
             00  2e  00  30  00  22  00  20  00  65  00  6e  00  63  00  6f
    0000060  \0   d  \0   i  \0   n  \0   g  \0   =  \0   "  \0   U  \0   T
             00  64  00  69  00  6e  00  67  00  3d  00  22  00  55  00  54
    0000100  \0   F  \0   -  \0   1  \0   6  \0   "  \0   ?  \0   >  \0  \n
             00  46  00  2d  00  31  00  36  00  22  00  3f  00  3e  00  0a
    $
    


     
    Hermann<myXsltBlog/> <myXsltTweets/>
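
    The byte-level difference between the two dumps above can also be checked programmatically. The following is a minimal Python sketch, not DataPower code: the function name `looks_like_utf16` and the 64-byte sample size are assumptions made for illustration, and the heuristic only holds for text that is mostly in the ASCII range.

```python
def looks_like_utf16(data: bytes, sample_size: int = 64) -> bool:
    """Heuristic check: does this look like UTF-16 encoded ASCII-range text?"""
    # A byte order mark settles the question immediately.
    if data[:2] in (b"\xfe\xff", b"\xff\xfe"):
        return True
    head = data[:sample_size]
    if len(head) < 4:
        return False  # too short to judge
    # For ASCII-range characters, UTF-16BE has 00 at every even offset
    # and UTF-16LE has 00 at every odd offset.
    return not any(head[0::2]) or not any(head[1::2])


# First bytes of the attached file, as shown in the first dump.
attached = (b"<?xml version='1.0' encoding='utf-8'?> \r\n"
            b"<TEST>\r\n<SUB>This to test dp parse on u")
# The XML declaration from the second dump, encoded as big-endian UTF-16.
converted = '<?xml version="1.0" encoding="UTF-16"?>\n'.encode("utf-16-be")

print(looks_like_utf16(attached))   # one byte per character
print(looks_like_utf16(converted))  # 00 before every ASCII character
```

    A check like this only says whether the bytes are plausibly UTF-16; it does not validate the XML itself.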
  • Lakh
    Lakh
    22 Posts

    Re: dp:parse error on utf-16 encoding messages

    ‏2012-10-08T20:00:48Z  
    Yes, I took a UTF-8 message and changed it to UTF-16. I tested this and I am getting the same error.
  • Lakh
    Lakh
    22 Posts

    Re: dp:parse error on utf-16 encoding messages

    ‏2012-10-09T03:42:25Z  
    To be clearer:

    Below is the decoded XML from the HTTP POST.

    <request>

    <url>/_b2b/servlet/com.hcsc.b2b.XmlEntryServlet</url>
    <base-url>/_b2b/servlet/com.hcsc.b2b.XmlEntryServlet</base-url>
    <args src="url" />
    <args src="body">
    <arg name="XMLREQUEST"><?xml version='1.0' encoding='utf-16'?> <TEST> This to test dp parse on utf16 </TEST></arg>
    </args>

    </request>

    I tried the XSL below to parse it into well-formed XML:

    <!DOCTYPE xsl:stylesheet [
    <!ENTITY argXML "/request/args/arg">
    ]>
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dp="http://www.datapower.com/extensions"
    xmlns:regexp="http://exslt.org/regular-expressions"
    extension-element-prefixes="dp"
    >
    <xsl:output omit-xml-declaration="yes" />

    <xsl:template match="/">
    <xsl:copy-of
    select="dp:parse(regexp:replace(&argXML;, '&amp;', '&gt', '&amp;amp;'))"/>
    </xsl:template>

    </xsl:stylesheet>

    and I am getting the error below:
    local:///parsing.xsl: illegal character ''' at line 14 of local:///parsing.xsl
  • HermannSW
    HermannSW
    4725 Posts

    Re: dp:parse error on utf-16 encoding messages

    ‏2012-10-09T18:57:03Z  
    OK, now I can see where your problem might come from.

    After the convert-http output gets parsed in your stylesheet, it claims to be UTF-16 encoded.
    But it is not at that point anymore, since DataPower internal encoding is UTF-8, see slide 11 of this WSTE webcast:
    http://www-01.ibm.com/support/docview.wss?uid=swg27022977

    Slide 14 of this webcast does show how to parse in that situation:
    http://www-01.ibm.com/support/docview.wss?uid=swg27022979

    In your stylesheet this should work:
    ...
    <xsl:copy-of
    select="dp:parse(substring-after(regexp:replace(&argXML;, '&amp;', 'g', '&amp;amp;'),'?>'))"/>
    ...
    


     
    Hermann <myXsltBlog/> <myXsltTweets/>
    Updated on 2014-03-25T02:48:04Z at 2014-03-25T02:48:04Z by iron-man
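
    The idea behind the `substring-after(..., '?>')` suggestion above is to drop the XML declaration, whose encoding claim no longer matches the in-memory text, before parsing. Here is a rough Python analogue of that step. It is only a sketch: `xml.etree.ElementTree` stands in for dp:parse(), the helper name `parse_without_declaration` is hypothetical, and the input string is the `XMLREQUEST` value from the decoded request shown earlier in the thread.

```python
import xml.etree.ElementTree as ET


def parse_without_declaration(text: str) -> ET.Element:
    """Drop a leading XML declaration, then parse the remainder."""
    if text.lstrip().startswith("<?xml"):
        # Keep everything after the first '?>', like substring-after(..., '?>').
        _, _, text = text.partition("?>")
    return ET.fromstring(text.strip())


arg_value = ("<?xml version='1.0' encoding='utf-16'?> "
             "<TEST> This to test dp parse on utf16 </TEST>")

root = parse_without_declaration(arg_value)
print(root.tag, "|", root.text.strip())
```

    Once the declaration is gone, the parser has no encoding claim left to contradict the actual encoding of the text.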
  • Lakh
    Lakh
    22 Posts

    Re: dp:parse error on utf-16 encoding messages

    ‏2012-10-09T19:14:39Z  
    Hi HermannSW,

    I am still getting the same error:
    local:///parsing.xsl: illegal character ''' at line 14 of local:///parsing.xsl
  • HermannSW
    HermannSW
    4725 Posts

    Re: dp:parse error on utf-16 encoding messages

    ‏2012-10-09T19:37:52Z  
    Please
    • take a packet capture of a transaction
    • do a "Follow TCP stream" in Wireshark
    • and show us the data sent to DataPower in hexadecimal format.

     
    Hermann<myXsltBlog/> <myXsltTweets/>
  • Lakh
    Lakh
    22 Posts

    Re: dp:parse error on utf-16 encoding messages

    ‏2012-10-09T21:11:37Z  
    0000 02 01 d7 c5 02 04 00 a0 8e c0 8e 51 81 00 06 b6 ........ ...Q....
    0010 08 00 45 00 00 bc d3 da 40 00 79 06 80 0e 0a 4c ..E..... @.y....L
    0020 38 4a 0a 86 60 37 0a df 00 50 91 b4 ae fa 41 55 8J..`7.. .P....AU
    0030 f3 48 50 18 ff ff 0a 3d 00 00 58 4d 4c 52 45 51 .HP....= ..XMLREQ
    0040 55 45 53 54 3d 25 33 43 25 33 46 78 6d 6c 2b 76 UEST=%3C %3Fxml+v
    0050 65 72 73 69 6f 6e 25 33 44 25 32 37 31 2e 30 25 ersion%3 D%271.0%
    0060 32 37 2b 65 6e 63 6f 64 69 6e 67 25 33 44 25 32 27+encod ing%3D%2
    0070 37 75 74 66 2d 31 36 25 32 37 25 33 46 25 33 45 7utf-16% 27%3F%3E
    0080 2b 2b 25 33 43 54 45 53 54 25 33 45 2b 25 33 43 ++%3CTES T%3E+%3C
    0090 53 55 42 25 33 45 54 68 69 73 2b 74 6f 2b 74 65 SUB%3ETh is+to+te
    00a0 73 74 2b 64 70 2b 70 61 72 73 65 2b 6f 6e 2b 75 st+dp+pa rse+on+u
    00b0 74 66 31 36 25 33 43 25 32 46 53 55 42 25 33 45 tf16%3C% 2FSUB%3E
    00c0 2b 25 33 43 25 32 46 54 45 53 54 25 33 45 01 14 +%3C%2FT EST%3E..
    00d0 00 01 00 06 76 69 70 5f 31 30 2e 31 33 34 2e 39 ....vip_ 10.134.9
    00e0 36 2e 35 35 6.55
  • HermannSW
    HermannSW
    4725 Posts

    Re: dp:parse error on utf-16 encoding messages

    ‏2012-10-09T21:38:34Z  
    Sorry, the file posted to DataPower
    • claims to be UTF-16 encoded
    • but it is not UTF-16 encoded.

    Your packet capture proves that, see my previous posting
    https://www.ibm.com/developerworks/forums/thread.jspa?messageID=14895603#14894778

    on how UTF-16 encoded data has to look (every 2nd byte has to be "00" for encoded ASCII characters).
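
    To see this without reading hex, the form body from the packet capture can be decoded directly. The sketch below uses Python's standard `urllib.parse.parse_qs`; the `body` string is transcribed from the ASCII column of the capture (offsets 0x3a to 0xcd), and the variable names are chosen for illustration only.

```python
from urllib.parse import parse_qs

# URL-encoded request body, transcribed from the packet capture.
body = (
    "XMLREQUEST=%3C%3Fxml+version%3D%271.0%27+encoding%3D%27utf-16%27%3F%3E"
    "++%3CTEST%3E+%3CSUB%3EThis+to+test+dp+parse+on+utf16%3C%2FSUB%3E"
    "+%3C%2FTEST%3E"
)

# parse_qs turns '+' into a space and decodes each %XX escape.
xml_request = parse_qs(body)["XMLREQUEST"][0]
print(xml_request)

# Real UTF-16 data would carry 00 bytes, which would show up as %00 escapes.
print("%00" in body, "\x00" in xml_request)
```

    The decoded value is plain single-byte text whose declaration merely says `utf-16`; no `00` byte appears anywhere in the payload.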

    If you send correctly UTF-16 encoded data, it should work.

    The easiest way to generate UTF-16 encoded data is using a stylesheet toutf16.xsl (attached):
    
    $ cat toutf16.xsl
    <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    >
      <xsl:output method="xml" encoding="utf-16"/>

      <xsl:template match="/">
        <xsl:copy-of select="."/>
      </xsl:template>

    </xsl:stylesheet>
    $
    $ echo "<inp>test 1234</inp>" | coproc2 toutf16.xsl - http://dp3-l3:2223 -s | od -tcx1
    0000000  \0   <  \0   ?  \0   x  \0   m  \0   l  \0      \0   v  \0   e
             00  3c  00  3f  00  78  00  6d  00  6c  00  20  00  76  00  65
    0000020  \0   r  \0   s  \0   i  \0   o  \0   n  \0   =  \0   "  \0   1
             00  72  00  73  00  69  00  6f  00  6e  00  3d  00  22  00  31
    0000040  \0   .  \0   0  \0   "  \0      \0   e  \0   n  \0   c  \0   o
             00  2e  00  30  00  22  00  20  00  65  00  6e  00  63  00  6f
    0000060  \0   d  \0   i  \0   n  \0   g  \0   =  \0   "  \0   U  \0   T
             00  64  00  69  00  6e  00  67  00  3d  00  22  00  55  00  54
    0000100  \0   F  \0   -  \0   1  \0   6  \0   "  \0   ?  \0   >  \0  \n
             00  46  00  2d  00  31  00  36  00  22  00  3f  00  3e  00  0a
    0000120  \0   <  \0   i  \0   n  \0   p  \0   >  \0   t  \0   e  \0   s
             00  3c  00  69  00  6e  00  70  00  3e  00  74  00  65  00  73
    0000140  \0   t  \0      \0   1  \0   2  \0   3  \0   4  \0   <  \0   /
             00  74  00  20  00  31  00  32  00  33  00  34  00  3c  00  2f
    0000160  \0   i  \0   n  \0   p  \0   >
             00  69  00  6e  00  70  00  3e
    0000170
    $
    


     
    Hermann<myXsltBlog/> <myXsltTweets/>
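
    For a test client that is not an XSLT processor, equivalent bytes can be produced with any language that has a UTF-16 encoder. The Python sketch below reproduces the byte stream shown in the dump above. It assumes big-endian UTF-16 without a byte order mark, because that is what the dump shows; the variable names are illustrative.

```python
# The same document that toutf16.xsl emits, as ordinary text.
doc = '<?xml version="1.0" encoding="UTF-16"?>\n<inp>test 1234</inp>'

# "utf-16-be" writes two bytes per character, high byte first, without a BOM.
data = doc.encode("utf-16-be")

print(len(data))            # total length in bytes
print(data[:8].hex(" "))    # first four characters: < ? x m
```

    The dump ends at octal offset 0000170, which is 120 bytes, so the length printed here can be compared against it directly.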