Topic
  • 4 replies
  • Latest Post - ‏2013-07-12T17:06:32Z by HermannSW
souciance
202 Posts

Pinned topic Datapower parsing XHTML documents

‏2013-07-12T14:06:25Z |

Hello..

What is the best way to parse XHTML documents that contain a DOCTYPE declaration like

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Currently, even though I have set the response-type on the MPGW to non-XML, parsing the XHTML document fails with the following error:

Incomplete markup or missing document element at offset 0 of http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

I simply want to access the HTTP status response header but cannot do so as it fails to parse the XHTML.

The main reason is that the server returns a 401 together with an XHTML payload saying that access is unauthorized.

 

Thanks in advance.

Moeed.

 
  • kenhygh
    1528 Posts
    ACCEPTED ANSWER

    Re: Datapower parsing XHTML documents

    ‏2013-07-12T14:12:35Z  

    You shouldn't need to parse the payload at all. Make sure your transform action's input context is NULL rather than INPUT.

  • souciance
    souciance
    202 Posts

    Re: Datapower parsing XHTML documents

    ‏2013-07-12T14:19:56Z  
    • kenhygh
    • ‏2013-07-12T14:12:35Z

    Thanks Kenhygh! You are a lifesaver! It worked like a charm!

    Just out of curiosity, if one were interested in parts of the document, would I have to create an FFD file to parse it?

    Thanks.

    Moeed.

  • kenhygh
    1528 Posts

    Re: Datapower parsing XHTML documents

    ‏2013-07-12T14:26:31Z  
    • souciance
    • ‏2013-07-12T14:19:56Z

    There may be an easier way. Make sure your DataPower box can access "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd", so that the DTD named in the DOCTYPE can be retrieved when the document is parsed.

  • HermannSW
    4657 Posts

    Re: Datapower parsing XHTML documents

    ‏2013-07-12T17:06:32Z  
    • souciance
    • ‏2013-07-12T14:19:56Z

    > Just out of curiosity, if one were interested in parts of the document, would I have to create an FFD file to parse it?
    >

    Here you go: the attached stylesheet removes a leading "<!DOCTYPE html ....>" declaration if one is present and returns the rest; if there is none, it returns the input unchanged:

    $ coproc2 remove-doctype-html.xsl tst.html http://dp1-l3:2224

    <html>
    <body>
    test <b>bold</b> <br/>
    line2
    </body>
    </html>
    $
    $ cat tst.html
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html>
    <body>
    test <b>bold</b> <br/>
    line2
    </body>
    </html>
    $
    $ cat remove-doctype-html.xsl
    <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:regexp="http://exslt.org/regular-expressions"
      xmlns:dp="http://www.datapower.com/extensions"
      extension-element-prefixes="dp"
    >
      <dp:input-mapping  href="store:///pkcs7-convert-input.ffd" type="ffd"/>
      <dp:output-mapping href="store:///pkcs7-convert-input.ffd" type="ffd"/>
        

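      <!-- hex encoding of the ASCII bytes "<!DOCTYPE html" -->
      <xsl:variable name="doctype-html" select=" '3C21444F43545950452068746D6C' "/>

      <!-- any sequence of hexadecimal bytes not containing a 0x3E byte ('>') -->
      <xsl:variable name="no3E" select="'((3[0-9A-DF])|([0-24-9A-F][0-9A-F]))*'"/>

      <!-- matches up to and including the first 0x3E byte, then captures the rest -->
      <xsl:variable name="from" select="concat('^',$no3E,'3E(.*)$')" />

      <!-- $no3E contributes groups 1 to 3, so group 4 is the data after the DOCTYPE -->
      <xsl:variable name="to"   select=" '$4' " />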


      <xsl:template match="/">
        <xsl:variable name="input64"
          select="dp:binary-encode(/object/message/node())"
        />

        <xsl:variable name="input16" select="dp:radix-convert($input64, 64, 16)"/>

        <xsl:variable name="res16">
          <xsl:choose>
            <xsl:when test="starts-with($input16, $doctype-html)">
              <xsl:value-of select="regexp:replace($input16, $from, 'g', $to)"/>
            </xsl:when>

            <xsl:otherwise>
              <xsl:value-of select="$input16"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:variable>

        <xsl:variable name="res64" select="dp:radix-convert($res16, 16, 64)" />

        <object>
          <message>
            <xsl:value-of select="dp:binary-decode($res64)"/>
          </message>
        </object>
      </xsl:template>
     
    </xsl:stylesheet>
    $
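
For readers who want to see why the hex-level regular expression in the stylesheet above works, here is a minimal sketch of the same idea in Python. It is not DataPower code and not part of the attached stylesheet; the function name strip_doctype and the constant names are made up for illustration. The assumption is that the hex string uses uppercase digits, as the patterns in the stylesheet expect.

```python
import re

# Hex spelling of the ASCII bytes "<!DOCTYPE html" (same value as $doctype-html).
DOCTYPE_HTML_HEX = "3C21444F43545950452068746D6C"

# Any run of whole bytes (two hex digits each) other than 3E, the byte for ">".
# First alternative: 30..3D and 3F. Second alternative: first nibble is not 3.
NO_3E = r"((3[0-9A-DF])|([0-24-9A-F][0-9A-F]))*"

# Groups 1 to 3 belong to NO_3E, so the data after the first 3E byte is
# group 4, which is why the stylesheet's replacement string is '$4'.
FROM = re.compile("^" + NO_3E + "3E(.*)$")


def strip_doctype(data: bytes) -> bytes:
    """Drop a leading '<!DOCTYPE html ...>' declaration; return other input unchanged."""
    hex16 = data.hex().upper()  # illustrative stand-in for the base64-to-hex step
    if not hex16.startswith(DOCTYPE_HTML_HEX):
        return data
    match = FROM.match(hex16)
    if match is None:  # declaration is never closed with '>'
        return data
    return bytes.fromhex(match.group(4))
```

Because the pattern is anchored at the start and always consumes two hex digits at a time, it can never match a "3E" that straddles two bytes (for example the digits "3E" inside "A3 E4"). Like the stylesheet, this sketch cuts at the first '>' byte, so it would not handle a DOCTYPE with an internal subset that itself contains '>'. The newline that follows the declaration is kept, which matches the empty first line in the coproc2 output above.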

     

    Hermann

     

    Updated on 2013-07-12T17:17:29Z by HermannSW