Topic
  • 6 replies
  • Latest Post - ‏2010-11-06T17:27:51Z by HermannSW
HermannSW
HermannSW
4901 Posts

Pinned topic Efficient XSLT range lookups

‏2010-10-05T16:47:01Z |
Hello,

I got notice of an interesting scenario needing a huge amount of range lookups in XSLT
(billions per year with more than 20000 different ranges).

The ranges are
  • not distributed evenly
  • have non-uniform sizes
  • do not cover the complete number range
  • the overall covered value range if huge (more than 8 digits).

It tried two approaches of doing binary search
  • loading an XML document containing the binary search tree
  • having a binary search structure whithin xsl:choose/xsl:when.

Find below the numbers for
  • DataPower XI50 9004 appliance
  • Saxon9he on 2CPU 4core â 3GHz server

Also find attached the stylesheets which
  • generate test data
  • generate binary search trees
  • generate binary search stylesheets
  • generate linear list and stylesheet for comparison.

Each range has
  • a minimal number
  • a maximal number
  • a value which should be returned on lookup

What can be read from the measured data:
  • binary outperforms linear
  • binary stylesheets outperform binary XML searchtrees
  • measuring on DataPower box with dp:time-value() function shows 2.11/0.98 (nanoseconds) per lookup regardless on the number of lookups
  • the findings can be seen for Saxon XSLT processor, too
  • in case the XSLT processor supports document and/or stylesheet caching the lookup performance remains good for single lookups

DP XI50 9004  data          xsl                        Saxon9he     data         xsl    2CPU quad core â 3GHz lookups   lin   bin    bin       lin          lookups            lin   bin   bin     lin 4094   0.483  0.022  0.016     0.257 s       4094(10)          6.440 1.458 1.419   5.410 s 8190        1.193  0.037  0.022     1.018 s       8190(11)         19.213 1.733 1.557  16.366 s 16382       7.880  0.066  0.034     4.315 s       16382(12)        72.163 2.460 1.962  61.930 s 32766      32.202  0.126  0.062    17.856 s       32766(13)       272.620 3.693 2.622 221.292 s 117.978    5.374  3.908    62.775 ns                        1.573 0.356 0.347   1.321 ms 145.665   4.518  2.686   124.298 ns                        2.346 0.212 0.190   1.998 ms 481.016   4.029  2.075   263.399 ns                        4.405 0.150 0.120   3.780 ms 982.787   3.845  1.892   544.955 ns                        8.320 0.113 0.080   6.754 ms dp:time-value() 2.11   0.98 ns


Test data for level 3:

<ts> <t>2</t> <t>3</t> <t>4</t> ... <t>29</t> <t>30</t> </ts>


Sample output for level 3:

, 4, 4, 4, , 8, 8, 8, , 12, ... 24, , 28, 28, 28, ,


Binary search "tree" for level 3:

<es> <e min=
"15" max=
"17" val=
"16"> <e min=
"7" max=
"9" val=
"8"> <e min=
"3" max=
"5" val=
"4" /> <e min=
"11" max=
"13" val=
"12" /> </e> <e min=
"23" max=
"25" val=
"24"> <e min=
"19" max=
"21" val=
"20" /> <e min=
"27" max=
"29" val=
"28" /> </e> </e> </es>


Binary search stylesheet for level3:

<xsl:stylesheet xmlns:xsl=
"http://www.w3.org/1999/XSL/Transform" version=
"1.0"> <xsl:template match=
"/ts"> <xsl:for-each select=
"t"> <xsl:choose> <xsl:when test=
". &lt; 15"> <xsl:choose> <xsl:when test=
". &lt; 7"> <xsl:choose> <xsl:when test=
". &lt; 3"> <xsl:text> , </xsl:text> </xsl:when> <xsl:when test=
". &gt; 5"> <xsl:text> , </xsl:text> </xsl:when> <xsl:otherwise>4 , <xsl:text /> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:when test=
". &gt; 9"> <xsl:choose> <xsl:when test=
". &lt; 11"> <xsl:text> , </xsl:text> </xsl:when> <xsl:when test=
". &gt; 13"> <xsl:text> , </xsl:text> </xsl:when> <xsl:otherwise>12 , <xsl:text /> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:otherwise>8 , <xsl:text /> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:when test=
". &gt; 17"> <xsl:choose> <xsl:when test=
". &lt; 23"> <xsl:choose> <xsl:when test=
". &lt; 19"> <xsl:text> , </xsl:text> </xsl:when> <xsl:when test=
". &gt; 21"> <xsl:text> , </xsl:text> </xsl:when> <xsl:otherwise>20 , <xsl:text /> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:when test=
". &gt; 25"> <xsl:choose> <xsl:when test=
". &lt; 27"> <xsl:text> , </xsl:text> </xsl:when> <xsl:when test=
". &gt; 29"> <xsl:text> , </xsl:text> </xsl:when> <xsl:otherwise>28 , <xsl:text /> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:otherwise>24 , <xsl:text /> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:otherwise>16 , <xsl:text /> </xsl:otherwise> </xsl:choose> </xsl:for-each> </xsl:template> </xsl:stylesheet>


Archive structure with comments:

range/  range/xsl/  range/xsl/gen0.xsl       generate linear search stylesheet range/xsl/gen.xsl        generate binary search stylesheet range/xsl/3.xml range/xml/  range/xml/gen0.xsl       generate file x0.xml needed by tst0.xsl range/xml/tst.xsl        xml binary search tree search range/xml/2.0/ range/xml/2.0/gen0.xsl   

this folder contains stylesheets converted range/xml/2.0/tst.xsl    to 2.0 (xsl:function instead func:function) range/xml/2.0/gen.xsl range/xml/2.0/3.xml range/xml/2.0/tst0.xsl range/xml/gen.xsl        generate file x.xml needed by tst.xsl range/xml/3.xml range/xml/tst0.xsl       xml sequential search range/gent.xsl           generate input file range/3.xml


Hermann.
Updated on 2010-11-06T17:27:51Z at 2010-11-06T17:27:51Z by HermannSW
  • arun_tcs
    arun_tcs
    144 Posts

    Re: Efficient XSLT range lookups

    ‏2010-10-06T11:24:08Z  
    Hi Herman ,

    The above topic sounds very intresting, But unfortunately iam not able to understand the given details.

    I looked into the attachment , but without understanding the problem i could not dig in further.

    Can you explain the problem and how you are trying to address it.
    Thanks,
    Arun
  • HermannSW
    HermannSW
    4901 Posts

    Re: Efficient XSLT range lookups

    ‏2010-10-06T19:39:37Z  
    • arun_tcs
    • ‏2010-10-06T11:24:08Z
    Hi Herman ,

    The above topic sounds very intresting, But unfortunately iam not able to understand the given details.

    I looked into the attachment , but without understanding the problem i could not dig in further.

    Can you explain the problem and how you are trying to address it.
    Thanks,
    Arun
    First a correction, in my previous posting the lookup times are
    2.1 nano micro seconds / 0.98 nano micro seconds
    per lookup on a 9004 with
    binary search tree / binary decision stylesheet.

    > Can you explain the problem and how you are trying to address it.
    >
    Yes.
    In addition I was asked on how to create the binary decision stylesheet.
    Below you may find stylesheet genBin.xsl which generates it from a simple list.

    So the range lookup problem is:
    • given a set of ranges {R_1, R_2, ..., R_k} with R_i=(min_i, max_i, value_i)
    • and a value x
    • decide efficiently whether x lies inside one of the ranges (x in R_i iff min_i <= x <= max_i)
    • and return its value (value_i) if so.

    For the stylesheet genBin.xsl the required input looks like this file x7.xml:
    
    <es> <e min=
    "1926" max=
    "1990" value=
    "G" /> <e min=
    "1901" max=
    "1905" value=
    "F" /> <e min=
    "1521" max=
    "1825" value=
    "E" /> <e min=
    "1001" max=
    "1010" value=
    "A" /> <e min=
    "1021" max=
    "1115" value=
    "B" /> <e min=
    "1131" max=
    "1235" value=
    "C" /> <e min=
    "1246" max=
    "1248" value=
    "D" /> </es>
    


    Now stylesheet genBin.xsl (find at bottom) generates this file x7.xsl from x7.xml:
    
    <xsl:stylesheet xmlns:xsl=
    "http://www.w3.org/1999/XSL/Transform" version=
    "1.0 
    "><func:function name="func:rangeLookup
    " xmlns:func="http:
    //exslt.org/functio ns
    "><xsl:param name="x
    "/><xsl:choose><xsl:when test="$x &lt; 1246
    "><xsl:choos e><xsl:when test=
    "$x &lt; 1021"><xsl:choose><xsl:when test=
    "$x &lt; 1001"><fu nc:result select=
    "'-'"/></xsl:when><xsl:when test=
    "$x > 1010"><func:result se lect=
    "'-'"/></xsl:when><xsl:otherwise><func:result select=
    "'A'"/></xsl:otherw ise></xsl:choose></xsl:when><xsl:when test=
    "$x > 1115"><xsl:choose><xsl:when test=
    "$x &lt; 1131"><func:result select=
    "'-'"/></xsl:when><xsl:when test=
    "$x  > 1235
    "><func:result select="
    '-'
    "/></xsl:when><xsl:otherwise><func:result sel ect=
    "'C'"/></xsl:otherwise></xsl:choose></xsl:when><xsl:otherwise><func:resul t select=
    "'B'"/></xsl:otherwise></xsl:choose></xsl:when><xsl:when test=
    "$x >  1248
    "><xsl:choose><xsl:when test="$x &lt; 1901
    "><xsl:choose><xsl:when test="$ x &lt; 1521
    "><func:result select="
    '-'
    "/></xsl:when><xsl:when test="$x > 1825
    " ><func:result select=
    "'-'"/></xsl:when><xsl:otherwise><func:result select=
    "'E 
    '"/></xsl:otherwise></xsl:choose></xsl:when><xsl:when test="$x > 1905"><xsl:c hoose><xsl:when test=
    "$x &lt; 1926"><func:result select=
    "'-'"/></xsl:when><xs l:when test=
    "$x > 1990"><func:result select=
    "'-'"/></xsl:when><xsl:otherwise> <func:result select=
    "'G'"/></xsl:otherwise></xsl:choose></xsl:when><xsl:other wise><func:result select=
    "'F'"/></xsl:otherwise></xsl:choose></xsl:when><xsl: otherwise><func:result select=
    "'D'"/></xsl:otherwise></xsl:choose></func:func tion></xsl:stylesheet>
    


    This stylesheet is doing a binary search as described in the previous posting.

    Now x7.xsl can be included in another stylesheet to provide efficient range lookup:
    
    <xsl:stylesheet version=
    "1.0" xmlns:xsl=
    "http://www.w3.org/1999/XSL/Transform" xmlns:func=
    "http://exslt.org/functions" > <xsl:output omit-xml-declaration=
    "yes" />   <xsl:include href=
    "x7.xsl"/> <xsl:template match=
    "/"> <xsl:value-of select=
    "func:rangeLookup(.)"/> </xsl:template> </xsl:stylesheet>
    


    Sending input "<q>1248</q>" results in output of "D" quickly.

    Now this technique is most useful if many ranges exist.
    Here is the output of three times the same request for a sample with 25000 ranges.
    As you can see the very first request takes more than 3 seconds -- why?
    I flushed the stylesheet cache and the stylesheet needed to be compiled.
    Compile time for the 6MB stylesheet large.xsl is 2793 msec (from Stylesheet Status page).
    The 2nd and 3rd call need 8ms only, and these times include the network overhead.
    From the tables of the previous posting we know that a single lookup for 25000 ranges
    takes 0.98 microseconds.

    
    $ time curl --data-binary @t.xml dp3-l3.boeblingen.de.ibm.com:2066/x end-ABCDEFGHIJKLMNOPQRSTUVWZYZABCDEFGHIJK real    0m3.143s user    0m0.001s sys     0m0.004s $ time curl --data-binary @t.xml dp3-l3.boeblingen.de.ibm.com:2066/x end-ABCDEFGHIJKLMNOPQRSTUVWZYZABCDEFGHIJK real    0m0.008s user    0m0.001s sys     0m0.003s $ time curl --data-binary @t.xml dp3-l3.boeblingen.de.ibm.com:2066/x end-ABCDEFGHIJKLMNOPQRSTUVWZYZABCDEFGHIJK real    0m0.007s user    0m0.001s sys     0m0.002s $ xpath++ 
    "count(//xsl:otherwise)" large.xsl 25001 $
    


    And here is stylesheet genBin.xsl for generating binary decision stylesheets:
    
    <!-- genBin.xsl: - generate binary decision stylesheet - allows to provide 
    "not found" value   input: any (unsorted) XML file of the form <any1> <any2 min=
    "..." max=
    "..." value=
    "..."/> ... <any2 min=
    "..." max=
    "..." value=
    "..."/> </any1>   output: - stylesheet 
    
    for inclusion - provides 
    "func:rangeLookup($x)" --> <xsl:stylesheet version=
    "1.0" xmlns:xsl=
    "http://www.w3.org/1999/XSL/Transform" xmlns:func=
    "http://exslt.org/functions" xmlns:exslt=
    "http://exslt.org/common" exclude-result-prefixes=
    "exslt" > <xsl:output omit-xml-declaration=
    "yes" method=
    "xml" />   <func:function name=
    "func:gen"> <xsl:param name=
    "es" /> <xsl:param name=
    "nf" /> <xsl:variable name=
    "l" select=
    "count($es)"/>   <func:result> <xsl:choose> <xsl:when test=
    "$l &gt; 0"> <xsl:variable name=
    "mid" select=
    "floor(($l + 1) div 2)"/> <xsl:element name=
    "xsl:choose"> <xsl:element name=
    "xsl:when"> <xsl:attribute name=
    "test">$x < <xsl:value-of select=
    "$es[$mid]/@min" /></xsl:attribute> <xsl:copy-of select=
    "func:gen($es[position() &lt; $mid], $nf)" /> </xsl:element> <xsl:element name=
    "xsl:when"> <xsl:attribute name=
    "test">$x > <xsl:value-of select=
    "$es[$mid]/@max" /></xsl:attribute> <xsl:copy-of select=
    "func:gen($es[position() &gt; $mid], $nf)" /> </xsl:element> <xsl:element name=
    "xsl:otherwise"> <xsl:element name=
    "func:result"> <xsl:attribute name=
    "select">
    '<xsl:value-of select="$es[$mid]/@value" />'</xsl:attribute> </xsl:element> </xsl:element> </xsl:element> </xsl:when> <xsl:otherwise> <xsl:element name=
    "func:result"> <xsl:attribute name=
    "select">
    '<xsl:value-of select="$nf"/>'</xsl:attribute> </xsl:element> </xsl:otherwise> </xsl:choose> </func:result> </func:function>     <xsl:template match=
    "/*"> <xsl:variable name=
    "notFound" select=
    "'-'"/>   <xsl:variable name=
    "sorted"> <xsl:for-each select=
    "*"> <xsl:sort select=
    "@min" data-type=
    "number"/> <xsl:copy-of select=
    "."/> </xsl:for-each> </xsl:variable>   <xsl:element name=
    "xsl:stylesheet"> <xsl:attribute name=
    "version">1.0</xsl:attribute>   <func:function name=
    "func:rangeLookup"> <xsl:element name=
    "xsl:param"> <xsl:attribute name=
    "name">x</xsl:attribute> </xsl:element>   <xsl:copy-of select=
    "func:gen(exslt:node-set($sorted)/*,$notFound)" /> </func:function> </xsl:element>   </xsl:template> </xsl:stylesheet>
    


    Efficient range lookup!

    Hermann.
  • HermannSW
    HermannSW
    4901 Posts

    Re: Efficient XSLT range lookups

    ‏2010-10-07T08:52:33Z  
    • HermannSW
    • ‏2010-10-06T19:39:37Z
    First a correction, in my previous posting the lookup times are
    2.1 nano micro seconds / 0.98 nano micro seconds
    per lookup on a 9004 with
    binary search tree / binary decision stylesheet.

    > Can you explain the problem and how you are trying to address it.
    >
    Yes.
    In addition I was asked on how to create the binary decision stylesheet.
    Below you may find stylesheet genBin.xsl which generates it from a simple list.

    So the range lookup problem is:
    • given a set of ranges {R_1, R_2, ..., R_k} with R_i=(min_i, max_i, value_i)
    • and a value x
    • decide efficiently whether x lies inside one of the ranges (x in R_i iff min_i <= x <= max_i)
    • and return its value (value_i) if so.

    For the stylesheet genBin.xsl the required input looks like this file x7.xml:
    <pre class="jive-pre"> <es> <e min= "1926" max= "1990" value= "G" /> <e min= "1901" max= "1905" value= "F" /> <e min= "1521" max= "1825" value= "E" /> <e min= "1001" max= "1010" value= "A" /> <e min= "1021" max= "1115" value= "B" /> <e min= "1131" max= "1235" value= "C" /> <e min= "1246" max= "1248" value= "D" /> </es> </pre>

    Now stylesheet genBin.xsl (find at bottom) generates this file x7.xsl from x7.xml:
    <pre class="jive-pre"> <xsl:stylesheet xmlns:xsl= "http://www.w3.org/1999/XSL/Transform" version= "1.0 "><func:function name="func:rangeLookup " xmlns:func="http: //exslt.org/functio ns "><xsl:param name="x "/><xsl:choose><xsl:when test="$x &lt; 1246 "><xsl:choos e><xsl:when test= "$x &lt; 1021"><xsl:choose><xsl:when test= "$x &lt; 1001"><fu nc:result select= "'-'"/></xsl:when><xsl:when test= "$x > 1010"><func:result se lect= "'-'"/></xsl:when><xsl:otherwise><func:result select= "'A'"/></xsl:otherw ise></xsl:choose></xsl:when><xsl:when test= "$x > 1115"><xsl:choose><xsl:when test= "$x &lt; 1131"><func:result select= "'-'"/></xsl:when><xsl:when test= "$x > 1235 "><func:result select=" '-' "/></xsl:when><xsl:otherwise><func:result sel ect= "'C'"/></xsl:otherwise></xsl:choose></xsl:when><xsl:otherwise><func:resul t select= "'B'"/></xsl:otherwise></xsl:choose></xsl:when><xsl:when test= "$x > 1248 "><xsl:choose><xsl:when test="$x &lt; 1901 "><xsl:choose><xsl:when test="$ x &lt; 1521 "><func:result select=" '-' "/></xsl:when><xsl:when test="$x > 1825 " ><func:result select= "'-'"/></xsl:when><xsl:otherwise><func:result select= "'E '"/></xsl:otherwise></xsl:choose></xsl:when><xsl:when test="$x > 1905"><xsl:c hoose><xsl:when test= "$x &lt; 1926"><func:result select= "'-'"/></xsl:when><xs l:when test= "$x > 1990"><func:result select= "'-'"/></xsl:when><xsl:otherwise> <func:result select= "'G'"/></xsl:otherwise></xsl:choose></xsl:when><xsl:other wise><func:result select= "'F'"/></xsl:otherwise></xsl:choose></xsl:when><xsl: otherwise><func:result select= "'D'"/></xsl:otherwise></xsl:choose></func:func tion></xsl:stylesheet> </pre>

    This stylesheet is doing a binary search as described in the previous posting.

    Now x7.xsl can be included in another stylesheet to provide efficient range lookup:
    <pre class="jive-pre"> <xsl:stylesheet version= "1.0" xmlns:xsl= "http://www.w3.org/1999/XSL/Transform" xmlns:func= "http://exslt.org/functions" > <xsl:output omit-xml-declaration= "yes" /> <xsl:include href= "x7.xsl"/> <xsl:template match= "/"> <xsl:value-of select= "func:rangeLookup(.)"/> </xsl:template> </xsl:stylesheet> </pre>

    Sending input "<q>1248</q>" results in output of "D" quickly.

    Now this technique is most useful if many ranges exist.
    Here is the output of three times the same request for a sample with 25000 ranges.
    As you can see the very first request takes more than 3 seconds -- why?
    I flushed the stylesheet cache and the stylesheet needed to be compiled.
    Compile time for the 6MB stylesheet large.xsl is 2793 msec (from Stylesheet Status page).
    The 2nd and 3rd call need 8ms only, and these times include the network overhead.
    From the tables of the previous posting we know that a single lookup for 25000 ranges
    takes 0.98 microseconds.

    <pre class="jive-pre"> $ time curl --data-binary @t.xml dp3-l3.boeblingen.de.ibm.com:2066/x end-ABCDEFGHIJKLMNOPQRSTUVWZYZABCDEFGHIJK real 0m3.143s user 0m0.001s sys 0m0.004s $ time curl --data-binary @t.xml dp3-l3.boeblingen.de.ibm.com:2066/x end-ABCDEFGHIJKLMNOPQRSTUVWZYZABCDEFGHIJK real 0m0.008s user 0m0.001s sys 0m0.003s $ time curl --data-binary @t.xml dp3-l3.boeblingen.de.ibm.com:2066/x end-ABCDEFGHIJKLMNOPQRSTUVWZYZABCDEFGHIJK real 0m0.007s user 0m0.001s sys 0m0.002s $ xpath++ "count(//xsl:otherwise)" large.xsl 25001 $ </pre>

    And here is stylesheet genBin.xsl for generating binary decision stylesheets:
    <pre class="jive-pre"> <!-- genBin.xsl: - generate binary decision stylesheet - allows to provide "not found" value input: any (unsorted) XML file of the form <any1> <any2 min= "..." max= "..." value= "..."/> ... <any2 min= "..." max= "..." value= "..."/> </any1> output: - stylesheet for inclusion - provides "func:rangeLookup($x)" --> <xsl:stylesheet version= "1.0" xmlns:xsl= "http://www.w3.org/1999/XSL/Transform" xmlns:func= "http://exslt.org/functions" xmlns:exslt= "http://exslt.org/common" exclude-result-prefixes= "exslt" > <xsl:output omit-xml-declaration= "yes" method= "xml" /> <func:function name= "func:gen"> <xsl:param name= "es" /> <xsl:param name= "nf" /> <xsl:variable name= "l" select= "count($es)"/> <func:result> <xsl:choose> <xsl:when test= "$l &gt; 0"> <xsl:variable name= "mid" select= "floor(($l + 1) div 2)"/> <xsl:element name= "xsl:choose"> <xsl:element name= "xsl:when"> <xsl:attribute name= "test">$x < <xsl:value-of select= "$es[$mid]/@min" /></xsl:attribute> <xsl:copy-of select= "func:gen($es[position() &lt; $mid], $nf)" /> </xsl:element> <xsl:element name= "xsl:when"> <xsl:attribute name= "test">$x > <xsl:value-of select= "$es[$mid]/@max" /></xsl:attribute> <xsl:copy-of select= "func:gen($es[position() &gt; $mid], $nf)" /> </xsl:element> <xsl:element name= "xsl:otherwise"> <xsl:element name= "func:result"> <xsl:attribute name= "select"> '<xsl:value-of select="$es[$mid]/@value" />'</xsl:attribute> </xsl:element> </xsl:element> </xsl:element> </xsl:when> <xsl:otherwise> <xsl:element name= "func:result"> <xsl:attribute name= "select"> '<xsl:value-of select="$nf"/>'</xsl:attribute> </xsl:element> </xsl:otherwise> </xsl:choose> </func:result> </func:function> <xsl:template match= "/*"> <xsl:variable name= "notFound" select= "'-'"/> <xsl:variable name= "sorted"> <xsl:for-each select= "*"> <xsl:sort select= "@min" data-type= "number"/> <xsl:copy-of select= "."/> </xsl:for-each> </xsl:variable> <xsl:element name= "xsl:stylesheet"> <xsl:attribute name= "version">1.0</xsl:attribute> <func:function name= "func:rangeLookup"> <xsl:element name= "xsl:param"> <xsl:attribute name= "name">x</xsl:attribute> </xsl:element> <xsl:copy-of select= "func:gen(exslt:node-set($sorted)/*,$notFound)" /> </func:function> </xsl:element> </xsl:template> </xsl:stylesheet> </pre>

    Efficient range lookup!

    Hermann.
    Ups, the Forum software modified stylesheet genBin.xsl.
    In previous posting "& lt;" is displayed as "<" which cannot be compiled ('test="$x < ...' is non-XML).

    Find stylesheet genBin.xsl attached -- getting that works.

    Hermann.
  • arun_tcs
    arun_tcs
    144 Posts

    Re: Efficient XSLT range lookups

    ‏2010-10-07T10:09:31Z  
    • HermannSW
    • ‏2010-10-06T19:39:37Z
    First a correction, in my previous posting the lookup times are
    2.1 nano micro seconds / 0.98 nano micro seconds
    per lookup on a 9004 with
    binary search tree / binary decision stylesheet.

    > Can you explain the problem and how you are trying to address it.
    >
    Yes.
    In addition I was asked on how to create the binary decision stylesheet.
    Below you may find stylesheet genBin.xsl which generates it from a simple list.

    So the range lookup problem is:
    • given a set of ranges {R_1, R_2, ..., R_k} with R_i=(min_i, max_i, value_i)
    • and a value x
    • decide efficiently whether x lies inside one of the ranges (x in R_i iff min_i <= x <= max_i)
    • and return its value (value_i) if so.

    For the stylesheet genBin.xsl the required input looks like this file x7.xml:
    <pre class="jive-pre"> <es> <e min= "1926" max= "1990" value= "G" /> <e min= "1901" max= "1905" value= "F" /> <e min= "1521" max= "1825" value= "E" /> <e min= "1001" max= "1010" value= "A" /> <e min= "1021" max= "1115" value= "B" /> <e min= "1131" max= "1235" value= "C" /> <e min= "1246" max= "1248" value= "D" /> </es> </pre>

    Now stylesheet genBin.xsl (find at bottom) generates this file x7.xsl from x7.xml:
    <pre class="jive-pre"> <xsl:stylesheet xmlns:xsl= "http://www.w3.org/1999/XSL/Transform" version= "1.0 "><func:function name="func:rangeLookup " xmlns:func="http: //exslt.org/functio ns "><xsl:param name="x "/><xsl:choose><xsl:when test="$x &lt; 1246 "><xsl:choos e><xsl:when test= "$x &lt; 1021"><xsl:choose><xsl:when test= "$x &lt; 1001"><fu nc:result select= "'-'"/></xsl:when><xsl:when test= "$x > 1010"><func:result se lect= "'-'"/></xsl:when><xsl:otherwise><func:result select= "'A'"/></xsl:otherw ise></xsl:choose></xsl:when><xsl:when test= "$x > 1115"><xsl:choose><xsl:when test= "$x &lt; 1131"><func:result select= "'-'"/></xsl:when><xsl:when test= "$x > 1235 "><func:result select=" '-' "/></xsl:when><xsl:otherwise><func:result sel ect= "'C'"/></xsl:otherwise></xsl:choose></xsl:when><xsl:otherwise><func:resul t select= "'B'"/></xsl:otherwise></xsl:choose></xsl:when><xsl:when test= "$x > 1248 "><xsl:choose><xsl:when test="$x &lt; 1901 "><xsl:choose><xsl:when test="$ x &lt; 1521 "><func:result select=" '-' "/></xsl:when><xsl:when test="$x > 1825 " ><func:result select= "'-'"/></xsl:when><xsl:otherwise><func:result select= "'E '"/></xsl:otherwise></xsl:choose></xsl:when><xsl:when test="$x > 1905"><xsl:c hoose><xsl:when test= "$x &lt; 1926"><func:result select= "'-'"/></xsl:when><xs l:when test= "$x > 1990"><func:result select= "'-'"/></xsl:when><xsl:otherwise> <func:result select= "'G'"/></xsl:otherwise></xsl:choose></xsl:when><xsl:other wise><func:result select= "'F'"/></xsl:otherwise></xsl:choose></xsl:when><xsl: otherwise><func:result select= "'D'"/></xsl:otherwise></xsl:choose></func:func tion></xsl:stylesheet> </pre>

    This stylesheet is doing a binary search as described in the previous posting.

    Now x7.xsl can be included in another stylesheet to provide efficient range lookup:
    <pre class="jive-pre"> <xsl:stylesheet version= "1.0" xmlns:xsl= "http://www.w3.org/1999/XSL/Transform" xmlns:func= "http://exslt.org/functions" > <xsl:output omit-xml-declaration= "yes" /> <xsl:include href= "x7.xsl"/> <xsl:template match= "/"> <xsl:value-of select= "func:rangeLookup(.)"/> </xsl:template> </xsl:stylesheet> </pre>

    Sending input "<q>1248</q>" results in output of "D" quickly.

    Now this technique is most useful if many ranges exist.
    Here is the output of three times the same request for a sample with 25000 ranges.
    As you can see the very first request takes more than 3 seconds -- why?
    I flushed the stylesheet cache and the stylesheet needed to be compiled.
    Compile time for the 6MB stylesheet large.xsl is 2793 msec (from Stylesheet Status page).
    The 2nd and 3rd call need 8ms only, and these times include the network overhead.
    From the tables of the previous posting we know that a single lookup for 25000 ranges
    takes 0.98 microseconds.

    <pre class="jive-pre"> $ time curl --data-binary @t.xml dp3-l3.boeblingen.de.ibm.com:2066/x end-ABCDEFGHIJKLMNOPQRSTUVWZYZABCDEFGHIJK real 0m3.143s user 0m0.001s sys 0m0.004s $ time curl --data-binary @t.xml dp3-l3.boeblingen.de.ibm.com:2066/x end-ABCDEFGHIJKLMNOPQRSTUVWZYZABCDEFGHIJK real 0m0.008s user 0m0.001s sys 0m0.003s $ time curl --data-binary @t.xml dp3-l3.boeblingen.de.ibm.com:2066/x end-ABCDEFGHIJKLMNOPQRSTUVWZYZABCDEFGHIJK real 0m0.007s user 0m0.001s sys 0m0.002s $ xpath++ "count(//xsl:otherwise)" large.xsl 25001 $ </pre>

    And here is stylesheet genBin.xsl for generating binary decision stylesheets:
    <pre class="jive-pre"> <!-- genBin.xsl: - generate binary decision stylesheet - allows to provide "not found" value input: any (unsorted) XML file of the form <any1> <any2 min= "..." max= "..." value= "..."/> ... <any2 min= "..." max= "..." value= "..."/> </any1> output: - stylesheet for inclusion - provides "func:rangeLookup($x)" --> <xsl:stylesheet version= "1.0" xmlns:xsl= "http://www.w3.org/1999/XSL/Transform" xmlns:func= "http://exslt.org/functions" xmlns:exslt= "http://exslt.org/common" exclude-result-prefixes= "exslt" > <xsl:output omit-xml-declaration= "yes" method= "xml" /> <func:function name= "func:gen"> <xsl:param name= "es" /> <xsl:param name= "nf" /> <xsl:variable name= "l" select= "count($es)"/> <func:result> <xsl:choose> <xsl:when test= "$l &gt; 0"> <xsl:variable name= "mid" select= "floor(($l + 1) div 2)"/> <xsl:element name= "xsl:choose"> <xsl:element name= "xsl:when"> <xsl:attribute name= "test">$x < <xsl:value-of select= "$es[$mid]/@min" /></xsl:attribute> <xsl:copy-of select= "func:gen($es[position() &lt; $mid], $nf)" /> </xsl:element> <xsl:element name= "xsl:when"> <xsl:attribute name= "test">$x > <xsl:value-of select= "$es[$mid]/@max" /></xsl:attribute> <xsl:copy-of select= "func:gen($es[position() &gt; $mid], $nf)" /> </xsl:element> <xsl:element name= "xsl:otherwise"> <xsl:element name= "func:result"> <xsl:attribute name= "select"> '<xsl:value-of select="$es[$mid]/@value" />'</xsl:attribute> </xsl:element> </xsl:element> </xsl:element> </xsl:when> <xsl:otherwise> <xsl:element name= "func:result"> <xsl:attribute name= "select"> '<xsl:value-of select="$nf"/>'</xsl:attribute> </xsl:element> </xsl:otherwise> </xsl:choose> </func:result> </func:function> <xsl:template match= "/*"> <xsl:variable name= "notFound" select= "'-'"/> <xsl:variable name= "sorted"> <xsl:for-each select= "*"> <xsl:sort select= "@min" data-type= "number"/> <xsl:copy-of select= "."/> </xsl:for-each> </xsl:variable> <xsl:element name= "xsl:stylesheet"> <xsl:attribute name= "version">1.0</xsl:attribute> <func:function name= "func:rangeLookup"> <xsl:element name= "xsl:param"> <xsl:attribute name= "name">x</xsl:attribute> </xsl:element> <xsl:copy-of select= "func:gen(exslt:node-set($sorted)/*,$notFound)" /> </func:function> </xsl:element> </xsl:template> </xsl:stylesheet> </pre>

    Efficient range lookup!

    Hermann.
    Thanks for the explanation . Now I can interpret the solution in a better way.
  • HermannSW
    HermannSW
    4901 Posts

    Re: Efficient XSLT range lookups

    ‏2010-11-06T17:02:24Z  
    Hello,

    DataPower does not provide XPath 2.0 function fn:compare() for comparison of two strings.
    Find a DataPower implemenation in this thread:
    https://www.ibm.com/developerworks/forums/thread.jspa?threadID=351751

    Now that func:compare() is available for comparing two strings a new application
    of the range lookup problem has been opened:
    Instead of having number ranges and a number to lookup now having string-ranges and string lookup is possible!

    Attached stylesheet genBinStr.xsl converts an input file like s7.xml below
    to a binary search stylesheet s7.xsl. This can then be included in s.xsl
    for making use of the string range lookup.

    If you look at generated stylesheet s7.xsl below you will see that genBinStr.xsl
    is smart enough to make only one call to func:compare() in cases where @min and
    @max values are the same.

    Efficiency:
    I tested the lookup with an input file containing 15000 different strings of length 38 (@min is equal to @max here).
    The generated stylesheet was of size 4.65MB.
    First time compilation of the stylesheet on a 9004 DataPower box takes 58 seconds.
    But then the stylesheet is in stylesheet cache and does not need to be recompiled.
    I sent a request containing the 15000 strings for lookup and the 15000 mapped string values were returned.
    This took slightly less than a second.
    So the avarage lookup time for this scenario is 66 microseconds or 0.066ms.
    Now I sent the 6809 strings on deepest level in binary decision stylesheet only leading to 0.073ms worst case lookup time.

    Doing the same with 15000 strings where @min and @max were different showed
    0.096ms / 0.118ms lookup times on average / in worst-case.

    Sample input file:
    
    $ cat s7.xml <es> <e min=
    "'abca'" max=
    "'abcd'" value=
    "'http://url_1'" /> <e min=
    "'dcba'" max=
    "'dcbd'" value=
    "'http://url_2'" /> <e min=
    "'baca'" max=
    "'bacd'" value=
    "'http://url_3'" /> <e min=
    "'cdaa'" max=
    "'cdad'" value=
    "'http://url_4'" /> <e min=
    "'aaaa'" max=
    "'aaaa'" value=
    "'http://url_5'" /> <e min=
    "'bbbb'" max=
    "'bbbb'" value=
    "'http://url_6'" /> <e min=
    "'cccc'" max=
    "'cccc'" value=
    "'http://url_7'" /> </es>
    


    How to use the stylesheet generated by genBinStr.xsl:
    
    $ cat s.xsl <xsl:stylesheet version=
    "1.0" xmlns:xsl=
    "http://www.w3.org/1999/XSL/Transform" xmlns:func=
    "http://exslt.org/functions" xmlns:exsl=
    "http://exslt.org/dates-and-times" xmlns:dp=
    "http://www.datapower.com/extensions" exclude-result-prefixes=
    "dp exsl func" > <xsl:output omit-xml-declaration=
    "yes" />   <xsl:include href=
    "s7.xsl"/> <xsl:template match=
    "/ts/t"> <l><xsl:value-of select=
    "func:rangeLookup(.)"/></l> </xsl:template> </xsl:stylesheet> $
    


    Pretty-printed stylesheet s7.xsl generated from s7.xml:
    
    $ tidy -q -xml s7.xsl <xsl:stylesheet xmlns:xsl=
    "http://www.w3.org/1999/XSL/Transform" version=
    "1.0"> <func:function name=
    "func:compare" xmlns:func=
    "http://exslt.org/functions"> <xsl:param name=
    "str1" /> <xsl:param name=
    "str2" /> <xsl:variable name=
    "aux1"> <str> <xsl:value-of select=
    "$str1" /> </str> <str> <xsl:value-of select=
    "$str2" /> </str> </xsl:variable> <xsl:variable name=
    "aux2"> <xsl:for-each select=
    "$aux1/str"> <xsl:sort /> <xsl:copy-of select=
    "." /> </xsl:for-each> </xsl:variable> <func:result> <xsl:choose> <xsl:when test=
    "$str1 = $str2">0</xsl:when> <xsl:when test=
    "$str1 = $aux2/str[1]">-1</xsl:when> <xsl:otherwise>1</xsl:otherwise> </xsl:choose> </func:result> </func:function> <func:function name=
    "func:rangeLookup" xmlns:func=
    "http://exslt.org/functions"> <xsl:param name=
    "x" /> <xsl:variable name=
    "tst" select=
    "func:compare($x,'bbbb')" /> <xsl:choose> <xsl:when test=
    "$tst = -1"> <xsl:variable name=
    "tst" select=
    "func:compare($x,'abca')" /> <xsl:choose> <xsl:when test=
    "$tst = -1"> <xsl:variable name=
    "tst" select=
    "func:compare($x,'aaaa')" /> <xsl:choose> <xsl:when test=
    "$tst = -1"> <func:result select=
    "'-'" /> </xsl:when> <xsl:when test=
    "$tst = 1"> <func:result select=
    "'-'" /> </xsl:when> <xsl:when test=
    "$tst = 0"> <func:result select=
    "'http://url_5'" /> </xsl:when> </xsl:choose> </xsl:when> <xsl:when test=
    "func:compare($x,'abcd') = 1"> <xsl:variable name=
    "tst" select=
    "func:compare($x,'baca')" /> <xsl:choose> <xsl:when test=
    "$tst = -1"> <func:result select=
    "'-'" /> </xsl:when> <xsl:when test=
    "func:compare($x,'bacd') = 1"> <func:result select=
    "'-'" /> </xsl:when> <xsl:otherwise> <func:result select=
    "'http://url_3'" /> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:otherwise> <func:result select=
    "'http://url_1'" /> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:when test=
    "$tst = 1"> <xsl:variable name=
    "tst" select=
    "func:compare($x,'cdaa')" /> <xsl:choose> <xsl:when test=
    "$tst = -1"> <xsl:variable name=
    "tst" select=
    "func:compare($x,'cccc')" /> <xsl:choose> <xsl:when test=
    "$tst = -1"> <func:result select=
    "'-'" /> </xsl:when> <xsl:when test=
    "$tst = 1"> <func:result select=
    "'-'" /> </xsl:when> <xsl:when test=
    "$tst = 0"> <func:result select=
    "'http://url_7'" /> </xsl:when> </xsl:choose> </xsl:when> <xsl:when test=
    "func:compare($x,'cdad') = 1"> <xsl:variable name=
    "tst" select=
    "func:compare($x,'dcba')" /> <xsl:choose> <xsl:when test=
    "$tst = -1"> <func:result select=
    "'-'" /> </xsl:when> <xsl:when test=
    "func:compare($x,'dcbd') = 1"> <func:result select=
    "'-'" /> </xsl:when> <xsl:otherwise> <func:result select=
    "'http://url_2'" /> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:otherwise> <func:result select=
    "'http://url_4'" /> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:when test=
    "$tst = 0"> <func:result select=
    "'http://url_6'" /> </xsl:when> </xsl:choose> </func:function> </xsl:stylesheet>   $
    


    Hermann.
  • HermannSW
    HermannSW
    4901 Posts

    Re: Efficient XSLT range lookups

    ‏2010-11-06T17:27:51Z  
    • HermannSW
    • ‏2010-11-06T17:02:24Z
    Hello,

    DataPower does not provide XPath 2.0 function fn:compare() for comparison of two strings.
    Find a DataPower implemenation in this thread:
    https://www.ibm.com/developerworks/forums/thread.jspa?threadID=351751

    Now that func:compare() is available for comparing two strings a new application
    of the range lookup problem has been opened:
    Instead of having number ranges and a number to lookup now having string-ranges and string lookup is possible!

    Attached stylesheet genBinStr.xsl converts an input file like s7.xml below
    to a binary search stylesheet s7.xsl. This can then be included in s.xsl
    for making use of the string range lookup.

    If you look at generated stylesheet s7.xsl below you will see that genBinStr.xsl
    is smart enough to make only one call to func:compare() in cases where @min and
    @max values are the same.

    Efficiency:
    I tested the lookup with an input file containing 15000 different strings of length 38 (@min is equal to @max here).
    The generated stylesheet was of size 4.65MB.
    First time compilation of the stylesheet on a 9004 DataPower box takes 58 seconds.
    But then the stylesheet is in stylesheet cache and does not need to be recompiled.
    I sent a request containing the 15000 strings for lookup and the 15000 mapped string values were returned.
    This took slightly less than a second.
    So the avarage lookup time for this scenario is 66 microseconds or 0.066ms.
    Now I sent the 6809 strings on deepest level in binary decision stylesheet only leading to 0.073ms worst case lookup time.

    Doing the same with 15000 strings where @min and @max were different showed
    0.096ms / 0.118ms lookup times on average / in worst-case.

    Sample input file:
    <pre class="jive-pre"> $ cat s7.xml <es> <e min= "'abca'" max= "'abcd'" value= "'http://url_1'" /> <e min= "'dcba'" max= "'dcbd'" value= "'http://url_2'" /> <e min= "'baca'" max= "'bacd'" value= "'http://url_3'" /> <e min= "'cdaa'" max= "'cdad'" value= "'http://url_4'" /> <e min= "'aaaa'" max= "'aaaa'" value= "'http://url_5'" /> <e min= "'bbbb'" max= "'bbbb'" value= "'http://url_6'" /> <e min= "'cccc'" max= "'cccc'" value= "'http://url_7'" /> </es> </pre>

    How to use the stylesheet generated by genBinStr.xsl:
    <pre class="jive-pre"> $ cat s.xsl <xsl:stylesheet version= "1.0" xmlns:xsl= "http://www.w3.org/1999/XSL/Transform" xmlns:func= "http://exslt.org/functions" xmlns:exsl= "http://exslt.org/dates-and-times" xmlns:dp= "http://www.datapower.com/extensions" exclude-result-prefixes= "dp exsl func" > <xsl:output omit-xml-declaration= "yes" /> <xsl:include href= "s7.xsl"/> <xsl:template match= "/ts/t"> <l><xsl:value-of select= "func:rangeLookup(.)"/></l> </xsl:template> </xsl:stylesheet> $ </pre>

    Pretty-printed stylesheet s7.xsl generated from s7.xml:
    <pre class="jive-pre"> $ tidy -q -xml s7.xsl <xsl:stylesheet xmlns:xsl= "http://www.w3.org/1999/XSL/Transform" version= "1.0"> <func:function name= "func:compare" xmlns:func= "http://exslt.org/functions"> <xsl:param name= "str1" /> <xsl:param name= "str2" /> <xsl:variable name= "aux1"> <str> <xsl:value-of select= "$str1" /> </str> <str> <xsl:value-of select= "$str2" /> </str> </xsl:variable> <xsl:variable name= "aux2"> <xsl:for-each select= "$aux1/str"> <xsl:sort /> <xsl:copy-of select= "." /> </xsl:for-each> </xsl:variable> <func:result> <xsl:choose> <xsl:when test= "$str1 = $str2">0</xsl:when> <xsl:when test= "$str1 = $aux2/str[1]">-1</xsl:when> <xsl:otherwise>1</xsl:otherwise> </xsl:choose> </func:result> </func:function> <func:function name= "func:rangeLookup" xmlns:func= "http://exslt.org/functions"> <xsl:param name= "x" /> <xsl:variable name= "tst" select= "func:compare($x,'bbbb')" /> <xsl:choose> <xsl:when test= "$tst = -1"> <xsl:variable name= "tst" select= "func:compare($x,'abca')" /> <xsl:choose> <xsl:when test= "$tst = -1"> <xsl:variable name= "tst" select= "func:compare($x,'aaaa')" /> <xsl:choose> <xsl:when test= "$tst = -1"> <func:result select= "'-'" /> </xsl:when> <xsl:when test= "$tst = 1"> <func:result select= "'-'" /> </xsl:when> <xsl:when test= "$tst = 0"> <func:result select= "'http://url_5'" /> </xsl:when> </xsl:choose> </xsl:when> <xsl:when test= "func:compare($x,'abcd') = 1"> <xsl:variable name= "tst" select= "func:compare($x,'baca')" /> <xsl:choose> <xsl:when test= "$tst = -1"> <func:result select= "'-'" /> </xsl:when> <xsl:when test= "func:compare($x,'bacd') = 1"> <func:result select= "'-'" /> </xsl:when> <xsl:otherwise> <func:result select= "'http://url_3'" /> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:otherwise> <func:result select= "'http://url_1'" /> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:when test= "$tst = 1"> <xsl:variable name= "tst" select= "func:compare($x,'cdaa')" /> <xsl:choose> <xsl:when test= "$tst = -1"> <xsl:variable name= "tst" select= "func:compare($x,'cccc')" /> <xsl:choose> <xsl:when test= "$tst = -1"> <func:result select= "'-'" /> </xsl:when> <xsl:when test= "$tst = 1"> <func:result select= "'-'" /> </xsl:when> <xsl:when test= "$tst = 0"> <func:result select= "'http://url_7'" /> </xsl:when> </xsl:choose> </xsl:when> <xsl:when test= "func:compare($x,'cdad') = 1"> <xsl:variable name= "tst" select= "func:compare($x,'dcba')" /> <xsl:choose> <xsl:when test= "$tst = -1"> <func:result select= "'-'" /> </xsl:when> <xsl:when test= "func:compare($x,'dcbd') = 1"> <func:result select= "'-'" /> </xsl:when> <xsl:otherwise> <func:result select= "'http://url_2'" /> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:otherwise> <func:result select= "'http://url_4'" /> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:when test= "$tst = 0"> <func:result select= "'http://url_6'" /> </xsl:when> </xsl:choose> </func:function> </xsl:stylesheet> $ </pre>

    Hermann.
    The described string-range lookup can be used to provide associative array lookups:
    https://www.ibm.com/developerworks/forums/thread.jspa?messageID=14550194#14550194

    Hermann.