These may be useful ressources to you if you use a DataPower appliance or have interest in XSLT or XML. The xpath++ tool described on this slide and available in this posting may be used even without DataPower.
The service accepts Non-XML traffic, attaches the binary input to a dummy SOAP document and therefore creates the SWA file needed for further processing.
The binary input of Non-XML transformations needs to be "consumed", otherwise the default behavior is that it will be copied to the output. Attaching the input to the dummy SOAP file does not consume the input.
Therefore stylesheet "bin.xsl" needs to invoke a <dp:input-mapping>, whose output is just discarded. In the posting above a separate FFD file is provided and used. Because its output is not used at all bin.xsl's line
<dp:input-mapping href="bin.ffd" type="ffd"/>
can be simplified to refer to a FFD file that is offically shipped as part of the DataPower product:
Several years ago I had the need to create many little chess problem solution animations. With some shell scripts, netpbm tools and gifsicle I was able to create animated gifs like that on the left (of size 80x80 pixels for fitting on my Siemens S55 cell phone display at that time). The chess problems were so called shortest construction tasks, have a look at "Construction task" on this page http://en.wikipedia.org/wiki/Chess_problem and you will find my animation as well as the definition and a reference to Shortest construction tasks maphttp://stamm-wilbrandt.de/chess/en/sct_map.html (the solution shown in the animation is for problem "Black mates White by changing a pawn into a knight").
As seen above animated gifs are good for uninterrupted animations. But later I wanted some kind of "solution player" like a CD player.
Animation (play) shoud be interruptible and single stepping (back and forth) should be possible. The solution (on the left) was to have an iframe element with the following elements inside :
anim.gif (animated .gif file)
anim.html (referencing anim.gif)
uD.gif (for D=0,...,10, single position .gif file)
anim.D.html (for D=0,...,10, referencing uD.gif)
Each .html file defines an image map with the active areas making the left side controls (the well known symbols for "stop", "back to start", "single step back", ...,"play") active by defining the needed URLs to anim.html, anim.0.html, ..., anim.10.html. You may inspect the details by clicking on link anim.3.html and inspect the HTML page source.
To start generating chess animations with XML first you have to create a single position picture. This can be done by a stylesheet putting pictures for each field on the chessboard together. Having a single .gif for each chess piece (bishop, king, (k)night, pawn, queen, "empty"), piece color (black/white) and field color (black/white), with naming allowing easy referencing by XSLT is helpful. Find below the .gifs "b.gif", "bbb.gif", "bbw.gif", "bkb.gif", ..., "wrb.gif", "wrw.gif":
This XML file is an encoding of the mate position of above chess animations:
This is a real customer problem and the first of several postings on CDATA sections.
The input XML document containted CDATA sections with an embedded embedded XML document.
Both XML declarations (see here), the XML file one and the one inside CDATA section are ISO-8859-1 encoded.
For ease of demonstration the XML file inside the CDATA section contains one element containing a single text character (German 'ü'):
Now what happens after the parser has parsed the XML file and passed it to DataPower XSLT processor? The CDATA section is removed and the complete input document will be converted to DataPower internal data format. And this internal data format is UTF-8.
So now the problem is that a dp:parse() on the inner XML document finds a mismatch of the encoding in XML declaration (ISO-8859-1) and the current (UTF-8) encoding of the document. While this is working as designed, there are the five options to get parsing done correctly:
do not use CDATA section
use CDATA section without an XML declaration inside
use CDATA section with an XML declaration inside without encoding
use CDATA section with an XML declaration inside with UTF-8 encoding
skip the XML declaration when parsing the inner XML document
Skipping the XML declaration can be simply done by "substring-after(_, '?>')".
Below stylesheet demonstrates
the difference. lenA is 2 because the UTF-8 encoding "fc 3c" for 'ü' will be counted as two (ISO-8859-1) characters. lenB is 1 because by skipping the XML declaration the UTF-8 default encoding is used which perfectly matches the (internal) encoding.
During development sometimes the real backend service a DataPower service should connect to might not be available.
In this case we could set variable var://service/mpgw/skip-backside to 1 for a Multi-Protocol Gateway service. This makes the MPGW service a loopback service and we could provide dummy backend response data.
The difference to connecting to a real backend is just the latency -- no real backend has 0 latency.
While there is no specific DataPower command to add latency to a service or stylesheet (like Unix "sleep" command) we can use a simple workaround I learned from a colleague yesterday. Just try to open a network connection to a non-existent IP address with <dp:url-open ...> and specify the timeout you are interested in:
... <!-- do 10 second delay --> <dp:url-open target="http://220.127.116.11" response="ignore" timeout="10" /> ...
This has the big advantage that it does not affect CPU-usage (non-active wait).
Because most browsers support XSLT 1.0 only and I am interested in XSLT in browsers ("As of 2010, however, XSLT 1.0 is still widely used, as there are no products that support XSLT 2.0 running in the browser, ...", http://en.wikipedia.org/wiki/XSLT#Origins)
Because of the many EXSLT functions one could define "XSLT 1.0++ = XSLT 1.0 + EXSLT", but what do I mean with "XSLT 1.0+"?
Browser support for EXSLT is either not complete or minimal. The biggest deficience of XSLT 1.0 spec is that the result of a transformation is a so called result tree fragment (RTF). Only a restricted set of operations is allowed on RTF's. The EXSLT function node-set() resolves this deficiency, and my definition is "XSLT 1.0+ = XSLT 1.0 + exslt:node-set()".
While the node-set() function is available in all browsers, it is not available in the EXSLT namespace for Microsoft Internet Explorer.
This looked interesting, but I thought that "even more" obfuscation was possible for his sample. I came up with this 992 bytes long 1-line version of Oleg's XSLT (line breaks here for FF browser display only):
<!--evenMore("http://www.tkachenko.com/blog/archives/000732.html")--><!DOCTYPE p [<!ENTITY e 'uri'><!ENTITY x 'string'><!ENTITY P 'number'><!ENTITY q 'before'>< !ENTITY w 'name'><!ENTITY L 'concat'><!ENTITY p '&w;space'><!ENTITY W 'translate' ><!ENTITY M 'length'><!ENTITY Y 'contains'><!ENTITY y 'div'><!ENTITY u 'sub&x;'> <!ENTITY Q 'not'><!ENTITY _ 'after'><!ENTITY m '-uri'><!ENTITY E 'document'>]><x :stylesheet version="1.0" xmlns:x="http://www.w3.org/1999/XSL/Transform"><x:temp late match="/"><x:variable name="_" select="&E;('')"/><x:variable name="_-_" sel ect="&P;(&Q;(_-_=_-_=_-_=_-_))"/><html><x:value-of select="&L;(&u;(&p;-&e;($_/*/ *[$_-_]),$_-_,$_-_),&u;(&w;($_/*/*[$_-_]),&x;-&M;(*>*)*2,$_-_),&u;(@_>_-,&x;-&M; (******* div @_),$_-_),&W;(&w;(($_//@*)),&W;(&w;(($_//@*)),'l',''),''),&u; (&w;($_/*/@*),6,$_-_),' ',&W;(&u;(&p;&m;($_/*),12,&x;-&M;('&_;')),'.3',''), &u;(_/_/_=//_//_,3,$_-_),&u;($_/*/*/*/@*[&Y;(.,'(')],$_-_,$_-_),'!')"/></ht ml></x:template></x:stylesheet>
So why is Obfuscated XSLT more difficult to create than Obfuscated C? See Cheat 2.
After my previous posting CDATA 1/x I wanted to make some comments on CDATA sections in general and then for CDATA and DataPower.
Doing a simple search on developerWorks for "CDATA" showed an interesting hit: Dealing with data in XML It talks about character data, parsed character data, ... and so with that link I am done with the first part.
As already said in the previous posting XML data is stored internally as UTF-8 by DataPower XSLT processor.
Document cdata-alpha-iso.xml is used to show that there is no difference for DataPower XSLT processor if the greek alpha character is given in the input
in clear inside CDATA section
(the hexdump shows that alpha is encoded as e1 in ISO-8859-7 encoding):
$ cat cdata-alpha-iso.xml <?xml version="1.0" encoding="ISO-8859-7"?> <!DOCTYPE root [ <!ENTITY alpha "α"> ]> <tag>�<![CDATA[�]]>α</tag> $ $ od -Ax -tcx1 cdata-alpha-iso.xml | tail -5 000060 341 < ! [ C D A T A [ 341 ] ] > & a e1 3c 21 5b 43 44 41 54 41 5b e1 5d 5d 3e 26 61 000070 l p h a ; < / t a g > \n 6c 70 68 61 3b 3c 2f 74 61 67 3e 0a 00007c $
In previous CDATA posting I showed how to use string-length() to get an idea of DataPower internal data representation.
Here I want to show the "magnifying glass" technique, which may be used to inspect more internals than just CDATA.
The extension funtion dp:binary-encode() Base64-encodes arbitrary binary data for DataPower XSLT processor. Since base64 data is not that easy to inspect by humans extension function dp:radix-convert(_,64,16) may be used to generate hexadecimal output from base64 data.
The second line is the "magnifying glass" output that demonstrates that every time CEB1 is stored internally, which is the UTF-8 encoding of alpha,
One last comment on converting a base64 string to a hex string.
dp:radix-convert() is a "number" function, and therefore it will strip leading 0x00 bytes (no problem for above application).
In case you want to perserve the length of the encoded base64 string use this function:
<!-- convert base64 string $b64str to hex string (2 hex digits per byte) --> <func:function name="ub:base64-to-hex"> <xsl:param name="b64str"/>
I asked myself back in 1983 at school:
What is the sum of the reciprocals of Pascal's triangle?
Of course the sum is infinite because of the 1's on left and right border.
Next question: What is the sum without the 1's?
Of course the sum is infinite again because of the harmonic series on left and right border.
Final question: What is the sum of Pascal's triangle reciprocals without the 1's and without the two harmonic series?
And the answer was (and is) 3/2 !
So based on the Theorem further below the Corollary (sum being 3/2) can be prooven pretty easily.
And here is the main Theorem, a nice decomposition of each unit fraction into binomial coefficient reciprocals.
Starting summation of binomial coefficient reciprocals at row j+1 for column j gives 1/(j-1).
It is left as exercise to the reader ;-)
(or as challenge for one being significantly shorter than two pages).
So, why is this posting marked with XSLT tag?
Easier to proof it the theorem validity can just be "seen" by actually doing the summations.
Of course these summations are done by a styesheet.
Click on Pascal.xml to compute the sums in your browser (by stylesheet Pascal.xsl). (interesting what pure HTML allows for in typesetting mathematical formulas itself)
a "prepend" service making use of special binary data processing behavior mentioned in my previous Blog posting Sending zip archives to DataPower (this just converts Non-XML input data base64string to prepend=base64string)
a normal Non-XML service with a convert-http action to convert this "HTTP-Form" input to XML (convert-http action does not need DataGlue license)
This is just another example for "does not work out-of-the-box on DataPower but can be made working".
While embedding stylesheets is part of the spec it is not supported by all browsers. Especially Internet Exploerer 6/7/8 browsers do not support it.
I found a solution on how to enable Internet Explorer browsers for embedded stylesheet processing. The first solution had the drawback that now stylesheet embedding was possible for FF and IE, but the solution broke the ability of the other big5 browsers to process embedded stylesheets although they could process them natively.
To give you an idea of an embedded stylesheet find listing of supportALL.xml (generates the table on the right) below:
<!-- This stylesheet is a modification from this posting: http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/200010/msg01150.html --> <xsl:stylesheet id="style1" version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- only needed by "Opera Mini"; output method here IS "html" even without explicit declaration, see http://www.w3.org/TR/xslt#output --> <xsl:output method="html"/>
<xsl:key name="b" match="@os" use="."/>
<xsl:template match="/"> <html> <body>
<i>ApplyStylesheetEmbedding.xsl technique</i> works for: <table border="1" cellspacing="0">
<!-- Generate the row of table header cells --> <tr> <th>Browser \ OS</th> <!-- | Generate a header cell for each unique os name | | See Chapter 9 of "Building Oracle XML Applications" from O'Reilly
| for a detailed explanation of how this <xsl:key> based technique works +--> <xsl:for-each select="//@os[generate-id(.)=generate-id(key('b',.))]"> <!-- Sort by the os name (the value of the current @os attribute -->
could have been titled "Processing binary data in DataPower stylesheets 1/x". This posting is the second of several postings on processing of binary data in DataPower stylesheets.
Again this is on CDATA as in the postings "CDATA .../x". This time the (customer) problem was as follows:
accept XML input data
do XML threat protection
if no threat, pass input unmodified to backend (preserve any CDATA elements).
While preserving of CDATA might be done with cdata-section-elements of <xsl:output ... /> you have to know which elements you want to have CDATA output for, In case of DataPowwer as generic proxy any element might contain CDATA section or might not contain CDATA sections.
If the service would have XML request type CDATA sections are gone -- therefore we need a Non-XML service.
But XML threat protection does not work in Non-XML service -- therefore we need a second XML service.
Now lets discuss the solution I came up with.
Setup XML-FW with XML request type with its own XML Manager and any specific XML Parser settings. (I did set "XML Element Depth" to 1, service is on port 2060)
Setup XML-FW with Non-XML request type, and these two actions: 1) binary transform action from INPUT to NULL with stylesheet check.xsl 2) results action from INPUT to OUTPUT
This is the binary transform stylesheet check.xsl I used.
The <dp:input-mapping ... /> uses Flat File Descriptor (FFD) store:///pkcs7-convert-input.ffd shipped with DataPower and used by five PKCS7 stylesheets located in store:/// folder. As documented in the FFD itself (see below) it generates structure <object><message>***binary data***</message></object> with a binaryNode as child of <message> element. Find the details on binaryNode in previous posting.
Now the "binary data" (which in this case is XML data, but we want pass it unmodified) needs to be posted to second XML (threat protection) service by <dp:url-open>. Posting binary data with <dp:url-open> is done with data-type="base64". Therefore we just convert the binaryNode data to base64 by dp:binary-encode(/object/message) in <dp:url-open>. We are not interested in the response of the service called but only in its response code, therefore response="responsecode". The result of <dp:url-open> call is stored in variable $reponse. In case the HTTP response code is not equal to 200 we know that something is wrong, and for the service called we know that XML threat protection found a problem. Therefore we abort service execution by <dp:reject>.
So if service execution does not get aborted by <dp:reject> the results action from INPUT to OUTPUT just copies the unmodified INPUT to OUTPUT (preserving any CDATA section). Otherwise (in case of a detected XML threat) the service returns a generic error message to the client.
What we have seen is
how to process binary (Non-XML) input data (resulting in a binaryNode)
how to convert a binaryNode to a base64 encoded string
how to pass "binary" data to dp:url-open() by data-type="base64"
how to process response code of dp:url-open
By the arguments above there cannot be a solution without a side call.
And this is from stylesheet "store:///pkcs7-convert-input.ffd":
... <!-- This FFD converts the input into an XML tree like this:
This post is on little tool mimeswacurl for sending an arbitrary file as attachment of a SOAP with Attachments file to a service endpoint. This is really useful if developing a DataPower service dealing with SOAP with Attachments.
On DataPower the so called attachment-manifest gives details of the SOAP document as well as all atachments.
Stylesheet manifest.xsl which just outputs var://local/attachment-manifest for demo application below:
rawTCP2HTTP never was a perfect solution, but things have become really bad:
While the passthru/passthru trick with FSH timeout of 1sec works for clients not sending a FIN with rawTCP2HTTP running on <18.104.22.168 firmwares (there DataPower did not send a FIN to client when FSH timeout occured), networking behavior changed starting with 22.214.171.124 firmware. Now connection is terminated completely on frontside timeout, resulting in an Internal error response to client.
What remains is that sending rawTCP binary message from client terminated with FIN (like netcat command "nc" does) still works even with 7.0.0.x firmware.
There were Requests For Enhancement, but I cannot find any of them anymore in RFE tool :-(
In writing this up for a WSTE webcast below solution for rawTCP Non-XML processing became nearly trivial(!).
And now it only makes use of definitely supported features (there is nothing besides passthru and raw XML Handler).
See the details on this slide (packet capture screenshot on the following slide):
since quite some time there exists an Enhancement request for a rawTCP Non-XML Front Side Handler
1.5 years ago I developed a prototype of a rawTCP2HTTP converter in DataPower Firmware which allows to convert
rawTCP Non-XML data to HTTP chunked data which can then be processed by any (DataPower) HTTP service
(see IBM publication Bridging raw TCP binary data to HTTP, http://priorartdatabase.com/IPCOM/000197959)
So it seems that processing rawTCP binary data is not possible with DataPower appliances.
Even one week ago I would have said "it is impossible to process Non-XML rawTCP data on DataPower appliances".
But that is not true, and this posting is just another one in the series of "does not work out-of-the-box on DataPower but can be made working"
So now that we know that it can be done the question is how.
It turns out to be really easy to create a rawTCP2HTTP service as MPGW service on DataPower.
Even though XA35 and XS40 do not allow for binary data processing they at least allow for Non-XML UTF-8 data processing by the first link of above list.
We make use of the fact that binary input of Non-XML transformations needs to be "consumed", otherwise the default behavior is that it will be copied to the output (see 3rd and 1st link above).
In addition we use a rawTCP XML Front Side Handler as that is the only one not complaining about protocol violations.
And then we make sure that INPUT is guaranteed to not being consumed -- thats all.
So this is rawTCP2HTTP MPGW service:
Front Side Protocol: Stateless Raw XML Handler listening on port1
Request Type: Non-XML
Type: static backend
Backend URL: http://127.0.0.1:port2
Response Type: Pass-Thru
And this is rawTCP2HTTP policy:
Results action from NULL to OUTPUT
In the example port1 is 2091 and port2 is 2092.
Sending data with a HTTP POST to DataPower can be easily done with with cURL or ApacheBench tool:
curl --data-binary @file ... http://dpbox:port1
ab -p file ... http://dpbox:port1
Sending rawTCP data can be done by Netcat tool (nc), this statement sends file to service listening on port at host:
nc host port < file
To have a basis for measurements I chose a very simple Non-XML example service (toHexb.xsl).
It just returns the (Non-XML) input as hexadecimal string, see posting DataPower "binary" node type (binaryNode) for details:
Now sending a request is not that spectacular -- we get the same hexadecimal output for file te3t we can also get by octal dump (od) tool:
[stammw@bl3d2027@de ~]$ nc 192.168.10.1 2091 <te3t ; echo
[stammw@bl3d2027@de ~]$ od -Ax -tcx1 te3t
000000 t e 003 t
74 65 03 74
But this is really over rawTCP!
See the following screenshot of a "All Interfaces" (3.8.2 firmware) packet capture.
The Filter for ports 2091 and 2092 just selects the "interesting" part of packect capture.
At the bottom left you see the command execution and output.
On the bottom right you see Follow TCP Stream for services on port 2091 (rawTCP) and 2092 (HTTP).
Hexadecimally displayed content of packet 116 is just the HTTP post of 2091 service to 2092 service after FIN ACK has been received.
This is the first restriction of rawTCP2HTTP service -- it cannot deal with HTTP persistent connections.
Time stamps for packets 110 and 128 show that the transaction took 2.003ms.
But that is not the value which is important.
Important is the average transaction overhead of rawTCP service on 2091 over HTTP service on 2092.
There are quite some tools like ApacheBench (ab) mentioned above for getting performance information of HTTP services.
I was not able to find a similar tool for benchmarking rawTCP performance.
Therefore I wrote one myself based on TCPClient.c (http://en.wikipedia.org/wiki/Berkeley_sockets#Client).
It was able to benchamrk rawTCP as well as HTTP data (by sending the necessary HTTP Post data).
For persistent and non-persistent connection HTTP benchmarking my numbers were similar to ApacheBench results (-k option for "keepalive").
These are the average results for sending 20000 requests with a concurrency of 20 to a 9004 DataPower from a directly connected Laptop:
0.1985ms per request for rawTCP service on 2091
0.0809ms per request for HTTP service on 2092 (without persistent connections)
0.0647ms per request for HTTP service on 2092 (with persistent connections)
So the overhead of rawTCP2HTTP service is 0.1985ms - 0.0647ms = 0.1338ms.
While for this very simple example service the overhead time dominates the total transaction time, 0.14ms overhead for rawTCP processing enablement is a good price for many real world Non-XML processing scenarios -- until now this is the only way to process rawTCP Non-XML data on DataPower appliances.
These are reasons why the "real" Non-XML rawTCP Front Side Handler of above mentioned Enhancement Request is still needed:
enabling persistent connections
reducing/eliminating 0.1338ms overhead of adding/removing of HTTP headers
having to use "XML rawTCP Front Side Handler" for dealing with "Non-XML data" sounds "interesting" at least :-)
Last, but not least, see that NonXML rawTCP processing on a XA35 or XS40 DataPower appliance is possible by above technique and
the technique from 1st link posting above (by "prepend=" trick with convert-http action):
$ od -Ax -tcx1 test
000000 t e s t 1 2 3 t e s t 1 2 3 t e
74 65 73 74 31 32 33 74 65 73 74 31 32 33 74 65
000010 s t \n
73 74 0a
$ nc cosmopolitan 2091 < test | tidy -q -xml
<?xml version="1.0" encoding="utf-8"?>
$ soma/doSoma admin soma/version.xml cosmopolitan:5550 | xpath++ "//Version/text()" -
Enter host password for user 'admin':
I tested rawTCP2HTTP on all supported release branches.
Finally I just tested it on a box with 126.96.36.199 firmware without problems.
188.8.131.52 DataPower Firmware was released in 1H2009 -- since then rawTCP2HTTP service waited to be discovered ... ;-)