Since my "brain-base64-decoder" is not that good, and I want to be able to directly "read" the logged binary data, stylesheet log-binary.xsl logs the binary data hexadecimally encoded (a 50% increase in message length compared to base64 encoding).
See this little demonstration:
$ echo -n "test" | coproc2 log-binary.xsl - http://dp3-l3:2224 -s | od -tcx1
0000000   t   e   s   t
         74  65  73  74
0000004
$
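A quick check of the 50% figure: hex needs 2 characters per byte, while base64 needs 4 characters per 3 bytes, so hex output is 2 / (4/3) = 1.5 times as long. The following minimal Python sketch (not part of log-binary.xsl; the function name is illustrative) measures both encodings for the same payload:

```python
import base64


def encoded_lengths(data: bytes) -> tuple[int, int]:
    """Return (hex length, base64 length) in characters for the same bytes."""
    return len(data.hex()), len(base64.b64encode(data))


# 768 bytes: a multiple of 3, so base64 needs no "=" padding.
payload = bytes(range(256)) * 3
hex_len, b64_len = encoded_lengths(payload)
print(hex_len, b64_len, hex_len / b64_len)  # 1536 1024 1.5
```

For very short inputs base64 padding blurs the ratio (4 bytes of "test" give 8 characters either way), so the 50% figure is the asymptotic value for larger messages.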
This is the log entry from my box "dp3-l3.boeblingen.de.ibm.com":
As you may know, binary or Non-XML data processing requires the "DataGlue" license, which is listed by the "show license" CLI command.
The XS40 as well as the new XG45 (without the DIM feature) do not have the "DataGlue" license and have only very limited Non-XML processing capabilities. One Non-XML feature that is present is the "" action. It converts Non-XML CGI-encoded input (an HTTP POST of HTML form or URI parameters) into an equivalent XML message.
This is a demonstration of the complete sample application "binary-reverse": it simply reverses any (binary) input data and returns it. It works on boxes without a DataGlue license, like the XS40, as well as on other boxes. The 0x00 bytes at the beginning, in the middle and at the end of the sample input file are only there "to make it more difficult" ...
$ cat -v 0in0put0 ; echo
^@2b|~2b -^@that is the question^@
$
$ od -Ax -tcx1 0in0put0
000000  \0   2   b   |   ~   2   b       -  \0   t   h   a   t       i
        00  32  62  7c  7e  32  62  20  2d  00  74  68  61  74  20  69
000010   s       t   h   e       q   u   e   s   t   i   o   n  \0
        73  20  74  68  65  20  71  75  65  73  74  69  6f  6e  00
00001f
$
$ curl --data-binary @0in0put0 http://dp2-l3:2305 -s | cat -v ; echo
^@noitseuq eht si taht^@- b2~|b2^@
$
$ curl --data-binary @0in0put0 http://dp2-l3:2305 -s | od -Ax -tcx1
000000  \0   n   o   i   t   s   e   u   q       e   h   t       s   i
        00  6e  6f  69  74  73  65  75  71  20  65  68  74  20  73  69
000010       t   a   h   t  \0   -       b   2   ~   |   b   2  \0
        20  74  61  68  74  00  2d  20  62  32  7e  7c  62  32  00
00001f
$
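The byte-level behavior of the "binary-reverse" service can be modelled in a few lines of Python, which makes it easy to check the transcript above. This is only a model, not the stylesheet itself; the name reverse_bytes is hypothetical. It uses the exact 31-byte content of file 0in0put0 as shown by od:

```python
def reverse_bytes(data: bytes) -> bytes:
    """Reverse arbitrary binary data; 0x00 bytes are treated like any other byte."""
    return data[::-1]


# Content of 0in0put0 as shown by "od -Ax -tcx1" above (31 bytes).
request = b"\x002b|~2b -\x00that is the question\x00"
response = reverse_bytes(request)
print(response.hex(" "))
```

Comparing the printed hex bytes with the second od dump above shows the same 31 bytes in reverse order, including all three 0x00 bytes.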
This is sample output from the 2nd service in the export presented last Monday in Frankfurt. It shows how to convert binary input data to a base64 or hexbin encoded representation for "normal" stylesheet processing, and how to return arbitrary binary data from a base64 string generated by a stylesheet:
$ od -tcx1 te0t
0000000   t   e  \0   t
         74  65  00  74
0000004
$
$ curl --data-binary @te0t http://dp2-l3:1405/base64; echo
<base64>dGUAdA==</base64>
$
$ curl --data-binary @te0t http://dp2-l3:1405/hexbin; echo
<hexbin>74650074</hexbin>
$
$ curl --data-binary dGUAdA== http://dp2-l3:1405/output -s | od -tcx1
0000000   t   e  \0   t
         74  65  00  74
0000004
$
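The encodings in this transcript are easy to cross-check outside DataPower. In the following Python sketch, the three functions are hypothetical stand-ins for the /base64, /hexbin and /output services; they reproduce the outputs shown above but are not the services' actual implementation:

```python
import base64


def to_base64_xml(data: bytes) -> str:
    # Stand-in for the /base64 service: wrap the base64 text in <base64>.
    return "<base64>" + base64.b64encode(data).decode("ascii") + "</base64>"


def to_hexbin_xml(data: bytes) -> str:
    # Stand-in for the /hexbin service: wrap the hex text in <hexbin>.
    return "<hexbin>" + data.hex() + "</hexbin>"


def from_base64(text: str) -> bytes:
    # Stand-in for the /output service: base64 string in, raw bytes out.
    return base64.b64decode(text, validate=True)


te0t = b"te\x00t"  # the 4-byte file te0t (74 65 00 74)
print(to_base64_xml(te0t))            # <base64>dGUAdA==</base64>
print(to_hexbin_xml(te0t))            # <hexbin>74650074</hexbin>
print(from_base64("dGUAdA==") == te0t)  # True
```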
Here I tried to show the flow of the "binary-reverse" sample application together with ALL intermediate contexts; just follow the arrows to follow the flow.
Please click on the image to open the BIG screenshot showing all the details!
The most important setting for making the above service work is enabling "Non-XML Processing" for the rule(!) (click for BIG screenshot):
Of course, the service's request type must be set to "Non-XML" as well (click for BIG screenshot):
Last, but not least, here is stylesheet binary-reverse.xsl, which you can also find in the 1st screenshot above:
Have fun with binary data and attachment processing,
For any conforming XML parser these two documents should have the same "meaning".
None of these changes affect the "meaning" of an XML document:
* reordering attributes
* changing whitespace within an attribute list
* removing redundant namespace declarations
* converting CDATA sections to equivalent escaped text
* changing the XML declaration
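One way to make "same meaning" concrete is XML canonicalization. The sketch below uses Python's xml.etree.ElementTree.canonicalize() (C14N 2.0, Python 3.8+) on two made-up documents; it covers attribute order, whitespace in the attribute list, CDATA sections and the XML declaration from the list above. It only illustrates the concept and says nothing about how DataPower or any backend parses XML:

```python
from xml.etree.ElementTree import canonicalize

# Two byte-wise different documents with the same XML "meaning".
doc1 = '<?xml version="1.0"?><a x="1"   y="2"><b><![CDATA[1<2]]></b></a>'
doc2 = "<a y='2' x='1'><b>1&lt;2</b></a>"

c1 = canonicalize(xml_data=doc1)
c2 = canonicalize(xml_data=doc2)
print(c1)            # <a x="1" y="2"><b>1&lt;2</b></a>
print(c1 == c2)      # True
print(doc1 == doc2)  # False: a byte-comparing backend sees two different inputs
```

The last line is exactly the problem described next: a conforming parser sees no difference, but a backend that compares or scans raw bytes does.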
But real backend systems sometimes do not have compliant XML parsers, and there is a requirement to send the input "unchanged" to the backend after some checks (XML validation, XML threat protection).
The solution provided in the posting cited above does exactly that, but needs a "side call" via dp:url-open() to work.
Today I want to show a solution that does not require a side call and does everything needed in a single policy.
You can find the input files and the 4.0.1 config export here: xml-policy-files.2.zip <Ooops>wrong export was contained in first archive, corrected now</Ooops>
As in the previous solution, the request type has to be Non-XML. This requires a Transform Binary action to make the INPUT context available for any further processing.
The trick I want to show here is the use of stylesheet binary-passthru.xsl:
It does not do much; it just copies the (Non-XML) input to its output context.
And this is how we can make XML input passed to this (Non-XML) service available to a "normal" stylesheet for processing.
Just add a "Transform Binary" action as the 1st action after the match, with input context "INPUT" and output context "input". Whenever context "INPUT" is needed in further actions, use context "input" instead. And if there is no final Results action, add one.
Finally, set the input context of the final Results action to "INPUT" (yes, that is the "unmodified" input). That's all.
In the attached export I used a normal Transform action with stylesheet "store:///identity.xsl", input context "input" and output context "NULL". What does that do, and why output to "NULL"?
The output is not really needed, hence NULL as output context. store:///identity.xsl is good enough to do XML parsing, so any Non-XML input will fail here. In addition, the XML Threat Protection configured in the XML Manager is applied!
For demo purposes only, the attached config export uses XML Manager "71byte", which restricts "XML Bytes Scanned" to 71.
This lets sample document "umlaute.iso-8859-1.xml" (71 bytes) pass and rejects "umlaute.utf-8.xml" (72 bytes).
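The policy described above can be modelled in Python to make the data flow explicit. This is a hypothetical simulation, not DataPower code: run_policy(), the contexts dictionary and the ValueError are illustrative; ElementTree's parser stands in for the XML parsing done by store:///identity.xsl, and the length check stands in for the "XML Bytes Scanned" limit of XML Manager "71byte":

```python
import xml.etree.ElementTree as ET

XML_BYTES_SCANNED = 71  # limit of the demo XML Manager "71byte"


def run_policy(request: bytes) -> bytes:
    contexts = {"INPUT": request}

    # 1st action: Transform Binary with binary-passthru.xsl, INPUT -> "input".
    contexts["input"] = contexts["INPUT"]

    # Transform with store:///identity.xsl, "input" -> NULL.
    # The output is discarded; only the checks matter.
    if len(contexts["input"]) > XML_BYTES_SCANNED:
        raise ValueError("XML threat protection: too many bytes scanned")
    ET.fromstring(contexts["input"])  # raises ET.ParseError for Non-XML input

    # Results action with input context INPUT: the unmodified request bytes.
    return contexts["INPUT"]


# Attribute order, quoting and whitespace survive: the parsed copy is never sent.
doc = b"<a  y='2' x=\"1\"/>"
print(run_policy(doc) == doc)  # True
```

The design point is that the parsed tree is thrown away, so the backend receives byte-for-byte what the client sent, but only after it has passed parsing and threat protection.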
There are three services to play with:
* passthru loopback XML FW listening on port 2550
* XML FW listening on port 2551
* MPGW listening on port 2552
Both the XML FW and the MPGW have:
* backend 127.0.0.1:2550
* policy "xml" as described above (with store:///identity.xsl transform)
* XML manager "71byte"
This screenshot of my webmail client proves that sending the email with attachment really worked:
Last, but not least, I want to use this posting for a graphical experiment.
Since the code listings in this blog are not as "nice" as those in the developerWorks forum, I took a screenshot of the terminal with a syntax-highlighted listing of smtp.xsl. Since pictures display best in this blog at a width of exactly 400 pixels, I had to zoom out a little ... the resulting screenshot is a 400x1770 pixel(!) .gif image.
Good that I have a 24'' monitor in my office and a 1920x1200 graphics card for working in "Left" instead of "Normal" view ... :-)
demonstrated that arrows from Unicode "Arrows" range 2190-21FF as well as other Unicode characters can be part of function names.
The attached and described stylesheet provides useful conversion functions in addition to dp:radix-convert() (see func:bin⇉hex() for a nice XPath technique). A conversion of type "...⇉..." preserves leading '0's, while a conversion of type "...┈⇢..." does not preserve leading '0's (like dp:radix-convert()).
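Python identifiers cannot contain arrows, so the following sketch uses hypothetical ASCII names to illustrate the two conversion styles for the bin (01-string) to hex case, plus a few of the other pairs. It models the described semantics only; it is not a port of the stylesheet:

```python
import base64


def bin_to_hex_keep_zeros(bits: str) -> str:
    """Like "bin⇉hex": every 4 bits become one hex digit, leading '0's kept."""
    bits = bits.zfill(-(-len(bits) // 4) * 4)  # left-pad to a multiple of 4
    return "".join(format(int(bits[i:i + 4], 2), "x") for i in range(0, len(bits), 4))


def bin_to_hex_drop_zeros(bits: str) -> str:
    """Like "bin┈⇢hex" (and dp:radix-convert()): numeric value, no leading '0's."""
    return format(int(bits, 2), "x")


def hex_to_b64(hex_text: str) -> str:
    return base64.b64encode(bytes.fromhex(hex_text)).decode("ascii")


def b64_to_hex(b64_text: str) -> str:
    return base64.b64decode(b64_text).hex()


def bin_to_dec(bits: str) -> int:
    return int(bits, 2)


print(bin_to_hex_keep_zeros("000011111"))  # 01f
print(bin_to_hex_drop_zeros("000011111"))  # 1f
```

The "keep zeros" variant works on fixed-width groups of digits, which is what makes it suitable for byte-oriented data; the "drop zeros" variant goes through a number and therefore loses the original width.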
b64 (base64), hex, bin (01-strings) and dec (number) can be converted by these function calls:
The response returned from the backend contains a 12-byte Non-XML prefix before the XML. This stylesheet strips the Non-XML data, verifies that the binary prefix is correct, and does XML validation of the rest: 004SA.xsl
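The idea behind 004SA.xsl (strip a fixed-length binary prefix, check it, parse the rest as XML) can be sketched in Python. The expected prefix value below is a hypothetical placeholder, since the real check lives in the stylesheet, and ElementTree only checks well-formedness, not schema validity:

```python
import xml.etree.ElementTree as ET

PREFIX_LEN = 12  # the backend response starts with 12 Non-XML bytes


def split_backend_response(data: bytes, expected_prefix: bytes) -> ET.Element:
    """Verify the 12-byte prefix, then parse the remaining bytes as XML."""
    if len(expected_prefix) != PREFIX_LEN:
        raise ValueError("expected_prefix must be exactly 12 bytes")
    prefix, body = data[:PREFIX_LEN], data[PREFIX_LEN:]
    if prefix != expected_prefix:
        raise ValueError(f"unexpected binary prefix: {prefix.hex()}")
    return ET.fromstring(body)  # well-formedness only; schema validation is separate


# Hypothetical prefix and payload, for illustration only.
HYPOTHETICAL_PREFIX = b"HDR-00000001"
root = split_backend_response(HYPOTHETICAL_PREFIX + b"<reply>ok</reply>", HYPOTHETICAL_PREFIX)
print(root.tag, root.text)  # reply ok
```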
This is an export containing both stylesheets for the request/response policies; a simulation of the AS/400 backend with the netcat (nc) tool is described and used for a full round-trip verification.
The advantage of dp:decode(_,'base-64') is that it does a UTF-8 validation and so prevents negative effects.
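The same two-step idea (decode base64, then insist that the result is valid UTF-8) can be illustrated in Python. This only mirrors the behavior described for dp:decode(_,'base-64'); decode_base64_utf8() is a hypothetical name:

```python
import base64


def decode_base64_utf8(text: str) -> str:
    """Decode base64 and reject results that are not valid UTF-8."""
    raw = base64.b64decode(text, validate=True)  # binascii.Error on bad base64
    return raw.decode("utf-8")  # UnicodeDecodeError on invalid UTF-8


print(decode_base64_utf8("dGVzdA=="))  # test
try:
    decode_base64_utf8("/w==")  # decodes to the single byte 0xff
except UnicodeDecodeError as err:
    print("rejected:", err.reason)  # rejected: invalid start byte
```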
I cannot give a more complete example until I have access to my boxes in the IBM network, which I do not have (and do not want to have :-) ) from my cell phone or the Internet terminal in our youth hostel in List, the northernmost German town, on the North Sea island of Sylt ...
For quite some time there has been an Enhancement Request for a rawTCP Non-XML Front Side Handler.
1.5 years ago I developed a prototype of a rawTCP2HTTP converter in DataPower firmware, which converts rawTCP Non-XML data into chunked HTTP data that can then be processed by any (DataPower) HTTP service (see IBM publication "Bridging raw TCP binary data to HTTP", http://priorartdatabase.com/IPCOM/000197959).
So it seems that processing rawTCP binary data is not possible with DataPower appliances.
Even one week ago I would have said "it is impossible to process Non-XML rawTCP data on DataPower appliances". But that is not true, and this posting is just another one in the series "does not work out-of-the-box on DataPower, but can be made to work".
So, now that we know it can be done, the question is how.
It turns out to be really easy to create a rawTCP2HTTP service as an MPGW service on DataPower. Even though the XA35 and XS40 do not allow binary data processing, they at least allow Non-XML UTF-8 data processing, as shown by the first link of the above list.
We make use of the fact that the binary input of Non-XML transformations needs to be "consumed"; otherwise, by default, it is copied to the output (see the 3rd and 1st links above). In addition, we use a rawTCP XML Front Side Handler, since that is the only one that does not complain about protocol violations. Then we just make sure that INPUT is guaranteed never to be consumed; that's all.
So this is the rawTCP2HTTP MPGW service:
Front Side Protocol: Stateless Raw XML Handler
listening on port1
Request Type: Non-XML
Type: static backend
Backend URL: http://127.0.0.1:port2
Response Type: Pass-Thru
And this is the rawTCP2HTTP policy:
Results action from NULL to OUTPUT
In the example port1 is 2091 and port2 is 2092.
Sending data with an HTTP POST to DataPower can easily be done with the cURL or ApacheBench tools:
curl --data-binary @file ... http://dpbox:port1
ab -p file ... http://dpbox:port1
Raw TCP data can be sent with the Netcat tool (nc); this command sends file to the service listening on port at host:
nc host port < file
To have a basis for measurements I chose a very simple Non-XML example service (toHexb.xsl). It just returns the (Non-XML) input as a hexadecimal string; see posting DataPower "binary" node type (binaryNode) for details:
Sending a request is not that spectacular: for file te3t we get the same hexadecimal output that the octal dump (od) tool shows:
[stammw@bl3d2027@de ~]$ nc 192.168.10.1 2091 <te3t ; echo
74650374
[stammw@bl3d2027@de ~]$ od -Ax -tcx1 te3t
000000   t   e 003   t
        74  65  03  74
000004
[stammw@bl3d2027@de ~]$
But this really is over rawTCP! See the following screenshot of an "All Interfaces" packet capture (3.8.2 firmware). The filter for ports 2091 and 2092 just selects the "interesting" part of the packet capture. At the bottom left you see the command execution and output. At the bottom right you see Follow TCP Stream for the services on port 2091 (rawTCP) and 2092 (HTTP). The hexadecimally displayed content of packet 116 is just the HTTP POST from the 2091 service to the 2092 service, sent after the FIN ACK has been received. This is the first restriction of the rawTCP2HTTP service: it cannot deal with HTTP persistent connections. (http://stamm-wilbrandt.de/en/blog/capture.all.pcap.gif)
Time stamps for packets 110 and 128 show that the transaction took 2.003ms.
But that is not the value that matters. What matters is the average transaction overhead of the rawTCP service on 2091 over the HTTP service on 2092.
There are quite a few tools, like ApacheBench (ab) mentioned above, for measuring the performance of HTTP services. I was not able to find a similar tool for benchmarking rawTCP performance, so I wrote one myself based on TCPClient.c (http://en.wikipedia.org/wiki/Berkeley_sockets#Client). It can benchmark rawTCP as well as HTTP services (by sending the necessary HTTP POST data). For HTTP benchmarking with persistent and non-persistent connections, my numbers were similar to the ApacheBench results (-k option for "keepalive").
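The C tool itself is not shown here. As a rough Python sketch of the same idea (all names are hypothetical): send the payload, half-close the socket so the server sees a FIN marking the end of the request, read until the server closes, and report total wall-clock time divided by the number of requests, which is how ApacheBench computes its "across all concurrent requests" figure. Python adds its own overhead, so absolute numbers are not comparable to those of a C client:

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor


def rawtcp_roundtrip(host: str, port: int, payload: bytes, timeout: float = 5.0) -> bytes:
    """Send one raw TCP request and return everything the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)  # FIN: tells the server the request is complete
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def benchmark(host: str, port: int, payload: bytes,
              requests: int, concurrency: int) -> tuple[float, list[bytes]]:
    """Return (average ms per request across all concurrent requests, responses)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        responses = list(pool.map(lambda _: rawtcp_roundtrip(host, port, payload),
                                  range(requests)))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms / requests, responses


# Example against a hypothetical target:
# avg_ms, _ = benchmark("192.168.10.1", 2091, b"te\x03t", requests=20000, concurrency=20)
```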
These are the average results for sending 20000 requests with a concurrency of 20 to a 9004 DataPower appliance from a directly connected laptop:
0.1985ms per request for rawTCP service on 2091
0.0809ms per request for HTTP service on 2092 (without persistent connections)
0.0647ms per request for HTTP service on 2092 (with persistent connections)
So the overhead of rawTCP2HTTP service is 0.1985ms - 0.0647ms = 0.1338ms.
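The overhead arithmetic, spelled out in Python with the three measured averages from the list above (the variable names are just labels):

```python
# Average time per request in ms (20000 requests, concurrency 20, 9004 appliance).
raw_tcp_2091 = 0.1985
http_2092_no_keepalive = 0.0809
http_2092_keepalive = 0.0647

overhead_vs_keepalive = round(raw_tcp_2091 - http_2092_keepalive, 4)
overhead_vs_no_keepalive = round(raw_tcp_2091 - http_2092_no_keepalive, 4)
print(overhead_vs_keepalive)     # 0.1338
print(overhead_vs_no_keepalive)  # 0.1176
```

The first value is the overhead quoted in the text; the second is the same subtraction against the non-persistent HTTP measurement.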
For this very simple example service the overhead dominates the total transaction time, but 0.14ms of overhead for enabling rawTCP processing is a good price in many real-world Non-XML processing scenarios. Until now this is the only way to process rawTCP Non-XML data on DataPower appliances.
These are the reasons why the "real" Non-XML rawTCP Front Side Handler from the above-mentioned Enhancement Request is still needed:
enabling persistent connections
reducing/eliminating the 0.1338ms overhead of adding/removing HTTP headers
having to use an "XML rawTCP Front Side Handler" for dealing with "Non-XML data" sounds "interesting", at least :-)
Last, but not least, here you can see that Non-XML rawTCP processing on an XA35 or XS40 DataPower appliance is possible with the above technique combined with the technique from the 1st link posting above (the "prepend=" trick with the convert-http action):
$ od -Ax -tcx1 test
000000   t   e   s   t   1   2   3   t   e   s   t   1   2   3   t   e
        74  65  73  74  31  32  33  74  65  73  74  31  32  33  74  65
000010   s   t  \n
        73  74  0a
000013
$
$ nc cosmopolitan 2091 < test | tidy -q -xml
<?xml version="1.0" encoding="utf-8"?>
<request>
  <url>/basic.xml?transaction=1</url>
  <base-url>/basic.xml</base-url>
  <args src="url">
    <arg name="transaction">1</arg>
  </args>
  <args src="body">
    <arg name="prepend">test123test123test</arg>
  </args>
</request>
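Why the "prepend=" trick turns a raw payload into a form field can be illustrated with Python's standard form-urlencoded parser. This assumes that the convert-http action parses the body like ordinary application/x-www-form-urlencoded data, which the transcript suggests but which is not verified here; the sketch also shows which characters such parsing would alter:

```python
from urllib.parse import parse_qsl

raw_body = "test123test123test"  # the Non-XML payload from the transcript
print(parse_qsl("prepend=" + raw_body))
# [('prepend', 'test123test123test')]

# Form decoding is not binary-safe: '+' becomes a space, '%xx' is unescaped,
# and '&' starts a new field.
print(parse_qsl("prepend=1+1%3D2&x=y"))
# [('prepend', '1 1=2'), ('x', 'y')]
```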
$ soma/doSoma admin soma/version.xml cosmopolitan:5550 | xpath++ "//Version/text()" -
Enter host password for user 'admin':
I tested rawTCP2HTTP on all supported release branches. Finally, I tested it on a box with 188.8.131.52 firmware, without problems. 184.108.40.206 DataPower firmware was released in 1H2009, so since then the rawTCP2HTTP service has been waiting to be discovered ... ;-)