There is only one way to write files to the DataPower filesystem without going through an XML Management set-file request:
<dp:dump-nodes> lets you store a nodeset under a specified filename in the "temporary:" folder.
This approach has one limitation: you cannot store arbitrary (binary) files that way.
Adding this functionality (write binary files) to dump-nodes was what RFE 71699 was about.
The customer knew that storing base64-encoded binary data is possible, but wanted to be able to write raw binary data.
An RFE (Request For Enhancement) asks for new functionality in a future major release. RFE 71699 was rejected because writing binary data to the temporary: folder is already possible with today's firmware. Yes, you need the latest 7.2.0.x firmware (for "dp:gatewayscript()" and GatewayScript "readAsXML()"), but 7.2.0 is definitely available earlier than the next major release. The RFE developer update briefly describes how to do it with v7.2.0 firmware; this blog posting gives you the details.
First we have to write the (binary) data to a file in the "temporary:" folder with GatewayScript. This can be done via the "fs" (filesystem) API by writing, e.g., a Buffer. The tricky part is how XSLT can pass a binaryNode to GatewayScript. A binaryNode is handled in DataPower as "special" XML. Therefore you have to read the input via readAsXML(); the nodelist returned contains just a single element for a binaryNode passed from XSLT, and that can easily be ".toBuffer()"ed. So this is the complete GatewayScript fs.write.js (click for download):
Now we only need to pass a binary node from XSLT in the dp:gatewayscript() call. Stylesheet fs.write.xsl (click for download) shows that there is no magic at all: just pass the binaryNode as "input" to dp:gatewayscript() [you have to store "fs.write.js" from above in the "local:" folder].
Here you can see the coproc2 call that executes the XSLT and passes Non-XML data (0x03 is not a valid XML character):
$ coproc2 fs.write.xsl <(echo -en 'te\x3t') http://dp6-l3:2224
$
And here we see in DataPower CLI what is going on:
xi50(config)# dir temporary:
   File Name                    Last Modified                    Size
   ---------                    -------------                    ----
   log/                         Jan 27, 2016 1:24:05 PM          4096
   datapowerjs/                 Jan 11, 2016 11:56:03 AM         4096
   export/                      Jan 11, 2016 11:56:03 AM         4096
   dpmon/                       Jan 28, 2016 5:16:00 AM          4096
   ftp-response/                Jan 11, 2016 11:56:03 AM         4096
   image/                       Jan 7, 2016 8:37:04 AM           4096
221792.9 MB available to temporary:
xi50(config)# show file temporary:test.dat
% Unable to display 'temporary:test.dat' - file is not printable
xi50(config)# dir temporary:
   File Name                    Last Modified                    Size
   ---------                    -------------                    ----
   log/                         Jan 27, 2016 1:24:05 PM          4096
   datapowerjs/                 Jan 11, 2016 11:56:03 AM         4096
   test.dat                     Jan 28, 2016 5:25:01 AM          4
   export/                      Jan 11, 2016 11:56:03 AM         4096
   dpmon/                       Jan 28, 2016 5:16:00 AM          4096
   ftp-response/                Jan 11, 2016 11:56:03 AM         4096
   image/                       Jan 7, 2016 8:37:04 AM           4096
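For readers without a DataPower box at hand, the effect of this demonstration can be mimicked in plain Node.js. This is only an off-box sketch and not the fs.write.js referenced above: it uses Node's own "fs", "os" and "path" modules, and the helper name writeBinary as well as the use of the operating system's temp directory as a stand-in for "temporary:" are assumptions made for illustration.

```javascript
// Off-box Node.js sketch (NOT GatewayScript): write a Buffer containing a
// non-XML byte (0x03) to a file and check its size, mirroring the 4-byte
// test.dat seen in the CLI listing.
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: writes raw bytes and returns the resulting file size.
function writeBinary(filename, buffer) {
  fs.writeFileSync(filename, buffer);
  return fs.statSync(filename).size;
}

const data = Buffer.from([0x74, 0x65, 0x03, 0x74]);  // 't' 'e' 0x03 't'
const target = path.join(os.tmpdir(), 'test.dat');   // stand-in for temporary:
const size = writeBinary(target, data);
console.log(target, size);
```

The point is only that a Buffer is written byte for byte, so bytes that are not valid XML characters survive unchanged.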
Here is pbmtobraille online to try out (without the need to install
anything), especially "echo -e ... | pbmtext | pnmcrop |
pbmtobraille". Here is an input form to try out, with a short listing
of "echo -e" options: https://stamm-wilbrandt.de/echo-e.to.braille.html
I wanted to be able to do the same or something similar [of course, storing the output as a file and viewing it via gimp or a browser is possible].
Back in 2011 I gave 2 WSTE webcasts on "Non-XML Data Processing in WebSphere DataPower SOA Appliances Stylesheets". The 2nd webcast shows on slide 28 how I converted a bitmap image into textual output by making use of the Braille Patterns.
This is the conversion for snowman.pbm, .pbm is portable bitmap format from netpbm tools:
Typically only the top 2x3 dots of 2x4 get used, as you can see above I used all 2x4.
samples.txt[.pre.html] contains various sample output produced (shown below), which are part of pbmtobraille.c's comments too.
9x9.pbm is a really crazy parsing sample according to the .pbm spec, following this statement:
"Mr. Poskanzer cautions that programs that read this format should be as
lenient as possible, accepting anything that looks remotely like a pixmap."
This is the header section demonstrating basic use with pbmtext output, including negation of generated output, as well as the help line telling the tool's features:
This is top of tail comment section, showing graphviz output done by pbmtobraille:
And finally this is bottom section showing a bigger layout in vertical direction (layout=TB is default, Top to Bottom):
The whole thread discussed that handcrafted FFDs are not supported and referred back to this 2012 posting on the options. It also listed the only Enhancement Request that has been done since 2007 for FFD processing (FFD PMRs were fixed of course).
Further below I will show how easily binary data processing can be done with GatewayScript (available with 188.8.131.52 firmware). But first, let's summarize all DataPower Non-XML data processing options here in one place:
Contivo FFDs: you need Contivo Analyst product,
no handcrafted FFDs are supported (you cannot raise PMRs against handmade FFDs)
"Binary data processing without DataGlue license!" technique, with
... <hexbin>...<hexbin> --(XSLT)-- base64 ...
... <hexbin>...<hexbin> --(XQuery)-- base64 ... (with 184.108.40.206 or higher firmware, allows for XPath 2.0)
One comment on option 4: while this works without a DataGlue license (these days an XG45 without the DIM option), you "pay" the price in the form of added latency and memory consumption for the attachment processing needed by that technique.
The simple GatewayScript data structure for processing binary data is the Buffer object.
For reading binary input we use the readAsBuffer() method, although its documentation tries to move people toward the Buffers object:
When contexts are small, use the readAsBuffer() function. Use the readAsBuffers() function when a context is large. The readAsBuffer() function requires a contiguous memory allocation. The readAsBuffer() function is faster but is more demanding on the memory system. The readAsBuffers() function does not require contiguous memory to populate the Buffers object.
Use of Buffers might be valid for some Non-XML processing, but when the application needs access to the whole input I prefer Buffer.
The good news is that the first (workable) Non-XML sample program can be found in the readAsBuffer() documentation itself. It is a binary identity operation with error handling. Here you can see rAB.js:
Since binary identity is not that interesting, let's now look at the binary reverse operation from the "... without DataGlue license" posting. Adding 5 lines to rAB.js does the job. Here is reverse.js:
Now let's see what both do on the sample input from the "... without DataGlue license" posting:
$ od -tcx1 0in0put0
0000000  \0   2   b   |   ~   2   b       -  \0   t   h   a   t       i
         00  32  62  7c  7e  32  62  20  2d  00  74  68  61  74  20  69
0000020   s       t   h   e       q   u   e   s   t   i   o   n  \0
         73  20  74  68  65  20  71  75  65  73  74  69  6f  6e  00
0000037
$
$ coproc2 rAB.js 0in0put0 http://dp2-l3:2227 -s | od -tcx1
0000000  \0   2   b   |   ~   2   b       -  \0   t   h   a   t       i
         00  32  62  7c  7e  32  62  20  2d  00  74  68  61  74  20  69
0000020   s       t   h   e       q   u   e   s   t   i   o   n  \0
         73  20  74  68  65  20  71  75  65  73  74  69  6f  6e  00
0000037
$
$ coproc2 reverse.js 0in0put0 http://dp2-l3:2227 -s | od -tcx1
0000000  \0   n   o   i   t   s   e   u   q       e   h   t       s   i
         00  6e  6f  69  74  73  65  75  71  20  65  68  74  20  73  69
0000020       t   a   h   t  \0   -       b   2   ~   |   b   2  \0
         20  74  61  68  74  00  2d  20  62  32  7e  7c  62  32  00
0000037
$
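The byte reversal itself can be tried off-box as well. The following is a plain Node.js sketch, not the reverse.js discussed above: the helper name reverseBytes is hypothetical, and on DataPower the Buffer would come from readAsBuffer() and be written to the output context instead of the console.

```javascript
// Plain Node.js sketch of the byte reversal (not the author's reverse.js).
// A Buffer is a Uint8Array, so a copy of it can be reversed in place.
function reverseBytes(input) {
  const copy = Buffer.from(input);  // copy, so the caller's data stays untouched
  return copy.reverse();            // Uint8Array.prototype.reverse()
}

// Same 31 bytes as sample file 0in0put0, including the three 0x00 bytes.
const sample = Buffer.from('\x002b|~2b -\x00that is the question\x00', 'latin1');
const reversed = reverseBytes(sample);

// Show 0x00 bytes the way "cat -v" does.
console.log(reversed.toString('latin1').replace(/\x00/g, '^@'));
```

Reversing twice gives back the original input, which is what the diff of out and out.3 in the 10MB test checks.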
OK, that was small input, let's process 10MB.
$ head --bytes 10M /dev/urandom > out
$ du -sb out
10485760        out
$
$ time ( coproc2 reverse.js out http://dp2-l3:2227 -s > out.2 )

real    0m1.048s
user    0m0.012s
sys     0m0.080s
$ time ( coproc2 reverse.js out.2 http://dp2-l3:2227 -s > out.3 )

real    0m1.064s
user    0m0.008s
sys     0m0.075s
$ time ( coproc2 rAB.js out.2 http://dp2-l3:2227 -s > out.4 )

real    0m0.294s
user    0m0.014s
sys     0m0.074s
$
$ diff out out.3
$ diff out.2 out.4
$
$ sha1sum out out.2
8fe128844bc9a19aac275272e243a1c4ce6adc7b  out
66590beec5c0e226ba9efc8436d119589a67e9d8  out.2
$
The last question to be answered is the runtime of rAB.js and reverse.js on the 10MB input. That can be answered easily based on the ExtLatency logging target from the "coproc2gatewayscript again" blog posting:
So the reverse operation on 10MB data (read binary data, reverse, output result to context) took (908-137)=771msec.
The binary identity operation on 10MB data (read binary data, output input to context) took (147-134)=13msec.
There is a "front" XML-FW listening on port 6001, gzipping the Non-XML input and dispatching to the /gz XML-FW on port 6002 or the /gz-hash XML-FW on port 6003.
In order to gzip the Non-XML input, both rules have to set "gzip" as Output-Filter and Non-XML Processing to "on" (via the Objects screen).
The gz-hash service on port 6003 has a Non-XML Transform Action with output context "xwa" generating the hash. Next is a Results action that attaches the Non-XML input to the "xwa" context. Because the Non-XML input is the gzipped input from the "front" service, this does the right thing.
Last, a Results action returns the "xwa" context to OUTPUT (including the attachment, as MIME, see above).
Here is a combination of 4 screenshots of the whole gz-hash policy.
And this is stylesheet "hash.xsl" that hashes the Non-XML input data using the dp:hash-base64() DataPower extension function:
I got confirmation that the hash computed by DataPower matches the hash computed for the same file by the Java backend application.
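To cross-check such a hash outside of DataPower and Java, a few lines of Node.js are enough. This is a sketch under assumptions: the helper name hashBase64 is hypothetical, and SHA-1 is chosen only as an example because the algorithm used by hash.xsl is not stated here.

```javascript
// Node.js sketch: hash raw (Non-XML) bytes and return the digest base64-encoded,
// for comparison with a base64 hash value computed elsewhere.
const crypto = require('crypto');

// Hypothetical helper; "algorithm" is any name accepted by crypto.createHash().
function hashBase64(algorithm, buffer) {
  return crypto.createHash(algorithm).update(buffer).digest('base64');
}

// ASSUMPTION: SHA-1 is used here for illustration only.
console.log(hashBase64('sha1', Buffer.from('test')));
```

For the comparison to be meaningful, both sides have to hash the same bytes, in this scenario the gzipped data and not the original input.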
To be able to directly "read" the logged binary data (my "brain-base64-decoder" not being that good), stylesheet log-binary.xsl logs the binary data hexadecimally encoded (a 50% increase in message length compared to base64 encoding).
See this little demonstration:
$ echo -n "test" | coproc2 log-binary.xsl - http://dp3-l3:2224 -s | od -tcx1
0000000   t   e   s   t
         74  65  73  74
0000004
$
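Where the 50% figure comes from can be seen in a small Node.js sketch (an off-box illustration, not log-binary.xsl; the helper name toHexForLog and the 3000-byte payload are made up for the example): hexadecimal encoding needs 2 characters per byte, base64 needs 4 characters per 3 bytes.

```javascript
// Node.js sketch: hexadecimal versus base64 length for the same binary payload.
function toHexForLog(buffer) {
  return buffer.toString('hex');   // 2 characters per input byte
}

const payload = Buffer.alloc(3000, 0xab);               // arbitrary example payload
const hexLen = toHexForLog(payload).length;             // 2 * 3000
const b64Len = payload.toString('base64').length;       // 4 * (3000 / 3)

console.log(toHexForLog(Buffer.from('test')), hexLen, b64Len, hexLen / b64Len);
```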
This is the log entry from my box "dp3-l3.boeblingen.de.ibm.com":
As you may know, binary or Non-XML data processing requires the "DataGlue" license, which is listed by the "show license" CLI command.
The XS40 as well as the new XG45 (without the DIM feature) do not have the "DataGlue" license and allow only very limited Non-XML processing. One Non-XML feature that is present is the "convert-http" action. It converts Non-XML CGI-encoded input (an HTTP POST of an HTML
form or URI parameters) into an equivalent XML message.
This is a demonstration of the complete sample application "binary-reverse" -- it just reverses any (binary) input data and returns that. This works on boxes without a DataGlue license like the XS40, but on other boxes as well. The 0x00 bytes at the beginning, in the middle and at the end of the sample input file are only present "to make it more difficult" ...
$ cat -v 0in0put0 ; echo
^@2b|~2b -^@that is the question^@
$
$ od -Ax -tcx1 0in0put0
000000  \0   2   b   |   ~   2   b       -  \0   t   h   a   t       i
        00  32  62  7c  7e  32  62  20  2d  00  74  68  61  74  20  69
000010   s       t   h   e       q   u   e   s   t   i   o   n  \0
        73  20  74  68  65  20  71  75  65  73  74  69  6f  6e  00
00001f
$
$ curl --data-binary @0in0put0 http://dp2-l3:2305 -s | cat -v ; echo
^@noitseuq eht si taht^@- b2~|b2^@
$
$ curl --data-binary @0in0put0 http://dp2-l3:2305 -s | od -Ax -tcx1
000000  \0   n   o   i   t   s   e   u   q       e   h   t       s   i
        00  6e  6f  69  74  73  65  75  71  20  65  68  74  20  73  69
000010       t   a   h   t  \0   -       b   2   ~   |   b   2  \0
        20  74  61  68  74  00  2d  20  62  32  7e  7c  62  32  00
00001f
$
This is sample output from 2nd service in export presented last Monday in Frankfurt. Here you can see how to convert binary input data to base64 or hexbin encoded representation for "normal" stylesheet processing, as well as how to return arbitrary binary data from a stylesheet generated base64 string:
$ od -tcx1 te0t
0000000   t   e  \0   t
         74  65  00  74
0000004
$
$ curl --data-binary @te0t http://dp2-l3:1405/base64; echo
<base64>dGUAdA==</base64>
$
$ curl --data-binary @te0t http://dp2-l3:1405/hexbin; echo
<hexbin>74650074</hexbin>
$
$ curl --data-binary dGUAdA== http://dp2-l3:1405/output -s | od -tcx1
0000000   t   e  \0   t
         74  65  00  74
0000004
$
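The three conversions of this service (/base64, /hexbin, /output) can be reproduced off-box for the same four bytes. This is a Node.js sketch for checking values only; the variable names are made up and nothing here is DataPower stylesheet code.

```javascript
// Node.js sketch of the three conversions shown above for file te0t.
const input = Buffer.from([0x74, 0x65, 0x00, 0x74]);  // 't' 'e' 0x00 't'

const asBase64 = input.toString('base64');        // like the /base64 endpoint
const asHexbin = input.toString('hex');           // like the /hexbin endpoint
const output = Buffer.from(asBase64, 'base64');   // like the /output endpoint

console.log(asBase64, asHexbin, output.equals(input));
```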
Here I tried to show the flow of the "binary-reverse" sample application together with ALL intermediate contexts -- follow the arrows to follow the flow.
Please click on the image to open the BIG screenshot showing all the details!
The most important setting to make above service work is to enable "Non-XML Processing" for the rule(!) -- click for BIG screenshot:
Of course the service request type must be flagged "Non-XML" as well -- click for BIG screenshot:
Last, but not least, stylesheet binary-reverse.xsl which you can also find on 1st screenshot above:
Have fun with binary data and attachment processing,
Before I did the zip2html posting last December I had a solution without the cool execution of an attached stylesheet.
I extracted all files needed and, because storing them on the local filesystem was not possible without going through xml-mgmt,
made use of a self-implemented "file cache". Unlike normal backend response caching, in that scenario the
files are available on the "client side", and caching these was not that easy.
But because I finally came up with a purely attachment-based solution for the zip2html tool, there was no need for the "file cache" anymore.
I had discussions with a Techsales colleague at that time; he told me that he needed to cache client data and asked for my cache.
So I separated out my "client request cache", reworked it, and am posting it here today for anybody who needs to cache client data on DataPower.
By default the document count is 5000 (which you may want to reduce) and the document cache size is 0 (disabled; you need to increase it).
The maximal size of a document cache is 161MB, but keep in mind that any configured document cache memory is lost for transactions.
I set document caching to fixed with a time to live (TTL) of 59 seconds for URLs matching "*cache*"; all other URLs are not cached.
Before going into the details, that is what "you get".
First we cache (POST) the document with content "test123" under URL .../cache/0001.
Then we retrieve it (GET) two times successfully.
The big image Screenshot.png gets cached under URL .../cache/002 then.
And two more get requests get it back from the cache.
The "word-count" (wc) commands prove that the received and original sizes (273657 bytes) are identical:
This is screenshot of "Status->XML Processing->Document Cache" status provider after above commands, 5 documents cached:
And this is screenshot of "Status->XML Processing->Document Status" status provider after above commands.
Here we can see that the small "test123" document gets cached as one document.
The big Screenshot.png gets cached as 3 parts (only the last part is less than 127000 bytes in size).
And the concatenated complete compressed base64 string for retrieval under URL .../cache/002:
Again, before going into the details, find a 220.127.116.11 domain backup and its zip2html tool output attached here.
The three files stored in local:/// directory of that domain (shown further below) are available inlined in the "(all)" link of zip2html file!
The technical problem to overcome is the following, see the HTTP/1.1 spec RFC 2616:
only GET (9.3) and HEAD (9.4) method responses are cacheable
POST (9.5), PUT (9.6) and DELETE (9.7) method responses are not cacheable
Since HEAD is not really helpful in getting client data put into DataPower document cache, only GET remains.
So a caching service on DataPower needs to "send" the client data using GET to a helper service on the same box to do the caching.
Unfortunately the HTTP header size is limited, so even after compressing the client data first it may not fit into a single request.
The solution I have implemented is this:
sends some (binary) data of arbitrary size to DataPower for caching under a specific URL
requests a cached version for a given URL
DataPower "cache" service with same XML manager (same document cache as "cache-loopback" service) :
compresses received data by dp:deflate() resulting in a big base64 string
splits that base64 string into parts of 127000 bytes each (in order to get them through the HTTP header "eye of the needle")
sends each of these parts to 2nd DataPower service via GET calls and passes the 127000 bytes in HTTP request header
sends a final GET request for combining all those parts on chained DataPower service for later retrieval
when receiving a "normal" GET request for a file in cache, return that file (uncompressed of course)
chained DataPower "cache-loopback" service with enabled document cache (same XML manager as "cache" service):
extracts the 127000 bytes received by a "part" GET request and "returns" them, which stores them in the document cache
when receiving a "combine" GET request from the 1st service, reads all 127000-byte parts for that request from
the document cache and "returns" their concatenation, which stores the whole compressed document in the document cache
So what the file cache does is storing files from the client side into DataPower document cache circumventing the problem
that you cannot cache files by a POST request -- that's all -- see the demonstrations above, and try out yourself !
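The compress/split/combine steps listed above can be sketched in a few lines of Node.js. This is an illustration under assumptions, not the stylesheet code of the attached backup: the helper names toParts and fromParts are hypothetical, zlib.deflateRawSync() merely stands in for dp:deflate() (whether both produce the same byte stream is not claimed here), and the GET requests that carry each part in an HTTP request header are left out.

```javascript
// Node.js sketch: deflate client data, base64-encode it, split the string into
// parts of 127000 characters, and later recombine and inflate the parts.
const zlib = require('zlib');
const crypto = require('crypto');

const PART_SIZE = 127000;  // part size used by the "cache" service

// Hypothetical helper: what the "cache" service does before sending the parts.
function toParts(data) {
  const base64 = zlib.deflateRawSync(data).toString('base64');
  const parts = [];
  for (let i = 0; i < base64.length; i += PART_SIZE) {
    parts.push(base64.slice(i, i + PART_SIZE));
  }
  return parts;
}

// Hypothetical helper: the "combine" step followed by decompression on retrieval.
function fromParts(parts) {
  return zlib.inflateRawSync(Buffer.from(parts.join(''), 'base64'));
}

// Random (incompressible) bytes of the same size as Screenshot.png above.
const original = crypto.randomBytes(273657);
const parts = toParts(original);
console.log(parts.map(p => p.length));
```

With incompressible data of that size the sketch ends up with three parts, of which only the last one is shorter than 127000 characters, just like in the Document Status screenshot.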
I did import and test the above attached backup on 18.104.22.168, 22.214.171.124, 126.96.36.199 and 188.8.131.52 firmwares.
(for me this is the first time, that all rule "matches" are HTTP-method matches (for GET or POST))
backup-b4.zip is a backup of the 4 domains "coproc2", "demo", "empty" (domain created, no config added) and "zip2html".
"demo" domain contains files of different types which your browser's application/zip helper should be able to correctly classify.
zip2html.zip (428.1 KB) is the XML FW service export of zip2html tool.
Fixed links above and below that got broken by developerWorks DataPower forum migration last month.
And this is a new export that can be used on all appliances including XB6 and XM70. zip2html-mpgw.zip (492.9 KB) is the MPGW service export of zip2html tool.
Use this URL in your web browser: http://yourBox:2228/form
zip2html.xsl (22.0 KB) is the stylesheet discussed in detail under "Technical discussion" at the bottom.
With this tool even a full device backup is processed in seconds in case "files=none".
Be careful when processing big backups with option "files=all", as that might take a few minutes to complete.
Just having "default" domain as part of a backup takes time in case "files=all" -- all files in "store:///" folder belong to default domain and get added.
HTML form input
This is what the HTML form looks like.
It lets you select the backup (or export) from your filesystem.
Then you have to select whether you want the files to be included in the generated HTML page in "data" links. This allows inspecting the files by the "application/zip" helper application registered with your browser.
And you have to select whether the attached or the onboard clixform.xsl should be used.
The clixform.xsl converts the "export.xml"s in the .zip archive to a sequence of CLI commands (Command Line Interface) during the initial phase of a DataPower import. This output is what you will see in the yellow background sections.
Find more explanations on which clixform.xsl will be used by DataPower by clicking on the screenshot or the real HTML form opened in your browser.
You open the HTML form with the "/form" endpoint of your DataPower service, e.g.: http://dp3-l3:2229/form
HTTP GET to installed service
You can retrieve the HTML form with correct "post" link to the DataPower box by a simple HTTP GET.
You can then use the stored form.html instead of the HTML form from DataPower box itself.
$ curl -s http://dp3-l3:2229/form >form.html
HTTP POST to installed service
Selection of the clixform mode is done by the endpoint, the files selection is done in the query part of the URL:
As above the selection of the clixform mode is done by the endpoint and the files selection is done in the query part of the URL.
But this time stylesheet clixform.xsl is sent together with the .zip archive to the Non-XML endpoint of coproc2 service (find details here).
Here you can see the difference in generated output for a DataPower backup (left) and a DataPower export (right):
For backups the "domains" section contains the list of domains contained in the backup.
After each line the "go" link allows you to quickly get to the start of the domain's information.
This is what the domain output looks like.
Above the table you find the domain name as well as a link to the top of the document (eg. for selecting another domain by "go" link).
The table contains a configuration and a files section.
You can easily jump between both by the "files" and "configuration" links.
The "(no)" before "files" indicates that this output was generated with "files=none" selection.
For "files=all" it looks like this: "(all)"
If your browser allows you to open "data:..." links you can inspect the files by clicking on "all" link.
The "application/zip" helper application of your browser will open the base64 encoded .zip archive in the all link then.
With the description above you have everything you need to use zip2html tool.
If you are interested in some of the technical details of zip2html.xsl, you are in the right place.
Stylesheet zip2html first creates a dummy context named "swa".
It then attaches the received backup.zip under Content-ID "sys" (cid:sys).
Next step is the extraction of export.xml from top level directory of the archive.
Then the general information HTML table at the top gets created.
Now iteration over the domains found in export.xml is done.
Each domain is contained in top level directory of received archive.
Eg. domain "dom1" will be read from cid:sys and attached as cid:dom1.zip.
Attaching it allows reading export.xml from dom1.zip as well as accessing
the files information for the domain by querying the attachment manifest.
In addition the files can be extracted and copied over to a newly created archive in case "files=all".
Finally the domain attachment gets removed by dp:strip-attachments before the next domain is processed.
There is a directory dp-aux in received archive (attachment cid:sys).
Above you see the inclusion hierarchy of the files in that directory.
Applying dp:transform(_,_) to clixform.xsl requires a URL as the reference to it.
"attachment://" is the protocol, "swa" is the context, "Archive=zip" says we have a zip-archive (DataPower has support for tar, too).
Finally "Filename=" points to a file in the archive, allowing to access all files at all directory levels.
The output of dp:transform() from clixform.xsl applied to export.xml is used to generate the CLI configuration output of each domain
(with yellow background color).
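How the URL pieces named above fit together can be sketched as follows. Note the assumptions: the exact layout "attachment://<context>/cid:<content-id>?Archive=zip&Filename=<path>" is my reading of the description and should be checked against zip2html.xsl and the product documentation, and the helper name attachmentUrl is hypothetical. The sketch is plain JavaScript and only builds a string.

```javascript
// Sketch: assemble an attachment URL from the parts described above.
// ASSUMPTION: the parts are combined as
//   attachment://<context>/cid:<content-id>?Archive=zip&Filename=<path>
function attachmentUrl(context, contentId, filename) {
  return 'attachment://' + context + '/cid:' + contentId +
         '?Archive=zip&Filename=' + filename;
}

// Context "swa" and Content-ID "sys" are taken from the description above.
console.log(attachmentUrl('swa', 'sys', 'export.xml'));
```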
See the use of "localZip" if you need to create an empty archive yourself.
For each domain a new empty zip-archive gets attached to receive all files selected by "files=all" as "cid:aux".
Finally this zip-archive is read to generate the "data:application/zip;base64,..." links by just using dp:binary-decode().
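What such a "data:" link looks like can be shown with an empty zip archive, which consists of nothing but the 22-byte "end of central directory" record. The following is a Node.js sketch for illustration; the names EMPTY_ZIP and zipDataLink are hypothetical, and this is not the "localZip" code of zip2html.xsl.

```javascript
// Node.js sketch: build a "data:application/zip;base64,..." link from zip bytes.
// An empty zip archive is just the end-of-central-directory record:
// signature "PK\x05\x06" followed by 18 zero bytes.
const EMPTY_ZIP = Buffer.concat([
  Buffer.from([0x50, 0x4b, 0x05, 0x06]),
  Buffer.alloc(18)
]);

// Hypothetical helper: prefix the base64 string with the media type.
function zipDataLink(zipBytes) {
  return 'data:application/zip;base64,' + zipBytes.toString('base64');
}

const link = zipDataLink(EMPTY_ZIP);
console.log(link);
```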
At the bottom of stylesheet zip2html.xsl you find a section allowing to process "multipart/form-data" directly in the stylesheet.
swaform tool or func:filename() are needed in case you need to process binary file uploads, convert-http action cannot deal with those.
zip2html is realized by a Non-XML loopback XML firewall with "Process Messages Whose Body Is Empty" set to "on" on advanced tab.
The endpoint "/form" is for an HTTP GET request.
The endpoints "/attached", "/onboard" and "/mime" are for HTTP POST requests.
Now comes a tricky part, the dp:input-mapping needed to process the POST request input data requires data on its input context.
But there is no input data for the "/form" GET request using the same stylesheet.
On the right you see the simple solution I found for this mixed GET/POST processing problem for Non-XML input processing stylesheets:
Have two rules, one for "/form" endpoint, and another for everything else.
In "/form" rule have input context NULL for "Transform Binary" action.
In the other rule have input context INPUT -- that's it.
Stylesheet classes.xsl, which generated that diagram, is attached and discussed a little bit.
You might be interested in the "graph extraction" from XML data and the Depth-First-Search traversal of the extracted graph, which was used to generate the "block" hierarchy diagram. Also interesting is the simple and nice heuristic that allows
allocating exactly the vertical space needed for a node in the diagram -- in "XML speak" instead of "graph speak".
Accessing the Non-XML data is possible by a binary transform action. WTX provides QUOTEDTOTEXT() function, but what I want to show here is a simpler way to do it.
The solution is to simply replace the "multipart/signed" Content-Type with "multipart/related" and the correct "boundary" setting, because "multipart/related" messages can be processed by DataPower. This Content-Type rewrite technique was also used in the swaform tool, allowing it to process HTTP form data with binary file uploads (multipart/form-data):
a binary transform action with stylesheet Signed2Related.xsl, with input INPUT and output NULL
a results action with input INPUT
a backend URL of http://127.0.0.1:port2
And you will have a
loopback XML service listening on 127.0.0.1:port2
attachment processing set to "Skip"
store:///identity.xsl transform action
XML Manager with Minimum Output Escaping rule in Compile Options policy (for showing German Umlaute unescaped below)
That's all. I tested with MPGW 1st service and 2nd service as MPGW, XML FW or XSL Accelerator successfully (on 3.7.3, 3.8.0 and 4.0.1 firmware).
Here is the output generated by the service:
$ curl --data-binary @Qp.mime http://dp2-l3:8121; echo
<?xml version="1.0" encoding="UTF-8"?>
<sample>If you believe that truth=beauty, then surely mathematics is the most beautiful branch of philosophy.
<umlaute>äöüÄÖÜß</umlaute></sample>
$
<?xml version=3D"1.0" encoding=3D"ISO-8859-1"?>
<sample>If you believe that truth=3Dbeauty, then surely=20=
mathematics is the most beautiful branch of philosophy.=0A=
<umlaute>=E4=F6=FC=C4=D6=DC=DF</umlaute>=
</sample>
------=_Part_2_12345
Content-Type: application/pkcs7-signature; name=smime.p7s; smime-type=signed-data
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="smime.p7s"
Content-Description: S/MIME Cryptographic Signature

This should be base64 sign part data.
Missing intentionally for qouted-printable demo.
------=_Part_2_12345--
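What the attachment processing has to undo for this body part is the quoted-printable encoding. The following Node.js sketch decodes it by hand to make the rules visible; it is not what DataPower executes, the helper name decodeQuotedPrintable is hypothetical, and the line breaks in the sample string are an assumption about the original Qp.mime file.

```javascript
// Node.js sketch of quoted-printable decoding:
//  - "=" at the end of a line is a soft line break and is removed,
//  - "=HH" stands for the byte with hexadecimal value HH.
function decodeQuotedPrintable(text) {
  const unfolded = text.replace(/=\r?\n/g, '');
  const bytes = [];
  for (let i = 0; i < unfolded.length; i++) {
    const hex = unfolded.slice(i + 1, i + 3);
    if (unfolded[i] === '=' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2;                                   // skip the two hex digits
    } else {
      bytes.push(unfolded.charCodeAt(i) & 0xff);
    }
  }
  return Buffer.from(bytes);
}

const body = '<sample>If you believe that truth=3Dbeauty, then surely=20=\n' +
             'mathematics is the most beautiful branch of philosophy.=0A=\n' +
             '<umlaute>=E4=F6=FC=C4=D6=DC=DF</umlaute>=\n' +
             '</sample>';

// The body part is declared as ISO-8859-1, so decode the bytes as latin1.
const decoded = decodeQuotedPrintable(body).toString('latin1');
console.log(decoded);
```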
<!-- Allow accessing the decoded (quoted-printable) body-part in chained
     service with "Skip" attachment processing by "multipart/related" -->
<dp:set-http-request-header name="'Content-Type'"
  value="concat('multipart/related; type=&quot;text/xml&quot;; ',
                'boundary=&quot;',$boundary,'&quot;')" />
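The header rewrite itself is simple string processing. Here it is as a plain JavaScript sketch under assumptions: the function name signedToRelated and the sample "multipart/signed" header are made up for illustration (the original request header is not shown above), and only the target format follows the concat() of the stylesheet snippet.

```javascript
// Sketch: rewrite a "multipart/signed" Content-Type into "multipart/related",
// keeping the boundary, so that attachment processing accepts the message.
function signedToRelated(contentType) {
  const match = /boundary="?([^";]+)"?/i.exec(contentType);
  if (!/^multipart\/signed\b/i.test(contentType) || !match) {
    return contentType;                         // leave other types untouched
  }
  return 'multipart/related; type="text/xml"; boundary="' + match[1] + '"';
}

// ASSUMPTION: example header; the boundary matches the Qp.mime sample above.
const rewritten = signedToRelated(
  'multipart/signed; protocol="application/pkcs7-signature"; ' +
  'micalg=sha1; boundary="----=_Part_2_12345"');
console.log(rewritten);
```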
The advantage of dp:decode(_,'base-64') is that it does a UTF-8 validation and prevents negative effects.
I cannot give a more complete example until I have access to my boxes in the IBM network, which I do not have (and do not want to have :-) ) via my cell phone or the Internet terminal in our youth hostel in Germany's northernmost town, List, on the North Sea island of Sylt ...
In addition the posting shows how to make use of netcat tool as simplest form of a rawTCP server. This
is useful in debugging issues between DataPower and other TCP servers
-- get it working with netcat and you know that DataPower side is fine.
rawTCP2HTTP never was a perfect solution, but things have become really bad:
While the passthru/passthru trick with an FSH timeout of 1sec works for clients not sending a FIN with rawTCP2HTTP running on <184.108.40.206 firmwares (there DataPower did not send a FIN to the client when the FSH timeout occurred), networking behavior changed starting with 220.127.116.11 firmware. Now the connection is terminated completely on a frontside timeout, resulting in an Internal error response to the client.
What remains is that sending a rawTCP binary message from a client, terminated with FIN (like the netcat command "nc" does), still works even with 7.0.0.x firmware.
There were Requests For Enhancement, but I cannot find any of them anymore in RFE tool :-(
While writing this up for a WSTE webcast, the solution below for rawTCP Non-XML processing became nearly trivial(!).
And now it only makes use of definitely supported features (there is nothing besides passthru and raw XML Handler).
See the details on this slide (packet capture screenshot on the following slide):
since quite some time there exists an Enhancement request for a rawTCP Non-XML Front Side Handler
1.5 years ago I developed a prototype of a rawTCP2HTTP converter in DataPower Firmware which allows to convert
rawTCP Non-XML data to HTTP chunked data which can then be processed by any (DataPower) HTTP service
(see IBM publication Bridging raw TCP binary data to HTTP, http://priorartdatabase.com/IPCOM/000197959)
So it seems that processing rawTCP binary data is not possible with DataPower appliances.
Even one week ago I would have said "it is impossible to process Non-XML rawTCP data on DataPower appliances".
But that is not true, and this posting is just another one in the series of "does not work out-of-the-box on DataPower but can be made working"
So now that we know that it can be done the question is how.
It turns out to be really easy to create a rawTCP2HTTP service as MPGW service on DataPower.
Even though XA35 and XS40 do not allow binary data processing, they at least allow Non-XML UTF-8 data processing via the first link of the above list.
We make use of the fact that the binary input of Non-XML transformations needs to be "consumed"; otherwise the default behavior is that it is copied to the output (see 3rd and 1st link above).
In addition we use a rawTCP XML Front Side Handler, as that is the only one not complaining about protocol violations.
And then we make sure that INPUT is guaranteed not to be consumed -- that's all.
So this is rawTCP2HTTP MPGW service:
Front Side Protocol: Stateless Raw XML Handler listening on port1
Request Type: Non-XML
Type: static backend
Backend URL: http://127.0.0.1:port2
Response Type: Pass-Thru
And this is rawTCP2HTTP policy:
Results action from NULL to OUTPUT
In the example port1 is 2091 and port2 is 2092.
Sending data with an HTTP POST to DataPower can easily be done with the cURL or ApacheBench tool:
curl --data-binary @file ... http://dpbox:port1
ab -p file ... http://dpbox:port1
Sending rawTCP data can be done with the Netcat tool (nc); this statement sends file to the service listening on port at host:
nc host port < file
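If netcat is not available, the same "send the data, then FIN, then read the response" behavior can be sketched with Node's "net" module. This is a minimal sketch, not a tool from this posting; the function name sendRawTcp and the host and port in the usage comment are hypothetical.

```javascript
// Node.js sketch of what "nc host port < file" does: connect, send the data,
// half-close the connection (FIN), and collect whatever the service returns.
const net = require('net');

function sendRawTcp(host, port, data) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    // socket.end(data) writes the data and then sends FIN to the server.
    const socket = net.connect(port, host, () => socket.end(data));
    socket.on('data', chunk => chunks.push(chunk));
    socket.on('end', () => resolve(Buffer.concat(chunks)));
    socket.on('error', reject);
  });
}

// Usage (hypothetical host and port):
// sendRawTcp('dpbox', 2091, require('fs').readFileSync('file'))
//   .then(response => console.log(response.toString()));
```

The FIN matters: the service only forwards the request to the HTTP service once the client has signalled the end of its data.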
To have a basis for measurements I chose a very simple Non-XML example service (toHexb.xsl).
It just returns the (Non-XML) input as hexadecimal string, see posting DataPower "binary" node type (binaryNode) for details:
Now sending a request is not that spectacular -- we get the same hexadecimal output for file te3t we can also get by octal dump (od) tool:
[stammw@bl3d2027@de ~]$ nc 192.168.10.1 2091 <te3t ; echo
[stammw@bl3d2027@de ~]$ od -Ax -tcx1 te3t
000000 t e 003 t
74 65 03 74
But this is really over rawTCP!
See the following screenshot of a "All Interfaces" (3.8.2 firmware) packet capture.
The Filter for ports 2091 and 2092 just selects the "interesting" part of the packet capture.
At the bottom left you see the command execution and output.
On the bottom right you see Follow TCP Stream for services on port 2091 (rawTCP) and 2092 (HTTP).
Hexadecimally displayed content of packet 116 is just the HTTP post of 2091 service to 2092 service after FIN ACK has been received.
This is the first restriction of rawTCP2HTTP service -- it cannot deal with HTTP persistent connections.
Time stamps for packets 110 and 128 show that the transaction took 2.003ms.
But that is not the value which is important.
Important is the average transaction overhead of rawTCP service on 2091 over HTTP service on 2092.
There are quite some tools like ApacheBench (ab) mentioned above for getting performance information of HTTP services.
I was not able to find a similar tool for benchmarking rawTCP performance.
Therefore I wrote one myself based on TCPClient.c (http://en.wikipedia.org/wiki/Berkeley_sockets#Client).
It was able to benchmark rawTCP as well as HTTP data (by sending the necessary HTTP Post data).
For persistent and non-persistent connection HTTP benchmarking my numbers were similar to ApacheBench results (-k option for "keepalive").
These are the average results for sending 20000 requests with a concurrency of 20 to a 9004 DataPower from a directly connected Laptop:
0.1985ms per request for rawTCP service on 2091
0.0809ms per request for HTTP service on 2092 (without persistent connections)
0.0647ms per request for HTTP service on 2092 (with persistent connections)
So the overhead of rawTCP2HTTP service is 0.1985ms - 0.0647ms = 0.1338ms.
While for this very simple example service the overhead dominates the total transaction time, roughly 0.13ms of overhead for enabling rawTCP processing is a good price for many real-world Non-XML processing scenarios -- until now this is the only way to process rawTCP Non-XML data on DataPower appliances.
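The overhead arithmetic can be redone in a few lines of JavaScript, using only the three averages measured above. The variable and function names are made up; the second difference (against HTTP without persistent connections) is my own addition, included because the rawTCP service itself cannot use persistent connections.

```javascript
// Sketch: per-request overhead of the rawTCP service on port 2091 compared to
// the HTTP service on port 2092, from the averages measured above (in ms).
const perRequestMs = {
  rawTcp: 0.1985,           // rawTCP service on 2091
  httpNoKeepAlive: 0.0809,  // HTTP service on 2092, no persistent connections
  httpKeepAlive: 0.0647     // HTTP service on 2092, persistent connections
};

// Round to 4 decimals to avoid floating point noise in the differences.
function overheadMs(a, b) {
  return Number((a - b).toFixed(4));
}

const overhead = overheadMs(perRequestMs.rawTcp, perRequestMs.httpKeepAlive);
const overheadNoKeepAlive =
  overheadMs(perRequestMs.rawTcp, perRequestMs.httpNoKeepAlive);
console.log(overhead, overheadNoKeepAlive);
```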
These are reasons why the "real" Non-XML rawTCP Front Side Handler of above mentioned Enhancement Request is still needed:
enabling persistent connections
reducing/eliminating 0.1338ms overhead of adding/removing of HTTP headers
having to use "XML rawTCP Front Side Handler" for dealing with "Non-XML data" sounds "interesting" at least :-)
Last, but not least, see that NonXML rawTCP processing on a XA35 or XS40 DataPower appliance is possible by above technique and
the technique from 1st link posting above (by "prepend=" trick with convert-http action):
$ od -Ax -tcx1 test
000000 t e s t 1 2 3 t e s t 1 2 3 t e
74 65 73 74 31 32 33 74 65 73 74 31 32 33 74 65
000010 s t \n
73 74 0a
$ nc cosmopolitan 2091 < test | tidy -q -xml
<?xml version="1.0" encoding="utf-8"?>
$ soma/doSoma admin soma/version.xml cosmopolitan:5550 | xpath++ "//Version/text()" -
Enter host password for user 'admin':
I tested rawTCP2HTTP on all supported release branches.
Finally I just tested it on a box with 18.104.22.168 firmware without problems.
22.214.171.124 DataPower Firmware was released in 1H2009 -- since then rawTCP2HTTP service waited to be discovered ... ;-)
a "prepend" service making use of special binary data processing behavior mentioned in my previous Blog posting Sending zip archives to DataPower (this just converts Non-XML input data base64string to prepend=base64string)
a normal Non-XML service with a convert-http action to convert this "HTTP-Form" input to XML (convert-http action does not need DataGlue license)
This is just another example for "does not work out-of-the-box on DataPower but can be made working".