Over the years I have been (and still am) interested in non-XML data processing with DataPower.
I wrote numerous developerWorks DataPower forum postings on the topic and gave WSTE webcasts on it.
This blog now has 23 postings tagged non-xml:
https://www.ibm.com/developerworks/community/blogs/HermannSW/tags/non-xml
This post was motivated by Ruben's question on GatewayScript binary data processing methods:
https://www.ibm.com/developerworks/community/forums/html/topic?id=55fec5ff-4b1e-449c-adbc-0c51f4b0839c#48c75129-0e7b-45ac-8807-d31a5f2618fd
The thread discussed the fact that handcrafted FFDs are not supported and referred back to this 2012 posting on the options. It also listed the only Enhancement Request that has been implemented for FFD processing since 2007 (FFD PMRs were fixed, of course).
Then I remembered my "Binary data processing without DataGlue license!" blog posting, which shows how to do binary data processing without WTX and without FFD. I really like that blog post's graphical flow diagram, have a look.
Further below I will show how easily binary data processing can be done with GatewayScript (available with 7.0.0.0 firmware). But first, let's summarize all DataPower Non-XML data processing options here in one place:
- WTX
- Contivo FFDs: you need the Contivo Analyst product; handcrafted FFDs are not supported (you cannot raise PMRs against handmade FFDs). There are 4 exceptional, simple FFDs that are supported, see the WebCasts on basics and advanced techniques
- "Binary data processing without DataGlue license!" technique, with
  - ... <hexbin>...<hexbin> --(XSLT)-- base64 ...
  - ... <hexbin>...<hexbin> --(XQuery)-- base64 ... (with 6.0.0.0 or higher firmware, allows for XPath 2.0)
- GatewayScript
One comment on the "without DataGlue license" technique: while it works without a DataGlue license (these days an XG45 without the DIM option), you "pay" for it with the added latency and memory consumption of the attachment processing that the technique needs.
GatewayScript Non-XML data processing
The simplest GatewayScript data structure for processing binary data is the Buffer object.
For reading binary input we use the readAsBuffer() method, although its documentation tries to move people toward the Buffers object:
When contexts are small, use the readAsBuffer() function. Use the readAsBuffers() function when a context is large. The readAsBuffer() function requires a contiguous memory allocation. The readAsBuffer() function is faster but is more demanding on the memory system. The readAsBuffers() function does not require contiguous memory to populate the Buffers object.
Using Buffers might be fine for some Non-XML processing, but when the application needs access to the whole input I prefer a single Buffer.
The good news is that a first (workable) Non-XML sample program can be found in the readAsBuffer() documentation itself. It is a binary identity operation with error handling. Here you can see rAB.js:
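For orientation, here is a minimal sketch of what such a binary identity action could look like. It is only an illustration under assumptions: the helper name handleInput and the way the error is reported are made up for this sketch, and only session.input.readAsBuffer() and session.output.write() come from GatewayScript itself. The typeof guard merely keeps the file loadable outside DataPower.

```javascript
// Hypothetical helper: decides what to write for a given read result.
function handleInput(error, buffer, output) {
    if (error) {
        // Assumption: report the failure in the response body.
        output.write('readAsBuffer error: ' + error);
    } else {
        // Binary identity: write the input bytes back unchanged.
        output.write(buffer);
    }
}

// On DataPower the global `session` object exists; elsewhere this is skipped.
if (typeof session !== 'undefined') {
    session.input.readAsBuffer(function (error, buffer) {
        handleInput(error, buffer, session.output);
    });
}
```

Keeping the decision logic in a separate function makes it possible to exercise it with a fake output object before deploying it to an appliance.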
Since binary identity is not that interesting, let's now look at the binary reverse operation from the "... without DataGlue license" posting. Adding 5 lines to rAB.js does the job. Here is reverse.js:
Now let's see what both do on the sample input from the "... without DataGlue license" posting:
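The in-place swap loop at the heart of the reverse operation can be tried outside DataPower as well. The sketch below wraps it in a function named reverseInPlace (a name chosen for this illustration) and applies it to the 31 sample bytes used in this post; it assumes a Node.js-compatible Buffer.from(). In a GatewayScript action the loop would run inside the readAsBuffer() callback, before the buffer is written to the output.

```javascript
// Reverse a buffer in place by swapping bytes from both ends inward.
function reverseInPlace(buffer) {
    for (var i = 0, j = buffer.length - 1; i < j; ++i, --j) {
        var b = buffer[i];
        buffer[i] = buffer[j];
        buffer[j] = b;
    }
    return buffer;
}

// The sample input "\0" "2b|~2b -" "\0" "that is the question" "\0" as hex.
var sample = Buffer.from(
    '0032627c7e3262202d00746861742069' +
    '7320746865207175657374696f6e00', 'hex');

// Print the reversed bytes as one hex string.
console.log(reverseInPlace(sample).toString('hex'));
```

Because the swap works in place, no second buffer of the input size is allocated, which matters for large inputs.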
$ od -tcx1 0in0put0
0000000  \0   2   b   |   ~   2   b       -  \0   t   h   a   t       i
         00  32  62  7c  7e  32  62  20  2d  00  74  68  61  74  20  69
0000020   s       t   h   e       q   u   e   s   t   i   o   n  \0
         73  20  74  68  65  20  71  75  65  73  74  69  6f  6e  00
0000037
$
$ coproc2 rAB.js 0in0put0 http://dp2-l3:2227 -s | od -tcx1
0000000  \0   2   b   |   ~   2   b       -  \0   t   h   a   t       i
         00  32  62  7c  7e  32  62  20  2d  00  74  68  61  74  20  69
0000020   s       t   h   e       q   u   e   s   t   i   o   n  \0
         73  20  74  68  65  20  71  75  65  73  74  69  6f  6e  00
0000037
$
$ coproc2 reverse.js 0in0put0 http://dp2-l3:2227 -s | od -tcx1
0000000  \0   n   o   i   t   s   e   u   q       e   h   t       s   i
         00  6e  6f  69  74  73  65  75  71  20  65  68  74  20  73  69
0000020       t   a   h   t  \0   -       b   2   ~   |   b   2  \0
         20  74  61  68  74  00  2d  20  62  32  7e  7c  62  32  00
0000037
$
OK, that was a small input, let's process 10MB.
$ head --bytes 10M /dev/urandom > out
$ du -sb out
10485760 out
$
$ time ( coproc2 reverse.js out http://dp2-l3:2227 -s > out.2 )
real 0m1.048s
user 0m0.012s
sys 0m0.080s
$ time ( coproc2 reverse.js out.2 http://dp2-l3:2227 -s > out.3 )
real 0m1.064s
user 0m0.008s
sys 0m0.075s
$ time ( coproc2 rAB.js out.2 http://dp2-l3:2227 -s > out.4 )
real 0m0.294s
user 0m0.014s
sys 0m0.074s
$
$ diff out out.3
$ diff out.2 out.4
$
$ sha1sum out out.2
8fe128844bc9a19aac275272e243a1c4ce6adc7b out
66590beec5c0e226ba9efc8436d119589a67e9d8 out.2
$
The last question to answer is the runtime of rAB.js and reverse.js on the 10MB input. That is easy to determine from the ExtLatency logging target described in the "coproc2gatewayscript again" blog posting:
...,AXF=137,AGS=908,...
...,AXF=134,AGS=147,...
So the reverse operation on 10MB data (read binary data, reverse it, output the result to the context) took (908-137)=771msec.
The binary identity operation on 10MB data (read binary data, output the input to the context) took (147-134)=13msec.
That means that the 5 lines of diff took (771-13)=758msec:
$ diff rAB.js reverse.js
6a7,11
> for(var i=0, j=buffer.length-1; i<j; ++i, --j) {
> var b = buffer[i];
> buffer[i] = buffer[j];
> buffer[j] = b;
> }
$
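The latency arithmetic above can be reproduced with a few lines of JavaScript. This is only a sketch: the helper names field and actionMillis are made up for illustration, and the fragments are just the parts of the two ExtLatency lines quoted in this post, with the AGS minus AXF difference used exactly as in the calculation above.

```javascript
// Extract a numeric field such as AXF or AGS from an ExtLatency fragment.
function field(line, name) {
    var m = new RegExp('(?:^|,)' + name + '=(\\d+)').exec(line);
    if (!m) {
        throw new Error('field ' + name + ' not found');
    }
    return parseInt(m[1], 10);
}

// Time attributed to the GatewayScript action, as computed in this post.
function actionMillis(line) {
    return field(line, 'AGS') - field(line, 'AXF');
}

var reverseMs = actionMillis('...,AXF=137,AGS=908,...');
var identityMs = actionMillis('...,AXF=134,AGS=147,...');

console.log(reverseMs, identityMs, reverseMs - identityMs); // 771 13 758
```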
Btw, everything I wanted and needed to know about the Buffer object member functions, readAsBuffer() and other details I found in the Knowledge Center.
Btw2, the float and double Buffer functions like buf.readDoubleBE() are very powerful (reading encoded integer numbers was already possible in XSLT, see slide 22 of this WSTE webcast).
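As a small illustration of these functions, the sketch below decodes a big-endian double and a big-endian float from a buffer. The 12-byte record layout is hypothetical and chosen only for this example, and the snippet assumes a Node.js-compatible Buffer.from() for building the test bytes.

```javascript
// Hypothetical record: an IEEE 754 double (8 bytes) followed by a float
// (4 bytes), both big-endian, given here as hex.
var record = Buffer.from('400c000000000000' + '3e800000', 'hex');

var asDouble = record.readDoubleBE(0); // decodes bytes 0..7
var asFloat = record.readFloatBE(8);   // decodes bytes 8..11

console.log(asDouble, asFloat); // 3.5 0.25
```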
Hermann