DataPower allows XML processing in all ICU supported encodings.
You can find a list (including all alias names) here, 321 different groups with more than 1000 aliases:
http://demo.icu-project.org/icu-bin/convexp
GatewayScript supports only the same few encodings that Node.js supports:
- ascii, base64, binary, hex, ucs2, utf16le, utf8
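A quick way to see which encoding names the Buffer class accepts is Buffer.isEncoding(). The following is only a small sketch that I ran against plain Node.js; that GatewayScript's Buffer answers the same way is an assumption, and the list of names is just an example.

```javascript
// Node.js sketch; assumes GatewayScript's Buffer behaves like Node's here.
var names = ["utf8", "utf16le", "hex", "base64", "ibm-1141_P100-1997", "IBM01141"];

// keep only the names that Buffer can encode and decode
var supported = names.filter(function (name) {
    return Buffer.isEncoding(name);
});

console.log(supported.join(","));
```

The two ICU names for codepage 1141 are filtered out, which is why the mapping array described below is needed.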
In this developerWorks forum posting the question was raised how to XML parse a base64 encoded codepage 1141 EBCDIC string:
https://www.ibm.com/developerworks/community/forums/html/topic?id=d659ab27-563a-428e-9ec1-a2970e06ad67#d659ab27-563a-428e-9ec1-a2970e06ad67
In my response I showed a Non-XML stylesheet solution, which is required for all ebcdic type encodings.
[ Non-ebcdic encoding base64 strings can be parsed easily by "dp:parse($inp, 'base64')" ]
The customer came back saying that his appliance lacks the DataGlue license needed for processing a Non-XML (DataPower proprietary) stylesheet.
So here I will show how you can parse arbitrary ICU encodings from the list above in GatewayScript. Some work is needed, though, which I describe now. This is a follow-up on the blog posting "Processing ISO-8859-15 data in GatewayScript":
https://www.ibm.com/developerworks/community/blogs/HermannSW/entry/Howto_adapt_ICU_code_page_conversion_for_GatewayScript
First you need to open the code page you are interested in with your browser. This is the one for the IBM01141 codepage; you can also reach it by clicking the link on the list page I pointed to above:
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-1141_P100-1997&s=ALL
Next you select the codepage layout table rows "00" to "F0" completely and copy them.
Then you open LibreOffice Calc, create a new spreadsheet and paste your clipboard. Calc will recognize your data and create rows 1-16 and columns A-R from it.
First you delete column R (not needed), but keep the otherwise unneeded column A (it will generate the needed leading commas later).
Then you add this formula into cell B18:
=+CONCATENATE("0x",RIGHT(B1,4))
Now you select area B18-Q33, then Edit->Fill->Down and then Edit->Fill->Right.
Next you check for conversions that did not work, P25 in our case, and replace each by the correct value you look up in the codepage layout: 0x003D for codepoint 7E.
Finally you save the document as CSV with field delimiter ',' and empty text delimiter (no quotes).
Then you take the last 16 rows of that file into your C or GatewayScript program to define a 256 entry mapping array.
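If you do not have LibreOffice at hand, the spreadsheet steps can be sketched in plain JavaScript as well. This is only a sketch, assuming that each copied table cell ends with the four hex digits of the Unicode code point (that is what the RIGHT(B1,4) formula relies on). The function names cellToHex and rowToLine are made up for illustration.

```javascript
// Hypothetical helpers, not part of any DataPower or ICU API.

// Mimics the Calc formula "0x" + RIGHT(cell, 4). Returns null for cells
// that do not end in four hex digits, so they can be looked up manually.
function cellToHex(cell) {
    var tail = String(cell).trim().slice(-4);
    if (!/^[0-9A-Fa-f]{4}$/.test(tail)) {
        return null;
    }
    return "0x" + tail.toUpperCase();
}

// Converts the cells of one table row into one line of the array literal.
function rowToLine(cells) {
    return cells.map(function (cell, col) {
        var hex = cellToHex(cell);
        return hex === null ? "/* FIXME column " + col + " */" : hex;
    }).join(",");
}
```

A cell that cannot be converted shows up as a FIXME comment in the generated line and has to be replaced by hand, just like P25 in the spreadsheet.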
The two boolean variables in the script at the end of this posting, base64decode and parseAsXML, determine whether base64 encoded or raw IBM01141 encoded data is processed, and whether the decoded result will be returned as is, or will be XML parsed.
This is the sample input I use; the decoded cp1141 EBCDIC does not look nice in a browser:
$ cat ab.1141.xml.b64
TG+nlJNApYWZoomWlX5/8Uvwf0CFlYOWhImVh35/ycLU8PHx9PF/b24lTIFu8UyCbvJMYYJu80xhgW4=$
$ base64 -d ab.1141.xml.b64 ; echo
Lo���@�������~�K�@��������~��������on%L�n�L�n�La�n�La�n
$
This is decoding without XML parsing (parseAsXML = false):
$ coproc2 readIBM01141.js ab.1141.xml.b64 http://dp3-l3:2227 -s ; echo
<?xml version="1.0" encoding="IBM01141"?>
<a>1<b>2</b>3</a>
$
$ coproc2 readIBM01141.js ab.1141.xml.b64 http://dp3-l3:2227 -s | od -tx1
0000000 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31
0000020 2e 30 22 20 65 6e 63 6f 64 69 6e 67 3d 22 49 42
0000040 4d 30 31 31 34 31 22 3f 3e 0a 3c 61 3e 31 3c 62
0000060 3e 32 3c 2f 62 3e 33 3c 2f 61 3e
0000073
$
As you can see, the result is UTF-8 encoded, and therefore Non-XML (the XML declaration still states the wrong encoding).
That is the reason why the script skips the XML declaration before calling XML.parse().
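The script at the end of this posting skips the declaration with str.substr(str.indexOf("?>") + 2), which assumes that a declaration is always present; without one, indexOf() returns -1 and the first character of the document is cut off. A slightly more defensive variant could look like the following sketch. The helper name stripXmlDeclaration is made up, it is not a GatewayScript API.

```javascript
// Hypothetical helper: drops a leading XML declaration, if there is one,
// so that a stale encoding="..." attribute cannot confuse the parser.
// Input without a declaration is returned unchanged.
function stripXmlDeclaration(str) {
    // "<?xml" must be followed by whitespace, so that other processing
    // instructions like <?xml-stylesheet ...?> are left alone
    return str.replace(/^<\?xml\s[\s\S]*?\?>/, "");
}
```

In the script it would be used as XML.parse(stripXmlDeclaration(str)).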
Here you see the correct XML parsed result with parseAsXML = true:
$ coproc2 readIBM01141.js ab.1141.xml.b64 http://dp3-l3:2227 -s ; echo
<?xml version="1.0" encoding="UTF-8"?>
<a>1<b>2</b>3</a>
$
Hermann.
https://stamm-wilbrandt.de/en/forum/readIBM01141.js
session.input.readAsBuffer(function (error, buf) {
    "use strict";
    var str, i, n, buf2;
    var base64decode = true, parseAsXML = true;

    if (error) {
        // reading the input failed, so buf must not be touched
        session.output.write(error.errorMessage);
        return;
    }

    if (base64decode) {
        // decode in place: the decoded bytes always fit into the base64 text
        str = buf.toString();
        n = buf.write(str, 0, str.length, "base64");
    } else {
        n = buf.length;
    }

    // Conversion from unicode base plane to UTF16LE is really simple.
    buf2 = new Buffer(2 * n);
    for (i = 0; i < n; i += 1) {
        // write converted value of input byte i as 16bit little endian word
        buf2.writeUInt16LE(map1141[buf[i]], 2 * i);
    }
    str = buf2.toString("utf16le");

    if (parseAsXML) {
        // skip the XML declaration, it states the wrong encoding by now
        session.output.write(XML.parse(str.substr(str.indexOf("?>") + 2)));
    } else {
        session.output.write(str);
    }
});
var map1141 = [
0x0000,0x0001,0x0002,0x0003,0x009C,0x0009,0x0086,0x007F,0x0097,0x008D,0x008E,0x000B,0x000C,0x000D,0x000E,0x000F
,0x0010,0x0011,0x0012,0x0013,0x009D,0x0085,0x0008,0x0087,0x0018,0x0019,0x0092,0x008F,0x001C,0x001D,0x001E,0x001F
,0x0080,0x0081,0x0082,0x0083,0x0084,0x000A,0x0017,0x001B,0x0088,0x0089,0x008A,0x008B,0x008C,0x0005,0x0006,0x0007
,0x0090,0x0091,0x0016,0x0093,0x0094,0x0095,0x0096,0x0004,0x0098,0x0099,0x009A,0x009B,0x0014,0x0015,0x009E,0x001A
,0x0020,0x00A0,0x00E2,0x007B,0x00E0,0x00E1,0x00E3,0x00E5,0x00E7,0x00F1,0x00C4,0x002E,0x003C,0x0028,0x002B,0x0021
,0x0026,0x00E9,0x00EA,0x00EB,0x00E8,0x00ED,0x00EE,0x00EF,0x00EC,0x007E,0x00DC,0x0024,0x002A,0x0029,0x003B,0x005E
,0x002D,0x002F,0x00C2,0x005B,0x00C0,0x00C1,0x00C3,0x00C5,0x00C7,0x00D1,0x00F6,0x002C,0x0025,0x005F,0x003E,0x003F
,0x00F8,0x00C9,0x00CA,0x00CB,0x00C8,0x00CD,0x00CE,0x00CF,0x00CC,0x0060,0x003A,0x0023,0x00A7,0x0027,0x003D,0x0022
,0x00D8,0x0061,0x0062,0x0063,0x0064,0x0065,0x0066,0x0067,0x0068,0x0069,0x00AB,0x00BB,0x00F0,0x00FD,0x00FE,0x00B1
,0x00B0,0x006A,0x006B,0x006C,0x006D,0x006E,0x006F,0x0070,0x0071,0x0072,0x00AA,0x00BA,0x00E6,0x00B8,0x00C6,0x20AC
,0x00B5,0x00DF,0x0073,0x0074,0x0075,0x0076,0x0077,0x0078,0x0079,0x007A,0x00A1,0x00BF,0x00D0,0x00DD,0x00DE,0x00AE
,0x00A2,0x00A3,0x00A5,0x00B7,0x00A9,0x0040,0x00B6,0x00BC,0x00BD,0x00BE,0x00AC,0x007C,0x00AF,0x00A8,0x00B4,0x00D7
,0x00E4,0x0041,0x0042,0x0043,0x0044,0x0045,0x0046,0x0047,0x0048,0x0049,0x00AD,0x00F4,0x00A6,0x00F2,0x00F3,0x00F5
,0x00FC,0x004A,0x004B,0x004C,0x004D,0x004E,0x004F,0x0050,0x0051,0x0052,0x00B9,0x00FB,0x007D,0x00F9,0x00FA,0x00FF
,0x00D6,0x00F7,0x0053,0x0054,0x0055,0x0056,0x0057,0x0058,0x0059,0x005A,0x00B2,0x00D4,0x005C,0x00D2,0x00D3,0x00D5
,0x0030,0x0031,0x0032,0x0033,0x0034,0x0035,0x0036,0x0037,0x0038,0x0039,0x00B3,0x00DB,0x005D,0x00D9,0x00DA,0x009F
];