
IBM Sterling B2B Integrator and Character Encodings

Technical Blog Post



When you implement business processes and transactions in Sterling B2B Integrator that use different character encodings, there are several things to keep in mind.

First of all, what are encodings? Humans simply read text, for example:

Text (language):
ᎤᏙᎯᏳ ᎣᏍᏗᎸᏉᏗᎭ ᎠᎴ ᎡᎳᏗ ᏃᏍᏓᏛᏁ ᎣᏍᏓ (Cherokee)
யூனிக்கோடு குறியேற்பையே சீர்தரமாக (Tamil)
বাগনানে আক্রান্ত প্রাক্তন সিপিএম বিধায়ক (Bengali)

But when a machine looks at data, it only sees numbers (bytes). It has no idea what those numbers mean unless we define a context for them. This context is called the character encoding.

Here are two examples of how "Hello world" (followed by a line feed) is written as bytes:

Bytes (hex)                      Context
c885 9393 9640 a696 9993 8425    "Hello world" in EBCDIC
4865 6c6c 6f20 776f 726c 640a    "Hello world" in ASCII

Both byte sequences correctly represent the same text, but a machine that only looks at the numbers, without knowing how to interpret them, cannot read the data.
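To see this outside Sterling B2B Integrator, the following Python sketch (an illustration only, not part of the product) encodes the same text with two codecs. It assumes that Python's `cp037` codec (EBCDIC US/Canada) matches the EBCDIC code page used in the table; other EBCDIC code pages map some characters differently.

```python
# Encode the same text with an EBCDIC code page and with ASCII.
# Assumption: cp037 (EBCDIC US/Canada) is the EBCDIC variant shown above.
text = "Hello world\n"

ebcdic_bytes = text.encode("cp037")
ascii_bytes = text.encode("ascii")

print(ebcdic_bytes.hex(" ", 2))  # c885 9393 9640 a696 9993 8425
print(ascii_bytes.hex(" ", 2))   # 4865 6c6c 6f20 776f 726c 640a

# Reading the EBCDIC bytes in the wrong context (Latin-1 here) does not fail,
# it just produces unreadable characters.
print(repr(ebcdic_bytes.decode("latin-1")))
```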

So if a machine cannot even recognize "Hello world" without knowing which encoding to use, how could it process data that contains international languages such as Cherokee?

Challenges

  • Without extra information, it is almost impossible for a parser to determine how the data is encoded
  • Encoding information can be provided either implicitly or explicitly
  • Examples of implicit encoding information:
    • Every file that arrives through this File System Adapter (FSA) is ASCII by definition
    • Every file that is written into a mailbox by the FTP Server Adapter and has the file name "Orders.edi" is UTF-8 encoded
  • Examples of explicit encoding information:
    • The encoding is stated in the envelope, e.g. X.400
    • The encoding is stated in the content type, e.g. the charset parameter of the HTTP Content-Type header
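A minimal Python sketch of the two strategies, assuming a hypothetical table of per-channel defaults (the implicit agreements) and an optional HTTP-style Content-Type value (the explicit information). The channel names are made up for illustration; explicit information wins when it is present.

```python
from email.message import Message

# Hypothetical implicit agreements: the channel alone defines the encoding.
CHANNEL_DEFAULTS = {
    "fsa_inbound": "ascii",  # every file from this File System Adapter is ASCII
    "ftp_orders": "utf-8",   # Orders.edi written through the FTP Server Adapter
}


def resolve_encoding(channel, content_type=None):
    """Return the encoding to use, preferring explicit over implicit information."""
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lower-cased, or None if absent
        if charset:
            return charset
    # Fall back to the implicit agreement for this channel (None if unknown).
    return CHANNEL_DEFAULTS.get(channel)


print(resolve_encoding("ftp_orders"))                                  # utf-8
print(resolve_encoding("ftp_orders", "text/plain; charset=Shift_JIS"))  # shift_jis
print(resolve_encoding("unknown_channel"))                             # None
```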

Implications for Sterling B2B Integrator

  • You cannot just pick up a file, e.g. from the file system or via FTP, and expect Sterling B2B Integrator to detect the right encoding
  • Some services and adapters handle encodings somewhat clumsily
  • Most of the business data we deal with is still ASCII
  • Most consultants and customers do not worry much about encodings, because their ASCII-based workflows run without issues
  • But more and more clients are going "global", adding partners in EMEA or simply a new shipping address, for example:
  • 住所 (address): 〒103-8510 東京都中央区日本橋箱崎町19-21 (the IBM Tokyo address)
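Why automatic detection is unreliable can be shown in a few lines of Python (an illustration only, not how Sterling B2B Integrator works internally). The Shift_JIS bytes of the Tokyo address are rejected by UTF-8, but they decode "successfully" as Latin-1, which accepts every byte value. A decode without errors therefore proves nothing about the correct encoding.

```python
address = "東京都中央区日本橋箱崎町19-21"
sjis_bytes = address.encode("shift_jis")

# Strict UTF-8 rejects these bytes...
try:
    sjis_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ...but Latin-1 maps every byte to some character, so it never fails.
latin1_text = sjis_bytes.decode("latin-1")

print(utf8_ok)                 # False
print(latin1_text == address)  # False: decoded without error, yet wrong
```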

How to succeed

  • Always make sure that the customer and the partner agree on the encoding / character set they use to exchange data
  • For every business process (BP) step, know which encoding the documents are in
  • Use the built-in services of Sterling B2B Integrator to work with encodings where needed
  • Be aware that the operating system's encoding settings influence the default behavior of some adapters (e.g. the File System Adapter)
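The same pitfall exists in most runtimes: if no encoding is given, a platform default is used. Sterling B2B Integrator runs on Java, where (at least before Java 18) the default charset likewise depends on the operating system settings. The Python sketch below is only an analogy; the file name is hypothetical. It shows the habit worth adopting: always pass the agreed encoding explicitly.

```python
import locale
import os
import tempfile

# The default open() falls back to can depend on the OS/locale settings.
print(locale.getpreferredencoding(False))

text = "住所 東京都中央区日本橋箱崎町19-21"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "address.txt")  # hypothetical file name
    # State the agreed encoding instead of relying on the platform default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()

print(round_trip == text)  # True
```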

Tools and Services

  • Some services and adapters let you specify the outbound encoding
    • e.g. the SAP and OdetteFTP adapters
  • GetDocumentInformationService
    • Checks whether a character encoding is set on a document in the process data, and which one
    • Changes the character encoding in the document envelope WITHOUT changing the document content
  • EncodingConversion Service
    • Converts the content of a document in the process data from one character encoding to another
    • !!! But it does not change the encoding information in the envelope
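The difference between the two services can be modeled in a few lines of Python. This is a hypothetical model, not the product API: `Document`, `set_envelope_encoding` and `convert_payload` are made-up names that mimic the behavior described above, with an envelope that only carries a label and a payload that carries the actual bytes.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical stand-in for a document: envelope label plus raw payload."""
    char_encoding: str  # what the envelope claims
    payload: bytes      # the actual bytes


def set_envelope_encoding(doc, encoding):
    """Mimics GetDocumentInformationService: changes the label only."""
    return Document(char_encoding=encoding, payload=doc.payload)


def convert_payload(doc, input_encoding, output_encoding):
    """Mimics EncodingConversion Service: re-encodes bytes, envelope unchanged."""
    text = doc.payload.decode(input_encoding)
    return Document(char_encoding=doc.char_encoding,
                    payload=text.encode(output_encoding))


doc = Document(char_encoding="utf-8", payload="日本橋".encode("utf-8"))
converted = convert_payload(doc, "utf-8", "shift_jis")

# The bytes are now Shift_JIS, but the envelope still says UTF-8.
print(converted.char_encoding)  # utf-8

# Only after relabeling do the envelope and the payload agree again.
fixed = set_envelope_encoding(converted, "shift_jis")
print(fixed.payload.decode(fixed.char_encoding))  # 日本橋
```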

Examples

How to change Envelope Data

Problem: You receive a file through your SFTP Server adapter. Your partner told you that the file content is UTF-8 encoded, but as soon as you process the file, the special characters are not read correctly.
First solution: The file is processed without the encoding information you have. Your partner told you that the file is UTF-8, but the SFTP Server adapter stores the file in your system without any encoding information. You can change the SFTP Server adapter configuration to assume a default encoding such as UTF-8. The SFTP Server adapter then sets the character encoding in the document envelope to UTF-8 for all inbound SFTP files.
Alternative solution: If you only want to change the character encoding for files processed by a certain BP, use the GetDocumentInformationService to change the document envelope information.

How to re-code Data

Problem: Your internal processing works best in UTF-8, and all your flows run fine. Suddenly a new trading partner in Asia asks you to send the new order data via FTP, encoded in Shift_JIS, a common encoding in Japan.
Solution: As one of the last steps before the FTP PUT in your BP, use the EncodingConversion Service to convert the payload from UTF-8 to SJIS (Shift_JIS). This changes the bytes of the content but does not change the document envelope.
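What such a conversion does to the bytes, sketched in Python (an illustration of the concept only; `utf-8` and `shift_jis` are Python codec names, the service may use other names such as `SJIS`). Characters that do not exist in Shift_JIS cannot be converted, so it is worth checking the data against the target character set.

```python
order_text = "東京都"

utf8_bytes = order_text.encode("utf-8")
# Decode with the source encoding, then encode with the target encoding.
sjis_bytes = utf8_bytes.decode("utf-8").encode("shift_jis")

print(len(utf8_bytes), len(sjis_bytes))              # 9 6
print(sjis_bytes.decode("shift_jis") == order_text)  # True

# Not every character has a Shift_JIS representation.
try:
    "Smile 😀".encode("shift_jis")
    unmappable = None
except UnicodeEncodeError as exc:
    unmappable = exc.object[exc.start:exc.end]
print("cannot convert:", unmappable)
```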

You can also use both services in a row in a BP to prepare a document for services that look at both the envelope and the payload. Some web services do this, for example to determine the MIME charset when wrapping the document for transport.

BPML Examples

Envelope Modifications

Let us assume the following problem: you receive a test file from your trading partner, but the content looks completely corrupt:


<test>
�����s�������{�����蒬
</test>

You have a bilateral agreement with your partner that they send you the data in Shift_JIS. You add the GetDocumentInformation service to your BP to change the envelope information for this document. This sets the correct character encoding in the envelope, and from there on the BP can process the data.


<operation name="SetContentType">
  <participant name="GetDocumentInfoService"/>
  <output message="xout">
    <assign to="." from="*"/>
    <!-- Character encoding to record in the document envelope -->
    <assign to="DocumentCharEncoding">SJIS</assign>
    <!-- Change only the metadata; the document content stays unchanged -->
    <assign to="updateMetaDataOnly">true</assign>
  </output>
  <input message="xin">
    <assign to="." from="*"/>
  </input>
</operation>


When you look at the document after this step in the BP, the content is displayed correctly and the data can be processed:


<test>
東京都中央区日本橋箱崎町 
</test>
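The corrupt display shown earlier is typical of Shift_JIS bytes being read as UTF-8. A Python sketch of that effect (an illustration only, assuming the viewer fell back to UTF-8 and replaced invalid bytes with U+FFFD, the "�" character):

```python
payload = "東京都中央区日本橋箱崎町".encode("shift_jis")  # bytes as received

# Wrong context: invalid UTF-8 sequences become replacement characters (�).
garbled = payload.decode("utf-8", errors="replace")
print(garbled)

# Correct context, as recorded in the envelope by the GetDocumentInfoService step.
correct = payload.decode("shift_jis")
print(correct)  # 東京都中央区日本橋箱崎町
```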

Content Conversion

Following up on the previous example, you now want to make this document available to one of your backend applications. This backend application cannot process Shift_JIS documents, but it can process UTF-8 documents. You use the EncodingConversion service to convert the actual payload:


<operation name="ChangeEncoding">
  <participant name="EncodingConversion"/>
  <output message="xout">
    <assign to="." from="*"/>
    <!-- Encoding the payload is currently in -->
    <assign to="input_encoding">SJIS</assign>
    <!-- Encoding to convert the payload to; the envelope is NOT updated -->
    <assign to="output_encoding">UTF-8</assign>
  </output>
  <input message="xin">
    <assign to="." from="*"/>
  </input>
</operation>

When you put an operation like this into your BP, it converts the actual document. It does NOT change the document envelope. So if subsequent services rely on the envelope information, use a combination of the EncodingConversion service and the GetDocumentInformation service. This combination lets you change both the payload and the envelope information.
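Put together, the two steps keep payload and envelope consistent. A hypothetical Python sketch of that sequence, with a plain dict standing in for the document (the keys `DocumentCharEncoding` and `payload` are illustrative only, the first one borrowed from the BPML parameter name; `to_utf8_for_backend` is a made-up helper):

```python
def to_utf8_for_backend(doc):
    """Hypothetical: EncodingConversion step followed by an envelope update."""
    # Step 1 (EncodingConversion): re-encode the payload from SJIS to UTF-8.
    text = doc["payload"].decode("shift_jis")
    converted = {"DocumentCharEncoding": doc["DocumentCharEncoding"],
                 "payload": text.encode("utf-8")}
    # Step 2 (GetDocumentInfoService, metadata only): fix the envelope label.
    converted["DocumentCharEncoding"] = "UTF-8"
    return converted


incoming = {"DocumentCharEncoding": "SJIS",
            "payload": "日本橋箱崎町".encode("shift_jis")}
outgoing = to_utf8_for_backend(incoming)

# Label and bytes now agree, so an envelope-aware service decodes correctly.
print(outgoing["DocumentCharEncoding"])                                # UTF-8
print(outgoing["payload"].decode(outgoing["DocumentCharEncoding"]))    # 日本橋箱崎町
```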


UID

ibm11121973