IBM Support

Processing Invalid XML Data Through The Integration Framework

Technical Blog Post


Abstract

Processing Invalid XML Data Through The Integration Framework

Body

The Integration Framework uses XML messages to publish data from Maximo and to consume data into it. However, there is a range of characters that are considered invalid in an XML document. The Maximo Mbo framework allows such characters to be entered and saved into Maximo. The problem starts when one tries to export that data through the Integration Framework.

 
Compounding the problem, most XML parsers do not check for invalid XML characters in XML node/element values while creating an XML document, but they do check when parsing one. So if the data is not validated at creation time, we run the risk of generating an invalid XML document that no XML parser can parse, which makes the document impossible to consume. For the list of characters allowed in XML 1.0, refer to http://www.w3.org/TR/2000/REC-xml-20001006#charsets.
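To make the create-versus-parse asymmetry concrete, here is a minimal sketch in Python [not Maximo code]. It assumes the standard library xml.etree.ElementTree as a stand-in for "most XML parsers"; the helper name has_invalid_xml_chars and the sample value are purely illustrative. The character class is transcribed from the Char production in the XML 1.0 recommendation linked above.

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHARS = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def has_invalid_xml_chars(text):
    """Return True if text contains a character XML 1.0 does not allow."""
    return INVALID_XML_CHARS.search(text) is not None


# Element text containing a vertical tab (U+000B), a typical paste artifact.
elem = ET.Element("DESCRIPTION")
elem.text = "Pump\x0bfailure"

# Creating and serializing the document raises no error...
serialized = ET.tostring(elem, encoding="unicode")

# ...but parsing the very same document fails.
try:
    ET.fromstring(serialized)
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False

print(has_invalid_xml_chars(elem.text), parsed_ok)
```

The point of the sketch is only that the check has to happen explicitly when the XML is produced; the library will not do it for you.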
 
The most common way of introducing these characters is by copying text from word processing software such as MS Word or MS PowerPoint and pasting it into a text field [say description] or a text area [say long description] in the Maximo application UI. These [XML invalid] characters might be copied inadvertently and have no significance to the actual text content. On the other hand, it is entirely possible that these characters serve a purpose and convey some meaning in the text content. In the former case we need to get rid of the characters, and in the latter we need to retain them. Fortunately the Integration Framework can be configured to handle both cases. Let's look at each of them separately.
 
Let's take the case where the XML invalid characters [mostly control characters] were copied over inadvertently and the user tries to save the record. In this case there are 2 possibilities:
  1.  One [or more] publish channel(s) associated with the Mbo save event [Add/Update/Delete] are invoked [assuming the event is enabled for the channel]. In this case the channel invokes the Mbo serialization routine [as per the object structure defined for this channel] in real time.
  2. No publish channel is associated, or the listener is disabled for the associated publish channel. In this case the record [Mbo] is saved; as mentioned earlier, Mbos do not complain about XML invalid characters, nor should they. At some point an admin, an end user or a peer software system might want to export or query the record using the MIF channels or the XML/REST query services/APIs. This in turn invokes the Mbo serialization routine [as per the object structure defined for this channel]. The only difference from #1 is that this is not real time: it happens [long] after the Mbo save, and is potentially triggered by a user or software that has no clue that invalid XML data is lurking inside the Mbo.

As discussed earlier, most XML parsers do not catch these XML invalid characters while creating the XML. So the only place to do this check is at serialization [Mbo->XML conversion] time. There is a property defined just to control this: mxe.int.validatexmltext. By default it is set to 1 [i.e. true], which means the serialization process validates all ALN data [including the other variations of ALN such as UPPER]. Simply put, this instructs the serializer to sniff out the invalid characters and throw an error. However, the error only indicates which Mbo attribute value contains the XML invalid characters; finding the bad character itself [to "clean" the field value] is a manual, offline process that can prove tedious. There is also some cost involved in checking every ALN value for invalid characters. The cost becomes more pronounced for REST/XML queries, which tend to be more real time than an offline export. If you are confident that an implementation has no such XML invalid characters, you might consider turning the check off by setting the property mxe.int.validatexmltext to 0. Be aware that turning the property off can result in invalid XML being generated [if the Mbo data contains XML invalid characters], which might not be detected until it reaches the external system.
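Since the error only names the attribute, a small offline helper can take some of the tedium out of locating the offending characters. The following is a minimal sketch in Python, assuming the field value has already been extracted from Maximo by some other means; the function names find_invalid_xml_chars and strip_invalid_xml_chars are hypothetical and not part of any Maximo API.

```python
import re

# Anything outside the XML 1.0 Char production.
INVALID_XML_CHARS = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(value):
    """Return (offset, code point) pairs for every XML invalid character."""
    return [
        (match.start(), "U+%04X" % ord(match.group()))
        for match in INVALID_XML_CHARS.finditer(value)
    ]


def strip_invalid_xml_chars(value):
    """Return value with every XML invalid character removed."""
    return INVALID_XML_CHARS.sub("", value)


# Illustrative value with a vertical tab and a unit separator pasted in.
description = "Replace\x0b seal\x1f kit"
print(find_invalid_xml_chars(description))
print(strip_invalid_xml_chars(description))
```

Stripping is only appropriate when the characters carry no meaning; if they do, they need to be retained rather than removed.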

Now let's look at the case where we are interested in keeping those characters. All it takes is flipping a property value. The property name is mxe.int.binarytext. Setting the property value to 1 instructs the serializer routine to:

  1. Detect whether there are XML invalid characters in the Mbo data.
  2. If yes in step 1, base64 encode the text data, which converts the value to XML acceptable characters.
  3. Write the converted value [base64 encoded string] to the XML.
  4. Mark the XML element with a flag binarytext="true". A sample XML element will look like this: <DESCRIPTION binarytext="true">abcdefg</DESCRIPTION>.
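The four steps above can be sketched roughly as follows. This is an illustration in Python, not the actual Maximo serializer: the function name write_text_element is hypothetical, and encoding the text as UTF-8 before base64 encoding is an assumption, since the character encoding used is not stated here.

```python
import base64
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 Char production.
INVALID_XML_CHARS = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def write_text_element(parent, name, value):
    """Append a child element, base64 encoding the value only when needed."""
    elem = ET.SubElement(parent, name)
    if INVALID_XML_CHARS.search(value):           # step 1: detect
        raw = value.encode("utf-8")               # assumption: UTF-8 bytes
        encoded = base64.b64encode(raw).decode("ascii")  # step 2: encode
        elem.text = encoded                       # step 3: write
        elem.set("binarytext", "true")            # step 4: flag the element
    else:
        elem.text = value                         # clean values are untouched
    return elem


root = ET.Element("ASSET")
write_text_element(root, "DESCRIPTION", "Pump\x0bfailure")
write_text_element(root, "LOCATION", "BR300")
print(ET.tostring(root, encoding="unicode"))
```

Note that only elements whose values actually contain invalid characters are encoded and flagged; everything else stays human readable.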

As we see here, the Mbo attribute value containing the invalid XML characters is converted to base64 encoded text, which is always XML compatible [by virtue of being base64 encoded]. A simple base64 decode is enough to get the original value back. The onus, however, lies on the receiving end, which should look for the XML attribute "binarytext" [on every element] and retrieve the original value by running a base64 decode routine on the XML element value.
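For a receiving system that is not Maximo, the handling could look something like the sketch below. It is written in Python with the standard library parser; the function name decode_binarytext and the sample element names are hypothetical, and decoding the bytes as UTF-8 is an assumption that should be confirmed against real messages.

```python
import base64
import xml.etree.ElementTree as ET


def decode_binarytext(root):
    """Restore the original text of every element flagged binarytext="true"."""
    for elem in root.iter():
        if elem.get("binarytext") == "true" and elem.text:
            raw = base64.b64decode(elem.text)
            elem.text = raw.decode("utf-8")   # assumption: UTF-8 bytes
            del elem.attrib["binarytext"]     # value is plain text again
    return root


# Build an illustrative inbound message with one flagged element.
payload = base64.b64encode("Pump\x0bfailure".encode("utf-8")).decode("ascii")
message = (
    '<ASSET><DESCRIPTION binarytext="true">' + payload + "</DESCRIPTION>"
    "<LOCATION>BR300</LOCATION></ASSET>"
)

root = decode_binarytext(ET.fromstring(message))
print(repr(root.find("DESCRIPTION").text))
```

The decoded value is only safe to hold in memory or in a data store; writing it back into XML as is would reintroduce the invalid characters.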

The MIF de-serialization [inbound processing] process is equipped to handle this "binarytext" flag out of the box. So Maximo/Tpae can produce and consume this kind of XML without any custom code. For example, Migration Manager benefits heavily from this, as both the XML generation and the XML consumption are done using the Integration Framework. Other systems, however, will not handle this kind of XML data as smoothly, and they need to be customized for it [i.e. to process the XML element value based on the "binarytext" flag per XML element].

An important thing to note here: to have the "binarytext" feature enabled, the "validatexmltext" feature must also be turned on. Also, the "binarytext" feature is standard only from the Maximo 7.5 releases onward. Prior to the 7.5 release, the "binarytext" feature was available only programmatically, so that only internal components could access it.


UID

ibm11134345