Working with CDATA sections

A CDATA section marks a section of an XML document so that the XML parser interprets it only as character data, not as markup. It comes in handy when XML data needs to be embedded within another XML document.

An XML element can contain text content, typically in the form:
<element>text content</element>
The less-than bracket (<) and the ampersand (&) both have special meaning to an XML parser. If they appear unescaped in the text content of an element, they change the meaning of the XML document. For example, the following string is not well-formed XML:
<element><text><content></element>
There are two ways to include such text and keep the XML document well-formed:
  • Use character entities:
    <element>&lt;text&gt;&lt;content&gt;</element>
    or, escaping only the < characters (> may appear unescaped in text content):
    <element>&lt;text>&lt;content></element>
  • Use a CDATA section:
    <element><![CDATA[<text><content>]]></element>
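As a rough illustration outside TXSeries, the following Python sketch uses the standard library's xml.etree.ElementTree parser (an assumption; it is not the TXSeries parser) to check the strings above. The helper is_well_formed is hypothetical. The unescaped string fails to parse, while the character-entity form and the CDATA form both decode to the same character data.

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """Hypothetical helper: True if the string parses as well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# Unescaped: <text> and <content> are read as start tags that never close.
malformed = "<element><text><content></element>"
# Method 1: character entities.
with_entities = "<element>&lt;text&gt;&lt;content&gt;</element>"
# Method 2: a CDATA section.
with_cdata = "<element><![CDATA[<text><content>]]></element>"

print(is_well_formed(malformed))  # False
for doc in (with_entities, with_cdata):
    # Both forms decode to the same character data: <text><content>
    print(ET.fromstring(doc).text)
```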
If the text content of an XML element contains a child element, the parser ignores all of the parent element's content that follows the first bracket (<). For example:
<element1> text1 <element2> text2 </element2> </element1>
or
<element1> text1 <element2> text2 </element2> text3 </element1>
In both cases, the parser treats text1 as the content of element1 and ignores all other content that follows the first bracket (<). In the following examples, no text precedes the child element:
<element1> <element2> text2 </element2> </element1>
and
<element1> <element2> text2 </element2> text3 </element1>

The parser ignores all the content that follows the first bracket (<) in element1, so the content of element1 is empty. This scenario leads to the following exception:

XML to CommArea conversion error.
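The behavior described above can be approximated with Python's xml.etree.ElementTree, where an element's text attribute holds only the character data before its first child and any text after a child is stored in that child's tail. The helper leading_text below is hypothetical and only mimics the described TXSeries behavior; it is not TXSeries code, and treating whitespace-only text as empty is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

def leading_text(doc: str) -> str:
    """Hypothetical helper that mimics the behavior described above.

    Returns only the root element's text that comes before its first
    child tag, and raises ValueError when that text is empty.
    """
    root = ET.fromstring(doc)
    # root.text is the character data before the first child element.
    # Text after a child (such as text3) is stored in that child's .tail
    # and is ignored here. Whitespace-only text is treated as empty
    # (an assumption of this sketch).
    text = (root.text or "").strip()
    if not text:
        raise ValueError("element1 content is empty")
    return text

# Text before the child survives; text2 and text3 are dropped.
print(leading_text("<element1> text1 <element2> text2 </element2> text3 </element1>"))  # prints: text1

# No text before the child: the content is empty, so the sketch raises.
try:
    leading_text("<element1> <element2> text2 </element2> text3 </element1>")
except ValueError as err:
    print("error:", err)  # prints: error: element1 content is empty
```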

To prevent the parser from stripping or ignoring the content of element1 in the above scenarios, wrap the child element, element2, in a CDATA section, or escape it by using character entities.
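A minimal sketch of both fixes, again using Python's xml.etree.ElementTree as a stand-in parser (an assumption; TXSeries behavior is not modeled here). Once element2 is wrapped in CDATA or escaped with xml.sax.saxutils.escape, element1 has no child elements, and all of its content, including the former markup, is returned as text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

child = "<element2> text2 </element2>"

# Option 1: wrap the child element in a CDATA section.
wrapped = f"<element1> text1 <![CDATA[{child}]]> text3 </element1>"
# Option 2: escape the child element with character entities (&lt; &gt; &amp;).
escaped = f"<element1> text1 {escape(child)} text3 </element1>"

for doc in (wrapped, escaped):
    root = ET.fromstring(doc)
    # element1 now has no child elements; all of its content is text.
    print(len(root), repr(root.text))
```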

The parser treats a CDATA section as a block of character data, so markup characters such as < and & can appear in it without escaping; the only sequence a CDATA section cannot contain is ]]>, which ends it. A CDATA section starts with the special sequence <![CDATA[ and ends with the ]]> sequence. After the TXSeries XML parser identifies the CDATA sections, it strips out the <![CDATA[ and ]]> delimiters and treats the text contents as raw text. Anything between those delimiters passes through the XML parser untouched.
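Because ]]> ends a CDATA section, text that itself contains ]]> cannot be wrapped in a single section. A common convention, shown here as a Python sketch with a hypothetical to_cdata helper (the source does not say TXSeries requires or performs this), is to split each ]]> across two adjacent CDATA sections. Parsing the result with xml.etree.ElementTree shows the delimiters being stripped and the original text coming back unchanged.

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Hypothetical helper: wrap text in one or more CDATA sections.

    A CDATA section cannot contain ']]>', so each occurrence is split
    across two adjacent sections: ']]' ends the first, '>' starts the next.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

payload = "<inner>a & b ]]> c</inner>"
doc = f"<element>{to_cdata(payload)}</element>"

# The parser strips the <![CDATA[ and ]]> delimiters and returns the raw text.
print(ET.fromstring(doc).text == payload)  # True
```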