Recognizing file records as messages to be parsed

Use the FileInput, FTEInput and FileRead nodes to segment your input file into messages that are to be parsed.

Each of these nodes can segment your input file into messages that are to be parsed by one of the following parsers:
  • XMLNSC
  • MRM Custom Wire Format (CWF)
  • MRM Tagged Delimited String Format (TDS)
The Message domain property of the node specifies the parser to use: XMLNSC or MRM. Set the Record detection property to Parsed Record Sequence so that the node splits the file into messages to be parsed by either the XMLNSC parser or the MRM parser.

The XMLNSC parser

If you select the XMLNSC parser, the end of the root tag marks the end of the message. XML comments, XML processing instructions, and white space that appear after the end of the XML message are discarded. The start of the next XML message is marked either by the next XML root tag or the next XML prolog.
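The boundary rule described above can be illustrated with a short Python sketch. It is not the XMLNSC parser and does not reflect how the broker is implemented; it only mimics the stated rule (a record ends at the end of its root tag, trailing comments, processing instructions, and white space are dropped, and the next record starts at the next root tag or prolog). The function name split_xml_records is hypothetical, and the tokenizer is deliberately simplified: it assumes that attribute values and DOCTYPE declarations contain no > character.

```python
import re

# One token per match: comment, CDATA section, processing instruction
# (including the XML prolog), or any other markup such as a tag.
_TOKEN = re.compile(
    r"<!--.*?-->"
    r"|<!\[CDATA\[.*?\]\]>"
    r"|<\?.*?\?>"
    r"|<[^>]*>",
    re.DOTALL,
)

def split_xml_records(text):
    """Split concatenated XML documents into one string per root element."""
    records = []
    depth = 0      # current element nesting depth
    start = None   # offset where the current record begins
    for m in _TOKEN.finditer(text):
        tok = m.group()
        if tok.startswith("<!--") or tok.startswith("<![CDATA["):
            continue  # never affects nesting
        if tok.startswith("<?"):
            # Only an XML prolog outside any element starts a new record;
            # other processing instructions between records are discarded.
            if depth == 0 and start is None and re.match(r"<\?xml\s", tok):
                start = m.start()
            continue
        if tok.startswith("<!"):
            continue  # DOCTYPE and similar declarations (simplification)
        if tok.startswith("</"):
            depth -= 1
        else:
            if depth == 0 and start is None:
                start = m.start()  # root tag with no preceding prolog
            if not tok.endswith("/>"):
                depth += 1
        if depth == 0 and start is not None:
            # The root element has just closed: the record is complete.
            records.append(text[start:m.end()])
            start = None
    return records
```

Under these assumptions, an input that holds a prolog plus one root element, followed by a comment, a processing instruction, and two more root elements, yields three records, and the comment and processing instruction between them are dropped.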

The MRM parser

If you select an MRM parser, ensure that the message model has a defined message boundary and does not rely on the parse being stopped when it reaches the end of the bit stream. If the final element has a maxOccurs value of -1, the parser continues to read bytes until the end of the bit stream or until it encounters bytes that cause a parsing exception. In either case, the parser is unable to identify the end of one message and the start of the next. If you use Data Element Separation = Use Data Pattern, ensure that the pattern recognizes a specified number of bytes. For example, a pattern of * identifies all available characters and would therefore read the entire input file.
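The difference between an unbounded and a bounded data pattern can be shown with an analogy in Python. This is only a sketch: Python's re module stands in for the MRM data pattern engine, whose syntax and matching behavior may differ, and the sample data, the 10-character record length, and the helper name split_by_pattern are all assumptions made for illustration.

```python
import re

# Three fixed-length records of 10 characters each (hypothetical data).
data = "REC0000001REC0000002REC0000003"

unbounded = re.compile(r".*", re.DOTALL)    # matches everything that is left
bounded = re.compile(r".{10}", re.DOTALL)   # matches exactly 10 characters

def split_by_pattern(pattern, text):
    """Repeatedly apply the pattern at the current offset to cut out records."""
    records = []
    pos = 0
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None or m.end() == pos:
            break  # no progress possible; stop rather than loop forever
        records.append(m.group())
        pos = m.end()
    return records

whole_file = split_by_pattern(unbounded, data)
three_records = split_by_pattern(bounded, data)
```

Here whole_file contains a single entry holding the entire input, while three_records contains the three individual records, which is the behavior that a pattern with a specified length is meant to guarantee.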

If you use delimited separations with message group indicators and terminators, ensure that the combination of group indicator and terminator does not match a record delimiter. For example, a message might start with a left brace ({) and end with a right brace (}). If a delimiter of }{ is also used within the message, that delimiter is indistinguishable from the boundary between two messages, so a delimiter within the current message might be identified as a message boundary, or a message boundary might be read as a delimiter. In the latter case, bytes from a subsequent message are included in the current message, causing parser exceptions or unexpected content in the parse tree.
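The ambiguity can be made concrete with a small Python sketch. The two sample messages and the helper name candidate_boundaries are hypothetical; the code does not use any broker API and only counts the places in a byte stream where a terminator is immediately followed by a group indicator.

```python
def candidate_boundaries(stream, terminator="}", indicator="{"):
    """Return every offset at which one message could end and the next begin."""
    marker = terminator + indicator
    offsets = []
    pos = stream.find(marker)
    while pos != -1:
        # A boundary would fall just after the terminator.
        offsets.append(pos + len(terminator))
        pos = stream.find(marker, pos + 1)
    return offsets

# Two messages that each use "}{" internally as a field delimiter.
messages = ["{A}{B}", "{C}{D}"]
stream = "".join(messages)

true_boundaries = [len(messages[0])]   # the only real boundary
found = candidate_boundaries(stream)   # everything that looks like one
```

For this input, true_boundaries holds one offset but found holds three, so a splitter that relies on the }{ sequence alone cannot tell the real boundary from the two internal delimiters. Choosing a group indicator and terminator whose combination never occurs as a delimiter inside a message removes the ambiguity.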