Recognizing file records as messages to be parsed
Use the FileInput, FTEInput and FileRead nodes to segment your input file into messages that are to be parsed. You can use one of the following parsers to recognize the records:
- XMLNSC
- MRM Custom Wire Format (CWF)
- MRM Tagged Delimited String Format (TDS)
The XMLNSC parser
If you select the XMLNSC parser, the end of the root tag marks the end of the message. XML comments, XML processing instructions, and white space that appear after the end of the XML message are discarded. The start of the next XML message is marked either by the next XML root tag or the next XML prolog.
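The boundary rule described above can be illustrated with a minimal Python sketch. It is not the XMLNSC parser itself; it uses the standard library's expat bindings to find where a root element closes, then skips trailing white space, comments, and processing instructions until the next root tag or XML prolog. The function names (`split_xml_messages`, `_root_end`, `_skip_trailing`) are hypothetical, and the sketch assumes that the root end tag is not an empty-element tag with a `>` inside an attribute value, and that an XML prolog starts with `<?xml` followed by a space.

```python
import xml.parsers.expat


class _RootClosed(Exception):
    """Raised from the end-element handler to stop at the root's end tag."""


def _root_end(data):
    """Return the offset just past the tag that closes the root element."""
    parser = xml.parsers.expat.ParserCreate()
    state = {"depth": 0, "end_tag_at": None}

    def on_start(name, attrs):
        state["depth"] += 1

    def on_end(name):
        state["depth"] -= 1
        if state["depth"] == 0:
            # Byte offset where the closing tag of the root begins.
            state["end_tag_at"] = parser.CurrentByteIndex
            raise _RootClosed()

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    try:
        parser.Parse(data, False)
    except _RootClosed:
        # Assumption: the next '>' terminates the closing tag.
        return data.index(b">", state["end_tag_at"]) + 1
    raise ValueError("no complete root element found")


def _skip_trailing(data, pos):
    """Skip white space, comments, and non-prolog processing instructions."""
    while pos < len(data):
        if data[pos:pos + 1].isspace():
            pos += 1
        elif data.startswith(b"<!--", pos):
            end = data.find(b"-->", pos + 4)
            if end < 0:
                return len(data)
            pos = end + 3
        elif data.startswith(b"<?", pos) and not data.startswith(b"<?xml ", pos):
            end = data.find(b"?>", pos + 2)
            if end < 0:
                return len(data)
            pos = end + 2
        else:
            break  # next root tag or next XML prolog
    return pos


def split_xml_messages(data):
    """Split concatenated XML documents into one bytes object per message."""
    messages = []
    pos = _skip_trailing(data, 0)
    while pos < len(data):
        end = pos + _root_end(data[pos:])
        messages.append(data[pos:end])
        pos = _skip_trailing(data, end)
    return messages
```

For example, calling `split_xml_messages` on a file that holds a document, a trailing comment, and then a second document that begins with an XML prolog would be expected to yield two messages, with the comment dropped.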
The MRM parser
If you select an MRM parser, ensure that the message model has a defined message boundary and does not rely on the parse being stopped when it reaches the end of the bit stream. If the final element has a maxOccurs value of -1, the parser continues to read bytes until the end of the bit stream or until it encounters bytes that cause a parsing exception. In either case, the parser is unable to identify the end of one message and the start of the next.
If you use Data Element Separation = Use Data Pattern, ensure that the pattern recognizes a specified number of bytes. Be aware, therefore, that a pattern of * identifies all available characters and so would read an entire input file.
If you use delimited separations with message group indicators and terminators, ensure that the combination of group indicator and terminator does not match a record delimiter. For example, a message might start with a left brace ({) and end with a right brace (}). If there is a delimiter of }{ within the message, the delimiter matches the boundary between multiple messages; as a result, a delimiter within the current message might be identified as a message boundary. This might cause bytes in a subsequent message to be included in the current message, causing parser exceptions or unexpected content in the parse tree.
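Two of the pitfalls described for the MRM parser can be reproduced in a small Python sketch. It does not model the MRM parser or its pattern syntax; Python's `re` module stands in for a data pattern (with `.*` used as a rough analogue of the unbounded pattern mentioned above), and `split_on_brace_boundary` is a hypothetical splitter that treats every `}{` as a message boundary. The sample data and all names are assumptions made for illustration.

```python
import re

# A bounded pattern consumes a known number of characters;
# an unbounded one consumes everything that is available.
FIXED_LENGTH = re.compile(r".{8}", re.DOTALL)
UNBOUNDED = re.compile(r".*", re.DOTALL)


def consumed(pattern, data):
    """Return how many characters the pattern takes from the start of data."""
    match = pattern.match(data)
    return match.end() if match else 0


def split_on_brace_boundary(stream):
    """Cut the stream wherever '}' is immediately followed by '{'."""
    messages = []
    start = 0
    while True:
        cut = stream.find("}{", start)
        if cut < 0:
            messages.append(stream[start:])
            return messages
        messages.append(stream[start:cut + 1])
        start = cut + 1


records = "REC00001REC00002REC00003"
fixed_taken = consumed(FIXED_LENGTH, records)    # one 8-character record
greedy_taken = consumed(UNBOUNDED, records)      # the whole input

# Two intended messages, each using '}{' as an internal field delimiter.
intended = ["{a}{b}", "{c}{d}"]
fragments = split_on_brace_boundary("".join(intended))
```

In this sketch the bounded pattern stops after one record, whereas the unbounded pattern takes the entire input. The splitter returns four fragments for two intended messages, because the internal `}{` delimiter is indistinguishable from the boundary between messages.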