MRM TDS format: Data pattern separation types

With the Data Pattern separation type, each data value is matched against a regular expression that is specified as a property of its element.

The length of both textual and non-textual data is determined by the Data Pattern property of the element. If the Physical Type of the element is Length Encoded String 1 or Length Encoded String 2, the regular expression must match both the encoded length and the data that follows it, and the length value held in the encoded length must be consistent with the length of the data matched by the regular expression. If the Physical Type of the element is Null Terminated String, the regular expression must match both the data and the null terminator that follows it.
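As a rough illustration of the consistency requirement, the following Python sketch matches a Length Encoded String 1 element. It assumes a one-byte length prefix and uses Python's `re` module as a stand-in for the MRM regular expression support; `match_les1` is a hypothetical helper, not part of any product API.

```python
import re
from typing import Tuple


def match_les1(pattern: bytes, data: bytes, pos: int = 0) -> Tuple[bytes, int]:
    """Hypothetical sketch of Length Encoded String 1 matching.

    Assumes the pattern covers both the one-byte length prefix and the data,
    and that the prefix value must equal the number of data bytes matched.
    Returns the data (without the prefix) and the offset after the element.
    """
    # DOTALL lets "." match any length byte, including 0x0A (newline).
    m = re.compile(pattern, re.DOTALL).match(data, pos)
    if m is None:
        raise ValueError(f"Data Pattern does not match at offset {pos}")
    matched = m.group(0)
    if not matched:
        raise ValueError("empty match, so there is no length prefix")
    encoded_len = matched[0]  # indexing bytes gives an int: the length byte
    payload = matched[1:]     # the data that follows the length byte
    if encoded_len != len(payload):
        raise ValueError(
            f"encoded length {encoded_len} does not match "
            f"matched data length {len(payload)}"
        )
    return payload, m.end()


# The prefix 0x03 agrees with the three letters matched by [A-Z]+.
print(match_les1(rb".[A-Z]+", b"\x03ABC123"))  # (b'ABC', 4)
```

With the input `b"\x02ABC"` the same pattern still matches, but the prefix says 2 while three data bytes were matched, so the sketch raises `ValueError`.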

The parser matches the data against the regular expression in the Data Pattern property of each element. TDS parsing in the MRM parser uses that regular expression to determine the length of the element, whether it repeats, and whether it is present in the bit stream.
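A minimal sketch of how a single pattern can answer all three questions for one element, assuming Python's `re` semantics and a hypothetical `max_occurs` argument standing in for the element's maximum number of occurrences. It illustrates the idea only and is not the MRM parser's actual algorithm; in particular, treating a zero-length match as "not present" is an assumption of this sketch.

```python
import re
from typing import List, Tuple


def match_element(pattern: str, data: str, pos: int,
                  max_occurs: int = 1) -> Tuple[List[str], int]:
    """Hypothetical sketch: apply one element's Data Pattern at offset pos.

    - presence:   if the pattern does not match here, the element is absent
    - length:     each occurrence is exactly as long as its match
    - repetition: keep matching, up to max_occurs times, while it matches
    Assumption: a zero-length match counts as "not present", which also
    stops patterns such as [a-z]* from looping forever.
    """
    regex = re.compile(pattern)
    occurrences: List[str] = []
    while len(occurrences) < max_occurs:
        m = regex.match(data, pos)
        if m is None or m.end() == pos:
            break
        occurrences.append(m.group(0))
        pos = m.end()  # the next occurrence starts where this one ended
    return occurrences, pos


print(match_element(r"[0-9]{3}", "123456789X", 0, max_occurs=5))
# (['123', '456', '789'], 9)
print(match_element(r"[A-Z]+", "123", 0))
# ([], 0)
```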

No delimiters or tags, other than those coded as part of the regular expression pattern, are used in the bit stream. See Message Sets: Using regular expressions to parse data elements for an explanation of pattern matching.

For example, if the first three Data Pattern properties are, respectively:
  • [A-Z]{1,3}
  • [0-9]+
  • [a-z]*
and the message data is:
DT31758934information for you
Then, in this example:
  • First data element = DT
  • Second data element = 31758934
  • Third data element = information

The first data pattern means "from one to three characters in the range A to Z", the second means "one or more characters in the range 0 to 9", and the third means "zero or more characters in the range a to z". Note how each element's data was terminated by the first character that did not match the element's Data Pattern.
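The example can be reproduced with a short Python sketch that applies each pattern in turn, starting at the offset where the previous element ended. Python's `re` module stands in for the MRM regular expression support, whose syntax may differ; `parse_by_patterns` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# The three Data Pattern values from the example, in element order.
DATA_PATTERNS = [r"[A-Z]{1,3}", r"[0-9]+", r"[a-z]*"]


def parse_by_patterns(patterns: List[str], message: str) -> Tuple[List[str], str]:
    """Match each pattern at the offset where the previous element ended.

    Returns the matched elements and whatever data is left over.
    """
    pos = 0
    elements: List[str] = []
    for pattern in patterns:
        m = re.compile(pattern).match(message, pos)  # anchored at pos
        if m is None:
            raise ValueError(f"{pattern!r} does not match at offset {pos}")
        elements.append(m.group(0))
        pos = m.end()  # the next element starts where this match ended
    return elements, message[pos:]


elements, rest = parse_by_patterns(DATA_PATTERNS, "DT31758934information for you")
print(elements)    # ['DT', '31758934', 'information']
print(repr(rest))  # ' for you'
```

The leftover `' for you'` starts with a space, which is outside the range `[a-z]`, so the third element stops there.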

If the TDS message that is being parsed is encoded in a single-byte code page, the Data Pattern property can include hexadecimal values. A hexadecimal value is specified as \xNN, where each N is a hexadecimal digit in the range 0 to F. Note, however, that the value \x00 is not valid.
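The following sketch shows a hexadecimal value used as a one-byte separator, together with a check that rejects \x00. It assumes an ISO-8859-1 (`latin-1`) code page, in which byte NN decodes to the character U+00NN, so a `\xNN` escape in a Python `re` pattern matches byte NN. `validate_data_pattern` and `match_single_byte` are hypothetical helpers, and accepting lowercase hex digits is also an assumption.

```python
import re
from typing import Optional

# A \xNN escape that is not itself escaped: it is preceded by an even
# number of backslashes (possibly zero).
_HEX_ESCAPE = re.compile(r"(?<!\\)(?:\\\\)*\\x([0-9A-Fa-f]{2})")


def validate_data_pattern(pattern: str) -> str:
    """Reject a pattern that contains the invalid escape \\x00."""
    for m in _HEX_ESCAPE.finditer(pattern):
        if int(m.group(1), 16) == 0:
            raise ValueError(r"\x00 is not valid in a Data Pattern")
    return pattern


def match_single_byte(pattern: str, raw: bytes,
                      codepage: str = "latin-1") -> Optional[re.Match]:
    """Match a validated pattern against bytes in a single-byte code page."""
    text = raw.decode(codepage)  # one byte becomes exactly one character
    return re.match(validate_data_pattern(pattern), text)


# Digits terminated by the byte 0x1F, which the pattern spells as \x1F.
m = match_single_byte(r"[0-9]+\x1F", b"12345\x1fABC")
print(repr(m.group(0)))  # '12345\x1f'
```

An escaped backslash such as `\\x00` is a literal backslash followed by the text `x00`, so the validator deliberately lets it through.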

Performance issues

Because regular expression matching is complex, the Data Pattern separation type is the slowest of all the separation types to parse.

Therefore, use the Data Pattern separation type only when no other separation type can model the message. For example, do not use it when you can use the Fixed Length separation type instead.
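To make the difference concrete, the sketch below parses the same hypothetical message two ways: with fixed field lengths, which needs only slicing at known offsets, and with one Data Pattern per field, which runs a regular expression match for every element. The message and field lengths are invented for illustration, and timing Python code says nothing precise about the MRM parser itself.

```python
from __future__ import annotations

import re
import timeit

MESSAGE = "DTX31758934information"  # hypothetical 22-character message
FIXED_LENGTHS = [3, 8, 11]          # hypothetical Fixed Length settings
DATA_PATTERNS = [re.compile(p) for p in (r"[A-Z]{1,3}", r"[0-9]+", r"[a-z]*")]


def parse_fixed(message: str, lengths: list[int]) -> list[str]:
    """Fixed Length style: each field is a slice at a known offset."""
    fields, pos = [], 0
    for n in lengths:
        fields.append(message[pos:pos + n])
        pos += n
    return fields


def parse_pattern(message: str, patterns: list[re.Pattern[str]]) -> list[str]:
    """Data Pattern style: each field is found by a regex match at pos."""
    fields, pos = [], 0
    for regex in patterns:
        m = regex.match(message, pos)
        if m is None:
            raise ValueError(f"no match at offset {pos}")
        fields.append(m.group(0))
        pos = m.end()
    return fields


if __name__ == "__main__":
    fixed = timeit.timeit(lambda: parse_fixed(MESSAGE, FIXED_LENGTHS), number=100_000)
    pattern = timeit.timeit(lambda: parse_pattern(MESSAGE, DATA_PATTERNS), number=100_000)
    print(f"fixed length: {fixed:.3f}s, data pattern: {pattern:.3f}s")
```

Both functions return `['DTX', '31758934', 'information']`; the pattern-based version simply does more work per field to arrive at the same result.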

Applicable parameters

Only one parameter is used:

  • Data Pattern: for each element, the regular expression that is used for string matching.