Configuring the DFDL annotations

The second stage of an approach to modeling data using DFDL involves adding DFDL annotations to the logical structure that you established. The DFDL annotations describe the physical format of the components.

Before you begin

Follow the guidance in Understanding the logical structure.

About this task

Determine the characteristics of your components.

Procedure

  1. All elements (simple and complex)
    1. Does the element have any delimiters, that is, an initiator or a terminator? If so, what is their encoding, and are they present when the element is empty or nil?
      This characteristic determines the dfdl:initiator, dfdl:terminator, and dfdl:encoding properties, together with associated properties such as dfdl:emptyValueDelimiterPolicy and dfdl:nilValueDelimiterPolicy.
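    2. How is the length of the element's content established?
      This characteristic determines the dfdl:lengthKind and associated properties:
      • explicit if the length is fixed or is given by an expression (dfdl:length).
      • prefixed if there is a length prefix.
      • delimited if the content is bounded by a delimiter.
      • pattern if the length is found by matching a regular expression (dfdl:lengthPattern).
      • implicit if the length is determined by the element's type (for a complex element, by its children).
      • endOfParent if the content is bounded by the end of its parent.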
    3. If the element is optional or is an array, how is the number of occurrences established?
      This characteristic determines the dfdl:occursCountKind and associated properties.
    4. Are there any alignment rules to apply?
      This characteristic usually applies only to binary data and determines the dfdl:alignment, dfdl:fillByte, and associated properties.
    5. How is any nil value described?
      This characteristic determines the dfdl:nilKind, dfdl:nilValue and associated properties.
    6. Is an assert or discriminator needed to establish whether the element exists?
  2. Simple elements
    1. Does the element have a text or a binary representation? The representation (dfdl:representation) and the simple type together determine which other properties must be set.
      • For text, these are dfdl:encoding and the various text-related properties.
      • For binary, these are dfdl:byteOrder and the various binary-related properties.
    2. For text formats, is an escape scheme needed?
      This characteristic determines whether a dfdl:defineEscapeScheme annotation is needed and, if so, a dfdl:escapeSchemeRef property to reference it.
    3. If global simple types are identified, decide whether the simple type can carry some of the properties rather than the element, thus creating reusable physical types.
  3. Sequences
    1. Is the sequence ordered or unordered?
      This characteristic determines the dfdl:sequenceKind property.
    2. Does it have a separator that delimits its child elements? If so, is the separator's position infix, prefix, or postfix? Are separators sometimes suppressed (for example, when optional elements are missing)?
      These characteristics determine the dfdl:encoding, dfdl:separator, dfdl:separatorPosition, and dfdl:separatorSuppressionPolicy properties.
    3. Do all the child elements of the sequence have unique initiators that identify whether each one is present?
      This characteristic determines the dfdl:initiatedContent property.
    4. Does the sequence itself have an initiator or a terminator?
      This characteristic determines the dfdl:initiator, dfdl:terminator, dfdl:encoding, and associated properties.
  4. Choices
    1. Must all the branches of the choice occupy the same length?
      This characteristic determines the dfdl:choiceLengthKind and associated properties.
    2. Do all the branches of the choice have unique initiators that can identify which one appears?
      This characteristic determines the dfdl:initiatedContent property.
    3. Are discriminators needed on the branches to establish which one appears?
    4. Does the choice itself have an initiator or a terminator?
      This characteristic determines the dfdl:initiator, dfdl:terminator, dfdl:encoding, and associated properties.
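The answers to these questions can be recorded as DFDL property settings. The following Python sketch is a hypothetical helper, not part of any DFDL tool: the class ElementAnswers, the function dfdl_literal, and the answer labels in LENGTH_KIND are illustrative names. It covers only the delimiter, length, and representation questions, and it applies the rule that a DFDL property value beginning with { is read as an expression, so a literal leading brace is written as {{.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical answer labels for "How is the length of the element's content
# established?", mapped to the dfdl:lengthKind value each one implies.
LENGTH_KIND = {
    "fixed or expression": "explicit",
    "length prefix": "prefixed",
    "delimiter": "delimited",
    "regular expression": "pattern",
    "type or children": "implicit",
    "parent": "endOfParent",
}


def dfdl_literal(value: str) -> str:
    """Double a leading '{' so the value is not read as a DFDL expression."""
    return "{" + value if value.startswith("{") else value


@dataclass
class ElementAnswers:
    """Hypothetical checklist answers for one element (simple or complex)."""
    encoding: str
    length_from: str                      # one of the LENGTH_KIND keys
    initiator: str = ""                   # empty string means no initiator
    terminator: str = ""                  # empty string means no terminator
    representation: Optional[str] = None  # "text" or "binary"; None for complex
    byte_order: str = "bigEndian"         # only emitted for binary elements

    def properties(self) -> Dict[str, str]:
        """Translate the answers into DFDL property names and values."""
        props = {
            "dfdl:encoding": self.encoding,
            "dfdl:initiator": dfdl_literal(self.initiator),
            "dfdl:terminator": dfdl_literal(self.terminator),
            "dfdl:lengthKind": LENGTH_KIND[self.length_from],
        }
        if self.representation is not None:
            props["dfdl:representation"] = self.representation
            if self.representation == "binary":
                props["dfdl:byteOrder"] = self.byte_order
        return props
```

For example, an element that starts with { and ends with } followed by CR/LF, and whose length comes from its children, yields dfdl:initiator {{ and dfdl:lengthKind implicit. Occurrence, alignment, nil, and escape-scheme answers would need similar fields.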

Example

CSV file that shows elements and attributes.

Continuing with the example of a file of employee records: all the data is text, with a dfdl:encoding of ASCII.

  • The employees element does not have an initiator or a terminator, so dfdl:initiator and dfdl:terminator are set to the empty string ''. Its length is determined by its child elements, so dfdl:lengthKind is implicit.
  • The sequence for employees has dfdl:sequenceKind ordered, because its child components always appear in the order specified.
  • The employeeRecords element starts with {, so it has a dfdl:initiator of {{. (Two opening braces ({{) are needed to prevent the initiator from being misinterpreted as a DFDL expression.) The element ends with } and CR/LF, so it has a dfdl:terminator of }%CR;%LF;. (An alternative is to model the } and CR/LF as the separator of the parent sequence.) Again, its length is determined by its child elements, so dfdl:lengthKind is implicit.
  • Its sequence has dfdl:sequenceKind ordered.
  • Each simple element has dfdl:representation text. Each has a unique start tag that is used as its dfdl:initiator. Each has a variable-length value that is delimited either by ',' or, if the element is last in the record, by } and CR/LF. Consequently, dfdl:lengthKind is delimited.
  • The ',' delimiter is best modeled as the dfdl:separator of the parent sequence, not as the dfdl:terminator of the element. The dfdl:separatorPosition is infix, meaning that the ',' occurs only between the child elements of the sequence.
  • Because each simple element has an initiator, dfdl:initiatedContent can be set to yes on the parent sequence. The parser then uses each initiator to positively identify the element that follows it.
  • Although the salary element is optional, no DFDL discriminator is needed, because the DFDL parser deduces that it is missing when it finds the employeeRecord terminator. When salary is missing, the ',' before it is also absent. This is modeled by setting dfdl:separatorSuppressionPolicy to trailingEmpty on the parent sequence.
  • Several other text-related properties must be set for each simple element. These properties control whether any padding or trimming takes place (for example, dfdl:textPadKind and dfdl:textTrimKind), and whether an escape scheme is in use to prevent data characters from being interpreted as delimiters.
  • Several other type-specific properties must be set for each simple element to control the interpretation of the data value. For example, the permanent element is of type xs:boolean and so needs dfdl:textBooleanTrueRep Y and dfdl:textBooleanFalseRep N.
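To see how these properties combine at parse time, the following Python sketch hand-codes the behavior that the model describes for a single employee record. It is a simplification, not a DFDL processor: the start tags name:, permanent:, and salary:, the name field, and the field order are hypothetical because the example does not list them, and the sketch strips the terminator up front instead of scanning for delimiters as a DFDL parser does.

```python
# Hypothetical start tags (dfdl:initiator values) for each simple element, in
# the order the ordered sequence declares them.
FIELDS = [("name", "name:"), ("permanent", "permanent:"), ("salary", "salary:")]
OPTIONAL = {"salary"}   # optional element that may be absent at the end
INITIATOR = "{"         # written as {{ in the DFDL property value
TERMINATOR = "}\r\n"    # written as }%CR;%LF; in the DFDL property value
SEPARATOR = ","         # infix separator of the parent sequence


def parse_bool(text: str) -> bool:
    """Mirror dfdl:textBooleanTrueRep Y and dfdl:textBooleanFalseRep N."""
    if text == "Y":
        return True
    if text == "N":
        return False
    raise ValueError(f"not a boolean representation: {text!r}")


def parse_record(line: str) -> dict:
    """Parse one employee record under this sketch's simplified rules."""
    if not line.startswith(INITIATOR) or not line.endswith(TERMINATOR):
        raise ValueError("record initiator or terminator missing")
    body = line[len(INITIATOR):-len(TERMINATOR)]
    # Infix: ',' appears only between children. With no escape scheme in this
    # sketch, a ',' inside a value would be misread as a separator.
    parts = body.split(SEPARATOR) if body else []
    if len(parts) > len(FIELDS):
        raise ValueError("too many fields")
    record = {}
    for (name, tag), part in zip(FIELDS, parts):
        # Initiated content: the start tag confirms the expected element is present.
        if not part.startswith(tag):
            raise ValueError(f"expected initiator {tag!r} in {part!r}")
        record[name] = part[len(tag):]
    # Trailing suppression: missing trailing elements (and their separators)
    # are accepted only if those elements are optional.
    for name, _tag in FIELDS[len(parts):]:
        if name not in OPTIONAL:
            raise ValueError(f"required element {name!r} is missing")
    record["permanent"] = parse_bool(record["permanent"])
    if "salary" in record:
        record["salary"] = int(record["salary"])
    return record
```

With these hypothetical tags, a record that omits ",salary:..." still parses, which is the effect of trailingEmpty separator suppression, while a record that omits a required element is rejected.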

What to do next

The next stage is to organize the DFDL model: Organizing the DFDL model.