The second stage of an approach to modeling data using DFDL involves adding DFDL
annotations to the logical structure that you established. The DFDL annotations describe the
physical format of the components.
About this task
Determine the characteristics of your components.
Procedure
- All elements (simple and complex)
  - Does the element have any delimiters, that is, an initiator or a terminator? If so, what is the
    encoding, and are the delimiters present when the element is empty or nil?
    This characteristic determines the dfdl:initiator, dfdl:terminator, dfdl:encoding, and
    associated properties.
  - How is the length of the element's content established?
    This characteristic determines the dfdl:lengthKind and associated properties:
    - explicit, for a fixed length.
    - prefixed, if there is a length prefix.
    - delimited, if bounded by a delimiter.
    - pattern, to use a regular expression.
    - implicit, if the length is determined by its type (for a complex element, by its child
      elements).
    - endOfParent, if bounded by its parent.
  - If the element is optional or is an array, how is the number of occurrences established?
    This characteristic determines the dfdl:occursCountKind and associated properties.
  - Are there any alignment rules to apply?
    This characteristic usually applies only to binary data, and determines the dfdl:alignment,
    dfdl:fillByte, and associated properties.
  - How is any nil value described?
    This characteristic determines the dfdl:nilKind, dfdl:nilValue, and associated
    properties.
  - Is an assert or discriminator needed to establish whether the element exists?
- Simple elements
  - Does the element have a text or a binary representation? The representation and the simple
    type determine which other properties must be set.
    - For text, the properties are dfdl:encoding and the DFDL text-related properties.
    - For binary, the properties are dfdl:byteOrder and the DFDL binary-related properties.
  - For text formats, is an escape scheme needed?
    This characteristic determines whether a dfdl:defineEscapeScheme annotation is needed and, if
    so, a dfdl:escapeSchemeRef property to reference it.
  - If global simple types are identified, decide whether the simple type can carry some of the
    properties rather than the element, thus creating reusable physical types.
- Sequences
  - Is the sequence ordered or unordered?
    This characteristic determines the dfdl:sequenceKind property.
  - Does it have a separator that is used to delimit its child elements and, if so, is the
    separator's position infix, prefix, or postfix?
    Are separators sometimes suppressed (for example, when optional elements are missing)?
    These characteristics determine the dfdl:encoding, dfdl:separator, dfdl:separatorPosition, and
    dfdl:separatorSuppressionPolicy properties.
  - Do all the child elements of the sequence have unique initiators that can identify that they
    exist?
    This characteristic determines the dfdl:initiatedContent property.
  - Does the sequence itself have an initiator or a terminator?
    This characteristic determines the dfdl:initiator, dfdl:terminator, dfdl:encoding, and
    associated properties.
- Choices
  - Must all the branches of the choice occupy the same length?
    This characteristic determines the dfdl:choiceLengthKind and associated properties.
  - Do all the branches of the choice have unique initiators that can identify which one appears?
    This characteristic determines the dfdl:initiatedContent property.
  - Are discriminators needed on the branches to establish which one appears?
  - Does the choice itself have an initiator or a terminator?
    This characteristic determines the dfdl:initiator, dfdl:terminator, dfdl:encoding, and
    associated properties.
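As a rough illustration of the length and choice questions in the procedure, the following fragment shows how a few of these properties might appear in short form on schema components. All element names, lengths, patterns, and delimiter strings are hypothetical. The fragment also omits the many other properties that a complete DFDL schema must set, so treat it as a sketch and not as a schema that a DFDL processor would accept as is.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: every name and value below is hypothetical. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Schema-wide defaults; a real schema needs many more properties. -->
      <dfdl:format representation="text" encoding="ASCII"/>
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="sample" dfdl:lengthKind="implicit">
    <xs:complexType>
      <xs:sequence dfdl:sequenceKind="ordered">

        <!-- explicit: the length is stated directly (here, 8 characters) -->
        <xs:element name="fixedField" type="xs:string"
                    dfdl:lengthKind="explicit"
                    dfdl:length="8"
                    dfdl:lengthUnits="characters"/>

        <!-- pattern: the length is whatever the regular expression matches -->
        <xs:element name="patternField" type="xs:string"
                    dfdl:lengthKind="pattern"
                    dfdl:lengthPattern="[0-9]+"/>

        <!-- delimited: the content runs up to the next delimiter -->
        <xs:element name="delimitedField" type="xs:string"
                    dfdl:lengthKind="delimited"
                    dfdl:terminator=";"/>

        <!-- choice: each branch has a unique initiator that identifies it -->
        <xs:choice dfdl:choiceLengthKind="implicit"
                   dfdl:initiatedContent="yes">
          <xs:element name="branchA" type="xs:string"
                      dfdl:initiator="A:"
                      dfdl:lengthKind="delimited"/>
          <xs:element name="branchB" type="xs:string"
                      dfdl:initiator="B:"
                      dfdl:lengthKind="delimited"/>
        </xs:choice>

      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because dfdl:initiatedContent is yes on the choice, a parser can select a branch by its initiator alone, so no discriminator is needed in this sketch.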
Example
This example continues with the file of employee records, in which all data is text with a
dfdl:encoding of ASCII.
- The employees element does not have an initiator or a terminator, so dfdl:initiator and
dfdl:terminator are set to the empty string (''). Its length is determined by its child
elements, so dfdl:lengthKind is implicit.
- The sequence for employees has dfdl:sequenceKind ordered, because its child components
always appear in the order specified.
- The employeeRecords element starts with {, so it has a dfdl:initiator of {{. (Two opening
braces ({{) are needed to prevent the DFDL initiator from being misinterpreted as a DFDL
expression.) The element ends with } and CR/LF, and so has a dfdl:terminator of }%CR;%LF;.
(An alternative is to model the } and CR/LF as the separator of the parent sequence.) Again,
its length is determined by its child elements, so dfdl:lengthKind is implicit.
- Its sequence has dfdl:sequenceKind ordered.
- Each simple element has dfdl:representation text. Each has a unique start tag that is used
as the dfdl:initiator. Each has a variable-length value that is delimited either by ',' or, if
the element is last in the record, by } and CR/LF. Consequently, dfdl:lengthKind is delimited.
- The ',' delimiter is best modeled as the dfdl:separator of the parent sequence, and not as
the dfdl:terminator of the element. The dfdl:separatorPosition is infix, meaning that the ','
occurs only between the child elements in the sequence.
- Because each simple element has an initiator, dfdl:initiatedContent yes must be set on the
parent sequence. This means that the parser uses the initiator to positively identify each
element.
- Although the salary element is optional, no DFDL discriminator is needed, because the DFDL
parser deduces that the element is missing when it finds the terminator of the enclosing
employee record. Notice that when salary is missing, the ',' before it is suppressed. That is
modeled by setting dfdl:separatorSuppressionPolicy trailingEmpty on the parent sequence.
- Several other text-related properties must be set for each simple element. These properties
control whether any padding or trimming takes place, and whether an escape scheme is in use to
prevent data characters from being interpreted as delimiters.
- Several other type-specific properties must be set for each simple element, to control the
interpretation of the data value. For example, the permanent element is of type xs:boolean and
so needs dfdl:textBooleanTrueRep Y and dfdl:textBooleanFalseRep N.
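Putting the decisions from this example together, the annotated schema might look roughly like the following sketch. The names employees, employeeRecords, salary, and permanent come from the example. The name element, the initiator strings such as name=, the xs:decimal type for salary, maxOccurs="unbounded", and the occursCountKind default are assumptions made for illustration. The padding, trimming, escape scheme, and number format properties are omitted, so the fragment is not complete enough for a DFDL processor to use as is.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: see the assumptions listed in the text above. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- All data is ASCII text. occursCountKind is an assumed default. -->
      <dfdl:format representation="text" encoding="ASCII"
                   occursCountKind="implicit"/>
    </xs:appinfo>
  </xs:annotation>

  <!-- employees: no delimiters; length is determined by its children -->
  <xs:element name="employees"
              dfdl:initiator="" dfdl:terminator=""
              dfdl:lengthKind="implicit">
    <xs:complexType>
      <xs:sequence dfdl:sequenceKind="ordered">

        <!-- Each record starts with { (written {{) and ends with } CR LF -->
        <xs:element name="employeeRecords" maxOccurs="unbounded"
                    dfdl:initiator="{{"
                    dfdl:terminator="}%CR;%LF;"
                    dfdl:lengthKind="implicit">
          <xs:complexType>
            <!-- ',' between children; a trailing ',' is suppressed when
                 the optional last element is missing -->
            <xs:sequence dfdl:sequenceKind="ordered"
                         dfdl:separator=","
                         dfdl:separatorPosition="infix"
                         dfdl:separatorSuppressionPolicy="trailingEmpty"
                         dfdl:initiatedContent="yes">

              <!-- Hypothetical element and start tag -->
              <xs:element name="name" type="xs:string"
                          dfdl:initiator="name="
                          dfdl:lengthKind="delimited"/>

              <!-- Start tag text is an assumption; Y/N reps are from the example -->
              <xs:element name="permanent" type="xs:boolean"
                          dfdl:initiator="permanent="
                          dfdl:lengthKind="delimited"
                          dfdl:textBooleanTrueRep="Y"
                          dfdl:textBooleanFalseRep="N"/>

              <!-- Optional and last, so its absence is found at the terminator -->
              <xs:element name="salary" type="xs:decimal" minOccurs="0"
                          dfdl:initiator="salary="
                          dfdl:lengthKind="delimited"/>

            </xs:sequence>
          </xs:complexType>
        </xs:element>

      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under these assumptions, a record such as {name=Fred,permanent=Y,salary=50000} followed by CR/LF would fit the model, as would {name=Fred,permanent=Y} with the salary and its preceding ',' left out.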
What to do next
The next stage is to organize the DFDL model; see Organizing the DFDL model.