Structure Definition Document

The structure definition document is an XML document that specifies the fixed-size data structures that can occur in the binary data stream and that are parsed by the StructureParse operator.

Format

This document follows a specific format and consists of mandatory and optional sections.

You can use the following example to create a new structure definition document:


<?xml version="1.0" encoding="UTF-8"?>
<structures
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.ibm.com/software/data/infosphere/streams/parser StructureParseStructure.xsd"
	xmlns="http://www.ibm.com/software/data/infosphere/streams/parser"
>
</structures>

The first line is the XML declaration, which specifies the encoding of the file. The <structures> root element, which starts on the second line, specifies the namespace, the schema location, and other settings that are required for the XSD-driven validation.

The XML schema definition is stored in the etc/xsd/StructureParseStructure.xsd file in the toolkit directory.
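Before running a full XSD validation, it can be useful to confirm that a new document is at least well-formed and uses the expected root element and namespace. The following Python sketch does only that, using the standard library. It does not validate against StructureParseStructure.xsd, which requires a schema-aware validator. The function name check_root is a hypothetical helper and not part of the toolkit.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example document above.
PARSER_NS = "http://www.ibm.com/software/data/infosphere/streams/parser"

SKELETON = """<?xml version="1.0" encoding="UTF-8"?>
<structures
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.ibm.com/software/data/infosphere/streams/parser StructureParseStructure.xsd"
	xmlns="http://www.ibm.com/software/data/infosphere/streams/parser"
>
</structures>
"""


def check_root(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML whose root element
    is <structures> in the parser namespace (hypothetical helper)."""
    try:
        # Encode to bytes so the parser honors the XML declaration.
        root = ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError:
        # Not well-formed, for example an unquoted attribute value.
        return False
    return root.tag == "{%s}structures" % PARSER_NS
```

A document that passes this check can still fail the XSD-driven validation. The check only catches basic mistakes such as unquoted attribute values or a wrong namespace.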

The structure definition document can contain the following mandatory and optional sections, which must appear in this order:

1. Variable declarations (optional)

In some cases, it is necessary to store fixed or field values for later use in conditions.

For example, the binary data stream can contain different versions of the same structure. The version information is stored only in a header structure and must be stored in a variable while the header structure is processed. The stored value can be used later to support the correct detection of the versioned structure.

2. Group definitions (optional)

Each structure in the structure definition document has a condition. If the condition is met while parsing the binary input data, the corresponding structure is detected. Sometimes, the conditions for different structures are partially identical, for example with versioned structures that check for the version that is stored in a variable.

The generated code runs these partial condition evaluations for all structures instead of running them only once. In other words, the code generator does not optimize such repetitions by itself. It is the responsibility of the structure definition document author to identify repeating condition parts and to move them into the group conditions.
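The effect of moving a shared condition part into a group can be modeled outside the toolkit. The following Python sketch is only a conceptual illustration and does not show the generated code or the XML syntax. The structure names, the variable name version, and the one-byte type tags are all assumptions made for this example.

```python
from typing import Dict, Optional

# Counts how often the shared (partial) condition is evaluated.
calls = {"version_check": 0}


def version_is(expected: int, variables: Dict[str, int]) -> bool:
    """Shared condition part: compare the stored version variable."""
    calls["version_check"] += 1
    return variables.get("version") == expected


def detect_flat(data: bytes, variables: Dict[str, int]) -> Optional[str]:
    """Each structure condition repeats the version check."""
    if version_is(2, variables) and data[0] == 0x01:
        return "RecordA_v2"
    if version_is(2, variables) and data[0] == 0x02:
        return "RecordB_v2"
    if version_is(2, variables) and data[0] == 0x03:
        return "RecordC_v2"
    return None


def detect_grouped(data: bytes, variables: Dict[str, int]) -> Optional[str]:
    """The version check is evaluated once as a group condition."""
    if not version_is(2, variables):
        return None
    if data[0] == 0x01:
        return "RecordA_v2"
    if data[0] == 0x02:
        return "RecordB_v2"
    if data[0] == 0x03:
        return "RecordC_v2"
    return None
```

In this model, detecting the third structure costs three evaluations of the version check in the flat variant and one in the grouped variant. The saving grows with the number of structures that share the condition part.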

3. Structure definitions (mandatory)

The structure definition section is the heart of the definition file. It contains all relevant definitions of structures that can occur in the binary data stream, as well as the corresponding fields, conditions to detect the structures, and relations between the structures.

4. Structure-based synchronization (optional)

If data is lost or invalid, or if unexpected data is ingested, the structure detection algorithm enters an error mode and drops all subsequent data until it can synchronize itself with the next valid structure.

The default algorithm synchronizes with the next window punctuation. After the next window punctuation, it is assumed that the next data block begins with a valid structure.

If you have no window punctuations that can be used to synchronize, or if a problem at the beginning of your input data leads to large amounts of dropped data, you should synchronize earlier than at the window punctuation. For example, suppose you have a 1 GB file and a problem occurs at the first byte. With the default algorithm, the complete file is dropped even if it is possible to synchronize at the second byte.

This operator can be configured to use structure-based synchronization. In other words, the author of the structure definition document can specify one or more structures that can be used for synchronization. The conditions of these structures find a valid synchronization point by iterating through the binary data stream.
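The idea of iterating through the data until a synchronization condition matches can be sketched as follows. This is a simplified Python model under stated assumptions, not the operator's implementation. The two-byte marker 0xCAFE and the names find_sync_offset and magic_condition are hypothetical.

```python
from typing import Callable, Optional


def find_sync_offset(
    buffer: bytes,
    condition: Callable[[bytes, int], bool],
    start: int = 0,
) -> Optional[int]:
    """Return the first offset at or after start where the condition
    of a synchronization structure matches, or None if none matches."""
    for offset in range(start, len(buffer)):
        if condition(buffer, offset):
            return offset
    return None


def magic_condition(buffer: bytes, offset: int) -> bool:
    """Hypothetical condition: the structure starts with the
    two-byte marker 0xCA 0xFE (an assumption for this example)."""
    return buffer[offset:offset + 2] == b"\xca\xfe"
```

For a buffer that starts with three invalid bytes followed by the marker, find_sync_offset returns offset 3. Only the bytes before that offset are dropped, instead of everything up to the next window punctuation. A condition that is too weak can match inside payload data, so a synchronization structure should have a condition that is unlikely to match by accident.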

Variable declarations
Group definitions
Structure definitions
Groups and group conditions
Actions
Structure-based synchronization
Use Cases & Solutions