How to model data with DFDL

Analyze data formats for which no model exists. Understand the structure, so that you can create the corresponding DFDL model.

Data modeling with DFDL is closely analogous to programming. Suppose you want to learn a new programming language and write a program to solve a business problem. Take Java™ as an example. You buy a Java book and read up on the language theory. You get hold of a good Java editor and learn how to use it. But the hardest part is working out how to structure the program that solves your business problem. To get a head start, you might look at examples that other programmers created.

DFDL data modeling is the same. You can learn the theory of the modeling language, and you can learn how to use an editor for that language. But the hardest part is looking at the actual data and working out how to create the best model for it. If you are lucky, a model of the data already exists in one format or another (metadata), which solves the problem in whole or in part. For DFDL, IBM® provides importers that convert certain kinds of metadata (for example, COBOL and C metadata) into DFDL schemas. But what do you do when you have no model to reference, just one or more examples of the data format? Formatted text messages, such as comma-separated values (CSV) messages, often have no model. In that case, you analyze the sample data to understand its structure, and then create the corresponding DFDL model.

The task consists of the following stages: