Metadata

Metadata describes the data flowing through your job in terms of column definitions, which describe each of the fields making up a data record. Metadata is handled through table definitions or through schema files.

Metadata is information about data.

InfoSphere® DataStage® has two alternative ways of handling metadata: through table definitions or through schema files. By default, parallel stages derive their metadata from the columns defined on the Outputs or Input page Column tab of your stage editor. Additional formatting information is supplied, where needed, by a Formats tab on the Outputs or Input page. In some cases you can specify that the stage uses a schema file instead, by explicitly setting a property on the stage editor and specifying the name and location of the schema file. If you use a schema file, ensure that runtime column propagation is turned on; otherwise, the column definitions specified in the stage editor always override any schema file.
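To make the relationship between column definitions and a schema file more concrete, the following Python sketch renders a small list of column definitions as schema-style text. It is an illustration only, not part of the product: the Column class, the render_schema function, and the column names are hypothetical, and the record ( name:type; ... ) layout is assumed from the general form of parallel-job record schemas. Check the schema format documentation for your release before relying on the exact syntax.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Column:
    """One column definition: a field name, a type, and nullability (hypothetical model)."""
    name: str
    type_name: str          # for example "string[255]", "int32", "date"
    nullable: bool = False


def render_schema(columns: List[Column]) -> str:
    """Render column definitions as schema-file text (assumed record syntax)."""
    fields = []
    for col in columns:
        prefix = "nullable " if col.nullable else ""
        fields.append(f"  {col.name}:{prefix}{col.type_name}")
    # Fields are separated by semicolons inside a single record (...) block.
    return "record (\n" + ";\n".join(fields) + "\n)"


# Illustrative columns only; they do not come from any real table definition.
columns = [
    Column("name", "string[255]"),
    Column("address", "string[255]", nullable=True),
    Column("value1", "int32"),
    Column("order_date", "date"),
]

schema_text = render_schema(columns)
print(schema_text)
```

Each entry in the rendered text corresponds to one column definition, which is the same information that the stage editor would otherwise hold for the link.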

Where is additional formatting information needed? Typically this is where you are reading from, or writing to, a file of some sort and InfoSphere DataStage needs to know more about how data in the file is formatted.

You can specify formatting information on a row basis, where the information is applied to every column in every row in the data set. This is done from the Formats tab, which is described with the stage editors that support it (for example, for Sequential files, see Input Link Format Tab). You can also specify formatting for particular columns, which overrides the row formatting, from the Edit Column Metadata dialog box for each column (see Field Level).
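The precedence between row-level and column-level formatting can be sketched in a few lines of Python. This is a conceptual model only, not how the product stores or applies formats: the effective_format function and the property names delimiter and quote are hypothetical placeholders for whatever settings are made on the Formats tab and in the Edit Column Metadata dialog box.

```python
from typing import Dict, Optional


def effective_format(
    row_format: Dict[str, str],
    column_format: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return the format that applies to one column.

    Row-level settings apply to every column; any setting given for the
    column itself replaces the row-level value for that column only.
    """
    merged = dict(row_format)           # copy, so the row format is not modified
    if column_format:
        merged.update(column_format)    # column-level settings take precedence
    return merged


# Hypothetical settings: the property names are placeholders, not product names.
row_format = {"delimiter": ",", "quote": "double"}
column_overrides = {"comment": {"delimiter": "|"}}

for column in ("id", "comment"):
    print(column, effective_format(row_format, column_overrides.get(column)))
```

In this model the id column keeps the row-level delimiter, while the comment column uses its own delimiter and still inherits the row-level quote setting.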