Big Data File stage: Options category

In the Options category of the input link properties for the Big Data File stage, you can specify how the stage handles partially written files and whether the first line of the file contains column names. You can also specify options for rejected records, schema files, and the maximum file size.

Cleanup on failure

Specifies whether the stage deletes partially written files if it fails for any reason. The default is True. Set this property to False to leave partially written files in place.
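The behavior described above can be pictured with a short Python sketch. It is not the stage's implementation; the function name write_records, the cleanup_on_failure parameter, and the failure condition (a None record) are hypothetical and exist only to show the difference between the two settings.

```python
import os
import tempfile


def write_records(path, records, cleanup_on_failure=True):
    """Write records to path, one per line.

    If writing fails and cleanup_on_failure is True, the partially
    written file is deleted before the error is re-raised.
    """
    try:
        with open(path, "w", encoding="utf-8") as out:
            for record in records:
                if record is None:
                    # Hypothetical failure condition, for illustration only.
                    raise ValueError("cannot write an empty record")
                out.write(record + "\n")
    except Exception:
        # Analogous to Cleanup on failure = True: remove the partial file.
        if cleanup_on_failure and os.path.exists(path):
            os.remove(path)
        raise
```

With cleanup_on_failure=False, the file keeps whatever was written before the failure, which can help when diagnosing a failed job.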

First Line is Column Names

Specifies that the first line of the file contains column names. The default is False.
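As a rough illustration of the effect on the output file, the following Python sketch renders rows with and without a leading line of column names. The function render_rows and the comma delimiter are assumptions made for the example; in the stage, the actual file format is controlled by the format settings.

```python
import csv
import io


def render_rows(column_names, rows, first_line_is_column_names=False):
    """Return the file content for rows, optionally preceded by a header line."""
    buffer = io.StringIO()
    # Comma-delimited output is an assumption made for this sketch.
    writer = csv.writer(buffer, lineterminator="\n")
    if first_line_is_column_names:
        writer.writerow(column_names)
    writer.writerows(rows)
    return buffer.getvalue()


with_header = render_rows(["id", "name"], [[1, "a"]], first_line_is_column_names=True)
without_header = render_rows(["id", "name"], [[1, "a"]])
```

Here with_header starts with the line id,name, while without_header contains only the data row.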

Reject mode

Specifies what happens to records that cannot be written to the file. Choose Continue to continue operation and discard any rejected rows, Fail to stop writing if any row is rejected, or Save to send rejected rows down a reject link.

Continue is the default.
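The three reject modes can be compared with a minimal Python sketch. It models the semantics only: the function write_with_reject_mode, the is_writable callback, and the use of in-memory lists in place of a target file and a reject link are all hypothetical.

```python
def write_with_reject_mode(rows, is_writable, reject_mode="Continue"):
    """Split rows into (written, rejected) according to the reject mode.

    Continue: rejected rows are discarded and processing goes on.
    Fail:     the first rejected row stops processing with an error.
    Save:     rejected rows are collected, as if sent down a reject link.
    """
    if reject_mode not in ("Continue", "Fail", "Save"):
        raise ValueError(f"unknown reject mode: {reject_mode!r}")

    written, rejected = [], []
    for row in rows:
        if is_writable(row):
            written.append(row)
        elif reject_mode == "Fail":
            raise RuntimeError(f"row rejected: {row!r}")
        elif reject_mode == "Save":
            rejected.append(row)
        # Continue: drop the row silently and carry on.
    return written, rejected
```

For example, with rows [1, -2, 3] and a check that accepts only positive numbers, Continue yields ([1, 3], []), Save yields ([1, 3], [-2]), and Fail raises an error when it reaches -2.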

Schema file

This property is optional. By default, the stage uses the column definitions on the Columns and Format tabs as the schema for writing to the file. You can instead specify a file that contains a schema. If you have also defined columns on the Columns tab, ensure that they match the schema file. Type a path name or browse for a schema file.

Max File Size

This property is optional. It specifies the maximum size of a target file, in megabytes (MB). When a file reaches the specified maximum size, the stage generates another target file. Alternatively, you can specify a key column in the Properties category to trigger generation of a new file.

This property is available when the stage property Write Method is set to Generate Multiple Files. If you do not specify a maximum file size, the file size is unlimited.
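The size-based rollover can be sketched in Python as follows. This is an illustration under stated assumptions, not the stage's algorithm: the function write_with_rollover, the .partNNNNN file naming, and the interpretation of 1 MB as 1024 * 1024 bytes are hypothetical, and the sketch only starts a new file between records, so a file can slightly exceed the limit.

```python
import os


def write_with_rollover(directory, base_name, records, max_file_size_mb=None):
    """Write records to numbered files, starting a new file once the
    current one has reached the size limit. Returns the list of paths.

    max_file_size_mb=None means the file size is unlimited.
    """
    # Assumption: 1 MB is treated as 1024 * 1024 bytes.
    limit = None if max_file_size_mb is None else int(max_file_size_mb * 1024 * 1024)
    paths = []
    out = None
    written = 0
    try:
        for record in records:
            data = (record + "\n").encode("utf-8")
            if out is None or (limit is not None and written >= limit):
                if out is not None:
                    out.close()
                # Hypothetical naming scheme for the generated files.
                path = os.path.join(directory, f"{base_name}.part{len(paths):05d}")
                out = open(path, "wb")
                paths.append(path)
                written = 0
            out.write(data)
            written += len(data)
    finally:
        if out is not None:
            out.close()
    return paths
```

Calling write_with_rollover(target_dir, "sales", rows, max_file_size_mb=64) would produce sales.part00000, sales.part00001, and so on, each closed once it reaches about 64 MB. Omitting max_file_size_mb writes every record to a single file.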