Output tab (DataStage)

The Output tab allows you to specify details about how the data is read for a file set.

The file set can have only one output link. It can also have a single reject link, where rows that have failed to be written or read for some reason can be sent. The Output name drop-down list allows you to choose whether you are looking at details of the main output link (the stream link) or the reject link.

Use the Source category to specify how the file set operates:
File set
This property defines the file set that the data will be read from. You can type in a path name of a file set descriptor file (by convention ending in .fs).
Options:
Keep file partitions
Set this to True to partition the read data set according to the organization of the input file(s). So, for example, if you are reading three files you will have three partitions. Defaults to False.
Reject mode
Allows you to specify behavior for read rows that do not match the expected schema. Choose from Continue to continue operation and discard any rejected rows, Fail to cease reading if any rows are rejected, or Save to send rejected rows down a reject link. Defaults to Continue.
Report progress
Choose Yes or No to enable or disable reporting. By default the stage displays a progress report at each 10% interval when it can ascertain file size. Reporting occurs only if the file is greater than 100 KB, records are fixed length, and there is no filter on the file.
Filter
This is an optional property. You can use this to specify that the data is passed through a filter program after being read from the files. Specify the filter command, and any required arguments, in the Property Value box.
Schema file
This is an optional property. By default the File Set stage will use the column definitions defined on the Columns and Format tabs as a schema for reading the file. You can, however, specify a file containing a schema instead (note, however, that if you have defined columns on the Columns tab, you should ensure these match the schema file). Type in a path name of a schema file. This property is mutually exclusive with Use Schema Defined in File Set.
Use schema defined in file set
When you create a file set you have an option to save the schema along with it. When you read the file set you can use this schema in preference to the column definitions by setting this property to True. This property is mutually exclusive with Schema File.
File name column
This is an optional property. It adds an extra column of type VarChar to the output of the stage, containing the path name of the file the record is read from. You should also add this column manually to the Columns definitions to ensure that the column is not dropped if you are not using runtime column propagation, or it is turned off at some point.
Row number column
This is an optional property. It adds an extra column of type unsigned BigInt to the output of the stage, containing the row number. You must also add the column to the columns tab, unless runtime column propagation is enabled.
Strip BOM
Set this property TRUE to drop the UTF-16 Endianess byte order mark when reading data. By default, this property is set to FALSE.

Format

The Format section allows you to supply information about the format of the files that you are writing. You can supply information about the format of the flat file that you are writing.

If you do not alter any of the Format settings, the stage will produce a file of the following format:

  • File comprises variable length columns contained within double quotes.
  • All columns are delimited by a comma, except for the final column in a row.
  • Rows are delimited by a UNIX newline.

The following sections list the property types and properties available for each type.

Record level

These properties define details about how data records are formatted in the flat file. Where you can enter a character, this can usually be an ASCII character or a multi-byte Unicode character (if you have NLS enabled). The available properties are:

  • Fill char. Does not apply to output links.
  • Final delimiter string. Specify the string written after the last column of a record in place of the column delimiter. Enter one or more characters, this precedes the record delimiter if one is used. Mutually exclusive with Final delimiter, which is the default. For example, if you set Delimiter to comma and Final delimiter string to `, ` (comma space - you do not need to enter the inverted commas) all fields are delimited by a comma, except the final field, which is delimited by a comma followed by an ASCII space character. DataStage® skips the specified delimiter string when reading the file.
  • Final delimiter. Specify the single character written after the last column of a record in place of the field delimiter. Type a character or select one of whitespace, end, none, null, tab, or comma. DataStage skips the specified delimiter string when reading the file. See the following diagram for an illustration.
    • whitespace. The last column of each record will not include any trailing white spaces found at the end of the record.
    • end. The last column of each record does not include the field delimiter. This is the default setting.
    • none. The last column of each record does not have a delimiter, used for fixed-width fields.
    • null. The last column of each record is delimited by the ASCII null character.
    • comma. The last column of each record is delimited by the ASCII comma character.
    • tab. The last column of each record is delimited by the ASCII tab character.
  • Intact. The intact property specifies an identifier of a partial schema. A partial schema specifies that only the column(s) named in the schema can be modified by the stage. All other columns in the row are passed through unmodified. The file containing the partial schema is specified in the Schema File property on the Outputs tab. This property has a dependent property:
    • Check intact. Select this to force validation of the partial schema as the file or files are imported. Note that this can degrade performance.
  • Record delimiter string. Specify the string at the end of each record. Enter one or more characters. This is mutually exclusive with Record delimiter, which is the default, and record type and record prefix.
  • Record delimiter. Specify the single character at the end of each record. Type a character or select one of the following:
    • UNIX Newline (the default)
    • null

      To specify a DOS newline, use the Record delimiter string property set to "\r" or "\n".

      Record delimiter is mutually exclusive with Record delimiter string, Record prefix, and record type.

  • Record length. Select Fixed where fixed length fields are being read. DataStage calculates the appropriate length for the record. Alternatively specify the length of fixed records as number of bytes. This is not used by default (default files are comma-delimited).
  • Record Prefix. Specifies that a variable-length record is prefixed by a 1-, 2-, or 4-byte length prefix. It is set to 1 by default. This is mutually exclusive with Record delimiter, which is the default, and record delimiter string and record type.
  • Record type. Specifies that data consists of variable-length blocked records (varying) or implicit records (implicit). If you choose the implicit property, data is written as a stream with no explicit record boundaries. The end of the record is inferred when all of the columns defined by the schema have been parsed. The varying property allows you to specify one of the following IBM® blocked or spanned formats: V, VB, VS, VBS, or VR. Data is imported by using one of these formats. These formats are not available for export.

    This property is mutually exclusive with Record length, Record delimiter, Record delimiter string, and Record prefix and by default is not used.

Field Defaults

Defines default properties for columns read from the file or files. These are applied to all columns, but can be overridden for individual columns from the Columns tab using the Edit Column Metadata dialog box. A common reason to override a property for an individual column occurs when reading comma-separated values (CSV) files. CSV files often enclose fields in quotes when the fields might contain a special character, such as the field delimiter. In this case, the Quote property for the columns in question should be overridden.

Where you can enter a character, this can usually be an ASCII character or a multi-byte Unicode character (if you have NLS enabled). The available properties are:

  • Actual field length. Specifies the actual number of bytes to skip if the field's length equals the setting of the null field length property.
  • Delimiter. Specifies the trailing delimiter of all fields in the record. Type an ASCII character or select one of whitespace, end, none, null, comma, or tab. DataStage skips the delimiter when reading.
    • whitespace. Whitespace characters at the end of a column are ignored, that is, are not treated as part of the column.
    • end. The end of a field is taken as the delimiter, that is, there is no separate delimiter. This is not the same as a setting of `None' which is used for fields with fixed-width columns.
    • none. No delimiter (used for fixed-width).
    • null. ASCII Null character is used.
    • comma. ASCII comma character is used.
    • tab. ASCII tab character is used.
  • Delimiter string. Specify the string at the end of each field. Enter one or more characters. This is mutually exclusive with Delimiter, which is the default. For example, specifying `, ` (comma space - you do not need to enter the inverted commas) specifies each field is delimited by `, ` unless overridden for individual fields. DataStage skips the delimiter string when reading.
  • Null field length. The length in bytes of a variable-length field that contains a null. When a variable-length field is read, a length of null field length in the source field indicates that it contains a null. This property is mutually exclusive with null field value.
  • Null field value. Specifies the value given to a null field if the source is set to null. Can be a number, string, or C-type literal escape character. For example, you can represent a byte value by \ooo, where each o is an octal digit 0 - 7 and the first o is < 4, or by \xhh, where each h is a hexadecimal digit 0 - F. You must use this form to encode non-printable byte values.

    This property is mutually exclusive with Null field length and Actual length. For a fixed width data representation, you can use Pad char (from the general section of Type defaults) to specify a repeated trailing character if the value you specify is shorter than the fixed width of the field.

    You can specify a list of null values that a column could contain that represent null. To do this you specify a separator character in the dependent Null field value separator property, and then use this separator to delimit the null values in the Null field value property. For example, if you set Null field value separator to contain the slash character (/), then you could specify NULL/null/NUL/nul to specify that any of these strings could represent a null value in this column.

  • Null field value separator

    This is a dependent property of Null field value. You can specify a separator that can be used in the Null field value property to specify a range of values that could represent the null value. You can specify a number, string, or C-type literal escape character (as for Null field value) as a separator, but a single character such as a comma (,) or slash (/) character is the best choice. You must only specify a separator if you specify multiple values in Null field value; specifying a separator without using it will cause a runtime error.

  • Prefix bytes. You can use this option with variable-length fields. Variable-length fields can be either delimited by a character or preceded by a 1-, 2-, or 4-byte prefix containing the field length. DataStage reads the length prefix but does not include the prefix as a separate field in the data set it reads from the file.

    This property is mutually exclusive with the Delimiter, Quote, and Final Delimiter properties, which are used by default.

  • Print field. This property is intended for use when debugging jobs. Set it to have DataStage produce a message for every field it reads. The message has the format:
    Importing N: D
    where:
    • N is the field name.
    • D is the imported data of the field. Non-printable characters contained in D are prefixed with an escape character and written as C string literals; if the field contains binary data, it is output in octal format.
  • Quote. Specifies that variable length fields are enclosed in single quotes, double quotes, or another character or pair of characters. Choose Single or Double, or enter a character. This is set to double quotes by default.

    When reading, DataStage ignores the leading quote character and reads all bytes up to but not including the trailing quote character.

  • Vector prefix. For fields that are variable length vectors, specifies that a 1-, 2-, or 4-byte prefix contains the number of elements in the vector. You can override this default prefix for individual vectors.

    Variable-length vectors must use either a prefix on the vector or a link to another field in order to specify the number of elements in the vector. If the variable length vector has a prefix, you use this property to indicate the prefix length. DataStage reads the length prefix but does not include it as a separate field in the data set. By default, the prefix length is assumed to be one byte.

Type Defaults

These are properties that apply to all columns of a specific data type unless specifically overridden at the column level. They are divided into a number of subgroups according to data type.

General

These properties apply to several data types (unless overridden at column level):
  • Byte order. Specifies how multiple byte data types (except string and raw data types) are ordered. Choose from:
    • little-endian. The high byte is on the right.
    • big-endian. The high byte is on the left.
    • native-endian. As defined by the native format of the machine. This is the default.
  • Data Format. Specifies the data representation format of a field. Applies to fields of all data types except string, ustring, and raw and to record, subrec or tagged fields containing at least one field that is neither string nor raw. Choose from:
    • binary
    • text (the default)

      A setting of binary has different meanings when applied to different data types:

    • For decimals, binary means packed.
    • For other numerical data types, binary means "not text".
    • For dates, binary is equivalent to specifying the julian property for the date field.
    • For time, binary is equivalent to midnight_seconds.
    • For timestamp, binary specifies that the first integer contains a Julian day count for the date portion of the timestamp and the second integer specifies the time portion of the timestamp as the number of seconds from midnight. A binary timestamp specifies that two 32-but integers are written.

      By default data is formatted as text, as follows:

    • For the date data type, text specifies that the data read, contains a text-based date in the form %yyyy-%mm-%dd or in the default date format if you have defined a new one on an NLS system.
    • For the decimal data type: a field represents a decimal in a string format with a leading space or '-' followed by decimal digits with an embedded decimal point if the scale is not zero. The destination string format is: [+ | -]ddd.[ddd] and any precision and scale arguments are ignored.
    • For numeric fields (int8, int16, int32, uint8, uint16, uint32, sfloat, and dfloat): DataStage assumes that numeric fields are represented as text.
    • For the time data type: text specifies that the field represents time in the text-based form %hh:%nn:%ss or in the default date format if you have defined a new one on an NLS system.
    • For the timestamp data type: text specifies a text-based timestamp in the form %yyyy-%mm-%dd %hh:%nn:%ss or in the default date format if you have defined a new one on an NLS system.
  • Field max width. The maximum number of bytes in a column represented as a string. Enter a number. This is useful where you are storing numbers as text. If you are using a fixed-width character set, you can calculate the length exactly. If you are using variable-length character set, calculate an adequate maximum width for your fields. Applies to fields of all data types except date, time, timestamp, and raw; and record, subrec, or tagged if they contain at least one field of this type.
  • Field width. The number of bytes in a field represented as a string. Enter a number. This is useful where you are storing numbers as text. If you are using a fixed-width charset, you can calculate the number of bytes exactly. If it's a variable length encoding, base your calculation on the width and frequency of your variable-width characters. Applies to fields of all data types except date, time, timestamp, and raw; and record, subrec, or tagged if they contain at least one field of this type.

    If you specify neither field width nor field max width, numeric fields written as text have the following number of bytes as their maximum width:

    • 8-bit signed or unsigned integers: 4 bytes
    • 16-bit signed or unsigned integers: 6 bytes
    • 32-bit signed or unsigned integers: 11 bytes
    • 64-bit signed or unsigned integers: 21 bytes
    • single-precision float: 14 bytes (sign, digit, decimal point, 7 fraction, "E", sign, 2 exponent)
    • double-precision float: 24 bytes (sign, digit, decimal point, 16 fraction, "E", sign, 3 exponent)
  • Pad char. This property is ignored for output links.
  • Character set. Specifies the character set. Choose from ASCII or EBCDIC. The default is ASCII. Applies to all data types except raw and ustring and record, subrec, or tagged containing no fields other than raw or ustring.

String

These properties are applied to columns with a string data type, unless overridden at column level.
  • Export EBCDIC as ASCII. Not relevant for output links.
  • Import ASCII as EBCDIC. Select this to specify that ASCII characters are read as EBCDIC characters.

Decimal

These properties are applied to columns with a decimal data type unless overridden at column level.
  • Allow all zeros. Specifies whether to treat a packed decimal column containing all zeros (which is normally illegal) as a valid representation of zero. Select Yes or No. The default is No.
  • Decimal separator. Specify the ASCII character that acts as the decimal separator (period by default).
  • Packed. Select an option to specify what the decimal columns contain, choose from:
    • Yes to specify that the decimal fields contain data in packed decimal format (the default). This has the following sub-properties:

      Check. Select Yes to verify that data is packed, or No to not verify.

      Signed. Select Yes to use the existing sign when reading decimal fields. Select No to write a positive sign (0xf) regardless of the fields' actual sign value.

    • No (separate) to specify that they contain unpacked decimal with a separate sign byte. This has the following sub-property:

      Sign Position. Choose leading or trailing as appropriate.

    • No (zoned) to specify that they contain an unpacked decimal in either ASCII or EBCDIC text. This has the following sub-property:

      Sign Position. Choose leading or trailing as appropriate.

    • No (overpunch) to specify that the field has a leading or end byte that contains a character which specifies both the numeric value of that byte and whether the number as a whole is negatively or positively signed. This has the following sub-property:

      Sign Position. Choose leading or trailing as appropriate.

  • Precision. Specifies the precision of a packed decimal. Enter a number.
  • Rounding. Specifies how to round the source field to fit into the destination decimal when reading a source field to a decimal. Choose from:
    • up (ceiling). Truncate source column towards positive infinity. This mode corresponds to the IEEE 754 Round Up mode. For example, 1.4 becomes 2, -1.6 becomes -1.
    • down (floor). Truncate source column towards negative infinity. This mode corresponds to the IEEE 754 Round Down mode. For example, 1.6 becomes 1, -1.4 becomes -2.
    • nearest value. Round the source column towards the nearest representable value. This mode corresponds to the COBOL ROUNDED mode. For example, 1.4 becomes 1, 1.5 becomes 2, -1.4 becomes -1, -1.5 becomes -2.
    • truncate towards zero. This is the default. Discard fractional digits to the right of the right-most fractional digit supported by the destination, regardless of sign. For example, if the destination is an integer, all fractional digits are truncated. If the destination is another decimal with a smaller scale, truncate to the scale size of the destination decimal. This mode corresponds to the COBOL INTEGER-PART function. Using this method 1.6 becomes 1, -1.6 becomes -1.
  • Scale. Specifies the scale of a source packed decimal.

Numeric

These properties apply to integer and float fields unless overridden at column level.
  • C_format. Perform non-default conversion of data from string data to a integer or floating-point. This property specifies a C-language format string used for reading integer or floating point strings. This is passed to sscanf(). For example, specifying a C-format of %x and a field width of 8 ensures that a 32-bit integer is formatted as an 8-byte hexadecimal string.
  • In_format. Format string used for conversion of data from string to integer or floating-point data This is passed to sscanf(). By default, DataStage invokes the C sscanf() function to convert a numeric field formatted as a string to either integer or floating point data. If this function does not output data in a satisfactory format, you can specify the in_format property to pass formatting arguments to sscanf().
  • Out_format. This property is not relevant for output links.

Date

These properties are applied to columns with a date data type unless overridden at column level. All of these are incompatible with a Data Format setting of Text.
  • Days since. Dates are written as a signed integer containing the number of days since the specified date. Enter a date in the form %yyyy-%mm-%dd or in the default date format if you have defined a new one on an NLS system.
  • Format string. The string format of a date. By default this is %yyyy-%mm-%dd.
  • Is Julian. Select this to specify that dates are written as a numeric value containing the Julian day. A Julian day specifies the date as the number of days from 4713 BCE January 1, 12:00 hours (noon) GMT.

Time

These properties are applied to columns with a time data type unless overridden at column level. All of these are incompatible with a Data Format setting of Text.
  • Format string. Specifies the format of columns representing time as a string. By default this is %hh-%mm-%ss.
  • Is midnight seconds. Select this to specify that times are written as a binary 32-bit integer containing the number of seconds elapsed from the previous midnight.

Timestamp

These properties are applied to columns with a timestamp data type unless overridden at column level.
  • Format string. Specifies the format of a column representing a timestamp as a string. The format combines the format for date strings and time strings.