Type defaults
Data type defaults are properties that apply to all columns of a specific data type unless specifically overridden at the column level.
The Data type defaults are divided into a number of subgroups according to data type.
General
The following properties apply to several data types (unless overridden at column level):
- Byte order. Specifies how multiple byte data types (except string
and raw data types) are ordered. Choose from the following byte order
items:
- little-endian. The high byte is on the right.
- big-endian. The high byte is on the left.
- native-endian. As defined by the native format of the machine. This is the default.
- Data Format. Specifies the data representation format of a field.
Applies to fields of all data types except string, ustring, and raw,
and to record, subrecord or tagged fields containing at least one
field that is neither string nor raw. Choose from the following data
formats.
Data Format Description binary A setting of binary has different meanings when applied to different data types: - For decimals, binary means packed.
- For other numerical data types, binary means "not text".
- For dates, binary is equivalent to specifying the julian property for the date field.
- For time, binary is equivalent to midnight_seconds.
- For timestamp, binary specifies that the first integer contains a Julian day count for the date portion of the timestamp and the second integer specifies the time portion of the timestamp as the number of seconds from midnight. A binary timestamp specifies that two 32-but integers are written.
text (the default) By default data is formatted as text, as follows: - Date data type. Text specifies that the data to be written contains a text-based date in the form %yyyy-%mm-%dd or in the default date format if you have defined a new one on an NLS system.
- Decimal data type. A field represents a decimal
in a string format with a leading space or '-' followed by decimal
digits with an embedded decimal point if the scale is not zero.
The destination string format is: [+ | -]ddd.[ddd] and any precision and scale arguments are ignored.
- Numeric fields (int8, int16, int32, uint8, uint16, uint32, sfloat, and dfloat). InfoSphere® DataStage® assumes that numeric fields are represented as text.
- Time data type. Text specifies that the field represents time in the text-based form %hh:%nn:%ss or in the default date format if you have defined a new one on an NLS system.
- Timestamp data type. Text specifies a text-based timestamp in the form %yyyy-%mm-%dd %hh:%nn:%ss or in the default date format if you have defined a new one on an NLS system.
- Field max width. The maximum number of bytes in a column represented as a string. Enter a number. This is useful where you are storing numbers as text. If you are using a fixed-width character set, you can calculate the length exactly. If you are using variable-length character set, calculate an adequate maximum width for your fields. Applies to fields of all data types except date, time, timestamp, and raw; and record, subrecord, or tagged if they contain at least one field of this type.
- Field width. The number of bytes in a field represented as a string.
Enter a number. This is useful where you are storing numbers as text.
If you are using a fixed-width character set, you can calculate the
number of bytes exactly. If it's a variable length encoding, base
your calculation on the width and frequency of your variable-width
characters. Applies to fields of all data types except date, time,
timestamp, and raw; and record, subrecord, or tagged if they contain
at least one field of this type.
If you specify neither field width nor field max width, numeric fields written as text have the following number of bytes as their maximum width:
- 8-bit signed or unsigned integers: 4 bytes
- 16-bit signed or unsigned integers: 6 bytes
- 32-bit signed or unsigned integers: 11 bytes
- 64-bit signed or unsigned integers: 21 bytes
- single-precision float: 14 bytes (sign, digit, decimal point, 7 fraction, "E", sign, 2 exponent)
- double-precision float: 24 bytes (sign, digit, decimal point, 16 fraction, "E", sign, 3 exponent)
- Pad char. Specifies the pad character used when strings or numeric values are written to an external string representation. Enter a character (single byte for strings, can be multi byte for ustrings) or choose null or space. The pad character is used when the external string representation is larger than required to hold the written field. In this case, the external string is filled with the pad character to its full length. Space is the default. Applies to string, ustring, and numeric data types and record, subrecord, or tagged types if they contain at least one field of this type.
- Character set. Specifies the character set. Choose from ASCII or EBCDIC. The default is ASCII. Applies to all data types except raw and ustring and record, subrecord, or tagged containing no fields other than raw or ustring.
String
These properties are applied to columns with a string data type, unless overridden at column level.
- Export EBCDIC as ASCII. Select this to specify that EBCDIC characters are written as ASCII characters. Applies to fields of the string data type and record, subrecord, or tagged fields if they contain at least one field of this type.
- Import ASCII as EBCDIC. Not relevant for input links.
Decimal
These properties are applied to columns with a decimal data type unless overridden at column level.
- Allow all zeros. Specifies whether to treat a packed decimal column containing all zeros (which is normally illegal) as a valid representation of zero. Select Yes or No. The default is No.
- Decimal separator. Specify the ASCII character that acts as the decimal separator (period by default).
- Packed. Select an option to specify what the decimal columns contain.
- Yes. Specifies that the decimal columns contain data in packed
decimal format (the default). This has the following sub-properties:
Sub-Properties Choices Check Yes. Verifies that data is packed.
No. Does not verify.
Signed Yes. To use the existing sign when writing decimal columns.
No. To write a positive sign (0xf) regardless of the columns' actual sign value.
- No (separate). Specifies that they contain unpacked decimal with
a separate sign byte. This has the following sub-property:
Sub-Property Description Sign Position Choose leading or trailing as appropriate.
- No (zoned). Specifies that they contain an unpacked decimal in
either ASCII or EBCDIC text. This has the following sub-property:
Sub-Property Description Sign Position Choose leading or trailing as appropriate. - No (overpunch). Specifies that the field has a leading or end
byte that contains a character which specifies both the numeric value
of that byte and whether the number as a whole is negatively or positively
signed. This has the following sub-property:
Sub-Property Description Sign Position Choose leading or trailing as appropriate.
- Yes. Specifies that the decimal columns contain data in packed
decimal format (the default). This has the following sub-properties:
- Precision. Specifies the precision where a decimal column is written in text format. Enter a number. When a decimal is written to a string representation, InfoSphere DataStage uses the precision and scale defined for the source decimal field to determine the length of the destination string. The precision and scale properties override this default. When they are defined, InfoSphere DataStage truncates or pads the source decimal to fit the size of the destination string. If you have also specified the field width property, InfoSphere DataStage truncates or pads the source decimal to fit the size specified by field width.
- Rounding. Specifies how to round a decimal column when writing
it. Choose from the following rounding items:
- up (ceiling). Truncate source column towards positive infinity. This mode corresponds to the IEEE 754 Round Up mode. For example, 1.4 becomes 2, -1.6 becomes -1.
- down (floor). Truncate source column towards negative infinity. This mode corresponds to the IEEE 754 Round Down mode. For example, 1.6 becomes 1, -1.4 becomes -2.
- nearest value. Round the source column towards the nearest representable value. This mode corresponds to the COBOL ROUNDED mode. For example, 1.4 becomes 1, 1.5 becomes 2, -1.4 becomes -1, -1.5 becomes -2.
- truncate towards zero. This is the default. Discard fractional digits to the right of the right-most fractional digit supported by the destination, regardless of sign. For example, if the destination is an integer, all fractional digits are truncated. If the destination is another decimal with a smaller scale, truncate to the scale size of the destination decimal. This mode corresponds to the COBOL INTEGER-PART function. Using this method 1.6 becomes 1, -1.6 becomes -1.
- Scale. Specifies how to round a source decimal when its precision and scale are greater than those of the destination. By default, when the InfoSphere DataStage writes a source decimal to a string representation, it uses the precision and scale defined for the source decimal field to determine the length of the destination string. You can override the default by means of the precision and scale properties. When you do, InfoSphere DataStage truncates or pads the source decimal to fit the size of the destination string. If you have also specified the field width property, InfoSphere DataStage truncates or pads the source decimal to fit the size specified by field width.
Numeric
These properties apply to integer and float fields unless overridden at column level.
- C_format. Perform non-default conversion of data from integer or floating-point data to a string. This property specifies a C-language format string used for writing integer or floating point strings. This is passed to sprintf(). For example, you specify a C-format of %x and a field width of 8, so that integers are written as 8-byte hexadecimal strings.
- In_format. This property is not relevant for input links.
- Out_format. Format string used for conversion of data from integer or floating-point data to a string. This is passed to sprintf(). By default, DataStage invokes the C sprintf() function to convert a numeric field formatted as either integer or floating point data to a string. If this function does not output data in a satisfactory format, you can specify the out_format property to pass formatting arguments to sprintf().
Date
These properties are applied to columns with a date data type unless overridden at column level. All of these are incompatible with a Data Format setting of Text.
- Days since. Dates are written as a signed integer containing the number of days since the specified date. Enter a date in the form %yyyy-%mm-%dd or in the default date format if you have defined a new one on an NLS system.
- Format string. The string format of a date. By default this is
%yyyy-%mm-%dd. The Format string can
contain one or a combination of the following elements:
- %dd. A two-digit day.
- %mm. A two-digit month.
- %year_cutoffyy. A two-digit year derived from yy and the specified four-digit year cutoff, for example %1970yy.
- %yy. A two-digit year derived from a year cutoff of 1900.
- %yyyy. A four-digit year.
- %ddd. Day of year in three-digit form (range of 1- 366).
- %mmm. Three-character month abbreviation.
The format_string is subject to the following restrictions:
- It cannot have more than one element of the same type, for example it cannot contain two %dd elements.
- It cannot have both %dd and %ddd.
- It cannot have both %yy and %yyyy.
- It cannot have both %mm and %ddd.
- It cannot have both %mmm and %ddd.
- It cannot have both %mm and %mmm.
- If it has %dd, it must have %mm or %mmm.
- It must have exactly one of %yy or %yyyy.
When you specify a date format string, prefix each component with the percent symbol (%). Separate the string's components with any character except the percent sign (%).
If this format string does not include a day, it is set to the first of the month in the destination field. If the format string does not include the month and day, they default to January 1. Note that the format string must contain a month if it also contains a day; that is, you cannot omit only the month.
The year_cutoff is the year defining the beginning of the century in which all two digit years fall. By default, the year cutoff is 1900; therefore, a two-digit year of 97 represents 1997. You can also set this using the environment variable APT_DATE_CENTURY_BREAK_YEAR, but this is overridden by %year_cutoffyy if you have set it.
You can specify any four-digit year as the year cutoff. All two-digit years then specify the next possible year ending in the specified two digits that is the same or greater than the cutoff. For example, if you set the year cutoff to 1930, the two-digit year 30 corresponds to 1930, and the two-digit year 29 corresponds to 2029.
- Is Julian. Select this to specify that dates are written as a numeric value containing the Julian day. A Julian day specifies the date as the number of days from 4713 BCE January 1, 12:00 hours (noon) GMT.
Time
These properties are applied to columns with a time data type unless overridden at column level. All of these are incompatible with a Data Format setting of Text.
- Format string. Specifies the format of columns representing time
as a string. By default this is %hh-%mm-%ss.
The possible components of the time format string are:
- %hh. A two-digit hours component.
- %nn. A two-digit minute component (nn represents minutes because mm is used for the month of a date).
- %ss. A two-digit seconds component.
- %ss.n. A two-digit seconds plus fractional part, where n is the number of fractional digits with a maximum value of 6. If n is 0, no decimal point is printed as part of the seconds component. Trailing zeros are not suppressed.
You must prefix each component of the format string with the percent symbol. Separate the string's components with any character except the percent sign (%).
- Is midnight seconds. Select this to specify that times are written as a binary 32-bit integer containing the number of seconds elapsed from the previous midnight.
Timestamp
- Format string. Specifies the format of a column representing a
timestamp as a string. Defaults to %yyyy-%mm-%dd %hh:%nn:%ss.
- %dd. A two-digit day.
- %mm. A two-digit month.
- %year_cutoffyy. A two-digit year derived from yy and the specified four-digit year cutoff.
- %yy. A two-digit year derived from a year cutoff of 1900.
- %yyyy. A four-digit year.
- %ddd. Day of year in three-digit form (range of 1 - 366)
The following items explains the format of the hours:
- %hh. A two-digit hours component.
- %nn. A two-digit minute component (nn represents minutes because mm is used for the month of a date).
- %ss. A two-digit seconds component.
- %ss:n. A two-digit seconds plus fractional part, where n is the number of fractional digits with a maximum value of 6. If n is 0, no decimal point is printed as part of the seconds component. Trailing zeros are not suppressed.
You must prefix each component of the format string with the percent sign (%). Separate the string's components with any character except the percent sign (%).