Modify stage: Options category
Use the Options category to specify how the Modify stage operates.
Specification
This is a statement with one of the following the forms:
- DROP columnname [, columnname]
- KEEP columnname [, columnname]
- new_columnname:new_type = [explicit_conversion_function] old_columnname
If you choose to drop a column or columns, all columns are retained except those you explicitly drop. If you chose to keep a column or columns, all columns are excluded except those you explicitly keep.
If you specify multiple specifications each will be carried out sequentially.
Some type conversions InfoSphere® DataStage® can carry out automatically, others need you to specify an explicit conversion function. Some conversions are not available.
The following table summarizes the availability, with the source fields shown vertically and the target fields shown horizontally. A value of "d" indicates automatic (default) conversion, "m" indicates that manual conversion is required, a blank square indicates that conversion is not possible:
int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | sfloat | dfloat | |
---|---|---|---|---|---|---|---|---|---|---|
int8 | d | d | d | d | d | d | d | d | d m | |
uint8 | d | d | d | d | d | d | d | d | d | |
int16 | d m | d | d | d | d | d | d | d | d | |
uint16 | d | d | d | d | d | d | d | d | d | |
int32 | d m | d | d | d | d | d | d | d | d | |
uint32 | d | d | d | d | d | d | d | d | d | |
int64 | d m | d | d | d | d | d | d | d | d | |
uint64 | d | d | d | d | d | d | d | d | d | |
sfloat | d m | d | d | d | d | d | d | d | d | |
dfloat | d m | d | d | d | d | d | d | d | d | |
decimal | d m | d | d | d | d m | d | d m | d m | d | d m |
string | d m | d | d m | d | d | d m | d | d | d | d m |
ustring | d m | d | d m | d | d | d m | d | d | d | d m |
raw | m | m | ||||||||
date | m | m | m | m | ||||||
time | m | m | m | |||||||
time stamp | m | m | m |
decimal | string | ustring | raw | date | time | timestamp | |
---|---|---|---|---|---|---|---|
int8 | d | d m | d m | m | m | m | |
uint8 | d | d | d | ||||
int16 | d | d m | d m | ||||
uint16 | d | d m | d m | ||||
int32 | d | d m | d m | m | m | ||
uint32 | d | m | m | m | |||
int64 | d | d | d | ||||
uint64 | d | d | d | ||||
sfloat | d | d | d | ||||
dfloat | d m | d m | d m | m | m | ||
decimal | d m | d m | |||||
string | d m | d | m | m | m | ||
ustring | d m | d | m | m | |||
raw | |||||||
date | m | m | m | ||||
time | m | m | d m | ||||
timestamp | m | m | m | m |
For a default type conversion, your specification would take the following form:
new_columnname:new_type = [explicit_conversion_function] old_columnname
For example, to produce an int8 column type:
int8col:int8 = uint64col
Where a manual conversion is required, your specification takes the form:
new_columnname:new_type = conversion_function (old_columnname)
For example:
day_column:int8 = month_day_from_date (date_column)
The new_type can be any of the destination types that are supported for conversions from the source (that is, any of the columns marked "m" in the above table). For example, you can use the conversion hours_from_time to convert a time to an int8, or to an int16, int32, dfloat, and so on. InfoSphere DataStage warns you when it is performing an implicit data type conversion, for example hours_from_time expects to convert a time to an int8, and will warn you if converting to a int16, int32, or dfloat.
The following table lists the available conversion functions. The source and destinations are always specified in terms of column names. Preliminary arguments are enclosed in square brackets, the source column name is enclosed in round brackets.
Conversion | Arguments | Output type | Description | Example |
---|---|---|---|---|
date_from_days_ since | [base_date (date)] (number_col (int32)) | date | Converts an integer field into a date by adding the integer
to the specified base date. The base_date must be in the format yyyy-mm-dd and
must be either double quoted or a variable. |
date_col:date = date_from_days_since ["1958-08-18"] (int_col) |
date_from_julian_day | (juliandate_col (uint32)) | date | Date from Julian day. | date_col:date = date_from_julian_day (julian_col) |
date_from_string | [date_format] (string_col (string)) | date | Converts the string to a date representation using the specified date_format. By default the string format is yyyy-mm-dd. | date_col:date = date_from_string ["%yyyy-%mm-%dd"] (string_col) |
date_from_timestamp | (timestamp_col (timestamp) ) | date | Converts the timestamp to a date representation. | date_col:date = date_from_timestamp (ts_col) |
date_from_ustring | [date_format] (string_col (ustring)) | date | Converts the string to a date representation using the specified date_format. By default the string format is yyyy-mm-dd. | date_col:date = date_from_ustring (ustring_col, "%yyyy-%mm-%dd") |
days_since_from_date | [source_date (date)] (date_col (string)) | int32 | Returns a value corresponding to the number of days from source_date to the specified date. source_date must be in the form yyyy-mm-dd and can be quoted or unquoted. | dayssince_col:int32 = days_since_from_date ["1958-08-18"] (sourcedate_col,) |
decimal_from_decimal | [r_type] (source_decimal_col (decimal)) | decimal | Decimal from decimal. | decimal_col:decimal = decimal_from_decimal [ceil] (source_col) |
decimal_from_dfloat | [r_type] (source_dfloat_col (dfloat)) | decimal | Decimal from dfloat. | decimal_col:decimal = decimal_from_dfloat [ceil] (source_col) |
decimal_from_string | [r_type] (source_string_col (string)) | decimal | Decimal from string. | decimal_col:decimal = decimal_from_string [ceil] (source_col) |
decimal_from_ustring | [r_type] (source_ustring_col (ustring)) | decimal | Decimal from ustring. | decimal_col:decimal = decimal_from_ustring [ceil] (source_col) |
dfloat_from_decimal | [fix_zero] (source_dec_col (decimal)) | dfloat | Dfloat from decimal. | dfloat_col:dfloat = dfloat_from_decimal [fix_zero] (source_col) |
hours_from_time | (source_time_col (time)) | int8 | Hours from time. | hours_col:int8 = hours_from_time (time_col) |
int32_from_decimal | [r_type, fix_zero] (source_decimal_col (decimal)) | int32 | Int32 from decimal. | int32_col:int32 = int32_from_decimal [cal, fix_zero] (dec_col) |
int64_from_decimal | [r_type, fix_zero] (source_decimal_col (decimal)) | int64 | Int64 from decimal. | int64_col:int64 = int64_from_decimal [ceil] (dec_col) |
julian_day_from_date | (date_col (date)) | uint32 | Julian day from date. | julianday_col:uint32 = julian_day_from_date (date_col) |
lookup_string_from _int16 | [table_definition ], (number_col (int16)) | string | Converts numeric values to strings by means of a lookup table. | gendercol:string = lookup_string_from_int16 [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode) |
lookup_ustring_from _int16 | [table_definition ] (number_col (int16)) | ustring | Converts numeric values to ustrings by means of a lookup table. | gendercol:ustring = lookup_ustring_from_int16 [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode) |
lookup_ustring_from _int32 | [table_definition ] (number_col (int32)) | ustring | Converts numeric values to ustrings by means of a lookup table.. | gendercol:ustring = lookup_string_from_int32 |
lookup_string_from _uint32 | [table_definition ] (number_col (uint32)) | string | Converts numeric values to strings by means of a lookup table. | gendercol:string = lookup_string_from_uint16 |
lookup_int16_from _string | [table_definition ] (string_col (string)) | int16 | Converts strings to numeric values by means of a lookup table. | int_col:int16 = lookup_int16_from_string [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode) |
lookup_int16_from _ustring | [table_definition ] (ustring_col (ustring)) | int16 | Converts strings to numeric values by means of a lookup table. | int_col:int16 = lookup_int16_from_ustring [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode) |
lookup_uint32_from _string | [table_definition ] (string_col (string)) | uint32 | Converts strings to numeric values by means of a lookup table. | int_col:uint32 = lookup_uint32_from_string [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode) |
lookup_uint32_from _ustring | [table_definition ] (ustring_col (ustring)) | uint32 | Converts ustrings to numeric values by means of a lookup table. | int_col:uint32 = lookup_uint32_from_ustring [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode) |
lowercase_string | (instring_col (string)) | string | Convert strings to all lower case. Non-alphabetic characters are ignored in the conversion. | ostring_col:string = lowercase_string (istring_col) |
lowercase_ustring | (instring_col (ustring)) | string | Convert ustrings to all lower case. Non-alphabetic characters are ignored in the conversion. | ostring_col:ustring = lowercase_string (istring_col) |
mantissa_from _decimal | (decimal_col (decimal)) | dfloat | Returns the mantissa from the given decimal | matissa_col:dfloat = mantissa_from_decimal (dec_col) |
mantissa_from_dfloat | (dfloat_col (dfloat)) | dfloat | Returns the mantissa from the given dfloat | matissa_col:dfloat = mantissa_from_dfloat (dfloat_col) |
microseconds _from_time | (time_col (time)) | int32 | Returns the microseconds from a time field. | msec_col:int32 = microseconds_from_time (time_col) |
midnight_seconds _from_time | (time_col (time)) | dfloat | Returns the seconds-from-midnight from the supplied time. | midsec_col:dfloat = midnight_seconds_from_time (time_col) |
minutes_from_time | (time_col (time)) | int8 | Returns the minutes from a time field. | minsec_col:int8 = minutes_from_time (time_col) |
month_day _from_date | (date_col (date)) | int8 | Returns the day of month from a date field. | monthday_col:int8 = month_day_from_date (date_col) |
month_from_date | (date_col (date)) | int8 | Returns the numeric month from a date field. | month_col:int8 = month_from_date (date_col) |
next_weekday_from _date | [day] (date_col (date)) | date | Returns the date of the specified day of the week soonest after the source date (including the source date). day is a string specifying a day of the week. You can specify day by either the first three characters of the day name or the full day name. The day can be quoted in either single or double quotes or quotes can be omitted. | nextday_col:date = next_weekday_from_date [wed](date_col) |
notnull | (any) | int8 | Returns true (1) when an expression does not evaluate to the null value. | isnotnull_col:int8 = notnull (test_col) Note:
test_col needs to be
Nullable checked as yes |
null | (any) | int8 | Returns true (1) when an expression does evaluate to the null value | isnull_col:int8 = null (test_col) |
previous_weekday _from_date | [day] (date_col (date)) | date | The destination contains the closest date for the specified day of the week earlier than the source date (including the source date). The day is a string specifying a day of the week. You can specify day by either the first three characters of the day name or the full day name. The day can be quoted in either single or double quotes or quotes can be omitted. | prevday_col:date = previous_weekday_from_date [wed](date_col) |
raw_from_string | (string_col (string)) | raw | Returns a string in raw representation. | raw_col:raw = raw_from_string (string_col) |
raw_length | (raw_col (raw)) | int32 | Returns the length of a raw field. | rawlength_col:int32 = raw_length (raw_col) |
seconds_from_time | (time_col (time)) | dfloat | Returns the seconds from a time field. | sec_col:dfloat = seconds_from_time (time_col) |
seconds_since_from _timestamp | (timestamp_col (timestamp)) | dfloat | Seconds since the time given by timestamp. | secsince_col:dfloat = seconds_since_from_timestamp (timestamp_col) |
string_from_date | [date_format] (date_col (date)) | string | Converts the date to a string representation using the specified date_format. | datestring_col:string = string_from_date [%dd-%mm-%yyyy] (date_col) |
string_from_decimal | [fix_zero] (decimal_col (decimal) | string | Returns a string from a decimal. | string_col:string = string_from_decimal [fix_zero] (dec_col) |
string_from_time | [time_format] (time_col (time)) | string | Converts the time to a string representation using the specified time_format. The default time format is %hh:%nn:%ss. | timestring_col:string = string_from_time [%hh:%nn:%ss.] (time_col) |
string_from _timestamp | [timestamp_format] (timestamp_col (timestamp)) | string | Converts the timestamp to a string representation using the specified timestamp _format. The default timestamp format is %yyyy-%mm-%dd. %hh:%nn:%ss. | stringtimestamp_col:string = string_from_timestamp [%yyyy-%mm-%dd. %hh:%nn:%ss.] (timestamp_col) |
string_from_ustring | (string_col (ustring)) | string | Returns a string from a ustring. | string_col:string = string_from_ustring (ustring_col) |
string_length | (string_col (string)) | int32 | Returns an int32 containing the length of a string. | length_col:int32 = string_length (string_col) |
substring | [startPosition,len] (string_col (string)) | string | Converts long strings to shorter strings by string extraction. The startPosition specifies the starting location of the substring; len specifies the substring length. If startPosition is positive, it specifies the byte offset into the string from the beginning of the string. If startPosition is negative, it specifies the byte offset from the end of the string. | shorstring_col:string = substring [5,10] (longstring_col) |
time_from_midnight _seconds | (dfloat_col (dfloat)) | time | Returns a time from aseconds-from-midnight field. | time_col:time = time_from_midnight_seconds (dfloat_col) |
time_from_string | [time_format] (string_col (string)) | time | Converts the string to a time representation using the specified time_format. The default time format is %hh:%nn:%ss | time_col:time = time_from_string [%hh:%nn:%ss] (string_col) |
time_from_timestamp | (timestamp_col (timestamp)) | time | Time from timestamp. | time_col:time = time_from_timestamp (timestamp_col) |
time_from_ustring | (string_col (ustring)) | time | Returns a time from a ustring. | time_col:time = time_from_ustring (string_col) |
timestamp_from_date | [time](date_col (date)) | timestamp | Timestamp from date. The time argument optionally specifies the time to be used in building the timestamp result and must be in the form hh:nn:ss. If omitted, the time defaults to midnight. | timestamp_col:timestamp = timestamp_from_date [08:20:33] (date_col) |
timestamp_from _seconds_since | (secondssince_col (dfloat)) | timestamp | Timestamp from a seconds since value. | timestamp_col:timestamp = timestamp_from_seconds_since [1964-10-15.08:20:33](secondssince_col) |
timestamp_from _string | [timestamp_format] (string_col (string)) | timestamp | Converts the string to a timestamp representation using the specified timestamp _format. By default, the string format is %yyyy-%mm-%dd hh:nn:ss. | timestamp_col:timestamp = timestamp_from_string [%yyyy-%mm-%dd hh:nn:ss] (string_col) |
timestamp_from_time | [date](time_col (time)) | timestamp | Timestamp from time. The date argument is required. It specifies the date portion of the timestamp and must be in the form yyyy-mm-dd. | timestamp_col:timestamp = timestamp_from_time [1958-08-18] (time_col) |
timestamp_from _timet | (timet_col (int32)) | timestamp | Timestamp from time_t. The source field must contain a timestamp as defined by the UNIX time_t representation. | timestamp_col:timestamp = timestamp_from_timet (timet_col) |
timestamp_from _ustring | (string_col (ustring)) | timestamp | Returns a timestamp from a ustring. | timestamp_col:timestamp = timestamp_from_ustring (string_col) |
timet_from _timestamp | (tstamp_col (timestamp)) | int32 | Time_t from timestamp. The destination column contains a timestamp as defined by the UNIX time_t representation. | timet_col:int32 = timet_from_timestamp (tstamp_col) |
uint64_from_decimal | [r_type, fix_zero] (dec_col (decimal)) | uint64 | Uint64 from decimal. | int_col:uint64 = uint64_from_decimal [ceil, fix_zero] (dec_col) |
uppercase_string | (string_col (string)) | string | Convert strings to all upper case. Non-alphabetic characters are ignored in the conversion. | string_col:string = uppercase_string (instring_col) |
uppercase_ustring | (string_col (ustring)) | ustring | Convert ustrings to all upper case. Non-alphabetic characters are ignored in the conversion. | ustring_col:string = uppercase_ustring (string_col) |
u_raw_from_string | (string_col (ustring)) | raw | Returns a raw from a ustring | raw_col:raw = u_raw_from_string (string_col) |
ustring_from_date | (date_col (date)) | ustring | Returns a ustring from a date. | string_col:ustring = ustring_from_date (date_col) |
ustring_from_decimal | (dec_col (decimal)) | ustring | Returns a ustring from a decimal. | string_col:ustring = ustring_from_decimal (dec_col) |
ustring_from_string | (string_col (string)) | ustring | Returns a ustring from a string. | string_col:ustring = ustring_from_string (string_col) |
ustring_from_time | (time_col (time)) | ustring | Returns a ustring from a time. | string_col:ustring = ustring_from_time (time_col) |
ustring_from _timestamp | (timestamp_col (timestamp)) | ustring | Returns a ustring from a timestamp. | string_col:ustring = ustring_from_timestamp (timestamp_col) |
ustring_length | (string_col (ustring)) | int32 | Returns the length of a ustring. | length_col:int32 = ustring_length (string_col) |
u_substring | [startPosition,len] (string_col (string)) | ustring | Converts long ustrings to shorter ustrings by string extraction. The startPosition specifies the starting location of the substring; len specifies the substring length. If startPosition is positive, it specifies the byte offset into the string from the beginning of the string. If startPosition is negative, it specifies the byte offset from the end of the string. | shorstring_col:ustring = substring [5,10] (longstring_col) |
weekday_from_date | [originDay] (date_col (date)) | int8 | Day of week from date. originDay is a string specifying the day considered to be day zero of the week. You can specify the day using either the first three characters of the day name or the full day name. If omitted, Sunday is defined as day zero. The originDay can be either single- or double-quoted or the quotes can be omitted. | dow_int:int8 = weekday_from_date [mon] (date_col) |
year_day_from_date | (date_col (date)) | int16 | Day of year from date (returned value 1-366). | doy_col:int16 = year_day_from_date (date_col) |
year_from_date | (date_col (date)) | int16 | Year from date. | year_col:int16 = year_from_date (date_col) |
year_week_from_date | (date_col (date)) | int8 | Week of year from date. | week_col:int8 = year_week_from_date (date_col) |
table_definition defines the rows of a string lookup table and has the following form:
{propertyList} ('string' = value; 'string' = value; ... )
where:
- propertyList is one or more of the following
options; the entire list is enclosed in braces and properties are
separated by commas if there are more than one:
- case_sensitive. Perform a case-sensitive search for matching strings; the default is case-insensitive.
- default_value = defVal. The default numeric value returned for a string that does not match any of the strings in the table.
- default_string = defString. The default string returned for numeric values that do not match any numeric value in the table.
- string specifies a comma-separated list of strings associated with value; enclose each string in quotes.
- value specifies a comma-separated list of 16-bit integer values associated with string.
date_format is the standard date formatting string described in Date and time formats
R_type is a string representing the rounding type and should contain one of the following:
- ceil. Round the source field toward positive infinity. E.g, 1.4 -> 2, -1.6 -> -1.
- floor. Round the source field toward negative infinity. E.g, 1.6 -> 1, -1.4 -> -2.
- round_inf. Round or truncate the source field toward the nearest representable value, breaking ties by rounding positive values toward positive infinity and negative values toward negative infinity. E.g, 1.4 -> 1, 1.5 -> 2, -1.4 -> -1, -1.5 -> -2.
- trunc_zero. Discard any fractional digits to the right of the rightmost fractional digit supported in the destination, regardless of sign. For example, if the destination is an integer, all fractional digits are truncated. If the destination is another decimal with a smaller scale, round or truncate to the scale size of the destination decimal. E.g, 1.6 -> 1, -1.6 -> -1.
You can specify fix_zero for decimal source columns so that columns containing all zeros (by default illegal) are treated as a valid decimal with a value of zero.