Modify stage: Options category

Use the Options category to specify how the Modify stage operates.

Specification

This is a statement with one of the following the forms:

  • DROP columnname [, columnname]
  • KEEP columnname [, columnname]
  • new_columnname:new_type = [explicit_conversion_function] old_columnname

If you choose to drop a column or columns, all columns are retained except those you explicitly drop. If you chose to keep a column or columns, all columns are excluded except those you explicitly keep.

If you specify multiple specifications each will be carried out sequentially.

Some type conversions InfoSphere® DataStage® can carry out automatically, others need you to specify an explicit conversion function. Some conversions are not available.

The following table summarizes the availability, with the source fields shown vertically and the target fields shown horizontally. A value of "d" indicates automatic (default) conversion, "m" indicates that manual conversion is required, a blank square indicates that conversion is not possible:

  int8 uint8 int16 uint16 int32 uint32 int64 uint64 sfloat dfloat
int8   d d d d d d d d d m
uint8 d   d d d d d d d d
int16 d m d   d d d d d d d
uint16 d d d   d d d d d d
int32 d m d d d   d d d d d
uint32 d d d d d   d d d d
int64 d m d d d d d   d d d
uint64 d d d d d d d   d d
sfloat d m d d d d d d d   d
dfloat d m d d d d d d d d  
decimal d m d d d d m d d m d m d d m
string d m d d m d d d m d d d d m
ustring d m d d m d d d m d d d d m
raw m       m          
date m   m   m m        
time m       m         m
time stamp m       m         m
  decimal string ustring raw date time timestamp
int8 d d m d m   m m m
uint8 d d d        
int16 d d m d m        
uint16 d d m d m        
int32 d d m d m   m   m
uint32 d m m   m    
int64 d d d        
uint64 d d d        
sfloat d d d        
dfloat d m d m d m     m m
decimal   d m d m        
string d m   d   m m m
ustring d m d       m m
raw              
date   m m       m
time   m m       d m
timestamp   m m   m m  

For a default type conversion, your specification would take the following form:


new_columnname:new_type = [explicit_conversion_function] old_columnname 

For example, to produce an int8 column type:


int8col:int8 = uint64col

Where a manual conversion is required, your specification takes the form:


new_columnname:new_type = conversion_function (old_columnname)

For example:


day_column:int8 = month_day_from_date (date_column)

The new_type can be any of the destination types that are supported for conversions from the source (that is, any of the columns marked "m" in the above table). For example, you can use the conversion hours_from_time to convert a time to an int8, or to an int16, int32, dfloat, and so on. InfoSphere DataStage warns you when it is performing an implicit data type conversion, for example hours_from_time expects to convert a time to an int8, and will warn you if converting to a int16, int32, or dfloat.

The following table lists the available conversion functions. The source and destinations are always specified in terms of column names. Preliminary arguments are enclosed in square brackets, the source column name is enclosed in round brackets.

Conversion Arguments Output type Description Example
date_from_days_ since [base_date (date)] (number_col (int32)) date Converts an integer field into a date by adding the integer to the specified base date. The base_date must be in the format yyyy-mm-dd and must be either double

quoted or a variable.

date_col:date = date_from_days_since ["1958-08-18"] (int_col)
date_from_julian_day (juliandate_col (uint32)) date Date from Julian day. date_col:date = date_from_julian_day (julian_col)
date_from_string [date_format] (string_col (string)) date Converts the string to a date representation using the specified date_format. By default the string format is yyyy-mm-dd. date_col:date = date_from_string ["%yyyy-%mm-%dd"] (string_col)
date_from_timestamp (timestamp_col (timestamp) ) date Converts the timestamp to a date representation. date_col:date = date_from_timestamp (ts_col)
date_from_ustring [date_format] (string_col (ustring)) date Converts the string to a date representation using the specified date_format. By default the string format is yyyy-mm-dd. date_col:date = date_from_ustring (ustring_col, "%yyyy-%mm-%dd")
days_since_from_date [source_date (date)] (date_col (string)) int32 Returns a value corresponding to the number of days from source_date to the specified date. source_date must be in the form yyyy-mm-dd and can be quoted or unquoted. dayssince_col:int32 = days_since_from_date ["1958-08-18"] (sourcedate_col,)
decimal_from_decimal [r_type] (source_decimal_col (decimal)) decimal Decimal from decimal. decimal_col:decimal = decimal_from_decimal [ceil] (source_col)
decimal_from_dfloat [r_type] (source_dfloat_col (dfloat)) decimal Decimal from dfloat. decimal_col:decimal = decimal_from_dfloat [ceil] (source_col)
decimal_from_string [r_type] (source_string_col (string)) decimal Decimal from string. decimal_col:decimal = decimal_from_string [ceil] (source_col)
decimal_from_ustring [r_type] (source_ustring_col (ustring)) decimal Decimal from ustring. decimal_col:decimal = decimal_from_ustring [ceil] (source_col)
dfloat_from_decimal [fix_zero] (source_dec_col (decimal)) dfloat Dfloat from decimal. dfloat_col:dfloat = dfloat_from_decimal [fix_zero] (source_col)
hours_from_time (source_time_col (time)) int8 Hours from time. hours_col:int8 = hours_from_time (time_col)
int32_from_decimal [r_type, fix_zero] (source_decimal_col (decimal)) int32 Int32 from decimal. int32_col:int32 = int32_from_decimal [cal, fix_zero] (dec_col)
int64_from_decimal [r_type, fix_zero] (source_decimal_col (decimal)) int64 Int64 from decimal. int64_col:int64 = int64_from_decimal [ceil] (dec_col)
julian_day_from_date (date_col (date)) uint32 Julian day from date. julianday_col:uint32 = julian_day_from_date (date_col)
lookup_string_from _int16 [table_definition ], (number_col (int16)) string Converts numeric values to strings by means of a lookup table. gendercol:string = lookup_string_from_int16 [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode)
lookup_ustring_from _int16 [table_definition ] (number_col (int16)) ustring Converts numeric values to ustrings by means of a lookup table. gendercol:ustring = lookup_ustring_from_int16 [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode)
lookup_ustring_from _int32 [table_definition ] (number_col (int32)) ustring Converts numeric values to ustrings by means of a lookup table.. gendercol:ustring = lookup_string_from_int32
lookup_string_from _uint32 [table_definition ] (number_col (uint32)) string Converts numeric values to strings by means of a lookup table. gendercol:string = lookup_string_from_uint16
lookup_int16_from _string [table_definition ] (string_col (string)) int16 Converts strings to numeric values by means of a lookup table. int_col:int16 = lookup_int16_from_string [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode)
lookup_int16_from _ustring [table_definition ] (ustring_col (ustring)) int16 Converts strings to numeric values by means of a lookup table. int_col:int16 = lookup_int16_from_ustring [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode)
lookup_uint32_from _string [table_definition ] (string_col (string)) uint32 Converts strings to numeric values by means of a lookup table. int_col:uint32 = lookup_uint32_from_string [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode)
lookup_uint32_from _ustring [table_definition ] (ustring_col (ustring)) uint32 Converts ustrings to numeric values by means of a lookup table. int_col:uint32 = lookup_uint32_from_ustring [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode)
lowercase_string (instring_col (string)) string Convert strings to all lower case. Non-alphabetic characters are ignored in the conversion. ostring_col:string = lowercase_string (istring_col)
lowercase_ustring (instring_col (ustring)) string Convert ustrings to all lower case. Non-alphabetic characters are ignored in the conversion. ostring_col:ustring = lowercase_string (istring_col)
mantissa_from _decimal (decimal_col (decimal)) dfloat Returns the mantissa from the given decimal matissa_col:dfloat = mantissa_from_decimal (dec_col)
mantissa_from_dfloat (dfloat_col (dfloat)) dfloat Returns the mantissa from the given dfloat matissa_col:dfloat = mantissa_from_dfloat (dfloat_col)
microseconds _from_time (time_col (time)) int32 Returns the microseconds from a time field. msec_col:int32 = microseconds_from_time (time_col)
midnight_seconds _from_time (time_col (time)) dfloat Returns the seconds-from-midnight from the supplied time. midsec_col:dfloat = midnight_seconds_from_time (time_col)
minutes_from_time (time_col (time)) int8 Returns the minutes from a time field. minsec_col:int8 = minutes_from_time (time_col)
month_day _from_date (date_col (date)) int8 Returns the day of month from a date field. monthday_col:int8 = month_day_from_date (date_col)
month_from_date (date_col (date)) int8 Returns the numeric month from a date field. month_col:int8 = month_from_date (date_col)
next_weekday_from _date [day] (date_col (date)) date Returns the date of the specified day of the week soonest after the source date (including the source date). day is a string specifying a day of the week. You can specify day by either the first three characters of the day name or the full day name. The day can be quoted in either single or double quotes or quotes can be omitted. nextday_col:date = next_weekday_from_date [wed](date_col)
notnull (any) int8 Returns true (1) when an expression does not evaluate to the null value. isnotnull_col:int8 = notnull (test_col)
Note: test_col needs to be Nullable checked as yes
null (any) int8 Returns true (1) when an expression does evaluate to the null value isnull_col:int8 = null (test_col)
previous_weekday _from_date [day] (date_col (date)) date The destination contains the closest date for the specified day of the week earlier than the source date (including the source date). The day is a string specifying a day of the week. You can specify day by either the first three characters of the day name or the full day name. The day can be quoted in either single or double quotes or quotes can be omitted. prevday_col:date = previous_weekday_from_date [wed](date_col)
raw_from_string (string_col (string)) raw Returns a string in raw representation. raw_col:raw = raw_from_string (string_col)
raw_length (raw_col (raw)) int32 Returns the length of a raw field. rawlength_col:int32 = raw_length (raw_col)
seconds_from_time (time_col (time)) dfloat Returns the seconds from a time field. sec_col:dfloat = seconds_from_time (time_col)
seconds_since_from _timestamp (timestamp_col (timestamp)) dfloat Seconds since the time given by timestamp. secsince_col:dfloat = seconds_since_from_timestamp (timestamp_col)
string_from_date [date_format] (date_col (date)) string Converts the date to a string representation using the specified date_format. datestring_col:string = string_from_date [%dd-%mm-%yyyy] (date_col)
string_from_decimal [fix_zero] (decimal_col (decimal) string Returns a string from a decimal. string_col:string = string_from_decimal [fix_zero] (dec_col)
string_from_time [time_format] (time_col (time)) string Converts the time to a string representation using the specified time_format. The default time format is %hh:%nn:%ss. timestring_col:string = string_from_time [%hh:%nn:%ss.] (time_col)
string_from _timestamp [timestamp_format] (timestamp_col (timestamp)) string Converts the timestamp to a string representation using the specified timestamp _format. The default timestamp format is %yyyy-%mm-%dd. %hh:%nn:%ss. stringtimestamp_col:string = string_from_timestamp [%yyyy-%mm-%dd. %hh:%nn:%ss.] (timestamp_col)
string_from_ustring (string_col (ustring)) string Returns a string from a ustring. string_col:string = string_from_ustring (ustring_col)
string_length (string_col (string)) int32 Returns an int32 containing the length of a string. length_col:int32 = string_length (string_col)
substring [startPosition,len] (string_col (string)) string Converts long strings to shorter strings by string extraction. The startPosition specifies the starting location of the substring; len specifies the substring length. If startPosition is positive, it specifies the byte offset into the string from the beginning of the string. If startPosition is negative, it specifies the byte offset from the end of the string. shorstring_col:string = substring [5,10] (longstring_col)
time_from_midnight _seconds (dfloat_col (dfloat)) time Returns a time from aseconds-from-midnight field. time_col:time = time_from_midnight_seconds (dfloat_col)
time_from_string [time_format] (string_col (string)) time Converts the string to a time representation using the specified time_format. The default time format is %hh:%nn:%ss time_col:time = time_from_string [%hh:%nn:%ss] (string_col)
time_from_timestamp (timestamp_col (timestamp)) time Time from timestamp. time_col:time = time_from_timestamp (timestamp_col)
time_from_ustring (string_col (ustring)) time Returns a time from a ustring. time_col:time = time_from_ustring (string_col)
timestamp_from_date [time](date_col (date)) timestamp Timestamp from date. The time argument optionally specifies the time to be used in building the timestamp result and must be in the form hh:nn:ss. If omitted, the time defaults to midnight. timestamp_col:timestamp = timestamp_from_date [08:20:33] (date_col)
timestamp_from _seconds_since (secondssince_col (dfloat)) timestamp Timestamp from a seconds since value. timestamp_col:timestamp = timestamp_from_seconds_since [1964-10-15.08:20:33](secondssince_col)
timestamp_from _string [timestamp_format] (string_col (string)) timestamp Converts the string to a timestamp representation using the specified timestamp _format. By default, the string format is %yyyy-%mm-%dd hh:nn:ss.

timestamp_col:timestamp = timestamp_from_string [%yyyy-%mm-%dd hh:nn:ss] (string_col)
timestamp_from_time

[date](time_col (time)) timestamp Timestamp from time. The date argument is required. It specifies the date portion of the timestamp and must be in the form yyyy-mm-dd.

timestamp_col:timestamp = timestamp_from_time [1958-08-18] (time_col)
timestamp_from _timet (timet_col (int32)) timestamp Timestamp from time_t. The source field must contain a timestamp as defined by the UNIX time_t representation. timestamp_col:timestamp = timestamp_from_timet (timet_col)
timestamp_from _ustring (string_col (ustring)) timestamp Returns a timestamp from a ustring. timestamp_col:timestamp = timestamp_from_ustring (string_col)
timet_from _timestamp (tstamp_col (timestamp)) int32 Time_t from timestamp. The destination column contains a timestamp as defined by the UNIX time_t representation. timet_col:int32 = timet_from_timestamp (tstamp_col)
uint64_from_decimal [r_type, fix_zero] (dec_col (decimal)) uint64 Uint64 from decimal. int_col:uint64 = uint64_from_decimal [ceil, fix_zero] (dec_col)
uppercase_string (string_col (string)) string Convert strings to all upper case. Non-alphabetic characters are ignored in the conversion. string_col:string = uppercase_string (instring_col)
uppercase_ustring (string_col (ustring)) ustring Convert ustrings to all upper case. Non-alphabetic characters are ignored in the conversion. ustring_col:string = uppercase_ustring (string_col)
u_raw_from_string (string_col (ustring)) raw Returns a raw from a ustring raw_col:raw = u_raw_from_string (string_col)
ustring_from_date (date_col (date)) ustring Returns a ustring from a date. string_col:ustring = ustring_from_date (date_col)
ustring_from_decimal (dec_col (decimal)) ustring Returns a ustring from a decimal. string_col:ustring = ustring_from_decimal (dec_col)
ustring_from_string (string_col (string)) ustring Returns a ustring from a string. string_col:ustring = ustring_from_string (string_col)
ustring_from_time (time_col (time)) ustring Returns a ustring from a time. string_col:ustring = ustring_from_time (time_col)
ustring_from _timestamp (timestamp_col (timestamp)) ustring Returns a ustring from a timestamp. string_col:ustring = ustring_from_timestamp (timestamp_col)
ustring_length (string_col (ustring)) int32 Returns the length of a ustring. length_col:int32 = ustring_length (string_col)
u_substring [startPosition,len] (string_col (string)) ustring Converts long ustrings to shorter ustrings by string extraction. The startPosition specifies the starting location of the substring; len specifies the substring length. If startPosition is positive, it specifies the byte offset into the string from the beginning of the string. If startPosition is negative, it specifies the byte offset from the end of the string. shorstring_col:ustring = substring [5,10] (longstring_col)
weekday_from_date [originDay] (date_col (date)) int8 Day of week from date. originDay is a string specifying the day considered to be day zero of the week. You can specify the day using either the first three characters of the day name or the full day name. If omitted, Sunday is defined as day zero. The originDay can be either single- or double-quoted or the quotes can be omitted. dow_int:int8 = weekday_from_date [mon] (date_col)
year_day_from_date (date_col (date)) int16 Day of year from date (returned value 1-366). doy_col:int16 = year_day_from_date (date_col)
year_from_date (date_col (date)) int16 Year from date. year_col:int16 = year_from_date (date_col)
year_week_from_date (date_col (date)) int8 Week of year from date. week_col:int8 = year_week_from_date (date_col)

table_definition defines the rows of a string lookup table and has the following form:


{propertyList} ('string' = value; 'string' = value; ... )

where:

  • propertyList is one or more of the following options; the entire list is enclosed in braces and properties are separated by commas if there are more than one:
    • case_sensitive. Perform a case-sensitive search for matching strings; the default is case-insensitive.
    • default_value = defVal. The default numeric value returned for a string that does not match any of the strings in the table.
    • default_string = defString. The default string returned for numeric values that do not match any numeric value in the table.
  • string specifies a comma-separated list of strings associated with value; enclose each string in quotes.
  • value specifies a comma-separated list of 16-bit integer values associated with string.

date_format is the standard date formatting string described in Date and time formats

R_type is a string representing the rounding type and should contain one of the following:

  • ceil. Round the source field toward positive infinity. E.g, 1.4 -> 2, -1.6 -> -1.
  • floor. Round the source field toward negative infinity. E.g, 1.6 -> 1, -1.4 -> -2.
  • round_inf. Round or truncate the source field toward the nearest representable value, breaking ties by rounding positive values toward positive infinity and negative values toward negative infinity. E.g, 1.4 -> 1, 1.5 -> 2, -1.4 -> -1, -1.5 -> -2.
  • trunc_zero. Discard any fractional digits to the right of the rightmost fractional digit supported in the destination, regardless of sign. For example, if the destination is an integer, all fractional digits are truncated. If the destination is another decimal with a smaller scale, round or truncate to the scale size of the destination decimal. E.g, 1.6 -> 1, -1.6 -> -1.

You can specify fix_zero for decimal source columns so that columns containing all zeros (by default illegal) are treated as a valid decimal with a value of zero.