Modify stage: Stage tab (DataStage)

You can specify aspects of the Modify stage by double-clicking the stage and updating settings on the Stage tab.

The Properties section lets you specify what the stage does. The Advanced section allows you to specify how the stage executes.

Properties

The modify stage only has one property, although you can repeat this as required.
Table 1. Properties
Category/Property Values Default Mandatory? Repeats? Dependent of
Options/Specification string N/A Y Y N/A

Specification

This is a statement with one of the following the forms:

  • DROP columnname [, columnname]
  • KEEP columnname [, columnname]
  • new_columnname:new_type = [explicit_conversion_function] old_columnname

If you choose to drop a column or columns, all columns are retained except those you explicitly drop. If you chose to keep a column or columns, all columns are excluded except those you explicitly keep.

Each line is a new specification. If you specify multiple specifications each will be carried out sequentially.

Some type conversions IBM® DataStage® can carry out automatically, others need you to specify an explicit conversion function. Some conversions are not available.

The following table summarizes the availability, with the source fields shown vertically and the target fields shown horizontally. A value of "d" indicates automatic (default) conversion, "m" indicates that manual conversion is required, a blank square indicates that conversion is not possible:

  int8 uint8 int16 uint16 int32 uint32 int64 uint64 sfloat dfloat
int8   d d d d d d d d d m
uint8 d   d d d d d d d d
int16 d m d   d d d d d d d
uint16 d d d   d d d d d d
int32 d m d d d   d d d d d
uint32 d d d d d   d d d d
int64 d m d d d d d   d d d
uint64 d d d d d d d   d d
sfloat d m d d d d d d d   d
dfloat d m d d d d d d d d  
decimal d m d d d d m d d m d m d d m
string d m d d m d d d m d d d d m
ustring d m d d m d d d m d d d d m
raw m       m          
date m   m   m m        
time m       m         m
time stamp m       m         m
  decimal string ustring raw date time timestamp
int8 d d m d m   m m m
uint8 d d d        
int16 d d m d m        
uint16 d d m d m        
int32 d d m d m   m   m
uint32 d m m   m    
int64 d d d        
uint64 d d d        
sfloat d d d        
dfloat d m d m d m     m m
decimal   d m d m        
string d m   d   m m m
ustring d m d       m m
raw              
date   m m       m
time   m m       d m
timestamp   m m   m m  

For a default type conversion, your specification would take the following form:


new_columnname:new_type = [explicit_conversion_function] old_columnname 

For example, to produce an int8 column type:


int8col:int8 = uint64col

Where a manual conversion is required, your specification takes the form:


new_columnname:new_type = conversion_function (old_columnname)

For example:


day_column:int8 = month_day_from_date (date_column)

The new_type can be any of the destination types that are supported for conversions from the source (that is, any of the columns marked "m" in the preceding table). For example, you can use the conversion hours_from_time to convert a time to an int8, or to an int16, int32, dfloat, and so on. IBM DataStage warns you when it is performing an implicit data type conversion, for example hours_from_time expects to convert a time to an int8, and will warn you if converting to a int16, int32, or dfloat.

The following table lists the available conversion functions. The source and destinations are always specified in terms of column names. Preliminary arguments are enclosed in square brackets, the source column name is enclosed in round brackets.

Conversion Arguments Output type Description Example
date_from_days_since [base_date (date)] (number_col (int32)) date Converts an integer field into a date by adding the integer to the specified base date. The base_date must be in the format yyyy-mm-dd and must be either double

quoted or a variable.

date_col:date = date_from_days_since ["1958-08-18"] (int_col)
date_from_julian_day (juliandate_col (uint32)) date Date from Julian day. date_col:date = date_from_julian_day (julian_col)
date_from_string [date_format] (string_col (string)) date Converts the string to a date representation using the specified date_format. By default the string format is yyyy-mm-dd. date_col:date = date_from_string ["%yyyy-%mm-%dd"] (string_col)
date_from_timestamp (timestamp_col (timestamp) ) date Converts the timestamp to a date representation. date_col:date = date_from_timestamp (ts_col)
date_from_ustring [date_format] (string_col (ustring)) date Converts the string to a date representation using the specified date_format. By default the string format is yyyy-mm-dd. date_col:date = date_from_ustring (ustring_col, "%yyyy-%mm-%dd")
days_since_from_date [source_date (date)] (date_col (string)) int32 Returns a value corresponding to the number of days from source_date to the specified date. source_date must be in the form yyyy-mm-dd and can be quoted or unquoted. dayssince_col:int32 = days_since_from_date ["1958-08-18"] (sourcedate_col,)
decimal_from_decimal [r_type] (source_decimal_col (decimal)) decimal Decimal from decimal. decimal_col:decimal = decimal_from_decimal [ceil] (source_col)
decimal_from_dfloat [r_type] (source_dfloat_col (dfloat)) decimal Decimal from dfloat. decimal_col:decimal = decimal_from_dfloat [ceil] (source_col)
decimal_from_string [r_type] (source_string_col (string)) decimal Decimal from string. decimal_col:decimal = decimal_from_string [ceil] (source_col)
decimal_from_ustring [r_type] (source_ustring_col (ustring)) decimal Decimal from ustring. decimal_col:decimal = decimal_from_ustring [ceil] (source_col)
dfloat_from_decimal [fix_zero] (source_dec_col (decimal)) dfloat Dfloat from decimal. dfloat_col:dfloat = dfloat_from_decimal [fix_zero] (source_col)
hours_from_time (source_time_col (time)) int8 Hours from time. hours_col:int8 = hours_from_time (time_col)
int32_from_decimal [r_type, fix_zero] (source_decimal_col (decimal)) int32 Int32 from decimal. int32_col:int32 = int32_from_decimal [ceil] [fix_zero] (dec_col)
int64_from_decimal [r_type, fix_zero] (source_decimal_col (decimal)) int64 Int64 from decimal. int64_col:int64 = int64_from_decimal [ceil] (dec_col)
julian_day_from_date (date_col (date)) uint32 Julian day from date. julianday_col:uint32 = julian_day_from_from_date (date_col)
lookup_string_from_int16 [table_definition ], (number_col (int16)) string Converts numeric values to strings by means of a lookup table. gendercol:string = lookup_string_from_int16 [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode)
lookup_ustring_from_int16 [table_definition ] (number_col (int16)) ustring Converts numeric values to ustrings by means of a lookup table. gendercol:ustring = lookup_ustring_from_int16 [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode)
lookup_ustring_from_int32 [table_definition ] (number_col (int32)) ustring Converts numeric values to ustrings by means of a lookup table.. gendercol:ustring = lookup_string_from_int32 [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode)
lookup_string_from_uint32 [table_definition ] (number_col (uint32)) string Converts numeric values to strings by means of a lookup table. gendercol:string = lookup_string_from_uint32 [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode)
lookup_int16_from_string [table_definition ] (string_col (string)) int16 Converts strings to numeric values by means of a lookup table. int_col:int16 = lookup_int16_from_string [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode)
lookup_int16_from_ustring [table_definition ] (ustring_col (ustring)) int16 Converts strings to numeric values by means of a lookup table. int_col:int16 = lookup_int16_from_ustring [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode)
lookup_uint32_from_string [table_definition ] (string_col (string)) uint32 Converts strings to numeric values by means of a lookup table. int_col:uint32 = lookup_uint32_from_string [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode)
lookup_uint32_from_ustring [table_definition ] (ustring_col (ustring)) uint32 Converts ustrings to numeric values by means of a lookup table. int_col:uint32 = lookup_uint32_from_ustring [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode)
lowercase_string (instring_col (string)) string Convert strings to all lower case. Non-alphabetic characters are ignored in the conversion. ostring_col:string = lowercase_string (istring_col)
lowercase_ustring (instring_col (ustring)) string Convert ustrings to all lower case. Non-alphabetic characters are ignored in the conversion. ostring_col:ustring = lowercase_string (istring_col)
mantissa_from_decimal (decimal_col (decimal)) dfloat Returns the mantissa from the given decimal matissa_col:dfloat = mantissa_from_decimal (dec_col)
mantissa_from_dfloat (dfloat_col (dfloat)) dfloat Returns the mantissa from the given dfloat matissa_col:dfloat = mantissa_from_dfloat (dfloat_col)
microseconds_from_time (time_col (time)) int32 Returns the microseconds from a time field. msec_col:int32 = microseconds_from_time (time_col)
midnight_seconds_from_time (time_col (time)) dfloat Returns the seconds-from-midnight from the supplied time. midsec_col:dfloat = midnight_seconds_from_time (time_col)
minutes_from_time (time_col (time)) int8 Returns the minutes from a time field. minsec_col:int8 = minutes_from_time (time_col)
month_day_from_date (date_col (date)) int8 Returns the day of month from a date field. monthday_col:int8 = month_day_from_date (date_col)
month_from_date (date_col (date)) int8 Returns the numeric month from a date field. month_col:int8 = month_from_date (date_col)
next_weekday_from_date [day] (date_col (date)) date Returns the date of the specified day of the week soonest after the source date (including the source date). day is a string specifying a day of the week. You can specify day by either the first three characters of the day name or the full day name. The day can be quoted in either single or double quotes or quotes can be omitted. nextday_col:date = next_weekday_from_date [wed](date_col)
notnull (any) int8 Returns true (1) when an expression does not evaluate to the null value. isnotnull_col:int8 = notnull (test_col)
null (any) int8 Returns true (1) when an expression does evaluate to the null value isnull_col:int8 = null (test_col)
previous_weekday_from_date [day] (date_col (date)) date The destination contains the closest date for the specified day of the week earlier than the source date (including the source date). The day is a string specifying a day of the week. You can specify day by either the first three characters of the day name or the full day name. The day can be quoted in either single or double quotes or quotes can be omitted. prevday_col:date = previous_weekday_from_date [wed](date_col)
raw_from_string (string_col (string)) raw Returns a string in raw representation. raw_col:raw = raw_from_string (string_col)
raw_length (raw_col (raw)) int32 Returns the length of a raw field. rawlength_col:int32 = raw_length (raw_col)
seconds_from_time (time_col (time)) dfloat Returns the seconds from a time field. sec_col:dfloat = seconds_from_time (time_col)
seconds_since_from_timestamp (timestamp_col (timestamp)) dfloat Seconds since the time given by timestamp. secsince_col:dfloat = seconds_since_from_timestamp (timestamp_col)
string_from_date [date_format] (date_col (date)) string Converts the date to a string representation using the specified date_format. datestring_col:string = string_from_date [%dd-%mm-%yyyy] (date_col)
string_from_decimal [fix_zero] (decimal_col (decimal) string Returns a string from a decimal. string_col:string = string_from_decimal [fix_zero] (dec_col)
string_from_time [time_format] (time_col (time)) string Converts the time to a string representation using the specified time_format. The default time format is %hh:%nn:%ss. timestring_col:string = string_from_time [%hh:%nn:%ss.] (time_col)
string_from_timestamp [timestamp_format] (timestamp_col (timestamp))   Converts the timestamp to a string representation using the specified timestamp_format. The default timestamp format is %yyyy-%mm-%dd. %hh:%nn:%ss. stringtimestamp_col:string = string_from_timestamp [%yyyy-%mm-%dd. %hh:%nn:%ss.] (timestamp_col)
string_from_ustring (string_col (ustring)) string Returns a string from a ustring. string_col:string = string_from_ustring (ustring_col)
string_length (string_col (string)) int32 Returns an int32 containing the length of a string. length_col:int32 = string_length (string_col)
substring [startPosition,len] (string_col (string)) string Converts long strings to shorter strings by string extraction. The startPosition specifies the starting location of the substring; len specifies the substring length. If startPosition is positive, it specifies the byte offset into the string from the beginning of the string. If startPosition is negative, it specifies the byte offset from the end of the string. shorstring_col:string = substring [5,10] (longstring_col)
time_from_midnight_seconds (dfloat_col (dfloat)) time Returns a time from aseconds-from-midnight field. time_col:time = time_from_midnight_seconds (dfloat_col)
time_from_string [time_format] (string_col (string)) time Converts the string to a time representation using the specified time_format. The default time format is %hh:%nn:%ss. time_col:time = time_from_string [%hh:%nn:%ss.] (string_col)
time_from_timestamp (timestamp_col (timestamp)) time Time from timestamp. time_col:time = time_from_timestamp (timestamp_col)
time_from_ustring (string_col (ustring)) time Returns a time from a ustring. time_col:time = time_from_ustring (string_col)
timestamp_from_date [time](date_col (date)) time stamp Timestamp from date. The time argument optionally specifies the time to be used in building the timestamp result and must be in the form hh:nn:ss. If omitted, the time defaults to midnight. timestamp_col:timestamp = timestamp_from_date [08:20:33] (date_col)
timestamp_from_seconds_since (secondssince_col (dfloat)) time stamp Timestamp from a seconds since value. timestamp_col:timestamp = timestamp_from_seconds_since (secondssince_col)
timestamp_from_string [timestamp_format] (string_col (string)) time stamp Converts the string to a timestamp representation using the specified timestamp_format. By default, the string format is %yyyy-%mm-%dd hh:nn:ss.

timestamp_col:timestamp = timestamp_from_string [%yyyy-%mm-%dd hh:nn:ss] (string_col)
timestamp_from_time

[date](time_col (time)) time stamp Timestamp from time. The date argument is required. It specifies the date portion of the timestamp and must be in the form yyyy-mm-dd.

timestamp_col:timestamp = timestamp_from_time [1958-08-18] (time_col)
timestamp_from_timet (timet_col (int32)) time stamp Timestamp from time_t. The source field must contain a timestamp as defined by the UNIX time_t representation. timestamp_col:timestamp = timestamp_from_timet (timet_col)
timestamp_from_ustring (string_col (ustring)) time stamp Returns a timestamp from a ustring. timestamp_col:timestamp = timestamp_from_ustring (string_col)
timet_from_timestamp (tstamp_col (timestamp)) int32 Time_t from timestamp. The destination column contains a timestamp as defined by the UNIX time_t representation. timet_col:int32 = timet_from_timestamp (tstamp_col)
uint64_from_decimal [r_type, fix_zero] (dec_col (decimal)) uint64 Uint64 from decimal. int_col:uint64 = uint64_from_decimal [ceil, fix_zero] (dec_col)
uppercase_string (string_col (string)) string Convert strings to all upper case. Non-alphabetic characters are ignored in the conversion. string_col:string = uppercase_string (instring_col)
uppercase_ustring (string_col (ustring)) ustring Convert ustrings to all upper case. Non-alphabetic characters are ignored in the conversion. ustring_col:string = uppercase_ustring (string_col)
u_raw_from_string (string_col (ustring)) raw Returns a raw from a ustring raw_col:raw = u_raw_from_string (string_col)
ustring_from_date (date_col (date)) ustring Returns a ustring from a date. string_col:ustring = ustring_from_date (date_col)
ustring_from_decimal (dec_col (decimal)) ustring Returns a ustring from a decimal. string_col:ustring = ustring_from_decimal (dec_col)
ustring_from_string (string_col (string)) ustring Returns a ustring from a string. string_col:ustring = ustring_from_string (string_col)
ustring_from_time (time_col (time)) ustring Returns a ustring from a time. string_col:ustring = ustring_from_time (time_col)
ustring_from_timestamp (timestamp_col (timestamp)) ustring Returns a ustring from a timestamp. string_col:ustring = ustring_from_timestamp (timestamp_col)
ustring_length (string_col (ustring)) int32 Returns the length of a ustring. length_col:int32 = ustring_length (string_col)
u_substring [startPosition,len] (string_col (string)) ustring Converts long ustrings to shorter ustrings by string extraction. The startPosition specifies the starting location of the substring; len specifies the substring length. If startPosition is positive, it specifies the byte offset into the string from the beginning of the string. If startPosition is negative, it specifies the byte offset from the end of the string. shorstring_col:ustring = substring [5,10] (longstring_col)
weekday_from_date [originDay] (date_col (date)) int8 Day of week from date. originDay is a string specifying the day considered to be day zero of the week. You can specify the day using either the first three characters of the day name or the full day name. If omitted, Sunday is defined as day zero. The originDay can be either single- or double-quoted or the quotes can be omitted. dow_int:int8 = [mon] (date_col)
year_day_from_date (date_col (date)) int16 Day of year from date (returned value 1-366). doy_col:int16 = year_day_from_date (date_col)
year_from_date (date_col (date)) int16 Year from date. year_col:int16 = year_from_date (date_col)
year_week_from_date (date_col (date)) int8 Week of year from date. week_col:int8 = year_week_from_date (date_col)

table_definition defines the rows of a string lookup table and has the following form:


{propertyList} ('string' = value; 'string' = value; ... )

where:

  • propertyList is one or more of the following options; the entire list is enclosed in braces and properties are separated by commas if there are more than one:
    • case_sensitive. Perform a case-sensitive search for matching strings; the default is case-insensitive.
    • default_value = defVal. The default numeric value returned for a string that does not match any of the strings in the table.
    • default_string = defString. The default string returned for numeric values that do not match any numeric value in the table.
  • string specifies a comma-separated list of strings associated with value; enclose each string in quotes.
  • value specifies a comma-separated list of 16-bit integer values associated with string.

date_format is the standard date formatting string

R_type is a string representing the rounding type and should contain one of the following:

  • ceil. Round the source field toward positive infinity. E.g, 1.4 -> 2, -1.6 -> -1.
  • floor. Round the source field toward negative infinity. E.g, 1.6 -> 1, -1.4 -> -2.
  • round_inf. Round or truncate the source field toward the nearest representable value, breaking ties by rounding positive values toward positive infinity and negative values toward negative infinity. E.g, 1.4 -> 1, 1.5 -> 2, -1.4 -> -1, -1.5 -> -2.
  • trunc_zero. Discard any fractional digits to the right of the rightmost fractional digit supported in the destination, regardless of sign. For example, if the destination is an integer, all fractional digits are truncated. If the destination is another decimal with a smaller scale, round or truncate to the scale size of the destination decimal. E.g, 1.6 -> 1, -1.6 -> -1.

You can specify fix_zero for decimal source columns so that columns containing all zeros (by default illegal) are treated as a valid decimal with a value of zero.

Advanced

This section allows you to specify the following:

  • Execution Mode. The stage can execute in parallel mode or sequential mode. In parallel mode the input data is processed by the available nodes as specified in the Configuration file, and by any node constraints specified on the Advanced section. In Sequential mode the entire data set is processed by the conductor node.
  • Combinability mode. This is Auto by default, which allows IBM DataStage to combine the operators that underlie parallel stages so that they run in the same process if it is sensible for this type of stage.
  • Preserve partitioning. This is Set by default. You can explicitly select Set or Clear. Select Set to request the next stage should attempt to maintain the partitioning.