Modify stage: Stage tab (DataStage)
You can specify aspects of the Modify stage by double-clicking the stage and updating settings on the Stage tab.
The Properties section lets you specify what the stage does. The Advanced section allows you to specify how the stage executes.
Properties
Category/Property | Values | Default | Mandatory? | Repeats? | Dependent of |
---|---|---|---|---|---|
Options/Specification | string | N/A | Y | Y | N/A |
Specification
This is a statement with one of the following the forms:
- DROP columnname [, columnname]
- KEEP columnname [, columnname]
- new_columnname:new_type = [explicit_conversion_function] old_columnname
If you choose to drop a column or columns, all columns are retained except those you explicitly drop. If you chose to keep a column or columns, all columns are excluded except those you explicitly keep.
Each line is a new specification. If you specify multiple specifications each will be carried out sequentially.
Some type conversions IBM® DataStage® can carry out automatically, others need you to specify an explicit conversion function. Some conversions are not available.
The following table summarizes the availability, with the source fields shown vertically and the target fields shown horizontally. A value of "d" indicates automatic (default) conversion, "m" indicates that manual conversion is required, a blank square indicates that conversion is not possible:
int8 | uint8 | int16 | uint16 | int32 | uint32 | int64 | uint64 | sfloat | dfloat | |
---|---|---|---|---|---|---|---|---|---|---|
int8 | d | d | d | d | d | d | d | d | d m | |
uint8 | d | d | d | d | d | d | d | d | d | |
int16 | d m | d | d | d | d | d | d | d | d | |
uint16 | d | d | d | d | d | d | d | d | d | |
int32 | d m | d | d | d | d | d | d | d | d | |
uint32 | d | d | d | d | d | d | d | d | d | |
int64 | d m | d | d | d | d | d | d | d | d | |
uint64 | d | d | d | d | d | d | d | d | d | |
sfloat | d m | d | d | d | d | d | d | d | d | |
dfloat | d m | d | d | d | d | d | d | d | d | |
decimal | d m | d | d | d | d m | d | d m | d m | d | d m |
string | d m | d | d m | d | d | d m | d | d | d | d m |
ustring | d m | d | d m | d | d | d m | d | d | d | d m |
raw | m | m | ||||||||
date | m | m | m | m | ||||||
time | m | m | m | |||||||
time stamp | m | m | m |
decimal | string | ustring | raw | date | time | timestamp | |
---|---|---|---|---|---|---|---|
int8 | d | d m | d m | m | m | m | |
uint8 | d | d | d | ||||
int16 | d | d m | d m | ||||
uint16 | d | d m | d m | ||||
int32 | d | d m | d m | m | m | ||
uint32 | d | m | m | m | |||
int64 | d | d | d | ||||
uint64 | d | d | d | ||||
sfloat | d | d | d | ||||
dfloat | d m | d m | d m | m | m | ||
decimal | d m | d m | |||||
string | d m | d | m | m | m | ||
ustring | d m | d | m | m | |||
raw | |||||||
date | m | m | m | ||||
time | m | m | d m | ||||
timestamp | m | m | m | m |
For a default type conversion, your specification would take the following form:
new_columnname:new_type = [explicit_conversion_function] old_columnname
For example, to produce an int8 column type:
int8col:int8 = uint64col
Where a manual conversion is required, your specification takes the form:
new_columnname:new_type = conversion_function (old_columnname)
For example:
day_column:int8 = month_day_from_date (date_column)
The new_type can be any of the destination types that are supported for conversions from the source (that is, any of the columns marked "m" in the preceding table). For example, you can use the conversion hours_from_time to convert a time to an int8, or to an int16, int32, dfloat, and so on. IBM DataStage warns you when it is performing an implicit data type conversion, for example hours_from_time expects to convert a time to an int8, and will warn you if converting to a int16, int32, or dfloat.
The following table lists the available conversion functions. The source and destinations are always specified in terms of column names. Preliminary arguments are enclosed in square brackets, the source column name is enclosed in round brackets.
Conversion | Arguments | Output type | Description | Example |
---|---|---|---|---|
date_from_days_since | [base_date (date)] (number_col (int32)) | date | Converts an integer field into a date by adding the integer to the specified base date. The
base_date must be in the format yyyy-mm-dd and must be either double quoted or a variable. |
date_col:date = date_from_days_since ["1958-08-18"] (int_col) |
date_from_julian_day | (juliandate_col (uint32)) | date | Date from Julian day. | date_col:date = date_from_julian_day (julian_col) |
date_from_string | [date_format] (string_col (string)) | date | Converts the string to a date representation using the specified date_format. By default the string format is yyyy-mm-dd. | date_col:date = date_from_string ["%yyyy-%mm-%dd"] (string_col) |
date_from_timestamp | (timestamp_col (timestamp) ) | date | Converts the timestamp to a date representation. | date_col:date = date_from_timestamp (ts_col) |
date_from_ustring | [date_format] (string_col (ustring)) | date | Converts the string to a date representation using the specified date_format. By default the string format is yyyy-mm-dd. | date_col:date = date_from_ustring (ustring_col, "%yyyy-%mm-%dd") |
days_since_from_date | [source_date (date)] (date_col (string)) | int32 | Returns a value corresponding to the number of days from source_date to the specified date. source_date must be in the form yyyy-mm-dd and can be quoted or unquoted. | dayssince_col:int32 = days_since_from_date ["1958-08-18"] (sourcedate_col,) |
decimal_from_decimal | [r_type] (source_decimal_col (decimal)) | decimal | Decimal from decimal. | decimal_col:decimal = decimal_from_decimal [ceil] (source_col) |
decimal_from_dfloat | [r_type] (source_dfloat_col (dfloat)) | decimal | Decimal from dfloat. | decimal_col:decimal = decimal_from_dfloat [ceil] (source_col) |
decimal_from_string | [r_type] (source_string_col (string)) | decimal | Decimal from string. | decimal_col:decimal = decimal_from_string [ceil] (source_col) |
decimal_from_ustring | [r_type] (source_ustring_col (ustring)) | decimal | Decimal from ustring. | decimal_col:decimal = decimal_from_ustring [ceil] (source_col) |
dfloat_from_decimal | [fix_zero] (source_dec_col (decimal)) | dfloat | Dfloat from decimal. | dfloat_col:dfloat = dfloat_from_decimal [fix_zero] (source_col) |
hours_from_time | (source_time_col (time)) | int8 | Hours from time. | hours_col:int8 = hours_from_time (time_col) |
int32_from_decimal | [r_type, fix_zero] (source_decimal_col (decimal)) | int32 | Int32 from decimal. | int32_col:int32 = int32_from_decimal [ceil] [fix_zero] (dec_col) |
int64_from_decimal | [r_type, fix_zero] (source_decimal_col (decimal)) | int64 | Int64 from decimal. | int64_col:int64 = int64_from_decimal [ceil] (dec_col) |
julian_day_from_date | (date_col (date)) | uint32 | Julian day from date. | julianday_col:uint32 = julian_day_from_from_date (date_col) |
lookup_string_from_int16 | [table_definition ], (number_col (int16)) | string | Converts numeric values to strings by means of a lookup table. | gendercol:string = lookup_string_from_int16 [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode) |
lookup_ustring_from_int16 | [table_definition ] (number_col (int16)) | ustring | Converts numeric values to ustrings by means of a lookup table. | gendercol:ustring = lookup_ustring_from_int16 [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode) |
lookup_ustring_from_int32 | [table_definition ] (number_col (int32)) | ustring | Converts numeric values to ustrings by means of a lookup table.. | gendercol:ustring = lookup_string_from_int32 [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode) |
lookup_string_from_uint32 | [table_definition ] (number_col (uint32)) | string | Converts numeric values to strings by means of a lookup table. | gendercol:string = lookup_string_from_uint32 [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode) |
lookup_int16_from_string | [table_definition ] (string_col (string)) | int16 | Converts strings to numeric values by means of a lookup table. | int_col:int16 = lookup_int16_from_string [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode) |
lookup_int16_from_ustring | [table_definition ] (ustring_col (ustring)) | int16 | Converts strings to numeric values by means of a lookup table. | int_col:int16 = lookup_int16_from_ustring [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode) |
lookup_uint32_from_string | [table_definition ] (string_col (string)) | uint32 | Converts strings to numeric values by means of a lookup table. | int_col:uint32 = lookup_uint32_from_string [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode) |
lookup_uint32_from_ustring | [table_definition ] (ustring_col (ustring)) | uint32 | Converts ustrings to numeric values by means of a lookup table. | int_col:uint32 = lookup_uint32_from_ustring [{default_value = 2} ('f' = 1; 'm' = 2)] (gendercode) |
lowercase_string | (instring_col (string)) | string | Convert strings to all lower case. Non-alphabetic characters are ignored in the conversion. | ostring_col:string = lowercase_string (istring_col) |
lowercase_ustring | (instring_col (ustring)) | string | Convert ustrings to all lower case. Non-alphabetic characters are ignored in the conversion. | ostring_col:ustring = lowercase_string (istring_col) |
mantissa_from_decimal | (decimal_col (decimal)) | dfloat | Returns the mantissa from the given decimal | matissa_col:dfloat = mantissa_from_decimal (dec_col) |
mantissa_from_dfloat | (dfloat_col (dfloat)) | dfloat | Returns the mantissa from the given dfloat | matissa_col:dfloat = mantissa_from_dfloat (dfloat_col) |
microseconds_from_time | (time_col (time)) | int32 | Returns the microseconds from a time field. | msec_col:int32 = microseconds_from_time (time_col) |
midnight_seconds_from_time | (time_col (time)) | dfloat | Returns the seconds-from-midnight from the supplied time. | midsec_col:dfloat = midnight_seconds_from_time (time_col) |
minutes_from_time | (time_col (time)) | int8 | Returns the minutes from a time field. | minsec_col:int8 = minutes_from_time (time_col) |
month_day_from_date | (date_col (date)) | int8 | Returns the day of month from a date field. | monthday_col:int8 = month_day_from_date (date_col) |
month_from_date | (date_col (date)) | int8 | Returns the numeric month from a date field. | month_col:int8 = month_from_date (date_col) |
next_weekday_from_date | [day] (date_col (date)) | date | Returns the date of the specified day of the week soonest after the source date (including the source date). day is a string specifying a day of the week. You can specify day by either the first three characters of the day name or the full day name. The day can be quoted in either single or double quotes or quotes can be omitted. | nextday_col:date = next_weekday_from_date [wed](date_col) |
notnull | (any) | int8 | Returns true (1) when an expression does not evaluate to the null value. | isnotnull_col:int8 = notnull (test_col) |
null | (any) | int8 | Returns true (1) when an expression does evaluate to the null value | isnull_col:int8 = null (test_col) |
previous_weekday_from_date | [day] (date_col (date)) | date | The destination contains the closest date for the specified day of the week earlier than the source date (including the source date). The day is a string specifying a day of the week. You can specify day by either the first three characters of the day name or the full day name. The day can be quoted in either single or double quotes or quotes can be omitted. | prevday_col:date = previous_weekday_from_date [wed](date_col) |
raw_from_string | (string_col (string)) | raw | Returns a string in raw representation. | raw_col:raw = raw_from_string (string_col) |
raw_length | (raw_col (raw)) | int32 | Returns the length of a raw field. | rawlength_col:int32 = raw_length (raw_col) |
seconds_from_time | (time_col (time)) | dfloat | Returns the seconds from a time field. | sec_col:dfloat = seconds_from_time (time_col) |
seconds_since_from_timestamp | (timestamp_col (timestamp)) | dfloat | Seconds since the time given by timestamp. | secsince_col:dfloat = seconds_since_from_timestamp (timestamp_col) |
string_from_date | [date_format] (date_col (date)) | string | Converts the date to a string representation using the specified date_format. | datestring_col:string = string_from_date [%dd-%mm-%yyyy] (date_col) |
string_from_decimal | [fix_zero] (decimal_col (decimal) | string | Returns a string from a decimal. | string_col:string = string_from_decimal [fix_zero] (dec_col) |
string_from_time | [time_format] (time_col (time)) | string | Converts the time to a string representation using the specified time_format. The default time format is %hh:%nn:%ss. | timestring_col:string = string_from_time [%hh:%nn:%ss.] (time_col) |
string_from_timestamp | [timestamp_format] (timestamp_col (timestamp)) | Converts the timestamp to a string representation using the specified timestamp_format. The default timestamp format is %yyyy-%mm-%dd. %hh:%nn:%ss. | stringtimestamp_col:string = string_from_timestamp [%yyyy-%mm-%dd. %hh:%nn:%ss.] (timestamp_col) | |
string_from_ustring | (string_col (ustring)) | string | Returns a string from a ustring. | string_col:string = string_from_ustring (ustring_col) |
string_length | (string_col (string)) | int32 | Returns an int32 containing the length of a string. | length_col:int32 = string_length (string_col) |
substring | [startPosition,len] (string_col (string)) | string | Converts long strings to shorter strings by string extraction. The startPosition specifies the starting location of the substring; len specifies the substring length. If startPosition is positive, it specifies the byte offset into the string from the beginning of the string. If startPosition is negative, it specifies the byte offset from the end of the string. | shorstring_col:string = substring [5,10] (longstring_col) |
time_from_midnight_seconds | (dfloat_col (dfloat)) | time | Returns a time from aseconds-from-midnight field. | time_col:time = time_from_midnight_seconds (dfloat_col) |
time_from_string | [time_format] (string_col (string)) | time | Converts the string to a time representation using the specified time_format. The default time format is %hh:%nn:%ss. | time_col:time = time_from_string [%hh:%nn:%ss.] (string_col) |
time_from_timestamp | (timestamp_col (timestamp)) | time | Time from timestamp. | time_col:time = time_from_timestamp (timestamp_col) |
time_from_ustring | (string_col (ustring)) | time | Returns a time from a ustring. | time_col:time = time_from_ustring (string_col) |
timestamp_from_date | [time](date_col (date)) | time stamp | Timestamp from date. The time argument optionally specifies the time to be used in building the timestamp result and must be in the form hh:nn:ss. If omitted, the time defaults to midnight. | timestamp_col:timestamp = timestamp_from_date [08:20:33] (date_col) |
timestamp_from_seconds_since | (secondssince_col (dfloat)) | time stamp | Timestamp from a seconds since value. | timestamp_col:timestamp = timestamp_from_seconds_since (secondssince_col) |
timestamp_from_string | [timestamp_format] (string_col (string)) | time stamp | Converts the string to a timestamp representation using the specified timestamp_format. By default, the string format is %yyyy-%mm-%dd hh:nn:ss. | timestamp_col:timestamp = timestamp_from_string [%yyyy-%mm-%dd hh:nn:ss] (string_col) |
timestamp_from_time | [date](time_col (time)) | time stamp | Timestamp from time. The date argument is required. It specifies the date portion of the timestamp and must be in the form yyyy-mm-dd. | timestamp_col:timestamp = timestamp_from_time [1958-08-18] (time_col) |
timestamp_from_timet | (timet_col (int32)) | time stamp | Timestamp from time_t. The source field must contain a timestamp as defined by the UNIX time_t representation. | timestamp_col:timestamp = timestamp_from_timet (timet_col) |
timestamp_from_ustring | (string_col (ustring)) | time stamp | Returns a timestamp from a ustring. | timestamp_col:timestamp = timestamp_from_ustring (string_col) |
timet_from_timestamp | (tstamp_col (timestamp)) | int32 | Time_t from timestamp. The destination column contains a timestamp as defined by the UNIX time_t representation. | timet_col:int32 = timet_from_timestamp (tstamp_col) |
uint64_from_decimal | [r_type, fix_zero] (dec_col (decimal)) | uint64 | Uint64 from decimal. | int_col:uint64 = uint64_from_decimal [ceil, fix_zero] (dec_col) |
uppercase_string | (string_col (string)) | string | Convert strings to all upper case. Non-alphabetic characters are ignored in the conversion. | string_col:string = uppercase_string (instring_col) |
uppercase_ustring | (string_col (ustring)) | ustring | Convert ustrings to all upper case. Non-alphabetic characters are ignored in the conversion. | ustring_col:string = uppercase_ustring (string_col) |
u_raw_from_string | (string_col (ustring)) | raw | Returns a raw from a ustring | raw_col:raw = u_raw_from_string (string_col) |
ustring_from_date | (date_col (date)) | ustring | Returns a ustring from a date. | string_col:ustring = ustring_from_date (date_col) |
ustring_from_decimal | (dec_col (decimal)) | ustring | Returns a ustring from a decimal. | string_col:ustring = ustring_from_decimal (dec_col) |
ustring_from_string | (string_col (string)) | ustring | Returns a ustring from a string. | string_col:ustring = ustring_from_string (string_col) |
ustring_from_time | (time_col (time)) | ustring | Returns a ustring from a time. | string_col:ustring = ustring_from_time (time_col) |
ustring_from_timestamp | (timestamp_col (timestamp)) | ustring | Returns a ustring from a timestamp. | string_col:ustring = ustring_from_timestamp (timestamp_col) |
ustring_length | (string_col (ustring)) | int32 | Returns the length of a ustring. | length_col:int32 = ustring_length (string_col) |
u_substring | [startPosition,len] (string_col (string)) | ustring | Converts long ustrings to shorter ustrings by string extraction. The startPosition specifies the starting location of the substring; len specifies the substring length. If startPosition is positive, it specifies the byte offset into the string from the beginning of the string. If startPosition is negative, it specifies the byte offset from the end of the string. | shorstring_col:ustring = substring [5,10] (longstring_col) |
weekday_from_date | [originDay] (date_col (date)) | int8 | Day of week from date. originDay is a string specifying the day considered to be day zero of the week. You can specify the day using either the first three characters of the day name or the full day name. If omitted, Sunday is defined as day zero. The originDay can be either single- or double-quoted or the quotes can be omitted. | dow_int:int8 = [mon] (date_col) |
year_day_from_date | (date_col (date)) | int16 | Day of year from date (returned value 1-366). | doy_col:int16 = year_day_from_date (date_col) |
year_from_date | (date_col (date)) | int16 | Year from date. | year_col:int16 = year_from_date (date_col) |
year_week_from_date | (date_col (date)) | int8 | Week of year from date. | week_col:int8 = year_week_from_date (date_col) |
table_definition defines the rows of a string lookup table and has the following form:
{propertyList} ('string' = value; 'string' = value; ... )
where:
- propertyList is one or more of the following options; the entire list is
enclosed in braces and properties are separated by commas if there are more than one:
- case_sensitive. Perform a case-sensitive search for matching strings; the default is case-insensitive.
- default_value = defVal. The default numeric value returned for a string that does not match any of the strings in the table.
- default_string = defString. The default string returned for numeric values that do not match any numeric value in the table.
- string specifies a comma-separated list of strings associated with value; enclose each string in quotes.
- value specifies a comma-separated list of 16-bit integer values associated with string.
date_format is the standard date formatting string
R_type is a string representing the rounding type and should contain one of the following:
- ceil. Round the source field toward positive infinity. E.g, 1.4 -> 2, -1.6 -> -1.
- floor. Round the source field toward negative infinity. E.g, 1.6 -> 1, -1.4 -> -2.
- round_inf. Round or truncate the source field toward the nearest representable value, breaking ties by rounding positive values toward positive infinity and negative values toward negative infinity. E.g, 1.4 -> 1, 1.5 -> 2, -1.4 -> -1, -1.5 -> -2.
- trunc_zero. Discard any fractional digits to the right of the rightmost fractional digit supported in the destination, regardless of sign. For example, if the destination is an integer, all fractional digits are truncated. If the destination is another decimal with a smaller scale, round or truncate to the scale size of the destination decimal. E.g, 1.6 -> 1, -1.6 -> -1.
You can specify fix_zero for decimal source columns so that columns containing all zeros (by default illegal) are treated as a valid decimal with a value of zero.
Advanced
This section allows you to specify the following:
- Execution Mode. The stage can execute in parallel mode or sequential mode. In parallel mode the input data is processed by the available nodes as specified in the Configuration file, and by any node constraints specified on the Advanced section. In Sequential mode the entire data set is processed by the conductor node.
- Combinability mode. This is Auto by default, which allows IBM DataStage to combine the operators that underlie parallel stages so that they run in the same process if it is sensible for this type of stage.
- Preserve partitioning. This is Set by default. You can explicitly select Set or Clear. Select Set to request the next stage should attempt to maintain the partitioning.