This section defines functions that are provided by the Transformation Language. It is presented in a series of tables that deal with data transformation functions of the following types:
When a function generates an output value, it returns the result. For functions with optional arguments, simply omit the optional argument to accept the default value. Default conversions among integer, float, decimal, and numeric string types are supported in the input arguments and the return value of the function. All integers can be signed or unsigned.
The transform operator has default NULL handling at the record-level with individual field "overrides". Options can be entered at the record level or the field level.
Function | Description |
---|---|
lookup( lookup_table ) | Performs a lookup on the table using the current input record. It fills the current record of the lookup table with the first record found. If a match is not found, the current record is empty. If this is called multiple times on the same record, the record is filled with the current match if there is one and a new lookup will not be done. |
next_match( lookup_table ) | Gets the next record matched in the lookup and puts it into the current record of the table. |
clear_lookup( lookup_table ) | Checks to see if the current lookup record has a match. If this method returns false directly after the lookup() call, no matches were found in the table. Returns a boolean value specifying whether the record is empty or not. |
int8 is_match( lookup_table ) | Checks to see if the current lookup record has a match. If this method returns false directly after the lookup() call, no matches were found in the table. Returns a boolean value specifying whether the record is empty or not. |
date field functions
InfoSphere® DataStage® performs no automatic type conversion of date fields. Either an input data set must match the operator interface or you must effect a type conversion by means of the transform or modify operator.
A date conversion to or from a numeric field can be specified with any InfoSphere DataStage numeric data type. InfoSphere DataStage performs the necessary modifications and either translates a numeric field to the source data type or translates a conversion result to the numeric data type of the destination. For example, you can use the transformation function month_day_from_date() to convert a date to an int8, or to an int16, int32, dfloat, and so on
date format
The default format of the date contained in the string is yyyy-mm-dd. However, you can specify an optional format string that defines another format. The format string requires that you provide enough information for InfoSphere DataStage to determine a complete date (either day, month, and year, or year and day of year).
The format_string can contain one or a combination of the following elements:
Tag | Variable width availability | Description | Value range | Options |
---|---|---|---|---|
%d | import | Day of month, variable width | 1...31 | s |
%dd | Day of month, fixed width | 01...31 | s | |
%ddd | with v option | Day of year | 1...366 | s, v |
%m | import | Month of year, variable width | 1...12 | s |
%mm | Month of year, fixed width | 01...12 | s | |
%mmm | Month of year, short name, locale specific | Jan, Feb ... | t, u, w | |
%mmmm | import/export | Month of year, full name, locale specific | January, February ... | t, u, w, -N, +N |
%yy | Year of century | 00...99 | s | |
%yyyy | Four digit year | 0001 ...9999 | ||
%NNNNyy | Cutoff year plus year of century | yy = 00...99 | s | |
%e | Day of week, Sunday = day 1 | 1...7 | ||
%E | Day of week, Monday = day 1 | 1...7 | ||
%eee | Weekday short name, locale specific | Sun, Mon ... | t, u, w | |
%eeee | import/export | Weekday long name, locale specific | Sunday, Monday ... | t, u, w, -N, +N |
%W | import | Week of year (ISO 8601, Mon) | 1...53 | s |
%WW | Week of year (ISO 8601, Mon) | 01...53 | s |
When you specify a date format string, prefix each component with the percent symbol (%) and separate the string's components with a suitable literal character.
The default date_format is %yyyy-%mm-%dd.
Where indicated the tags can represent variable-width data elements. Variable-width date elements can omit leading zeroes without causing errors.
The following options can be used in the format string where indicated in the table:
%dd-%(mmmm,-5)-%yyyyy
21-Augus-2006
%dd-%(mmmm,-10)-%yyyyy
21-August -2005
%dd-%(mmmm,+10)-%yyyyy
21- August-2005
The locale for determining the setting of the day and month names can be controlled through the locale tag. This has the format:
%(L,'locale')
Where locale specifies the locale to be set using the language_COUNTRY.variant naming convention supported by ICU. See IBM® InfoSphere DataStage and QualityStage® Globalization Guide for a list of locales. The default locale for month names and weekday names markers is English unless overridden by a %L tag or the APT_IMPEXP_LOCALE environment variable (the tag takes precedence over the environment variable if both are set).
Use the locale tag in conjunction with your time format, for example the format string:
%(L,'es')%eeee, %dd %mmmm %yyyySpecifies the Spanish locale and would result in a date with the following format:
miércoles, 21 septembre 2005
Element | Numeric format tags | Text format tags | Requires | Incompatible with |
---|---|---|---|---|
year | %yyyy, %yy, %[nnnn]yy | - | - | - |
month | %mm, %m | %mmm, %mmmm | year | week of year |
day of month | %dd, %d | - | month | day of week, week of year |
day of year | %ddd | year | day of month, day of week, week of year | |
day of week | %e, %E | %eee, %eeee | month, week of year | day of year |
week of year | %WW | year | month, day of month, day of year |
When a numeric variable-width input tag such as %d or %m is used, the field to the immediate right of the tag (if any) in the format string cannot be either a numeric tag, or a literal substring that starts with a digit. For example, all of the following format strings are invalid because of this restriction:
%d%m-%yyyy
%d%mm-%yyyy
%(d)%(mm)-%yyyy
%h00 hours
The year_cutoff is the year defining the beginning of the century in which all two-digit years fall. By default, the year cutoff is 1900; therefore, a two-digit year of 97 represents 1997.
You can specify any four-digit year as the year cutoff. All two-digit years then specify the next possible year ending in the specified two digits that is the same or greater than the cutoff. For example, if you set the year cutoff to 1930, the two-digit year 30 corresponds to 1930, and the two-digit year 29 corresponds to 2029.
On import and export, the year_cutoff is the base year.
This property is mutually exclusive with days_since, text, and julian.
Tag | Escape sequence |
---|---|
%% | literal percent sign |
\% | literal percent sign |
\n | newline |
\t | horizontal tab |
\\ | single backslash |
date Uformat
The date uformat provides support for international components in date fields. It's syntax is:
String%macroString%macroString%macroString
where %macro is a date formatting macro such as %mmm for a 3-character English month. Only the String components of date uformat can include multi-byte Unicode characters.
Function | Description |
---|---|
date date_from_days_since ( int32 , " date " | format_variable ) | Returns date by adding the given integer to the baseline date. Converts an integer field into a date by adding the integer to the specified base date. The date must be in the format yyyy-mm-dd and must be either double quoted or a variable. |
date date_from_julian_day( uint32 ) | Returns the date given a Julian day. |
date date_from_string ( string , " date_format " | date_uformat | format_variable ) | Returns a date from the given string formatted
in the optional format specification. By default the string format is yyyy-mm-dd. For format descriptions, see and "date Uformat" . |
date date_from_ustring ( string , " date_format " | date_uformat | format_variable ) | Returns a date from the given ustring formatted
in the optional format specification. By default the ustring format is yyyy-mm-dd. For format descriptions, see and "date Uformat" . |
string string_from_date ( date , " date_format " | date_uformat ) | Converts the date to a string representation
using the given format specification. By default the ustring format is yyyy-mm-dd. For format descriptions, see and "date Uformat" . |
ustring ustring_from_date ( date , " date_format " | date_uformat ) | Converts the date to a ustring representation using the given
format specification. By default the ustring format is yyyy - mm -dd. For format descriptions, see and "date Uformat" . |
date date_from_timestamp( timestamp ) | Returns the date from the given timestamp . |
int32 days_since_from_date ( date , " source_date " | format_variable ) | Returns a value corresponding to the number of days from source_date to date . source_date must be in the form yyyy - mm -dd and must be double quoted or be a variable. |
uint32 julian_day_from_date( date ) | Returns a Julian date given the date . |
int8 month_day_from_date( date ) | Returns the day of the month given the date . For example, the date 07-23-2001 returns 23. |
int8 month_from_date( date ) | Returns the month from the given date . For example, the date 07-23-2001 returns 7. |
date next_weekday_from_date ( date , " day " | format_variable ) | The value returned is the date of the specified day of the
week soonest after date (including the date ). The day argument is optional. It is a string or variable specifying a day of the week. You can specify day by either the first three characters of the day name or the full day name. By default, the value is Sunday. |
date previous_weekday_from_date ( date , " day " | format_variable ) | Returns the previous weekday date from date . The destination contains the closest date for the specified day of the week earlier than the source date (including the source date) The day argument is optional. It is a string or variable specifying a day of the week. You can specify day using either the first three characters of the day name or the full day name. By default, the value is Sunday. |
int8 weekday_from_date ( date , " origin_day " | format_variable ) | Returns the day of the week from date . The optional argument origin_day is a string or variable specifying the day considered to be day zero of the week. You can specify the day using either the first three characters of the day name or the full day name. If omitted, Sunday is day zero. |
int16 year_day_from_date( date ) | Returns the day of the year (1-366) from date . |
int16 year_from_date( date ) | Returns the year from date . For example, the date 07-23-2001 returns 2001. |
int8 year_week_from_date( date ) | Returns the week of the year from date . For example, the date 07-23-2001 returns 30. |
decimal and float Field Functions
You can do the following transformations using the decimal and float field functions.
Examples: 1.4 -> 2, -1.6 -> -1
Examples: 1.6 -> 1, -1.4 -> -2
Examples: 1.4 -> 1, 1.5 -> 2, -1.4 -> -1, -1.5 -> -2
Examples: 1.6 -> 1, -1.6 -> -1
Function | Description |
---|---|
decimal decimal_from_decimal ( decimal , " r_type " | format_variable ) | Returns decimal in decimal representation, changing the precision and scale according to the returned type. The rounding type, r_type, might be ceil, floor, round_inf, or trunc_zero as described above this table. The default rtype is trunc_zero. |
decimal decimal_from_dfloat ( dfloat , " r_type " | format_variable ) | Returns dfloat in decimal representation. The rounding type, r_type , might be ceil, floor, round_inf, or trunc_zero as described above this table. The default is trunc_zero. |
decimal decimal_from_string ( string , " r_type " | format_variable ) | Returns string in decimal representation. The rounding type, r_type , might be ceil, floor, round_inf, or trunc_zero as described above this table. The default is trunc_zero. |
decimal decimal_from_ustring ( ustring , " r_type " | format_variable ) | Returns ustring in decimal representation. The rounding type, r_type , might be ceil, floor, round_inf, or trunc_zero as described above this table. The default is trunc_zero. |
dfloat dfloat_from_decimal ( decimal , "fix-zero" | format_variable ) | Returns decimal in dfloat representation. |
int32 int32_from_decimal ( decimal , "r_type fix_zero") | Returns int32 in decimal representation. The rounding type, r_type , might be ceil, floor, round_inf, or trunc_zero as described above this table. The default is trunc_zero. |
int64 int64_from_decimal ( decimal , "r_type fix_zero") |
Returns int64 in decimal representation. The rounding type, r_type , might be ceil, floor, round_inf, or trunc_zero as described above this table. The default is trunc_zero. |
uint64 uint64_from_decimal ( decimal , "r_type fix_zero") | Returns uint64 in decimal representation. The rounding type, r_type , might be ceil, floor, round_inf, or trunc_zero as described above this table. The default is trunc_zero. |
string string_from_decimal ( decimal , "fix_zero suppress_zero") | Returns string in decimal representation. fix_zero causes a decimal field containing all zeros to be treated as a valid zero. suppress_zero argument specifies that the returned ustring value will have no leading or trailing zeros. Examples: 000.100 -> 0.1; 001.000 -> 1; -001.100 -> -1.1 |
ustring ustring_from_decimal ( decimal , "fix_zero suppress_zero" | format_variable ) | Returns ustring in decimal representation. fix_zero causes a decimal field containing all zeros to be treated as a valid zero. suppress_zero argument specifies that the returned ustring value will have no leading or trailing zeros. Examples: 000.100 -> 0.1; 001.000 -> 1; -001.100 -> -1.1 |
string string_from_decimal ( decimal , "fix_zero suppress_zero" | format_variable ) | Returns ustring in decimal representation. fix_zero causes a decimal field containing all zeros to be treated as a valid zero. suppress_zero argument specifies that the returned ustring value will have no leading or trailing zeros. Examples: 000.100 -> 0.1; 001.000 -> 1; -001.100 -> -1.1 |
dfloat mantissa_from_dfloat( dfloat ) | Returns the mantissa (the digits right of the decimal point) from dfloat . |
dfloat mantissa_from_decimal ( decimal ) | Returns the mantissa (the digits right of the decimal point) from decimal . |
raw Field Functions
Use the raw field functions to transform a string into a raw data type and to determine the length of a raw value.
Function | Description |
---|---|
raw raw_from_string( string ) | Returns string in raw representation. |
raw u_raw_from_string( ustring ) | Returns ustring in raw representation. |
int32 raw_length( raw ) | Returns the length of the raw field. |
InfoSphere DataStage performs no automatic conversions to or from the time and timestamp data types. You must use the modify or transform operator if you want to convert a source or destination field. Most field conversions extract a portion of the time, such as hours or minutes, and write it into a destination field.
Time conversion to a numeric field can be used with any InfoSphere DataStage numeric data type. InfoSphere DataStage performs the necessary modifications to translate a conversion result to the numeric data type of the destination. For example, you can use the transformation function hours_from_time() to convert a time to an int8, or to an int16, int32, dfloat, and so on.
The string_from_time() and time_from_string() conversion functions take a format as a parameter of the conversion. The default format of the time in the string is hh:nn:ss. However, you can specify an optional format string defining another time format. The format string must contain a specification for hours, minutes, and seconds.
time Uformat
The time uformat provides support for international components in time fields. It's syntax is:
String % macroString % macroString % macroString
where %macro is a time formatting macro such as %hh for a two-digit hour. See below for a description of the date format macros. Only the String components of time uformat can include multi-byte Unicode characters.
timestamp Uformat
This format is a concatenation of the date uformat and time uformat which are described in "date Uformat" and "time Uformat". The order of the formats does not matter, but the two formats cannot be mixed.
time Format
Tag | Variable width availability | Description | Value range | Options |
---|---|---|---|---|
%h | import | Hour (24), variable width | 0...23 | s |
%hh | Hour (24), fixed width | 0...23 | s | |
%H | import | Hour (12), variable width | 1...12 | s |
%HH | Hour (12), fixed width | 01...12 | s | |
%n | import | Minutes, variable width | 0...59 | s |
%nn | Minutes, fixed width | 0...59 | s | |
%s | import | Seconds, variable width | 0...59 | s |
%ss | Seconds, fixed width | 0...59 | s | |
%s.N | import | Seconds + fraction (N = 0...6) | – | s, c, C |
%ss.N | Seconds + fraction (N = 0...6) | – | s, c, C | |
%SSS | with v option | Milliseconds | 0...999 | s, v |
%SSSSSS | with v option | Microseconds | 0...999999 | s, v |
%aa | German | am/pm marker, locale specific | am, pm | u, w |
By default, the format of the time contained in the string is %hh:%nn:%ss. However, you can specify a format string defining the format of the string field.
You must prefix each component of the format string with the percent symbol. Separate the string's components with any character except the percent sign (%).
Where indicated the tags can represent variable-fields on import, export, or both. Variable-width date elements can omit leading zeroes without causing errors.
The locale for determining the setting of the am/pm string and the default decimal separator can be controlled through the locale tag. This has the format:
%(L,'locale')
Where locale specifies the locale to be set using the language_COUNTRY.variant naming convention supported by ICU. See IBM InfoSphere DataStage and QualityStage Globalization Guide for a list of locales. The default locale for am/pm string and separators markers is English unless overridden by a %L tag or the APT_IMPEXP_LOCALE environment variable (the tag takes precedence over the environment variable if both are set).
Use the locale tag in conjunction with your time format, for example:
%L('es')%HH:%nn %aaSpecifies the Spanish locale.
Element | Numeric format tags | Text format tags | Requires | Incompatible with |
---|---|---|---|---|
hour | %hh, %h, %HH, %H | - | - | - |
am/pm marker | - | %aa | hour (%HH) | hour (%hh) |
minute | %nn, %n | - | - | - |
second | %ss, %s | - | - | - |
fraction of a second | %ss.N, %s.N, %SSS, %SSSSSS | - | - | - |
Tag | Escape sequence |
---|---|
%% | literal percent sign |
\% | literal percent sign |
\n | newline |
\t | horizontal tab |
\\ | single backslash |
Function | Description |
---|---|
int8 hours_from_time( time ) | Returns the hour portion of the given time. |
int32 microseconds_from_time( time ) | Returns the number of microseconds from the given time. |
dfloat midnight_seconds_from_time ( time ) | Returns the number of seconds from midnight to time . |
int8 minutes_from_time( time ) | Returns the number of minutes from time . |
dfloat seconds_from_time( time ) | Returns the number of seconds from time. |
dfloat seconds_since_from_timestamp ( timestamp , " source_timestamp_string " | format_variable ) | Returns the number of seconds from timestamp to the base timestamp, or optionally the second timestamp argument for the number of seconds between timestamps. The source_timestamp_string argument must be double quoted or be a variable. |
time time_from_midnight_seconds( dfloat ) | Returns the time given the number of seconds ( dfloat ) since midnight. |
time time_from_string ( string , time_format | time_uformat | format_variable) | Returns a time representation of string using the optional time_format , time_uformat , or format_variable. By default, the time format is hh:nn:ss. For format descriptions, see and "time Uformat" . |
time time_from_ustring ( ustring , time_format | time_uformat | format_variable) | Returns a time representation of ustring using the optional time_format , time_uformat , or format_variable specification. By default, the time format is hh:nn:ss. For format descriptions, see and "time Uformat" . |
string string_from_time ( time , " time_format " | format_variable | time_uformat ) | Returns a string from time. The format argument is optional.The default time format is hh:nn:ss. For format descriptions, see and "time Uformat" . |
string string_from_time ( time , " time_format " | format_variable | time_format ) | Returns a ustring from time . The format argument is optional.The default time format is hh:nn:ss. For format descriptions, see and "time Uformat" . |
time time_from_timestamp( timestamp ) | Returns the time from timestamp . |
date date_from_timestamp( timestamp ) | Returns the date from the given timestamp . |
timestamp timestamp_from_date_time ( date , time ) | Returns a timestamp from date and time . The date specifies the date portion ( yyyy - nn - dd ) of the timestamp. The time argument specifies the time to be used when building the timestamp. The time argument must be in the hh : nn :ss format. |
timestamp timestamp_from_seconds_since ( dfloat , " original_timestamp_string " | format_variable ) | Returns the timestamp from the number of seconds ( dfloat ) from the base timestamp or the original_timestamp_string argument. The original_timestamp_string must be double quoted or be a variable. |
timestamp timestamp_from_string ( string , " timestamp_format " | timestamp_uformat | format_variable) | Returns a timestamp from string, in the optional timestamp_format , timestamp_uformat, or format_variable . The timestamp_format must be double quoted or be a variable. The default format is yyyy - nn - dd hh : nn : ss . timestamp_format is described in . |
timestamp timestamp_from_ustring ( ustring , " timestamp_format " | timestamp_uformat | format_variable) | Returns a timestamp from ustring , in the optional format specification. The timestamp_format must be a double quoted string, a uformat , or a variable. The default format is yyyy - nn - dd hh : nn : ss . timestamp_uformat is described in . |
string string_from_timestamp ( timestamp , " timestamp_format " | format_variable ) | Returns a string from timestamp . The formatting specification is optional. The default format is yyyy - mm - dd hh : mm : ss . |
ustring ustring_from_timestamp ( timestamp , " timestamp_format " | format_variable ) | Returns a ustring from timestamp . The formatting specification is optional. The default format is yyyy - mm - dd hh : mm : ss . |
timestamp timestamp_from_time ( time , time_format | time_uformat ) | Returns a timestamp from time . For format descriptions, see and "time Uformat" |
date date_from_timestamp( timestamp ) | Returns the date from the given timestamp . |
timestamp timestamp_from_timet( int32 ) | Returns a timestamp from the given UNIX time_t representation ( int32 ). |
int32 timet_from_timestamp( timestamp ) | Returns the UNIX time_t representation of timestamp. |
null handling functions
lists the transformation functions for NULL handling.
All data types support nulls. As part of processing a record, an operator can detect a null and take the appropriate action, for example, it can omit the null field from a calculation or signal an error condition.
InfoSphere DataStage represents nulls in two ways.
The null-handling functions can change a null representation from an out-of-band null to an in-band null and from an in-band null to an out-of-band null.
Function | Description |
---|---|
destination_field handle_null ( source_field , value ) | Change the source_field NULL representations from out-of-band representation to an in-band representation. The value field assigns the value that corresponds to NULL. |
destination_field make_null ( source_field , value ) | Changes source_field NULL representation from in-band NULL representation to out-of-band. The value field allows multiple valid NULL values to be inputted as arguments. |
int8 notnull( source_field ) | Returns 1 if source_field is not NULL, otherwise returns 0. |
int8 null( source_field ) | Returns 1 if source_field is NULL, otherwise returns 0. |
set_null() | This function is used with "=" to set the left side output
field, when it is nullable, to null. For example: a-field = set_null(); |
int8 is_dfloat_inband_null ( dfloat ) | Returns 1 if dfloat is an inband null; otherwise it returns 0. |
int8 is_int16_inband_null ( int16 ) | Returns 1 if int16 is an inband null; otherwise it returns 0. |
int8 is_int32_inband_null ( int32 ) | Returns 1 if int32 is an inband null; otherwise it returns 0. |
int8 is_int64_inband_null ( int64 ) | Returns 1 if int64 is an inband null; otherwise it returns 0. |
int8 is_sfloat_inband_null ( sfloat ) | Returns 1 if sfloat is an inband null; otherwise it returns 0. |
int8 is_string_inband_null ( string ) | Returns 1 if string is an inband null; otherwise it returns 0. |
int8 u_is_string_inband_null ( ustring ) | Returns 1 if ustring is an inband null; otherwise it returns 0. |
Function |
Description |
---|---|
int32 abs( int32 ) | Returns the absolute value of int32 . |
dfloat acos( dfloat ) | Returns the principal value of the arc cosine of dfloat . |
dfloat asin( dfloat ) | Returns the principal value of the arc sine of dfloat . |
dfloat atan( dfloat ) | Returns the principal value of the arc tangent of dfloat. |
dfloat atan2( dfloat , dfloat ) | Returns the principal value of the arc tangent of y/x (where y is the first argument). |
dfloat ceil( decimal ) | Returns the smallest dfloat value greater than or equal to decimal . |
dfloat cos( dfloat ) | Returns the cosine of the given angle ( dfloat ) expressed in radians. |
dfloat cosh( dfloat ) | Returns the hyperbolic cosine of dfloat . |
dfloat exp( dfloat ) | Returns the exponential of dfloat. |
dfloat fabs( dfloat ) | Returns the absolute value of dfloat . |
dfloat floor( decimal ) | Returns the largest dfloat value less than or equal to decimal . |
dfloat ldexp( dfloat , int32 ) | Reconstructs dfloat out of the mantissa and exponent of int32 . |
uint64 llabs( int64 ) | Returns the absolute value of int64 . |
dfloat log( dfloat ) | Returns the natural (base e) logarithm of dfloat . |
dfloat log10( dfloat ) | Returns the logarithm to the base 10 of dfloat . |
int32 max( int32 , int32 ) | Returns the larger of the two integers. |
int32 min( int32 , int32 ) | Returns the smaller of the two integers. |
dfloat pow( dfloat , dfloat ) | Returns the result of raising x (the first argument) to the power y (the second argument). |
uint32 rand() | Returns a pseudo-random integer between 0 and 232 - 1. The function uses a multiplicative congruential random-number generator with period 232. See the UNIX man page for rand for more details. |
uint32 random() | Returns a random integer between 0 and 231 - 1. The function uses a nonlinear additive feedback random-number generator employing a default state array size of 31 long integers to return successive pseudo-random numbers. The period of this random-number generator is approximately 16 x (231 - 1). Compared with rand, random is slower but more random. See the UNIX man page for random for more details. |
dfloat sin( dfloat ) | Returns the sine of dfloat expressed in radians. |
dfloat sinh( dfloat ) | Returns the hyperbolic sine of dfloat. |
dfloat sqrt( dfloat ) | Returns the square root of dfloat . |
int32 quotient_from_dfloat ( dfloat1 , dfloat2) | Returns the value of the quotient after dfloat1 is divided by dfloat2 . |
srand( uint32 ) | Sets a new seed ( uint32 ) for the frand() or srand() random number generator. |
srandom( uint32 ) | Sets a random seed for the random() number generator. See the UNIX man page for srandom for more details. |
dfloat tan( dfloat ) | Returns the tangent of the given angle ( dfloat ) expressed in radians. |
dfloat tanh( dfloat ) | Returns the hyperbolic tangent of dfloat . |
Strings can be assigned (=), compared (==, <, >=, and so on), and concatenated (+) in the Transformation Language. In addition, the functions described in below are available for string manipulations, and the functions described in are available for ustring manipulations. When a long string is assigned to a short string, the long string is truncated to the length of the short string. The term white space refers to spaces, tabs, and any other blank space.
String Conversions and Lookup Tables
You can construct a string lookup table to use when default conversions do not yield satisfactory results. A string lookup table is a table of two columns and as many rows as are required to perform a conversion to or from a string as shown in the following table.
Numeric Value | String or Ustring |
---|---|
numVal1 | string1 | ustring1 |
numVal2 | string2 | ustring1 |
... | ... |
numVal3 | stringn | ustringn |
Each row of the lookup table specifies an association between a 16-bit integer or unsigned 32-bit integer value and a string or ustring. InfoSphere DataStage scans the Numeric Value or the String or Ustring column until it encounters the value or string to be translated. The output is the corresponding entry in the row.
The numeric value to be converted might be of the int16 or the uint32 data type. InfoSphere DataStage converts strings to values of the int16 or uint32 data type using the same table.
If the input contains a numeric value or string that is not listed in the table, InfoSphere DataStage operates as follows:
A table definition defines the rows of a string or ustring lookup table and has the following form:
{propertyList} ('string' | 'ustring' = value; 'string' | 'ustring'= value; ... )
where:
propertyList is one or more of the following options; the entire list is enclosed in braces and properties are separated by commas if there are more than one:
Function | Description |
---|---|
int8 is_alnum( string ) | Returns 1 true if string consists entirely of alphanumeric characters. |
int8 is_alpha( string ) | Returns 1 true if string consists entirely of alphabetic characters. |
int8 is_numeric( string ) | Returns 1 true if string consists entirely of numeric characters, including decimal and sign. |
int8 is_valid (" type_string ", " value_string ") | Returns 1 (true) if value_string is valid
according to type_string , including NULL. The type_string argument
is required. It must specify an InfoSphere DataStage schema
data type. Integer types are checked to ensure the value_string is numeric (signed or unsigned), a whole number, and a valid value (for example, 1024 can not be assigned to an int8 type). Decimal types are checked to ensure the value_string is numeric (signed or unsigned) and a valid value. Float types are checked to ensure the value_string is numeric (signed or unsigned) and a valid value (exponent is valid). String is always valid with the NULL exception below. For all types, if the field cannot be set to NULL and the string is NULL, 0 (false) is returned. Date, time, and timestamp types are checked to ensure they are correct, using the optional format argument, and valid values. Raw cannot be checked since the input is a string. |
int16 lookup_int16_from_string ( string , " table_definition " | table_variable ) | Returns an integer corresponding to string using table_definition string or variable. See "String Conversions and Lookup Tables" for more information. |
string lookup_string_from_int16 ( int16 , " table_definition " | table_variable ) | Returns a string corresponding to int16 using table_definition string or variable. See "String Conversions and Lookup Tables" for more information. |
string lookup_string_from_uint32 ( uint32 , " table_definition " | table_variable ) | Returns a string corresponding to uint32 using table_definition string or variable. See "String Conversions and Lookup Tables" for more information. |
uint32 lookup_uint32_from_ string ( string , " table_definition " | table_variable ) | Returns an unsigned integer from string using table_definition string or variable. |
string lower_case( string ) | Converts string to lowercase. Non-alphabetic characters are ignored in the transformation. |
string string_from_date ( date , " date_format " | format_variable | date_uformat) | Converts date to a string representation
using the specified optional formatting specification. By default, the date format is yyyy - mm - dd . For format descriptions, see "date Uformat" . |
string string_from_decimal ( decimal , "fix_zero suppress_zero" | format_variable ) | Returns a string from decimal . fix_zero causes a decimal field containing all zeros to be treated as a valid zero. suppress_zero argument specifies that the returned ustring value will have no leading or trailing zeros. Examples: 000.100 -> 0.1; 001.000 -> 1; -001.100 -> -1.1 The formatting specification is optional. |
string string_from_time ( time ," time_format " | format_variable | time_uformat) | Returns a string from time . The format argument is optional.The default time format is hh : nn :ss. For format descriptions, see and "time Uformat" . |
string string_from_timestamp ( timestamp , " timestamp_format " | format_variable ) | Returns a string from timestamp . The formatting specification is optional. The default format is yyyy-mm-dd hh:mm:ss. |
string soundex ( input_string , length , censusOption) | Returns a string which represents the phonetic code for the
string input word. Input words that produce the same code are considered
phonetically equivalent. The empty string is returned if the input
string is empty. length is an int8 and can be any value between 4 and 10. The default is 4. censusOption can be 0, 1, or 2 where 0 (the default) is enhanced soundex and not a census code; 1 are the normal census codes used in all censuses from 1920 on; and 2 are special census codes used intermittently in 1880, 1900, and 1910. |
string upper_case( string ) | Converts string to uppercase. Non-alphabetic characters are ignored in the transformation. |
string compact_whitespace ( string ) | Returns a string after reducing all consecutive white space in string to a single space. |
string pad_string ( string , pad_string , pad_length ) | Returns the string with the pad_string appended
to the bounded length string for pad_length number
of characters. pad_length is an int16. When the given string is a variable-length string, it defaults to a bounded-length of 1024 characters. If the given string is a fixed-length string, this function has no effect. |
string strip_whitespace( string ) | Returns string after stripping all white space in the string. |
string trim_leading_trailing ( string ) | Returns string after removing all leading and trailing white space. |
string trim_leading( string ) | Returns a string after removing all leading white space. |
string trim_trailing( string ) | Returns a string after removing all trailing white space. |
int32 string_order_compare ( string1 , string2, justification) | Returns a numeric value specifying the result of the comparison.
The numeric values are: -1: string1 is less than string2 0: string1 is equal to string2 1: string1 is greater than string2 The string justification argument is either 'L' or 'R'. It defaults to 'L' if not specified. 'L' means a standard character comparison, left to right. 'R' means that any numeric substrings within the strings starting at the same position are compared as numbers. For example an 'R' comparison of "AB100" and "AB99" indicates that AB100 is great than AB99, since 100 is greater than 99. The comparisons are case sensitive. |
string replace_substring ( expression1 , expression2, string) | Returns a string value that contains the given string ,
with any characters in expression1 replaced by their corresponding
characters in expression2. For example: replace_substring ("ABC:, "abZ", "AGDCBDA") returns "aGDZbDa", where any "A" gets replaced by "a", any "B" gets replaced by "b" and any "C" gets replaced by "Z". If expression2 is longer than expression1, the extra characters are ignored. If expression1 is longer than expression2, the extra characters in expression1 are deleted from the given string (the corresponding characters are removed.) For example: replace_substring("ABC", "ab", "AGDCBDA") returns "aGDbDa". |
int32 count_substring ( string , substring ) | Returns the number of times that substring occurs in string . If substring is an empty string, the number of characters in string is returned. |
int32 dcount_substring ( string , delimiter) | Returns the number of fields in string delimited
by delimiter , where delimiter is
a string. For example, dcount_substring("abcFdefFghi", "F") returns 3. If delimiter is an empty string, the number of characters in the string + 1 is returned. If delimiter is not empty, but does not exist in the given string, 1 is returned. |
string double_quote_string ( expression ) | Returns the given string expression enclosed in double quotes. |
string substring_by_delimiter (string, delimiter, occurrence,numsubstr) | The string and delimiter arguments
are string values, and the occurrence and numsubstr arguments
are int32 values. This function returns numsubstr substrings from string , delimited by delimiter and starting at substring number occurrence . An example is: substring_by_delimiter ("abcFdefFghiFjkl", "F", 2, 2) The string "defFghi" is returned. If occurrence is < 1, then 1 is assumed. If occurrence does not point to an existing field, the empty string is returned. If numsubstr is not specified or is less than 1, it defaults to 1. |
int32 index_of_substring ( string , substring, occurrence) | Returns the starting position of the nth occurrence of substring in string .
The occurrence argument is an integer indicating
the nth occurrence. If there is no nth occurrence or string doesn't contain any substring , -1 is returned. If substring is an empty string, -2 is returned. |
string left_substring ( string , length) | Returns the first length characters of string . If length is 0, it returns the empty string. If length is greater than the length of the string, the entire string is returned. |
string right_substring ( string , length) | Returns the last length characters of string. If length is 0, it returns the empty string. If length is greater than the length of string , the entire string is returned. |
string string_of_space( count) | Returns a string containing count spaces. The empty string is returned for a count of 0 or less. |
string single_quote_string ( expression) | Returns the expression string enclosed in single quotes. |
string string_of_substring ( string , count) | Returns a string containing count occurrences of string. The empty string is returned for a count of 0 or less. |
string trimc_string ( string [, character [, option ]]) | If only string is specified, all leading
and trailing spaces and tabs are removed, and all multiple occurrences
of spaces and tabs are reduced to a single space or tab. If string and character are specified, option defaults to 'R' The available option values are: 'A' remove all occurrences of character 'B' remove both leading and trailing occurrences of character. 'D' remove leading, trailing, and redundant white-space characters. 'E' remove trailing white-space characters 'F' remove leading white-space characters 'L' remove all leading occurrences of character 'R' remove all leading, trailing, and redundant occurrences of character 'T' remove all trailing occurrences of character |
string system_time_date() | Returns the current system time in this 24-hour format: hh : mm :ss dd : mmm : yyyy |
int32 offset_of_substring ( string , substring , position) | Searches for the substring in the string beginning at character number position , where position is an uint32. Returns the starting position of the substring. |
int8 string_case_compare ( string , string) | This is a case-insensitive version of string_compare() below. |
int8 string_compare ( string , string) | Compares two strings and returns the index (0 or 1) of the greater string. |
int8 string_num_case_compare ( string , string, uint16) | This is a case-insensitive version of string_num_compare() below. |
string string_num_concatenate ( string , string, uint16) | Returns a string after appending uint16 characters from the second string onto the first string. |
int8 string_num_compare ( string , string, uint16) | Compares first uint16 characters of two given strings and returns the index (0 or 1) of the greater string. |
string string_num_copy ( string , uint16) | Returns the first uint16 characters from the given string . |
int32 string_length( string ) | Returns the length of the string. |
string substring ( string , starting_position, length) | Copies parts of strings to shorter strings by string extraction.
The starting_position specifies the starting location
of the substring; length specifies the substring
length. The arguments starting_position and length are uint16 types and must be positive (>= 0). |
string char_from_num( int32 ) | Returns an ASCII character from the given int32 .
If given a value that is not associated with a character such as -1,
the function returns a space. An example use is: char_from_num(38) which returns "&" |
int32 num_from_char( string) | Returns the numeric value of the ASCII-character in the string.
When this function is given an empty string, it returns 0; and when
it is given a multi-character string, it uses the first character
in the string. An example use is: num_from_char("&") which returns 38. |
InfoSphere DataStage provides the ustring type for multi-byte Unicode-character strings.ustrings can be assigned (=), compared (==, <, >=, and so on), and concatenated (+) in the Transformation Language. In addition, the functions described in are available for ustring manipulations. When a long string is assigned to a short string, the long string is truncated to the length of the short string. The term white space refers to spaces, tabs, and any other blank space.
Function | Description |
---|---|
ustring ustring_from_date ( date , " date_format " | date_format | format_variable) | Converts date to a ustring representation
using the optional format specification. By default, the format is yyyy -mm-dd. For format descriptions, see and "date Uformat" . |
ustring ustring_from_decimal ( decimal , "fix_zero suppress_zero" | format_variable ) | Returns a ustring from decimal. fix_zero causes a decimal field containing all zeros to be treated as a valid zero. suppress_zero argument specifies that the returned ustring value will have no leading or trailing zeros. Examples: 000.100 -> 0.1; 001.000 -> 1; -001.100 -> -1.1 The format specification is optional. |
ustring ustring_from_time ( time , " time_format " | time_uformat | format_variable) | Returns a ustring from time using an optional format specification.The default time format is hh:nn:ss. For format descriptions, see and "time Uformat" . |
ustring ustring_from_timestamp ( timestamp , " timestamp_format " | format_variable ) | Returns a ustring from timestamp . The format specification is optional. The default format is yyyy-mm-dd hh:mm:ss. |
int8 u_is_alnum( ustring ) | Returns 1 (true) if ustring consists entirely of alphanumeric characters. |
int8 u_is_alpha( ustring ) | Returns 1 (true) if ustring consists entirely of alphabetic characters. |
int8 u_is_numeric( ustring ) | Returns 1 (true) if ustring consists entirely of numeric characters, including decimal and sign. |
int16 lookup_int16_from_ustring ( ustring , " table_definition " | table_variable ) | Returns an integer corresponding to ustring using table_definition string or variable. See "String Conversions and Lookup Tables" for more information. |
ustring lookup_ustring_from_int16 ( int16 , " table_definition " | table_variable ) | Returns a ustring corresponding to int16 using table_definition string or table_variable . See "String Conversions and Lookup Tables" for more information. |
ustring lookup_ustring_from_uint32 ( uint32 , " table_definition " | table_variable ) | Returns a ustring corresponding to uint32 using table_definition string or variable. See"String Conversions and Lookup Tables" for more information. |
uint32 lookup_uint32_from_ustring ( string , " table_definition " | table_variable ) | Returns an unsigned integer from ustring using table_definition or table_variable . |
int8 u_is_valid (" type_ustring ", " value_ustring ") | Returns 1 (true) if value_ustring is valid
according to type_ustring , including NULL. The type_ustring argument
is required. It must specify an InfoSphere DataStage schema
data type. Integer types are checked to ensure the value_ustring is numeric (signed or unsigned), a whole number, and a valid value (for example, 1024 can not be assigned to an int8 type). Decimal types are checked to ensure the value_ustring is numeric (signed or unsigned) and a valid value. Float types are checked to ensure the value_ustring is numeric (signed or unsigned) and a valid value (exponent is valid). String is always valid with the NULL exception below. For all types, if the field cannot be set to NULL and the string is NULL, 0 (false) is returned. Date, time, and timestamp types are checked to ensure they are correct, using the optional format argument, and valid values. Raw cannot be checked since the input is a string. |
ustring u_lower_case( ustring ) | Converts ustring to lowercase. Non-alphabetic characters are ignored in the transformation. |
ustring u_upper_case( ustring ) | Converts ustring to uppercase. Non-alphabetic characters are ignored in the transformation. |
ustring u_compact_whitespace ( ustring ) | Returns the ustring after reducing all consecutive white space in ustring to a single space. |
ustring u_pad_string ( ustring , pad_ustring , pad_length ) | Returns the ustring with pad_ustring appended
to the bounded length string for pad_length number
of characters. pad_length is an int16. When the given ustring is a variable-length string, it defaults to a bounded-length of 1024 characters. If the given ustring is a fixed-length string, this function has no effect. |
ustring u_strip_whitespace ( ustring ) | Returns ustring after stripping all white space in the string. |
ustring u_trim_leading_trailing ( ustring ) | Returns ustring after removing all leading and trailing white space. |
ustring u_trim_leading( ustring ) | Returns ustring after removing all leading white space. |
ustring u_trim_trailing ( ustring ) | Returns a ustring after removing all trailing white space. |
int32 u_string_order_compare ( ustring1 , ustring2, justification) | Returns a numeric value specifying the result of the comparison.
The numeric values are: -1: ustring1 is less than ustring2 0: ustring1 is equal to ustring2 1: ustring1 is greater than ustring2 The string justification argument is either 'L' or 'R'. It defaults to 'L' if not specified. 'L' means a standard character comparison, left to right. 'R' means that any numeric substrings within the strings starting at the same position are compared as numbers. For example an 'R' comparison of "AB100" and "AB99" indicates that AB100 is great than AB99, since 100 is greater than 99. The comparisons are case sensitive. |
ustring u_replace_substring ( expression1 , expression2, ustring) | Returns a ustring value that contains the given ustring ,
with any characters in expression1 replaced by
their corresponding characters in expression2 . For
example: u_replace_substring ("ABC", "abZ", "AGDCBDA") returns "aGDZbDa", where any "A" gets replaced by "a", any "B" gets replaced by "b" and any "C" gets replaced by "Z". If expression2 is longer than expression1 , the extra characters are ignored. If expression1 is longer than expression2, the extra characters in expression1 are deleted from the given string (the corresponding characters are removed.) For example: u_replace_substring("ABC", "ab", "AGDCBDA") returns "aGDbDa". |
int32 u_count_substring ( ustring , sub_ustring ) | Returns the number of times that sub_ustring occurs in ustring . If sub_ustring is an empty string, the number of characters in ustring is returned. |
int32 u_dcount_substring ( ustring , delimiter) | Returns the number of fields in ustring delimited
by delimiter , where delimiter is
a string. For example, dcount_substring("abcFdefFghi", "F") returns 3. If delimiter is an empty string, the number of characters in the string + 1 is returned. If delimiter is not empty, but does not exist in the given string, 1 is returned. |
ustring u_double_quote_string ( expression) | Returns the given ustring expression enclosed in double quotes. |
ustring u_substring_by_delimiter ( ustring, delimiter, occurrence, numsubstr) | The delimiter argument is a ustring value,
and the occurrence and numsubstr arguments
are int32 values. This function returns numsubstr substrings from ustring , delimited by delimiter and starting at substring number occurrence . An example is: u_substring_by_delimiter ("abcFdefFghiFjkl", "F", 2, 2) The string "defFghi" is returned. If occurrence is < 1, then 1 is assumed. If occurrence does not point to an existing field, the empty string is returned. If numsubstr is not specified or is less than 1, it defaults to 1. |
int32 u_index_of_substring ( ustring, sub_ustring, occurrence ) | Returns the starting position of the nth occurrence of sub_ustring in ustring.
The occurrence argument is an integer indicating
the nth occurrence . If there is no nth occurrence, 0 is returned; if sub_ustring is an empty string, -2 is returned; and if ustring doesn't contain any sub_ustring , -1 is returned. |
ustring u_left_substring ( ustring , length) | Returns the first length characters of ustring. If length is 0, it returns the empty string. If length is greater than the length of the ustring , the entire ustring is returned. |
ustring u_right_substring ( ustring , length) | Returns the last length characters of ustring . If length is 0, it returns the empty string. If length is greater than the length of ustring , the entire ustring is returned. |
ustring u_string_of_space( count) | Returns a ustring containing count spaces. The empty string is returned for a count of 0 or less. |
ustring u_single_quote_string ( expression) | Returns expression enclosed in single quotes. |
ustring u_string_of_substring ( ustring , count) | Returns a ustring containing count occurrences of ustring. The empty string is returned for a count of 0 or less. |
ustring u_trimc_string ( ustring [, character [, option ]]) | If only ustring is specified, all leading
and trailing spaces and tabs are removed, and all multiple occurrences
of spaces and tabs are reduced to a single space or tab. If ustring and character are specified, option defaults to 'R' The available option values are: 'A' remove all occurrences of character 'B' remove both leading and trailing occurrences of character. 'D' remove leading, trailing, and redundant white-space characters. 'E' remove trailing white-space characters 'F' remove leading white-space characters 'L' remove all leading occurrences of character 'R' remove all leading, trailing, and redundant occurrences of character 'T' remove all trailing occurrences of character |
ustring u_system_time_date() | Returns the current system time in this 24-hour format: hh:mm:ss dd:mmm:yyyy |
int32 u_offset_of_substring ( ustring , sub_ustring , position) | Searches for the sub_ustring in the ustring beginning at character number position, where position is an uint32. Returns the starting position of the substring. |
int8 u_string_case_compare ( ustring , ustring) | This is a case-insensitive version of u_string_compare() below. |
int8 u_string_compare ( ustring , ustring) | Compares two ustrings and returns the index (0 or 1) of the greater string. |
int8 u_string_num_case_compare ( ustring , ustring, uint16) | This is a case-insensitive version of u_string_num_compare() below. |
ustring u_string_num_concatenate ( ustring , ustring, uint16) | Returns a ustring after appending uint16 characters from the second ustring onto the first ustring . |
int8 u_string_num_compare ( utring , ustring, uint16) | Compares first uint16 characters of two given ustrings and returns the index (0 or 1) of the greater ustring . |
ustring u_string_num_copy ( ustring , uint16) | Returns the first uint16 characters from the given ustring . |
int32 u_string_length( ustring ) | Returns the length of the ustring . |
ustring u_substring ( ustring , starting_position, length) | Copies parts of ustrings to shorter strings by string extraction.
The starting_position specifies the starting location
of the substring; length specifies the substring
length. The arguments starting_position and length are uint16 types and must be positive (>= 0). |
ustring u_char_from_num( int32 ) | Returns a ustring character value from the given int32 .
If given a value that is not associated with a character such as -1,
the function returns a space. An example use is: u_char_from_num(38) which returns "&" |
int32 u_num_from_char( ustring) | Returns the numeric value of the character in the ustring .
When this function is given an empty string, it returns 0; and when
it is given a multi-character string, it uses the first character
in the string. An example use is: u_num_from_char("&") which returns 38 |
Function | Description |
---|---|
string bit_expand ( uint64) | Expands the given uint64 to a string containing the binary representation. |
ustring u_bit_expand ( uint64) | Expands the given uint64 to a ustring containing the binary representation. |
uint64 bit_compress( string ) | Converts the string binary representation to an uint64 field. |
uint64 u_bit_compress( ustring ) | Converts the ustring binary representation to an uint64 field. |
uint64 set_bit ( uint64 , list_of_bits , bit_state) | Turns the uint64 bits that are listed by
number in the string list_of_bits on or off, depending
on whether the value of the bit_state integer is
1 or 0. bit_state is an optional argument, and
has a default value of 1 which turns the list of bits on. An example use is: set_bit(0, "1,3,5,7") which returns 85. |
uint64 u_set_bit ( uint64 , list_of_bits , bit_state) | This function is a internationalized version of set_bit() above. |
The Job Monitor reports on the current state of a job and supplies information about the operators in your data flow. By default, it continually gathers information about your jobs, but it does not send the information unless you request it from your user-written client.
The information it supplies falls into four categories: job status, metadata, monitor, and summary, and it is given in XML notation. If you do not have job monitoring disabled with the top-level -nomonitor osh option, you can also obtain custom report and summary information about the transform operator using the functions in the table below.
There are six functions: three for the string type and three for the ustring type. The name_string or name_ustring argument can be used to specify a name for the custom information, the description_string or description_ustring argument can be used to describe the type of information, and the value_string or value_ustring argument can be used to give the details of the information.
Function | Description |
---|---|
string set_custom_summary_info ( name_string , description_string, value_string) ustring u_set_custom_summary_info ( name_ustring ,description_ustring, value_ustring) |
Call this function in the finish code segment. |
string send_custom_report ( name_string , description_string, value_string) ustring u_send_custom_report ( name_ustring , description_ustring , value_ustring) |
Call this function in the initialize code segment. |
string set_custom_instance_report ( name_string , description_string, value_string) ustring u_set_custom_instance_report ( name_ustring , description_ustring, value_ustring) |
Call this function in the mainloop code segment. |
The following table describes functions in the Transformation Language that do not fit into any of the above categories.
Function | Description |
---|---|
void force_error ( error_message_string ) | Terminates the data flow when an error is detected, and prints error_message_string to stderr. |
u_force_error ( error_message_ustring ) | Terminates the data flow when an error is detected, and prints error_message_ustring to stderr. |
string get_environment( string ) | Returns the current value of string , a UNIX environment variable. The functionality is the same as the C getenv function. |
ustring u_get_environment ( ustring ) | Returns the current value of utring , a UNIX environment variable. The functionality is the same as the C getenv function. |
int16 get_partition_num() | Returns the current partition number. |
int16 get_num_of_partitions() | Returns the number of partitions. |
void print_message ( message_string ) | Prints message_string to stdout. |
u_print_message ( message_ustring ) | Prints message_ustring to stdout. |
uint32 size_of( value ) | Returns the actual size value when it is stored, not the length. |
The Transformation Language supports the placement of binary operators between two decimals and between a decimal and one of the numeric field types.
The binary operators are +, -, *, and /.
The numeric field types are int8, uint8, int16, uint16, int32, uint32, int64, uint64, sfloat, and dfloat.
Generally there are no restrictions on the form of an operand. It can be a constant, a variable, a function call that returns a numeric value, a simple expression such as an addition of two variables, or a complicated expression that consists of function calls as well as arithmetic operations.
By default, the transform operator sets the precision and scale of any temporary internal decimal variable created during arithmetic operations to:
[TRX_DEFAULT_MAX_PRECISION=38,
TRX_DEFAULT_MAX_SCALE=10]
with RoundMode equal to eRoundInf.
You can override the default values using these environment variables:
Fatal errors might occur at runtime if the precision of the destination decimal is smaller than that of the source decimal