Information icon IBM InfoSphere DataStage and InfoSphere QualityStage, Version 8.5
space Feedback

Built-in functions

This section defines functions that are provided by the Transformation Language. It is presented in a series of tables that deal with data transformation functions of the following types:

When a function generates an output value, it returns the result. For functions with optional arguments, simply omit the optional argument to accept the default value. Default conversions among integer, float, decimal, and numeric string types are supported in the input arguments and the return value of the function. All integers can be signed or unsigned.

The transform operator has default NULL handling at the record-level with individual field "overrides". Options can be entered at the record level or the field level.

Lookup table functions

Function Description
lookup( lookup_table ) Performs a lookup on the table using the current input record. It fills the current record of the lookup table with the first record found. If a match is not found, the current record is empty. If this is called multiple times on the same record, the record is filled with the current match if there is one and a new lookup will not be done.
next_match( lookup_table ) Gets the next record matched in the lookup and puts it into the current record of the table.
clear_lookup( lookup_table ) Checks to see if the current lookup record has a match. If this method returns false directly after the lookup() call, no matches were found in the table. Returns a boolean value specifying whether the record is empty or not.
int8 is_match( lookup_table ) Checks to see if the current lookup record has a match. If this method returns false directly after the lookup() call, no matches were found in the table. Returns a boolean value specifying whether the record is empty or not.

Data conversion functions

date field functions

InfoSphere® DataStage® performs no automatic type conversion of date fields. Either an input data set must match the operator interface or you must effect a type conversion by means of the transform or modify operator.

A date conversion to or from a numeric field can be specified with any InfoSphere DataStage numeric data type. InfoSphere DataStage performs the necessary modifications and either translates a numeric field to the source data type or translates a conversion result to the numeric data type of the destination. For example, you can use the transformation function month_day_from_date() to convert a date to an int8, or to an int16, int32, dfloat, and so on

date format

The default format of the date contained in the string is yyyy-mm-dd. However, you can specify an optional format string that defines another format. The format string requires that you provide enough information for InfoSphere DataStage to determine a complete date (either day, month, and year, or year and day of year).

The format_string can contain one or a combination of the following elements:

Table 1. Date format tags
Tag Variable width availability Description Value range Options
%d import Day of month, variable width 1...31 s
%dd   Day of month, fixed width 01...31 s
%ddd with v option Day of year 1...366 s, v
%m import Month of year, variable width 1...12 s
%mm   Month of year, fixed width 01...12 s
%mmm   Month of year, short name, locale specific Jan, Feb ... t, u, w
%mmmm import/export Month of year, full name, locale specific January, February ... t, u, w, -N, +N
%yy   Year of century 00...99 s
%yyyy   Four digit year 0001 ...9999  
%NNNNyy   Cutoff year plus year of century yy = 00...99 s
%e   Day of week, Sunday = day 1 1...7  
%E   Day of week, Monday = day 1 1...7  
%eee   Weekday short name, locale specific Sun, Mon ... t, u, w
%eeee import/export Weekday long name, locale specific Sunday, Monday ... t, u, w, -N, +N
%W import Week of year (ISO 8601, Mon) 1...53 s
%WW   Week of year (ISO 8601, Mon) 01...53 s

When you specify a date format string, prefix each component with the percent symbol (%) and separate the string's components with a suitable literal character.

The default date_format is %yyyy-%mm-%dd.

Where indicated the tags can represent variable-width data elements. Variable-width date elements can omit leading zeroes without causing errors.

The following options can be used in the format string where indicated in the table:

s
Specify this option to allow leading spaces in date formats. The s option is specified in the form:
%(tag,s)
Where tag is the format string. For example:
%(m,s)
indicates a numeric month of year field in which values can contain leading spaces or zeroes and be one or two characters wide. If you specified the following date format property:
%(d,s)/%(m,s)/%yyyy
Then the following dates would all be valid:
8/ 8/1958
08/08/1958
8/8/1958
v
Use this option in conjunction with the %ddd tag to represent day of year in variable-width format. So the following date property:
%(ddd,v)
represents values in the range 1 to 366. (If you omit the v option then the range of values would be 001 to 366.)
u
Use this option to render uppercase text on output.
w
Use this option to render lowercase text on output.
t
Use this option to render titlecase text (initial capitals) on output.
The u, w, and t options are mutually exclusive. They affect how text is formatted for output. Input dates will still be correctly interpreted regardless of case.
-N
Specify this option to left justify long day or month names so that the other elements in the date will be aligned.
+N
Specify this option to right justify long day or month names so that the other elements in the date will be aligned.
Names are left justified or right justified within a fixed width field of N characters (where N is between 1 and 99). Names will be truncated if necessary. The following are examples of justification in use:

%dd-%(mmmm,-5)-%yyyyy

21-Augus-2006

%dd-%(mmmm,-10)-%yyyyy

21-August    -2005

%dd-%(mmmm,+10)-%yyyyy

21-    August-2005

The locale for determining the setting of the day and month names can be controlled through the locale tag. This has the format:

%(L,'locale')

Where locale specifies the locale to be set using the language_COUNTRY.variant naming convention supported by ICU. See IBM® InfoSphere DataStage and QualityStage® Globalization Guide for a list of locales. The default locale for month names and weekday names markers is English unless overridden by a %L tag or the APT_IMPEXP_LOCALE environment variable (the tag takes precedence over the environment variable if both are set).

Use the locale tag in conjunction with your time format, for example the format string:

%(L,'es')%eeee, %dd %mmmm %yyyy

Specifies the Spanish locale and would result in a date with the following format:

miércoles, 21 septembre 2005

The format string is subject to the restrictions laid out in the following table. A format string can contain at most one tag from each row. In addition some rows are mutually incompatible, as indicated in the 'incompatible with' column. When some tags are used the format string requires that other tags are present too, as indicated in the 'requires' column.
Table 2. Format tag restrictions
Element Numeric format tags Text format tags Requires Incompatible with
year %yyyy, %yy, %[nnnn]yy - - -
month %mm, %m %mmm, %mmmm year week of year
day of month %dd, %d - month day of week, week of year
day of year %ddd   year day of month, day of week, week of year
day of week %e, %E %eee, %eeee month, week of year day of year
week of year %WW   year month, day of month, day of year

When a numeric variable-width input tag such as %d or %m is used, the field to the immediate right of the tag (if any) in the format string cannot be either a numeric tag, or a literal substring that starts with a digit. For example, all of the following format strings are invalid because of this restriction:

%d%m-%yyyy

%d%mm-%yyyy

%(d)%(mm)-%yyyy

%h00 hours

The year_cutoff is the year defining the beginning of the century in which all two-digit years fall. By default, the year cutoff is 1900; therefore, a two-digit year of 97 represents 1997.

You can specify any four-digit year as the year cutoff. All two-digit years then specify the next possible year ending in the specified two digits that is the same or greater than the cutoff. For example, if you set the year cutoff to 1930, the two-digit year 30 corresponds to 1930, and the two-digit year 29 corresponds to 2029.

On import and export, the year_cutoff is the base year.

This property is mutually exclusive with days_since, text, and julian.

You can include literal text in your date format. Any Unicode character other than null, backslash, or the percent sign can be used (although it is better to avoid control codes and other non-graphic characters). The following table lists special tags and escape sequences:
Tag Escape sequence
%% literal percent sign
\% literal percent sign
\n newline
\t horizontal tab
\\ single backslash

date Uformat

The date uformat provides support for international components in date fields. It's syntax is:

String%macroString%macroString%macroString

where %macro is a date formatting macro such as %mmm for a 3-character English month. Only the String components of date uformat can include multi-byte Unicode characters.

Note: Any argument that has to be double quoted cannot be a field name or a local variable. An argument must have the data format of its type.

Function Description
date date_from_days_since ( int32 , " date " | format_variable ) Returns date by adding the given integer to the baseline date.

Converts an integer field into a date by adding the integer to the specified base date. The date must be in the format yyyy-mm-dd and must be either double quoted or a variable.

date date_from_julian_day( uint32 ) Returns the date given a Julian day.
date date_from_string ( string , " date_format " | date_uformat | format_variable ) Returns a date from the given string formatted in the optional format specification.

By default the string format is yyyy-mm-dd. For format descriptions, see and "date Uformat" .

date date_from_ustring ( string , " date_format " | date_uformat | format_variable ) Returns a date from the given ustring formatted in the optional format specification.

By default the ustring format is yyyy-mm-dd. For format descriptions, see and "date Uformat" .

string string_from_date ( date , " date_format " | date_uformat ) Converts the date to a string representation using the given format specification.

By default the ustring format is yyyy-mm-dd. For format descriptions, see and "date Uformat" .

ustring ustring_from_date ( date , " date_format " | date_uformat ) Converts the date to a ustring representation using the given format specification.

By default the ustring format is yyyy - mm -dd. For format descriptions, see and "date Uformat" .

date date_from_timestamp( timestamp ) Returns the date from the given timestamp .
int32 days_since_from_date ( date , " source_date " | format_variable ) Returns a value corresponding to the number of days from source_date to date .

source_date must be in the form yyyy - mm -dd and must be double quoted or be a variable.

uint32 julian_day_from_date( date ) Returns a Julian date given the date .
int8 month_day_from_date( date ) Returns the day of the month given the date . For example, the date 07-23-2001 returns 23.
int8 month_from_date( date ) Returns the month from the given date . For example, the date 07-23-2001 returns 7.
date next_weekday_from_date ( date , " day " | format_variable ) The value returned is the date of the specified day of the week soonest after date (including the date ).

The day argument is optional. It is a string or variable specifying a day of the week. You can specify day by either the first three characters of the day name or the full day name. By default, the value is Sunday.

date previous_weekday_from_date ( date , " day " | format_variable ) Returns the previous weekday date from date .

The destination contains the closest date for the specified day of the week earlier than the source date (including the source date)

The day argument is optional. It is a string or variable specifying a day of the week. You can specify day using either the first three characters of the day name or the full day name. By default, the value is Sunday.

int8 weekday_from_date ( date , " origin_day " | format_variable ) Returns the day of the week from date .

The optional argument origin_day is a string or variable specifying the day considered to be day zero of the week. You can specify the day using either the first three characters of the day name or the full day name. If omitted, Sunday is day zero.

int16 year_day_from_date( date ) Returns the day of the year (1-366) from date .
int16 year_from_date( date ) Returns the year from date . For example, the date 07-23-2001 returns 2001.
int8 year_week_from_date( date ) Returns the week of the year from date . For example, the date 07-23-2001 returns 30.

decimal and float Field Functions

You can do the following transformations using the decimal and float field functions.

raw Field Functions

Use the raw field functions to transform a string into a raw data type and to determine the length of a raw value.

Function Description
raw raw_from_string( string ) Returns string in raw representation.
raw u_raw_from_string( ustring ) Returns ustring in raw representation.
int32 raw_length( raw ) Returns the length of the raw field.

time and timestamp field functions

InfoSphere DataStage performs no automatic conversions to or from the time and timestamp data types. You must use the modify or transform operator if you want to convert a source or destination field. Most field conversions extract a portion of the time, such as hours or minutes, and write it into a destination field.

Time conversion to a numeric field can be used with any InfoSphere DataStage numeric data type. InfoSphere DataStage performs the necessary modifications to translate a conversion result to the numeric data type of the destination. For example, you can use the transformation function hours_from_time() to convert a time to an int8, or to an int16, int32, dfloat, and so on.

The string_from_time() and time_from_string() conversion functions take a format as a parameter of the conversion. The default format of the time in the string is hh:nn:ss. However, you can specify an optional format string defining another time format. The format string must contain a specification for hours, minutes, and seconds.

time Uformat

The time uformat provides support for international components in time fields. It's syntax is:

 String % macroString % macroString % macroString

where %macro is a time formatting macro such as %hh for a two-digit hour. See below for a description of the date format macros. Only the String components of time uformat can include multi-byte Unicode characters.

timestamp Uformat

This format is a concatenation of the date uformat and time uformat which are described in "date Uformat" and "time Uformat". The order of the formats does not matter, but the two formats cannot be mixed.

time Format

The possible components of the time_format string are given in the following table:
Table 3. Time format tags
Tag Variable width availability Description Value range Options
%h import Hour (24), variable width 0...23 s
%hh   Hour (24), fixed width 0...23 s
%H import Hour (12), variable width 1...12 s
%HH   Hour (12), fixed width 01...12 s
%n import Minutes, variable width 0...59 s
%nn   Minutes, fixed width 0...59 s
%s import Seconds, variable width 0...59 s
%ss   Seconds, fixed width 0...59 s
%s.N import Seconds + fraction (N = 0...6) s, c, C
%ss.N   Seconds + fraction (N = 0...6) s, c, C
%SSS with v option Milliseconds 0...999 s, v
%SSSSSS with v option Microseconds 0...999999 s, v
%aa German am/pm marker, locale specific am, pm u, w

By default, the format of the time contained in the string is %hh:%nn:%ss. However, you can specify a format string defining the format of the string field.

You must prefix each component of the format string with the percent symbol. Separate the string's components with any character except the percent sign (%).

Where indicated the tags can represent variable-fields on import, export, or both. Variable-width date elements can omit leading zeroes without causing errors.

The following options can be used in the format string where indicated:
s
Specify this option to allow leading spaces in time formats. The s option is specified in the form:
%(tag,s)
Where tag is the format string. For example:
%(n,s)
indicates a minute field in which values can contain leading spaces or zeroes and be one or two characters wide. If you specified the following date format property:
%(h,s):$(n,s):$(s,s)
Then the following times would all be valid:
20: 6:58
20:06:58
20:6:58
v
Use this option in conjunction with the %SSS or %SSSSSS tags to represent milliseconds or microseconds in variable-width format. So the time property:
%(SSS,v)
represents values in the range 0 to 999. (If you omit the v option then the range of values would be 000 to 999.)
u
Use this option to render the am/pm text in uppercase on output.
w
Use this option to render the am/pm text in lowercase on output.
c
Specify this option to use a comma as the decimal separator in the %ss.N tag.
C
Specify this option to use a period as the decimal separator in the %ss.N tag.
The c and C options override the default setting of the locale.

The locale for determining the setting of the am/pm string and the default decimal separator can be controlled through the locale tag. This has the format:

%(L,'locale')

Where locale specifies the locale to be set using the language_COUNTRY.variant naming convention supported by ICU. See IBM InfoSphere DataStage and QualityStage Globalization Guide for a list of locales. The default locale for am/pm string and separators markers is English unless overridden by a %L tag or the APT_IMPEXP_LOCALE environment variable (the tag takes precedence over the environment variable if both are set).

Use the locale tag in conjunction with your time format, for example:

%L('es')%HH:%nn %aa

Specifies the Spanish locale.

The format string is subject to the restrictions laid out in the following table. A format string can contain at most one tag from each row. In addition some rows are mutually incompatible, as indicated in the 'incompatible with' column. When some tags are used the format string requires that other tags are present too, as indicated in the 'requires' column.
Table 4. Format tag restrictions
Element Numeric format tags Text format tags Requires Incompatible with
hour %hh, %h, %HH, %H - - -
am/pm marker - %aa hour (%HH) hour (%hh)
minute %nn, %n - - -
second %ss, %s - - -
fraction of a second %ss.N, %s.N, %SSS, %SSSSSS - - -
You can include literal text in your date format. Any Unicode character other than null, backslash, or the percent sign can be used (although it is better to avoid control codes and other non-graphic characters). The following table lists special tags and escape sequences:
Tag Escape sequence
%% literal percent sign
\% literal percent sign
\n newline
\t horizontal tab
\\ single backslash
Function Description
int8 hours_from_time( time ) Returns the hour portion of the given time.
int32 microseconds_from_time( time ) Returns the number of microseconds from the given time.
dfloat midnight_seconds_from_time ( time ) Returns the number of seconds from midnight to time .
int8 minutes_from_time( time ) Returns the number of minutes from time .
dfloat seconds_from_time( time ) Returns the number of seconds from time.
dfloat seconds_since_from_timestamp ( timestamp , " source_timestamp_string " | format_variable ) Returns the number of seconds from timestamp to the base timestamp, or optionally the second timestamp argument for the number of seconds between timestamps. The source_timestamp_string argument must be double quoted or be a variable.
time time_from_midnight_seconds( dfloat ) Returns the time given the number of seconds ( dfloat ) since midnight.
time time_from_string ( string , time_format | time_uformat | format_variable) Returns a time representation of string using the optional time_format , time_uformat , or format_variable. By default, the time format is hh:nn:ss. For format descriptions, see and "time Uformat" .
time time_from_ustring ( ustring , time_format | time_uformat | format_variable) Returns a time representation of ustring using the optional time_format , time_uformat , or format_variable specification. By default, the time format is hh:nn:ss. For format descriptions, see and "time Uformat" .
string string_from_time ( time , " time_format " | format_variable | time_uformat ) Returns a string from time. The format argument is optional.The default time format is hh:nn:ss. For format descriptions, see and "time Uformat" .
string string_from_time ( time , " time_format " | format_variable | time_format ) Returns a ustring from time . The format argument is optional.The default time format is hh:nn:ss. For format descriptions, see and "time Uformat" .
time time_from_timestamp( timestamp ) Returns the time from timestamp .
date date_from_timestamp( timestamp ) Returns the date from the given timestamp .
timestamp timestamp_from_date_time ( date , time ) Returns a timestamp from date and time . The date specifies the date portion ( yyyy - nn - dd ) of the timestamp. The time argument specifies the time to be used when building the timestamp. The time argument must be in the hh : nn :ss format.
timestamp timestamp_from_seconds_since ( dfloat , " original_timestamp_string " | format_variable ) Returns the timestamp from the number of seconds ( dfloat ) from the base timestamp or the original_timestamp_string argument. The original_timestamp_string must be double quoted or be a variable.
timestamp timestamp_from_string ( string , " timestamp_format " | timestamp_uformat | format_variable) Returns a timestamp from string, in the optional timestamp_format , timestamp_uformat, or format_variable . The timestamp_format must be double quoted or be a variable. The default format is yyyy - nn - dd hh : nn : ss . timestamp_format is described in .
timestamp timestamp_from_ustring ( ustring , " timestamp_format " | timestamp_uformat | format_variable) Returns a timestamp from ustring , in the optional format specification. The timestamp_format must be a double quoted string, a uformat , or a variable. The default format is yyyy - nn - dd hh : nn : ss . timestamp_uformat is described in .
string string_from_timestamp ( timestamp , " timestamp_format " | format_variable ) Returns a string from timestamp . The formatting specification is optional. The default format is yyyy - mm - dd hh : mm : ss .
ustring ustring_from_timestamp ( timestamp , " timestamp_format " | format_variable ) Returns a ustring from timestamp . The formatting specification is optional. The default format is yyyy - mm - dd hh : mm : ss .
timestamp timestamp_from_time ( time , time_format | time_uformat ) Returns a timestamp from time .

For format descriptions, see and "time Uformat"

date date_from_timestamp( timestamp ) Returns the date from the given timestamp .
timestamp timestamp_from_timet( int32 ) Returns a timestamp from the given UNIX time_t representation ( int32 ).
int32 timet_from_timestamp( timestamp ) Returns the UNIX time_t representation of timestamp.

null handling functions

lists the transformation functions for NULL handling.

All data types support nulls. As part of processing a record, an operator can detect a null and take the appropriate action, for example, it can omit the null field from a calculation or signal an error condition.

InfoSphere DataStage represents nulls in two ways.

The null-handling functions can change a null representation from an out-of-band null to an in-band null and from an in-band null to an out-of-band null.

Function Description
destination_field handle_null ( source_field , value ) Change the source_field NULL representations from out-of-band representation to an in-band representation. The value field assigns the value that corresponds to NULL.
destination_field make_null ( source_field , value ) Changes source_field NULL representation from in-band NULL representation to out-of-band. The value field allows multiple valid NULL values to be inputted as arguments.
int8 notnull( source_field ) Returns 1 if source_field is not NULL, otherwise returns 0.
int8 null( source_field ) Returns 1 if source_field is NULL, otherwise returns 0.
set_null() This function is used with "=" to set the left side output field, when it is nullable, to null. For example:

a-field = set_null();

int8 is_dfloat_inband_null ( dfloat ) Returns 1 if dfloat is an inband null; otherwise it returns 0.
int8 is_int16_inband_null ( int16 ) Returns 1 if int16 is an inband null; otherwise it returns 0.
int8 is_int32_inband_null ( int32 ) Returns 1 if int32 is an inband null; otherwise it returns 0.
int8 is_int64_inband_null ( int64 ) Returns 1 if int64 is an inband null; otherwise it returns 0.
int8 is_sfloat_inband_null ( sfloat ) Returns 1 if sfloat is an inband null; otherwise it returns 0.
int8 is_string_inband_null ( string ) Returns 1 if string is an inband null; otherwise it returns 0.
int8 u_is_string_inband_null ( ustring ) Returns 1 if ustring is an inband null; otherwise it returns 0.
Note: Null-handling functions cannot be used for subrecord fields.

Mathematical functions

Function

Description

int32 abs( int32 ) Returns the absolute value of int32 .
dfloat acos( dfloat ) Returns the principal value of the arc cosine of dfloat .
dfloat asin( dfloat ) Returns the principal value of the arc sine of dfloat .
dfloat atan( dfloat ) Returns the principal value of the arc tangent of dfloat.
dfloat atan2( dfloat , dfloat ) Returns the principal value of the arc tangent of y/x (where y is the first argument).
dfloat ceil( decimal ) Returns the smallest dfloat value greater than or equal to decimal .
dfloat cos( dfloat ) Returns the cosine of the given angle ( dfloat ) expressed in radians.
dfloat cosh( dfloat ) Returns the hyperbolic cosine of dfloat .
dfloat exp( dfloat ) Returns the exponential of dfloat.
dfloat fabs( dfloat ) Returns the absolute value of dfloat .
dfloat floor( decimal ) Returns the largest dfloat value less than or equal to decimal .
dfloat ldexp( dfloat , int32 ) Reconstructs dfloat out of the mantissa and exponent of int32 .
uint64 llabs( int64 ) Returns the absolute value of int64 .
dfloat log( dfloat ) Returns the natural (base e) logarithm of dfloat .
dfloat log10( dfloat ) Returns the logarithm to the base 10 of dfloat .
int32 max( int32 , int32 ) Returns the larger of the two integers.
int32 min( int32 , int32 ) Returns the smaller of the two integers.
dfloat pow( dfloat , dfloat ) Returns the result of raising x (the first argument) to the power y (the second argument).
uint32 rand() Returns a pseudo-random integer between 0 and 232 - 1. The function uses a multiplicative congruential random-number generator with period 232. See the UNIX man page for rand for more details.
uint32 random() Returns a random integer between 0 and 231 - 1. The function uses a nonlinear additive feedback random-number generator employing a default state array size of 31 long integers to return successive pseudo-random numbers. The period of this random-number generator is approximately 16 x (231 - 1). Compared with rand, random is slower but more random. See the UNIX man page for random for more details.
dfloat sin( dfloat ) Returns the sine of dfloat expressed in radians.
dfloat sinh( dfloat ) Returns the hyperbolic sine of dfloat.
dfloat sqrt( dfloat ) Returns the square root of dfloat .
int32 quotient_from_dfloat ( dfloat1 , dfloat2) Returns the value of the quotient after dfloat1 is divided by dfloat2 .
srand( uint32 ) Sets a new seed ( uint32 ) for the frand() or srand() random number generator.
srandom( uint32 ) Sets a random seed for the random() number generator. See the UNIX man page for srandom for more details.
dfloat tan( dfloat ) Returns the tangent of the given angle ( dfloat ) expressed in radians.
dfloat tanh( dfloat ) Returns the hyperbolic tangent of dfloat .

String field functions

Strings can be assigned (=), compared (==, <, >=, and so on), and concatenated (+) in the Transformation Language. In addition, the functions described in below are available for string manipulations, and the functions described in are available for ustring manipulations. When a long string is assigned to a short string, the long string is truncated to the length of the short string. The term white space refers to spaces, tabs, and any other blank space.

String Conversions and Lookup Tables

You can construct a string lookup table to use when default conversions do not yield satisfactory results. A string lookup table is a table of two columns and as many rows as are required to perform a conversion to or from a string as shown in the following table.

Numeric Value String or Ustring
numVal1 string1 | ustring1
numVal2 string2 | ustring1
... ...
numVal3 stringn | ustringn

Each row of the lookup table specifies an association between a 16-bit integer or unsigned 32-bit integer value and a string or ustring. InfoSphere DataStage scans the Numeric Value or the String or Ustring column until it encounters the value or string to be translated. The output is the corresponding entry in the row.

The numeric value to be converted might be of the int16 or the uint32 data type. InfoSphere DataStage converts strings to values of the int16 or uint32 data type using the same table.

If the input contains a numeric value or string that is not listed in the table, InfoSphere DataStage operates as follows:

A table definition defines the rows of a string or ustring lookup table and has the following form:

{propertyList} ('string' | 'ustring' = value; 'string' | 'ustring'= value; ... )

where:

propertyList is one or more of the following options; the entire list is enclosed in braces and properties are separated by commas if there are more than one:

Ustring field functions

InfoSphere DataStage provides the ustring type for multi-byte Unicode-character strings.ustrings can be assigned (=), compared (==, <, >=, and so on), and concatenated (+) in the Transformation Language. In addition, the functions described in are available for ustring manipulations. When a long string is assigned to a short string, the long string is truncated to the length of the short string. The term white space refers to spaces, tabs, and any other blank space.

Function Description
ustring ustring_from_date ( date , " date_format " | date_format | format_variable) Converts date to a ustring representation using the optional format specification.

By default, the format is yyyy -mm-dd. For format descriptions, see and "date Uformat" .

ustring ustring_from_decimal ( decimal , "fix_zero suppress_zero" | format_variable ) Returns a ustring from decimal.

fix_zero causes a decimal field containing all zeros to be treated as a valid zero.

suppress_zero argument specifies that the returned ustring value will have no leading or trailing zeros. Examples:

000.100 -> 0.1; 001.000 -> 1; -001.100 -> -1.1

The format specification is optional.

ustring ustring_from_time ( time , " time_format " | time_uformat | format_variable) Returns a ustring from time using an optional format specification.The default time format is hh:nn:ss. For format descriptions, see and "time Uformat" .
ustring ustring_from_timestamp ( timestamp , " timestamp_format " | format_variable ) Returns a ustring from timestamp . The format specification is optional. The default format is yyyy-mm-dd hh:mm:ss.
int8 u_is_alnum( ustring ) Returns 1 (true) if ustring consists entirely of alphanumeric characters.
int8 u_is_alpha( ustring ) Returns 1 (true) if ustring consists entirely of alphabetic characters.
int8 u_is_numeric( ustring ) Returns 1 (true) if ustring consists entirely of numeric characters, including decimal and sign.
int16 lookup_int16_from_ustring ( ustring , " table_definition " | table_variable ) Returns an integer corresponding to ustring using table_definition string or variable. See "String Conversions and Lookup Tables" for more information.
ustring lookup_ustring_from_int16 ( int16 , " table_definition " | table_variable ) Returns a ustring corresponding to int16 using table_definition string or table_variable . See "String Conversions and Lookup Tables" for more information.
ustring lookup_ustring_from_uint32 ( uint32 , " table_definition " | table_variable ) Returns a ustring corresponding to uint32 using table_definition string or variable. See"String Conversions and Lookup Tables" for more information.
uint32 lookup_uint32_from_ustring ( string , " table_definition " | table_variable ) Returns an unsigned integer from ustring using table_definition or table_variable .
int8 u_is_valid (" type_ustring ", " value_ustring ") Returns 1 (true) if value_ustring is valid according to type_ustring , including NULL. The type_ustring argument is required. It must specify an InfoSphere DataStage schema data type.

Integer types are checked to ensure the value_ustring is numeric (signed or unsigned), a whole number, and a valid value (for example, 1024 can not be assigned to an int8 type).

Decimal types are checked to ensure the value_ustring is numeric (signed or unsigned) and a valid value.

Float types are checked to ensure the value_ustring is numeric (signed or unsigned) and a valid value (exponent is valid).

String is always valid with the NULL exception below.

For all types, if the field cannot be set to NULL and the string is NULL, 0 (false) is returned.

Date, time, and timestamp types are checked to ensure they are correct, using the optional format argument, and valid values.

Raw cannot be checked since the input is a string.

ustring u_lower_case( ustring ) Converts ustring to lowercase. Non-alphabetic characters are ignored in the transformation.
ustring u_upper_case( ustring ) Converts ustring to uppercase. Non-alphabetic characters are ignored in the transformation.
ustring u_compact_whitespace ( ustring ) Returns the ustring after reducing all consecutive white space in ustring to a single space.
ustring u_pad_string ( ustring , pad_ustring , pad_length ) Returns the ustring with pad_ustring appended to the bounded length string for pad_length number of characters. pad_length is an int16.

When the given ustring is a variable-length string, it defaults to a bounded-length of 1024 characters. If the given ustring is a fixed-length string, this function has no effect.

ustring u_strip_whitespace ( ustring ) Returns ustring after stripping all white space in the string.
ustring u_trim_leading_trailing ( ustring ) Returns ustring after removing all leading and trailing white space.
ustring u_trim_leading( ustring ) Returns ustring after removing all leading white space.
ustring u_trim_trailing ( ustring ) Returns a ustring after removing all trailing white space.
int32 u_string_order_compare ( ustring1 , ustring2, justification) Returns a numeric value specifying the result of the comparison. The numeric values are:

-1: ustring1 is less than ustring2 0: ustring1 is equal to ustring2 1: ustring1 is greater than ustring2

The string justification argument is either 'L' or 'R'. It defaults to 'L' if not specified. 'L' means a standard character comparison, left to right. 'R' means that any numeric substrings within the strings starting at the same position are compared as numbers. For example an 'R' comparison of "AB100" and "AB99" indicates that AB100 is great than AB99, since 100 is greater than 99. The comparisons are case sensitive.

ustring u_replace_substring ( expression1 , expression2, ustring) Returns a ustring value that contains the given ustring , with any characters in expression1 replaced by their corresponding characters in expression2 . For example:

u_replace_substring ("ABC", "abZ", "AGDCBDA")

returns "aGDZbDa", where any "A" gets replaced by "a", any "B" gets replaced by "b" and any "C" gets replaced by "Z".

If expression2 is longer than expression1 , the extra characters are ignored.

If expression1 is longer than expression2, the extra characters in expression1 are deleted from the given string (the corresponding characters are removed.) For example:

u_replace_substring("ABC", "ab", "AGDCBDA")

returns "aGDbDa".

int32 u_count_substring ( ustring , sub_ustring ) Returns the number of times that sub_ustring occurs in ustring . If sub_ustring is an empty string, the number of characters in ustring is returned.
int32 u_dcount_substring ( ustring , delimiter) Returns the number of fields in ustring delimited by delimiter , where delimiter is a string. For example,

dcount_substring("abcFdefFghi", "F")

returns 3.

If delimiter is an empty string, the number of characters in the string + 1 is returned. If delimiter is not empty, but does not exist in the given string, 1 is returned.

ustring u_double_quote_string ( expression) Returns the given ustring expression enclosed in double quotes.
ustring u_substring_by_delimiter ( ustring, delimiter, occurrence, numsubstr) The delimiter argument is a ustring value, and the occurrence and numsubstr arguments are int32 values.

This function returns numsubstr substrings from ustring , delimited by delimiter and starting at substring number occurrence . An example is:

u_substring_by_delimiter ("abcFdefFghiFjkl", "F", 2, 2)

The string "defFghi" is returned.

If occurrence is < 1, then 1 is assumed. If occurrence does not point to an existing field, the empty string is returned. If numsubstr is not specified or is less than 1, it defaults to 1.

int32 u_index_of_substring ( ustring, sub_ustring, occurrence ) Returns the starting position of the nth occurrence of sub_ustring in ustring. The occurrence argument is an integer indicating the nth occurrence .

If there is no nth occurrence, 0 is returned; if sub_ustring is an empty string, -2 is returned; and if ustring doesn't contain any sub_ustring , -1 is returned.

ustring u_left_substring ( ustring , length) Returns the first length characters of ustring. If length is 0, it returns the empty string. If length is greater than the length of the ustring , the entire ustring is returned.
ustring u_right_substring ( ustring , length) Returns the last length characters of ustring . If length is 0, it returns the empty string. If length is greater than the length of ustring , the entire ustring is returned.
ustring u_string_of_space( count) Returns a ustring containing count spaces. The empty string is returned for a count of 0 or less.
ustring u_single_quote_string ( expression) Returns expression enclosed in single quotes.
ustring u_string_of_substring ( ustring , count) Returns a ustring containing count occurrences of ustring. The empty string is returned for a count of 0 or less.
ustring u_trimc_string ( ustring [, character [, option ]]) If only ustring is specified, all leading and trailing spaces and tabs are removed, and all multiple occurrences of spaces and tabs are reduced to a single space or tab.

If ustring and character are specified, option defaults to 'R' The available option values are:

'A' remove all occurrences of character 'B' remove both leading and trailing occurrences of character. 'D' remove leading, trailing, and redundant white-space characters. 'E' remove trailing white-space characters 'F' remove leading white-space characters 'L' remove all leading occurrences of character 'R' remove all leading, trailing, and redundant occurrences of character 'T' remove all trailing occurrences of character

ustring u_system_time_date() Returns the current system time in this 24-hour format:

hh:mm:ss dd:mmm:yyyy

int32 u_offset_of_substring ( ustring , sub_ustring , position) Searches for the sub_ustring in the ustring beginning at character number position, where position is an uint32. Returns the starting position of the substring.
int8 u_string_case_compare ( ustring , ustring) This is a case-insensitive version of u_string_compare() below.
int8 u_string_compare ( ustring , ustring) Compares two ustrings and returns the index (0 or 1) of the greater string.
int8 u_string_num_case_compare ( ustring , ustring, uint16) This is a case-insensitive version of u_string_num_compare() below.
ustring u_string_num_concatenate ( ustring , ustring, uint16) Returns a ustring after appending uint16 characters from the second ustring onto the first ustring .
int8 u_string_num_compare ( utring , ustring, uint16) Compares first uint16 characters of two given ustrings and returns the index (0 or 1) of the greater ustring .
ustring u_string_num_copy ( ustring , uint16) Returns the first uint16 characters from the given ustring .
int32 u_string_length( ustring ) Returns the length of the ustring .
ustring u_substring ( ustring , starting_position, length) Copies parts of ustrings to shorter strings by string extraction. The starting_position specifies the starting location of the substring; length specifies the substring length.

The arguments starting_position and length are uint16 types and must be positive (>= 0).

ustring u_char_from_num( int32 ) Returns a ustring character value from the given int32 . If given a value that is not associated with a character such as -1, the function returns a space.

An example use is: u_char_from_num(38) which returns "&"

int32 u_num_from_char( ustring) Returns the numeric value of the character in the ustring . When this function is given an empty string, it returns 0; and when it is given a multi-character string, it uses the first character in the string.

An example use is: u_num_from_char("&") which returns 38

Bit manipulation functions

Function Description
string bit_expand ( uint64) Expands the given uint64 to a string containing the binary representation.
ustring u_bit_expand ( uint64) Expands the given uint64 to a ustring containing the binary representation.
uint64 bit_compress( string ) Converts the string binary representation to an uint64 field.
uint64 u_bit_compress( ustring ) Converts the ustring binary representation to an uint64 field.
uint64 set_bit ( uint64 , list_of_bits , bit_state) Turns the uint64 bits that are listed by number in the string list_of_bits on or off, depending on whether the value of the bit_state integer is 1 or 0. bit_state is an optional argument, and has a default value of 1 which turns the list of bits on.

An example use is: set_bit(0, "1,3,5,7") which returns 85.

uint64 u_set_bit ( uint64 , list_of_bits , bit_state) This function is a internationalized version of set_bit() above.

Job monitoring functions

The Job Monitor reports on the current state of a job and supplies information about the operators in your data flow. By default, it continually gathers information about your jobs, but it does not send the information unless you request it from your user-written client.

The information it supplies falls into four categories: job status, metadata, monitor, and summary, and it is given in XML notation. If you do not have job monitoring disabled with the top-level -nomonitor osh option, you can also obtain custom report and summary information about the transform operator using the functions in the table below.

There are six functions: three for the string type and three for the ustring type. The name_string or name_ustring argument can be used to specify a name for the custom information, the description_string or description_ustring argument can be used to describe the type of information, and the value_string or value_ustring argument can be used to give the details of the information.

Function Description
string set_custom_summary_info ( name_string , description_string, value_string)

ustring u_set_custom_summary_info ( name_ustring ,description_ustring, value_ustring)

Call this function in the finish code segment.
string send_custom_report ( name_string , description_string, value_string)

ustring u_send_custom_report ( name_ustring , description_ustring , value_ustring)

Call this function in the initialize code segment.
string set_custom_instance_report ( name_string , description_string, value_string)

ustring u_set_custom_instance_report ( name_ustring , description_ustring, value_ustring)

Call this function in the mainloop code segment.

Miscellaneous functions

The following table describes functions in the Transformation Language that do not fit into any of the above categories.

Function Description
void force_error ( error_message_string ) Terminates the data flow when an error is detected, and prints error_message_string to stderr.
u_force_error ( error_message_ustring ) Terminates the data flow when an error is detected, and prints error_message_ustring to stderr.
string get_environment( string ) Returns the current value of string , a UNIX environment variable. The functionality is the same as the C getenv function.
ustring u_get_environment ( ustring ) Returns the current value of utring , a UNIX environment variable. The functionality is the same as the C getenv function.
int16 get_partition_num() Returns the current partition number.
int16 get_num_of_partitions() Returns the number of partitions.
void print_message ( message_string ) Prints message_string to stdout.
u_print_message ( message_ustring ) Prints message_ustring to stdout.
uint32 size_of( value ) Returns the actual size value when it is stored, not the length.

binary operators with decimal and numeric fields

The Transformation Language supports the placement of binary operators between two decimals and between a decimal and one of the numeric field types.

The binary operators are +, -, *, and /.

The numeric field types are int8, uint8, int16, uint16, int32, uint32, int64, uint64, sfloat, and dfloat.

Generally there are no restrictions on the form of an operand. It can be a constant, a variable, a function call that returns a numeric value, a simple expression such as an addition of two variables, or a complicated expression that consists of function calls as well as arithmetic operations.

By default, the transform operator sets the precision and scale of any temporary internal decimal variable created during arithmetic operations to:

[TRX_DEFAULT_MAX_PRECISION=38,
TRX_DEFAULT_MAX_SCALE=10]
with RoundMode equal to eRoundInf.

You can override the default values using these environment variables:

Fatal errors might occur at runtime if the precision of the destination decimal is smaller than that of the source decimal


PDFThis topic is also in the IBM InfoSphere DataStage and QualityStage Parallel Job Advanced Developer's Guide.

Update timestamp Last updated: 2012-10-8