transform: syntax and options

The syntax and options of the transform operator.

Terms in italic typeface are option strings you supply. When your option string contains a space or a tab character, you must enclose it in single quotes.
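As a rough illustration of the quoting rule, the following Python sketch wraps an option string in single quotes when it contains a space or a tab. The helper `quote_option` and the schema path are hypothetical and not part of InfoSphere DataStage. The sketch also does not handle option strings that themselves contain single quotes, which the rule above does not address.

```python
def quote_option(value: str) -> str:
    """Return value wrapped in single quotes if it contains a space or a tab.

    Hypothetical helper: it applies only the space/tab rule and does not
    escape single quotes that already appear inside value.
    """
    if " " in value or "\t" in value:
        return f"'{value}'"
    return value


# Assemble a transform option list, quoting only the values that need it.
# The schema file path is a made-up example.
args = ["transform", "-inputschemafile", "/data/my schemas/in.schema"]
print(" ".join(quote_option(a) for a in args))
# prints: transform -inputschemafile '/data/my schemas/in.schema'
```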


transform
-fileset fileset_descriptor
-table -key field [ci | cs]
[-key field [ci | cs] ...]
[-allow_dups]
[-save fileset_descriptor]
[-diskpool pool]
[-schema schema | -schemafile schema_file]
[-argvalue job_parameter_name=job_parameter_value ...]
[-collation_sequence locale | collation_file_pathname | OFF]
[-expression expression_string | -expressionfile expressionfile_path]
[-maxrejectlogs integer]
[-sort {-input | -output [port]} -key field_name sort_key_suboptions ...]
[-part {-input | -output [port]} -key field_name part_key_suboptions ...]
[-flag {compile | run | compileAndRun} [flag_compilation_options]]
[-inputschema schema | -inputschemafile schema_file]
[-outputschema schema | -outputschemafile schema_file]
[-reject [-rejectinfo reject_info_column_name_string]]
[-oldnullhandling]
[-abortonnull]

Where:

sort_key_suboptions are:


    [-ci | -cs] [-asc | -desc] [-nulls {first | last}] [-param  params ]

part_key_suboptions are:


    [-ci | -cs] [-param  params ]

flag_compilation_options are:


    [-dir dir_name_for_compilation] [-name library_path_name]
    [-optimize | -debug] [-verbose] [-compiler cpath]
    [-staticobj absolute_path_name] [-sharedobj absolute_path_name] [-t options]
    [-compileopt options] [-linker lpath] [-linkopt options]
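The key suboptions above combine into one -sort or -part clause per key. The following Python sketch assembles such clauses as strings. `sort_spec` and `part_spec` are hypothetical helpers. The sketch assumes that the port number follows -output as a separate token, which this page does not confirm, and it does not model -param. It omits suboptions that equal the documented defaults (-cs, -asc, and nulls first).

```python
from typing import Optional


def sort_spec(field: str, port: Optional[int] = None, *, ci: bool = False,
              desc: bool = False, nulls: str = "first") -> str:
    """Build one -sort clause; port=None means -input, an int means -output port."""
    if nulls not in ("first", "last"):
        raise ValueError("nulls must be 'first' or 'last'")
    parts = ["-sort"]
    parts += ["-input"] if port is None else ["-output", str(port)]
    parts += ["-key", field]
    # Emit only non-default suboptions: the defaults are -cs, -asc, and nulls first.
    if ci:
        parts.append("-ci")
    if desc:
        parts.append("-desc")
    if nulls == "last":
        parts += ["-nulls", "last"]
    return " ".join(parts)


def part_spec(field: str, port: Optional[int] = None, *, ci: bool = False) -> str:
    """Build one -part clause; after -key, -part accepts only -ci | -cs (and -param)."""
    parts = ["-part"]
    parts += ["-input"] if port is None else ["-output", str(port)]
    parts += ["-key", field]
    if ci:
        parts.append("-ci")
    return " ".join(parts)


print(sort_spec("last_name", 1, ci=True, nulls="last"))
# -sort -output 1 -key last_name -ci -nulls last
print(part_spec("customer_id"))
# -part -input -key customer_id
```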

The -table and -fileset options allow you to use conditional lookups.
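Conditional lookups match records against lookup tables by key. The following Python sketch illustrates the duplicate-key rule that the -table entry in the option table describes for create-only mode. Two records are duplicates when all key fields match. Without -allow_dups, only the first record is kept. DataStage also issues a warning in that case, which the sketch approximates by returning the discarded records. With -allow_dups, all copies are kept. The function name and record layout are hypothetical, and `str.casefold` is only a stand-in for the -ci comparison.

```python
from collections import defaultdict


def build_lookup_table(records, key_fields, *, allow_dups=False, ci=False):
    """Index records by lookup key; return (table, discarded_duplicates)."""
    table = defaultdict(list)
    discarded = []
    for rec in records:
        # -ci: compare string key values case-insensitively (casefold is a stand-in).
        key = tuple(
            rec[f].casefold() if ci and isinstance(rec[f], str) else rec[f]
            for f in key_fields
        )
        if key in table and not allow_dups:
            # Without -allow_dups: keep only the first matching record.
            discarded.append(rec)
            continue
        table[key].append(rec)
    return dict(table), discarded


rows = [{"id": "A1", "city": "Oslo"}, {"id": "a1", "city": "Bergen"}]
table, dropped = build_lookup_table(rows, ["id"], ci=True)
print(table)    # {('a1',): [{'id': 'A1', 'city': 'Oslo'}]}
print(dropped)  # [{'id': 'a1', 'city': 'Bergen'}]
```

The sketch does not model the rule that, in normal lookup mode, only one lookup table can have been created with -allow_dups set.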

Note: The following option values can contain multi-byte Unicode values:

the field names given to the -inputschema and -outputschema options, and the ustring values in those schemas
-inputschemafile and -outputschemafile files
-expression option string and -expressionfile option file path
-sort and -part key field names
-compiler, -linker, and -dir pathnames
-name file name
-staticobj and -sharedobj pathnames
-compileopt and -linkopt pathnames

Option	Use
-abortonnull	-abortonnull Specify this option to stop the job when an unhandled null is encountered. You can then locate the field and record that contained the null in the job log. If you specify this option together with the -oldnullhandling option, any null that occurs in an input field used in an output field derivation, and that the expression does not explicitly handle, stops the job. If you specify -abortonnull without -oldnullhandling, only operations such as attempting to set a non-nullable field to null stop the job.
-argvalue	-argvalue `job_parameter_name`=`job_parameter_value` This option is similar to the -params top-level osh option, but the initialized variables apply to a single transform operator rather than to an entire job. The global variable given by `job_parameter_name` is initialized with the value given by `job_parameter_value`. In your osh script, you reference the parameter as [&`job_parameter_name`]; each occurrence of [&`job_parameter_name`] is replaced by `job_parameter_value`.
-collation_sequence	-collation_sequence `locale` \| `collation_file_pathname` \| OFF This option determines how your string data is sorted. You can specify a predefined IBM® ICU locale; write your own collation sequence using ICU syntax and supply its `collation_file_pathname`; or specify OFF so that string comparisons are made in Unicode code-point value order, independent of any locale or custom sequence. By default, InfoSphere® DataStage® sorts strings using byte-wise comparisons. For more information, see the IBM ICU site: http://oss.software.ibm.com/icu/userguide/Collate_Intro.htm
-expression	-expression `expression_string` This option lets you specify expressions written in the Transformation Language. The expression string might contain multi-byte Unicode characters. Unless you choose the -flag option with run, you must use either the -expression or -expressionfile option. The -expression and -expressionfile options are mutually exclusive.
-expressionfile	-expressionfile `expression_file` This option lets you specify expressions written in the Transformation Language. The expressions must reside in the file given by `expression_file`, which is the name and path of the file; the name and path might include multi-byte Unicode characters. Use an absolute path; otherwise, the current UNIX directory is used by default. Unless you choose the -flag option with run, you must choose either the -expression or -expressionfile option. The -expressionfile and -expression options are mutually exclusive.
-flag	-flag {compile \| run \| compileAndRun} `suboptions` compile: This option indicates that you want to check the Transformation Language expression for correctness and compile it. An appropriate version of a C++ compiler must be installed on your computer. Field information used in the expression must be known at compile time; therefore, input and output schemas must be specified. run: This option indicates that you want to use a pre-compiled version of the Transformation Language code. You do not need to specify input and output schemas or an expression, because these elements were supplied at compile time. However, you must add the directory containing the pre-compiled library to your library search path; the transform operator does not do this for you. You must also use the -name suboption to provide the name of the library where the pre-compiled code resides. compileAndRun: This option indicates that you want to compile and run the Transformation Language expression. This is the default value. An appropriate version of a C++ compiler must be installed on your computer. You can supply schema information in the following ways: (1) Omit all schema specifications. The transform operator then uses the upstream operator's output schema as its input schema, and the schema for each output data set contains all the fields from the input record plus any new fields you create for that data set. (2) Omit the input data set schema, but specify schemas for all output data sets or for selected output data sets. The transform operator then uses the upstream operator's output schema as its input schema. Any output schemas specified on the command line are used unchanged, and output data sets without schemas contain all the fields from the input record plus any new fields you create for that data set. (3) Specify an input schema, but omit some or all output schemas. The transform operator then uses the input schema as specified. Any output schemas specified on the command line are used unchanged, and output data sets without schemas contain all the fields from the input record plus any new fields you create for that data set.
-flag (continued)	The -flag option has the following suboptions: -dir `dir_name` lets you specify a compilation directory. By default, compilation occurs in the directory named by the TMPDIR environment variable or, if TMPDIR does not point to an existing directory, in the /tmp directory. Whether or not you specify -dir, you must make sure the compilation directory is in the library search path. -name `file_name` lets you specify the name of the file containing the compiled code. If you use the -dir `dir_name` suboption, this file is in the `dir_name` directory. The following examples show how to use the -dir and -name suboptions in an osh command line. For development: osh "transform -inputschema `schema` -outputschema `schema` -expression `expression` -flag compile -dir `dir_name` -name `file_name`" For your production machine: osh "... \| transform -flag run -name `file_name` \| ..." The library file must be copied to the production machine. -flag compile and -flag compileAndRun have these additional suboptions: -optimize specifies the optimize mode for compilation. -debug specifies the debug mode for compilation. -verbose causes verbose messages to be output during compilation. -compiler `cpath` lets you specify the compiler path when the compiler is not in the default directory. The default compiler path for each operating system is: Solaris: /opt/SUNPRO6/SUNWspro/bin/CC; AIX®: /usr/vacpp/bin/xlC_r; Tru64: /bin/cxx; HP-UX: /opt/aCC/bin/aCC. -staticobj `absolute_path_name` and -sharedobj `absolute_path_name` specify the locations of your static and dynamic-linking C-object libraries. The file suffix can be omitted. See External global C-function support for details. -compileopt `options` lets you specify additional compiler options. These options are compiler-dependent. Pathnames might contain multi-byte Unicode characters. -linker `lpath` lets you specify the linker path when the linker is not in the default directory. The default linker path for each operating system is the same as the default compiler path listed above. -linkopt `options` lets you specify link options to the compiler. Pathnames might contain multi-byte Unicode characters.
-inputschema	-inputschema `schema` Use this option to specify an input schema. The schema might contain multi-byte Unicode characters. An error occurs if an expression refers to an input field not in the input schema. The -inputschema and -inputschemafile options are mutually exclusive. The -inputschema option is not required when you specify compileAndRun or run for the -flag option; however, when you specify compile for the -flag option, you must include either the -inputschema or the -inputschemafile option. See the -flag option description in this table for information on the compile value.
-inputschemafile	-inputschemafile `schema_file` Use this option to specify an input schema. An error occurs if an expression refers to an input field not in the input schema. To use this option, the input schema must reside in `schema_file`, which is the name and path of the file; the name and path might contain multi-byte Unicode characters. Use an absolute path; otherwise, the current UNIX directory is used by default. The -inputschemafile and -inputschema options are mutually exclusive. The -inputschemafile option is not required when you specify compileAndRun or run for the -flag option; however, when you specify compile for the -flag option, you must include either the -inputschema or the -inputschemafile option. See the -flag option description in this table for information on the compile value.
-maxrejectlogs	-maxrejectlogs `integer` An information log is generated every time a record is written to the reject output data set. Use this option to specify the maximum number of reject information logs the transform operator generates. The default is 50. If you specify -1, an unlimited number of information logs are generated.
-oldnullhandling	-oldnullhandling Use this option to reinstate old-style null handling. This setting means that, when you use an input field in the derivation expression of an output field, you have to explicitly handle any nulls that occur in the input data. If you do not specify such handling, a null causes the record to be dropped or rejected. If you do not specify the -oldnullhandling option, then a null in the input field used in the derivation causes a null to be output.
-outputschema	-outputschema `schema` Use this option to specify an output schema. An error occurs if an expression refers to an output field not in the output schema. The -outputschema and -outputschemafile options are mutually exclusive. The -outputschema option is not required when you specify compileAndRun or run for the -flag option; however, when you specify compile for the -flag option, you must include either the -outputschema or the -outputschemafile option. See the -flag option description in this table for information on the compile value. For multiple output data sets, repeat the -outputschema or -outputschemafile option to specify the schema for each output data set.
-outputschemafile	-outputschemafile `schema_file` Use this option to specify an output schema. An error occurs if an expression refers to an output field not in the output schema. To use this option, the output schema must reside in `schema_file`, which is the name and path of the file. Use an absolute path; otherwise, the current UNIX directory is used by default. The -outputschemafile and -outputschema options are mutually exclusive. The -outputschemafile option is not required when you specify compileAndRun or run for the -flag option; however, when you specify compile for the -flag option, you must include either the -outputschema or the -outputschemafile option. See the -flag option description in this table for information on the compile value. For multiple output data sets, repeat the -outputschema or -outputschemafile option to specify the schema for each output data set.
-part	-part {-input \| -output [`port`]} -key `field_name` [-ci \| -cs] [-param `params`] You can use this option zero or more times. It indicates that the data is hash partitioned. The required `field_name` is the name of a partitioning key. Exactly one of the suboptions -input and -output [`port`] must be present; they determine whether partitioning occurs on the input data or the output data. The default for `port` is 0. If `port` is specified, it must be an integer that identifies the output data set in which the data is partitioned. The suboptions to the -key option are -ci for case-insensitive partitioning and -cs for case-sensitive partitioning. The default is case-sensitive. Use the -param suboption to specify any `property`=`value` pairs, separated by commas (,).
-reject	-reject [-rejectinfo `reject_info_column_name_string`] This is optional. You can use it only once. When a null field is used in an expression, this option specifies that the input record containing the field is not dropped, but is sent to the output reject data set. The -rejectinfo suboption specifies the column name for the reject information.
-sort	-sort {-input \| -output [`port`]} -key `field_name` [-ci \| -cs] [-asc \| -desc] [-nulls {first \| last}] [-param `params`] You can use this option zero or more times. It indicates that the data is sorted within each partition. The required `field_name` is the name of a sorting key. Exactly one of the suboptions -input and -output [`port`] must be present; they determine whether sorting occurs on the input data or the output data. The default for `port` is 0. If `port` is specified, it must be an integer that identifies the output data set in which the data is sorted. You can specify -ci for a case-insensitive sort, or -cs for a case-sensitive sort. The default is case-sensitive. You can specify -asc for an ascending sort or -desc for a descending sort. The default is ascending. You can specify -nulls {first \| last} to determine where null values sort. The default is that nulls sort first. You can use -param `params` to specify any `property`=`value` pairs, separated by commas (,).
-table	-table -key `field` [ci \| cs] [-key `field` [ci \| cs] ...] [-allow_dups] [-save `fileset_descriptor`] [-diskpool `pool`] [-schema `schema` \| -schemafile `schema_file`] Specifies the beginning of a list of key fields and other specifications for a lookup table. The first occurrence of -table marks the beginning of the key field list for lookup table1; the next occurrence of -table marks the beginning of the key fields for lookup table2, and so on. For example: lookup -table -key field -table -key field The -key option specifies the name of a lookup key field. Repeat the -key option if there are multiple key fields. You must specify at least one key for each table. You cannot use a vector, subrecord, or tagged aggregate field as a lookup key. The -ci suboption specifies that the string comparison of lookup key values is case-insensitive; the -cs suboption specifies case-sensitive comparison, which is the default. In create-only mode, the -allow_dups option causes the operator to save multiple copies of duplicate records in the lookup table without issuing a warning. Two lookup records are duplicates when all lookup key fields have the same value in the two records. If you do not specify this option, InfoSphere DataStage issues a warning message when it encounters duplicate records and discards all but the first of the matching records. In normal lookup mode, only one lookup table (specified by either -table or -fileset) can have been created with -allow_dups set. The -save option lets you specify the name of a fileset to write this lookup table to; if -save is omitted, tables are written as scratch files and deleted at the end of the lookup. In create-only mode, -save is required. The -diskpool option lets you specify a disk pool in which to create lookup tables. By default, the operator looks first for a "lookup" disk pool, then uses the default pool (""). Use this option to specify a different disk pool. The -schema suboption specifies the schema that interprets the contents of the string or raw fields by converting them to another data type. The -schemafile suboption specifies the name of a file containing that schema. Specify either -schema or -schemafile, not both. One of them is required when you specify compile for the -flag option; neither is required for compileAndRun or run.
-fileset	[-fileset `fileset_descriptor` ...] Specify the name of a fileset containing one or more lookup tables to be matched. In lookup mode, you must specify the -fileset option, a table specification, or both, to designate the lookup tables to be matched against. The -fileset option can occur zero or more times. It cannot be specified in create-only mode. Warning: The fileset already contains key specifications. When you follow -fileset `fileset_descriptor` with `key_specifications`, the keys specified do not apply to the fileset; rather, they apply to the first lookup table. For example, lookup -fileset `file` -key `field` is the same as: lookup -fileset `file` -table -key `field`
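The -abortonnull, -oldnullhandling, and -reject entries together determine what happens when an expression uses a null input field without handling it explicitly. The following Python sketch encodes those entries as a decision function. The function is hypothetical. It assumes that -abortonnull takes precedence over -reject. The table also does not describe one case: default null handling with a non-nullable output field and no -abortonnull. The sketch reports that case as not described.

```python
def unhandled_null_outcome(*, oldnullhandling: bool, abortonnull: bool,
                           reject: bool, output_nullable: bool) -> str:
    """Outcome when an output derivation uses a null input field that the
    expression does not handle explicitly (hypothetical model)."""
    if oldnullhandling:
        # Old-style handling: the unhandled null is an error for this record.
        if abortonnull:
            return "job stops"  # assumed to take precedence over -reject
        return "record rejected" if reject else "record dropped"
    # Default handling: the null propagates to the output field.
    if output_nullable:
        return "null output"
    if abortonnull:
        return "job stops"  # attempting to set a non-nullable field to null
    return "not described in this table"


print(unhandled_null_outcome(oldnullhandling=True, abortonnull=False,
                             reject=True, output_nullable=True))
# record rejected
```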
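The [&`job_parameter_name`] substitution described for -argvalue can be sketched as a text replacement. In this Python sketch, `substitute_params` and the parameter name `reject_limit` are hypothetical. The characters allowed in a parameter name are an assumption. Leaving unknown references unchanged is the sketch's own choice, because this page does not say how undefined parameters are handled.

```python
import re

# Matches [&name]; the allowed characters in a parameter name are an assumption.
_PARAM_REF = re.compile(r"\[&([A-Za-z_][A-Za-z0-9_]*)\]")


def substitute_params(text: str, params: dict) -> str:
    """Replace each [&name] with params[name]; leave unknown references as-is."""
    return _PARAM_REF.sub(lambda m: params.get(m.group(1), m.group(0)), text)


# Roughly what -argvalue reject_limit=200 would make available to the operator.
print(substitute_params("transform -maxrejectlogs [&reject_limit]",
                        {"reject_limit": "200"}))
# transform -maxrejectlogs 200
```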
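With -collation_sequence OFF, strings are ordered by Unicode code-point value. Python compares str values the same way, so the following sketch shows the resulting order: every uppercase ASCII letter sorts before every lowercase one. The casefold-based ordering is included only for contrast. It is not an ICU locale collation.

```python
words = ["banana", "Cherry", "apple", "Apple"]

# Code-point order (what -collation_sequence OFF describes): uppercase ASCII
# letters (A-Z, U+0041..U+005A) sort before lowercase ones (a-z, U+0061..U+007A).
print(sorted(words))  # ['Apple', 'Cherry', 'apple', 'banana']

# Contrast only: a case-insensitive ordering; ties keep their input order
# because Python's sort is stable. This is NOT ICU locale collation.
print(sorted(words, key=str.casefold))  # ['apple', 'Apple', 'banana', 'Cherry']
```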
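The requirements in the -expression, -expressionfile, -inputschema, -inputschemafile, -outputschema, -outputschemafile, and -flag entries can be collected into a single check. The following Python function is a hypothetical sketch that works on the set of option names used on a command line. It does not check whether -outputschema and -outputschemafile are used together, because those options may be repeated once per output data set. It treats -name as required for run, as the -flag entry states.

```python
def check_transform_options(opts: set, flag: str = "compileAndRun") -> list:
    """Return rule violations for the option names in opts (empty list means OK)."""
    if flag not in ("compile", "run", "compileAndRun"):
        raise ValueError(f"unknown -flag value: {flag}")
    errors = []
    # Mutually exclusive pairs.
    for a, b in (("-expression", "-expressionfile"),
                 ("-inputschema", "-inputschemafile")):
        if a in opts and b in opts:
            errors.append(f"{a} and {b} are mutually exclusive")
    # An expression is needed unless pre-compiled code is being run.
    if flag != "run" and not opts & {"-expression", "-expressionfile"}:
        errors.append("-expression or -expressionfile is required unless -flag run")
    # compile needs field information up front.
    if flag == "compile":
        if not opts & {"-inputschema", "-inputschemafile"}:
            errors.append("-flag compile requires -inputschema or -inputschemafile")
        if not opts & {"-outputschema", "-outputschemafile"}:
            errors.append("-flag compile requires -outputschema or -outputschemafile")
    # run must name the pre-compiled library.
    if flag == "run" and "-name" not in opts:
        errors.append("-flag run requires the -name suboption")
    return errors


print(check_transform_options({"-expression"}, "compile"))
```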
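The -part option hash partitions the data on a key, and -ci makes that key comparison case-insensitive. The following Python sketch shows the idea. `partition_for` is hypothetical. CRC-32 is a stand-in for the hash function the transform operator actually uses, which this page does not name.

```python
import zlib


def partition_for(key: str, num_partitions: int, *, ci: bool = False) -> int:
    """Map a key value to a partition number in range(num_partitions)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if ci:
        # -ci: keys that differ only in case must land in the same partition.
        key = key.casefold()
    # CRC-32 is a stand-in; it is deterministic across runs, unlike Python's hash().
    return zlib.crc32(key.encode("utf-8")) % num_partitions


print(partition_for("Smith", 4, ci=True) == partition_for("SMITH", 4, ci=True))  # True
```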
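Within each partition, -sort orders records by key according to -ci or -cs, -asc or -desc, and -nulls. This Python sketch sorts one partition's key values, with None standing for null. It is hypothetical and assumes that the -nulls placement applies regardless of -asc or -desc. It also compares strings by code point rather than by any collation chosen with -collation_sequence.

```python
from typing import List, Optional


def sort_partition(values: List[Optional[str]], *, ci: bool = False,
                   desc: bool = False, nulls: str = "first") -> List[Optional[str]]:
    """Sort one partition's key values; None stands for a null key."""
    if nulls not in ("first", "last"):
        raise ValueError("nulls must be 'first' or 'last'")
    present = [v for v in values if v is not None]
    null_part = [None] * (len(values) - len(present))
    # -ci compares casefolded values; -cs (default) compares by code point.
    present.sort(key=str.casefold if ci else None, reverse=desc)
    return null_part + present if nulls == "first" else present + null_part


print(sort_partition(["b", None, "A", "a"]))                           # [None, 'A', 'a', 'b']
print(sort_partition(["b", None, "A", "a"], desc=True, nulls="last"))  # ['b', 'a', 'A', None]
```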
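The -fileset warning means that a -key appearing before any -table implicitly starts the first lookup table. The following Python sketch rewrites a list of lookup-related tokens into the explicit form. The function is hypothetical. It assumes the tokens contain only the lookup options; -sort and -part also use -key, and the sketch would misread them.

```python
def normalize_lookup_args(tokens: list) -> list:
    """Insert the implicit -table before a -key that precedes any -table."""
    out = []
    seen_table = False
    for tok in tokens:
        if tok == "-table":
            seen_table = True
        elif tok == "-key" and not seen_table:
            # Keys never attach to a fileset; they open the first lookup table.
            out.append("-table")
            seen_table = True
        out.append(tok)
    return out


print(normalize_lookup_args(["-fileset", "file", "-key", "field"]))
# ['-fileset', 'file', '-table', '-key', 'field']
```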