Defining wrapped stages

Define a Wrapped stage when you want an IBM® InfoSphere® DataStage® stage to execute a UNIX command.

About this task

You define a wrapper file that handles arguments for the UNIX command and inputs and outputs. The Designer provides an interface that helps you define the wrapper. The stage will be available to all jobs in the project in which the stage was defined. You can make it available to other projects using the Designer Export facilities. You can add the stage to your job palette using palette customization features in the Designer.

When defining a Wrapped stage you provide the following information:

The UNIX command that you wrap can be a built-in command, such as grep, a utility, such as SyncSort, or your own UNIX application. The only limitation is that the command must be "pipe-safe": a pipe-safe UNIX command reads its input sequentially, from beginning to end.
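To make the pipe-safe requirement concrete, here is a minimal sketch of a filter that reads its input strictly from beginning to end. The function name `pipe_safe_filter` and the sample data are hypothetical, and `io.StringIO` merely stands in for standard in; this is not code taken from InfoSphere DataStage.

```python
import io
from typing import Iterator, TextIO


def pipe_safe_filter(stream: TextIO, needle: str) -> Iterator[str]:
    """Yield the lines that contain `needle`, reading the stream front to back."""
    for line in stream:  # sequential read only; never calls seek() or tell()
        if needle in line:
            yield line


# io.StringIO stands in for standard in (illustrative data only).
records = io.StringIO("alpha\nbeta\nalphabet\n")
matches = list(pipe_safe_filter(records, "alpha"))
print(matches)
```

A command that needed to seek backwards in its input, or to read it twice, could not run this way, because a pipe delivers the data only once.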

You need to define metadata for the data being input to and output from the stage. You also need to define the way in which the data will be input or output. UNIX commands can take their inputs from standard in, or another stream, a file, or from the output of another command via a pipe. Similarly data is output to standard out, or another stream, to a file, or to a pipe to be input to another command. You specify what the command expects.

InfoSphere DataStage handles data being input to the Wrapped stage and presents it in the specified form. If you specify a command that expects input on standard in, or another stream, InfoSphere DataStage presents the input data from the job's data flow as if it were on standard in. Similarly, it intercepts data output on standard out, or another stream, and integrates it into the job's data flow.
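The following sketch illustrates the general idea of presenting rows on standard in and intercepting standard out. It assumes nothing about how InfoSphere DataStage implements this internally. `CHILD_COMMAND` and `run_wrapped` are hypothetical names, and a small Python one-liner stands in for the wrapped UNIX command so that the example runs anywhere.

```python
import subprocess
import sys
from typing import List

# Hypothetical stand-in for the wrapped UNIX command: it upper-cases each
# line read from standard in and writes the result to standard out.
CHILD_COMMAND = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())",
]


def run_wrapped(command: List[str], rows: List[str]) -> List[str]:
    """Feed rows to the command on stdin and return the rows it writes to stdout."""
    completed = subprocess.run(
        command,
        input="".join(row + "\n" for row in rows),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.splitlines()


output_rows = run_wrapped(CHILD_COMMAND, ["a,1", "b,2"])
print(output_rows)
```

The command itself is unaware of where its input comes from or where its output goes, which is why any pipe-safe command can be wrapped.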

You also specify the environment in which the UNIX command will be executed when you define the wrapped stage.

Procedure

  1. Do one of the following:
    • Choose File > New from the Designer menu. In the New dialog box, open the Other folder, select the Parallel Stage Type icon, and click OK.
    • Select a folder in the repository tree and choose New > Other > Parallel Stage > Wrapped from the shortcut menu.
  2. In either case, the Stage Type dialog box appears, with the General page on top.
  3. Fill in the fields on the General page as follows:
    • Stage type name. This is the name by which the stage will be known to InfoSphere DataStage. Avoid using the same name as an existing stage or the name of the actual UNIX command you are wrapping.
    • Category. The category that the new stage will be stored in under the stage types branch. Type in or browse for an existing category or type in the name of a new one. The category also determines what group in the palette the stage will be added to. Choose an existing category to add to an existing group, or specify a new category to create a new palette group.
    • Parallel Stage type. This indicates the type of new Parallel job stage you are defining (Custom, Build, or Wrapped). You cannot change this setting.
    • Wrapper Name. The name of the wrapper file InfoSphere DataStage will generate to call the command. By default this will take the same name as the Stage type name.
    • Execution mode. Choose the default execution mode. This is the mode that will appear in the Advanced tab on the stage editor. You can override this mode for individual instances of the stage as required, unless you select Parallel only or Sequential only.
    • Preserve Partitioning. This shows the default setting of the Preserve Partitioning flag, which you cannot change in a Wrapped stage. This is the setting that will appear in the Advanced tab on the stage editor. You can override this setting for individual instances of the stage as required.
    • Partitioning. This shows the default partitioning method, which you cannot change in a Wrapped stage. This is the method that will appear in the Inputs Page Partitioning tab of the stage editor. You can override this method for individual instances of the stage as required. See the InfoSphere DataStage Parallel Job Developer Guide for a description of the partitioning methods.
    • Collecting. This shows the default collection method, which you cannot change in a Wrapped stage. This is the method that will appear in the Inputs Page Partitioning tab of the stage editor. You can override this method for individual instances of the stage as required.
    • Command. The name of the UNIX command to be wrapped, plus any required arguments. The arguments that you enter here are ones that do not change with different invocations of the command. Arguments that need to be specified when the Wrapped stage is included in a job are defined as properties for the stage.
    • Short Description. Optionally enter a short description of the stage.
    • Long Description. Optionally enter a long description of the stage.
  4. Go to the Creator page and optionally specify information about the stage you are creating. We recommend that you assign a release number to the stage so you can keep track of any subsequent changes.

    You can specify that the actual stage will use a custom GUI by entering the ProgID for a custom GUI in the Custom GUI Prog ID field.

    You can also specify that the stage has its own icon. You need to supply a 16 x 16 pixel bitmap and a 32 x 32 pixel bitmap to be displayed in various places in the InfoSphere DataStage user interface. Click the 16 x 16 Bitmap button and browse for the smaller bitmap file. Click the 32 x 32 Bitmap button and browse for the larger bitmap file. Note that bitmaps with 32-bit color are not supported. Click the Reset Bitmap Info button to revert to using the default InfoSphere DataStage icon for this stage.

  5. Go to the Properties page. This allows you to specify the arguments that the UNIX command requires as properties that appear in the stage Properties tab. For wrapped stages the Properties tab always appears under the Stage page.

    Fill in the fields as follows:

    • Property name. The name of the property. Depending on the Conversion setting, this name can be passed to the command as part of the argument.
    • Data type. The data type of the property. Choose from:

      Boolean

      Float

      Integer

      String

      Pathname

      List

      Input Column

      Output Column

      If you choose Input Column or Output Column, when the stage is included in a job a list will offer a choice of the defined input or output columns.

      If you choose List, open the Extended Properties dialog box from the grid shortcut menu to specify what appears in the list.

    • Prompt. The name of the property that will be displayed on the Properties tab of the stage editor.
    • Default Value. The value the option will take if no other is specified.
    • Required. Set this to True if the property is mandatory.
    • Repeats. Set this to True if the property repeats (that is, you can have multiple instances of it).
    • Conversion. Specifies the type of property as follows:

      -Name. The name of the property will be passed to the command as the argument value. This will normally be a hidden property, that is, not visible in the stage editor.

      -Name Value. The name of the property will be passed to the command as the argument name, and any value specified in the stage editor is passed as the value.

      -Value. The value for the property specified in the stage editor is passed to the command as the argument name. Typically used to group operator options that are mutually exclusive.

      Value only. The value for the property specified in the stage editor is passed as it is.
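The four Conversion settings can be read as rules for turning a property into command-line tokens. The sketch below shows one plausible reading and is not the generated wrapper itself: the function `to_arguments`, the leading hyphen on argument names, and the sample properties for a wrapped `sort` command are all assumptions made for illustration.

```python
from typing import List, Optional


def to_arguments(name: str, value: Optional[str], conversion: str) -> List[str]:
    """Turn one stage property into command-line tokens (illustrative reading only)."""
    if conversion == "-Name":
        return [f"-{name}"]              # the property name itself is the argument
    if conversion == "-Name Value":
        return [f"-{name}", str(value)]  # name as the argument name, value after it
    if conversion == "-Value":
        return [f"-{value}"]             # the value is used as the argument name
    if conversion == "Value only":
        return [str(value)]              # the value is passed as it is
    raise ValueError(f"unknown conversion: {conversion!r}")


# Hypothetical properties for a wrapped `sort` command.
args = (
    to_arguments("r", None, "-Name")
    + to_arguments("k", "2", "-Name Value")
    + to_arguments("mode", "n", "-Value")
    + to_arguments("path", "/tmp/in.txt", "Value only")
)
print(args)
```

Under these assumptions, the -Value conversion suits a property such as a sort mode, where the job designer picks one of several mutually exclusive flags.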

  6. If you want to specify a list property, or otherwise control how properties are handled by your stage, choose Extended Properties from the Properties grid shortcut menu to open the Extended Properties dialog box.

    The settings you use depend on the type of property you are specifying:

    • Specify a category to have the property appear under this category in the stage editor. By default all properties appear in the Options category.
    • If you are specifying a List property, specify the possible values for list members in the List Value field.
    • If the property is to be a dependent of another property, select the parent property in the Parents field.
    • Specify an expression in the Template field to have the actual value of the property generated at compile time. It is usually based on values in other properties and columns.
    • Specify an expression in the Conditions field to indicate that the property is only valid if the conditions are met. The specification is a list of conditions, separated by the bar character '|', that are ANDed together. For example, if the specification were a=b|c!=d, then this property would only be valid (and therefore only available in the GUI) when property a is equal to b, and property c is not equal to d.

      Click OK when you are happy with the extended properties.
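The Conditions syntax can be sketched as a small evaluator. This is an illustration of the a=b|c!=d example only, not the Designer's own parser: the function `condition_holds` is hypothetical, and it assumes that the right-hand side of each condition is a literal value and that only the = and != operators occur.

```python
from typing import Dict


def condition_holds(spec: str, properties: Dict[str, str]) -> bool:
    """Return True when every '|'-separated condition in spec is satisfied."""
    for clause in spec.split("|"):
        if "!=" in clause:  # check '!=' first, because it also contains '='
            name, expected = clause.split("!=", 1)
            if properties.get(name.strip()) == expected.strip():
                return False
        elif "=" in clause:
            name, expected = clause.split("=", 1)
            if properties.get(name.strip()) != expected.strip():
                return False
        else:
            raise ValueError(f"unsupported condition: {clause!r}")
    return True  # all conditions held, so they are ANDed together


print(condition_holds("a=b|c!=d", {"a": "b", "c": "x"}))
print(condition_holds("a=b|c!=d", {"a": "b", "c": "d"}))
```

The first call satisfies both conditions; the second fails because property c equals d.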

  7. Go to the Wrapped page. This allows you to specify information about the command to be executed by the stage and how it will be handled.

    The Interfaces tab is used to describe the inputs to and outputs from the stage, specifying the interfaces that the stage will need to function.

    Details about inputs to the stage are defined on the Inputs sub-tab:

    • Link. The link number. This is assigned for you and is read-only. When you actually use your stage, links are assigned in the order in which you add them: the first link is taken as link 0, the second as link 1, and so on. You can reassign the links using the stage editor's Link Ordering tab on the General page.
    • Table Name. The metadata for the link. You define this by loading a table definition from the Repository. Type in the name, or browse for a table definition. Alternatively, you can specify an argument to the UNIX command which specifies a table definition. In this case, when the wrapped stage is used in a job design, the designer will be prompted for an actual table definition to use.
    • Stream. Here you can specify whether the UNIX command expects its input on standard in, or another stream, or whether it expects it in a file. Click on the browse button to open the Wrapped Stream dialog box.

      In the case of a file, you should also specify whether the file to be read is given in a command line argument, or by an environment variable.

      Details about outputs from the stage are defined on the Outputs sub-tab:

    • Link. The link number. This is assigned for you and is read-only. When you actually use your stage, links are assigned in the order in which you add them: the first link is taken as link 0, the second as link 1, and so on. You can reassign the links using the stage editor's Link Ordering tab on the General page.
    • Table Name. The metadata for the link. You define this by loading a table definition from the Repository. Type in the name, or browse for a table definition.
    • Stream. Here you can specify whether the UNIX command will write its output to standard out, or another stream, or whether it outputs to a file. Click on the browse button to open the Wrapped Stream dialog box.

      In the case of a file, you should also specify whether the file to be written is specified in a command line argument, or by an environment variable.
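The Stream settings describe what the command itself expects. As an illustration from the command's side, the sketch below shows a hypothetical command that looks for its input file in a command-line argument, then in an environment variable, and otherwise reads standard in. The function `open_input` and the variable name `WRAPPED_INPUT_FILE` are invented for this example.

```python
import os
import sys
import tempfile
from typing import List, Mapping, TextIO


def open_input(argv: List[str], env: Mapping[str, str]) -> TextIO:
    """Pick the input source: file argument, then environment variable, then stdin."""
    if len(argv) > 1:
        return open(argv[1], "r")
    if "WRAPPED_INPUT_FILE" in env:  # hypothetical variable name
        return open(env["WRAPPED_INPUT_FILE"], "r")
    return sys.stdin


# Write a one-row sample file to read back (illustrative data only).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("row1\n")
    sample_path = handle.name

with open_input(["cmd", sample_path], {}) as source:
    from_argument = source.read()
with open_input(["cmd"], {"WRAPPED_INPUT_FILE": sample_path}) as source:
    from_environment = source.read()
os.remove(sample_path)

print(from_argument == from_environment)
```

Whichever convention the real command follows is the one to declare in the Wrapped Stream dialog box.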

      The Environment tab gives information about the environment in which the command will execute.

      Set the following on the Environment tab:

    • All Exit Codes Successful. By default InfoSphere DataStage treats an exit code of 0 as successful and all others as errors. Select this check box to specify that all exit codes should be treated as successful other than those specified in the Failure Codes grid.
    • Exit Codes. The use of this depends on the setting of the All Exit Codes Successful check box.

      If All Exit Codes Successful is not selected, enter the codes in the Success Codes grid which will be taken as indicating successful completion. All others will be taken as indicating failure.

      If All Exit Codes Successful is selected, enter the exit codes in the Failure Codes grid which will be taken as indicating failure. All others will be taken as indicating success.

    • Environment. Specify environment variables and settings that the UNIX command requires in order to run.
  8. When you have filled in the details in all the pages, click Generate to generate the stage.
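The exit-code rules set on the Environment tab in step 7 matter for commands such as grep, which exits with 1 when no lines match. The sketch below restates those rules as a function. The name `is_success` is hypothetical, and treating an empty Success Codes grid as "only 0 succeeds" is an assumption based on the default behavior described above.

```python
from typing import Collection


def is_success(
    exit_code: int,
    all_successful: bool,
    success_codes: Collection[int] = (),
    failure_codes: Collection[int] = (),
) -> bool:
    """Classify an exit code using the Environment tab rules (illustrative only)."""
    if all_successful:
        # Every code succeeds except those listed as failures.
        return exit_code not in failure_codes
    # Assumption: with no success codes listed, only 0 counts as success.
    accepted = success_codes or (0,)
    return exit_code in accepted


print(is_success(1, False))                        # default rules
print(is_success(1, False, success_codes={0, 1}))  # e.g. grep with no matches
print(is_success(2, True, failure_codes={2}))
print(is_success(7, True, failure_codes={2}))
```

Under these assumptions, listing 0 and 1 as success codes lets a wrapped grep finish successfully even when it finds no matching rows.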