< Previous | Next >

Lesson 3.4: Completing multiple transformations in a single job

In this lesson, you further transform your data to apply business rules, then complete a lookup that uses an additional reference table.

Overview

When you completed Lesson 3.3, you viewed the output data to review your results. You realized that one of the columns in the GlobalCo_BillTo.ds data set contains extraneous data, and that the extraneous data is now in your output data set. Some of the values in the SETUP_DATE column contain a special handling code in addition to the date that the account was set up. These values must be separated into the appropriate columns before the data is combined with the WorldCo customer data.

To separate the values, you add transformation logic to extract the special handling code from the SETUP_DATE column into a separate column. You then add a lookup operation to look up the text description that corresponds to the code from the lookup table that you populated in Lesson 2.3. You write this description to the output data, and also add a row count.

Tasks in this lesson

  1. Adding transformation and lookup operations
  2. Adding transformation logic to your job
  3. Configuring the Lookup operation

Adding transformation and lookup operations

To complete multiple transformations in your job, add another Transformer stage and additional Lookup stages.

  1. Open the Dev_CustData_CleansePrepare job.
  2. Delete the GlobalCo_CountryLookup_DS stage. You create a new data set for the output.
  3. Add a Transformer stage to the right of the GlobalCo_Country_LKP stage.
    1. Rename the stage as SpecHandling_Extract_TFM.
    2. Drag the lnk_CountryCode_Out link to the SpecHandling_Extract_TFM stage.

      Because logic remains with the link where you defined it, the columns that you defined flow from the first lookup to the new Transformer stage.

  4. Add a Lookup stage to the right of the SpecHandling_Extract_TFM stage.
    1. Rename the stage as SpecHandling_Code_LKP. This stage looks up the special handling code to find the corresponding text description.
    2. Link the SpecHandling_Extract_TFM stage to the SpecHandling_Code_LKP stage.
    3. Rename the link as lnk_SpecHandlingCode_Out.
  5. Add a Lookup File Set stage above the SpecHandling_Code_LKP stage.
    1. Rename the stage as SpecHandling_Desc_FS.
    2. Link the SpecHandling_Desc_FS stage to the SpecHandling_Code_LKP stage.
    3. Rename the link as lnk_SpecHandlingDesc_Ref.
  6. Add a Data Set stage under the SpecHandling_Code_LKP stage.
    1. Rename the stage as GlobalCo_CustData_DS.
    2. Link the GlobalCo_CustData_DS stage to the SpecHandling_Code_LKP stage.
    3. Rename the link as lnk_CustData_Out.
    Your job looks like the following job:
    Figure 1. Your job includes a second Transformer stage to further cleanse the GlobalCo customer data.
    The graphic shows the Dev_CustData_CleansePrepare job with the Transformer stage, Lookup stage, and Lookup File Set stage that you added

Adding transformation logic to your job

Add transformation logic to extract the special handling code, then add a row count to the output data.
  1. Open the SpecHandling_Extract_TFM stage.
  2. In the left pane of the Transformer Stage window, select the following columns in the lnk_CountryCode_Out link and drag them to the lnk_SpecHandlingCode_Out link in the right pane.

    CUSTOMER_NUMBER

    CUST_NAME

    ADDR_1

    ADDR_2

    CITY

    REGION_CODE

    ZIP

    TEL_NUM

  3. In the lower right pane, add the following columns for the lnk_SpecHandlingCode_Out link. Use the default values for any fields that are not specified in the following table.
    Important: The columns that you add are highlighted in red in the lnk_SpecHandlingCode_Out link because they do not have valid derivations. You add the derivations to specify how each column is processed by the Transformer stage.
    Column name SQL Type Length Nullable Description
    SOURCE Char 10 No Specifies the company that the customer data comes from
    RECNUM Char 10 No Identifies the number of rows for each record
    SETUP_DATE Char 10 Yes Checks whether the SETUP_DATE column contains a date
    SPECIAL_HANDLING_CODE Integer 10 Yes Specifies whether the SETUP_DATE column contains a special handling code
  4. In the lnk_SpecHandlingCode_Out link container, double-click the Derivation field for the SOURCE column.
    1. In the expression editor, type 'GlobalCo':.
    2. Move your cursor to the right of the text that you entered, right-click, and click Input Column.
    3. Select the COUNTRY column, then press Enter to submit the derivation.

      This derivation adds the prefix "GlobalCo" to the two-letter country code. For example, when you run your job, a customer record with the country code US becomes GlobalCoUS. The following code shows the derivation for the SOURCE column:

      'GlobalCo': lnk_CountryCode_Out.COUNTRY
  5. Double-click the Derivation field for the REC_NUM column.
    1. In the expression editor, type 'GC':.
    2. Move your cursor to the right of the text that you entered, right-click, and click System Variable.
    3. Select @OUTROWNUM, then press the Enter key to submit the derivation.

      This derivation adds row numbers to your output so that the formatting is easier to read. Also, other developers can refer to customer records by their row number rather than randomly searching through the output. The following code illustrates the derivation for the REC_NUM column:

      'GC': @OUTROWNUM
  6. In the Transformer Stage editor toolbar, click the Stage Properties icon (The Stage Properties icon).
    1. In the Transformer Stage Properties window, click the Stage Variables tab.
    2. Add stage variables. Use the default values for any fields that are not specified in the following table.
      Name SQL type Precision
      SpecHandlingExtract Char 1
      TrimSetupDate VarChar 10
    3. Click OK to close the Transformer Stage Properties window.

      The stage variables that you created are listed in the Stage Variables container.

  7. In the Stage Variables container, double-click the Derivation field for the SpecHandlingExtract variable.
    1. Enter the following expression, then press Enter. You can right-click in the expression editor to add columns like you did in step 4.
      IF LEN(lnk_CountryCode_Out.SETUP_DATE) < 2 THEN lnk_CountryCode_Out.SETUP_DATE
      ELSE FIELD(lnk_CountryCode_Out.SETUP_DATE, ' ',2)

      This expression specifies that, if the number of characters in the SETUP_DATE column is less than two, then use the value from the SETUP_DATE column in the SPEC_HANDLING_CODE column. Otherwise, the value of the SETUP_DATE column is returned, and the two-digit character is extracted from that column and moved to the SPEC_HANDLING_CODE column.

    2. Drag the derivation for the SpecHandlingExtract variable to the Derivation field for the SPEC_HANDLING_CODE column in the lnk_SpecHandlingCode_Out link container.

      A line is shown between the stage variable and the column, and the name SpecHandlingExtract is listed in the Derivation field. For each row that is processed, the value of the SpecHandlingExtract variable is written to the SPEC_HANDLING_CODE column.

  8. In the Stage Variables container, double-click the Derivation field for the TrimSetupDate variable.
    1. Type the following expression, then press Enter. You can right-click in the expression editor to add columns like you did in step 4 and step 7.
      IF LEN(lnk_CountryCode_Out.SETUP_DATE) < 3 THEN '01/01/0001'
      ELSE FIELD(lnk_CountryCode_Out.SETUP_DATE, ' ',1)

      This expression says that, if the SETUP_DATE column does not contain a date, then set the value of the TrimSetupDate variable to '01/01/0001'. Otherwise, the value of the SETUP_DATE column is used in the TrimSetupDate variable.

    2. Drag the derivation for the TrimSetupDate variable to the Derivation field for the SETUP_DATE column in the lnk_SpecHandlingCode_Out link container.

      A line is shown between the stage variable and the column, and the name TrimSetupDate is listed in the Derivation field. For each row that is processed, the value of the TrimSetupDate variable is written to the SETUP_DATE column.

  9. Click OK to close the Transformer Stage window, and then save your job.
Your stage editor looks like the following stage editor:
Figure 2. You map the columns from the input link to the output link. The stage variables that you add provide additional transformation logic to fix inconsistencies and errors in the GlobalCo customer data.
The graphic shows the column mapping and derivations for the SpecHandling_Extract_TFM stage

Configuring the Lookup operation

In this exercise, you configure the stages that are required to look up the special handling code and write it to an output data set.
  1. Open the SpecHandling_Desc_FS stage.
    1. Set the Lookup File Set property to reference the SpecHandlingCode_In_Source parameter.
    2. Click the Columns tab and load the SpecialHandling.csv table parameter.
    3. Click OK to close the stage editor.
  2. Open the SpecHandling_Code_LKP stage.
    1. Select all of the columns from the lnk_SpecHandlingCode_Out link container and drag them to the lnk_CustData_Out link container.
    2. Select the DESCRIPTION column from the lnk_SpecHandlingDesc_Ref link container and drag it to the lnk_CustData_Out link container.
    3. In the lower right pane, type a description for the DESCRIPTION column. For example, Description for the special handling code.
    4. In the lnk_SpecHandlingDesc_Ref link container, double-click the Condition bar.
    5. Select Continue in the Lookup Failure field, then click OK.

      When you inspected the output data at the end of Lesson 3.3, you noticed that few rows in the sample data contain a special handling code. Therefore, you want the job to continue processing data if the rows that do not contain a code are rejected.

      Your stage editor looks like the following stage editor:

      Figure 3. You map the columns from the input link to the output link. The SPECIAL_HANDLING_CODE column is used as a key in the Lookup stage to reference customer data.
      The graphic shows the column mapping for the SpecHandling_Code_LKP stage
    6. Click OK to close the SpecHandling_Code_LKP stage editor.
  3. Open the GlobalCo_CustData_DS stage.
    1. In the File field, click the browse icon, then click Insert job parameter > New.
    2. In the Job Properties window, enter the following values for the job parameter.
      Parameter name Prompt Type Default Value Help Text
      CustData_Output_dataset Target file for the customer data output Pathname

      tutorial_folder\

      GlobalCo_Target

      _Output.ds

      Target file that contains the cleansed GlobalCo customer data
      tutorial_folder is the folder where you want to save the files for the tutorial, such as C:\IBM\InformationServer\Server\Projects\Tutorial\Tutorial_files.
    3. Click OK to close the Job Properties window.
    4. In the File property, specify the CustData_Output_dataset parameter, then press the Enter key.
    5. Click OK to close the GlobalCo_CustData_DS stage editor.
  4. Save, compile, and run the SpecHandling_Extract_TFM job.
  5. After the job runs successfully, right-click the GlobalCo_CustData_DS stage and click View lnk_CustData_Out data.
  6. In the Resolve Job Parameters window, click OK.

    The Data Browser window opens, displaying the cleansed customer data.

In the SOURCE column, the two-digit country code is prefaced by GlobalCo. Including this information is important so that developers can distinguish between the GlobalCo customer data and the WorldCo customer data when the cleansed data is merged.

The SPECIAL_HANDLING_CODE column contains a special handling code for each customer record. The corresponding description for each code is listed in the DESCRIPTION column. The shipping department at GlobalCo Worldwide can use this information to determine when each customer receives their delivery.

What's next

In this lesson, you applied stricter data typing to further transform and cleanse the GlobalCo billing data. To this point, you wrote your output data to data sets. In the following module, you write the output data to a relational database so that other developers in the GlobalCo Worldwide organization can easily access the data.

You completed the following tasks:
  • Defined stage variables in a Transformer stage
  • Filtered column metadata in a Transformer stage
  • Used system variables to generate output column values
  • Configured Lookup stages to enhance your output data
  • Viewed the job log to better understand the components that your job is composed of
< Previous | Next >