Configuring the Unstructured Data stage as a source

When you create an Unstructured Data stage job, you must configure the Unstructured Data stage so that it extracts the data and generates the output in the data type that the user requires.

Procedure

  1. On the parallel canvas, double-click the Unstructured Data stage.
  2. From the Document Type list, select Excel.
  3. Click Configure to configure additional properties, and define mapping between Microsoft Excel items and DataStage columns.
  4. Specify the file name details in the Data source pane:
    1. Specify the name of the file from which you want to read the data, in theFile name field.
      Job compilation fails if this field is empty. If the file is password protected, specify password in the Password field.
    2. Optional: If you specify wildcard characters in the file name, select Use template file for design time and specify a template file name.
      Template file is used for subsequent configuration steps, and not used at runtime. Specify a value for Template password if the specified template file is password protected.
    3. Optional: Click View to launch the external Microsoft Excel viewer program. You can confirm the content of Microsoft Excel file you are working with.
    4. Click Load.
  5. Specify the data range details to read from the Microsoft Excel file in the Read options pane.
    1. Optional: Specify a value for Range option. If you select Specify the start row, you only need to specify the first row. Unstructured Data stage finds the last row at runtime. If you select Specify the entire data range, you need to specify both start row and end row.
    2. Optional: Specify Range expression. Range expression is a required property at runtime, but it can be empty when clicking Load button. Unstructured Data stage searches the entire document and lists the candidates of data range in the Template data range list box in the Import pane. Range expression property is set with the appropriate value when you click Import in the Import pane.
    3. Optional: If you want to skip any sheet names from range expression, then specify the name in the Specify the Sheet names to skip field. Use this field when the sheet names are omitted from the range expression.
    4. Optional: Specify First row of data ranges. At design time, if you select None, Microsoft Excel column names are expressed in the format: "Column#column number(ColumnExcel column label)" in the Map pane. If you select First row is header , then the first row value is displayed in the Map pane. At runtime if you select None, the first row is extracted. If you select First row is header, the first row is skipped.
  6. Specify data range details to import in the Import pane.
    1. Select one data range from Template data range.
    2. Optional: If you want to extract additional Microsoft Excel items such as document properties, select Property tab and select items to be extracted.
    3. Click Import.
  7. Define the mapping between InfoSphere DataStage columns and Microsoft Excel items in the Map pane.
    1. Define the mapping between InfoSphere DataStage columns and imported Microsoft Excel items.
      You can add InfoSphere DataStage column mappings or change the column order by clicking Up, Down, Insert, orDelete buttons. In InfoSphere DataStage Column, specify the InfoSphere DataStage column name for each Microsoft Excel item. In Microsoft Excel Item, you can select the item you want to map to the InfoSphere DataStage column. All items that can be selected in the Import pane are listed in each cell. In Import Option, you can select the Microsoft Excel item if there is any import options available. For example, If you select Microsoft Excel column in Excel Item, Value, Comment, Author of Comment, Formula, Hyperlink Type, and Hyperlink Address options are available.
    2. Click OK.
  8. Specify required details in the Properties tab and the Advanced tab.
  9. Click OK to save the settings that you specified. to save the settings that you specified.