Validating, testing, and running data flows

When you finish designing a data flow, you can validate it, test it by generating the SQL code, and run it from the Design Studio. You can also run the data flow in the background while you continue to work in the Design Studio.
Important: Running multiple flows in the background concurrently, or starting to run the same flow before the prior run is finished, might result in resource contention problems that could produce unexpected results. These problems could involve files, database tables, created temporary tables, declared temporary tables, and other resources that are common to the flows.
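Shared temporary and log files are among the resources listed above. One way to keep runs apart is to give each run its own directories and enter them as the temporary and log file locations when you run the flow. The following is a minimal sketch under that assumption; the function name make_run_dirs, the directory layout, and the flow name sales_flow are hypothetical and are not part of the Design Studio.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from uuid import uuid4


def make_run_dirs(base_dir, flow_name):
    """Create and return (temp_dir, log_dir) that are unique to one run."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    # A short random suffix keeps two runs started in the same second apart.
    run_id = f"{flow_name}-{stamp}-{uuid4().hex[:8]}"
    run_root = Path(base_dir) / run_id
    temp_dir = run_root / "temp"
    log_dir = run_root / "log"
    # exist_ok=False makes an accidental reuse of a directory fail loudly.
    temp_dir.mkdir(parents=True, exist_ok=False)
    log_dir.mkdir(parents=True, exist_ok=False)
    return temp_dir, log_dir


if __name__ == "__main__":
    # Hypothetical base directory and flow name, for illustration only.
    base = Path(tempfile.gettempdir()) / "dataflow-runs"
    temp_dir, log_dir = make_run_dirs(base, "sales_flow")
    print("Temporary files:", temp_dir)
    print("Log files:      ", log_dir)
```

Separate directories address only file contention. Database tables and temporary tables that flows share still require that the runs do not overlap.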
To validate, test, and run a data flow:
  1. Open the data flow and ensure that it is selected in the editor. The name is highlighted when the data flow is selected. If the data flow is not selected, click in the canvas to select it.
  2. Select Data Flow > Validate.
  3. If the validation fails, correct the data flow:
    1. Review the errors that you see in the data flow canvas. To see the error text, hover over the exclamation mark or red x that appears in the upper-left corner of the operator.
    2. Review additional information by clicking the red x. If additional information exists, the Diagnostic Dialog window opens. Use the message number as a reference if you require technical support. Review the error message and explanation. You can also open the Problems view to see a list of errors. For more information about an error in this list, right-click it and select Show Diagnostic Information. Correct the errors in the flow and run the validation again.
    Figure: Diagnostic dialog example
  4. Optional: To test the data flow by generating and reviewing the SQL code, select Data Flow > Generate Code.
  5. To update the database with the transformed data from the data flow:
    1. Select Data Flow > Execute. The Flow Execution window opens.
    2. On the General page of the run profile:
      • Complete the pages of the wizard.
      • Specify the requirements for running the data flow by selecting either the default run profile or one that has been created for running the flow.
    3. On the Diagnostics page, specify the temporary and log file locations and specify the trace information.
      Tip: To save the temporary files and the log file in unique locations, change the locations each time you run the flow. To log statistics such as the row count for table inserts and the time when each node in the execution plan graph (EPG) is entered and exited, select the Write Statistics check box.
    4. On the Resources page, review the resource profiles. By default, the page shows all of the resource profiles that the process references and that are not variables. To select from the resource profiles that are variables, click Show All. You can also edit or remove a selected resource profile.
      Tip: You can browse the Variables page to review the resource type variables.
    5. On the Variables page, review the variables that are used when the data flow runs. You can edit the current values for these variables before you run the data flow.
  6. Run the data flow by clicking Execute in the Flow Execution window. The Executing flow window opens.
    • To work in the Design Studio while the data flow is running, click Run in Background. This minimizes the window to the lower-right corner of the Design Studio and makes the canvas available. Clicking the icon next to the minimized window opens a default progress view, which closes when the flow stops running.
    • To check the state of the running flow, open the Execution Status view and click the refresh icon. The initial state is Starting. The process then moves to the Started state, the Running state, and finally the Completed state. A process that is no longer running is in the Failed to start or Completed state, or in the Cancelled state if you cancelled the run.
      Note: If you click Cancel in the Execution Status window to stop the execution of a running data flow, any remaining code units of the data flow that have not yet started running do not run. Therefore, you might need to manually clean up any temporary tables or views that were created by the unfinished data flow run before you run that data flow again.
  7. When the data flow is finished running, review the following information:
    • Review the Execution Result window to see a log of the data flow run. Use this information to debug problems if the run fails. You can save the results to a text file.
    • Review the Execution Status view. This view shows a table of information for all of the processes that you ran. Each row in the table represents one run process, and the information is available until you close the Design Studio. Right-click a row to delete the process from the table or to view the entire log file for the process. You can delete a process when it is in the Failed to start, Cancelled, or Completed state. You can also review a log of the process activities. To check the process activity while an EPG is running, review the Progress column. If the Progress column is not visible in the Execution Status view, go to Window > Preferences > Data Warehousing > Execution Status and select the Progress check box.
    • Review the Tail Log page for a selected run to see the last few hundred lines of the log file.
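If you saved the log file to a known location, you can approximate the Tail Log page outside the Design Studio by reading the end of the file. The following is a minimal sketch, not a description of how the Tail Log page works; the function name tail, the default of 300 lines, and the UTF-8 encoding are assumptions.

```python
from collections import deque


def tail(path, max_lines=300):
    """Return the last max_lines lines of a text file as a list of strings."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the newest max_lines lines in memory,
        # so the whole log file is never held at once.
        newest = deque(handle, maxlen=max_lines)
    return [line.rstrip("\n") for line in newest]
```

For example, tail(log_path, 50) returns up to the last 50 lines of the file at log_path, where log_path is the log file location that you specified on the Diagnostics page.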

