< Previous | Next >

Module 2: Designing data flows

In this module, you will design three data flows to populate the tables in your data warehousing project. Data flows define the SQL-based data movement and transformation activities that run in a DB2® database.

You can design a data flow to create SQL-based data movement and transformation by placing operators in a canvas, defining their properties, and connecting their ports. Data flows extract data from data sources (for example, from flat files or relational tables), transform the data by using operators, and then export it into data targets (for example, a data warehouse, a file, or a staging table).

The Design Studio provides an intuitive way to visualize and design data flows. Database metadata, which is imported to or created in the Data Project Explorer, describes the schemas of database objects. Graphical operators model the various steps of a data flow activity. By arranging these source, transform, and target operators in a canvas work area, connecting them, and defining their properties, you can create models that meet your business requirements.

In this module, you first create a variable that you will use in your flows, and then you create three data flows: populate_branch_location, populate_time_dimension, and populate_order_fact. These flows are used to extract data from the tables in the GOSALES schema and populate the following tables: MARTS.BRANCH_LOCATION, MARTS.TIME_DIMENSION, and MARTS.ORDER_FACT. After you create each data flow, you generate its SQL code. SQL code defines the operations that are performed by the SQL execution database when you run a data flow. Finally, you will verify the logic in your flows by running the populate_order_fact data flow in debug mode.

Learning objectives

After completing the lessons in this module, you will understand the concepts and know how to do the following tasks:

Time required

This module should take approximately 1.5 hours to complete.
< Previous | Next >



Feedback | Information roadmap