Before you start
The Slowly Changing Dimension stage was added in the 8.0 release of InfoSphere Information Server DataStage. It is designed specifically to support the types of activities required to populate and maintain records in star schema data models, specifically dimension table data. The Slowly Changing Dimension stage encapsulates all of the dimension maintenance logic — finding existing records, generating surrogate keys, checking for changes, and what action to take when changes occur. In addition, you can associate dimension record surrogate key values with source records, which eliminates the need for additional lookups in later processing.
This tutorial is designed to introduce you to using the Slowly Changing Dimension stage on the Information Server DataStage parallel canvas. The tutorial uses a simplified example scenario that focuses on Slowly Changing Dimension functionality. Actual business scenarios may require different approaches to the job design used in this tutorial's example. The volume of data processed in the tutorial is intentionally small to make it easier to understand the processing that is taking place.
The material in the SCD_Tutorial.zip file in the Download section is built to run on a Windows platform with a DB2 database. You can modify the material to run on a different platform or to use a different database.
In this tutorial, you will learn how to design a job that uses the Slowly Changing Dimension stage to perform updating and loading of dimension and fact tables. After completion, you will be able to configure the SCD stage for history-tracking changes and in-place changes, and use the output of the stage to update an associated fact table.
This tutorial is written for DataStage developers who are familiar with the DataStage Parallel Edition design canvas. You will also benefit if you already have a knowledge of star schema design concepts (including fact and dimension tables), the use of surrogate keys, and the usual methodology for updating dimension tables.
To create the job in this tutorial, you need an Information Server DataStage 8.x installation that is licensed to use the parallel engine. You also need a DataStage Designer client and access to a DataStage project where you can create, import, compile, and run DataStage jobs.
To use the sample scripts in the SCD_Tutorial.zip download, your Information Server must be installed on a Windows® OS with access to a DB2 database. However, you can also modify the scripts to work on other operating systems and with a different database.