Advanced topics for DWE users Part 2: Best practices for choosing DB2 Data Warehouse Edition SQL Warehousing variable phases

IBM® DB2® Data Warehouse Edition SQL Warehousing Tool provides variables and variable groups to enable you to customize your data transformations and work flows. Optimize your DB2 Data Warehouse Edition flow design with a good understanding of variable usage in order to promote user satisfaction, support flow reuse, and help reduce administration overhead. In this article, find recommendations and best practices on how to best utilize different variable phases to maximize the benefits of variable usage in SQL Warehousing Tool data flows and control flows during design time, and how each variable phase can impact the runtime behavior when these flows are executed.

Cheung-Yuk Wu (wu@us.ibm.com), DB2 Data Warehouse Edition SQL Warehousing Development, IBM

Cheung-Yuk Wu has over 15 years of relational database tool and Business Intelligence application development experience in DB2, Oracle, Sybase, Microsoft SQL Server and Informix on Windows and UNIX platforms at IBM. She was the architect for the IBM DB2 Data Warehouse Center and Warehouse Manager. She also developed products including Tivoli for DB2, IBM Data Hub for UNIX, and QMF. She was also a DBA for DB2, CICS and IMS at the IBM San Jose Manufacturing Data Center. She received her Bachelor of Science degree in Computer Science from the California Polytechnic State University, San Luis Obispo.



Marion Behnen (mbehnen@us.ibm.com), DB2 Data Warehouse Edition SQL Warehousing Development, IBM

Marion Behnen is a Senior Software Engineer within the IBM Software Group and works as a technical lead for the Data Warehouse Edition. She has worked as an architect and designer for the SQL Warehousing Runtime and Administration, which included the runtime metadata and statistics. She is a member of the DWE architecture board and participates in future DWE product strategy. She has more than 20 years of experience with database application development, business reporting, data warehousing, and business process integration. Prior to joining IBM, she was involved with many aspects of business process and data integration, in particular for the manufacturing industry.



01 February 2007


Introduction

Enterprise warehousing data is becoming more and more complex, with evolving operational data and increasing data volume. This phenomenon may be partly due to internal consolidation, external acquisition, organization changes, and business expansion. Many corporations require rapid warehousing development and deployment cycles to achieve near real-time warehousing for data analysis and prompt intelligent business decisions. Providing high availability in warehousing data and controlling maintenance costs become critical to the success of the business.

New business decisions and strategy changes in an organization often affect the underlying user data models and the dependent system environments. Existing logical and physical data models and authorizations may need to change to support the new user data models. New hardware, operating system platforms, file system structure, and intra- and inter-company network restructuring may also be necessary. These adjustments lead to modifications and enhancements in the existing warehouse models in order to maintain accurate data warehouses or data marts for further data analysis and decision-making.

When deploying warehouse models from development, to test and production, it's a big challenge for corporations to maintain high availability for their warehouse data and to adapt to new business models on demand with minimum time, cost, and risk. Having a robust migration path from development, to test and production encourages your users to deploy new and changed business models with confidence and reduces production down time.

The DB2 Data Warehouse Edition SQL Warehousing Tool provides a large number of data flow operators to perform fine-grained data transformation and a number of control flow operators to specify the execution hierarchy of data flows and other activities. To ensure maintainability, flexibility, and scalability for these flow models, the tool provides parameterization through attributes and variables. You can design these models once and deploy them wherever the DB2 Data Warehouse Edition Administration Console is installed. You have total control over the level of flow model customization in the runtime environment by using different variable phases.


Understanding SQL Warehousing parameterization

When you are designing a data flow or control flow, the runtime values for the execution of a process instance might differ. For example, warehouse applications may encounter changes in the runtime environment or may rely on information from a prior activity. To adapt more easily to changing requirements, you can assign variables to properties in the flow model instead of using fixed values. When a process instance is executed, the variable values will be substituted based on the change phase setting of each variable. This enables you to share and reuse the same flow models for different scenarios, for example, for test and production environments.

Variables

You can use a variable to defer the value assignments of selected flow properties such as file names, database schemas, table names, and database names to a later phase in the lifecycle of a flow. Variables and variable groups can be re-used and shared among flow models within the same SQL Warehousing project.

Variables can be concatenated with literal strings to form an aggregated value for an operator property. For example, a fully qualified file name property can contain a directory name variable, a file separation literal, and a file name variable. You can also store the result of an operator in a variable and pass the variable to other operators.

Each user-defined variable requires the selection of a variable type and a change phase. Variable types characterize the variables and are used to restrict the applicable change phase as appropriate and to validate the values.

Note: For a list of variable types, see the Appendix.

Change phases

SQW provides different change phases for variables so you can decide when a variable value might be changed. Setting a particular change phase will allow or deny the ability to modify the variable value in subsequent phases. The following table provides a short overview of the supported change phases:

Table 1. Variable change phases
PhaseDescription
DesignThe variable value can be changed while designing a flow and testing its execution in the DesignStudio
Deployment PreparationThe variable value is fixed when the application package is created in the DesignStudio and cannot be changed later
DeploymentThe variable value can be changed during the deployment of an SQW application in the DWE Administration Console
RuntimeThe variable value can be modified after deployment. All instances of a process share the variable value
Execution InstanceThe variable value is prompted for each process instance. It can either be supplied directly when the instance is started, or provided through a process profile

Choosing variable phases

The flexibility through parameterization comes with a trade-off that you must consider at design time. Using variables inside a flow model allows you to react to new and changed requirements quickly and efficiently without redesigning and redeploying the flow model. However, extensive use of parameterization adds complexity and places a cost on flow model management and operation to manage variable values.

In order to strike a good balance between designing a flexible flow model and controlling increased administration costs, it is recommended to observe the following guidelines when choosing variable change phases.

  • The "Design" and "Deployment Preparation" change phases are recommended for settings that are often changed during the design and testing of a flow model. For example, to experiment for desirable results, you can choose the "Design" phase for a variable that contains command execution arguments. You can use the "Deployment Preparation" change phase to set the default variable values for generating an SQL Warehousing application in anticipation of potential deployment environments. Once the application package has been created for deployment to the DWE Administration Console, the values for the "Design" and "Deployment Preparation" variables are fixed.

    Note: Variable values with these phase settings are not saved into the statistics for the process instance. They are only available for reference (read-only) until the application is uninstalled.

  • The "Deployment" change phase is recommended for settings that depend on the actual runtime system to reflect persistent properties which are not expected to change after deployment. You might choose the "Deployment" phase for a variable that, for example, points to the installation location of an executable used in a flow, or that contains the name of the sender for email activities, or that contains the name of a schema in a source database.

    Note: "Deployment" phase variable values are not saved into the statistics for the process instances. They are only available until the application is uninstalled.

  • The "Runtime" change phase is recommended if a variable is likely to change its value because of a change in the business model after the flow models are deployed to the DWE Administration Console. All instances of a deployed process share the current value of a "Runtime" phase variable. If the "Runtime" phase variable value is changed, any subsequent instance started in the DWE Administration Console will use the new value.

    If you expect a particular variable value that is referenced in a process to change, define the variable as a "Runtime" variable. An example of a "Runtime" variable is a local FTP file path that will be shared among all instances of the same process.

    Note: "Runtime" phase variables are copied into the SQL Warehousing Runtime statistics table for each instance of a process that uses the variable. The variable values for each instance remain as persistent data in the SQL Warehousing statistics data for problem determination and future reports. The statistics data remains until you delete it explicitly. Contrary to the previous phases, metadata changes have no effect on the information that is available for an instance statistic entry.

  • The "Execution Instance" change phase is recommended if a variable value might differ for each process instance. For example, "Execution Instance" phase variables are useful for a process that loads files of a certain format.
    1. Define a "File"-type variable to hold the filename that should be loaded, and select the "Execution Instance" phase.
    2. Assign the variable to the data flow operator
    3. Complete the flow and package the flow in an application
    4. Deploy the application in the DWE Administration Console
    5. Select a process from the list of processes and start a new instance. You will be prompted for the name of the file that should be loaded with this particular instance

    Thus, if multiple files arrive in a given period of time, a separate process instance can be started for each of the files. The load process is the same, but the variable will hold a different file name for each instance.

    Note: Variable values of phase "Execution Instance" are treated like "Runtime" phase variables for statistics purposes

Some of the examples used for "Design", "Deployment Preparation", and "Deployment" phases might be candidates for "Runtime" phase variables as well. The criteria to determine the change phase for a variable is a probability estimate based on the model designer’s familiarity with the business models and actual runtime environments. The designer should understand how a variable value would change during its lifetime service in an application.

Attributes

Attributes are associated with the code units in an activity, for example, an FTP unit, EMAIL unit, or FILEWAIT unit. Each type has a specific set of attributes whose values determine the execution details. In most cases, code unit types are directly reflected in the type of their containing activity, the one exception are activities which are mapped to data flows.

Note: For a detailed list of attributes, see the Appendix.

Attribute values can be specified as an actual value, or can refer to a variable. Attributes in general are already treated as "Runtime" phase variable values, thus in most cases you do not need to define and manage separate variables for each attribute.

A typical example is the @WAITTIME attribute in a FileWait code unit. The attribute value is specific to this particular activity in this particular process, thus sharing the value through a variable is not desired, but the value might nevertheless need to be changed to adapt to the actual runtime environment. To simplify managing the attribute values they are therefore implicitly managed as "Runtime" phase variables that are specific to the activity.

Instead of an actual value, the attribute value can contain a variable name to point to a shared value. This option should be considered in the following situations:

  • when an attribute needs to be exposed as an "Execution Instance" change phase value. For example, if an activity attribute points to a file name that is different for each instance.
  • when an attribute value is a common value shared within a deployed SQL Warehousing application. For example, if the default email sender attribute is a fixed email address and the sender later leaves the company, using a shared variable to represent the email sender requires only one simple change and all email activities that reference this sender remain current without further modification.

System variables

You can reference system variables in a command execution activity to access some common values. The values for these variables are substituted during the execution of an instance. For example, if the host name of a remote file is used as part of a local directory path name where data for the process is gathered, you can set the @FILE attribute in the FTP activity type to:

@LOGPATH/@RESNAME/ftpfile

These variables will be substituted with the actual log path of the deployed SQL Warehousing application and the resource hostname to build the local file path.

The following is a list of supported system variables in SQW Runtime:

Table 2. SQL Warehousing System Variables
System Variable IDDescription
@HOMESQL Warehousing application home location path
@LOGPATHSQL Warehousing application home log path
@PROCESSThe name of a deployed SQL Warehousing process
@RESNAMEDatabase or System name for the referenced resource
none@test.noneWhen used as the receiver of an EMAIL activity, no email will be sent; the content is instead printed to the log file

Managing runtime variable values

Variable values are shared among all processes and activities that belong to the same SQL warehouse application and are therefore managed at the application level. At application deployment, all variables with their associated default values are stored in the SQW Runtime metadata. When an application variable value is changed later, the new value becomes visible to all processes and activities in that application.

The following rules apply to variable value management in SQL Warehousing.

  1. "Design" and "Deployment Preparation" phase variables are editable in the DWE Design Studio only. Choose DB2 Data Warehouse Edition Design Studio Business Intelligence perspective -> File -> New -> SQL Warehouse Application -> select a project -> Variable Management. Select the variable entry in the list and click the "Edit" button to change the variable properties.
    Figure 1. Manage variable during deployment preparation
    Figure 1. Manage variable during deployment preparation
  2. Once the processes are deployed in the DB2 Data Warehouse Edition Administration Console, "Runtime" and "Execution Instance" variables can be changed in the DWE Administration Console by selecting DB2 Data Warehouse Edition SQL Warehousing -> Data Warehouse Applications -> Manage Data Warehouse Applications -> select an application -> Variables . You can also view the variables for a process in the DWE Administration Console by choosing DB2 Data Warehouse Edition SQL Warehousing -> Processes -> Manage Processes -> select a process -> Variables .
  3. Attributes can be changed in the DWE Administration Console by selecting an activity for a process chosen from the application list or directly in the process list.

    • By application:
      • DB2 Data Warehouse Edition SQL Warehousing -> Data Warehouse Applications -> Manage Data Warehouse Applications -> select an application -> Processes -> select a process -> Activities -> select an activity -> Attributes
    • By process:
      • DB2 Data Warehouse Edition SQL Warehousing -> Processes -> Manage Processes -> select a process -> Activities -> select an activity -> Attributes
  4. In the DWE Administration Console, a SQL Warehousing process profile can be used to supply "Execution Instance" change phase variable values for process instances. Its use is mandatory when you schedule a process. Note: The process profile lists values for other change phases for information, they cannot be modified per profile.

    • To edit variables
      • DB2 Data Warehouse Edition SQL Warehousing -> Processes -> Manage Process Profiles -> select a process profile -> Instance Variables .
    • To view both "Runtime" and "Execution Instance" variables:
      • DB2 Data Warehouse Edition SQL Warehousing -> Processes -> Manage Process Profiles -> select a process profile -> All Variables .
  5. If a variable value that is referenced in a process is changed after that process is scheduled in the DWE Administration Console, all remaining scheduled process instances will continue to use the variable value that was set at the time when the schedule was created. The new variable value is observed only when a new process schedule is created.

    To check for remaining schedule runs for a deployed application or process, go to DB2 Data Warehouse Edition SQL Warehousing -> Processes -> Manage Process Schedules -> Runs Remaining column. You can filter the "Manage Process Schedules" list by using the Application or Process filter.

    Figure 2. Manage Schedules
    Figure 2. Manage Schedules
  6. All "Runtime" and "Execution Instance" variables are recorded for each process instance in the SQW Runtime statistics. If a process instance fails, any variable values are displayed in the DWE Administration Console. DB2 Data Warehouse Edition SQL Warehousing -> Troubleshooting -> select a failed instance -> Variable Information . You can use these values for problem determination and verify that the correct values are specified.

Appendix

Activity attributes

The following table lists the SQL Warehousing activity attributes with their corresponding change phases. All attribute values can be changed in the DWE Administration Console except for those marked with an "X" in the column "Phase Limit".

Figure 3. SQL Warehousing Activities and Attributes
Figure 3. SQL Warehousing Activities and Attributes

Click to see larger image

Figure 3. SQL Warehousing Activities and Attributes

Figure 3. SQL Warehousing Activities and Attributes
Figure 3. SQL Warehousing Activities and Attributes (continued)
Figure 3. SQL Warehousing Activities and Attributes

Variable Phases

The following table lists all variable types together with the latest change phase that can be selected for this variable type. If a variable type is marked as "any" change phase, the phase selection is unrestricted.

Figure 4. Variable Phases and Default Values
Figure 4. Variable Phases and Default Values

Summary

DB2 Data Warehouse Edition SQL Warehousing parameterization enables users to defer specifying flow properties to a later phase in the lifecyle of a process flow. For this purpose variables can be referenced in the flow model definitions, and an appropriate variable change phase selected. Understanding the flow model requirements and the implication of each variable phase are essential in building flexible and more easily maintainable data warehouses and data marts. Users should be aware of the impact of the different variable phases on the deployment requirements, application maintenance and storage sizes for the variables: "Design" and "Deployment Preparation" phases require a re-deployment of the application to change a variable value, but have less requirements on storage space. "Runtime" and "Execution Instance" phases enable value changes without re-deployment, but require additional planning for space utilization.


Acknowledgement

The authors would like to thank Bob Rumsby and Gregg Snodgrass for their help with developing this paper.

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=193657
ArticleTitle=Advanced topics for DWE users Part 2: Best practices for choosing DB2 Data Warehouse Edition SQL Warehousing variable phases
publish-date=02012007