Enterprise warehousing data is becoming more and more complex, with evolving operational data and increasing data volume. This phenomenon may be partly due to internal consolidation, external acquisition, organization changes, and business expansion. Many corporations require rapid warehousing development and deployment cycles to achieve near real-time warehousing for data analysis and prompt intelligent business decisions. Providing high availability in warehousing data and controlling maintenance costs become critical to the success of the business.
New business decisions and strategy changes in an organization often affect the underlying user data models and the dependent system environments. Existing logical and physical data models and authorizations may need to change to support the new user data models. New hardware, operating system platforms, file system structure, and intra- and inter-company network restructuring may also be necessary. These adjustments lead to modifications and enhancements in the existing warehouse models in order to maintain accurate data warehouses or data marts for further data analysis and decision-making.
When deploying warehouse models from development, to test and production, it's a big challenge for corporations to maintain high availability for their warehouse data and to adapt to new business models on demand with minimum time, cost, and risk. Having a robust migration path from development, to test and production encourages your users to deploy new and changed business models with confidence and reduces production down time.
The DB2 Data Warehouse Edition SQL Warehousing Tool provides a large number of data flow operators to perform fine-grained data transformation and a number of control flow operators to specify the execution hierarchy of data flows and other activities. To ensure maintainability, flexibility, and scalability for these flow models, the tool provides parameterization through attributes and variables. You can design these models once and deploy them wherever the DB2 Data Warehouse Edition Administration Console is installed. You have total control over the level of flow model customization in the runtime environment by using different variable phases.
Understanding SQL Warehousing parameterization
When you are designing a data flow or control flow, the runtime values for the execution of a process instance might differ. For example, warehouse applications may encounter changes in the runtime environment or may rely on information from a prior activity. To adapt more easily to changing requirements, you can assign variables to properties in the flow model instead of using fixed values. When a process instance is executed, the variable values will be substituted based on the change phase setting of each variable. This enables you to share and reuse the same flow models for different scenarios, for example, for test and production environments.
You can use a variable to defer the value assignments of selected flow properties such as file names, database schemas, table names, and database names to a later phase in the lifecycle of a flow. Variables and variable groups can be re-used and shared among flow models within the same SQL Warehousing project.
Variables can be concatenated with literal strings to form an aggregated value for an operator property. For example, a fully qualified file name property can contain a directory name variable, a file separation literal, and a file name variable. You can also store the result of an operator in a variable and pass the variable to other operators.
Each user-defined variable requires the selection of a variable type and a change phase. Variable types characterize the variables and are used to restrict the applicable change phase as appropriate and to validate the values.
Note: For a list of variable types, see the Appendix.
SQW provides different change phases for variables so you can decide when a variable value might be changed. Setting a particular change phase will allow or deny the ability to modify the variable value in subsequent phases. The following table provides a short overview of the supported change phases:
Table 1. Variable change phases
|Design||The variable value can be changed while designing a flow and testing its execution in the DesignStudio|
|Deployment Preparation||The variable value is fixed when the application package is created in the DesignStudio and cannot be changed later|
|Deployment||The variable value can be changed during the deployment of an SQW application in the DWE Administration Console|
|Runtime||The variable value can be modified after deployment. All instances of a process share the variable value|
|Execution Instance||The variable value is prompted for each process instance. It can either be supplied directly when the instance is started, or provided through a process profile|
Choosing variable phases
The flexibility through parameterization comes with a trade-off that you must consider at design time. Using variables inside a flow model allows you to react to new and changed requirements quickly and efficiently without redesigning and redeploying the flow model. However, extensive use of parameterization adds complexity and places a cost on flow model management and operation to manage variable values.
In order to strike a good balance between designing a flexible flow model and controlling increased administration costs, it is recommended to observe the following guidelines when choosing variable change phases.
The "Design" and "Deployment Preparation" change phases are
recommended for settings that are often changed during
the design and testing of a flow model. For example, to
experiment for desirable results, you can choose the
"Design" phase for a variable that contains command
execution arguments. You can use the "Deployment
Preparation" change phase to set the default variable
values for generating an SQL Warehousing application in
anticipation of potential deployment environments. Once
the application package has been created for deployment to the DWE Administration
Console, the values for the "Design" and "Deployment Preparation" variables are fixed.
Note: Variable values with these phase settings are not saved into the statistics for the process instance. They are only available for reference (read-only) until the application is uninstalled.
The "Deployment" change phase is recommended for settings
that depend on the actual runtime system to reflect persistent
properties which are not expected to change after deployment.
You might choose the "Deployment" phase for a variable that,
for example, points to the installation location of an executable used in a
flow, or that contains the name of the sender
for email activities, or that contains the
name of a schema in a source database.
Note: "Deployment" phase variable values are not saved into the statistics for the process instances. They are only available until the application is uninstalled.
The "Runtime" change phase is recommended if a variable is
likely to change its value because of a change in the
business model after the flow models are deployed to the
DWE Administration Console. All instances of a deployed
process share the current value of a "Runtime" phase variable.
If the "Runtime" phase variable value is changed, any subsequent
instance started in the DWE Administration Console will
use the new value.
If you expect a particular variable value that is referenced in a process to change, define the variable as a "Runtime" variable. An example of a "Runtime" variable is a local FTP file path that will be shared among all instances of the same process.
Note: "Runtime" phase variables are copied into the SQL Warehousing Runtime statistics table for each instance of a process that uses the variable. The variable values for each instance remain as persistent data in the SQL Warehousing statistics data for problem determination and future reports. The statistics data remains until you delete it explicitly. Contrary to the previous phases, metadata changes have no effect on the information that is available for an instance statistic entry.
The "Execution Instance" change phase is recommended if a
variable value might differ for each process instance.
For example, "Execution Instance" phase variables are useful for
a process that loads files of a certain format.
- Define a "File"-type variable to hold the filename that should be loaded, and select the "Execution Instance" phase.
- Assign the variable to the data flow operator
- Complete the flow and package the flow in an application
- Deploy the application in the DWE Administration Console
- Select a process from the list of processes and start a new instance. You will be prompted for the name of the file that should be loaded with this particular instance
Thus, if multiple files arrive in a given period of time, a separate process instance can be started for each of the files. The load process is the same, but the variable will hold a different file name for each instance.
Note: Variable values of phase "Execution Instance" are treated like "Runtime" phase variables for statistics purposes
Some of the examples used for "Design", "Deployment Preparation", and "Deployment" phases might be candidates for "Runtime" phase variables as well. The criteria to determine the change phase for a variable is a probability estimate based on the model designer’s familiarity with the business models and actual runtime environments. The designer should understand how a variable value would change during its lifetime service in an application.
Attributes are associated with the code units in an activity, for example, an FTP unit, EMAIL unit, or FILEWAIT unit. Each type has a specific set of attributes whose values determine the execution details. In most cases, code unit types are directly reflected in the type of their containing activity, the one exception are activities which are mapped to data flows.
Note: For a detailed list of attributes, see the Appendix.
Attribute values can be specified as an actual value, or can refer to a variable. Attributes in general are already treated as "Runtime" phase variable values, thus in most cases you do not need to define and manage separate variables for each attribute.
A typical example is the @WAITTIME attribute in a FileWait code unit. The attribute value is specific to this particular activity in this particular process, thus sharing the value through a variable is not desired, but the value might nevertheless need to be changed to adapt to the actual runtime environment. To simplify managing the attribute values they are therefore implicitly managed as "Runtime" phase variables that are specific to the activity.
Instead of an actual value, the attribute value can contain a variable name to point to a shared value. This option should be considered in the following situations:
- when an attribute needs to be exposed as an "Execution Instance" change phase value. For example, if an activity attribute points to a file name that is different for each instance.
- when an attribute value is a common value shared within a deployed SQL Warehousing application. For example, if the default email sender attribute is a fixed email address and the sender later leaves the company, using a shared variable to represent the email sender requires only one simple change and all email activities that reference this sender remain current without further modification.
You can reference system variables in a command execution activity to access some common values. The values for these variables are substituted during the execution of an instance. For example, if the host name of a remote file is used as part of a local directory path name where data for the process is gathered, you can set the @FILE attribute in the FTP activity type to:
These variables will be substituted with the actual log path of the deployed SQL Warehousing application and the resource hostname to build the local file path.
The following is a list of supported system variables in SQW Runtime:
Table 2. SQL Warehousing System Variables
|System Variable ID||Description|
|@HOME||SQL Warehousing application home location path|
|@LOGPATH||SQL Warehousing application home log path|
|@PROCESS||The name of a deployed SQL Warehousing process|
|@RESNAME||Database or System name for the referenced resource|
|email@example.com||When used as the receiver of an EMAIL activity, no email will be sent; the content is instead printed to the log file|
Managing runtime variable values
Variable values are shared among all processes and activities that belong to the same SQL warehouse application and are therefore managed at the application level. At application deployment, all variables with their associated default values are stored in the SQW Runtime metadata. When an application variable value is changed later, the new value becomes visible to all processes and activities in that application.
The following rules apply to variable value management in SQL Warehousing.
"Design" and "Deployment Preparation" phase variables are editable in the DWE
Design Studio only. Choose
DB2 Data Warehouse Edition Design Studio
Business Intelligence perspective ->
SQL Warehouse Application
-> select a project ->
Select the variable entry in the
list and click the "Edit" button to change the
Figure 1. Manage variable during deployment preparation
- Once the processes are deployed in the DB2 Data Warehouse Edition Administration Console, "Runtime" and "Execution Instance" variables can be changed in the DWE Administration Console by selecting DB2 Data Warehouse Edition SQL Warehousing -> Data Warehouse Applications -> Manage Data Warehouse Applications -> select an application -> Variables . You can also view the variables for a process in the DWE Administration Console by choosing DB2 Data Warehouse Edition SQL Warehousing -> Processes -> Manage Processes -> select a process -> Variables .
Attributes can be changed in the DWE Administration Console by selecting an activity for a process chosen from the application list or directly in the process list.
- DB2 Data Warehouse Edition SQL Warehousing -> Data Warehouse Applications -> Manage Data Warehouse Applications -> select an application -> Processes -> select a process -> Activities -> select an activity -> Attributes
- DB2 Data Warehouse Edition SQL Warehousing -> Processes -> Manage Processes -> select a process -> Activities -> select an activity -> Attributes
- By application:
In the DWE Administration Console, a SQL Warehousing process profile can be used to supply "Execution Instance" change phase variable values for process instances. Its use is mandatory when you schedule a process. Note: The process profile lists values for other change phases for information, they cannot be modified per profile.
To edit variables
- DB2 Data Warehouse Edition SQL Warehousing -> Processes -> Manage Process Profiles -> select a process profile -> Instance Variables .
To view both "Runtime" and "Execution
- DB2 Data Warehouse Edition SQL Warehousing -> Processes -> Manage Process Profiles -> select a process profile -> All Variables .
- To edit variables
If a variable value that is referenced in a process is changed after that process is scheduled in the DWE Administration Console, all remaining scheduled process instances will continue to use the variable value that was set at the time when the schedule was created. The new variable value is observed only when a new process schedule is created.
To check for remaining schedule runs for a deployed application or process, go to DB2 Data Warehouse Edition SQL Warehousing -> Processes -> Manage Process Schedules -> Runs Remaining column. You can filter the "Manage Process Schedules" list by using the Application or Process filter.
Figure 2. Manage Schedules
- All "Runtime" and "Execution Instance" variables are recorded for each process instance in the SQW Runtime statistics. If a process instance fails, any variable values are displayed in the DWE Administration Console. DB2 Data Warehouse Edition SQL Warehousing -> Troubleshooting -> select a failed instance -> Variable Information . You can use these values for problem determination and verify that the correct values are specified.
The following table lists the SQL Warehousing activity attributes with their corresponding change phases. All attribute values can be changed in the DWE Administration Console except for those marked with an "X" in the column "Phase Limit".
Figure 3. SQL Warehousing Activities and Attributes
Figure 3. SQL Warehousing Activities and Attributes (continued)
The following table lists all variable types together with the latest change phase that can be selected for this variable type. If a variable type is marked as "any" change phase, the phase selection is unrestricted.
Figure 4. Variable Phases and Default Values
DB2 Data Warehouse Edition SQL Warehousing parameterization enables users to defer specifying flow properties to a later phase in the lifecyle of a process flow. For this purpose variables can be referenced in the flow model definitions, and an appropriate variable change phase selected. Understanding the flow model requirements and the implication of each variable phase are essential in building flexible and more easily maintainable data warehouses and data marts. Users should be aware of the impact of the different variable phases on the deployment requirements, application maintenance and storage sizes for the variables: "Design" and "Deployment Preparation" phases require a re-deployment of the application to change a variable value, but have less requirements on storage space. "Runtime" and "Execution Instance" phases enable value changes without re-deployment, but require additional planning for space utilization.
The authors would like to thank Bob Rumsby and Gregg Snodgrass for their help with developing this paper.
- "Crash recovery utility for DB2 Data Warehouse Edition SQL Warehousing Tool" (developerWorks, Jan 2007): Get an introduction to DB2 Data Warehouse Edition.
- developerWorks data warehousing and business intelligence resource page: Find articles and tutorials and connect to other resources to expand your DB2 Data Warehouse Edition skills.
- developerWorks Information Management zone: Learn more about DB2. Find technical documentation, how-to articles, education, downloads, product information, and more.
- Stay current with developerWorks technical events and webcasts.
- Technology bookstore: Browse for books on these and other technical topics.
Get products and technologies
- Build your next development project with IBM trial software, available for download directly from developerWorks.
- Participate in developerWorks blogs and get involved in the developerWorks community.
Dig deeper into Information management on developerWorks
Get samples, articles, product docs, and community resources to help build, deploy, and manage your cloud apps.
Experiment with new directions in software development.
Software development in the cloud. Register today to create a project.
Evaluate IBM software and solutions, and transform challenges into opportunities.