A virtual cube is a logical cube that is defined in terms of exactly two existing cubes: either two real cubes, two other virtual cubes, or one virtual cube and one real cube. If more than two cubes are required to be aggregated, they can be merged in pairs. The resulting virtual cubes can be merged with other cubes as shown in Figure 1. The merging is done based on the dimension names. Dimensions with the same name in both cubes will be merged. Dimensions from one cube that do not have a corresponding dimension with the same name in the other cube will simply be added to the virtual cube. Virtual cubes can be used to combine two cubes that share either one or several dimensions. Two cubes are considered to share a dimension if a dimension with a particular name exists in both cubes. For example, if two cubes each have a dimension called [Time], in the context of virtual cubes, the two cubes are considered to share the dimension [Time]. The two cubes that are merged can belong to different cube models, so they might have completely different internal structures. The main requirement is that they share at least one dimension. The virtual cubes and the depending cubes must reside in the same database, and a virtual cube can be defined only using cubes from the same cube server.
For example, Figure 1 shows cubes aggregated to form virtual cubes. Virtual cubes are indicated by an asterisk (*). In the figure, the Store Sales and Web Sales cubes have nearly identical dimensions, and they merge to form the Total Sales virtual cube. The Total Sales virtual cube and the Inventory cube share a common Time dimension, and they merge to form the Inventory Sales virtual cube. Note that virtual cubes (Total Sales) can merge to form other virtual cubes (Inventory Sales). The Inventory Sales virtual cube is useful in this case because it aggregates cubes that are part of two separate business problems: the Inventory cube is used to track inventory, and the Sales cube is used to track sales figures.
Figure 1. Cubes aggregated to form virtual cubes
When a query is issued against a virtual cube, that query is routed to the depending cubes, which produces two intermediary results that are aggregated according to the virtual cube’s merge operator. Merge operators are SUM, MINUS, PRODUCT, DIVIDE, MAX, MIN, and NOP (returns data from the first cube in the virtual cube definition).
- When merging [CubeA] and [CubeB] to form [Virtual Cube] using the SUM operator, select from [Virtual Cube] = select from [CubeA] + select from [CubeB].
- When merging [CubeA] and [CubeB] using MAX operator, select from [Virtual Cube] = MAX (select from [CubeA], select from [CubeB]).
Exploring virtual cubes scenarios
To get an idea of when virtual cubes might be useful for your application, a few scenarios are detailed in this section.
Low latency improvement scenario: Partition data based on time
All sales information is stored in one cube [AllSales]. Fact data is added nightly, which requires rebuilding the data cache and refreshing the materialized query tables (MQTs). For large cubes, this requires a significant amount of time. To improve performance, you can partition the data for the [AllSales] cube based on time to form the basis for two cubes [HistoricSales] and [CurrentMonthSales], and you can define a virtual cube called [VirtualSales] to join the two cubes. The [HistoricSales] cube is used to record the historical sales information that does not change frequently. The smaller [CurrentMonthSales] cube is used to record the daily sales information for the current month. The [CurrentMonthSales] cube is referred to as the delta cube.
The performance of the cache refresh improves because of several factors:
- The MQT refresh time is now reduced, because often only the small MQTs based on [CurrentMonthSales] must be refreshed.
- Large MQTs based on [HistoricSales] are updated only monthly.
- Because query results from the [HistoricSales] cube are pre-cached, the performance for queries that are run against the sales data of the entire time period is improved.
- Because of the smaller size of the [CurrentMonthSales] cube, the performance for queries that are run against the sales data of the current month or the entire time period is improved.
You can partition data based on time by defining database views that contain historic data and the current month data respectively. The cube models for [HistoricSales] and [CurrentMonthSales] use their respective fact views instead of database tables. The views should be recreated monthly when the data accumulated in the current month view rolls out to the historic view. The performance improvements resulting from this scenario are directly related to the size of the [CurrentMonthSales] cube relative to the [HistoricSales] cube. The historic cube should be as large as possible, and the incremental CurrentMonth should be as small as possible to obtain the best results.
One testing scenario used a current month cube in which the fact table has 3 million rows, and the historic cube is based on a fact table of 900 million rows. The test queries ran for a single user more than 100 times slower against the big cube [AllSales] that contains all the data than against the virtual cube [VirtualSales]. To achieve similar size ratios between the delta cube and the historic data cube, the delta cube can be set to contain less sales data by updating it weekly, bi-weekly, and so on, depending on the data volume.
The following example is based on the GOSALES database that is included with the 9.7 release. For demonstration purposes, the time partition is calculated by year only. Two cubes, [CSSalesMarketing_Recent] and [CSSalesMarketing_Historic], are created, and a virtual cube, [CSSalesMarketing], aggregates the two cubes. The fact table for [CSSalesMarketing_Recent] was replaced with a view that partitions data based on the year, as showing in Listing 1.
Listing 1. Create view of recent facts
create view GOSALESDW.fact_recent as select ORDER_DAY_KEY, ORGANIZATION_KEY, EMPLOYEE_KEY, RETAILER_KEY, RETAILER_SITE_KEY, PRODUCT_KEY, PROMOTION_KEY, ORDER_METHOD_KEY, SALES_ORDER_KEY, SHIP_DAY_KEY, CLOSE_DAY_KEY, QUANTITY, UNIT_COST, UNIT_PRICE, UNIT_SALE_PRICE, GROSS_MARGIN, SALE_TOTAL, GROSS_PROFIT from gosalesdw.sls_sales_fact fact, gosalesdw.go_time_dim time where fact.ORDER_DAY_KEY = time.DAY_KEY and time.CURRENT_YEAR = 2007;
Listing 2 shows how to create a view of historic facts
Listing 2. Create view of historic facts
create view GOSALESDW.fact_historic as select ORDER_DAY_KEY, ORGANIZATION_KEY, EMPLOYEE_KEY, RETAILER_KEY, RETAILER_SITE_KEY, PRODUCT_KEY, PROMOTION_KEY, ORDER_METHOD_KEY, SALES_ORDER_KEY, SHIP_DAY_KEY, CLOSE_DAY_KEY, QUANTITY, UNIT_COST, UNIT_PRICE, UNIT_SALE_PRICE, GROSS_MARGIN, SALE_TOTAL, GROSS_PROFIT from gosalesdw.sls_sales_fact fact, gosalesdw.go_time_dim time where fact.ORDER_DAY_KEY = time.DAY_KEY and time.CURRENT_YEAR < 2007;
The columns selected as the view columns are all the columns defined in the original fact table gosalesdw.sls_sales_fact that was partitioned. The cube models for the two cubes, recent and historic, are very simple and, for demonstration purposes, have one dimension, [Time], and one measure each: Sale Total (Recent) and Sale Total (Historic), respectively. The virtual cube renames both these measures to Sale Total, and it aggregates the historic and recent cubes using the addition merge operator. The result of querying the virtual cube is equal to the sum of the same queries against the two cubes.
Figures 2, 3, and 4 illustrate using the [Time] dimension and the Sale Total measures with Microsoft® Excel. Note that the grand total for the virtual cube is equal to the grand total for [CSSalesMarketing_Recent] plus the grand total for [CSSalesMarketing_Historic]. Also, [CSSalesMarketing_Recent] has values only for the year 2007, while [CSSalesMarketing_Historic] has values only for the years before 2007.
Table 1 and Figure 2 show the data for [CSSalesMarketing_Recent] from the Pivot Table Field List when Time and Sale Total (Recent) are selected.
Table 1. Data for [CSSalesMarketing_Recent]
|Sale total (recent)||2007||Grand total|
Figure 2. [CSSalesMarketing_Recent] has only 2007 data
Table 2 and Figure 3 show the data for [CSSalesMarketing_Historic] from the Pivot Table Field List when Time, Year, and Sale Total (Historic) are selected.
Table 2. Data for [CSSalesMarketing_Historic]
|Sales total (historic)||2004||2005||2006||Grand total|
Figure 3. [CSSalesMarketing_Historic] has only historic data
Table 3 and Figure 4 show the data for the merged cubes from the Pivot Table Field List when Time, All, Year, and Sale Total are selected.
Table 3. Combined sales
Figure 4. [CSSalesMarketing] has both historic and recent data
Download the GSDB_TimePartitionExample for an example of cubes and virtual cube definitions for the time partition scenario so you can explore virtual cubes.
Global business view: Currency conversion scenario
All the company's sales information is stored in one cube called [GlobalSales], and there is a business need to convert some of the sales figures that are in U.S. dollars into other currencies. If both the sales and exchange rate information are in the [GlobalSales] cube, the cube contains a lot of redundant data and is difficult to maintain. To address this problem, you can create a cube called [CurrencyExchange] to store the exchange rates to keep the [Global Sales] cube unchanged, and you can define a virtual cube [SalesConversion] to handle the currency conversion for the sales data. The cubes [GlobalSales] and [CurrencyExchange] share some dimensions, such as Time, but they generally have different structures.
The virtual cube [SalesConversion] needs to define a calculated member that handles the actual currency conversion. Because the actual currency conversion might not occur daily but instead only at preset days or at intervals such as monthly or weekly, the ClosingPeriod and OpeningPeriod functions can be used in this scenario. For example, Listing 3 defines the calculated members used for the currency conversion if the exchange happens at the end of each month.
Listing 3. Currency conversion calculated members
Calculated member name: Conversion2 Calculation: closingperiod([Time].[Month], [Time].currentmember), [SalesConversion].[Measures].[CONVERSION_TO_LOCAL]) Calculated member name: UnitSalesLocal Calculation:[ Virtual Cube].[Measures].[Conversion2] * [SalesConversion].[Measures].[Unit sale price]
Note that one virtual calculated member uses the other.
Merging different cubes: Inventory and sales scenario
Sometimes it's useful to merge cubes that do not have many dimensions in common, but for the purpose of running certain queries, aggregating these cubes might make sense. For example, the virtual cube [Inv Sales] that merges an [Inventory] cube with a [Sales] cube contains certain dimensions that are common between the cubes, including [Time] and [Product], and other dimensions that are not common. The two cubes are solutions to two distinct business problems: tracking key performance indicators for sales, and analyzing inventory levels.
By aggregating two such cubes, you can extract information, such as the efficiency of resource management. The virtual cube contains the union of all available dimensions. A query that uses the common dimensions is executed in parallel for each of the composing cubes, and the results are aggregated according to the virtual cube merge operator.
For example, you can calculate the efficiency of resource management by using an inventory turnover ratio formula, as shown in Listing 4.
Listing 4. Inventory turnover calculation
Inv Turnover Ratio = [Inv Sales].[Measures].[Cost of Goods Sold] / [Inventory].[Measures].[Average Inventory] where [Inv Sales].[Measures].[Cost of Goods Sold] = [Inventory].[Measures].[Cost of Goods] * [Sales].[Measures].[Quantity Sold].
The numbers for the example database in Table 4 were randomly generated, so they might not necessarily represent values in the real range.
Table 4. Inventory turnover ratio obtained from [Sales] and [Inventory] cubes for January 2000
|Item category||Cost of goods (Inventory)||Quantity (Total sales)||Cost of good sold (Inv_sales)||Average inventory (Inventory)||Inventory turnover ratio (Inv_sales)|
Merging multiple sales cubes: Web sales and store sales scenario
When there are sales performance data being tracked in different cubes, a virtual cube provides a way to have the overall view by aggregating these separate cubes. For example, if you have a cube called [Web Sales] that tracks the performance of the sales made over the Internet, and if you have another cube called [Store Sales] that has the sale numbers realized at the physical store locations, you can use a virtual cube called [Total Sales] for an overall view of business performance.
In this example, both [Web Sales] and [Store Sales] cubes have similar dimensions, such as [Product] and [Date], but the cubes have different structures. The [Store Sales] cube has an extra dimension [Store], as shown in the pivot table field list from Figure 5. Similarly [Web Sales] has a dimension [Websites] that does not exist in [Store Sales]. For demonstration purposes, the virtual cube [Total Sales] renames the [Date] dimension from both cubes to [Date (Total Sales)], and it renames the [Items] dimension from both cubes to [Items (Total Sales)]. This operation is not necessary, because as long as the [Date] and [Items] dimensions have the same name in both [Web Sales] and [Store Sales], they are merged.
[Total Sales] also renames two measures originating from each cube to ensure they have the same name and are merged. [Gross Profit (Store Sales)] and [Gross Profit (Web Sales)] are both renamed to [Gross Profit (Total Sales)]. The [Web Sales] and [Store Sales] cubes are aggregated with the addition (+) merge operator, which causes the results of the queries issued against the [Total Sales] virtual cube to be the sum of the queries run against [Web Sales] and [Total Sales]. For the example, Figure 6 shows the PivotTable Field list with Date (Total Sales), Items (Total Sales), and Gross Profit (Total Sales) selected. The Gross Profit (Total Sales) equals $6,425,085,101.50, as shown in Figure 5.
Figure 5. [Total Sales] aggregates [Web Sales] and [Store Sales]
This section describes merging the two cubes using the merging process that occurs at runtime. When the cube server starts, the virtual cubes get instantiated only after all the depending cubes are instantiated. If the two cubes that compose a virtual cube have dimensions with identical names, those dimensions are selected for merging. If a dimension name from a depending cube is not found in the other cube, that dimension is simply added to the virtual cube without any merging. Table 5 shows an example of merging cube dimensions.
Table 5. Merging cube dimensions to form a virtual cube
|Sales +||Inventory >||Inventory Sales|
|Product||Product||Product||Merge dimensions with identical names|
|Time||Time||Time||Merge dimensions with identical names|
|Inventory||Inventory||Add unmatched dimensions|
|Store||Hide unwanted dimensions|
There is a virtual cube design option to hide dimensions from the cubes so they will not exist on the virtual cube. Hiding a dimension in a virtual cube only affects that virtual cube. Dimensions and members can also be renamed for the virtual cube to cause merging on the desired elements, as described later in this article.
Once the dimensions from both cubes are selected for merging, the merging happens on the default hierarchies defined for each dimension. The merge process iterates through the levels of the hierarchies from both dimensions, beginning at level zero, and merges levels that have identical names. If the names are not identical, the two levels are still merged, and the resulting level name is the concatenation of the original level names. If one of the hierarchies is longer than the other, the remaining levels from the longer hierarchy are added to the virtual hierarchy. Note that the levels cannot be renamed.
Note especially the hierarchy definitions for the depending cubes. Make sure that merged levels are semantically identical. Otherwise, the results of level merging will be meaningless.
In the examples below, the cubes have equivalent levels called State (such as California in the U.S.) and Province (such as Ontario in Canada). In Table 6, the two levels are at the same depth, so merging those produces the State Province level that contains the California and Ontario members.
Table 6. Merging level names for common dimensions must be done on hierarchies with levels that are semantically identical
|Sales East||Sales West||All sales|
|State||Province||State Province (unmatched names are concatenated)|
In Table 7, State and Province are on different levels, and they will not be merged with each other. Instead, Region (such as Pacific for the U.S., which has no equivalent in Canada) is merged with Province, but Region and Province are not semantically identical.
Note that a query against the Neighborhood level of the example virtual cube returns results only from the cube that contains that level, such as Sales East. Because Sales West has no knowledge of that level, the values returned can only come from Sales East. A query against the City level of the hierarchy of the same virtual cube returns results from both cubes, because both cubes have a level called City.
Table 7. If the elements don't align semantically, you can get bad results.
|Sales East||Sales West||All sales|
|Region||Province||Region Province (mismatched levels produce meaningless results)|
In addition to level merging, virtual cubes also perform member merging for the common dimensions of the depending cubes. The members are also merged according to their names, as shown in Figure 6 and Table 8.
Figure 6. Merging members
Table 8. Merging members to form a virtual cube
|Cube A||Cube B||Virtual cube|
|Store sales||Store sales||Store sales||Merge members with identical names|
|Unit sales East||Unit sales East||Add unmatched members to virtual cube|
|Unit sales West||Unit sales West||Add unmatched members to virtual cube|
A query issued against a virtual merged member is executed in parallel on both depending cubes. The results are aggregated by the virtual cube according to the merge operator. If the query is issued against a virtual member that is based on a member that exists in only one of the cubes, the result is identical to when the query is run against the depending cube that contains that member.
Working with virtual cubes
Virtual cubes are defined using Design Studio. The administrative tasks for virtual cubes are performed using the Cubing Services Administration Console application. Listing 5 and Figure 7 show how virtual cube elements appear in Design Studio. Especially notable folders appear between asterisks.
Figure 7. Design Studio view for virtual cube elements
Listing 5. Design Studio folder structure
VirtualCubes Data Diagrams Data Models Database Model.dbm Database SQL Statements OLAP Objects CSSalesMarketing_Historic FACT Time Cubes *CSSalesMarketing_Historic* (this is a cube) Time Cube Facts (CSSalesMarketing_Historic) CSSalesMarketing_Recent FACT Time Cubes *CSSalesMarketing_Recent* (this is a cube) Time Cube Facts (CSSalesMarketing_Recent) *CSSalesMarketing* (this is a virtual cube) MDX Calculated Measures *Virtual Dimensions* (only renamed virtual dimensions appear here) Time *MDX Calculated Members* Virtual Members *Virtual Measures* Sale total Sale total Shared Dimensions GOSALESDW MDX Schema Other Files SQL Scripts
Complete the following steps to create virtual cubes:
- Add a virtual cube to the model using the Add Virtual Cube option.
- Select the two cubes to be aggregated to form the virtual cube. You can select any cubes from the list of available cubes.
- Optionally rename any existing dimensions from the depending cubes by selecting Add Virtual Dimension in order to cause them to merge.
- Optionally rename members for that dimension to cause member-merging by selecting Add Virtual Member.
- Optionally define or add calculated members to the virtual cube by selecting Add MDX Calculated Member.
The only elements for virtual cubes that Design Studio displays are the ones that were explicitly changed by performing Add operations. For a full view of the virtual cube structure that includes all available dimensions, hierarchies, and members, you can use Design Studio's dimension browser. For example, if [VirtualCube] is defined by merging [CubeA] and [CubeB], when adding the virtual cube to the model, Design Studio displays [VirtualCube]. If [CubeA] has the time dimension called [Time] and [CubeB] has the time dimension called [Tiempo] and the two dimensions should be merged because they are semantically identical, you can rename [Tiempo] to [Time] to cause the merge. The Add Dimension operation enables you to rename [CubeB].[Tiempo] to [Time]. In Design Studio, you will see one virtual dimension defined for the virtual cube that renames [Tiempo] to [Time].
If you don't want a particular dimension from one of the two cubes to be available in the virtual cube, there is an option to hide it. This only affects the virtual cube, and not the cube that contains the dimension. To hide a dimension, add it to the model, and then check the hidden flag.
When merging two dimensions, you might have a situation where each dimension has a different default member. There is an option to set a particular member from the virtual dimension to be the default member. To set a default member for a dimension, add that dimension to the model, and then set the default member property. If the default member is not explicitly set on the virtual dimension, the default is the dimension from the first cube selected to compose the virtual cube.
Members from the two cubes are merged only if they have identical names. There is an option to rename members in order to cause the merging. Furthermore, the members can have their own merge operators defined which, if they are set, will have priority over the cube merge operator. To rename a member, add it to the model under the virtual dimension where it is contained. If a member is not to be included in the resulting virtual cube, it can be hidden. To hide a member, add it to the model under the virtual dimension where it is contained, and check the hidden flag.
In the context of a virtual dimension, you can also create virtual calculated members. The calculated members defined on the cubes are not available on the virtual cube. Some calculations are made in the context of the cube on which they are defined, and propagating them to the virtual cube might produce unexpected results. Also the calculations using non-distributive measures such as standard deviation will not produce reliable results if propagated to the virtual cube. In addition, if both cubes have a calculated member with the same name but different calculations, the cubing engine cannot know which calculation the user intended to propagate. The calculated members need to be redefined for the virtual cube as needed.
The virtual calculated members can be defined for any virtual dimension that was added to the model using Add Virtual Dimension. If some members were renamed for a virtual dimension, the new member names can be used in the calculation. Following are some additional points about virtual calculated members:
- The virtual calculated members exist only in the context of the virtual cube and are not available from the dependent cubes.
- The virtual calculated members have another degree of flexibility by allowing setting of a parent. The parent unique ID for a calculated member can be any member on a non-leaf level from the same dimension.
Managing virtual cubes from the Administration Console
To the Administration Console users, the virtual cubes are different than regular cubes in the following ways:
- The virtual cubes belong to a placeholder cube model called VIRTUAL. This is where virtual cubes can be deleted.
- The virtual cubes do not have cache refresh scheduling information.
- The operations of starting and stopping, deploying to a cube server, and undeploying virtual cubes are similar to the operations on the regular cubes, except for a few different dependency checks.
- A cube or virtual cube can be stopped or undeployed only after all the virtual cubes that directly use it are stopped or undeployed.
- A virtual cube can be started or deployed only after both its direct cubes are started or deployed.
Understanding dimensional security
Dimensional security is a new feature for InfoSphere Warehouse 9.7. The specifics of this feature are detailed in an article listed in Resources. For a virtual cube, the dimensional security is inherited from its two cubes. There are no specific security settings for a virtual cube. You can set security on the real depending cubes.
For example, for a virtual cube based on [Sales West] and [Sales East], both with a [Store] dimension, the [Sales West] cube allows only descendants for [California] to be visible and [Sales East] allows only descendants of [Florida] to be visible. The virtual cube allows only descendants of Florida and California to be visible.
Starting virtual cubes with the Cube Server
When the Cubing Services server starts, the elements defined using Design Studio are parsed. This includes the virtual dimensions, members, and calculated members. If a dimension or member is renamed for the virtual cube, then the name available for the virtual cube is the user-specified one. For example, [CubeA] and [CubeB] are merged to form [VirtualCube]. The time dimension for [CubeA] is [Time], and for [CubeB] the time dimension is [Tiempo]. Renaming [CubeB].[Tiempo] to [Time] causes the merging of CubeA.[Time] with CubeB.[Tiempo], because they both have the name Time. The virtual cube will have the time dimension [Time]. The next step is to merge the hierarchies of shared dimensions from the two depending cubes. From the example, the hierarchies for CubeA.[Time] and CubeB.[Tiempo] are merged level by level beginning at level zero.
The merge process verifies whether the dimension members have the same name, and if they do, then the virtual cube contains only one member obtained by merging the depending members. If the member exists in only one of the cubes, that member is simply added to the virtual cube. For example, the [CubeA].[Time] dimension has members 2008 and 2009, and [CubeB].[Time] has the members 2007 and 2008. The virtual cube that aggregates these two cubes will have the members 2007 and 2008 merged from both cubes and also the member 2009. A query against 2007 returns data from its parent cube [CubeB], and a query against the merged member 2008 returns the results from both [CubeA] and [CubeB] according to the merge operator. In another example, if [CubeB].[Time] has instead the members 2007 and 2008_1, then the virtual cube in the previous example has the members 2007, 2008, 2008_1 and 2009. By renaming 2008_1 to 2008, the virtual cube has the members 2007 and 2008 merged from both cubes and also the member 2009.
Another important consideration for virtual cubes is what happens if the structure of the two cubes changes when the cube server is running and the cubes are reloaded. In that case the virtual cube should be rebuilt by stopping and starting it. Each time the virtual cube is started, it checks the two cubes and performs the merge.
This article described how virtual cubes are created and how the merging process works. The interaction with virtual cubes using the Design Studio and the Administration Console was also detailed. The examples provided covered a wide range of situations that could serve as a starting point to implement your particular solution. Specific scenarios addressed:
- Performance improvement using time partitioning
- A global business view using currency conversion
- Inventory turnover analysis
- Aggregating data from different sales cubes, such as Web Sales and Store Sales
- A geographic partitioning of the data if two cubes track sales for the East region and sales for the West region
- Learn more from "Designing and deploying a security model using IBM InfoSphere Warehouse Cubing Services" (developerWorks, Sep 2009) to learn how to define security on cubes and dimensions in InfoSphere Warehouse.
- Refer to Virtual Cubes documentation for InfoSphere Warehouse 9.7 for additional information about virtual cubes in the information center.
- Check out the data warehousing area on developerWorks to get the resources you need to advance your skills in InfoSphere Warehouse.
- Learn more about Information Management at the developerWorks Information Management zone. Find technical documentation, how-to articles, education, downloads, product information, and more.
- Stay current with developerWorks technical events and webcasts.
Get products and technologies
- Download a free trial version of IBM InfoSphere Warehouse 9.7 Enterprise Edition.
- Build your next development project with IBM trial software, available for download directly from developerWorks.
- Participate in the discussion forum.
- Check out the developerWorks blogs and get involved in the developerWorks community.