The majority of measures within an OLAP cube are additive. To get values at the higher levels we simply sum the lower level values. There are times when measures are not additive. This document shows when measures are not additive and strategies to deal with them.
An important question that any OLAP modeller should always be asking is “How does this measure work at higher levels?”. Often we are presented with data, be they measures or calculations, that work fine at the leaf level, or grain that the data is supplied at, but no longer work at the aggregate levels of the OLAP model. By asking this question, we spot these potential issues before we complete the OLAP design.
The OLAP tool used throughout this document is IBM Cognos Transformer and the techniques tested within were tested against IBM Cognos Transformer 10.1.x. However, the issues and discussion are applicable to earlier versions of Transformer as well.
The reader is assumed to have basic IBM Cognos Transformer modelling skills.
As a good OLAP modeller we are always asking the important question: “Does this measure work at higher levels?”. We shall use an example requirement given to us by a customer to demonstrate why this question is so important.
Our customer wants us to do some very simple sales analysis. The customer wants to be able to analyze the following measures.
- Sold Quantity
- Unit Cost
- Cost of Sales (Sold Quantity * Unit Cost)
- Profit % (profit as a percentage of sales)
Handily they can provide us with the following measures at the Product and Region levels.
- Sold Quantity
- Unit Cost
- Profit %
Indeed this is the data they use in their spreadsheets to produce reports similar to the OLAP requirements. They have verified the data in the spreadsheet as being correct.
An example data extract they provide is a CSV file with the following values.
Region,Product Line,Product,Sales,Unit Cost,Sold Quantity,Profit% UK,Electronics,TV,300,100,2,33.3 US,Electronics,TV,500,100,4,20 UK,Electronics,HiFi,280,48,5,14.3 US,Electronics,HiFi,490,48,7,31.4
If you wish, copy and paste this CSV data into a text file and use it as a source in IBM Cognos Transformer to follow along with this document.
Using this data in the CSV file, we build the IBM Cognos Transformer model shown in Illustration 1 where the model has the following items.
- A Product dimension with two levels. The highest level is Product Line and the lowest level Product.
- Four Measures sourced from the columns Sales, Unit Cost, Sold Quantity and Profit %.
- A calculated measure Cost of Sales which is Sold Quantity * Unit Cost. Each measure has been formatted to 2 decimal places.
- A single PowerCube called sales_unconsolidated.txt.
Illustration 1: IBM Cognos Transformer's default view window showing the model with a Product Line dimension, several measures including a calculated measure, and a PowerCube
Experienced IBM Cognos Transformer modellers will notice that this example does not follow the proven practice of separate dimension and fact queries. This would not alleviate the issues demonstrated within this document. The source data is also not consolidated – we will discuss this later in the document.
When we examine the resulting PowerCube in IBM Cognos PowerPlay client, we identify a number of issues with the results (we are using the IBM Cognos PowerPlay client for convenience in this document to illustrate the returned values but the same output would also apply in the IBM Cognos BI studios).
As seen in Illustration 2, Unit Cost, Profit %, and Cost of Sales are all wrong at both the leaf (Product) level and the aggregate level (Product Line) where the numbers are inflated.
Illustration 2: A crosstab showing inflated Unit Cost, Profit %, and Cost of Sales measures as rows and Product as columns
If the values are correct in our underlying data – why are the values in our OLAP cube wrong? The cube is simply summing the values, which is incorrect for these types of non-additive measures.
Unit Cost – Non-Additive Roll Up
The first step in debugging OLAP data is to look at the leaf values. If the leaf values are incorrect then it is not possible for the higher aggregation levels to be correct. Such a simple rule is often overlooked.
We observe from the crosstab in Illustration 2 that TV has a Unit Cost of 200 when in fact it should have a Unit Cost of 100. HiFi has a Unit Cost of 96, this should be 48. What has gone wrong?
Consolidation of Duplicate Records
In the source file it looks like we have unique records. For every Region and Product we have a record. However in the context of our OLAP model, we do not have unique records, we actually have duplicate records. Our OLAP cube does not use Region. This part of the record can be ignored. When we do this, we observe that for TV products we have two records. The same is true for HiFi.
Our OLAP cube consolidates these duplicate records into a single value. How should IBM Cognos Transformer handle the duplicate records?
First, we must state that proven practices recommend that we actually do the consolidation of duplicate records before the data even reaches IBM Cognos Transformer. Why is this?
- At the moment we are moving two records for every product. If this were a database we would have to physically extract two records and move them across the network to IBM Cognos Transformer which would then write these to work files on which it could perform consolidation. We would use far less resources if we simply moved one record. Now imagine this scenario with real data volumes.
- If we were using a database source, our database server is typically of a much higher specification than the IBM Cognos Transformer server. It makes far more sense for our database server to do the consolidation.
For now we will proceed down the route of making IBM Cognos Transformer perform the consolidation. By default IBM Cognos Transformer sums duplicate values. While this is correct for Sales and Sold Quantity, it is wrong for Unit Cost. It is this sum of the Unit Costs we are seeing in the OLAP cube ( 2 TV records, each with costs of 100 thus 200. The same with HiFi - 2 * 48 = 96).
We need to change the consolidation of duplicates for our Unit Cost measure.
Within the Measure – Unit Cost properties window, on the Rollup tab we can change the Duplicates rollup property as seen in Illustration 3. We will use the value Average.
Illustration 3: IBM Cognos Transformer's Unit Cost measure properties window showing the Rollup tab and Duplicate roll up options with Average highlighted
In our example data, we would see the same result with all of these options (None, Minimum, Maximum, First, and Last) except Sum.
If we examine the OLAP cube that is now built, we can replicate the previous crosstab, but this time with the correct values. We observe now that our TV and HiFi – the leaf members – are showing the correct Unit Cost values of 100 and 48 as seen in Illustration 4.
Illustration 4: A crosstab with the five model measures as rows and product as columns showing the effects of roll up changes
If our data were consolidated we would not have had to change the Duplicates rollup property as there would be no duplicates.
However, after fixing the leaf values we notice in Illustration 4 that the rollup of Electronics is showing a Unit Cost of 148, the sum of the lower levels where TV is 100 and Hifi is 48. Let us once again ask the important question: “How does this measure work at higher levels?”. Here we should be asking the user what is the Unit Cost of Electronics? How do we work this out from the lower level data?
Our customer tells us we should be averaging the value at higher levels rather than summing. Within the Measure – Unit Cost properties window on the Rollup tab, we can change the Regular rollup property to Average as seen in Illustration 5. This means we could also change the Duplicates rollup property back to None (Regular rollup) as Regular rollup is Average and duplicates will also be averaged.
Illustration 5: IBM Cognos Transformer's Unit Cost measure properties window showing the Rollup tab and Regular rollup options
If we once again examine the OLAP cube, we observe in Illustration 6 that the Unit Cost of Electronics is now 74, which is the average of 100 and 48.
Illustration 6: A crosstab with the five model measures as rows and product as columns showing the effect of the Regular rollup change
Cost of Sales – Calculation Before or After Rollup
Having applied the same leaf level fix for the Unit Cost measure to the Cost of Sales measure, it also behaves as expected at the leaf level. For example, as shown in Illustration 6, TVs sold 6 units at a cost of 100 equalling a Cost of Sales of 600 and HiFi has a Cost of Sales of 576.
Again, we must make sure our measure works at all levels in the OLAP cube. If we examine the Electronics Cost of Sales, we expect the total Cost of Sales to be 1,176 (600 + 576). But in Illustration 6, our OLAP cube is showing 1,332. Clearly this is wrong but why?
The answer is in the timing of our Cost of Sales calculation. Presently our calculation for Electronics is showing Sold Quantity of 18 * Unit Cost of 74. This is our expected value of 1,332. Our calculation is happening after rollup. We want to work out the leaf cost of sale and then rollup. In other words, we want to perform the Cost of Sales calculation before rollup.
IBM Cognos Transformer allows the user to change the rollup of a calculated measure in the Measure property window on the Rollup tab. The properties in this window are different than those of a regular measure. In the Regular timing drop-down list, we have the option of After Rollup or Before Rollup as seen in Illustration 7. In this case, we will choose Before Rollup.
Illustration 7: IBM Cognos Transformer's Cost of Sales calculated measure properties window showing the Rollup tab and Regular timing options
However, there are some limitations on this:
- The measures involved in the calculation must come from the same data source. In this case they do.
- We may get an issue if a measure from the same data source has a non-standard rollup.
- If we want to use the measure in a calculation, that measure must also have the same rollup timing (this will be done later in the document).
Given these limitations and in order to make our model extendable without hitting these limitations, it is often better to define a Before Rollup measure as a calculated column in the IBM Cognos Transformer Data Sources pane and then treat it the same as any other measure by adding it as a measure to the Measures pane. This is shown in Illustration 8. By doing the calculation on the data source, the calculation is done at data load.
Illustration 8: Transformer's default view window showing the model with a Product Line dimension, a calculated column in the Data Sources pane, several measures including the Cost of Sales calculated data source column measure and a PowerCube
We will keep the default rollup properties for the new Cost of Sales measure. After rebuilding the cube, our report now has a Cost of Sales at the Electronics level showing 1,176, which is the correct value.
Illustration 9: A crosstab with the five model measures as rows and Product as columns and Cost of Sales showing the correct value for Electronics
Revisiting the Unit Cost Measure
Our first user is now happy with the Unit Cost and Cost of Sales measures. However a second user is now asked to view the cube. They are examining the values at the Electronics level of the cube. They simply cannot understand how Unit Cost: 74 * Sold Quantity: 18 = Cost of Sales: 1,176. Surely we would want all the numbers at the different levels of the cube to correlate?
Maybe we should actually work out the Unit Cost after aggregation? In fact, Unit Cost = Cost of Sales / Sold Quantity! By turning the problem around, as an OLAP modeller we can introduce a calculated measure for Unit Cost. We should also do this calculation after roll up. That is, work out the total Cost of Sales, then work out the total Sold Quantity and then divide. This is a different timing than the previous Cost of Sales calculation, which was set to Before Rollup – hence the need to produce Cost of Sales as a calculated column.
In our IBM Cognos Transformer model, we introduce a new calculated measure called Unit Cost (calculated) which, as shown in Illustration 10, has the following expression:
"Cost of Sales" / "Sold Quantity"
The Regular timing property is set to After Rollup.
Illustration 10: IBM Cognos Transformer's default view window showing the newly added Unit Cost (calculated) measure
After the cube is built again and we observe the report shown in Illustration 11, at the leaf level both Unit Cost and Unit Cost (calculated) show identical values. It is only at the aggregate level of Electronics that the measures show different values. While our average based measure shows 74, our new calculated measure shows 65.33. This value ties in with the Sold Quantity and Cost of Sales measures for Electronics where 18 * 65.33 = 1,176.
Illustration 11: A crosstab with the new Unit Cost (calculated) measure showing the expected results for Electronics
As a modeller it is our responsibility to be able to discuss both versions of the Unit Cost measure with our client and explain the values. By showing both versions of the measure the client can choose which version they prefer. They may even decide to have both but they would have to ensure users understood the two different measures. If the customer chooses to show both then more appropriate measure names should be used.
Profit % - Knowing When To Discard Seemingly Useful Data and Do It Yourself
Our client has given us the requirement to show the Profit% measure. They have included this value in the CSV file from which we have built our OLAP cube. As a reminder, the Profit % is calculated as (Sales – Cost of Sales) / Sales.
When we observe the report we can see that our Profit% figures are wrong. We are showing a value of 53, 45 and 98 for TV, HiFi and Electronics as shown in Illustration 12. These values are too high.
Illustration 12: Crosstab showing incorrect values for Profit% measure
The values are wrong for two reasons.
- The source data is not consolidated. For TV we have two records. One has a Profit % of 33.3% whilst the other has 20%. How do we consolidate these two records? Do we average them? We discussed consolidation previously in this document.
- The data is summing to the higher levels in the cube. We don't want to add the percentages together as this is wrong. Even if our source data were consolidated we would still have to contend with this issue.
Any ratio that we are asked to report on in an OLAP model could be affected by these issues. In this case, the solution is that Profit% must be calculated after rollup. That is, first work out the total sales, then work out the total Cost of Sales. After we have rolled up these two values we should then do the calculation.
Within the IBM Cognos Transformer model we can create a new calculated measure called Profit % (calculated) which, as shown in Illustration 13, has the expression of,
(“Sales” – “Cost of Sales”) / “Sales”
Illustration 13: IBM Cognos Transformer Measure Calculation expression editor window showing the formula (“Sales” - “Cost of Sales”) / “Sales”
Observing the report from the OLAP cube now shows values of 25, 25.19 and 25.1 for TV, HiFi and Electronics respectively as shown in Illustration 14.
Illustration 14: A crosstab report showing the new Profit % (calculated) measure alongside the original Profit% measure
These are the correct values for this ratio. We can see that the calculation will work regardless of which level data we are looking at.
Dig deeper into Big data and analytics on developerWorks
Get samples, articles, product docs, and community resources to help build, deploy, and manage your cloud apps.
Crazy about Big data and analytics? Sign up for our monthly newsletter and the latest Big data and analytics news.
Software development in the cloud. Register today to create a project.
Evaluate IBM software and solutions, and transform challenges into opportunities.