IBM Cognos BI - Unexpected Results When Performing Multi-Fact Queries (Stitch Queries)
Nature of Document: Tip or Technique; Product(s): IBM Cognos BI; Area of Interest: Reporting
This content is part # of # in the series: IBM Cognos Proven Practices
This content is part of the series:IBM Cognos Proven Practices
Stay tuned for additional content in this series.
Stitch queries are a technique used by the IBM Cognos BI reporting layer. They allow users to create reports using multiple fact tables from within a database. This applies to star schema database design as well as operational/third normal form databases that have been virtually modelled as a star schema within Framework Manager.
Stitch queries ensure facts are returned from both fact tables joined by conformed dimensions even when there is only data for a dimension member in a single fact.
This document discusses a number of issues that advanced report authors may encounter when they create reports that generate a stitch query.
The guide is applicable to IBM Cognos BI version 8.x, 10.1, and 10.1.1 (when using Compatible Query Mode).
This document assumes experience and advanced knowledge of modeling metadata with IBM Cognos Framework Manager.
What is a Stitch Query?
A stitch query is the name given to the type of query that IBM Cognos BI produces when a user creates a query that contains two or more fact tables. A stitch query consists of independent sub-queries for each fact that are then joined together on common dimensions. This joining process is done using an outer join to ensure no data from any fact sub-query is lost during the stitch.
Throughout the document we will be using a very simple model to illustrate certain challenges that may arise. This model uses the Go Sales Data Warehouse sample.
We are using the following three tables:
This consists of two fact tables, sales and returns, and one conformed dimension, products. The Products query subject has a one-to-many relationship to Sales Fact and Returns Fact as seen in Illustration 1.
Illustration 1: Products dimension query subject with a one-to-many relationship to both the Sales Fact and Returns Fact query subjects
The query subjects in Illustration 1 have been modelled in the Business View and from the Business View, two star schema groupings have been created in the Presentation View. One star schema grouping is in the Sales namespace, which consists of shortcuts to Sales Fact and Products, and the other is in the Returns namespace consisting of shortcuts to Returns Fact and Products as seen in Illustration 2.
Illustration 2: Star schema groupings for each fact, Sales Fact and Returns Fact
The Sales Fact model query subject in the Business Layer has had a filter applied, as seen in Illustration 3, with the following syntax.
[Business View].[Sales Fact].[Product Key] <> 30001
Illustration 3: Filter definition dialog illustrating a Product Key filter applied to the Sales Fact query subject
This filter is simply to give us no sales for that particular product. It ensures we have returns data in our returns query, but no sales for a particular product and will help to illustrate key points in this document.
Again, a stitch query is a query that uses data from multiple fact tables. For example, for all the different products show me the sales and the returns. An important quality of the result set is that on our report we will want to see products if they have had sales and no returns, or they have had returns and no sales. Using standard SQL inner join syntax we would only get data back for a product if it had both sales and returns.
In order to satisfy this important quality, a stitch query is performed in a very particular manner.
- First the product sales are queried.
- Then the product returns are queried.
By doing this as two separate sub-queries, we will get sales even if there aren’t any returns, and returns when there are no sales.
The two sub-queries are then stitched together by common dimension components that are used within the user's report. These are stitched using a full outer join ensuring no records are lost from the two intermediate queries.
Note the part in bold above, “within the user's report”. The facts may have many dimensions in common but if they are not used in the report output there is no need to stitch on them. Additionally stitching is performed at the level of the report. For example if the user requires sales and returns by Product Type then the two sub-queries are grouped by the Product Type level and then stitched together at the Product Type level even though the two facts are joined to the Products dimension at the Product Number level.
Here is the SQL created by a stitch query.
select (coalesce("D2"."Product_Line", "D3"."Product_Line")) AS "Product_Line", "D2"."Sales_Quantity" AS "Sales_Quantity", "D3"."Returns_Quantity" AS "Returns_Quantity" from ( select "Products"."PRODUCT_LINE_CODE" AS "Product_Line", sum("Sales_Fact"."QUANTITY") AS "Sales_Quantity" from "GOSALESDW"."gosalesdw"."SLS_PRODUCT_DIM" "Products", "GOSALESDW"."gosalesdw"."SLS_SALES_FACT" "Sales_Fact" where "Products"."PRODUCT_KEY" = "Sales_Fact"."PRODUCT_KEY" and "Sales_Fact"."PRODUCT_KEY" <> 30001 group by "Products"."PRODUCT_LINE_CODE" ) "D2" FULL OUTER JOIN ( select "Products"."PRODUCT_LINE_CODE" AS "Product_Line", sum("Returns_Fact"."RETURN_QUANTITY") AS "Returns_Quantity" from "GOSALESDW"."gosalesdw"."SLS_PRODUCT_DIM" "Products", "GOSALESDW"."gosalesdw"."DIST_RETURNED_ITEMS_FACT" "Returns_Fact" where "Products"."PRODUCT_KEY" = "Returns_Fact"."PRODUCT_KEY" group by "Products"."PRODUCT_LINE_CODE" ) "D3" on "D2"."Product_Line" = "D3"."Product_Line"
The important things to note about the query are:
- The first sub query is for Sales Quantity and Products. We can see the query is aggregating the data to Product Line level.
- The second sub query is for Returns Quantity and Products. We can see the query is aggregating the data to Product Line level.
- We can see data being stitched by a full outer join at Product Line level.
We will now proceed to discuss some unexpected results that users may encounter when IBM Cognos performs a stitch query. These include calculations,filtering, counting and aggregating.
Calculations that Involve Measures from Different Facts
(A note to the reader: The issue covered in this chapter is discussed in another IBM Cognos Proven Practice titled Ensure Non-Null Values for Calculations in IBM Cognos 8 Report Studio. The issue is covered again here to provide a complete overview of stitch queries that can cause unexpected results for users. Later chapters of this document are, at the time of writing, exclusive to this document.)
Stitch queries can introduce null values into a query result set when nulls do not appear in the underlying data. In our example of sales and returns data, imagine that the company has just started selling a new product and that we have yet to have any returns for that product. What would the result set look like for that row of a sales and returns report? The stitch query ensures we will get a row for the new product with the valid sales figure and as we have no returns, the stitch query will show a null for the returns value.
Now imagine a user tries to perform a simple calculation between measures that come from different facts and that either sub-query includes data for a dimension member that is not within the other sub query. For example, the user may want to perform a variance calculation called Sales Returns Variance between the sales and returns measures such as Sales Quantity - Returns Quantity. We may have sales and no returns or we may have returns and no sales.
If we have sales records for a product and no returns, then after the stitch operation we will have values for sales and nulls for returns. In SQL, unknown is null. Null is very different from a zero. Zero means I know I had none back. Null means I don’t know if I had some or none back. In this case, given the way our data is modelled we could assume null returns equates to zero.
With SQL, a calculation involving a null results in a null. Commonly authors and report consumers will expect a null to be substituted with a zero.
Again in our example, if our data looked like this, our users would probably want a Sales Returns Variance value as shown in the table below.
|Product||Sold||Returns||Sales Returns Variance|
Instead the nulls for either fact will return null for the Sales Returns Variance calculation as shown in the table below.
|Product||Sold||Returns||Sales Returns Variance|
The first thing an inexperienced Framework Manager modeller will usually do to attempt to solve this issue is to put a calculation around the sales and returns measures in Framework Manager to say if these are null then substitute with a zero. For example, the modeller may use the coalesce function to substitute a null for a zero. However the IBM Cognos BI query engine will push this calculation into the sub-query as shown in the SQL below.
select (coalesce("D2"."Product_Line", "D3"."Product_Line")) "Product_Line" , "D2"."Sales_Quantity" "Sales_Quantity" , "D3"."Return_Quantity" "Return_Quantity" from ( select "Product"."PRODUCT_LINE_CODE" "Product_Line" , sum((coalesce("Fact_Sales"."QUANTITY", 0))) "Sales_Quantity" from "GOSALESDW"."SLS_PRODUCT_DIM" "Product", "GOSALESDW"."SLS_SALES_FACT" "Fact_Sales" where "Product"."PRODUCT_KEY" = "Fact_Sales"."PRODUCT_KEY" and "Fact_Sales"."PRODUCT_KEY" <> 30001 group by "Product"."PRODUCT_LINE_CODE" ) "D2" full outer join ( select "Product"."PRODUCT_LINE_CODE" "Product_Line" , sum((coalesce("F_Returns"."RETURN_QUANTITY", 0))) "Return_Quantity" from "GOSALESDW"."SLS_PRODUCT_DIM" "Product", "GOSALESDW"."DIST_RETURNED_ITEMS_FACT" "F_Returns" where "Product"."PRODUCT_KEY" = "F_Returns"."PRODUCT_KEY" group by "Product"."PRODUCT_LINE_CODE" ) "D3" on "D2"."Product_Line" = "D3"."Product_Line" FOR FETCH ONLY
We can see the Framework Manager modeller doing a null substitution. The bold text in the SQL shows this pushed into the sub-query, whereas the null substitution is needed within the parent query (after the stitch). At the sub-query level there are no nulls. The nulls only occur after the stitch.
There are documented methods around this issue.
Within Report Studio or Business Insight Advanced, the simple Sales Quantity – Returns Quantity calculation can be replaced with something that checks for nulls as shown below.
coalesce([Sales].[Sales Fact].[Sales Quantity] - [Returns].[Returns Fact].[Returns Quantity], [Sales].[Sales Fact].[Sales Quantity], - [Returns].[Returns Fact].[Returns Quantity], 0)
The calculation has to be written in a way that forces it to happen after the stitch. If the calculation is not written in a way that forces it to happen after the stitch, then we are back in the previous position. In this case, the coalesce function includes facts from each fact table. In order for this calculation to be possible, it must be applied after both fact sub-queries have been returned. The coalesce first evaluates the variance calculation we are interested in. If it is null, then it will attempt to return Sales Quantity. If that is null, then a negative instance of Returns Quantity is returned. And finally, if that is null, then it returns a 0.
However, for Query Studio users there is no easy way to get a simple variance working for multiple fact tables. The only way to do this is to put the null checking variance calculation using the coalesce function in the Framework Manager model. Query Studio users can then use the appropriate variance calculation. This does mean that all variances between facts that are needed by Query Studio users will have to be created by the Framework Manager modeller.
Illustration 4 shows an example of a Framework Manager model using a stand-alone calculation called Sales Return Variance located in a Calcs folder under the Sales namespace. The definition is the same coalesce statement shown earlier. This null checking calculation is then published to IBM Cognos Connection and available to authors.
Illustration 4: Framework Manager model displaying a stand-alone Sales Returns Variance calculation
Depending on the nature of your calculation and the desired results, you may need to ensure that the Regular Aggregate property for the stand-alone calculation in the model is set to Calculated. This property aggregates the values first and then performs the calculation instead of calculating the details first and then aggregating the results. This order of operations is only possible with stand-alone calculations in Framework Manager or calculations created in the studios. It does not apply to calculations embedded in query subjects.
Illustration 5 shows that the model calculation works when a Sales Quantity value is null. A Sales Returns Variance value is still returned since the Return Quantity is not null.
Illustration 5: Query Studio report illustrating a null value for Sales Quantity, a value of 39,569 for Return Quantity, and a value of -39,569 for Sales Return Variance
Take note that the count of Product Number for Product Type 951 is a value of 10. This count will be revisited in later examples.
Illustration 6 shows that the model calculation works when a Return Quantity value is null. A Sales Returns Variance value is still returned since the Sales Quantity is not null.
Illustration 6: Query Studio report illustrating a null value for Return Quantity, a value of 1,062 for Sales Quantity, and a value of 1,062 for Sales Return Variance
Counting a Common Dimension Component that is Not on the Report
On the previous Query Studio screen shots we could see the count of products for each product type. While this count works when product number is on the report, it does not work when product number is not on the report. Why is this?
In Report Studio, create a report that has four columns; Product Type, Product Number, Sales Quantity, and Returns Quantity. The report has two dimension levels, Product Type and Product Number, and three measures.
For starters, the question in the report just created above is ambiguous. Do we want the count of products sold, the count of products returned, the count of products that were either sold or returned or a count of products for that product type regardless of the other facts?
Let us take take a look at the more complex example and count products that were either sold or returned by changing the Aggregate Function property for Product Number to Count Distinct. Remember in the previous example the products count we got for Product Type 951 was 10. This report returns a count of 1 for Product Type 951 as seen in Illustration 7 below.
Illustration 7: A simple list report with a Count Distinct applied for Product Number which returns the unexpected result of 1 for each Product Type
This is clearly incorrect. Why?
Note that this issue has nothing to do with the first example discussed in this document. The example in this case has no variance column.
Knowing what we know about the IBM Cognos BI query engine, we know that the data is stitched together at the level of data that is required by the report, in this case Product Type. The sub- queries will not group on Product Number as Product Number is not on the report but rather a count distinct of Product Number. Hence, after the stitch, there are no product number details to count. This is why we see the correct values in the previous Query Studio examples. The stitch is done at the Product Number level. Hence, the stitch has product numbers to count.
If you examine the SQL of the incorrect results the problem is easily identified by looking at the sub- queries of the stitch query. There, you will see that a MIN function is used on Product Number which is returned in each sub-query.
As the query is returning and grouping data at the Product Type level, everything else within that query that is not grouped (as dictated by SQL) must have an aggregation function as seen in the sub- query SQL below.
select "Products"."PRODUCT_TYPE_CODE" AS "Product_Type", sum("Sales_Fact"."QUANTITY") AS "Sales_Quantity", min("Products"."PRODUCT_NUMBER") AS "rc" from "GOSALESDW-C10-RP1"."gosalesdw"."SLS_PRODUCT_DIM" "Products", "GOSALESDW-C10-RP1"."gosalesdw"."SLS_SALES_FACT" "Sales_Fact" where "Products"."PRODUCT_KEY" = "Sales_Fact"."PRODUCT_KEY" and "Sales_Fact"."PRODUCT_KEY" <> 30001 group by "Products"."PRODUCT_TYPE_CODE" ) "D9" ....
The behavior of the IBM Cognos query engine is to perform a MIN function when aggregating a non-measure column. This is a simple way of cleansing the data so that two names from the same column only produce a single row. You can see the MIN function being used on the Product Number column.
The query will return the MIN product number, which will be a single product number. At the top of the stitch query, you will see the count being performed.
If you count how many products have the MIN product number, you get 1. Which is the value seen on the report. With the existing Framework Manager model, there is no work around for Query Studio users but we will take a look at a solution shortly.
With Report Studio, a work around is to use a sub-query or what we will refer to as a base query.
- Create a query with Product Type, Product Number, Sales Quantity, and Returns Quantity as shown in Illustration 8.
Illustration 8: Report Studio query Data Items pane
Give this query the name Base Query. This base query will be used in later examples.
- Create a second query called Count Products and in the Query Explorer, drag Base Query created in Step 1 to the right of the Count Products query. This creates a shortcut reference to the Base Query which now becomes a source for the Count Products query as shown below.
Illustration 9: Query Explorer showing the Count Products query based on Base Query
- Double-click the Count Products query and then add all the data items from Base Query.
Illustration 10: All data items from Base Query added to Count Products query
- In the Products Count query, set the Aggregation Function property for Product Number to Count Distinct. Set the Aggregate Function property for Sales Quantity and Returns Quantity to Total. This query can then be used to populate a list report as shown in the report output below which now shows the correct value of 10 for Product Type 951.
Illustration 11: Report showing correct values for a count distinct of Product Number for Product Types
Why does this method work?
The Base Query ensures that the data is stitched at the Product Number level. The result of the stitch query includes the Product Number. This is then passed into the parent query where the Product Number is counted. This can be seen within the query SQL.
select "Base_Query16"."Product_Type" AS "Product_Type", count(distinct "Base_Query16"."Product_Number") AS "Product_Number", sum("Base_Query16"."Sales_Quantity") AS "Sales_Quantity", sum("Base_Query16"."Returns_Quantity") AS "Returns_Quantity" from ( select (coalesce("D9"."Product_Type", "D10"."Product_Type")) AS "Product_Type", (coalesce("D9"."Product_Number", "D10"."Product_Number")) AS "Product_Number", "D10"."Returns_Quantity" AS "Returns_Quantity", "D9"."Sales_Quantity" AS "Sales_Quantity" from FULL OUTER JOIN on "D9"."Product_Type" = "D10"."Product_Type" and "D9"."Product_Number" = "D10"."Product_Number" ) "Base_Query16" group by "Base_Query16"."Product_Type"
We can see the column projection list of Base Query (the first sub query in the SQL) includes the Product Number. Within the inner queries (shown in bold italics) we can see the Product Number is included within the Group By clause. Hence there is no need for an aggregation clause on its column. Finally the parent query does the count of the returned Product Numbers.
While this is suitable for Report Studio, how can we let Query Studio users count the number of products?
For simple counts, an additional fact query subject can be introduced into the Framework Manager model. In this case we will create one called Products Count Fact. This query subject will contain a query item with an expression simply containing a value of 1. This will allow for a count of each product. Remember to set the Usage property to Fact and the Regular Aggregate property to Sum for this measure. A Product Key is also be included in this fact query subject in order to create a relationship to the Products dimension. The key can actually be taken from the same underlying SLS_PRODUCT_DIM table. The Products Count Fact query subject will be modelled the same as any other fact in that it will have a relationship with it's related dimensions, in this case to the Products dimension as seen in Illustration 12.
Illustration 12: Products dimension with a one-to-many relationship to the new Products Count Fact query subject
Within the Presentation View, the new fact is represented as a new star schema grouping named Products Count containing shortcuts to Product Count Fact and Products as seen in Illustration 13.
Illustration 13: Presentation View showing the new Products Count star schema grouping
Within Query Studio, the Product Count measure can be used the same as any other measure as seen in Illustration 14.
Illustration 14: Query Studio report showing the new Product Count fact in the report showing correct roll-up values
There are some caveats to this approach:
- A fact query subject needs to be added to the model for each count that is needed for reporting.
- The result counts all Products regardless of if there has been a sale or return.
- Filters applied to the Products dimension will also apply to the Product Count. For example, for a query filtered to only show Red products, the product count would only be for Red products. For specific counts where the user requirements are known, this method is suitable.
If the user needs to qualify the question differently about which products they want to count (for example, count the returned products), then an appropriate column can be introduced on the returns fact table. The user can then count this column within Query Studio. In our example we can see that each fact already has a Product Key column exposed to the user (typically they are not exposed but for this type of situation making them available can be useful). Since the relationship is 1:1 between the Product Key and Product Number we will use these columns. In the Framework Manager model's fact tables, we will rename the keys in the Returns Fact and Sales Fact query subjects to Returns Product Key and Sales Product Key respectively.
To create the desired report in Query Studio:
- Add Product Type, Returns Product Key, Sales Product Key, and Product Count to the report. The results appear as shown below.
Illustration 15: Query Studio Report showing actual Returns Product Key and Sales Product Key values
The actual key values are displayed in the report at this point.
- In separate steps, right-click on the Returns Product Key and Sales Product Key columns, select Summarize, click Advanced (in the top right corner of the Summarize pane), and then set Count Distinct for the Summary for cells: field and Total for the Summary for footers: field as shown below.
Illustration 16: The Summarize (advance) pane in Query Studio
- Click OK. The resulting Query Studio report will then show the appropriate counts. In this case for Product Type 951, Returns Product Key returns 10 and Sales Product Key returns 9.
Illustration 17: Query Studio report showing correct counts for both Returns Product Key and Sales Product Key
Advanced modellers may be aware that rather than having the user change the summarize options for a cell, this can be done directly within Framework Manager. Within Framework Manager, properties of the Returns Product Key and Sales Product Key columns could be changed where the usage attribute could be set to a fact and the aggregation property can be changed to Count Distinct. However, while this gives a model that produces the correct result in Report Studio, Query Studio ignores the count distinct and replaces it with a simple count.
Filtering on Stitch Query Facts
Filtering a stitch query on conformed dimensions will give the user the expected results. What may cause users issues is when they filter on a fact. In our example how would the user answer the question “Show me the sales of products which only have had returns?”.
In Query Studio, authors would set the Missing values section in the filter for Returns Quantity to “Leave out missing values” and may be tempted to apply the filter to “Individual values in the data source” as seen in Illustration 18.
Illustration 18: Query Studio filter for Returns Quantity set to filter on Individual values in the data source and leave out missing values
The Query Studio report for such a filter would appear as shown in Illustration 19 in which product number 125130 still shows up in the results even thought there are no returns.
Illustration 19: Query Studio report returning a record with no returns even though the author expected this record to be filtered out
Since the author has added both Sales Quantity and Returns Quantity to the report along with a conformed dimension, in this case Products, IBM Cognos BI will perform a stitch query. The filter on Return Quantity to not return missing values does not behave as expected in this scenario because the null values do not actually exist in the data itself. They are a product of the stitch query which returns all sales regardless of returns and all returns regardless of sales. Therefore, the report is still showing a row for Product 125130 even though it had no returns.
Again, this happens because the IBM Cognos BI query engine pushes the filter into the sub-queries of the stitch query, meaning the filter is applied before the stitch and before the nulls are actually introduced in to the final record set.
Within Query Studio, the proper way to answer the report in question is to apply the filter to the “Values in the report” rather than “Individual values in the data source” (the detail values in the data) as seen in Illustration 20. This setting is actually the default setting in Query Studio so an author would have to go out of their way to bring back the unexpected results.
Illustration 20: Query Studio filter for Returns Quantity set to filter on Values in the report and to leave out missing values
With the appropriate filter options set, the Query Studio report returns as expected and the product number row for 125130 is now left out of the results as seen in Illustration 21.
Illustration 21: Query Studio report now omitting record with no returns
Understanding these settings which either apply a filter prior to or after aggregation can help authors achieve their desired results.
Within Report Studio for the same type of report, the user could create a detail filter with the expression of “[Returns Quantity] is not null” and change the filter to “After auto aggregation” as seen in Illustration 22.
Illustration 22: Report Studio filter set to After auto aggregation
By default, Report Studio filters are set to “Before auto aggregation”, the opposite of Query Studio.
Another option for this type of report in Report Studio is to use the Base Query method described in the previous section. The Base Query gets the data and stitches it together. The parent query then filters the data after the stitch ensuring the null values introduced by the stitch are filtered out.
Users should always be wary when filtering in a stitch query if that filter is not on a conformed dimension. This could include,
- Filtering on a fact column as shown in this example
- Filtering on a non-conformed dimension (a dimension that is only applies to one of the facts in the query)
Total for Group Giving Incorrect Values with an After Stitch Calculation
In this scenario, when a calculation is created, such as a variance calculation, and then another calculation is based on the first calculation, values appear incorrectly.
This issue is easily demonstrated in Query Studio using one of the analytical functions. Using our previous example of a Sales Return Variance calculation (Sales Quantity minus Return Quantity) and the analytic function % of Total against the Sales Return Variance calculation, the values for the new calculation are much larger than they should be as seen in Illustration 23.
Illustration 23: List report with the % of total calculation displaying values much larger than expected
Currently there is no work around for this issue within Query Studio. To get the values required in this scenario, within Report Studio or Business Insight Advanced you can use the percentage function. The following is an example of the syntax that would be used in the calculation.
percentage([Sales Return Variance] for [Product Type])
This will return the expected values for the % of total calculation as seen in Illustration 24.
Illustration 24: List report with the % of total calculation displaying the correct results
Stitch queries are a powerful report engine technique that allow multiple fact queries to return the correct data. However there are some circumstances where a little more thought is required to ensure the query returns the expected results.
Users who are likely to produce stitch queries, regardless of what studio they are in, should be aware of the edge cases shown within this document.
Within Report Studio the base query method will work against most of the shown stitch query issues. The base query method is not available within Query Studio or Business Insight Advanced. For these users, the Framework Manager modeller may alleviate the issues with correct modelling. Some of these techniques have been shown. It may even be the case that the data modeller may take the view that instead of using stitch queries and multiple fact tables the data may be better modelled using a single fact table. This will prevent stitch queries being generated and business authors encountering the shown issues.
Creating the correct data model, be that in the physical data warehouse or the Framework Manager model, is not done by just looking at the data. Knowing how the data will be used, and in what products, can also have a significant influence on the final data model. For that reason you should ensure these areas have been given significant thought when finalizing the data design.