Topic
• 5 replies
• Latest Post - ‏2013-01-09T14:17:11Z by smithha
294 Posts

# Pinned topic best way to compare sum1 vs sum2 where join is not possible

‏2013-01-08T07:43:56Z |
i want to compare sum value from table_1 VS sum from table_2.
the rule set force me to perform join between the tables.
the join is not needed.
is there a way to do such compare without jion?
Updated on 2013-01-09T14:17:11Z at 2013-01-09T14:17:11Z by smithha
• smithha
161 Posts

#### Re: best way to compare sum1 vs sum2 where join is not possible

‏2013-01-08T13:05:18Z
Hi,

There was a similar question posed in the "Table Rows Count" thread, so you can see some of the same discussion there http://www.ibm.com/developerworks/forums/thread.jspa?threadID=465654&tstart=0

The simplest way is to create a rule for each table such as FieldX Exists and bind that to the key field. Each rule will generate statistics for Total Records and Total Rows Met (those should generally be equal on a key field).

Then create a Metric that compares the Total Records for table1 and the Total Records for table2. Assuming either one could be higher than the other a good Metric expression would be: *ABS(ruleA Total Records - ruleB TotalRecords) *
Include a Benchmark for the Metric so that you can indicate what variance, if any, you tolerate.

For more on logic for Metrics see: http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r7/topic/com.ibm.swg.im.iis.ia.quality.doc/topics/dq_functions_syntax_metrics.html

Harald
294 Posts

#### Re: best way to compare sum1 vs sum2 where join is not possible

‏2013-01-08T14:07:26Z
• smithha
• ‏2013-01-08T13:05:18Z
Hi,

There was a similar question posed in the "Table Rows Count" thread, so you can see some of the same discussion there http://www.ibm.com/developerworks/forums/thread.jspa?threadID=465654&tstart=0

The simplest way is to create a rule for each table such as FieldX Exists and bind that to the key field. Each rule will generate statistics for Total Records and Total Rows Met (those should generally be equal on a key field).

Then create a Metric that compares the Total Records for table1 and the Total Records for table2. Assuming either one could be higher than the other a good Metric expression would be: *ABS(ruleA Total Records - ruleB TotalRecords) *
Include a Benchmark for the Metric so that you can indicate what variance, if any, you tolerate.

For more on logic for Metrics see: http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r7/topic/com.ibm.swg.im.iis.ia.quality.doc/topics/dq_functions_syntax_metrics.html

Harald
Hi,

your idea will work well with count1 VS count2.
I'm trying to compare sum VS sum.
Metric let you use number of record or % number of records but it is not possible to use the result of sum from your rule.

thanks
• smithha
161 Posts

#### Re: best way to compare sum1 vs sum2 where join is not possible

‏2013-01-08T17:52:59Z
Hi,

your idea will work well with count1 VS count2.
I'm trying to compare sum VS sum.
Metric let you use number of record or % number of records but it is not possible to use the result of sum from your rule.

thanks
Hi,

You are correct that the metric approach won't work for sum vs. sum.

If you are using v8.7 or before, there is not going to be an easy solution since there is nothing in common to bring the sums together. The two approaches that are viable in v8.7 are:
1) Create rules to calculate the sums for each table; write the distinct sums plus a common key value to the output tables. Extract the values of the output tables via the API's and format as .csv files. Ensure you have ODBC definitions and the metadata imported for the .csv files. Add a third rule to read the .csv files and compare the sums using a join on the common key value.
2) Put the rules into distinct Rule Stages in a DataStage/QualityStage job where you read each file, run each through their respective rules, then bring the output together and compare the sums (could be in a third Rule Stage if you've put the two sums into a common record). Will allow you to evaluate the sums against each other, but won't have the results stored in the IA repository.

In the latest v9.1 release, there are a couple of added options taking advantage of the named output table support.
1) Create rules to calculate the sums for each table; write the sums plus a common key to distinct named output tables; add a third rule to read the named output tables and compare the sums using a join on the common key.
2) Create rules to calculate the sums for each table; write the sums plus a common key to the same named output table; add a third rule to read the named output table and compare the max and min value for the sum column for the common key (avoids a join and minimizes the number of named output tables needed even where you have multiple sums you want to compare).

Harald
294 Posts

#### Re: best way to compare sum1 vs sum2 where join is not possible

‏2013-01-09T08:01:33Z
• smithha
• ‏2013-01-08T17:52:59Z
Hi,

You are correct that the metric approach won't work for sum vs. sum.

If you are using v8.7 or before, there is not going to be an easy solution since there is nothing in common to bring the sums together. The two approaches that are viable in v8.7 are:
1) Create rules to calculate the sums for each table; write the distinct sums plus a common key value to the output tables. Extract the values of the output tables via the API's and format as .csv files. Ensure you have ODBC definitions and the metadata imported for the .csv files. Add a third rule to read the .csv files and compare the sums using a join on the common key value.
2) Put the rules into distinct Rule Stages in a DataStage/QualityStage job where you read each file, run each through their respective rules, then bring the output together and compare the sums (could be in a third Rule Stage if you've put the two sums into a common record). Will allow you to evaluate the sums against each other, but won't have the results stored in the IA repository.

In the latest v9.1 release, there are a couple of added options taking advantage of the named output table support.
1) Create rules to calculate the sums for each table; write the sums plus a common key to distinct named output tables; add a third rule to read the named output tables and compare the sums using a join on the common key.
2) Create rules to calculate the sums for each table; write the sums plus a common key to the same named output table; add a third rule to read the named output table and compare the max and min value for the sum column for the common key (avoids a join and minimizes the number of named output tables needed even where you have multiple sums you want to compare).

Harald
Hi,

Is there is a way to upgrade only the information analyzer from 8.7 to 9.1 version without upgrade the datastage?

tanks.
• smithha
161 Posts

#### Re: best way to compare sum1 vs sum2 where join is not possible

‏2013-01-09T14:17:11Z
Hi,