i want to compare sum value from table_1 VS sum from table_2.
the rule set force me to perform join between the tables.
the join is not needed.
is there a way to do such compare without jion?
This topic has been locked.
5 replies Latest Post - 2013-01-09T14:17:11Z by smithha
Pinned topic best way to compare sum1 vs sum2 where join is not possible
Answered question This question has been answered.
Unanswered question This question has not been answered yet.
Updated on 2013-01-09T14:17:11Z at 2013-01-09T14:17:11Z by smithha
Re: best way to compare sum1 vs sum2 where join is not possible2013-01-08T13:05:18Z in response to SystemAdminHi,
There was a similar question posed in the "Table Rows Count" thread, so you can see some of the same discussion there http://www.ibm.com/developerworks/forums/thread.jspa?threadID=465654&tstart=0
The simplest way is to create a rule for each table such as FieldX Exists and bind that to the key field. Each rule will generate statistics for Total Records and Total Rows Met (those should generally be equal on a key field).
Then create a Metric that compares the Total Records for table1 and the Total Records for table2. Assuming either one could be higher than the other a good Metric expression would be: *ABS(ruleA Total Records - ruleB TotalRecords) *
Include a Benchmark for the Metric so that you can indicate what variance, if any, you tolerate.
For more on logic for Metrics see: http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r7/topic/com.ibm.swg.im.iis.ia.quality.doc/topics/dq_functions_syntax_metrics.html
Re: best way to compare sum1 vs sum2 where join is not possible2013-01-08T14:07:26Z in response to smithhaHi,
your idea will work well with count1 VS count2.
I'm trying to compare sum VS sum.
Metric let you use number of record or % number of records but it is not possible to use the result of sum from your rule.
Re: best way to compare sum1 vs sum2 where join is not possible2013-01-08T17:52:59Z in response to SystemAdminHi,
You are correct that the metric approach won't work for sum vs. sum.
If you are using v8.7 or before, there is not going to be an easy solution since there is nothing in common to bring the sums together. The two approaches that are viable in v8.7 are:
1) Create rules to calculate the sums for each table; write the distinct sums plus a common key value to the output tables. Extract the values of the output tables via the API's and format as .csv files. Ensure you have ODBC definitions and the metadata imported for the .csv files. Add a third rule to read the .csv files and compare the sums using a join on the common key value.
2) Put the rules into distinct Rule Stages in a DataStage/QualityStage job where you read each file, run each through their respective rules, then bring the output together and compare the sums (could be in a third Rule Stage if you've put the two sums into a common record). Will allow you to evaluate the sums against each other, but won't have the results stored in the IA repository.
In the latest v9.1 release, there are a couple of added options taking advantage of the named output table support.
1) Create rules to calculate the sums for each table; write the sums plus a common key to distinct named output tables; add a third rule to read the named output tables and compare the sums using a join on the common key.
2) Create rules to calculate the sums for each table; write the sums plus a common key to the same named output table; add a third rule to read the named output table and compare the max and min value for the sum column for the common key (avoids a join and minimizes the number of named output tables needed even where you have multiple sums you want to compare).
Re: best way to compare sum1 vs sum2 where join is not possible2013-01-09T14:17:11Z in response to SystemAdminYou cannot upgrade one component of Information Server without upgrading another -- they share application server and repository.
You could install a separate Information Server 9.1 instance on another server with only install Information Analyzer there. That would allow you to focus your information quality work, but you would have to assess impacts in regards to other Information Server components (e.g. Business Glossary, Metadata Workbench) that you may be using.