Topic
  • 10 replies
  • Latest Post - ‏2010-12-15T23:12:32Z by shails
shails
shails
15 Posts

Pinned topic Data Rules in IA

‏2010-12-01T17:44:42Z |
I have a query with joins on 4 tables and a couple of filter conditions. The query returns 5 columns as output and has a count(distinct(colA)) as one of the output columns. I am trying to implement this query using Data Rules functionality in IA and I was able to replicate the joins and filter conditions but not count(distinct(colA)) .
When I try to Generate Rule using a Data Rule definition, I can only use count function on the columns that are defined in the Rule definition but I cannot use the count function on any of the columns from the data source list. If I have to define count(colA) function in the rule definition, the count function must be associated with one of the conditions like > or < or >= or <=.
Can someone suggest if there is a better way of doing this? Datastage is not a option here. This should be done using Data Rules functionality only. That is the requirement.
Thanks in advance.
Updated on 2010-12-15T23:12:32Z at 2010-12-15T23:12:32Z by shails
  • smithha
    smithha
    162 Posts

    Re: Data Rules in IA

    ‏2010-12-02T20:14:06Z  
    Output expressions can only be defined on rule variables. This is as expected as those are the fields that will be sent into the rule scoring engine.
    The workaround is however quite simple: just add a dummy condition in the rule using the column. For instance an easy option would be: Col exists where Col is the variable; and if there is the potential for ambiguous results you could do something like: Col exists OR Col Not exists

    You can then define count(col) in the output. As of the 8.1.2 release, you can also apply a group by condition to the count function. However, there is not currently a Distinct option for the count function, so you might not be able to achieve your goal.

    Harald
  • smithha
    smithha
    162 Posts

    Re: Data Rules in IA

    ‏2010-12-06T12:50:53Z  
    • smithha
    • ‏2010-12-02T20:14:06Z
    Output expressions can only be defined on rule variables. This is as expected as those are the fields that will be sent into the rule scoring engine.
    The workaround is however quite simple: just add a dummy condition in the rule using the column. For instance an easy option would be: Col exists where Col is the variable; and if there is the potential for ambiguous results you could do something like: Col exists OR Col Not exists

    You can then define count(col) in the output. As of the 8.1.2 release, you can also apply a group by condition to the count function. However, there is not currently a Distinct option for the count function, so you might not be able to achieve your goal.

    Harald
    I should clarify one point in my prior post:

    The count function will provide a distinct total and if applied on a sub-group will provide a distinct total for the sub-group. What you will not get is a distinct single line per sub-group, but you can include the sub-group count in the output.

    Example:
    GroupID ColB
    01 X
    01 Y
    01 Z
    02 X
    03 W

    Including a ColB Exists and GroupID Exists in the rule logic and applying a Count(ColB, GroupID) in the output expression would yield output of:

    RecID GroupID ColB Count(ColB, GroupID)
    1 01 X 3
    2 01 Y 3
    3 01 Z 3
    4 02 X 1
    5 03 W 1

    Harald
  • shails
    shails
    15 Posts

    Re: Data Rules in IA

    ‏2010-12-06T14:21:18Z  
    • smithha
    • ‏2010-12-06T12:50:53Z
    I should clarify one point in my prior post:

    The count function will provide a distinct total and if applied on a sub-group will provide a distinct total for the sub-group. What you will not get is a distinct single line per sub-group, but you can include the sub-group count in the output.

    Example:
    GroupID ColB
    01 X
    01 Y
    01 Z
    02 X
    03 W

    Including a ColB Exists and GroupID Exists in the rule logic and applying a Count(ColB, GroupID) in the output expression would yield output of:

    RecID GroupID ColB Count(ColB, GroupID)
    1 01 X 3
    2 01 Y 3
    3 01 Z 3
    4 02 X 1
    5 03 W 1

    Harald
    Hi,

    Thanks for your reply. Count(colA) >=0 is the option I am using to get my result. However before proceeding to that step I noticed that my inner join is not working.

    I have table A with 5 million recs and table B with 2000 recs. There is a key column in both the tables and using a inner join I have to test if every key column in table B has a matching key in Table A. I created a rule definition and generated a data rule. When I run the rule I should get 2000 recs as output but I see the output as 5 million recs. All the data values in table B is a subset of data values in table A.

    I am using the following condition in the rule definition.
    TableB_columnname in_reference_column TableA_columnname.

    Can you please let me know if this is correct or am I missing something. However when I enabled the debugging log for 20 recs, I see the matching records in the debugging log with the rule condition set to True. The problem is number of output records is more than expected. The output has 3 columns from Table A and 2 columns from Table B.
  • shails
    shails
    15 Posts

    Re: Data Rules in IA

    ‏2010-12-06T18:32:28Z  
    • shails
    • ‏2010-12-06T14:21:18Z
    Hi,

    Thanks for your reply. Count(colA) >=0 is the option I am using to get my result. However before proceeding to that step I noticed that my inner join is not working.

    I have table A with 5 million recs and table B with 2000 recs. There is a key column in both the tables and using a inner join I have to test if every key column in table B has a matching key in Table A. I created a rule definition and generated a data rule. When I run the rule I should get 2000 recs as output but I see the output as 5 million recs. All the data values in table B is a subset of data values in table A.

    I am using the following condition in the rule definition.
    TableB_columnname in_reference_column TableA_columnname.

    Can you please let me know if this is correct or am I missing something. However when I enabled the debugging log for 20 recs, I see the matching records in the debugging log with the rule condition set to True. The problem is number of output records is more than expected. The output has 3 columns from Table A and 2 columns from Table B.
    Hi,
    Please ignore my previous post. Since I ran the data rule on a sample of 2000 recs from table B, I assumed that the output should be 2000 recs. But the fact is data rules doesn't consider the sample size from column analysis results when you are running the data rule and so it runs on the complete dataset. So now my counts are matching. Thank you.
  • shails
    shails
    15 Posts

    Re: Data Rules in IA

    ‏2010-12-07T21:31:53Z  
    • shails
    • ‏2010-12-06T18:32:28Z
    Hi,
    Please ignore my previous post. Since I ran the data rule on a sample of 2000 recs from table B, I assumed that the output should be 2000 recs. But the fact is data rules doesn't consider the sample size from column analysis results when you are running the data rule and so it runs on the complete dataset. So now my counts are matching. Thank you.
    Hi Harald,

    When I create a data rule with a inner join on 2 tables, the rule is working fine. But when I add a "AND" condition to add another inner join on the 3rd table, the rule doesn't work. Its giving me 0 records as output. All the records has a matching key value from the 3rd table. So ideally I should get the same number of output records with the second join.
    Can you suggest any alternative to make this rule working.

    Regards,
  • smithha
    smithha
    162 Posts

    Re: Data Rules in IA

    ‏2010-12-08T14:28:15Z  
    • shails
    • ‏2010-12-07T21:31:53Z
    Hi Harald,

    When I create a data rule with a inner join on 2 tables, the rule is working fine. But when I add a "AND" condition to add another inner join on the 3rd table, the rule doesn't work. Its giving me 0 records as output. All the records has a matching key value from the 3rd table. So ideally I should get the same number of output records with the second join.
    Can you suggest any alternative to make this rule working.

    Regards,
    It's difficult to answer at that level. A Join with more than 2 tables is supported. We would need to see the rule, the join conditions, and maybe a data sample to diagnose the problem.

    Harald
  • shails
    shails
    15 Posts

    Re: Data Rules in IA

    ‏2010-12-08T15:38:24Z  
    • smithha
    • ‏2010-12-08T14:28:15Z
    It's difficult to answer at that level. A Join with more than 2 tables is supported. We would need to see the rule, the join conditions, and maybe a data sample to diagnose the problem.

    Harald
    Hi Harald,

    Thanks for your reply. I was browsing IBM website and I see that we need to install some patches to run a data rule with more than one join. I am checking with my sys admin to make sure we have the right patches. Here is the link that gives that information
    http://www-01.ibm.com/support/docview.wss?uid=swg1JR36624
    I will update the message once I get more information.

    To answer your previous question, I am trying to join 3 tables out of which 2 are dimension tables and one is a fact table. The fact table has 2 dimesion keys one from each dimension table. I am creating a data rule to test if the two dimension keys in fact table has a associated key value in dimension tables. It is a primary key- foreign key relationship testing. The data rule works if I create 2 joins in 2 different data rules but when I try to create one data rule with both the joins the number of output records is zero.

    Thanks,
  • shails
    shails
    15 Posts

    Re: Data Rules in IA

    ‏2010-12-15T21:38:28Z  
    I was able to do the count(colA) based on a single column. As you mentioned in your previous post, the output count is repeated for every record in the sub group. Do you know if there are any improvements on this in 8.5 version to get only one record within each sub group.
  • smithha
    smithha
    162 Posts

    Re: Data Rules in IA

    ‏2010-12-15T21:56:47Z  
    • shails
    • ‏2010-12-15T21:38:28Z
    I was able to do the count(colA) based on a single column. As you mentioned in your previous post, the output count is repeated for every record in the sub group. Do you know if there are any improvements on this in 8.5 version to get only one record within each sub group.
    Good to hear you were able to get the count function working. As for the output count repetition, there are no changes for this in the 8.5 version, but it is a noted item and we are looking at subsequent enhancement opportunities.

    Harald
  • shails
    shails
    15 Posts

    Re: Data Rules in IA

    ‏2010-12-15T23:12:32Z  
    An enhancement to this would be really helpful for the customers. I am working on data rules where the volume of data runs into millions of records (80 or more). Validation is becoming difficult with the repetitions. Thanks for looking into this.