Topic
  • 5 replies
  • Latest Post - ‏2013-01-14T21:21:50Z by SystemAdmin
SystemAdmin
SystemAdmin
294 Posts

Pinned topic Using Group by in Information Analyzer Data Rules

‏2013-01-09T14:17:59Z |
Hi all,

I am trying to apply this SQL statement using Information Analyzer data rule, But I don't know how to use group by

Select student_id,address_line, floor from Address
where floor is NULL
group by student_id,address_line, floor
having count( address_line ) > 1 ;
Your help is really appreciated,
Sami.
Updated on 2013-01-14T21:21:50Z at 2013-01-14T21:21:50Z by SystemAdmin
  • smithha
    smithha
    162 Posts

    Re: Using Group by in Information Analyzer Data Rules

    ‏2013-01-10T13:38:44Z  
    When going from one style of rule implementation (SQL) to another (Information Analyzer), it's important to step back from the code and ask: what is the rule I'm evaluating?

    Are you trying to evaluate how many repeated addresses exist?
    Or how many repeated addresses exist where the floor is null?
    Or how many addresses exist where the floor is null?
    Or something else?
    And you generally want to express the rule as what is expected (i.e. are repeated addresses good or bad?)
    The answers will define what you need to code in the rule definition.

    Let's say that the expectation is that all students have an address which includes a floor.
    The rule definition could simply be: studentID EXISTS AND address EXISTS AND floor EXISTS
    or it could be: studentID EXISTS AND address EXISTS AND floor EXISTS AND count(address, studentID) = 1

    In generating the rule from either of these definitions, you would bind the variables address (to address_line), studentID (to student_id) and floor (to floor).
    For the output, you would select the Does Not Meet Rule criteria (you want to see exceptions).
    If you used the first of the rule definitions, for the output detail you could include a Custom Expression: count(address, studentID)
    If you used the second of the rule definitions, the count will be one of the variables and again could include it in the output detail.
    This expression would give you the count of addresses grouped by studentID.

    Now, you may still have other factors to address. Possibly you want to limit the entire evaluation to only those records where the floor is NULL. If so, you could define a virtual table with that condition and do a much tighter evaluation, but it depends on whether that is appropriate for the condition you are evaluating or not.

    Harald
  • SystemAdmin
    SystemAdmin
    294 Posts

    Re: Using Group by in Information Analyzer Data Rules

    ‏2013-01-10T14:29:57Z  
    • smithha
    • ‏2013-01-10T13:38:44Z
    When going from one style of rule implementation (SQL) to another (Information Analyzer), it's important to step back from the code and ask: what is the rule I'm evaluating?

    Are you trying to evaluate how many repeated addresses exist?
    Or how many repeated addresses exist where the floor is null?
    Or how many addresses exist where the floor is null?
    Or something else?
    And you generally want to express the rule as what is expected (i.e. are repeated addresses good or bad?)
    The answers will define what you need to code in the rule definition.

    Let's say that the expectation is that all students have an address which includes a floor.
    The rule definition could simply be: studentID EXISTS AND address EXISTS AND floor EXISTS
    or it could be: studentID EXISTS AND address EXISTS AND floor EXISTS AND count(address, studentID) = 1

    In generating the rule from either of these definitions, you would bind the variables address (to address_line), studentID (to student_id) and floor (to floor).
    For the output, you would select the Does Not Meet Rule criteria (you want to see exceptions).
    If you used the first of the rule definitions, for the output detail you could include a Custom Expression: count(address, studentID)
    If you used the second of the rule definitions, the count will be one of the variables and again could include it in the output detail.
    This expression would give you the count of addresses grouped by studentID.

    Now, you may still have other factors to address. Possibly you want to limit the entire evaluation to only those records where the floor is NULL. If so, you could define a virtual table with that condition and do a much tighter evaluation, but it depends on whether that is appropriate for the condition you are evaluating or not.

    Harald
    Thanks Harald for your replay. I really appreciate it.

    The rule I am trying to apply is to retrieve the students from the Address table who live in a building but have no apartment number

    -I set the floor value equal to null in the conditions because it means there is no apartment number
    -I used the aggregation function count(address_line) to determine if the address is building, since if more than one records lives in the same address it means it’s a building.
  • smithha
    smithha
    162 Posts

    Re: Using Group by in Information Analyzer Data Rules

    ‏2013-01-14T21:01:21Z  
    Thanks Harald for your replay. I really appreciate it.

    The rule I am trying to apply is to retrieve the students from the Address table who live in a building but have no apartment number

    -I set the floor value equal to null in the conditions because it means there is no apartment number
    -I used the aggregation function count(address_line) to determine if the address is building, since if more than one records lives in the same address it means it’s a building.
    One approach to the rule would be the following:

    IF count(studentID, address_line) > 1 THEN floor EXISTS

    This will do an initial test on the count of all students at the same address and only evaluate where the count > 1.
    For those records that pass the test, the expected condition is that the floor value exists.

    That would then allow you to choose for output type: Those records that do not meet the rule.
    NOTE: For consistency of results, the best practice for a rule definition is to establish the condition that is supposed to be true so that records that do not meet the rule are always exceptions.

    For output details, you would want to write out: studentID, address_line, floor (to validate it is empty). You could include the count(studentID, address_line) in the output as well if you want to see how many are grouped together.

    Harald
  • smithha
    smithha
    162 Posts

    Re: Using Group by in Information Analyzer Data Rules

    ‏2013-01-14T21:02:16Z  
    • smithha
    • ‏2013-01-14T21:01:21Z
    One approach to the rule would be the following:

    IF count(studentID, address_line) > 1 THEN floor EXISTS

    This will do an initial test on the count of all students at the same address and only evaluate where the count > 1.
    For those records that pass the test, the expected condition is that the floor value exists.

    That would then allow you to choose for output type: Those records that do not meet the rule.
    NOTE: For consistency of results, the best practice for a rule definition is to establish the condition that is supposed to be true so that records that do not meet the rule are always exceptions.

    For output details, you would want to write out: studentID, address_line, floor (to validate it is empty). You could include the count(studentID, address_line) in the output as well if you want to see how many are grouped together.

    Harald
    in the last post, the note is supposed to read:

    NOTE: For consistency of results, the best practice for a rule definition is to establish the condition that is supposed to be true so that records that do not meet the rule are always exceptions.
  • SystemAdmin
    SystemAdmin
    294 Posts

    Re: Using Group by in Information Analyzer Data Rules

    ‏2013-01-14T21:21:50Z  
    • smithha
    • ‏2013-01-14T21:02:16Z
    in the last post, the note is supposed to read:

    NOTE: For consistency of results, the best practice for a rule definition is to establish the condition that is supposed to be true so that records that do not meet the rule are always exceptions.
    Thanks Harald for your answer.