UDA API methods

User-defined aggregates (UDAs) are used in conjunction with a GROUP BY clause or an OVER clause. Implementing a UDA requires a deeper understanding of how the Netezza database performs aggregations internally.

For a standard UDA used with a GROUP BY clause, the Netezza database processes the data in the following sequence of steps:
  1. For each row of data being processed, check whether a row belonging to the same group has already been found. If it has, pass the row to the accumulate() method along with the current values of the state variables for that group. If the row belongs to a new group, first call the initState() method, and then call the accumulate() method with the newly initialized state values and the row of data. The accumulate() method updates the state variables based on the row of data and returns the updated state. No data is moved across the network until Step 2.
  2. After all rows of data have been processed on each data slice, redistribute the final values of the state variables so that all of the states for a given group are moved to the same data slice. For example, if the GROUP BY clause is on customer_id, the states for customer_id=1 from every data slice are moved to a single data slice.
  3. Merge the state variables by calling the merge() method. Each call to merge() receives two sets of state variables, merges them, and returns the result. The merge() method is called repeatedly until only one state remains for each group defined by the GROUP BY clause.
  4. For each remaining state, call the finalResult() method, passing in the final values of that state's variables. The finalResult() method calculates a result from the state variables and returns it to the Netezza database, one result per group.
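To make the four steps concrete, the following Python sketch simulates them for an AVG-style aggregate. It is a model of the processing flow only, not Netezza's actual UDX API: the AvgUda and AvgState types, the run_group_by driver, and the hash-based rule for choosing the target data slice are all hypothetical, and the methods init_state, accumulate, merge, and final_result stand in for initState(), accumulate(), merge(), and finalResult().

```python
from collections import defaultdict
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class AvgState:
    """Hypothetical state variables for an average: running sum and row count."""
    total: float = 0.0
    count: int = 0


class AvgUda:
    """Hypothetical UDA computing AVG(value). Method names mirror
    initState(), accumulate(), merge() and finalResult()."""

    def init_state(self) -> AvgState:
        # Called once per group, the first time a data slice sees that group.
        return AvgState()

    def accumulate(self, state: AvgState, value: float) -> AvgState:
        # Fold one input row into the group's current state.
        return AvgState(state.total + value, state.count + 1)

    def merge(self, a: AvgState, b: AvgState) -> AvgState:
        # Combine two partial states that belong to the same group.
        return AvgState(a.total + b.total, a.count + b.count)

    def final_result(self, state: AvgState) -> float:
        # Turn the single remaining state for a group into the output value.
        return state.total / state.count


def run_group_by(uda, slices, num_slices):
    """Simulate the four steps. `slices` is a list of data slices,
    each a list of (group_key, value) rows."""
    # Step 1: accumulate locally on each slice; nothing crosses the "network".
    local_states = []
    for rows in slices:
        states = {}
        for key, value in rows:
            if key not in states:
                states[key] = uda.init_state()
            states[key] = uda.accumulate(states[key], value)
        local_states.append(states)

    # Step 2: redistribute so every partial state for a group lands on one slice.
    # Hashing the group key is an assumed placement rule for illustration.
    redistributed = [defaultdict(list) for _ in range(num_slices)]
    for states in local_states:
        for key, state in states.items():
            redistributed[hash(key) % num_slices][key].append(state)

    results = {}
    for target in redistributed:
        for key, partials in target.items():
            # Step 3: merge two states at a time until one state per group remains.
            merged = reduce(uda.merge, partials)
            # Step 4: produce one final result per group.
            results[key] = uda.final_result(merged)
    return results


if __name__ == "__main__":
    slices = [
        [("c1", 10.0), ("c2", 4.0), ("c1", 20.0)],  # data slice 0
        [("c1", 30.0), ("c2", 8.0)],                # data slice 1
    ]
    results = run_group_by(AvgUda(), slices, num_slices=2)
    print(sorted(results.items()))  # [('c1', 20.0), ('c2', 6.0)]
```

The state carries a sum and a count rather than a running average because two partial averages cannot be merged correctly without knowing how many rows each one covers. Since the steps above do not fix the order in which partial states are combined, the sketch also assumes that merge() returns the same answer regardless of which two states it receives first.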