Friday, August 19, 2011

User Defined Aggregates


Until recently I had no use for User Defined Aggregates, that was until I was ridiculed by my MySQL counterparts over the fact that Informix does not have a group_concat aggregate function.

This will not stand, ya know, this aggression will not stand, man.

A quick search of the internets produced this Stack Overflow post which shows how to create a User Defined Aggregate in Informix that will duplicate the MySQL group_concat aggregate.


Oh, if you're wondering what group_concat does, it concatenates the string fields of a table like so:
id                 value
1                  ABC
2                  DEF
1                  GHI
1                  JKL
2                  MNO
3                  PQR

select
   id,
   group_concat(value)
from
   tab
group by
   id;

1     ABC, GHI, JKL
2     DEF, MNO
3     PQR
Creating a User Defined Aggreate involves creating 4 support functions in SPL, C or Java.
  • Initialize - Initializes the aggregate
  • Iteration - Merges results of previous iterations of the aggregate with the current value
  • Combine - Combines results of 2 iterations when the aggregate is run in parallel
  • Final - Returns the result of the aggregation
Once the support functions are created you define an aggregate with the CREATE AGGREGATE statement.

Here is an example that implements a User Defined Aggregate that returns the second smallest value in a group of values using SPL.
create function second_min_init (dummy float)
returning row(min1 float, min2 float);
   return row(NULL, NULL);
end function;

-- take 3 values, row.min1, row.min2 and value and return row.min1 and row.min2 that contain
-- the 2 smallest of the 3
create function second_min_iter(result row(min1 float, min2 float), value float)
returning row(min1 float, min2 float);
   if result.min1 is null then
      return row(value, NULL);
   end if;

   if result.min2 is null then
      if value <= result.min1 then
         return row(value, result.min1);
      else
         return row(result.min1, value);
      end if;
   end if;

   if value <= result.min1 then
      return row(value, result.min1);
   elif value <= result.min2 then
      return row(result.min1, value);
   else
      return row(result.min1, result.min2);
   end if;
end function;

-- take 2 sets of min1, min2 and return the minimum and second minimum values out of the 4 values
create function second_min_combine (partial1 row(min1 float, min2 float), partial2 row(min1 float, min2 float))
returning row(min1 float, min2 float);
   define min1 float;
   define min2 float;

   let min1 = partial1.min1;
   let min2 = partial1.min2;

   if min1 is null or partial2.min1 <= min1 then
      let min2 = min1;
      let min1 = partial2.min1;

      if min2 is null or partial2.min2 <= min2 then
         let min2 = partial2.min2;
      end if;
   elif min2 is null or partial2.min1 <= min2 then
      let min2 = partial2.min1;
   end if;

   return row(min1, min2);
end function;

-- return the second smallest value
create function second_min_final(final row(min1 float, min2 float))
returning float;
   return final.min2;
end function;

create aggregate second_min with (
   init = second_min_init,
   iter = second_min_iter,
   combine = second_min_combine,
   final = second_min_final
);

create temp table second_min_test (
   field1        integer
) with no log;

insert into second_min_test values (1);
insert into second_min_test values (2);
insert into second_min_test values (3);
insert into second_min_test values (4);

select
   second_min(field1)
from
   second_min_test;

    second_min

2.000000000000

insert into second_min_test values (1);

select
   second_min(field1)
from
   second_min_test;

    second_min

1.000000000000   
Notice how adding a second row with field1 of 1 causes the second_min aggregate to return 1. This is because the smallest value is 1 and the second smallest value is also 1. If you would like the aggregate to return the second smallest distinct value you can change less than or equal comparisons in the support functions to less than.

No comments:

Post a Comment