Hi

Assuming I have 10 lists in a window, how can I match tuples across all 10 lists based on two column values? The lists can be of unequal size.

Basically, I am trying to find the average distance between each pair of entities over a sliding window of t time units. For each time instant, I am thinking of:

1. Collecting all the tuples for that time instant.

2. Self-joining those tuples and computing the distance for each pair. The join output needs further filtering to drop:

- an entity paired with itself, <A,A>, which is not required.

- both orderings of the same pair, <A,B> and <B,A>, of which only one is required.

3. Collecting these per-instant lists in a sliding window of t time units.

4. Finding the average distance of each pair across all the lists in the window, and filtering for the pairs whose average distance is less than a threshold.
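
To make the intent concrete, here is a rough sketch of the logic in plain Python rather than SPL. It is only an illustration under my own assumptions: each tuple is (entity_id, x, y) with 2-D coordinates, distance is Euclidean, the window counts time instants, a pair is averaged only over the instants in which both entities appear, and "filter" means keeping pairs below the threshold. The names pair_distances and close_pairs are made up for this example.

```python
from collections import defaultdict, deque
from itertools import combinations
from math import hypot


def pair_distances(snapshot):
    """Steps 1-2: distances for one time instant.

    snapshot is a list of (entity_id, x, y) tuples (assumed layout).
    Sorting by id and taking combinations yields each unordered pair once,
    so <A,A> and the mirrored <B,A> never appear.
    """
    distances = {}
    for (a, ax, ay), (b, bx, by) in combinations(sorted(snapshot), 2):
        if a == b:
            continue  # same entity reported twice in one instant
        distances[(a, b)] = hypot(ax - bx, ay - by)
    return distances


def close_pairs(snapshots, t, threshold):
    """Steps 3-4: average each pair over the last t instants.

    Yields, per instant, the pairs whose average distance is below
    threshold. Pairs missing from some instants are averaged over the
    instants where they do appear (an assumption for unequal lists).
    """
    window = deque(maxlen=t)  # oldest instant is evicted automatically
    for snapshot in snapshots:
        window.append(pair_distances(snapshot))
        sums = defaultdict(float)
        counts = defaultdict(int)
        for distances in window:
            for pair, d in distances.items():
                sums[pair] += d
                counts[pair] += 1
        averages = {pair: sums[pair] / counts[pair] for pair in sums}
        yield {p: avg for p, avg in averages.items() if avg < threshold}


if __name__ == "__main__":
    stream = [
        [("A", 0, 0), ("B", 3, 4)],
        [("B", 0, 0), ("A", 0, 3), ("C", 100, 100)],
    ]
    for result in close_pairs(stream, t=2, threshold=10):
        print(result)
```

In Streams terms, I imagine pair_distances corresponds to a self join with an id-ordering condition, and close_pairs to a windowed aggregate grouped by pair, but I am not sure how best to express that.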

I am running IBM InfoSphere Streams v2. Any pointers would be helpful.