Assuming I have 10 lists in a window, how can I match tuples across all 10 lists based on the values of two columns? The lists can differ in size.
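If "check for tuples" means finding tuples that share the same values in the two columns, one way to sketch it is hash-based grouping. Key each tuple by its pair of column values, then keep the keys that appear in enough distinct lists. This is plain Python, not SPL or a Streams API. The dict-shaped tuples, the column names, and the helper `match_on_two_columns` are all hypothetical.

```python
from collections import defaultdict


def match_on_two_columns(lists, col_a, col_b, min_lists=None):
    """Group tuples from several lists by the value pair (tup[col_a], tup[col_b]).

    Hypothetical helper: tuples are assumed to be dicts. A key is kept only if it
    occurs in at least `min_lists` distinct lists (default: every list).
    The lists may have different lengths; each one is scanned exactly once.
    """
    if min_lists is None:
        min_lists = len(lists)
    groups = defaultdict(list)
    for idx, lst in enumerate(lists):
        for tup in lst:
            # The composite key plays the role of a two-column join key.
            groups[(tup[col_a], tup[col_b])].append((idx, tup))
    return {
        key: entries
        for key, entries in groups.items()
        if len({idx for idx, _ in entries}) >= min_lists
    }
```

Because every tuple is visited once, the cost grows roughly linearly with the total number of tuples, however unevenly they are spread across the lists.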
Basically, I am trying to find the average distance between each pair of entities (out of many) over a sliding window of t time units. My plan is:
1. Collect all the tuples for each time instant.
2. Self-join those tuples and compute the distance for each pair. This needs extra filtering to drop:
- self-pairs such as <A,A>, which are not needed;
- mirrored duplicates: of <A,B> and <B,A>, only one should be kept.
3. Keep the resulting per-instant lists in a sliding window of t time units.
4. Compute the average distance of each pair over all the lists in the window, and keep only the pairs whose average distance is below a threshold.
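The four steps can be sketched as follows, again in plain Python rather than SPL. The sketch assumes each tuple looks like `(time, entity_id, x, y)`, that distance is Euclidean, and that instants arrive in non-decreasing time order; the names `pair_distances` and `SlidingPairAverager` are hypothetical. Running `itertools.combinations` over the sorted entity IDs covers both filters from step 2: it never pairs an entity with itself, and it yields each unordered pair exactly once, as `(A, B)` with `A < B`.

```python
import math
from collections import defaultdict, deque
from itertools import combinations


def pair_distances(tuples_at_t):
    """Steps 1-2: self-join the tuples of one time instant.

    Each tuple is assumed to be (time, entity_id, x, y). If an entity appears
    more than once in the same instant, its last tuple wins.
    """
    latest = {}
    for _, eid, x, y in tuples_at_t:
        latest[eid] = (x, y)
    distances = {}
    # combinations() skips <A,A> and yields only (A,B) with A < B, never (B,A).
    for a, b in combinations(sorted(latest), 2):
        (xa, ya), (xb, yb) = latest[a], latest[b]
        distances[(a, b)] = math.hypot(xa - xb, ya - yb)
    return distances


class SlidingPairAverager:
    """Steps 3-4: keep per-instant distance maps for the last t time units."""

    def __init__(self, t):
        self.t = t
        self.window = deque()  # entries: (time, {pair: distance}), oldest first

    def add(self, time, distances):
        # Assumes add() is called with non-decreasing time values.
        self.window.append((time, distances))
        # Keep only instants in the half-open interval (time - t, time].
        while self.window and self.window[0][0] <= time - self.t:
            self.window.popleft()

    def close_pairs(self, threshold):
        """Average each pair over the instants where it appears; keep those below threshold."""
        sums = defaultdict(float)
        counts = defaultdict(int)
        for _, distances in self.window:
            for pair, d in distances.items():
                sums[pair] += d
                counts[pair] += 1
        averages = {pair: sums[pair] / counts[pair] for pair in sums}
        return {pair: avg for pair, avg in averages.items() if avg < threshold}
```

Each pair is averaged only over the instants in which both entities were present, so per-instant lists of unequal size need no special handling.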
I am running IBM InfoSphere Streams v2. Any pointers would be helpful.