
Pinned topic Dynamic Regex Generation for Search

2012-08-07T17:57:00Z
Hi All -

I have a general search design question I could use help with.

Using stock symbol search as an example:

1. Data is pre-crawled into an Omnifind index and is available for searching (e.g. AAPL, MSFT, GOOG ... thousands of stock symbols).

2. Custom code runs and asks the user to provide 3 stock symbols (ANY 3 out of the thousands possible).

3. In a loop over those 3 symbols, we want to run a regex-based Omnifind search for each one. For example, if the user entered NYX, F, MSFT, then we form a simple word-boundary regex for each:

NYX --> \bN\bY\bX
F --> \bF
MSFT --> \bM\bS\bF\bT

4. The key design point here is to NOT use PEAR files uploaded ahead of time. This is for 2 reasons:

- The master list of symbols is unknown and changes often, so it is practically impossible to write a regex/PEAR ahead of time for an unbounded data universe.
- The regex above is very simple (and not very useful), but the real solution will have multiple levels of regex complexity. So we want to write regex-generator Java functions that create the expression for each of the 3 symbols in the loop and fire the search against the index.
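To make the idea in step 4 concrete, here is a minimal sketch of such a regex-generator function in plain Java. The class and method names (`SymbolRegexGenerator`, `wordBoundaryRegex`, `regexesFor`, `matches`) are hypothetical and are not part of any Omnifind or UIMA API; the sketch only builds the pattern strings and checks them locally with `java.util.regex`. It assumes each symbol starts and ends with a word character, and it uses the whole-word form `\bNYX\b` rather than the per-letter form from step 3, because `\b` cannot match between two adjacent letters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not an Omnifind or UIMA class.
class SymbolRegexGenerator {

    // Builds a whole-word pattern for one symbol, e.g. NYX -> \b\QNYX\E\b.
    // Pattern.quote escapes regex metacharacters such as the dot in "BRK.B".
    static String wordBoundaryRegex(String symbol) {
        if (symbol == null || symbol.trim().isEmpty()) {
            throw new IllegalArgumentException("symbol must not be empty");
        }
        return "\\b" + Pattern.quote(symbol.trim()) + "\\b";
    }

    // Generates one pattern per user-supplied symbol (the "loop of 3").
    static List<String> regexesFor(List<String> symbols) {
        List<String> regexes = new ArrayList<>();
        for (String symbol : symbols) {
            regexes.add(wordBoundaryRegex(symbol));
        }
        return regexes;
    }

    // Local sanity check: does the generated pattern occur in the text?
    static boolean matches(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // In the real flow each string would be handed to the search call;
        // here it is only printed.
        for (String regex : regexesFor(Arrays.asList("NYX", "F", "MSFT"))) {
            System.out.println(regex);
        }
    }
}
```

Whether the generated strings can be submitted to the index at query time, without a pre-deployed PEAR, is exactly the open question of this post; the sketch covers only the generation side.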

I would appreciate thoughts on feasibility. I would expect UIMA/Omnifind to handle dynamically generated regex-based searches without having to pre-define them in PEAR files. I haven't seen examples, but I'm hoping we aren't the first to attempt this. Thanks in advance!