I am going to set up a filter operator to eliminate unwanted or unnecessary records prior to processing. Essentially, I only want to look through the data for certain keywords and exclude everything else. The issue I have is that these keywords need to be dynamic, as end users will be controlling them through a web interface. A few dozen users will be able to update these keywords. My thought is to store the keywords in a database (Postgres or Oracle), which would mean querying them for every tuple. I anticipate a few keywords changing every few seconds. Does anyone have suggestions on best practices for performance and architecture? Essentially, data will be flowing through, and at any given point in time there will be active users who have an interest in the data via keywords. As the data flows, any record in which no keywords are found can be discarded. I know there is overhead with databases, so I'm hoping there might be some type of cache that can be updated dynamically.
I would appreciate any thoughts or advice.
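To make the cache idea above concrete, here is a minimal sketch in Python (not SPL) of an in-memory keyword set that can be updated while records are being filtered, so that no database query is needed per tuple. The names `KeywordFilter` and `filter_records` are hypothetical helpers invented for illustration, not part of Streams or any toolkit, and simple case-insensitive substring matching is an assumption about what "keyword found" means.

```python
import threading


class KeywordFilter:
    """In-memory keyword set (hypothetical helper, not a Streams API)."""

    def __init__(self, keywords=()):
        self._lock = threading.Lock()
        # An immutable frozenset lets readers take a snapshot without locking.
        self._keywords = frozenset(k.lower() for k in keywords if k)

    def add(self, keyword):
        """Register a keyword, e.g. when a user adds one in the web app."""
        if not keyword:
            return  # an empty keyword would match every record
        with self._lock:
            self._keywords = self._keywords | {keyword.lower()}

    def remove(self, keyword):
        """Drop a keyword, e.g. when a user deletes it in the web app."""
        with self._lock:
            self._keywords = self._keywords - {keyword.lower()}

    def matches(self, text):
        """Return True if any current keyword occurs in the text."""
        keywords = self._keywords  # snapshot of the current set
        lowered = text.lower()
        return any(k in lowered for k in keywords)


def filter_records(records, keyword_filter):
    """Keep only the records that contain at least one active keyword."""
    return [r for r in records if keyword_filter.matches(r)]
```

In this sketch the database would only be the system of record for the web app; the filter itself reads from memory, and keyword changes are pushed to it through `add` and `remove` as they happen.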
Best practice for eliminating data based on user provided input thru webapp
Re: Best practice for eliminating data based on user provided input thru webapp (2012-08-28T15:17:02Z)
- travis2k4 270004YBAQ
Thanks a lot, this definitely helps. I've been able to try most of what you suggested and it works great. I just had a question about the InetSource. Is this the operator you would use to connect to a REST service? Are you connecting just once with the InetSource upon job startup to get the keywords, or do you periodically go back and check? It doesn't look like I need to do this if I'm writing the keywords to the TCPSource.
Another thing I'm doing is inserting records into a database. My database is PostgreSQL, which doesn't look like it is supported, at least not through the already developed operators, so I have to go through a REST service to update the data. I'm wondering whether the InetSource operator is what I would use for a REST connection from Streams, or whether there is another one that should be used (and whether you know of any examples of doing this).
Thanks again for your help. This is definitely helpful.
Re: Best practice for eliminating data based on user provided input thru webapp (2012-08-28T23:33:21Z)
- oakstream 270005M93S
- It queries the provided URLs every so often. In our case we only want it to load the URL once. To get round this I copied the operator's code from the toolkits folder inside the Streams install and created a new version of the operator. I altered this new version so that if the "fetchIntervalSeconds" parameter is negative, the operator only requests the URL once.
- InetSource outputs a stream with a single attribute, an rstring holding the contents of the URL that was requested, so it doesn't do any parsing of the returned data. The REST API I use returns JSON. Luckily there are some JSON parsing operators available in the Streams Exchange (https://www.ibm.com/developerworks/mydeveloperworks/files/app?lang=en#/person/060002871K/file/d8bd5118-4587-4b3e-b43e-4b1717f8691f), so I pass the output from my altered InetSource operator into one of these operators to decode the JSON into a Streams tuple.
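The two points above (fetch once when the interval is negative, then decode the raw body) can be sketched outside Streams as follows. This is an illustration in Python, not the modified operator's code: `poll`, `fetch_url`, `decode_keywords`, and the `max_fetches` argument are hypothetical names, and the assumption that the REST API returns a JSON array of strings is mine, since the actual response format is not given here.

```python
import json
import time
import urllib.request


def fetch_url(url):
    """Return the body of the URL as one string, with no parsing."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def poll(url, fetch_interval_seconds, fetch=fetch_url, sleep=time.sleep,
         max_fetches=None):
    """Yield the raw body of the URL on each fetch.

    A negative interval means the URL is requested exactly once.
    max_fetches is only a convenience for bounding the loop in this sketch.
    """
    count = 0
    while max_fetches is None or count < max_fetches:
        yield fetch(url)
        count += 1
        if fetch_interval_seconds < 0:
            return  # one-shot mode: stop after the first request
        sleep(fetch_interval_seconds)


def decode_keywords(body):
    """Decode the raw body; assumes a JSON array of strings."""
    return [str(k) for k in json.loads(body)]
```

Keeping the fetch and the decode as separate steps mirrors the setup described above, where the source only emits the raw string and a downstream operator turns the JSON into a tuple.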
Looking at the Streams Exchange, some developers have contributed other operators to help with making HTTP requests, and there appears to be a toolkit for connecting to REST APIs too. So you might be in a better position than I was. Take a look at the inet_ssb_v1.0.6.tar.gz and HTTPUtils V1.0.tgz toolkits available at