Node.js is hot, so hot that I got burnt.
I work in an organisation that has to report to its customers regularly on how it is doing against their SLAs. The organisation is also famously strict about security: everything must comply with certain policies, and people have to pass a security clearance check before they can access any data.
When I came on board I realized that there was no official tool that let network operations retrieve historical alarms from the database (MySQL). At first sight this seemed to be an easy job: just deploy Tivoli Common Reporting.
However, with the security checks in place, a seemingly easy job turned out to be a very complicated one. Let’s say a NOC user wants the last month’s alarms for any device whose FQDN contains “.au.my-company.com”. I have to make sure the web application goes through each record of the query result and filters out the records the user is not allowed to see. TCR/Cognos had been hard enough for me already, so I wouldn’t even dare to try it for this scenario.
I decided to write my own web application.
I know Perl, and it seemed to be a natural choice. How about Apache with a Perl CGI? I thought about that, but given the unpredictable volume of returned data, I was not sure it was a good idea to hold millions of records in memory and go through each of them. Maybe POE? But Perl doesn’t attract me these days. I wanted to try something else.
I was thinking: what if I could go through each alarm on the fly, in an event-driven fashion? Then I wouldn’t have to hold everything in memory! I had heard that Node.js is event driven and that you can use one language on both server and client. There is also a Node MySQL driver that emits an event for each record!
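To make the idea concrete, here is a minimal sketch of per-row filtering. It is not the code from my application. The `fakeQuery` emitter stands in for the database so the snippet runs on its own; as far as I know, the query object of the `mysql` driver emits `result` for each row and `end` when it is done, which is the shape mimicked here. The `isAllowed` check, the `allowedSuffixes` field and all function names are hypothetical.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical permission check: a user may only see alarms for devices
// whose FQDN ends with one of the suffixes they are cleared for.
function isAllowed(user, alarm) {
  return user.allowedSuffixes.some((suffix) => alarm.fqdn.endsWith(suffix));
}

// Attach per-row handlers to a query-like emitter. Only two counters and
// the current row are held here, never the full result set.
function streamVisibleAlarms(query, user, onAlarm, onDone) {
  let seen = 0;
  let sent = 0;
  query.on('result', (alarm) => {
    seen += 1;
    if (isAllowed(user, alarm)) {
      sent += 1;
      onAlarm(alarm);
    }
  });
  query.on('end', () => onDone({ seen, sent }));
}

// Stand-in for a database query (an assumption for this sketch):
// calling start() emits one 'result' per row, then 'end'.
function fakeQuery(rows) {
  const query = new EventEmitter();
  query.start = () => {
    for (const row of rows) query.emit('result', row);
    query.emit('end');
  };
  return query;
}
```

With a real driver, `onAlarm` would write the row to the HTTP response instead of collecting it.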
Then came the actual programming. For a beginner it’s mostly a matter of putting snippets found online into one page and making sure they work together. After two weeks of struggling, my first Node application was born!
The initial testing seemed promising. Everything worked as it should until one day a user came to my desk and told me he had managed to crash the application repeatedly.
The culprit was the huge amount of returned data, which generated far too many events for Node to keep up with. Things got even worse when another query came in before the first one had finished.
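One common way to stop events from piling up is backpressure: the producer is only asked for more rows once the consumer has dealt with the previous ones. The sketch below shows the idea with Node’s built-in streams and is not what my application did. The generator `fakeAlarms` and the artificially slow client are assumptions made so the example runs on its own. I believe the `mysql` driver offers `connection.pause()` and `connection.resume()` for the same purpose, but treat that as something to check in its documentation. Note that this limits how much is buffered; it does not make CPU-bound work any cheaper.

```javascript
const { Readable, Transform, Writable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// Generator standing in for a large result set (hypothetical data).
// Rows are produced lazily, only when the stream asks for more.
function* fakeAlarms(count) {
  for (let i = 0; i < count; i += 1) {
    const fqdn = i % 2 === 0 ? 'a.au.my-company.com' : 'b.nz.my-company.com';
    yield { id: i, fqdn };
  }
}

// Push rows through a filter to a slow consumer. pipeline() wires up
// backpressure, so at most a few rows sit in each buffer at any time.
async function sendFiltered(rows, isVisible, deliver) {
  let delivered = 0;
  const filter = new Transform({
    objectMode: true,
    highWaterMark: 16,
    transform(alarm, _encoding, callback) {
      // Calling back without data drops the row.
      if (isVisible(alarm)) callback(null, alarm);
      else callback();
    },
  });
  const slowClient = new Writable({
    objectMode: true,
    highWaterMark: 16,
    write(alarm, _encoding, callback) {
      deliver(alarm);
      delivered += 1;
      setImmediate(callback); // pretend the client is slow to accept data
    },
  });
  await pipeline(Readable.from(rows), filter, slowClient);
  return delivered;
}
```

Because each query gets its own pipeline, a second request arriving mid-query adds only its own small buffers.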
There are some workarounds. I put in the cluster mechanism, which brings up another process when one crashes. However, since Node’s event processing is not preemptive, you can’t jump the queue and stop something that is causing trouble. It seems that when a Node process goes to 100% CPU usage, you just have to wait for it to finish or crash.
I still love Node, and it still works for me in most cases. I have put in some sanity checks on the input, so crashes are rare now. However, I have learned from this exercise, and the lesson is: avoid heavy event-driven stuff in Node.
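For reference, the restart-on-crash workaround looks roughly like the sketch below, built on Node’s `cluster` module. It is a simplified illustration, not my production code: the worker count, the port, the placeholder response and the `ALARM_CLUSTER_DEMO` environment switch are all made up for the example, and `cluster.isPrimary` assumes a reasonably recent Node version (older releases call it `isMaster`).

```javascript
const cluster = require('node:cluster');
const http = require('node:http');

// Fork `count` workers and replace any worker that dies.
// `c` defaults to the real cluster module; it is a parameter only so the
// respawn logic can be exercised without forking real processes.
function supervise(count, c = cluster) {
  for (let i = 0; i < count; i += 1) c.fork();
  c.on('exit', (worker, code, signal) => {
    console.error(`worker ${worker.process.pid} died (${signal || code}), restarting`);
    c.fork();
  });
}

// Placeholder worker: a real one would run the alarm query here.
function startWorker(port) {
  http
    .createServer((req, res) => res.end('alarm report placeholder\n'))
    .listen(port);
}

// Opt-in entry point (hypothetical switch), so loading this file does not
// fork anything by accident.
if (process.env.ALARM_CLUSTER_DEMO === '1') {
  if (cluster.isPrimary) supervise(2);
  else startWorker(8080);
}
```

This only shortens the outage after a crash. A worker that is stuck at 100% CPU is still stuck until it finishes or dies.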
Tags: 
netcool
node.js