All, I will be using DataStage 8.5 for a project where the target database is Netezza Paralel Server (NPS). I'm currently exploring how to best use the parallelism potential that comes with NPS, i.e. how to push some of the processing to the database instead of doing it in DataStage (outside the database).
From what I've seen so far, one option is to use custom SQL in the Netezza Connector. For instance, instead of using a Join Stage, I could use a Netezza Connector Stage and do the join as a SQL statement. This seems to be the approach taken by the Balanced Optimizer as well as described here: http://www.ibm.com/developerworks/data/library/techarticle/dm-1201netezzaconnector/index.html.
So I was wondering:
1. Is this the only option available for pushing the processing to the database?
2. What do I trade off if I go with this approach (either by manually specifying custom SQL or by letting the balanced optimizer do its thing) in terms of metadata availability and tracing ability? I'm particulary concerned about being able to use metadata, i.e. being able to do lineage/impact analysis etc.
I would appreciate any thoughts, recommendations and best practices that you would like to share.