Hi DataStage Gurus,
I am new to DataStage. I am using IIS and DataStage 9.1.2. I have a job reading data from Sybase to Netezza. I tried two connection components Sybase Enterprise and Sybase OC
The job looks like this
Sybase Enterprise -> Transformer --> Netezza
Sybase OC -> Transformer --> Netezza
I would like to multi-thread the job to improve the reading efficiency from Sybase. I have four nodes. However, when I tried changing the "execution mode" in the two Syabase components from "sequential" to "parallel", I got 4 times number of rows in Netezza. So I think that is not the right way to multi-thread the data query from Sybase table.
My question is:
(1) Is there any way to customerize the SQL query, for example, passing a split number "N=4" so that the Sybase source table is partitioned in 4 parts and each node queries 1/4 of the table? Is it doable in the configuration file setting or else where? My ultimate goal is to get a parameterized datastage job that allows to speficiy the row range so I might be able to call it from the command line like this:
dsjob -run -param rowrange=0-1000 myproject myjob
dsjob -run -param rowrange=1000-2000 myproject myjob
which respectively pass the acutal SQL queries in datastage component like
select * from mytable where rowid<='1000' AND rowid>'0'
select * from mytable where rowid<='2000' AND rowid>'1000'
I wonder is this doable in DataStage?
(2) In the infosphere documentation (http://pic.dhe.ibm.com/infocenter/iisinfsv/v9r1/index.jsp?topic=%2Fcom.ibm.swg.im.iis.conn.sybase.doc%2Ftopics%2Fc_csybref_Using_Generated_Queries.html) it says Sybase OC is able to run in parallel canvas. What does that parallel canvas mean? How to see the difference of parallel canvas with non-parallel canvas (like Sybase Enterprise component)?
Thanks for any suggestion!