No replies
1 Post

Pinned topic Multi-threadng a DataStage job containing Sybase source connection

‏2013-10-18T15:54:09Z |

Hi DataStage Gurus,

I am new to DataStage. I am using IIS and DataStage 9.1.2.    I have a job reading data from Sybase  to Netezza.   I tried two connection components    Sybase Enterprise  and Sybase OC

The job looks like this

Sybase Enterprise    ->   Transformer   -->  Netezza


Sybase OC    ->   Transformer   -->  Netezza

I would like to multi-thread the job to improve the reading efficiency from Sybase. I have four nodes.  However, when I tried changing the "execution mode" in the two Syabase components from "sequential" to "parallel", I got 4 times number of rows in Netezza.  So I think that is not the right way to multi-thread the data query from Sybase table.

My question is:

(1) Is there any way to customerize the SQL query, for example, passing a split number "N=4" so that the Sybase source table is partitioned in 4 parts and each node queries 1/4 of the table?  Is it doable in the configuration file setting or else where?  My ultimate goal is to get a parameterized datastage job that allows to speficiy the row range so I might be able to call it from the command line like this:

    dsjob -run  -param rowrange=0-1000 myproject myjob

 dsjob -run  -param rowrange=1000-2000 myproject myjob


which respectively pass the acutal SQL queries in datastage component like

   select * from mytable where rowid<='1000' AND rowid>'0'

   select * from mytable where rowid<='2000' AND rowid>'1000'


I wonder is this doable in DataStage?

(2) In the infosphere documentation ( it says Sybase OC is able to run in parallel canvas. What does that parallel canvas mean? How to see the difference of parallel canvas with non-parallel canvas (like Sybase Enterprise component)?

 Thanks for any suggestion!