IBM Information Server is a revolutionary new software platform that helps organizations derive more value from the complex heterogeneous information spread across their systems. IBM InfoSphere DataStage is a critical component of the IBM Information Server. The parallel framework of IBM InfoSphere DataStage is also the foundation for IBM InfoSphere QualityStage, and IBM InfoSphere Information Analyzer components. IBM InfoSphere DataStage enables organizations to design data flows that extract information from multiple source systems, transform it in ways that make it more valuable, and then deliver it to one or more target databases or applications. An IBM InfoSphere DataStage job consists of individual stages linked together that describe the flow of data from a data source to a data target.
For job designs that use connectors to read or write data from data sources, you can use Balanced Optimization to give you greater control over the job. You design your job as normal, then use Balanced Optimization to redesign the job automatically to your stated preferences. This redesign process can maximize performance by minimizing the amount of input and output performed, and by balancing the processing against source, intermediate, and target environments. You can then examine the new optimized job design, and save it as a new job. Your root job design remains unchanged. The Balanced Optimization takes advantage of the power of the databases without requiring the job designer to become an expert in native SQL.
The Balanced Optimizer optimizes the job based on the optimization rules specified in the optimizer. To understand the optimization technique, look at the job in Figure 1.
Figure 1. Job NZ_unshipped5 to be optimized
When you click the Optimization button for the NZ_unshipped5 job, the Balanced Optimizer scans through the job to see what optimization options are applicable to the current job. The optimization page opens up with the following optimization options for the job, also shown in Figure 1.
- Pushing Processing to database sources: NZ_unshipped5 contains a
source database and number of processing stages like
Join,TransformerandLookup. This is a possible optimization option where all the processing can be done in the source database. - Pushing processing to database targets: NZ_unshipped5 contains a target database, therefore this is a possible optimization option where all the processing can be pushed to a target database.
- Push data reduction processing into database targets: Data reduction processing involves reducing the volume of data before it is written to the target database. Aggregations and transformers that have constraint expressions set, are both examples of data reduction processing. In normal circumstances, you do not want to push this processing to a target, because it means that a greater volume of data is written to the target. However, pushing data reduction processing to the target database can be useful; for example, when the aggregation or filtering is performed on data that is already in the target database. NZ_unshipped5 has a Transformer stage which can potentially be pushed to target database. By default this option is not selected.
- Push all Processing into the(target) database: Where NZ_unshipped5 data source and target are both on same Netezza database, and Netezza is capable of carrying out the processing done in transformer, join, and Lookup stage, by applying this optimization option the job could potentially be optimized into just two stages a dummy row generator stage feeding into the target database stage. The benefit of this optimization is that the data remains within the database, and there is no database I/O, and no actual data flows through the job. Complex SQL is generated to perform all the processing within the target database.
Figure 2. Optimization options page for NZ_unshipped5 job
Balanced Optimizer has a priority attached to each of these optimization options. In general pushing the processing to the target database has higher priority than pushing the processing to the source database. It always tries to push the data reduction to the data source, even if you select the push processing to database targets option. However if you choose the Push data reduction to target database option, then the data reduction stages like Aggregator and Transformer are pushed to the target database.
In order to fully optimize the job, the Balanced Optimizer makes several iterations through the job, identifying the subgroup of stages or pattern within the job on which the optimization options are applied in the order of priority, until no further pattern can be identified within the job which can potentially be optimized. All these iterations are in memory and only the fully optimized job is displayed on the GUI.
Optimization iterations for NZ_Unshipped5
The sequence in which optimization patterns are identified and the optimization options applied to them, and the resultant intermediate in memory representation of the optimized jobs are depicted in figures 3 through 14.
The first sequence of stages to be optimized using PushProcessingToTarget option are depicted in Figure 3.
Figure 3. First pattern to be optimized with option PushProcessingToTarget
Figure 4 shows the details of the Lookup stage Lookup_salesperson.
Figure 4. Input link - output link mapping in the Lookup stage Lookup_salesperson
After the job is optimized using the PushProcessingToTarget option, the processing of the Lookup stage Lookup_salesperson is pushed to the database, and therefore the Lookup_salesperson stage is removed. The in-memory representation of the job after the first pattern is optimized is shown in Figure 5.
Figure 5. In-memory representation of the job after optimizing the first pattern
As shown previously in Figure 5, the SQL in the Netezza_Connector_1 stage includes the processing of the Lookup_salesperson stage, which was removed during optimization. A condensed form of the SQL statement in the Netezza_Connector_1 stage is depicted in Listing 1.
Listing 1. SQL represented by Netezza_Connector_1 shown in Figure 5
insert into Late_customer_contacts (DaysLate, CompanyName, ContactName,
Phone,OrderID, RequiredDate, LastName, FirstName, TitleOfCourtesy)
..
from
..
from
(select ORCHESTRATE.CompanyName as CompanyName, ORCHESTRATE.ContactName
as ContactName, ORCHESTRATE.Phone as Phone, ORCHESTRATE.OrderID as
OrderID,ORCHESTRATE.DaysLate as DaysLate, ORCHESTRATE.RequiredDate as
RequiredDate, ORCHESTRATE.EmployeeID as EmployeeID
from ORCHESTRATE ORCHESTRATE) DSLink12
left outer join
(select DSLink16RemDup1.EmployeeID as EmployeeID,DSLink16RemDup1.LastName
as LastName, DSLink16RemDup1.FirstName as FirstName,
DSLink16RemDup1.TitleOfCourtesy as TitleOfCourtesy
from
..
from
..
(select DSLink16RemDup2.EmployeeID as EmployeeID, DSLink16RemDup2.LastName
as LastName,DSLink16RemDup2.FirstName as FirstName,
DSLink16RemDup2.TitleOfCourtesy as TitleOfCourtesy,
row_number() over (partition by DSLink16RemDup2.EmployeeID order by
DSLink16RemDup2.EmployeeID)
as RemdupPivotColumn3
from
(select employees.EmployeeID as EmployeeID, employees.LastName
as LastName,employees.FirstName as FirstName,
employees.TitleOfCourtesy as TitleOfCourtesy from employees employees)
DSLink16RemDup2) RemdupAliasB3
where RemdupAliasB3.RemdupPivotColumn3 = 1) DSLink16RemDup1) DSLink16
on DSLink12.EmployeeID = DSLink16.EmployeeID) DSLink15
|
Optimization now iterates through the optimized job shown previously in Figure 5 to identify the second pattern that can be optimized. Figure 6 shows the second pattern identified by the optimizer to optimize using the PushProcessingToTarget option.
Figure 6. Second pattern to be optimized with option PushProcessingToTarget
Figure 7 shows the details of the transformations in the Days_late Transformer stage.
Figure 7. Details of the Transformer stage Days_late
As a result of optimizing the second pattern using the PushProcessingToTarget option, the processing of Days_late Transformer is pushed into Netezza_Connector_1, and Days_late stage is removed from the optimized job. The in-memory representation of the job after optimizing the second pattern is shown in Figure 8.
Figure 8. In memory representation of the job after optimizing the second pattern
As shown previously in Figure 8, the SQL in the Netezza_Connector_1 stage includes the transformations of the Days_late stage which was removed during optimization. A condensed form of the SQL statement in the Netezza_Connector_1 stage is depicted in Listing 2.
Listing 2. SQL represented by Netezza_Connector_1 as shown in Figure 8
insert into Late_customer_contacts (DaysLate, CompanyName,...
from
...
from
....
from
(select ORCHESTRATE.CompanyName as CompanyName, ORCHESTRATE.ContactName
as ContactName, ORCHESTRATE.Phone as Phone, ORCHESTRATE.OrderID as
OrderID,CURRENT_DATE - TO_DATE(TO_CHAR(ORCHESTRATE.OrderDate,
'YYYY-MM-DD'),'YYYY-MM-DD') as DaysLate, ORCHESTRATE.RequiredDate as
RequiredDate,ORCHESTRATE.EmployeeID as EmployeeID
from ORCHESTRATE) DSLink12) DSLink12
left outer join
from
....
from
.....
from
......
|
The optimizer loops through the partially optimized job again to find the third pattern that can be optimized. Figure 9 represents the third pattern that would be optimized using the PushProcessingToDBSource option.
Figure 9. Third pattern to be optimized with option PushProcessingToDBSource
Figure 10 shows the details of the transformations in the Null_ship_dates Transformer stage.
Figure 10. Details of the Transformer stage Null_ship_dates
As a result of optimizing the third pattern using the PushProcessingToDBSource option, the processing of Null_ship_dates Transformer is pushed into Orders connector, and Null_ship_dates stage is removed from the optimized job. The in-memory representation of the job after optimizing the third pattern is shown in Figure 11.
Figure 11. In-memory representation of the job after optimizing the third pattern
As shown previously in Figure 11, the SQL in the DSLink10 includes the transformations of the Null_ship_dates stage that was removed during optimization. The SQL statement in the DSLink10 is depicted in Listing 3.
Listing 3. SQL represented by DSLink10 as shown in Figure 11
select DSLink5.OrderID as OrderID, DSLink5.CustomerID as CustomerID,
DSLink5.OrderDate as OrderDate, DSLink5.RequiredDate as RequiredDate,
DSLink5.EmployeeID as EmployeeID from
(select orders.OrderID as OrderID, orders.CustomerID as CustomerID,
orders.ShippedDate as ShippedDate, orders.OrderDate as OrderDate,
orders.RequiredDate as RequiredDate, orders.EmployeeID as EmployeeID
from orders orders) DSLink5
where DSLink5.ShippedDate is null
|
The optimizer again loops through the partially optimized job, as shown previously in Figure 11, to find the fourth pattern that can be optimized. Figure 12 represents the fourth pattern that would be optimized using the PushProcessingToTarget option.
Figure 12. Fourth pattern to be optimized with option PushProcessingToTarget
As a result of optimizing the fourth pattern using the PushProcessingToTarget option, the processing of Join_3 stage is pushed into Netezza_Connector_1, and Join_3 stage is removed from the optimized job. The in-memory representation of the job after optimizing the fourth pattern is shown in Figure 13.
Figure 13. In memory representation of the job after optimizing the fourth pattern
As shown previously in Figure 13, the SQL in the Netezza_Connectr_1 includes the processing of the join stage Join_3, which was removed during optimization. The SQL statement in the Netezza_Connector_1 is depicted in the Listing 4.
Listing 4. SQL represented by Netezza_Connector_1 in Figure 13
insert into Late_customer_contacts (DaysLate, CompanyName, ..
..
from
..
from
...
from
...
from
...
from
(select ORCHESTRATE.CustomerID as CustomerID, ORCHESTRATE.CompanyName
as CompanyName, ORCHESTRATE.ContactName as ContactName, ORCHESTRATE.Phone
as Phone from ORCHESTRATE ORCHESTRATE) DSLink4
inner join
(select DSLink5.OrderID as OrderID, DSLink5.CustomerID as CustomerID,
DSLink5.OrderDate as OrderDate, DSLink5.RequiredDate as RequiredDate,
DSLink5.EmployeeID as EmployeeID
from
(select orders.OrderID as OrderID, orders.CustomerID as CustomerID,
orders.ShippedDate as ShippedDate, orders.OrderDate as OrderDate,
orders.RequiredDate as RequiredDate, orders.EmployeeID as EmployeeID
from orders orders) DSLink5
where DSLink5.ShippedDate is null ) DSLink10
on DSLink4.CustomerID = DSLink10.CustomerID
order by DSLink4.CustomerID asc) Unshipped_Orders) DSLink12) DSLink12
left outer join
...
from
..
from
..
from
|
The optimizer again loops through the partially optimized job, shown previously in Figure 13, to find the fifth pattern that can be optimized. Figure 14 represents the fifth pattern that would be optimized using the PushAllProcessingToDBTarget option.
Figure 14. Fifth pattern to be optimized with option PushAllProcessingToDBTarget
As a result of optimizing the fifth pattern using the PushAllProcessingToDBTarget option, the processing of Customers stage is pushed into Netezza_Connector_1, and Customers stage is removed from the optimized job. The in-memory representation of the job after optimizing the fifth pattern is shown in Figure 15.
Figure 15. Completely optimized job as it appears on Balanced Optimizer page
SQL Statement in the Connector of optimized job
This is the complete insert statement that appears in the Netezza Connector of the optimized job. Figure 16 shows the original un-optimized job again.
Figure 16. Un-optimized job NZ_unshipped5
Listing 5 shows the SQL statement in Netezza Connector of the fully optimized job. The various segments in the SQL statement are de-marked by stage names to which they correspond. As various stages got removed in the process of optimization, these SQL segments were added to the SQL of the optimized job.
Listing 5. SQL represented by Netezza_Connector_1 Fully optimized job as shown in Figure 15
[Begin:Sales_staff/Lookup_salesperson]
insert into Late_customer_contacts (DaysLate, CompanyName,
ContactName, Phone,OrderID, RequiredDate, LastName, FirstName,
TitleOfCourtesy) select CAST(DSLink15.DaysLate AS INTEGER) as
DaysLate,CAST(DSLink15.CompanyName AS VARCHAR(40)) as CompanyName,
CAST (DSLink15.ContactName AS VARCHAR(30)) as ContactName,
CAST(DSLink15.Phone AS VARCHAR(24)) as Phone, CAST(DSLink15.
OrderID AS INTEGER) as OrderID, CAST(DSLink15.RequiredDate
AS TIMESTAMP) as RequiredDate, CAST(DSLink15.LastName AS
VARCHAR(20)) as LastName, CAST(DSLink15.FirstName AS VARCHAR(20))
as FirstName, CAST(DSLink15.TitleOfCourtesy AS VARCHAR(20)) as
TitleOfCourtesy
from
(select DSLink12.DaysLate as DaysLate, DSLink12.CompanyName as
CompanyName,DSLink12.ContactName as ContactName,
DSLink12.Phone as Phone, DSLink12.OrderID as OrderID, DSLink12.
RequiredDate as RequiredDate, DSLink16. LastName as LastName,
DSLink16.FirstName as FirstName, DSLink16.TitleOfCourtesy as
TitleOfCourtesy [End:Sales_staff/Lookup_salesperson]
[Begin:Days_late]from
(select DSLink12.CompanyName as CompanyName, DSLink12.ContactName as
ContactName, DSLink12.Phone as Phone, DSLink12.OrderID as OrderID,
DSLink12.DaysLate as DaysLate, DSLink12.RequiredDate as RequiredDate,
DSLink12.EmployeeID as EmployeeID
from
(select Unshipped_Orders.CompanyName as CompanyName, Unshipped_Orders.
ContactName as ContactName, Unshipped_Orders.Phone as Phone,
Unshipped_Orders.OrderID as OrderID, CURRENT_DATE - TO_DATE(TO_CHAR
(Unshipped_Orders.OrderDate,'YYYY-MM-DD'),'YYYY-MM-DD') as DaysLate,
Unshipped_Orders.RequiredDate as RequiredDate,
Unshipped_Orders.EmployeeID as EmployeeID
[End:Days_late][Begin:Orders/Join_3]
from
(select DSLink4.CompanyName as CompanyName, DSLink4.ContactName as
ContactName, DSLink4.Phone as Phone, DSLink10.OrderID as OrderID,
DSLink10.OrderDate as OrderDate, DSLink10.RequiredDate as
RequiredDate, DSLink10.EmployeeID as EmployeeID
from
(select DSLink4.CustomerID as CustomerID, DSLink4.CompanyName as
CompanyName, DSLink4.ContactName as ContactName, DSLink4.Phone as
Phone [End:Orders/Join_3]
[Begin:Customers]
from
(select customers.CustomerID as CustomerID, customers.CompanyName as
CompanyName, customers.ContactName as ContactName, customers.Phone
as Phone from customers customers)
[End:Customers]
[Begin:Orders/Join_3]
DSLink4) DSLink4
inner join
[End:Orders/Join_3]
[Begin:Null_ship_dates]
(select DSLink5.OrderID as OrderID, DSLink5.CustomerID as CustomerID,
DSLink5.OrderDate as OrderDate, DSLink5.RequiredDate as RequiredDate,
DSLink5.EmployeeID as EmployeeID
from
(select orders.OrderID as OrderID, orders.CustomerID as CustomerID,
orders.ShippedDate as ShippedDate, orders.OrderDate as OrderDate,
orders.RequiredDate as RequiredDate, orders.EmployeeID as EmployeeID
from orders orders) DSLink5
where DSLink5.ShippedDate is null ) DSLink10
[End:Null_ship_dates]
[Begin:Orders/Join_3]
on DSLink4.CustomerID = DSLink10.CustomerID
order by DSLink4.CustomerID asc)
[End:Orders/Join_3]
[Begin:Days_late]
Unshipped_Orders) DSLink12)
[End:Days_late]
[Begin:Sales_staff/Lookup_salesperson]DSLink12
left outer join
(select DSLink16RemDup1.EmployeeID as EmployeeID, DSLink16RemDup1.
LastName as LastName, DSLink16RemDup1.FirstName as FirstName,
DSLink16RemDup1.TitleOfCourtesy as TitleOfCourtesy
from
(select RemdupAliasB9.EmployeeID as EmployeeID,RemdupAliasB9.LastName
as LastName, RemdupAliasB9.FirstName as FirstName,
RemdupAliasB9.TitleOfCourtesy as TitleOfCourtesy
from
(select DSLink16RemDup2.EmployeeID as EmployeeID, DSLink16RemDup2.
LastName as LastName, DSLink16RemDup2.FirstName as FirstName,
DSLink16RemDup2.TitleOfCourtesy as TitleOfCourtesy,
row_number() over (partition by DSLink16RemDup2.EmployeeID order by
DSLink16RemDup2.EmployeeID) as RemdupPivotColumn9
from
(select employees.EmployeeID as EmployeeID, employees.LastName as
LastName, employees.FirstName as FirstName, employees.TitleOfCourtesy
as TitleOfCourtesy
from employees employees) DSLink16RemDup2) RemdupAliasB9
where RemdupAliasB9.RemdupPivotColumn9 = 1) DSLink16RemDup1) DSLink16
on DSLink12.EmployeeID = DSLink16.EmployeeID) DSLink15
[End:Sales_staff/Lookup_salesperson]
|
Special features in Netezza Connector
There are some features which are currently available only in Netezza Connector, such as action column, unique key column for update support, and temporary work tables.
When the Action Column write mode is used, you can designate a column in the target Netezza Connector as the action column.
Table 1. Action Column support
| Action value | SQL operation |
|---|---|
| I | Insert |
| U | Update |
| D | Delete |
When a processing stage or a join stage is pushed into a target Netezza Connector configured with the Action Column write mode, a sequence of DELETE/UPDATE/INSERT statements is generated in the After-SQL field in the target connector. These SQL statements combine the processing logic or the join logic with the original DELETE/UPDATE/INSERT operations implied by the Action Column write mode. Figure 17 shows a job were the Action Column support is specified in the target column connector.
Figure 17. Job with Action Column support in the target Netezza Connector
Figure 18 shows how Action Column support is specified in the target Netezza Connector page. The write mode is set to Action Column, and one of the columns of the connector is designated as the Action Column.
Figure 18. Target Netezza Connector page showing Action Column support
Figure 19 shows the columns page of the target Netezza Connector where the
column which was specified as Action Column is set to be of type
char.
Figure 19. Target Netezza Connector column page showing action column support
Figure 20 shows the Transformer stage Action_Flags of the Action Column job. It shows how the Action_Flag column is mapped to a stage variable.
Figure 20. Transformer page of the Action_Flags Transformer in the action column job
Listing 6 specifies the derivation specified for the Action Column StageVariable in the Action_Flags Transformer.
Listing 6. Derivation for the Action Column StageVariable in the transformer Action_Flags
if DSLink4.EmployeeID < 2 then 'I'
else if DSLink4.EmployeeID < 4 then 'U'
else if DSLink4.EmployeeID < 6 then 'D'
else if DSLink.EmployeeID < 8 then 'R'
else 'M'
|
Figure 21 shows a fully optimized Action Column job. Listing 7 shows a condensed snippet of SQL in the Netezza Connector of the optimized job.
Figure 21. Optimized Action Column job
Listing 7. SQL in the Netezza Connector of the optimized Action Column job
delete from founding_employees
where
exists (select 1
from
.....
inner join
.....
from
....
where (DSLink5.action_flag in ('D', 'R')) and (founding_employees
.EmployeeID = DSLink5.EmployeeID)) ;
update founding_employees
set
from
.....
inner join
.......
where DSLink5.action_flag in ('U', 'M')) DSLink5
where founding_employees.EmployeeID
= DSLink5.EmployeeID;
insert into founding_employees
from
....
inner join
.....
left outer join founding_employees founding_employees_action_insert
..
where (DSLink5.action_flag = 'M') and (founding_employees_action_insert
.EmployeeID is null )
union
....
inner join
....
where DSLink5.action_flag in ('I', 'R'))
|
Unique key column for update support
When the Use Unique Key Column option is set to Yes in the target Netezza Connector, BalOp generates additional SQL expressions in the target SQL update statement. This ensures that only one row from each group of rows with the same value in the unique key column is used in the update statement.
Essentially a clause like the one shown in Listing 8 gets appended to the update SQL of the target connector.
Listing 8. SQL clause for unique key column for update support
SELECT < keys > Max(< unique_key > AS unique_key_max
FROM
GROUP BY < Keys >)Link2
ON < key-expressions > AND Link1.unique_key = Link2.unique_key_max
|
Figure 22 shows how the unique keys for update column support are specified in the Netezza Connector page by setting the Use Unique Key property to Yes, and specifying the unique key column and update column.
Figure 22. Target Netezza Connector page showing unique keys for update column support
Listing 9 shows the snippet of SQL that appears in the target Netezza Connector of an optimized job with unique keys for update column support.
Listing 9. SQL in the target Netezza Connector of an optimized job with unique keys for update column support
update customersusa
set Phone=DSLink2_USA_Customers.Phone
from
(select distinct DSLink2_USA_Customers.CompanyName as CompanyName,
DSLink2_USA_Customers.ContactName as ContactName, DSLink2_USA_Customers.Phone as Phone,
DSLink2_USA_Customers.CustomerID as CustomerID, DSLink2_USA_Customers.Country as Country
from
(select DSLink1_All_Customers.CompanyName as CompanyName,
DSLink1_All_Customers.ContactName as ContactName, DSLink1_All_Customers.Phone as Phone,
DSLink1_All_Customers.CustomerID as CustomerID, 'United States' as Country
from
(select CustomerID as CustomerID, CompanyName as CompanyName, ContactName as ContactName,
Phone as Phone
from customers customers) DSLink1_All_Customers) DSLink2_USA_Customers)
DSLink2_USA_Customers
inner join
(select DSLink2_USA_Customers.CustomerID as CustomerID, max(ContactName) as
ContactNameUniqueKeyMax2
from
(select DSLink1_All_Customers.CompanyName as CompanyName,
DSLink1_All_Customers.ContactName as ContactName, DSLink1_All_Customers.Phone as Phone,
DSLink1_All_Customers.CustomerID as CustomerID, 'United States' as Country
from
(select CustomerID as CustomerID, CompanyName as CompanyName, ContactName as ContactName,
Phone as Phone
from customers customers) DSLink1_All_Customers) DSLink2_USA_Customers
group by DSLink2_USA_Customers.CustomerID) DSLink2_USA_CustomersUnique_1_
on (DSLink2_USA_Customers.CustomerID = DSLink2_USA_CustomersUnique_1_.CustomerID) and
(DSLink2_USA_Customers.ContactName =
DSLink2_USA_CustomersUnique_1_.ContactNameUniqueKeyMax2)
where customersusa.CustomerID = DSLink2_USA_Customers.CustomerID
|
Temporary work table support and staging table management
Netezza Connector uses Temporary Work Tables (TWT) as a staging area for rows received from the input link of a target Netezza Connector. The TWT tables can be managed in Automatic and Existing modes. In the Automatic mode, the TWT tables are automatically created by the Netezza Connector at run time. In the Existing mode, the TWT name is explicitly specified in the TWT Name property in the connector. InfoSphere DataStage Balanced Optimization supports both Automatic and Existing modes in different situations.
Balanced Optimizer has a property called
Staging table name on the Advanced Options
page. For the other connectors which do not support TWT, staging table
works as TWT. In case of Netezza, where both TWT and staging table property
are available, the final temporary work table name is based on following
criteria shown in Table 2.
Table 2. Temporary Work Table name in Netezza
| TWT mode | Optimization option | Final temporary table name |
|---|---|---|
| Existing | Push all processing into the database | TWT Name is not modified unless the BalOp staging table name is set to a different table name. If the BalOp staging table name is set to a different table name, the staging table is used as the new TWT table in the optimized job. |
| Automatic | Push all processing into the database | Every property in the TWT table properties area in the target connector is copied from the original job to the optimized job regardless of whether a BalOp staging table name is set or not. |
| Existing | Push processing to database targets option, and the Push data reduction processing to database targets. | TWT Name is copied from the original job to the optimized job unless the staging table name property is set to a different table name. |
| Automatic | Push processing to database targets option and the Push data reduction processing to database targets. | If the Separate connection for TWT option is not specified in the target connector in the original job. If a staging table name is not specified, then a staging table name is automatically generated and used as the TWT table in the optimized job. If a staging table name is specified, then the staging table is used as the TWT table. If the Separate connection for TWT option is specified in the target connector in the original job, a staging table name must be specified in the staging table name property. |
BalOp version 8.7 supports the optimization of Filter, Copy, and Transform stages with multiple output links, as shown in Figure 23. This feature was not available in prior releases.
Figure 23. Un-optimized and optimized Netezza filter job with multiple output links
This article provided a close look at how optimization patterns are identified iteratively in the DataStage job, and optimized by pushing the work done by various stages into the source or target connector. The article also highlighted features unique to Netezza Connector, such as Action Column support, which results in the generation of a sequence of DELETE/UPDATE/INSERT statements in the After-SQL field in the target connector. This includes Temporary Work Table support providing a staging area for rows received from the input link of a target Netezza Connector, and Unique Key Column for update support which ensures that only one row from each group of rows with the same value in the unique key column is used in the update statement. In summary, Balanced Optimization is a powerful tool to enhance the performance of the DataStage job.
Learn
- Go to the Information Center for IBM InfoSphere Information Server to find documentation for the
InfoSphere Information Server and InfoSphere Foundation Tools product
modules and components.
- Attend a free
developerWorks Live! briefing to get up-to-speed quickly on IBM
products and tools as well as IT industry trends.
- Follow developerWorks on
Twitter.
- Watch developerWorks on-demand demos ranging from product installation
and setup demos for beginners, to advanced functionality for experienced
developers.
Get products and technologies
-
Evaluate IBM
products in the way that suits you best: Download a product trial,
try a product online, use a product in a cloud environment, or spend a few
hours in the SOA Sandbox learning how to implement Service Oriented
Architecture efficiently.
Discuss
- Participate in the discussion forum.
- Get involved in the My developerWorks
community. Connect with other developerWorks users while exploring
the developer-driven blogs, forums, groups, and wikis.
Ritika Maheshwari has worked for IBM for the past 13 years. Initially, she worked in the EJB Query group for WebSphere, and now has been working on Balanced Optimization since its beta phase. She implemented the support for storing metadata in the optimized job. Ritika worked on the prototype to substitute the Prolog engine in Balanced Optimizer with a Java Rules Engine. She was specifically involved in coming up with a Java parser for the Java Rules engine, and has been actively involved in enhancing the Prolog engine functionality as well as the GUI side functionality involving dotnet.




