Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

Performance enhancements in IBM WebSphere Federation Server V9.1, Part 1: Improve performance of federated queries with new WebSphere Federation Server capabilities

Susanne Englert (senglert@us.ibm.com), Senior Software Engineer, IBM
Susanne Englert
Susanne Englert is a member of the Websphere Federation Server performance team and has been working with the product since late 2001. She has worked extensively in the area of database performance and is interested in query optimization, parallel query processing, federated queries, and customer use cases. Susanne is a graduate of the University of Bonn and past chair of the TPC's Decision Support benchmark development subcommittee.
Simon Harris (simond_harris@uk.ibm.com), Senior Software Engineer, IBM
Simon Harris
Simon Harris is a Performance Engineer within the WebSphere Federation Server development team in the Silicon Valley Laboratory. Simon has been working with federated database technology since its inception in IBM in 1995, supporting many customers in both a pre- and post-sales capacity throughout Europe, the Middle-East, and Africa.
Holger Kache (kache@us.ibm.com), Advisory Software Engineer, IBM
Holger photo
Holger Kache is an advisory software engineer at the IBM Silicon Valley Laboratory. He is a member of the Information Integration Solutions organization with expertise in data integration and performance. Holger obtained his masters degree in Computer Science from the Brandenburg Technical University Cottbus, Germany before joining IBM in 2001.

Summary:  IBM WebSphere Federation Server V9.1 introduces several interesting performance enhancements for federated queries. Learn about better nickname statistics, improvements for environments that use ODBC access to remote sources, and the new "asynchronous execution" capability that benefits complex multisource queries accessing large amounts of remote data. We describe each enhancement and provide sample performance measurements that demonstrate the potential benefits that can be achieved in applicable environments. This article is Part 1 of a 2-part series. Part 2 discusses performance characteristics of new functionality in Federation Server V9.1.

View more content in this series

Date:  07 Dec 2006
Level:  Intermediate
Also available in:   Chinese

Activity:  9152 views
Comments:  

Introduction

IBM WebSphere Federation Server V9.1 (WFS) introduces several new features that are designed to make the product more usable and versatile. Several features are specifically intended to improve performance, while others add desirable functionality. While Part 2 explores performance characteristics of new functionality including federated stored procedures and two-phase commit, this article focuses on features intended purely to improve performance. Performance enhancements in WFS V9.1 include:

  • Improvements in gathering and maintenance of nickname statistics
  • Customization of the ODBC wrapper used to access IBM WebSphere Classic Federation Server for z/OS (CF) to increase statement pushdown
  • Optimizing ODBC wrapper performance with the ODBC tuning utility
  • Asynchronous execution of federated statements (available on partitioned servers only)

We describe the nature of the performance improvements implemented in each case and the circumstances in which they are achieved.


Nickname statistics

Background

One of the distinguishing features of Federation Server is that it uses the powerful technology of DB2's cost-based optimizer to find the best access path for federated queries. However, the optimizer depends on accurate knowledge of statistical information about the remote objects being accessed. When nicknames are created, Federation Server collects as much information as possible about the underlying remote objects, including:

  • The number of rows in remote tables
  • The number of distinct values, and the maximum and minimum values of each column of a remote table
  • The number of distinct values of index keys in remote indexes
  • Some physical attributes of remote tables and indexes (e.g. number of blocks, number of index levels)

Not all of these statistics can be collected from every source, but Federation Server retrieves as many of them as possible at CREATE NICKNAME time. For relational sources, nickname statistics are usually automatically retrieved by lookup from the catalog of the remote source. For nonrelational sources, or remote relational objects that do not have statistics (such as views), it is generally not possible to collect statistics at create nickname time; either because the remote catalogs do not exist, or statistics for those objects are not maintained within the catalog.

However, you can use the Control Center's nickname statistics update facility to gather statistics afte the nickname is created by right-clicking on the nickname and using the "Statistics ... Update" menu. When you do this, the Control Center calls a stored procedure known as SYSPROC.NNSTAT. NNSTAT can update statistics for a nickname using remote catalog lookup ("catalog based statistics"), but it also has the capability to issue queries against the nicknames themselves in order to deduce the needed statistics if they are not available from the remote catalog. This approach, called "query-based statistics" can be quite resource-intensive. Once statistics for a nickname have been initially populated by any method, they can be kept up-to-date using the NNSTAT stored procedure or Control Center's nickname statistics update facility.

V9.1 improvements for nickname statistics

WebSphere Federation Server v9.1 offers two kinds of improvements related to nickname statistics gathering and maintenance. Details can be found in Chapter 11 of the "Administration Guide for Federated Systems" manual (Resources).

First, more statistics are retrieved by the catalog-based method for some relational sources than in previous releases. These improvements were actually implemented starting in V8 FP9. We mention them here because of their potential impact on users upgrading to V9.1 directly from earlier V8 fixpacks. The following table shows which statistics are now collected at CREATE NICKNAME time. An "x" indicates that the statistic is collected; a blank indicates that it is not. An "o" indicates that the statistic is not easily available from the remote source. Please note that this table is accurate as of the time of publication, but that work to discover and collect more remote statistics is ongoing.


Table 1. Statistics created at CREATE NICKNAME time
Data sourceInformix (IDS)Oracle (NET8)SQL ServerDB2 luwDB2 z/OSTeradataSybaseDB2/VMDB2 / iSeries
Statistics
TABLE
cardxxxxXoxxo
npagesxxxxxxxXo
fpagesxxxxxx xo
overflowoxoxoooxo
COLUMN
colcardxxxxxxoxo
high2keyxxxxxxo o
low2keyxxxxxxo o
numnullso oooxo o
avgcollenooxxoooxo
INDEX
fullkeycardxxxxxooxo
firstkeycardxxxxxooxo
first2keycardoo ooooo
first3keycardoo ooooo
first4keycardoo ooooo
clusterratioxooxxooxo
nlevelsxxoxxooxo
nleafxxxxxooxo

The most significant of the newly-retrieved nickname statistics are the minimum and maximum values stored in each column of a remote table1. Previously, these values, which are stored in the Federation Server catalog as SYSCAT.COLUMNS.LOW2KEY and SYSCAT.COLUMNS.HIGH2KEY, respectively, were available only for remote DB2 sources. Now they are collected for non-character columns of remote Oracle, Informix, Teradata, SQL Server, DB2/zOS, and DB2/VM tables. Having accurate values for these statistics allows the federated server to better calculate the likely selectivity of inequality and range (BETWEEN) predicates, as well as join cardinality in some cases. The new statistics may cause some plan changes relative to earlier releases; in most cases, these should be advantageous.

Secondly, the NNSTAT stored procedure now offers a choice of statistics-gathering methods, as well as fine-grained control over the columns and indexes for which statistics are to be collected. The new functionality is also available through more flexible options in Control Center's nickname statistics update facility, which calls NNSTAT. NNSTAT can collect statistics in one of two modes:

  • "Catalog-based" statistics: In this mode, NNSTAT retrieves as many statistics as possible for a nickname by simply looking them up in the remote source's own catalog. The quality of nickname statistics obtained depends on what has been collected for the underlying table by the remote source's native RUNSTATS/ANALYZE/UPDATE STATISTICS capability. This mode is used by default at CREATE NICKNAME time, and is appropriate for nicknames over relational tables. It generally retrieves no useful information for nicknames over nonrelational objects, remote views, or aliases.
  • "Query-based" statistics: This mode is appropriate for nicknames defined over nonrelational objects, remote views or aliases, or remote tables for which statistics at the remote source are not completely obtainable by Federation Server (Sybase, DB2 for i-Series) using the catalog-based method. In this mode, Federation Server generates COUNT(*) and COUNT (DISTINCT...) queries against the nickname objects themselves to deduce table cardinality, column cardinality, and index key cardinality. Even if such queries are pushed down to the remote source, they can be very costly, especially if the remote object is large.

NNSTAT's default behavior is to attempt statistics collection using the catalog-based method first, and then, if no useful information is retrieved, to fall back to the more expensive query-based method.

One of several new arguments in NNSTAT's signature allows the user to control the method to be used in obtaining or refreshing statistics for a particular nickname or set of nicknames. Furthermore, in order to reduce the overhead of the query-based statistics collection mode, the user can also choose to limit statistics collection to only certain columns of a nickname or to only certain indexes of a relational nickname. Please refer to Chapter 11 of the "Administration Guide for Federated Systems" for assistance in using NNSTAT. All features of NNSTAT for nickname statistics maintenance are also integrated into the Control Center's nickname statistics update facility, as shown in this sample screen.


Figure 1. Nickname statistics update
Nickname statistics update

Improvements for nickname statistics: Example

The following query demonstrates the kind of performance benefit that the improved statistics, notably HIGH2KEY and LOW2KEY, can offer. Assume that there is a LINEITEM fact table on one remote source and a PART dimension table on another. The primary key of PART is P_PARTKEY, the corresponding (indexed) foreign key in LINEITEM is L_PARTKEY. PART also has a column called P_RETAILPRICE, with values between $900 and $2100. PART has about 800,000 rows, LINEITEM has several million rows. Consider the following query between nicknames for PART and LINEITEM:


      SELECT  COUNT(*) from PART, LINEITEM WHERE 
      P_PARTKEY = L_PARTKEY AND
      P_RETAILPRICE between  950 and 1700
      

The correct execution plan for this join depends on the optimizer's estimate of how many of PART's 800,000 rows will be selected by the predicate involving P_RETAILPRICE. In actual fact, 555,301 rows are selected. Without the HIGH2KEY and LOW2KEY statistics for the P_RETAILPRICE column, the optimizer guesses that only 2640 of the 800,000 PART rows qualify for the predicate, and selects a nested loop join plan with PART as the outer table. This is an unfortunate plan, since the number of nested join probes from PART to LINEITEM is severely underestimated. The plan runs in 808 seconds.

With accurate statistics for HIGH2KEY and LOW2KEY on the PART nickname as retrieved by WFS V9.1, the optimizer estimates quite accurately that over half (500,808 to be exact) of the PART rows will qualify. It selects a hash join with PART as the inner table which runs in 299 seconds. This example shows the importance and advantage of correct HIGH2KEY and LOW2KEY column statistics.


ODBC wrapper customization for access to IBM WebSphere Classic Federation Server for z/OS

WebSphere Federation Server can access legacy data such as IMS, Adabas and VSAM via the Classic Federation (CF) product that runs on z/OS. The CF server offers an ODBC interface, so it can be accessed from Federation Server via the ODBC wrapper, as shown below. Thus, legacy data can be joined with other relational data such as Oracle, Informix, and DB2 using Federation Server and Classic Federation Server together.


Figure 2. Classic Federation server access using ODBC
Classic Federation server access using ODBC

The ODBC wrapper is a generic wrapper which supports access to all ODBC 3.x compliant data sources. These data sources range from powerful relational databases such as mySQL to non-relational objects such as Excel spreadsheets. The ODBC wrapper has no way to know which SQL capabilities are supported by which data sources, so it simply assumes that all ODBC data sources support only the lowest common denominator of functionality. As a consequence, only the simplest SQL functionality is pushed down to the remote source, while more complex operations (grouping, joins, and so on) are virtually all done at the federated server. In this way, the ODBC wrapper can support access to a wide range of ODBC-compliant data sources, but the limited pushdown of functionality to the remote data source means that performance is often not optimal.

However, since the CF server includes a SQL engine that can handle a wide variety of constructs, performance of Federation Server queries that access CF can be significantly improved by pushing more SQL operations to CF. Improved pushdown of predicate evaluation, joins, and grouping operations results in reduced data movement between CF and Federation Server, which usually improves performance.

Internally, WFS v9.1 uses a set of so-called server attributes to define the capabilities of each data source. These server attributes are individually set for each of the native wrappers (such as Oracle, DB2, MS SQL Server, and so on) so that the federation server understands the capabilities of those data sources. Understanding the capabilities of the data source enables the federation server to push down more functionality to that data source, which generally improves performance.

Because the ODBC wrapper is necessarily generic, its server attribute settings are very conservative and reflect lowest-common-denominator functionality. If the ODBC wrapper could be made aware of the particular capabilities of the CF server, the server attributes could be set more aggressively in a manner that reflects those capabilities. This is exactly what the customization of the ODBC wrapper for CF in V9.1 achieves. No user intervention is required; the ODBC wrapper automatically recognizes that it is accessing a CF source and changes its behavior accordingly. For example, the wrapper is aware that CF can process many column functions, certain scalar subqueries, IN lists, GROUP BY constructs, equality predicates involving VARCHAR columns, and many more operations.

The degree of performance improvement for federated queries involving a CF server relative to WebSphere Information Integrator V8.2 depends on the particular query in question. Being able to push down aggregation operations or very selective predicates to CF (i.e. those that filter out many rows) will result in the greatest benefit. Pushdown of these operations is most beneficial because it greatly reduces the amount of data that must be moved between CF and Federation Server.

For example a simple SELECT COUNT(*) query involving a nickname to CF and a WHERE clause with an equality predicate on a VARCHAR column ran in 205 seconds in V8 Information Integrator and in 4.7 seconds in V9 Federation Server. In V8, the predicate was not pushed down to CF, and nearly one million rows had to be moved to the federated server to evaluate the predicate and perform the aggregate. As a result of the ODBC wrapper customization in V9, the same query could be completely pushed down to CF. Only a single row - the answer set- needed to be moved from CF to Federation Server. Performance improvements will not be this dramatic in all cases

In very rare cases, because of machine loading or query optimization at the CF server, pushdown of some operations to CF may actually run slower than moving the data to Federation Server and doing the operations there. However, these cases are exceptional. In most cases, increased pushdown to Classic Federation due to the ODBC wrapper customization results in significant performance gains.


Optimize ODBC wrapper performance using the ODBC tuning utility

In the previous section, we explained that because the ODBC wrapper is used to access a wide variety of sources, it is necessarily generic and makes very conservative assumptions about the functionality offered by the sources. Thus, Federation Server can push down only a few SQL operations through the standard ODBC wrapper to the remote source and executes most predicate evaluations, aggregations, sorts and joins locally.

The customization of the ODBC wrapper for Classic Federation increases pushdown and performance to CF sources. However, as a practical matter, similar customizations are not available for many other ODBC data sources. WebSphere Federation Server V9.1 thus introduces a utility that does a limited degree of "on-the-fly" customization of the ODBC wrapper for any data source. It does this by setting a small number of internal server attributes according to capabilities that are discovered at the remote source. The tuning utility consists of three phases illustrated in the figure below:


Figure 3. Tuning utility phases
Tuning utility phases

During the Preparation phase, the utility creates tables at the ODBC data source and populates the tables with a small amount of data. In the Analysis phase, a set of pre-defined statements is used to test the specific capabilities of the data source which relate to the server attributes being tested. The final Cleanup phase removes the tables from the data source. Output from the ODBC tuning utility consists of a file containing 11 ALTER SERVER statements which are designed to customize the wrapper so that it better understands the capabilities of the data source. These statements have the form


ALTER SERVER "DS1" OPTIONS ( ADD <internal server option> 'Y' );
            

Here, <internal server option> is a string generated by the utility that indicates to the ODBC wrapper a certain functionality or limitation of the remote source. For example, these internal server options can indicate whether or not a remote source supports a particular SQL feature such as outer joins or nested table expressions, or they may express a limitation such as the maximum length of a SQL statement accepted by the remote source. The file containing these statements can be executed from the Control Center or the DB2 command line. Once the options have been set, you must set the PUSHDOWN server option to 'Y' (if it is not already set), otherwise the above changes will not improve the pushdown of functionality:


ALTER SERVER "DS1" OPTIONS ( ADD PUSHDOWN 'Y' );
            

More details on the ODBC tuning utility can be found in the "Configuration Guide for Federated Data Sources". While its current scope is very modest, IBM hopes to enhance the tuning utility in future releases to more accurately characterize the capabilities of and improve pushdown to ODBC sources.


Asynchronous execution of federated statements

Background

In prior releases of WebSphere Federation Server, many federated queries execute in a single-threaded manner. This means that even if multiple remote sources participate in a query, only one remote source or the federated server is active at any given moment. Queries that involve both local tables and remote (nickname) data have the possibility of execution plans that allow concurrent and asynchronous execution of local and multiple remote query fragments. But queries that involve only nicknames have only very limited opportunities for such asynchrony2. The enhancements described here introduce query plans into WFS V9.1 that enable asynchronous execution of multiple remote queries in greatly expanded circumstances. This asynchrony can lead to significant improvements in query performance. For example, consider a federated union query across three sources. Each leg of the union does a two-table join on one of the sources.


Select * from <oracle_nickname1> o1, <oracle_nickname2> o2 where o1.a = o2.b 
UNION ALL
Select * from <sybase_nickname1> s1, <sybase_nickname2> s2 where s1.a = s2.b 
UNION ALL
Select * from <mssql_nickname1> m1, <mssql_nickname2> m2 where m1.a = m2.b;
            

The 'legs' of the union are processed one at a time, in left-to-right order in the execution plan (this happens to correspond to top-to-bottom as written in the query, but this is not always the case). So in this case, exactly one of the joins that comprise the legs of the union is active at any given time. For example, in the following diagram, the join (leftmost leg) on the Oracle source is complete, and the join on the Sybase source (middle leg) is in progress. The join on the MSSQL Server source will begin when the one on the Sybase source is complete and has delivered its entire result set to the federated server.


Figure 4. Federated join
Federated join

It is important to understand that this serialization is unrelated to federation; rather, it is brought about by the single-threaded nature of the DB2 UNION operator, which activates and drains the result set from one of its legs at a time whether they involve remote sources or not. Other DB2 operators with more than one input (joins) are also single-threaded. In a non-federated environment, this is not generally an issue, as local resources (CPU, I/O) are usually adequately utilized by performing one leg or input of a union or join operator at a time. In addition, in a strictly local environment, competition for scarce resources by multiple legs/inputs of a query operation might actually degrade performance. In a federated environment, however, serialization represents a lost opportunity to improve performance by allowing overlapped processing on different remote sources. In the above example, all three remote legs of the union could be executed at the same time.

Asynchronous execution of federated statements in IBM WebSphere Federation Server V9.1

The current release introduces the capability to execute remote inputs to binary/n-ary operators such as joins and unions asynchronously. Its primary motivation is to reduce the elapsed time of federated queries by enabling overlapped processing of remote queries that provide data to n-ary operators. Please note that asynchronous execution of federated queries is available only in DPF-enabled (partitioned) instances of WebSphere Federation Server. For details on how to enable asynchronous execution of federated queries in partitioned WFS instances as well as discussion of implementation aspects, please refer to the Resources section for a link to "Asynchrony in Websphere Federation Server" or Chapter 18 of the "Administration Guide for Federated Systems". This section briefly explains how asynchrony works, and illustrates the idea with two detailed examples. The section thereafter summarizes the scope and impact of asynchrony in federated execution plans.

For our purposes, it suffices to understand that asynchrony is introduced into a federated query plan by the optimizer's judicious placement of the TQ (Table Queue) operator. Briefly, the TQ operator, available in partitioned DB2 instances, provides the means to start an independent process or thread that asynchronously initiates remote processing of a query and is able to provide limited buffering of its results. The optimizer uses a sophisticated cost-benefit model to consider placing TQ operators directly above some of the SHIP operators found in federated execution plans. Recall that a SHIP operator containing a remote statement indicates a portion of the query that is to be processed remotely. Placing a TQ operator directly above such a SHIP operator in a plan enables the SHIP to be processed asynchronously. Only SHIP operators with associated remote statements that appear in the plan tree underneath join or union operators are candidates for asynchrony. The TQ operators used to achieve asynchrony in federated plans are called Asynchronous Table Queues, shown in execution plans as ATQ.

The best way to understand the impact of asynchrony is through an example. Consider another union query with two legs. Each leg (indicated by the pink and blue shaded areas in the figure below) is a remote join. Without asynchrony, the federated UNION operator initiates the blue leg first, waits for the remote join to complete, and retrieves its result set. Then it initiates the pink leg, waits for it to complete, and retrieves its result set. In the figure below, asynchrony has been enabled, and the optimizer has chosen to place an ATQ operator above the rightmost (pink) SHIP. In this way, the blue SHIP starts off at the usual time, at the beginning of the main query, and the pink SHIP is also able to start immediately, initiated by an independent process or thread.


Figure 5. Federated join with asynchrony
Federated join with asynchrony

In this plan, both remote joins run on their respective servers at the same time rather than one after another. It's important to remember, though, that only the remote operations are asynchronous - the UNION operator is still only able to process the result sets of each remote operation one at a time. That is, it first retrieves the result of the blue SHIP, then it retrieves the result of the pink SHIP. The difference is that by the time the rows from the blue SHIP have been received, the result set of the pink SHIP is hopefully ready and waiting to be retrieved from the remote source. Actually, as mentioned above, a limited portion of the result set can be buffered locally by the TQ operator, but this capacity is not generally sufficient to hold entire result sets of remote operations.

How much time is saved by the asynchrony in this plan? Ideally, if both remote operations (pink and blue SHIPs) take about the same amount of time and both return very small result sets, then the degree of overlap is maximized and the elapsed time is cut in half with respect to the non-asynchrony plan. However, in actual measurements, it happened that the server on which the blue SHIP ran was a little faster than the one on which the pink SHIP ran, and so the pink SHIP ran on its own for a portion of the query's execution. Elapsed time in this particular case decreased from 348.5 seconds to 204.5 seconds, an improvement of 41%.

This is not far from the theoretical maximum elapsed time reduction of 50% for a two-legged union. For unions with more than n > 2 well-matched legs, we have every reason to believe that users will see close to an n-fold speedup, or a percentage reduction in elapsed time of 1-1/n. Again, though, such improvements are possible only if the union's remote legs perform significant processing before beginning to return results, and if the result sets are relatively small. Remember that only the remote processing that precedes the return of results can be overlapped. Return of result sets from different sources remains serialized.

Let's look at another example that includes local joins (i.e. ones that are performed at the federated server) and a union. Each join involves two remote operations, and the union merges the results of all three joins.


Figure 6. Join with both remote and local operations
Join with both remote and local operations

The circled green numbers show the sequence of execution that would take place if asynchrony were not enabled. Recall that union queries are processed from left to right. The joins in this plan are hash joins (indicated by the HSJOIN operator), which process first their right (or "inner") leg to build a hash table and then their left ("outer") leg. Thus, without asynchrony, execution would proceed with one SHIP at a time: SHIP 1 and then SHIP 2 (green circled numbers) would execute first, and the HSJOIN above them would deliver its rows to the UNION operator. Next, the middle HSJOIN would cause execution of SHIPs 3 and 4 in sequence, after which the join result would be delivered to the UNION. SHIPs 5 and 6 would execute last.

With asynchrony, all SHIPs with an ATQ operator directly above them would start at the same time as the main query starts. SHIP 1 does not have an ATQ, but since it would have been first in the execution sequence anyway, it starts as well. Thus, all SHIPs in this query execute at the same time until they are ready to deliver rows back to the federated server. Row movement from the remote servers back to the federated server, as well as between the local HSJOIN operators and the UNION operator is serialized. Note that the execution plan indicates that the left leg of each hash join will return about 800,000 rows, a significant number. However, the overlap in remote execution time is enough to bring about significant improvement: The query runs in 432 seconds without asynchrony and 327 seconds with asynchrony, an improvement of 24%.

Asynchronous execution of federated statements: Summary

Let's recap the main points of the preceding detailed discussion and summarize the situations in which asynchrony is likely to provide performance benefit.

  • The primary motivation of execution plans with asynchrony is to reduce elapsed time of federated statements by overlapping processing of remote operations
  • Execution plans with federated asynchrony are available only to partitioned (DPF-enabled) instances of WebSphere Federation Server.
  • Information on how to enable and configure federated asynchrony is contained in "Asynchronous Execution of Federated Queries in WebSphere Federation Server V9.1" (see the link in the Resources section) or in Chapter 18 of the "Administration Guide for Federated Systems"
  • Federated asynchrony can be recognized in execution plans by the existence of the new ATQ (Asynchronous Table Queue) operator directly above SHIP operators corresponding to remote operations
  • The performance benefit of federated asynchrony completely depends on the nature of the query. The benefit is greatest for queries that contain remote operations that require significant processing before they begin to return rows.
  • Federated asynchrony enables queries executed at remote servers to run concurrently. However, the retrieval of result sets from these remote queries by the federated server is still synchronous and takes place one remote query (SHIP operator) at a time. Thus, remote operations that begin to return large result sets soon after being initiated will not benefit from asynchrony, since almost none of their processing can be overlapped with other operations.
  • In practice, federated unions and (to a lesser extent) joins involving either multiple remote sources or both local and remote data stand to benefit most from federated asynchrony, as long as the remote operations require significant processing before beginning to return a modest number of rows.

Conclusion

We've looked at four performance enhancements in IBM WebSphere Federation Server V9.1. While their performance impact varies with workload and environment, the enhancements, including

  • Improved nickname statistics collection
  • ODBC wrapper customization for Classic Federation
  • ODBC wrapper tuning for generalized ODBC sources
  • Asynchrony for federated queries

can result in very significant improvements with no or minimal administration effort on the user's part. These enhancements represent a continuing effort to optimize performance of Federation Server in a way that increases its usability and value for customers.

Footnotes

1.Some database systems, including DB2, actually store the second highest and second lowest values in a column , instead of maximum and minimum values. The motivation is to avoid "flag" values such as -1 and 99999 which will distort the true range of values. Depending on the remote source, values for HIGH2KEY and LOW2KEY stored in the federated server's catalog for nickname columns may be either actual maxima and minima or second highest/lowest values, respectively.

2.Some nickname-only queries in DPF (partitioned) systems may be executed using access plans that make use of computational partition groups. These plans achieve asynchrony among remote sources as a side effect


Resources

About the authors

Susanne Englert

Susanne Englert is a member of the Websphere Federation Server performance team and has been working with the product since late 2001. She has worked extensively in the area of database performance and is interested in query optimization, parallel query processing, federated queries, and customer use cases. Susanne is a graduate of the University of Bonn and past chair of the TPC's Decision Support benchmark development subcommittee.

Simon Harris

Simon Harris is a Performance Engineer within the WebSphere Federation Server development team in the Silicon Valley Laboratory. Simon has been working with federated database technology since its inception in IBM in 1995, supporting many customers in both a pre- and post-sales capacity throughout Europe, the Middle-East, and Africa.

Holger photo

Holger Kache is an advisory software engineer at the IBM Silicon Valley Laboratory. He is a member of the Information Integration Solutions organization with expertise in data integration and performance. Holger obtained his masters degree in Computer Science from the Brandenburg Technical University Cottbus, Germany before joining IBM in 2001.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, WebSphere
ArticleID=182534
ArticleTitle=Performance enhancements in IBM WebSphere Federation Server V9.1, Part 1: Improve performance of federated queries with new WebSphere Federation Server capabilities
publish-date=12072006
author1-email=senglert@us.ibm.com
author1-email-cc=
author2-email=simond_harris@uk.ibm.com
author2-email-cc=
author3-email=kache@us.ibm.com
author3-email-cc=

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Try IBM PureSystems. No charge.

Special offers