This article introduces some of the basic data-access performance issues to consider while developing database applications using ADO.NET.

Ramesh Theivendran, Architect, Borland Software Corporation

Ramesh Theivendran has been a member of the RAD database connectivity R&D team at Borland since 1995. Currently, he leads the database connectivity efforts in their Win32 and .NET products group and serves as the Architect for dbExpress and Borland Data Provider (BDP) for .NET.



22 April 2004

Introduction

Data access plays a vital role in business applications. Performance is one of the key factors to consider in any data-intensive application. There are many factors that can affect data-access performance negatively, such as network load, load on the database server, un-optimized SQL, and so on. Apart from these, there are other factors to consider with respect to the various data-access operations that most applications perform, such as opening and closing a connection, fetching a result set, blob access, and metadata retrieval. In this article, I will analyze some of these data-access operations and make a few suggestions for improving database application performance.

I will use the Borland Data Provider (BDP) for .NET that ships with Borland® C#Builder™ and Borland® Delphi™ 8 for the Microsoft® .NET Framework (abbreviated to "Delphi 8 for .NET"),and also the IBM® DB2® data provider for accessing IBM® DB2® Universal Database™ (UDB).


Connection Pooling

Establishing a new database connection can sometimes be very expensive because it involves allocating client and server resources, authenticating the user, and other validations. Applications can significantly improve performance by establishing a connection and reusing the same connection for subsequent requests. Since database connections are not always active while a client is processing data locally, it is possible for a single connection to be shared across multiple clients. So a connection pool, which is a cache of database connections, can improve both the performance and scalability of an application, especially in a multi-tier architecture.

In ADO.NET, a connection pool is identified by a unique connection string. When a new connection is opened, a new connection pool is created if the connection string is not an exact match to any existing pool. When a new connection pool is created, a minimum number of connection objects are created and added to the pool in the background. If all the existing connections in the pool are busy, then new connections are added to the pool up to the maximum pool size. The defaults for connection pooling parameters - such as Min Pool Size and Max Pool Size - can be overridden in the connection string.

Connections in the pool are divided into connections with no transaction context and connections with a particular transaction context. While opening an ADO.NET connection, a connection is drawn from the pool based on a transaction context. If the connection is not already associated with a transaction, then it is drawn from the non-transactional pool.

Closing a connection returns the connection to the pool for reuse. Connections in the pool are associated with a lifetime and the connection pool manager periodically scans for unused and expired connections and removes them from the pool. Once created, connection pools remain active for the lifetime of the process.

To show that connection pooling indeed provides performance benefits, I will write a simple .NET remoting application. For some basics on .NET remoting, you can refer to my previous article, Building Distributed Database Applications in .NET using BDP and DB2 UDB.

The remote server exposes two methods - GetDataBDP() and GetDataDB2() - to populate and return a DataSet using the Borland Data Provider (BDP) (Borland.Data.Provider) and the IBM DB2 Data Provider (IBM.Data.DB2) respectively. For every request from the client, a connection is opened and closed after processing the SQL request. GetDataDB2() takes a flag to decide if connection pooling should be enabled or not. The current version of BDP doesn't support connection pooling.

Here are some basic test results showing the time elapsed in minutes. These results should not be considered as benchmarks. However, you can see that as more requests hit the server, the application performs poorly without connection pooling on the middle tier.

Number of
Requests:
250 requests
with pooling
250 requests
without pooling
500 requests
with pooling
500 requests
without pooling
Data Providers:IBM DB200:17.546875002:01.453125000:32.875000004:03.5468750
BDP - DB2N/A02:01.1406250N/A04:01.6718750

Here are two code snippets for the server and client. Please refer to the source code for a full listing.

RemoteServer.cs

public class RemoteDataProvider :
MarshalByRefObject, IRemoteDataProvider { public DataSet GetDataBDP() { DataSet ds = null; String connString =
"Provider=DB2;Assembly=Borland.Data.Db2,Version=1.
5.1.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1
b1b;Database=toolsdb;UserName=myuser;Password=mypasswd"; try { ds = new DataSet(); BdpConnection Conn = new BdpConnection(connString); Conn.Open(); BdpDataAdapter adapter = new BdpDataAdapter(m_commText, Conn); Console.WriteLine("SQL to DB2 : " + m_commText); adapter.Fill(ds,"Table1"); Conn.Close(); } catch (Exception e) { throw e; } return ds; } public DataSet GetDataDB2( bool bPool ) { DataSet ds = null; String connString = "Database=toolsdb;UID=myuser;PWD=mypasswd;"; if ( bPool ) { Console.WriteLine("Connection Pooling ON ..."); connString = connString + "pooling=true;Min pool size=100"; } else { Console.WriteLine("Connection Pooling OFF ..."); connString = connString + "pooling=false"; } try { ds = new DataSet(); DB2Connection Conn = new DB2Connection(connString); Conn.Open(); DB2DataAdapter adapter = new DB2DataAdapter(m_commText, Conn); Console.WriteLine("SQL to DB2 : " + m_commText); adapter.Fill(ds,"Table1"); Conn.Close(); } catch (Exception e) { throw e; } return ds; } }

RemoteClient.cs

public class RemotingClient
{
   public static void Main()
   {
      TestPooling();
   }

   private static void TestPooling()
   {
      IRemoteDataService remDS = null;
      ArrayList stat = new ArrayList();
      HttpChannel channel = new HttpChannel();
      ChannelServices.RegisterChannel(channel);

      String ClientID = Guid.NewGuid().ToString();
      try
      {
         remDS =
(IRemoteDataService)Activator.GetObject(typeof(IRemoteDataService),
"http://testserver:8000/RemoteDataService.soap"); if (remDS != null) { stat.Add(GetData(remDS, 250, false, true)); stat.Add(GetData(remDS, 250, false, false)); stat.Add(GetData(remDS, 250, true, false)); } Console.WriteLine(); for( int i = 0; i < stat.Count; i++) { Console.WriteLine((String)stat[i]); } } catch (Exception e) { Console.WriteLine(e.Message); } } private static String
GetData(IRemoteDataService remDS , int noofRequest, bool bBDP, bool bPool) { IRemoteDataProvider remDP = null; DataSet ds = null; String Out = ""; DateTime stime = DateTime.Now; String ClientID = Guid.NewGuid().ToString(); for (int i = 0; i < noofRequest; i++) { remDP = remDS.GetDataProvider(ClientID); remDP.CommandText = "SELECT * FROM ADDRESSBOOK"; if ( bBDP ) { ds = remDP.GetDataBDP(); } else { if ( bPool ) ds = remDP.GetDataDB2(true); else ds = remDP.GetDataDB2(false); } if (ds != null) { Console.WriteLine("Data received from the remoteserver"); Utils.PrintData(ds); } } TimeSpan ts = DateTime.Now - stime; if ( bBDP ) { Out = "Time duration without Pooling (BDP) = " + ts.ToString(); } else { if ( bPool ) Out = "Time duration with Pooling (DB2) = " + ts.ToString(); else Out = "Time duration without Pooling (DB2) = " + ts.ToString(); } return Out; }

Forward-Only Cursor

Forward-only, read-only cursors provide better throughput and also use fewer client and server resources. With Forward-only cursors there is no caching in the data-access layer and there is no need to maintain a current record position with respect to the records in the server. Data is read as a stream and records can be processed one by one. Forward-only resultsets are ideal for reporting, data processing applications where the same operation is performed on the fetched records.

In ADO.NET, the DataReader returns a forward-only resultset. The DataAdapter acts as a conduit between the database and the DataSet, and uses the DataReader to fetch records from the database and populate the DataSet. DataSet caches the data and acts as an in-memory relational database.

So depending on the application needs, you can use the DataReader directly to process one record at a time, or use the DataAdapter to fill a DataSet to provide a whole set of records and later resolve the client changes back to the database. In either case, selectivity of your SQL is very important for better throughput and overall performance.


Blob Access

Blob data can be of variable size up to 4 GB. Since a huge amount of data may be transferred over the wire, it is not always a good idea to fetch blob data along with other scalar data. While working with blobs it is also important to understand the underlying blob access mechanisms within the database client library. Most database clients provide more than one way of accessing blob data types. Depending on the blob data type, a client can bind a huge buffer or use a blob locator to fetch blob data.

While binding huge buffers for each fetch, the available blob data, or blob data up to the maximum buffer size, is transferred to the client. Blob locators, on the other hand, are basically references to the blob data on the database server. Only locators are transferred to the client during the initial fetch. Once a client has a blob locator, it can later call blob access methods to read or write the blob data.

So, to improve performance in applications dealing with blob data, care must be taken to fetch blobs separately, either with a new SQL request or using locators. Also, since blob data is not always processed, fetch them only when needed or explicitly requested by the application.


Schema Retrieval

Metadata retrieval is another expensive operation (because it may involve joining several system tables to retrieve the metadata for a particular database object) that should be minimized or totally eliminated at runtime. Most of the metadata retrieval for database objects can be done at design time, and the schema information can be persisted as XML or any application-specific formats.

Runtime metadata cannot be totally eliminated. In a few sophisticated applications for resolving relational data or object persistence, it may be required to discover the characteristics of the database objects at runtime. In those cases, the SQL for accessing the system tables must be tuned.

In the current version of ADO.NET, the metadata retrieval abstraction is not rich enough to retrieve information about all database objects. DataReader and DataAdapter have GetSchemaTable and FillSchema methods, respectively, to provider metadata for the current SQL request. The BDP extends ADO.NET and provides an abstraction for retrieving metadata on various database objects.

The following test results show that both the BDP and the IBM DB2 data provider perform equally well on most of the basic data-access operations. However, I did notice that if you use CHAR data type instead of VARCHAR, the IBM DB2 data provider seemed to blank-pad the data, resulting in degraded performance.

Data
Access:
Fetching 10,000 records
using DataReader
Fetching 10,000 records
with GetSchemaTable
Fetching 100 records
with 6K BLOB data
Data Providers:BDP - DB200:51.724376000:52.10492321:46:2527840
IBM DB200:51.744404800:51.92466401:38.2012064

Block Reads and Writes

Database client libraries allow clients to bind a single buffer and fetch one record at a time. There is a network round-trip for each fetch, and this can impact performance if an application is processing a huge resultset. Although retrieving huge result sets is not recommended, it cannot always be avoided, especially in application scenarios such as OLAP or collecting statistics from historical data. A few database client libraries allow block reads, where clients can bind an array of buffers and retrieve a block of records in a single round-trip.

In any disconnected data-access model (such as Borland DataSnap), ADO.NET, while persisting client changes back to the database, executes a SQL statement for each modified record. For example, if there are n inserted records instead of executing the same INSERT SQL command n times, the client can pass an array of parameter buffers to perform batch inserts. Block reads and writes can significantly improve performance, especially in a WAN environment, since records can be received and sent as batches in a single network round-trip. BDP doesn't currently support block reads or writes.


Asynchronous Execution

Long-running queries, such as complex joins or queries that involve a full table scan, can negatively impact application responsiveness. While the database is processing a SQL request, the client can do local application housekeeping if the SQL request is non-blocking. If asynchronous execution is not available, the SQL request can be made on a separate thread while the main thread continues.

Currently, ADO.NET framework doesn't support asynchronous execution but future versions may support the same.


Conclusion

Data access can turn out to be a major bottleneck if all the optimization measures are not carefully considered. Apart from tuning the database and tuning the SQL for better selectivity, other measures (such as connection pooling, minimizing metadata retrieval at runtime, moving long running queries to separate threads, and fetching blobs only when needed) can optimize data-access performance and provide better responsiveness for any data-intensive application. So, depending upon your application needs, choosing the appropriate data-access operation can improve performance and scalability.


Download

DescriptionNameSize
Code samplesource.zip  ( HTTP | FTP )35.5 KB

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=13661
ArticleTitle=Data access performance in ADO.NET
publish-date=04222004