An introduction to IBM DB2 Connect: It's more than meets the eye
DB2 Connect unifies access to distributed, heterogeneous data
Part 2 of 2
This content is part # of # in the series: An introduction to IBM DB2 Connect: It's more than meets the eye
This content is part of the series:An introduction to IBM DB2 Connect: It's more than meets the eye
Stay tuned for additional content in this series.
In Part 1 of this series, we touched upon the different programming interfaces that DB2 Connect provides and the drivers that implement these interfaces. In the last few sections we scratched the surface of the communication infrastructure that DB2 Connect delivers and we saw how this infrastructure greatly reduces the usage of mainframe resources and allows distributed applications to fully exploit the strengths of the mainframe platform (for example, the ability to easily manage mixed workloads and provide continuous application availability).
You may remember a picture like the one in Figure 1 that shows DB2 Connect as a combination of programming interfaces (implemented as JDBC™, SQLJ, ODBC, DB2 CLI, OLE DB, .NET®, and Embedded SQL drivers) and a communications infrastructure.
Figure 1. DB2 Connect is a combination of programming interfaces and a connectivity infrastructure that delivers mainframe strengths to client server and Web-based applications
In this article, Part 2 of the series, we discuss one of the functions that is a part of this communication infrastructure, namely how DB2 Connect provides unified access to distributed heterogeneous data.
Before we dive in to the details of unified access, distributed, and the heterogeneous aspects of this solution, we need to turn our attention to the communication infrastructure itself. DB2 Connect delivers this communication infrastructure in the form of a communication server that can be deployed on Windows®, Linux® (including Linux for zSeries) and UNIX® servers. This communication server is built using the same code base that is used for building DB2 UDB database servers and, therefore, inherits all of the qualities found in that architecture for DB2 UDB servers.
As a matter of fact, the functions we describe in this article require that a database be created on the DB2 Connect server itself (no, you don't need a copy of DB2 Universal Database™ (UDB) for this). At first this may seem to contradict the statements in Part I of this series that DB2 Connect simply connects applications to DB2 for z/OS and DB2 for iSeries® databases and that DB2 Connect does not manage data. To clarify, the database that you will create on the DB2 Connect server will not hold any data. It will simply serve as a single connection point to deliver the unified or single database image to the application. DB2 Connect server then simply routes requests for data to the different database servers that actually manage the data.
Although it's likely the case that you learned about some features of the communication pipe that really separates DB2 Connect from its competitors in Part I, you likely already knew that DB2 Connect at least performed this function (connecting applications to mainframes). Now that you're more fluent with the underlying architecture of DB2 Connect, it's time to deliver on the more than meets the eye subtitle in Part I -- we'll start here in Part 2.
In Part 2, we're going to talk about DB2 Connect as a data-access platform, and we're not just talking DB2 on the mainframe here. For example, did you know that your DB2 Connect workstation is capable of performing a distributed join between a DB2 for z/OS database and an Informix® IDS on Windows database in the same transaction? It can also update these data sources under the same commit scope using built-in support for two-phase commit (2PC). We told you that you were going to find some neat features! If this sounds like federation, or more so WebSphere® Information Integrator (formerly known as DB2 Information Integrator), you're right. In fact, all DB2 UDB and DB2 Connect servers come with the WebSphere Information Integrator federation support for the entire DB2 UDB family and Informix IDS built into the engine. Products like WebSphere Information Integrator extend the reach of the federation engine to other relational (Oracle, Microsoft® SQL Server, and so on), non-relational (ADABAS, VSAM), OLE DB, XML, and virtual any other data source in your enterprise.
Unified access to distributed heterogeneous data sources
You probably have an idea of what unified, distributed, and heterogeneous mean, but may not be aware of how DB2 Connect implements these concepts. You may also be familiar with the IBM WebSphere Information Integrator products and may be thinking that these words describe these products very well. Read on and the interrelations between these products will become even clearer.
Unified access is an excellent way to reduce the complexity of developing applications in a heterogeneous environment. While it is true that an application programmer can always establish a separate connection to each individual data source, it is much easier to have a single database connection in the application. Separate connections to different data sources require multiple drivers (for example, a separate DB2 and Informix JDBC driver). Separate connections in an application do not allow you to treat data as if it was managed by a single database (for example, an application programmer has to fetch the data from multiple data sources and perform his own joins). Separate connections also hard code location in to an application and limit a data architect's ability to alter the location of the data to better suit a changing set of demands for the business.
In contrast, a unified data access mechanism provides an application programmer with a single point of connection to all of the enterprise's data assets. It allows for the use of a single API (driver), provides a single flavor of SQL to work with (you don't have to worry that SQL Server uses a money data type and that DB2 UDB doesn't), and abstracts data location in such a way that it can be changed without impacting existing applications. Finally, it allows the programmer to treat all data as coming from a single relational database with that database being able to manage the joining, sorting, and filtering of the data while assuring complete transactional integrity -- and with extensions to the base features in DB2 Connect, those back-end data sources don't have to be relational (for example, it could be a VSAM or ADABAS data source).
I hope it's obvious that working with a single database is much simpler than coordinating access to multiple data sources. The difference with our approach from IBM Information Management is that we don't expect you to rip-and-replace and stick it all in a DB2 database, because that isn't realistic.
DB2 Connect implements this intuitively simple access methodology in one of three different mechanisms:
- Federated databases
- Stored procedures
- SQL functions
DB2 Connect and the federated database
DB2 Connect comes with a base level of federated database capabilities built into the product. You may be familiar with this function, as it was previously provided by the IBM DataJoiner product. Since Version 8, federated database support has become an integral part of the DB2 Connect and DB2 UDB servers and is available for anyone to use without a need to purchase extra products. In other words, when you deploy DB2 Connect servers on Linux, Windows, and UNIX servers, you can create a federated database that an application can connect to. Once connected to the federated database, requests are routed to the real source of data -- while complexities like function compensation, data type conversion, optimization for efficient data retrieval, and more are all transparently handled for you.
The federation component of DB2 Connect includes READ/WRITE support for DB2 UDB for Linux, UNIX, and Windows, DB2 UDB for VSE/VM, DB2 UDB for z/OS, DB2 UDB for iSeries, and Informix IDS database servers.
You can use the federation capabilities in DB2 Connect to perform distributed requests across these servers, as shown in Figure 2 below:
Figure 2. DB2 Connect's federated database capabilities
For example, the following statement:
SELECT * FROM T1, T2 where T1.C1=T2.C2
could be used to return the results from tables that reside in different databases on different servers. For example, table T1 may reside on a DB2 UDB for z/OS database and it could be joined with table T2, which resides on an Informix IDS for Windows database.
This function allows application developers to refer to objects managed by multiple members of the DB2 UDB family (and Informix IDS) in the same SQL SELECT statement -- completely unaware of the distributed nature of the query. What makes this feature even more powerful is that the developers charged with writing this type of application see these data sources as local DB2 tables (via nicknames) and use the same SQL API to access each of them -- even from their respective Integrated Development Environments (IDEs)! There is no need for them to understand any differences between the back-end data stores, as they are presented as one virtual database to the SQL API and within the integrated development environment.
DB2 Connect as a Federation Server with WebSphere Information Integrator
In the previous section, you saw how DB2 Connect could build a federated database with DB2 UDB and Informix IDS-based data sources. When combined with WebSphere Information Integrator, DB2 Connect can to deliver transparent access to other data sources such as Oracle, Microsoft SQL Server, Sybase, IMS, VSAM, and a whole host of other data sources. The previous figure, when DB2 Connect is coupled with WebSphere Information Integrator, could be enhanced to look like Figure 3 below:
Figure 3. DB2 Connect's federated database capabilities with WebSphere Information Integrator for broader relational database access
In fact, a complete picture of the capability in this scenario would look like:
Figure 4. DB2 Connect's federated database capabilities with WebSphere Information Integrator gives access to any data.
You may be wondering if why you would need to have DB2 Connect if you already have WebSphere Information Integrator in your enterprise. First, WebSphere Information Integrator does not come with licensed access to zSeries and iSeries-based DB2 UDB databases. Quite simply, if you're not trying to federate access to these databases, then you don't need DB2 Connect. In addition to this, as your learned about in Part I - DB2 Connect's communication subsystem has key features that provide better scalability, reliability, and availability to ordinary connections.
DB2 Connect has a smart optimizer built into its federation core. This means that when you select a large amount of data from an Oracle database (if you are using WebSphere Information Integrator in this example), the optimizer knows the heuristics, indexes, cardinality, and general landscape of the Oracle data store and can rewrite the query for better access.
DB2 UDB has long been known for it autonomic powerful optimization techniques that lead to fast access to data: these benefits do not go away with federation. Rather, they are built into the federated optimizer in DB2 Connect. Therefore, as DB2 Connect acts as a federation layer, it can perform predicate push-down or branch-tree elimination to speed up data access. DB2 Connect will decide if it makes more sense to push the predicates down to the native store, or bring back an entire result set and apply them locally. DB2 Connect can even create a warm data cache on these foreign data sources such that, depending on the concurrency characteristics or business requirements of the data store, the heterogeneous database may not even need to get involved!
To dive deep into the powers of federation is out of the scope of this article, but the most important thing to keep mind is one developer in his own IDE, one API, endless data, anywhere. Those characteristics are literally unmatched in today's marketplace and represent a tangibly responsible solution to proprietary API access or "rip-and-replace" methodologies all but outlawed in today's cost-conscious climate. The bottom line is that DB2 Connect is a start to these efficiencies with a dynamic option to add more and more data sources as they are needed.
Stored Procedures and the unified data access
Stored Procedures in DB2 servers are somewhat unique in their capabilities when compared to stored procedures from other competing database vendors. Code in DB2 stored procedures can be written in pretty much the same way as the rest of an application (for example, a programmer or an application DBA can use a preferred programming language -- Java, COBOL, .NET, or C -- and can make calls to any APIs that are available in that programming language on the database server platform. For example, a programmer can write a stored procedure (in COBOL) that calls a CICS transaction and deploy this stored procedure to DB2 for z/OS. Because applications calling stored procedures are completely isolated from the details of the stored procedure's implementation, calling applications are completely unaware of the fact that their execution of the stored procedure will result in the access of a CICS resource (for this example), or the fact that the stored procedure is even written in COBOL (a language the programmer or DBA may have no skill in).
The technique of using stored procedures for data access can be used on all DB2 family servers. However, it's most popular on DB2 for z/OS servers simply because there is a great need to provide a unified SQL-based access to non-relational mainframe data such VSAM/IMS databases and a large inventory of business logic implemented as CICS and IMS transaction programs. Another popular technique is to use message queues, from WebSphere MQSeries®, to create stored procedures that use messaging to initiate the processing of transactions or business logic on other systems. Some of the capabilities provided by a product like DB2 Connect and the DB2 family's support for stored procedures are shown in Figure 5 below:
Figure 5. DB2 Connect's data access capabilities via stored procedures.
The advantages of providing a unified standard API-based access pattern to non-relational mainframe data is hard to overstate. The DB2 stored procedure approach does not just simplify a Windows, Linux, and UNIX programmer's job, it completely eliminates any need to be even aware of the nature and location of the data and program logic. There is absolutely no need to have programmers educated in the concepts of CICS, IMS, VSAM and other mainframe technologies -- yet these systems are fully exploitable to even the most basic of programmers (everyone knows how to call a stored procedure) in a well architected and controlled way (the word controlled is very key here as well). For example, there are a lot of application programmers that are well-versed in Visual Basic.NET but may have never even heard of CICS. Using a stored procedures approach, a Visual Basic.NET developer could develop a CICS application using the very same interfaces and methods that are used to access all relational data. This can result in considerable savings with respect to application development time and developer costs as the more standardized data access APIs (for example, JDBC and ADO.NET) can be used to get to the data.
Naturally, the stored procedures themselves do need to be developed and deployed. These stored procedures are typically built by mainframe programmers and DBAs who are skilled in these mainframe technologies. To further simplify the task of building stored procedures, IBM provides source code samples (that can easily be used in production without modifications) for CICS and IMS transactions. For CICS in particular, IBM provides a sample stored procedure (called DSNACICS) that helps to greatly simplify the process of accessing CICS transactions from distributed- and Web-based applications through DB2 Connect. In addition, a user exit (called DSNACICX) provides DBAs with a mechanism to abstract the knowledge of the CICS interfaces and check or override parameters provided by these programmers for better control.
You can learn more about CICS and DB2 Connect by reading the DB2 Connect documentation. A similar stored procedure, called DSNAIMS, for IMS is provided as well.
By now you may be wondering if this access to non-DB2 data using stored procedures is suitable for read-only types of operations or if can be used for real-world transactions. The answer to this question is that access to these data sources is atomic and data integrity is assured through the z/OS Resource Recovery Service (RRS) facility. For example, a stored procedure could update some DB2 UDB data, execute a CICS transaction, and be assured that if any of these operations was to fail, the entire transaction would be rolled back. As a result, you are assured complete data integrity when using stored procedures to update non-DB2 data and to execute non-DB2 transactions.
There are other avenues to access mainframe data and transactions from applications running on Linux, UNIX, and Windows systems. Why use DB2 Connect and DB2 stored procedures is a common question? The following bullets summarize what we think the main benefits of this approach are:
- A DB2 Connect and stored procedures solution do not require a separate driver or a separate connection to these non-DB2 mainframe data sources.
- The DB2 stored procedures method allows integration of the data from the non-DB2 mainframe data sources by DB2 Connect instead of relegating this task to the application. This lets an application get a consolidated data view of the data and therefore does not have to perform its own joins, filtering, sorting, and so on.
- The DB2 stored procedures method provides for a well managed and controlled environment that can be used for access to the data -- this is a must in the mainframe world. Access to non-DB2 data is provided through stored procedures only and individuals can be authorized or denied access by simply authorizing access to that stored procedure. Most mainframe shops are not very receptive to ad-hoc access from say, Microsoft Excel to a production IMS database through an ODBC driver for IMS. In addition, stored procedures are run in the Workload Managed (WLM) address spaces where resource consumption can be controlled and well managed.
Using SQL Functions to provide a unified view of distributed heterogeneous data
Most programmers are familiar with the SQL functions that databases provide to perform string manipulations, arithmetic and statistical calculations, and even calculate positional locations on a map, amongst other functions. DB2 comes with literally hundreds of built-in functions. In addition to this, DB2 servers allow programmers to build their own functions -- called User Defined Functions (UDFs) -- to augment what is offered in the product. Since a DB2 Connect server is really built on the DB2 UDB database server infrastructure, it too allows customers to create their own functions. However, because DB2 Connect keeps no data itself, these functions typically rely on accessing data from other data sources and make this data available as part of the federated database. To simplify the creation of such functions, DB2 Connect provides built-in accelerators for four types of such functions:
- Functions for connecting to data sources using the OLE DB API
- Functions for connecting to XML stored in files (outside of DB2)
- Functions for connecting to industry standard Web Services
- Functions for getting data using MQSeries message queues
All of these functions have one thing in common: When called, each one returns a result set that looks like a DB2 table (which is why we call them table functions). Just like any other built-in function, table functions can be used in a SQL SELECT statement. Another popular use of these functions is to actually use them in the definition of a view and have applications use the view instead of using the function directly.
OLE DB is a Microsoft technology that is embedded into the Windows operating system. The goal of OLE DB (as was the goal of ODBC before it) is to provide transparent access to a variety of relational and non-relational data sources. In the context of DB2 Connect, a custom written UDF can access any data source using the OLE DB interface provided there is an OLE DB driver (provider is the proper name rather than driver -- but many people use the term driver, so we included both here) is available. Fortunately, there are OLE DB providers available for most data sources (relational and non-relational). For example, Microsoft Data Access Components (MDAC) provides OLE DB drivers for popular data sources such as Microsoft Access, SQL Server, text files, Excel spreadsheets, and more. As previously mentioned, there are many other sources that have OLE DB drivers with most vendors providing OLE DB drivers for their respective data sources. DB2 Connect provides OLE DB drivers for DB2 -- but using the built in federated support provides a better alternative to access DB2 data sources versus an OLE DB UDF. An example of using an OLE DB UDF to access an OLE DB data source is shown in Figure 6 below:
Figure 6. Using an OLE DB function in DB2 Connect to access data.
In fact, to create the UDF in the previous figure (it exposes data stored in a Microsoft Access database as a DB2 view) requires no coding whatsoever! It's so straightforward and simple, that when coupled with the DB2 Development Center, you just click buttons. Detailed instructions are outside of the scope of this article -- but you can check out Developer Works for more information.
OLE DB access is great (and needed), but most of today's data doesn't actually exist in nicely formatted rows and columns (which sometimes gets lost when we talk about relational databases so much). While the scope of non-relational data could be endless, it's important to at least talk briefly about DB2 Connect and two of the most popular data medians: XML and messaging queues.
As mentioned above, DB2 Connect includes built in XML support and can access XML stored in both DB2 databases and file systems. To access XML data stored outside of DB2, you can create a UDF in a DB2 Connect federated database. This function would read the XML documents from a file and return the contents as structured data output.
The XML functions on the DB2 Connect server can assist in developing such UDFs. For example, db2xml.XMLVarcharFromFile, db2xml.XMLCLOBFromFile, and db2xml.Content are all functions that can all be used to read XML data from files into memory in a UDF. As with data returned by the OLE DB UDFs, data returned by the UDFs that read XML from files can be accessed by calling the UDFs directly or defining views that use these functions to materialize the data -- and yes, there is a wizard to help build thse functions too!
Data access to XML stored in DB2 for z/OS can be accomplished using either the XML Extender for DB2 for z/OS or the more ubiquitous extensions to the SQL API: SQL/XML (SQLx). Using either of these methods, DB2 Connect can shred and compose XML, validate it against a DTD or XML Schema document, transform it using an XSLT engine, etc. Figure 7 shows an example of using the SQLx functions built into DB2 Connect to access data from a DB2 UDB for z/OS database and place the output of the statement into an XML file.
Figure 7. Using the XML catapabilities in DB2 Connect to work with XML data - there are multiple ways to do this.
When you want to integrate data or business process flows with other systems, message queuing is another powerful tool that can be used with DB2 Connect. WebSphere MQSeries is by far the market share leader in this space and DB2 Connect can be used to peak at these queues, write to them (either table data or XML data), or peek and destroy them. This function is separate and distinct from reading and writing queues from stored procedures on the mainframe as described in the stored procedures section.
The support for WebSphere MQSeries is handled through UDFs that are called from SQL. So DB2 Connect gives developers who don't know how to write MQSeries-based applications the ability to work with data that is available in these queues as simple relational tables.
For example, to perform a simple select from a table and publish the contents of that table to a WebSphere MQSeries message queue, you could use the following statement:
SELECT MQSWND(LASTNAME) FROM EMPLOYEE
To insert the contents of a queue into a table, you could enter the following command:
INSERT INTO T VALUES(SELECT * FROM TABLE(MQRECEIVALL()) T)
These kinds of operations are shown in Figure 8 below:
Figure 8. Using the WebSphere MQSeries catapabilities in DB2 Connect to work with data.
Our discussion of UDFs and the DB2 Connect technology, and the examples we've provided, all dealt with reading the data. We showed how you can read data from XML files, message queues, other data sources using an OLE DB interface, and so on. However, it is important to point out that while reading seems like a more common usage, SQL functions can be used to write data out as well, though using stored procedures for updates is a more natural programming paradigm.
In this part of our series on DB2 Connect we hope that you have seen just how powerful of a data access platform DB2 Connect server really is. There are endless amounts of data sources out there, and with DB2 Connect and its associated products, you can leverage your investments and efficiently access this data without retraining, reinstalling, re-architecting, and so on.
So far we've seen how DB2 Connect is a highly optimized and integrated communication subsystem for application connectivity to mainframe databases. In addition to this, it's also a data access platform beyond DB2 for z/OS--your data just isn't in a relational DB2 database!
- The DB2 Connect product page links you to announcements and other resources for DB2 Connect.