Database workloads and DB2 10.1
There are two main types of database application workloads: online transactional processing (OLTP), and data warehousing,
which includes reporting, online analytical processing (OLAP), and data mining applications.
DB2 10.1 excels at both.
What differentiates an OLTP system from a business intelligence (BI) data warehousing system? The queries that are typically used to access the data. An OLTP system is typical of a web order system, where you perform transactions over the web (such as ordering a product). These applications are characterized by granular, single-row lookups with logic that likely updates a small number of records. In contrast, BI-type queries perform large table scans as they try to find data patterns in vast amounts of data. If you've ever been asked to summarize all of the sales for a particular region, that's an example of a warehousing query.
Quite simply, when you think of OLTP, think short and sweet. On the other hand, with BI, think of looking for needles in a haystack or aggregating a lot of data for reporting. Of course there's more to it than that, but you get the point.
Systems that contain operational data — data that runs the daily transactions of a business — are OLTP systems. However, these systems contain information that business analysts can use to better understand how a business is operating. For example, they can see what products were sold in which regions at which time of year. This helps identify anomalies or can be used to project future sales. However, several problems can present themselves if analysts access operational (OLTP) data directly for reporting and other BI activities:
- They might not have the expertise to query the operational database. In general, the programmers who have the expertise to query an operational database have a full-time job maintaining the database and its applications.
- Performance is critical for many operational databases (for example, a database used to process banking transactions). These systems can't handle users making ad-hoc queries on operational data stores. Consider, for example, the time you take to pay bills online. When you select OK, it usually takes only a few seconds to process a payment. Now, consider a bank analyst trying to figure out how to make more money from an existing customer base. The analyst runs a query that is so complex that your banking transaction now takes about 30 seconds to complete! Obviously that performance time is not acceptable (and neither are the new charges the analyst is dreaming up). For this reason, operational data stores and reporting data stores (including OLAP databases) are usually separated. Over the last few years, though, reporting data stores have tended to become pseudo-operational and current. Such stores are called operation data stores (ODSs) or even active data warehouses. Consider the telecommunications industry, for example. ODSs are popular with these companies as they try to identify fraudulent charges as early as possible. DB2 is one of the few databases that is well-suited for both operational and reporting workloads.
- Operational data is not generally in the best format for use by business analysts. Sales data that is summarized by product, region, and season is much more useful to analysts than raw transaction data.
Data warehousing solves these problems. In data warehousing, you create stores of informational data — data that is extracted from operational data and then transformed and cleansed for end-user decision making. For example, a data warehousing tool might copy all the sales data from an operational database, perform calculations to summarize the data, and write the summarized data to a database that is separate from the operational data. End users can then query the separate database (the warehouse) without affecting the OLTP database.
Now that you know the difference between OLTP and data warehousing, let’s look at the solutions IBM has for both types of workloads.
As already stated, DB2 delivers exceptional results when used to process both types of workloads — but as the volumes grow, you may have to expand your environment (one server might have to become a cluster of servers, more memory may be needed, more powerful processors may be required, and so on). DB2 can be expanded in same manner, through InfoSphere Warehouse (for data warehousing workloads) and DB2 pureScale (for OLTP workloads) — as you can see on Figure 4.
Figure 4. Methods for growing and scaling a DB2 environment.
IBM Infosphere Warehouse is a complete warehousing/OLAP/analytics solution that has DB2 Enterprise Edition at its core. This product also includes the Database Partitioning Feature (DPF); DPF is used to partition data among different databases, servers and storages, such that all servers process queries by retrieving their own (separate and different) portion of data, thereby achieving high levels of parallelism as more partitions are used. In the past, it was possible to add DPF to DB2 ESE. But, beginning with version 10.1, if you want DPF, you must use InfoSphere Warehouse. That really makes sense, because DPF is suited for warehouse/OLAP workloads, and InfoSphere Warehouse is the IBM product that has been designed specifically for those types of environments.
InfoSphere Warehouse benefits from all the new features present in DB2 10.1, plus it offers several other functionalities to optimize your data warehouse needs. More information on InfoSphere Warehouse and its features and functionalities can be found on the InfoSphere Warehouse web site.
To leverage the use of DB2 in critical environments that demand exceptional performance (usually made available through expensive and over-sized servers) where scaling processing power is often necessary, DB2 for Linux, UNIX, and Windows offers a new functionality that is based on the SYSPLEX coupling facility that has been in use for quite some time on DB2 for z/OS: pureScale. DB2 pureScale is an add-on feature that enables DB2 to better support transactional workload demands. Its use enables applications with big (or huge) online transaction processing (OLTP) volumes to obtain a high level of parallelism through a set of servers, working as a cluster, that access shared storage. Applications can connect to any member of a pureScale cluster, and each member processes transactions independently, delivering the desired performance, and enabling environment growth whenever necessary. For buffer coherency and global locking, a pureScale cluster relies on a component known as the Cluster Caching Facility (CF). Figure 5 illustrates a simple DB2 pureScale environment.
Figure 5. A typical DB2 pureScale environment
Scaling a pureScale cluster is easy: you simply add a new “member" (as each server within a pureScale server is called), without any application outages. Removing member(s) also works in a similar fashion. Having such scalability available is really a differential, but that’s not the only thing that makes DB2 pureScale unique. By enabling use of servers working in a cluster environment, DB2 pureScale increases database availability—whenever a cluster member fails, its requests will be routed to another member in the cluster automatically, and this re-routing is transparent to applications that are accessing the database. The same behavior will be applied to the CF in case of failure (when multiple CF servers are in place).
As stated earlier, DB2 pureScale is a paid-separately, add-on product that can be used with DB2 WSE (free-of-charge), DB2 ESE and DB2 AESE. Its use is restricted to a specific hardware and can only be run on some versions of IBM AIX (AIX 6.1 and 7.1) and Linux (SUSE and Red Hat Enterprise Linux - RHEL). For more information, refer to the links provided at the end of this tutorial. Every member of a pureScale cluster requires a DB2 license, as well as a pureScale license. However, no additional license is needed for the CF server(s) used.
Before DB2 10.1, if you wanted to use DB2 pureScale, a special release of DB2 was necessary: DB2 9.8. (This version was the first implementation of DB2 pureScale.) Now, with DB2 10.1, pureScale has been integrated to DB2’s core. It’s important to note that no application changes are necessary when migrating from traditional DB2 to DB2 pureScale. In fact, to applications, it appears as if traditional DB2 is being used, but that performance has improved significantly.