When getting my taxes ready every year, I review the previous year’s activity. Reviewing 2008 showed that the majority of my consulting was spent fixing and tuning DB2 Java based systems. This is not a big surprise since the trend and majority of my clients since 2005 have had DB2 Java performance tuning opportunities. Even in 2011, the trend of Java applications being built and implemented with DB2 z/OS and LUW continues at an ever-growing pace.
In most of these client situations, the majority of the problem has not been the database or the DB2 LUW or z/OS system; it has been related to the DB2 Java application processing. Java is the new workhorse and many systems are being implemented and not performing well. Working with my clients, I’ve discovered a variety of issues with these systems and over the next several weeks I will highlight the most common factors that kill performance for these new DB2 Java application systems.
Many object framework, architecture and programming pattern options are implemented with these languages. Given that there is no one object framework, architecture or application programming pattern that is right for every situation, there is neither a single right nor wrong way for performance success with your DB2 Java application.
(To be continued next week.)
In late 2010 the major theme of customer conversations was that every company was analyzing ways to save money and trim their IT budget. Those cost savings efforts slashed budgets by about 8.1% according to a 2010 CIO poll. The main method used was virtualization, combining all those UNIX and Windows environments that have sprouted up over the years. The new DB2 release comes along at just the right time because the DB2 Version 10 performance tuning solutions can potentially make a huge difference for many of the common designs, practices and DB2 performance tuning problems found today.
DB2 10 Performance Solutions for Too Many DB2 Indexes
Many DB2 index designs have far too many separate indexes defined on high performance tables. The DB2 10 Index INCLUDE COLUMNS feature provides the ability to consolidate some of those many indexes for improved DB2 performance. This will provide a CPU reduction for any data modification process and save storage with fewer indexes to manipulate and store.
Even if your design cannot handle removing all the indexes, DB2 10 also improves INSERT performance by using more parallelism. When INSERT SQL modifies a table with multiple indexes, DB2 10 does the pre-fetch of multiple DB2 indexes in parallel. By initiating parallel I/Os for the multiple indexes, the process is not waiting for synchronous I/Os, reducing the overall INSERT processing time. This cuts down the timeframe of possible contention within your system and improves DB2 performance tuning opportunities for all your applications.
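As a rough illustration of the index consolidation described above, the sketch below builds the DDL for a unique index that carries extra non-key columns through INCLUDE. The table, index and column names are hypothetical, and the helper class is only an assumption about how an application or DBA tool might generate such a statement. INCLUDE columns apply to unique indexes, so check the CREATE INDEX syntax for your DB2 release before relying on this.

```java
import java.util.List;

/** Sketch only: builds CREATE UNIQUE INDEX ... INCLUDE DDL for hypothetical names. */
final class IncludeIndexDdl {

    /**
     * Returns DDL for a unique index on keyColumns that also carries
     * includeColumns as non-key columns. Identifiers are concatenated as-is,
     * so they must come from trusted configuration, never from user input.
     */
    static String createUniqueIndexWithInclude(String indexName,
                                               String tableName,
                                               List<String> keyColumns,
                                               List<String> includeColumns) {
        return "CREATE UNIQUE INDEX " + indexName
                + " ON " + tableName
                + " (" + String.join(", ", keyColumns) + ")"
                + " INCLUDE (" + String.join(", ", includeColumns) + ")";
    }
}
```

For a hypothetical ORDERS table, asking for index ORDER_IX1 with key column ORDER_ID and include columns CUSTOMER_ID and STATUS produces CREATE UNIQUE INDEX ORDER_IX1 ON ORDERS (ORDER_ID) INCLUDE (CUSTOMER_ID, STATUS). A second index that existed only to cover those extra columns then becomes a candidate for removal.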
DB2 10 Performance Solutions for Access Paths
DB2 10 Optimizer SQL improvements are going to help everyone with their DB2 performance tuning efforts, especially applications and DB2 systems that have a large number of list pre-fetch access paths. These access paths are improved because the SQL Optimizer does more processing at the beginning of the data retrieval process, during Stage 1 SQL evaluation.
The DB2 10 SQL optimizer now evaluates scalar functions and non-matching index predicates during the Stage 1 evaluation of the SQL access path. Evaluation that previously waited until the Stage 2 process is now done early, which dramatically limits the number of qualifying data pages and rows that have to be evaluated. Because these rows are eliminated early in Stage 1, the I/O and CPU for these additional data entries are not passed into Stage 2, significantly reducing query elapsed time and improving overall query performance. This DB2 10 Optimizer SQL performance tuning solution alone can make a huge difference in your applications’ performance.
DB2 10 Performance Solutions for All Applications
These and other features in DB2 10 provide immediate CPU and cost savings for any company seeking to improve DB2 performance tuning. With additional new SQL, XML and data warehousing features, DB2 10 provides more availability, design options and DB2 performance tuning opportunities for new or existing applications. DB2 10 provides performance solutions and a way for your business to reduce costs and improve performance exactly when your CIO and company need it the most.
To figure out the best temporal table design aspects you need to think of the various options and considerations that will affect its performance. The most important aspect for your temporal table is the answers that your applications or users are expecting from it. The best way is to figure out the time aspect that the application is trying to capture. Are your applications looking for the financial value, the insurance coverage level, enrollment status, customer value or something else?
The temporal table status can be contingent on two types of settings: business time or the system processing time. If the processing is delayed and the system time is later than expected, does that affect your temporal table status? Or are you using the temporal table in a real time scenario where either the business or system time will affect the meaning of the data? There are many ways to respond to the situations and questions, but the design decision should be based on the application and user questions that need to be answered. So it is best to test both SYSTEM_TIME and BUSINESS_TIME scenarios out and see which design provides the best answers with the best performance.
The next design point is to figure out your timestamp type. Do your temporal table application answers require distinct timestamps throughout the system? DB2 10 now has new capabilities to keep a business time period unique within the table, system wide. The DB2 syntax for this is WITHOUT OVERLAPS, and it can be used for your temporal table only with your BUSINESS_TIME values. After the temporal table is created, an index is defined for it using your unique columns and the BUSINESS_TIME WITHOUT OVERLAPS keyword. BUSINESS_TIME is the only option the WITHOUT OVERLAPS keyword works with.
When BUSINESS_TIME WITHOUT OVERLAPS is specified, the columns of the BUSINESS_TIME period must not be specified as part of the constraint. The specification of BUSINESS_TIME WITHOUT OVERLAPS adds the following to the constraints:
• The end column of the BUSINESS_TIME period in ascending order
• The start column of the BUSINESS_TIME period in ascending order
• The minimum value of a TIMESTAMP(12), the value is 0001-01-01-00:00:00.000000000000
• The maximum value of a TIMESTAMP(12), the value is 9999-12-31-24:00:00.000000000000
For DATE the minimum is 0001-01-01 and the maximum value is 9999-12-31.
A system generated check constraint named DB2_GENERATED_CHECK_CONSTRAINT_FOR_BUSINESS_TIME is also generated during this definition process to ensure that the value for end-column-name is greater than the value for start-column-name. BUSINESS_TIME WITHOUT OVERLAPS must not be specified for a PARTITIONED index.
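To make the two rules above concrete, here is a minimal in-memory sketch of what they amount to: the end of a business period must be greater than its start, and two rows with the same key must not have overlapping periods. This is not DB2 code. The class and method names are hypothetical, and the sketch assumes the period is inclusive at the start and exclusive at the end.

```java
import java.time.LocalDate;
import java.util.List;

/** Sketch only: an in-memory model of a business period, not DB2 itself. */
final class BusinessPeriod {
    final LocalDate start; // assumed inclusive
    final LocalDate end;   // assumed exclusive

    BusinessPeriod(LocalDate start, LocalDate end) {
        // Mirrors the generated check constraint: end must be greater than start.
        if (!end.isAfter(start)) {
            throw new IllegalArgumentException("end must be after start");
        }
        this.start = start;
        this.end = end;
    }

    /** True when the two periods share at least one day. */
    boolean overlaps(BusinessPeriod other) {
        return start.isBefore(other.end) && other.start.isBefore(end);
    }

    /** True when candidate overlaps any existing period recorded for the same key. */
    static boolean violatesWithoutOverlaps(List<BusinessPeriod> rowsForSameKey,
                                           BusinessPeriod candidate) {
        for (BusinessPeriod existing : rowsForSameKey) {
            if (existing.overlaps(candidate)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, a policy period ending on 2011-07-01 and a following period starting on 2011-07-01 do not overlap, while a period running from 2011-06-01 to 2011-08-01 would be rejected against the first one.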
There are a number of considerations when creating your DB2 10 temporal table. When your application needs the business time to be unique system wide, the BUSINESS_TIME WITHOUT OVERLAPS option provides the capability, with some cautions. Check out other posts on temporal tables via Developer Works or at my site, www.davebeulke.com.
DB2 offers application designers new functionality for their data warehousing requirements. The new DB2 10 Temporal Tables provide a way to have a snapshot in time of the status of customers, orders or any other type of business situation.
DB2 Temporal Tables, with their built in functionality, automatically understand the business time or system time of the data entered into the system. This functionality is ideal for handling and documenting the condition of any business aspect at a certain time. This functionality is driven from two new column definitions, BUSINESS_TIME and SYSTEM_TIME, defined within a table definition. Using these new time period columns within a DB2 Temporal Table definition provides a system-maintained, a period-maintained or bi-temporal time period for your data.
Many systems today have manual processes or utilities that manage or migrate their real time data to history tables. The new DB2 Temporal Tables with their new system time and business time columns can be used in conjunction with a user-defined trigger to automatically migrate transactional temporal table data to another user defined HISTORY table. Having these facilities built into the database greatly improves regulatory compliance, operations and overall DB2 performance tuning.
Separating out the real time transaction data versus the old data within your database using the HISTORY table requires planning and design steps. The separation of the old data from new data guarantees application and SQL performance does not suffer when your database is fully populated. Separation of the old and new data also helps DB2 performance tuning management so more resources can be delegated to maintaining base new transaction data where DB2 performance tuning matters for business operational success.
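The idea of keeping current rows apart from superseded ones can be pictured with a small in-memory sketch. It is not DB2 and it does not show DB2 syntax. The class, field and method names are hypothetical. Each update moves the old version into a history list with its system period closed, and an "as of" lookup checks the current row first and the history second. The sketch assumes timestamps passed to put only move forward.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: an in-memory picture of system-time versioning with a history store. */
final class SystemTimeSketch {

    /** One version of a row with its system period [sysStart, sysEnd). */
    static final class Version {
        final String key;
        final String value;
        final Instant sysStart;
        final Instant sysEnd; // null while the version is still current

        Version(String key, String value, Instant sysStart, Instant sysEnd) {
            this.key = key;
            this.value = value;
            this.sysStart = sysStart;
            this.sysEnd = sysEnd;
        }
    }

    private final Map<String, Version> current = new HashMap<>();
    private final List<Version> history = new ArrayList<>();

    /** Inserts or updates a row; a replaced version is moved to history. */
    void put(String key, String value, Instant now) {
        Version old = current.get(key);
        if (old != null) {
            history.add(new Version(old.key, old.value, old.sysStart, now));
        }
        current.put(key, new Version(key, value, now, null));
    }

    /** Returns the value that was in effect at the given time, or null if none. */
    String asOf(String key, Instant when) {
        Version cur = current.get(key);
        if (cur != null && !when.isBefore(cur.sysStart)) {
            return cur.value;
        }
        for (Version v : history) {
            if (v.key.equals(key) && !when.isBefore(v.sysStart) && when.isBefore(v.sysEnd)) {
                return v.value;
            }
        }
        return null;
    }

    /** Number of superseded versions kept apart from the current rows. */
    int historySize() {
        return history.size();
    }
}
```

The design point this mirrors is that day-to-day transactions only touch the small current structure, while questions about the past pay the cost of searching history.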
Over the coming weeks I will go through the steps and design decisions required to set up a Temporal Table. We will go through the SYSTEM_TIME, BUSINESS_TIME and a bi-temporal table design.
Well, the votes are in: 22 billion rows is big enough data. It’s not the billions of web log rows of a Google or Facebook, but it’s big enough for everyone. One of the comments that struck me was that one in a million happens 22,000 times. So whatever your criterion is for big data, it is more a state of mind about the amount of data as opposed to the actual terabyte amounts or the number of rows. Regardless of what database systems you work with, big is a relative term. Just ask your SQL Server, Oracle and Sybase DBA friends what they consider a big system. Usually the answer is nowhere near what you get for DB2 z/OS or even DB2 LUW systems. I talked about this last year in my blog ("Performance is Relative").
Other comments and questions received about last week’s blog asked for more clarification on the idea of keeping a database design simple. So below are three different ways to keep your big data data warehouse design simple.
First: There are reasons Bill Inmon’s centralized and Ralph Kimball’s decentralized data warehouse ideas are so popular: those design patterns work. Design patterns for all types of IT applications, such as Java/.NET MVC (model view controller), various business models and standard processes, have been extensively analyzed and endorsed over many years through design pattern books, conferences and government studies. The decentralized and centralized data warehouse design patterns work, and your design should use them for your data warehouse performance. Big data or not, there is no reason to do something more complex. Starting with these types of design patterns, using and optimizing a simple design of Fact table(s) surrounded by Dimension table(s) will provide you data warehouse performance. Decentralizing or extending these design patterns with as many Fact tables and slow moving Dimension tables as needed will optimize and minimize the amount of big data referenced in a typical transaction, and your data warehouse performance won’t be an issue.
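For readers who think in Java, the Fact table surrounded by Dimension tables pattern can be sketched in memory as follows. All names are hypothetical and the data is made up for illustration. The narrow fact rows carry only a key and a measure, and the descriptive attribute lives in the dimension, so a typical query touches little data per fact row.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: a tiny in-memory star schema with one fact and one dimension. */
final class StarSchemaSketch {

    /** Dimension row: descriptive attributes keyed by a surrogate key. */
    static final class ProductDim {
        final int productKey;
        final String category;

        ProductDim(int productKey, String category) {
            this.productKey = productKey;
            this.category = category;
        }
    }

    /** Fact row: a foreign key to the dimension plus a numeric measure. */
    static final class SalesFact {
        final int productKey;
        final long amountCents;

        SalesFact(int productKey, long amountCents) {
            this.productKey = productKey;
            this.amountCents = amountCents;
        }
    }

    /** Joins facts to the dimension and totals the measure per category. */
    static Map<String, Long> totalByCategory(List<ProductDim> products, List<SalesFact> sales) {
        Map<Integer, String> categoryByKey = new HashMap<>();
        for (ProductDim p : products) {
            categoryByKey.put(p.productKey, p.category);
        }
        Map<String, Long> totals = new TreeMap<>();
        for (SalesFact f : sales) {
            String category = categoryByKey.get(f.productKey);
            if (category == null) {
                continue; // no matching dimension row: skipped, as in an inner join
            }
            totals.merge(category, f.amountCents, Long::sum);
        }
        return totals;
    }
}
```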
Second: Make sure to normalize your big data design. It’s typical to try to consolidate everything within a data warehouse performance design. Unfortunately having too many elements in a table forces too much data into an application transaction and data warehouse performance can suffer. Just as decentralized and centralized data warehouse performance design patterns have been used for years, database table normalization has been around for even longer because it logically optimizes your design. The database design normalization process has been documented everywhere over the years and it is effective for making sure the table elements relate to the whole database table key(s). Combining table keys or designs causes excessive repeating data or data groups and over-normalization leads to excessive application joins. Normalization is striking a balance and no one does it perfectly the first time. Normalize your data warehouse performance design several times and your transaction performance can strike a balanced performance for all critical applications.
Third: Design, test, redesign, test and repeat. Schedule enough database and application meetings and testing time to understand all the application transactions and reports. Data warehouse performance and modeling big data can get unwieldy, so testing your design early is vital. Sometimes big data table population numbers cause tools to abort. Cut the number of zeros down and model all your application transactions against your database design, build it and run applications against it. Data warehouse performance requires real life testing and actual running of the code or SQL that interfaces with the design. No one has any time to do it perfectly but everyone will be mad if you have to redesign and do it all over right before implementation. Know your performance before production through thorough testing. Big data and data warehouse performance require design and testing. Make sure to do these several times during your development with as much big data as possible.
These are only some of the simple truths that ensure your data warehouse performance for your big data system is a success.
Within the zEnterprise announcement in July 2010 there were several exciting
comments about the new query performance within the IBM Smart Analytics
Optimizer. This smart system leverages the integrated platform, providing
industry leading scalability for data warehouse and business intelligence workloads.
Queries that scanned entire terabytes of data that were previously avoided
can now be executed in seconds through the IBM Smart Analytics Optimizer
processing. This capability provides data mining features for quickly getting
answers to the clustering, associations, classification and predictions within
your data warehousing environment.
Within some of the documentation, IBM testing of the Smart Analytics
Optimizer shows huge performance boosts. Queries that were scanned and then
subsequently optimized sometimes showed an improvement of 54 times, with the
cost of the query improving by 711 times. This type of improvement is very exciting and especially
what new data warehousing and business intelligence workloads need on the
Check out all the
zEnterprise documentation—especially the information on the IBM Smart Analytics
Optimizer. This was just the beginning of a resurgence of the new mainframe
that supports any mainframe, UNIX or Windows workload. Known as a company’s
“private cloud,” it allows IT departments to leverage tremendous performance
improvements while reducing query cost substantially.
The DB2 10.5 “Cancun Release” was previously available as DB2 LUW Version 10.5 Fix Pak 4.
The short list of prerequisites and the summary list of DB2 10.5 “Cancun Release” features are available at the IBM DB2 10.5 “Cancun Release” Fix Pak Knowledge Center website.
Within the release there are many important features, and some of the most interesting ones are related to the new DB2 Explain information. New Explain columns provide more information on the DB2 SQL access path attributes used within our applications.
There are 14 new columns which have many values to support a variety of DB2 optimizer access path information. These are highlighted below, along with a description of what the new Explain provides for our debugging and performance research.
Provides an indicator related to whether the SQL Explain output was built from SQL that was reused. This can be important to identify SQL that could be reused from your system statement cache, be resident for a long time, and not the new refreshed version that was just pushed into the test environment.
Indicates whether the sort process is used to buffer the result set information as it is retrieved from the database.
• BYDPART and RANDACC
Help identify and describe the Zig Zag Join access path. These columns indicate if the DB2 access path is partition dependent, which is especially important for pureScale systems. The RANDACC Explain parameter highlights whether the Zig Zag Join uses a TEMP table to facilitate and improve random access.
Indicates that XML data may be moved between DPF partitions during the execution of the SQL.
Indicates that the access path references the index block entries within a Multi-Dimensional Clustered (MDC) index to FETCH the result set data. This is similar to index-only access on a standard table.
Indicates DB2 SQL optimization error(s) occurred during the parsing or applying of SQL Profiles, a new feature. By defining specialized SQL optimization profiles, access paths can be improved or standardized for access-path-challenged tables. Sometimes by applying this profile information, the parsing or optimization can encounter an error. This column, along with the diagnostic messages, can help debug the error situation.
Tells whether the bind option for skipping locked data during the SQL execution is enabled. Skipping locked data can be troublesome especially in robust systems with detailed result set requirements. Be careful and monitor this for accounting and other critical result set data applications.
• PLANID, STMTID and EXECUTID
Help uniquely identify the SQL statements for the DB2 plan, statement, and the execution id used for the Explain. These unique identifiers are very important due to the new ability to set up execution profiles which the optimizer can use to favor critical DB2 access paths for better performance.
• BUSTSENS and SYSTSENS
Indicate whether the SQL could be impacted by the value of the CURRENT TEMPORAL BUSINESS_TIME or CURRENT TEMPORAL SYSTEM_TIME special register. This is very important since SQL can have specific DB2 temporal table business time or system time requirements. The temporal special register and the process’s SQL temporal requirements now provide more information about the flexibility for using the business or system temporal time.
Indicates the DB2 access path isolation level used for the SQL against a specific table. Since the isolation level can be adjusted through a number of mechanisms, this Explain column provides the information indicating whether it is Uncommitted Read, Read Stability, Cursor Stability, or Repeatable Read.
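For DB2 Java applications, one of the mechanisms that adjusts the isolation level is the JDBC connection setting, so it helps to know how the JDBC constants line up with the DB2 names shown in Explain. The enum below is a small sketch of that mapping as I understand it from the IBM JDBC driver documentation. The enum and method names are hypothetical, and the mapping should be verified against your driver level.

```java
import java.sql.Connection;

/** Sketch only: assumed mapping between DB2 isolation names and JDBC constants. */
enum Db2Isolation {
    UR("Uncommitted Read", Connection.TRANSACTION_READ_UNCOMMITTED),
    CS("Cursor Stability", Connection.TRANSACTION_READ_COMMITTED),
    RS("Read Stability", Connection.TRANSACTION_REPEATABLE_READ),
    RR("Repeatable Read", Connection.TRANSACTION_SERIALIZABLE);

    final String description;
    final int jdbcLevel;

    Db2Isolation(String description, int jdbcLevel) {
        this.description = description;
        this.jdbcLevel = jdbcLevel;
    }

    /** Translates a java.sql.Connection isolation constant into the DB2 name. */
    static Db2Isolation fromJdbcLevel(int jdbcLevel) {
        for (Db2Isolation iso : values()) {
            if (iso.jdbcLevel == jdbcLevel) {
                return iso;
            }
        }
        throw new IllegalArgumentException("Unsupported JDBC isolation level: " + jdbcLevel);
    }
}
```

Note the naming trap: under this mapping the JDBC level called TRANSACTION_REPEATABLE_READ corresponds to DB2 Read Stability, not DB2 Repeatable Read, which is one reason the Explain column is worth checking.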
All of these new Explain columns in the new DB2 10.5 “Cancun Release” provide more critical information about the development of DB2 SQL access paths by the DB2 optimizer. Knowing how the DB2 optimizer is treating your SQL is always better for getting the best DB2 SQL performance possible.
Dave Beulke is a system strategist, application architect, and performance expert specializing in Big Data, data warehouses, and high performance internet business solutions. He is an IBM Gold Consultant, Information Champion, and President of DAMA-NCR, former President of International DB2 User Group, and frequent speaker at national and international conferences. His architectures, designs, and performance tuning techniques help organizations better leverage their information assets, saving millions in processing costs.
During the 2010 IBM Z Summit road show, there were several presentations detailing the mainframe platform advantages over UNIX and Windows platforms such as the lowest total cost of ownership, the best availability and unparalleled scalability. These presentations cut through the rumors with detailed facts and figures of different platform configurations. Download these presentations and distribute them to your management for a little reminder why the mainframe continues to be the best platform for your enterprise applications.
The Windows and UNIX platforms proponents always discount and minimize the total cost of ownership, availability and scalability topics. It is our duty to periodically remind management of the extra costs of these UNIX and Windows systems with their huge power consumption costs, software license fees, and software maintenance costs of working with several hundred or thousands of disparate systems. The mainframe quietly continues to process the majority of the transactions at the Fortune 500 companies and everyone, especially younger management types who think the world can run on an iPhone, needs to understand that the System Z infrastructure is the best backbone for any company.
The System Z mainframe is also evolving since it now has specialized processors such as the IFL, zIIP and zAAP to reduce overall operational and licensing costs. These specialty processors, along with the smaller configuration of the System Z offer a single small platform that can consolidate any number of UNIX workloads into a single footprint with a smaller greener energy footprint and better licensing configuration.
The presentations detail benchmarks, licensing fees and labor costs of various mainframe versus UNIX platform configurations. The figures show it sometimes takes double the number of processor cores on a UNIX configuration to start to scale out a configuration. Even more UNIX processors are required to achieve transaction rates that are still only performing one-fourth of what the mainframe System Z executes. These UNIX systems are also dedicated to the production transaction environment with no thought of supporting testing, QA or failover facilities that have yet to be priced or considered, features that come standard within the System Z environment.
System Z also continues to grow because of its faster chips. Ask any PC or UNIX platform personnel “what platform has the fastest clock speed processors” and you will quickly find out who keeps up with the industry information. The chip clock speeds of the System Z and other IBM platforms have improved like the rest of the PC industry. In fact, the System Z z10 chip operates at 4.4 GHz and comes in a 64-way quad core configuration that can speed up any application performance problem. This is almost twice as fast as the HP Superdome processors and a third faster than the Intel Nehalem chips.
So the mainframe continues to lead the industry. Does your management know the cost savings and performance figures of System Z? Tell them and show them the presentations before someone tries to “replace the mainframe” again with a more troublesome, power-hungry, poorly performing clustered iPhone configuration.
Over the last three years, my clients have shown that multiple framework, architecture and programming patterns are usually implemented within the same project. The problem is that the performance lessons from one application implementation are not fully understood, so the performance problems are carried forward and replicated in the next architecture, framework or pattern iteration, including DB2 Java applications.
Each application is different and each service or process within the application is unique. Step back and be flexible in your design patterns to understand that one or two architectures, frameworks or programming patterns are not correct for every situation. Your design should reflect the application requirements and the correct implementation for achieving the best DB2 Java performance might mean a variety or mix of approaches.
Objects within Java are great for flexibility and reuse. Java services and open source products such as Hibernate, iBatis, Ruby and techniques such as the Java Persistence API (JPA) and Data Access Objects (DAO) are great for accessing the DB2 database. Many of these techniques are common in today’s DB2 Java applications. My clients have experienced problems with these techniques when the application processing does not pay proper attention to transaction integrity or the unit of work. When this happens, the DB2 Java application processes usually have connected to the database multiple times, processed the transaction too many times or not committed or rolled back the transaction properly. These DB2 Java transaction situations usually manifest themselves in JDBC errors or referential integrity issues that developers blame on the database. Unfortunately, it is not the database but the application coding of the services that causes the problems.
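One way to keep the unit of work explicit is to route every transaction through a single helper that uses one connection, commits once on success and rolls back on any failure. The sketch below uses only the standard java.sql.Connection methods. The UnitOfWork and Work names are hypothetical, and it is a minimal illustration under the assumption that the caller owns and closes the connection, not a replacement for the transaction support in your framework.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Sketch only: runs a block of JDBC work as one explicit unit of work. */
final class UnitOfWork {

    /** The statements that must succeed or fail together. */
    interface Work {
        void apply(Connection conn) throws SQLException;
    }

    /**
     * Runs the work on the given connection with auto-commit off, committing
     * once on success and rolling back if the work or the commit fails.
     */
    static void run(Connection conn, Work work) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try {
            work.apply(conn);
            conn.commit();
        } catch (SQLException | RuntimeException e) {
            conn.rollback();
            throw e;
        } finally {
            // Restore the caller's setting; the transaction has already ended.
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

A service method would then pass all of its inserts and updates as one Work instance, so the transaction connects once, runs once and always ends in either a commit or a rollback.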
In the coming weeks I will talk further about DB2 Java applications, their processing and issues that I have experienced with my clients. I know it will help you avoid some of these problems too.
Previously, I talked about the first alphabetic group of DB2 data warehouse DSNZPARMS that can improve your access paths and overall application performance. This week the second set of DSNZPARMS is discussed. Many of the data warehouse DSNZPARMS discussed are somewhat hidden within the regular DSNZPARM install panels. All of these DSNZPARMS discussed are available in DB2 for z/OS Version 9. Some are available in DB2 Version 8.
Caution needs to be taken with all system settings and especially these data warehouse DSNZPARMS. These DSNZPARMS are meant to change access paths and improve them, but each data warehouse design is unique, along with each application access path, so results will vary. If the data warehouse DB2 subsystem is shared with other OLTP or operational applications, I highly recommend fully documenting and setting up a full PLAN STABILITY plan and package management structure for your current access paths before changing any DSNZPARMS. This documentation, along with a good PLAN STABILITY DB2 plan and package management implementation and back out practices, helps you quickly react to your environment and back out any detrimental access paths encountered through an unexpected rebind of any program.
Some of the comments from previous blogs on data warehouse applications highlighted the resurgence of data warehousing on the z/OS platform and why running a data warehouse on z/OS provides many advantages over other platforms. One that was noted by several people is that when your data warehouse runs on z/OS, the huge ETL processes don’t usually have to transmit the data over a network. Even though the network bandwidth is robust, avoiding this extra bottleneck can sometimes save hours of overhead, guaranteeing that your refresh data jobs have enough time every day to provide critical refreshes of your data within your data warehouse. Additionally, most of your source data warehouse data comes from the z/OS operational systems and can quickly be put into operational business intelligence data warehouses. This fresh data increases sales, provides real time inventory or product availability updates and, most importantly, removes latency for all your critical single point master data source of record for the enterprise.
Improve your system and application performance by adjusting these data warehouse DSNZPARMS to improve your access paths and by using the superior DB2 optimizer technology and most efficient performance available.