When I work with millennials these days across all kinds of management and programming duties, the experience is always interesting. Each millennial has a very different point of view on IT and data processing. Even among the Big 4 consulting firms’ new hires, the college experience, IT degrees, consulting company training, and aptitude for IT activities vary widely, which leads to a lot of discussion about the best solution and the best procedures to use to get the project done.
Talking to millennials is always fascinating, and understanding their point of view is critical for communicating the structure of the massive systems that run the business environment. Asking millennials the following four questions can give you a much better understanding of why they have difficulties with the management, development, and maintenance of legacy IT systems.
-
What kind of programming did you do in college? This is an interesting question for millennials because some of them have never done any programming. Never. College graduates can now get an IT degree without writing a single program in any language throughout their entire college curriculum. Some have written fewer than 10 programs in their entire four-year college degree program, and the languages they used were Basic, Python, or some other scripting language.
Now I understand why all the companies are hiring people from other countries. Some of the non-Americans who I have talked to have at least a little more experience with Java, C++, COBOL, JSON, or similar languages. Understanding programming, its complexity, and its methodology is critical to writing requirements, specifications and efficient IT solutions. The next time you run into a college-aged millennial, ask them how much programming or what languages their college required for an IT degree. You might be surprised at the answer.
-
What is the best form of IT documentation you have worked with, and why was it so good? This is always an interesting conversation because it explores the millennial’s experience from their studies and brief work history. It also explores their consulting firm training, because they are usually taught a methodology to follow within their consulting firm. Sometimes the answers are really interesting in that the millennials only have Agile project experience. Agile projects can produce good documentation, but they are notorious for not producing any documentation or, at best, useless documentation that has no substance or value.
Educating millennials on the development specifications and documentation requirements for your company or project is vital to reduce development confusion and overall on-going maintenance costs. Making sure everyone understands the difference between good and bad documentation and how good documentation can be done easily, quickly, and succinctly is critical for the long term success of your project.
-
What is the largest number of processes that have run concurrently on systems you’ve analyzed, designed or worked on? Exploring the millennial’s answer to this question is interesting because it gives you an understanding of their working knowledge of analysis, development, and programming of large systems. By the time I got my degree I had written programs in Assembler, PASCAL, COBOL, FORTRAN, RPG, Basic and done scripting in JCL, KSH and BASH. While I didn’t have any business experience when I started in IT many years ago, the many different programs I wrote, the programming languages, the case study analysis of the complex systems gave me the tools to ask some of the right questions.
Millennials who have worked only on smaller Windows and UNIX systems usually have no idea about the requirements for concurrent or constant processing, or about availability requirements. When discussing their answer, talk about the criteria your company uses to prioritize different applications, the framework for those priorities, and the different architectural factors that drive the corporation’s processing. Communicate that all the current processing has to continue, not just the project that is currently under development.
-
Which do you think is more important: process or data? This question isn’t really fair to millennials because the answer in my mind is both. Listening to their answer and their justification gives you some insight into how they may approach the IT environment and development tasks.
If there is one thing I have learned throughout my career about process and data, it is that one isn’t any good without the other. Any architecture, design, or decision made without considering both can be disastrous for database and processing designs. What is good about this question is that it explores the person’s attitude toward how to come up with an IT solution. Processing and data go hand in hand, and good IT solutions are not built without considering the requirements for both.
The IT profession is always changing, and millennials have a tough job learning all the new and old technology. Using the four questions above to explore millennials’ backgrounds and points of view will lead to a greater understanding of the best way to solve complex IT architecture, design, and development issues.
Dave Beulke is a system strategist, application architect, and performance expert specializing in Big Data, data warehouses, and high performance internet business solutions. He is an IBM Gold Consultant, Information Champion, and President of DAMA-NCR, former President of International DB2 User Group, and frequent speaker at national and international conferences. His architectures, designs, and performance tuning techniques help organizations better leverage their information assets, saving millions in processing costs.
|
DB2 10.5 “Cancun Release” was released as DB2 LUW Version 10.5 Fix Pak 4. IBM has put a number of impressive features in this new update. Within DB2 10.5 “Cancun Release” Fix Pak 4 there are several new and interesting monitoring elements that let you dig deeper into your performance tuning efforts and get more information and feedback. These new monitoring elements cover time spent, sort heap usage, and additional information about the SQL within the package cache.
-
Know where you spent your time. Within DB2 10.5 “Cancun Release” there are new monitoring elements that track online backups and index creation processes. The first two time-spent elements, total_backup_time and total_backup_proc_time, capture the total elapsed time and the amount of non-wait processing time of the backups. Also, a total_backups monitor element counts the total number of backups completed. These backup monitoring elements give administrators additional information on their backup processing efficiency and show how much time is necessary for their backup procedures.
In addition to these backup monitoring elements, there are similar monitoring elements that track index creation, re-creation, or rebuild. For the index monitoring there are total_index_build_time and total_index_build_proc_time elements which capture the total elapsed time and the amount of non-wait processing time of the index creation or rebuild.
The monitoring element of total_index_build_time is the parent of element stmt_exec_time. The element total_index_build_proc_time is parent to the total_rqst_time element along with being parent to the stmt_exec_time element. There is also the element total_indexes_built which tracks the number of indexes built. These index monitoring elements provide especially useful information when DB2 automatically rebuilds indexes during an online reorg or when DB2 automatically rebuilds indexes after running out of space or another index error situation.
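If these time-spent elements are surfaced through the MON_GET_DATABASE table function introduced in DB2 10.5 (an assumption worth verifying against your own monitoring interfaces), a quick database-level check might look like the following sketch:
-- Sketch only: the element-to-interface mapping is assumed; verify in your environment
SELECT TOTAL_BACKUPS, TOTAL_BACKUP_TIME, TOTAL_BACKUP_PROC_TIME,
       TOTAL_INDEXES_BUILT, TOTAL_INDEX_BUILD_TIME, TOTAL_INDEX_BUILD_PROC_TIME
FROM TABLE(MON_GET_DATABASE(-2)) AS T
Comparing the total time with the non-wait processing time shows how much of your backup and index build windows is spent waiting.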
-
Understand your sort heap requirements. There are several new sort heap monitoring elements in the DB2 10.5 “Cancun Release.” These elements provide a huge amount of additional information about the sorting processes within your DB2 system. All of these elements help you understand more attributes of the sort consumers, how much of the sort heap is being dominated by a particular process, and how much memory these concurrent sort processes are utilizing. All of these sort heap monitors will also help automatic memory management understand your workload and manage the memory to maximize overall performance better.
With all of these new sort heap monitor elements and automatic memory management, you have a number of elements to tell you what is happening within your system.
active_col_vector_consumers
active_col_vector_consumers_top
active_hash_grpbys_top
active_hash_joins_top
active_olap_funcs_top
active_peas
active_peas_top
active_peds
active_peds_top
active_sort_consumers
active_sort_consumers_top
active_sorts_top
post_threshold_col_vector_consumers
sort_consumer_heap_top
sort_consumer_shrheap_top
total_col_vector_consumers
-
More Cache SQL Information. Next within DB2 10.5 “Cancun Release” there is more information provided by monitoring elements shown through the MON_GET_PKG_CACHE_STMT_DETAILS table function. These new monitoring elements help out in two main ways. The first, and probably the most important is monitoring SQL for suboptimal compilation. Through the prep_warning and prep_warning_reason monitoring elements, the table function can warn about SQL that might adversely impact your performance.
-
Get Cache SQL Object information. Through the more general MON_GET_PKG_CACHE_STMT table function, even more information is available about SQL in the package cache. The wide variety of data available now expands information about the statement id, plan id, and semantic_env_id which indicate whether or not the default schema was being used for the SQL statement.
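As a quick illustration, a sketch like the following pulls the heaviest statements out of the package cache by execution time; the column list is minimal and the 20-row cutoff is arbitrary:
-- Top package cache statements by execution time (all members, -2)
SELECT NUM_EXECUTIONS, STMT_EXEC_TIME,
       SUBSTR(STMT_TEXT, 1, 80) AS STMT_TEXT
FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2)) AS T
ORDER BY STMT_EXEC_TIME DESC
FETCH FIRST 20 ROWS ONLY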
All of this information along with the MON_GET_PKG_CACHE_STMT_DETAILS table function information can help monitor, warn, and help you better understand the attributes of SQL going against your most important performance sensitive objects.
-
Save information for later analysis. DB2 10.5 “Cancun Release” also adds new MON_GET_SECTION and MON_GET_SECTION_OBJECT table functions. These functions let you gather information and save it for later analysis. With the MON_GET_SECTION function you can save SQL information and pass it along to the EXPLAIN_FROM_DATA stored procedure to get more details on the SQL access path.
With MON_GET_SECTION_OBJECT table function DB2 translates the internal identifiers used in the package cache to the object type, object schema, object name, and object module information.
All of these new monitoring elements are only available in DB2 10.5 “Cancun Release” which should give everyone more motivation to adopt the new Fix Pak as soon as possible.
Dave Beulke is a system strategist, application architect, and performance expert specializing in Big Data, data warehouses, and high performance internet business solutions. He is an IBM Gold Consultant, Information Champion, and President of DAMA-NCR, former President of International DB2 User Group, and frequent speaker at national and international conferences. His architectures, designs, and performance tuning techniques help organizations better leverage their information assets, saving millions in processing costs.
|
The new DB2 10.5 “Cancun Release” was recently released as DB2 LUW Version 10.5 Fix Pak 4. IBM has done a lot of work and put a number of impressive features in this DB2 LUW Fix Pak 4. All these new DB2 10.5 “Cancun Release” features remove any doubts or restrictions about going to the most recent DB2 LUW BLU Acceleration which offers the new column data store as a table definition option.
When I first talked about DB2 LUW Version 10.5 BLU Acceleration last year here, it offered state-of-the-art technology and performance processing that no other DBMS offered. That’s still true. Its new column data store, actionable 90-95% compression, Single instruction Multiple Data (SIMD) for interrogating multi-core CPU buffers in parallel, and data skipping technology for avoiding I/O against unneeded data make DB2 LUW 10.5 performance extremely desirable and the best in the industry.
The list of features and improvements of DB2 10.5 “Cancun Release” are long and very detailed, and the best place to get information on the prerequisites and features is here at the IBM DB2 10.5 “Cancun Release” Fix Pak Knowledge Center website.
Some of the most important Fix Pak 4 features that jump to the top of the DB2 LUW list are:
-
Better performance without any effort. DB2 LUW system software leverages the new Power8 chip better than previous releases which provides more threads per CPU, larger memory addressability, and 2.4x more I/O bandwidth. The BLU Acceleration capabilities that I mentioned earlier are uniquely tailored to fully exploit the Power8 chip improvements.
-
Application programmers can now leverage SQL arrays for INSERT, UPDATE and DELETE, grabbing a full web page of DB2 data during one trip to the database. Also, application processes can now leverage the SQL MERGE statement to help eliminate the complex application processing logic for insert or update processes.
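As a simple illustration of that insert-or-update pattern, here is a minimal MERGE sketch; the CUSTOMER table and its columns are hypothetical:
MERGE INTO CUSTOMER AS C
USING (VALUES (1001, '123 MAIN STREET')) AS NEW (CUST_ID, ADDRESS)
ON C.CUST_ID = NEW.CUST_ID
WHEN MATCHED THEN
  UPDATE SET C.ADDRESS = NEW.ADDRESS
WHEN NOT MATCHED THEN
  INSERT (CUST_ID, ADDRESS) VALUES (NEW.CUST_ID, NEW.ADDRESS)
A single statement like this replaces the usual SELECT-then-INSERT-or-UPDATE logic in the application program.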
-
DB2 pureScale removes interconnect hurdles, now leveraging TCP/IP to connect its different DB2 LUW members. By running within virtual machines (VMs), DB2 pureScale can be more easily leveraged, managed, and configured by quickly deploying VMs for additional pureScale members. DB2 pureScale now supports spatial data types, removing previous restrictions. Now all types of applications can have their data leveraged within the high availability pureScale environment.
-
Full HADR support for column-organized tables, new monitoring fields to check on the health of your HADR environment, and the ability to take GPFS snapshot backups remove the column-organized HADR restriction and enhance disaster recovery for all DB2 LUW HADR users.
-
DB2 10.5 “Cancun Release” removes almost all of the restrictions to implementing DB2 LUW BLU Acceleration columnar tables. The ability to get better Explain information, the new SYNOPSIS table support, improved columnar table support of SQL with table expressions, support for ALTER ADD COLUMN, and other features give the developer and DBA more flexibility with DB2 LUW. If you haven’t experienced the performance of column-organized tables, there is no longer any excuse to delay leveraging DB2 LUW column-based table technology.
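If you want to try a column-organized table, a minimal definition sketch looks like the following; the table and column names are purely illustrative, and with DB2_WORKLOAD=ANALYTICS set, column organization becomes the default:
CREATE TABLE SALES_FACT
  (SALE_DATE   DATE          NOT NULL,
   PRODUCT_ID  INTEGER       NOT NULL,
   STORE_ID    INTEGER       NOT NULL,
   AMOUNT      DECIMAL(11,2))
  ORGANIZE BY COLUMN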
-
DB2 10.5 “Cancun Release” adds several security enhancements such as new auditing facilities to make sure the users and applications are well-behaved. The most impressive is the new enhanced encryption that is NIST SP 800-131A compliant. This encryption, which has only a minor performance overhead, does a great job of encrypting data at rest. There are new monitoring elements to let DBAs and application developers verify their process performance. To improve ease of use, the new DB2 LUW install no longer requires root access, making the installation and maintenance of DB2 LUW even easier for operations organizations.
IBM has done a great job with the BLU Acceleration DB2 10.5 “Cancun Release”. These great features, along with others that I didn’t mention, continue to enhance the DB2 LUW column table options along with security. Everyone should attend and look forward to the upcoming IDUG and IBM Insight conferences, where you’ll hear more about these and other great features from the IBM developers who will lead you on a trip to the DB2 10.5 “Cancun Release.”
Dave Beulke is a system strategist, application architect, and performance expert specializing in Big Data, data warehouses, and high performance internet business solutions. He is an IBM Gold Consultant, Information Champion, and President of DAMA-NCR, former President of International DB2 User Group, and frequent speaker at national and international conferences. His architectures, designs, and performance tuning techniques help organizations better leverage their information assets, saving millions in processing costs.
|
In this post I talk about what other attributes and ideas the millennials can get from the mainframe systems, their development, and the people who support the technology.
-
Millennials can learn corporate business rules from the mainframe. Some mainframe systems have been actively working for the business for many years, sometimes even decades. These systems have been modified, enhanced, and leveraged to book profits for the corporation by providing the most inexpensive cost per transaction. Also, these systems have been extended by replicating their data into many other applications and systems like big data.
Since these mainframe systems are running the business today, they are programmed to make all the business processing decisions, and they contain all the data definitions, element encoding, application process criteria, and processing relationships for the business. All of the applications’ code, regardless of language, holds the business rules that make sure the business works. To understand the business, millennials need only look to these legacy systems for a more complete understanding.
-
Learn how the business shares its data and processes. Since many of these mainframe systems have been around forever, millennials need to understand their history, what it took to develop them, and how they are shared across the various business departments. These older systems handle all types of processes for many different business departments. Years ago fewer projects were developed because of the larger development costs and the need for a greater number of people to communicate across the corporation. Since these systems needed to do so much for the various business aspects, they crossed many departments and shared their data, processes and functions throughout departments and divisions. Today’s projects share data and processes, but not to the extent of the collaboration required for some of these mainframe systems, since it is now easier for departments or managers to budget, develop, and deploy their own focused business solutions.
-
Millennials can learn about the forged corporate consensus. These mainframe systems, built to provide service across departments and foster collaboration throughout the entire corporation, had to forge corporate consensus to function. Anyone who has been in development understands how difficult it is to get departments to share their data and agree on the representation of conditions in various codes or shared data elements. Corporate data consensus was built through data dependencies and data relationships between the departments. Those corporate collaborations were forged through these mainframe systems still hard at work today.
Millennials need to get involved with the mainframe systems and the people who have the corporate knowledge as soon as possible. Since all of that gray-haired knowledge is beginning to walk out the door to retirement, management needs to have the new millennials pick up their know-how before it’s lost forever. By gaining knowledge of the battles fought and corporate policies embedded in the mainframe systems, the millennials can avoid rehashing old issues and leverage the corporate wisdom for better systems development in the future.
Dave Beulke is a system strategist, application architect, and performance expert specializing in Big Data, data warehouses, and high performance internet business solutions. He is an IBM Gold Consultant, Information Champion, and President of DAMA-NCR, former President of International DB2 User Group, and frequent speaker at national and international conferences. His architectures, designs, and performance tuning techniques help organizations better leverage their information assets, saving millions in processing costs.
|
When I talk to millennials coming into management and programming it appears that their knowledge of the mainframe is very limited or non-existent. Since many universities have stopped teaching mainframe classes, the recent college graduates don’t even have a basic understanding of the mainframe computing model, much less its advantages and why it endures through all the attempts to replace it. Since there is no need to reinvent the wheel, we all need to help the millennials understand why the mainframe will always be around with the following four important points.
1. Mainframes are the most reliable, stable, and available systems. The reason most of the world works is the mainframe’s reliability, stability, and availability, which support electricity grids, bank ATM networks, credit card transactions, and other services that countries, civilization, and society depend on. Mainframe systems are engineered with the most state-of-the-art hardware and software advances. Since mainframe computers are almost always running at 100% utilization, the platform demands the fastest storage, memory capacities, and CPU chip technology to get processing done efficiently and effectively. Some mainframe computer systems have been operating constantly 24/7/365 for decades.
2. The best system to serve them all. Since hardware and software upgrades are constant, the mainframe processing architecture is built to be redundant so operations can continue while upgrades occur. This provides the ability to support an ever increasing number of operating systems, virtual environments, and user bases with an optimum number of software components and support personnel. Also, since mainframe capacity can be scaled up horizontally almost limitlessly by adding additional machines, the processing power can securely segment and service side by side any and all types of test, QA, production, old, new, batch, and interactive state-of-the-art applications from a centralized platform. By servicing mainframe, UNIX, and other operating systems in virtual environments, the mainframe can serve data and processing power for any requirements.
3. Mainframe has the most advanced technology available. The mainframe has been around for over fifty years and continues to be at the forefront of hardware and software technology advances. Many of the hardware and software technologies that are available in any big or small computing systems were developed first in the mainframe environment.
From virtual environments, memory management, and multi-tier architectures to cloud and time-sharing computing models, most computer technologies were first developed, leveraged, and continue to be used in mainframe facilities around the world. The most powerful CPU chip speeds, the greatest number of parallel CPU chips, and multi-threading architectures are currently available on the mainframe. The different storage types from tape to flash memory can be architected to the extreme within the broad capacity of the mainframe. Additionally, all these hardware and software technology capabilities are backed up through comprehensive administration capabilities. The mainframe provides flexibility to customize the hardware and software environment and performance settings to optimize processing CPU and I/O capacity requirements while balancing and maximizing utilization.
4. Mainframes provide the lowest cost of ownership. First, the efficient mainframe power and cooling requirements are much cheaper than those of equivalent distributed UNIX or Windows platforms. Personnel costs are as much as 60% of the budget for any computing environment, and the number of personnel required to administer, configure, and maintain non-mainframe computing environments is much greater compared to the cost-efficient mainframe environment.
Even with open-source solutions, software license and maintenance costs for the distributed environment are usually far more expensive across the multitude of test, QA, and production environments than in a mainframe environment. Since the mainframe has been around for decades, it has chargeback methodologies in place to track every aspect of usage, a process that has negative connotations but actually provides better cost allocation across the corporate structure. Since distributed systems don’t have chargeback, it is sometimes difficult to add up all the same factors. When all the distributed environment factors are monitored and documented, they are much more expensive, have troubled availability records, and are tremendously underutilized computing environments.
As the millennials join the technology workforce, they may naïvely say that they want to replace the mainframe. It’s our responsibility to educate another generation so they can learn that the reliability, availability, security, and, foremost, the costs of the mainframe continue to make it the best computing platform available. Possible applications on the mainframe are only limited by your imagination, because today it provides all types of analytics, mobile, social, web based, or transactional computing abilities which make the world go around.
|
Previously, I talked about the first alphabetic group of DB2 data warehouse DSNZPARMs that can improve your access paths and overall application performance. This week the second set of DSNZPARMs is discussed. Many of the data warehouse DSNZPARMs discussed are somewhat hidden within the regular DSNZPARM install panels. All of the DSNZPARMs discussed are available in DB2 for z/OS Version 9, and some are available in DB2 Version 8.
Caution needs to be taken with all system settings and especially these data warehouse DSNZPARMs. These DSNZPARMs are meant to change access paths and improve them, but each data warehouse design is unique, along with each application access path, so results will vary. If the data warehouse DB2 subsystem is shared with other OLTP or operational applications, I highly recommend fully documenting and setting up a full PLAN STABILITY plan and package management structure for your current access paths before changing any DSNZPARMs. This documentation, along with a good PLAN STABILITY plan and package management implementation and back-out practices, helps you quickly react to your environment and back out any detrimental access paths encountered through an unexpected rebind of any program.
Some of the comments from previous blogs on data warehouse applications highlighted the resurgence of data warehousing on the z/OS platform and why running a data warehouse on z/OS provides many advantages over other platforms. One advantage noted by several people is that when your data warehouse runs on z/OS, the huge ETL processes usually don’t have to transmit the data over a network. Even though the network bandwidth is robust, avoiding this extra bottleneck can sometimes save hours of overhead, guaranteeing that your data refresh jobs have enough time every day to provide critical refreshes of your data within your data warehouse. Additionally, most of your source data warehouse data comes from the z/OS operational systems and can quickly be put into operational business intelligence data warehouses. This fresh data increases sales, provides real time inventory or product availability updates and, most importantly, removes latency for your critical single-point master data source of record for the enterprise.
Improve your system and application performance by adjusting these data warehouse DSNZPARMs to improve your access paths and take advantage of the superior DB2 optimizer technology and the most efficient performance available.
To get Part 2 of the DB2 V9 DSNZPARM settings, click here.
|
No matter what version of DB2 you are running, it’s a good idea to review the DSNZPARMs on a regular basis and get them set up to maximize the performance of your data warehousing application SQL by leveraging all the DB2 optimizer capabilities.
To begin, get a listing of the current DSNZPARMs for your data warehouse system’s settings. This can be done a number of different ways: through your performance monitor, through the JCL that creates the DSNZPARMs, or through the IBM REXX exec of the DB2 stored procedure DSNWZP. This DSNWZP stored procedure routine is one of the many DB2 administration procedures and is set up during the system install.
The data warehouse DSNZPARMs affect the performance of the DB2 system and its applications in many different ways, from the DB2 optimizer’s access path choices to the number of parallel processes DB2 will perform during data retrieval. To make sure that all DB2’s performance features and functions are available to improve performance, I’ve made an alphabetic list of DB2 data warehouse DSNZPARMs. (To get Part 1, click here.) Making sure these data warehouse DSNZPARMs are enabled and adjusted for your system will help your application fully exploit DB2 V9 and get the best performance possible for your data warehouse.
____________________
Dave Beulke is a system strategist, application architect, and performance expert specializing in Big Data, data warehouses, and high performance internet business solutions. He is an IBM Gold Consultant, Information Champion, and President of DAMA-NCR, former President of International DB2 User Group, and frequent speaker at national and international conferences. His architectures, designs, and performance tuning techniques help organizations better leverage their information assets, saving millions in processing costs.
|
As I have talked about previously, DB2 Version 9 for z/OS introduced more than 50 great new performance features. Many of these features can really benefit your standard DBA operations, improve application performance, and increase the overall availability of your systems and data.
One of the important new changes in DB2 Version 9 for z/OS is the set of utility options and performance improvements, which help the overall maintenance and health of your systems.
Several performance improvements were made within these standard DB2 utilities leveraging improved access and the new big block I/O features within z/OS.
These improvements result in the potential to save a tremendous amount of CPU on many of your daily, weekly and monthly utility jobs.
Analysis and performance figures published by IBM show the following CPU performance improvement potential:
-
0 to 15% Copy Tablespace
-
5 to 20% in Recover index, Rebuild Index, and Reorg Tablespace/Partition
-
5 to 30% in Load
-
20 to 60% in Check Index
-
35% in Load Partition
-
30 to 50% in Runstats Index
-
40 to 50% in Reorg Index
-
Up to 70% in Load Replace Partition with NPIs and dummy input
Elimination of BUILD2 Phase
One of the most important items related to the DB2 Version 9 for z/OS utility improvements was the elimination of the BUILD2 phase processing within the standard utilities when rebuilding indexes from the data entries. The BUILD2 phase was the time when DB2 data was unavailable. Eliminating this phase improves availability for all your applications and shortens utility-related downtime.
Eliminating the BUILD2 phase provides a dramatic improvement in availability, but it can increase CPU time and elapsed time because DB2 manages the concurrent applications while one or more partitions are reorganized. Eliminating the BUILD2 phase allows online reorgs during non-peak processing and gives DBAs more opportunity to maintain and improve their systems, databases, and application performance.
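For context, a partition-level online REORG that benefits from the BUILD2 elimination might be coded like the sketch below; the database, tablespace, and mapping table names are illustrative, and SHRLEVEL CHANGE requires a mapping table in DB2 9:
REORG TABLESPACE DBPAY01.TSPAY01 PART 3
  SHRLEVEL CHANGE MAPPINGTABLE DBA001.MAP_TBL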
With each new release of DB2, IBM makes important improvements in performance and CPU reduction, like those noted in this recent blog post.
|
DB2 Version 9 for z/OS introduced more than 50 great new performance features. Many of these really stand out, and I have written about some that can really help you reduce the CPU demand on your system and application environments. (Go to www.davebeulke.com and click on the tag, DB2 9, to get a list of these articles.)
Some of the new features in DB2 9 for z/OS are really great data warehousing and reporting features migrated from DB2 LUW: the RANK, DENSE_RANK, and ROW_NUMBER SQL scalar functions. These OLAP type functions are great for a wide variety of data warehousing applications, moving sums, moving averages, complex calculations, or processing. These OLAP functions improve performance because they drive this complex processing into the DB2 database engine.
SQL RANK Function
The first OLAP function, RANK is used to order and prioritize your data according to your specific criteria when you do not want ties or the same number on duplicate rows. RANK orders your data assigning successive sequential numbers to the rows returned from the SQL query. The ranked rows can be individual data rows or groups of data rows.
SQL RANK Example:
SELECT WORKDEPT, AVG(SALARY+BONUS) AS AVG_TOTAL_SALARY,
RANK() OVER (ORDER BY AVG(SALARY+BONUS) DESC) AS RANK_AVG_SAL
FROM EMPLOYEE
GROUP BY WORKDEPT
ORDER BY RANK_AVG_SAL
WORKDEPT AVG_TOTAL_SALARY RANK_AVG_SAL
-------- ----------------------------- --------------------
B01 95050.00000000000000000000000 1
E01 80975.00000000000000000000000 2
C01 77847.50000000000000000000000 3
A00 71670.00000000000000000000000 4
D11 59283.63636363636363636363636 5
D21 51740.00000000000000000000000 6
E21 47570.00000000000000000000000 7
E11 45720.00000000000000000000000 8
8 record(s) selected
In the example above, the RANK function provides a way to order the data across two different criteria, providing the user the data in an easy to understand report broken out by each work department and average salary within that particular department.
SQL DENSE_RANK Function
The next OLAP function, DENSE_RANK, provides a way to manage ties or duplicate values within your ranking of the result set rows. DENSE_RANK orders your data and assigns successive sequential numbers based on the OVER PARTITION data values found. DENSE_RANK differs from RANK because common values or ties are assigned the same number.
SQL DENSE RANK Example:
SELECT WORKDEPT, EMPNO, LASTNAME, FIRSTNME,
       DENSE_RANK() OVER (PARTITION BY WORKDEPT ORDER BY EDLEVEL DESC) AS RANK_EDLEVEL
FROM EMPLOYEE
ORDER BY WORKDEPT, RANK_EDLEVEL, LASTNAME
WORKDEPT EMPNO LASTNAME FIRSTNME RANK_EDLEVEL
-------- ------ --------------- ------------ --------------------
A00 000110 LUCCHESSI VINCENZO 1
A00 000010 HAAS CHRISTINE 2
A00 200010 HEMMINGER DIAN 2
A00 000120 O'CONNELL SEAN 3
A00 200120 ORLANDO GREG 3
B01 000020 THOMPSON MICHAEL 1
C01 000030 KWAN SALLY 1
C01 200140 NATZ KIM 2
C01 000140 NICHOLLS HEATHER 2
C01 000130 QUINTANA DELORES 3
D11 200220 JOHN REBA 1
D11 000220 LUTZ JENNIFER 1
D11 000210 JONES WILLIAM 2
D11 000160 PIANKA ELIZABETH 2
D11 000180 SCOUTTEN MARILYN 2
D11 000150 ADAMSON BRUCE 3
D11 000200 BROWN DAVID 3
D11 000060 STERN IRVING 3
D11 000190 WALKER JAMES 3
D11 200170 YAMAMOTO KIYOSHI 3
D11 000170 YOSHIMURA MASATOSHI 3
D21 000240 MARINO SALVATORE 1
D21 200240 MONTEVERDE ROBERT 1
D21 000260 JOHNSON SYBIL 2
D21 000070 PULASKI EVA 2
D21 000270 PEREZ MARIA 3
D21 000250 SMITH DANIEL 3
D21 000230 JEFFERSON JAMES 4
E01 000050 GEYER JOHN 1
E11 000280 SCHNEIDER ETHEL 1
E11 200280 SCHWARTZ EILEEN 1
E11 000090 HENDERSON EILEEN 2
E11 000300 SMITH PHILIP 3
E11 000290 PARKER JOHN 4
E11 000310 SETRIGHT MAUDE 4
E11 200310 SPRINGER MICHELLE 4
E21 200340 ALONZO ROY 1
E21 000340 GOUNOT JASON 1
E21 000320 MEHTA RAMLAL 1
E21 000330 LEE WING 2
E21 000100 SPENSER THEODORE 2
E21 200330 WONG HELENA 2
42 record(s) selected.
In the example above, the DENSE_RANK function provides a way to order the data even though there are a number of people with the same level of education within the same work department.
SQL ROW_NUMBER Function
The ROW_NUMBER function provides a way to add row numbers to any result set. These additional ROW_NUMBERs can be a great assistance for evaluation of applications, their processing logic, and end-user web applications or service architectures.
SQL ROW_NUMBER Example:
SELECT ROW_NUMBER() OVER (ORDER BY WORKDEPT, LASTNAME) AS NUMBER,
LASTNAME, SALARY
FROM BEULKE.EMPLOYEE
ORDER BY
WORKDEPT, LASTNAME
NUMBER LASTNAME SALARY
-------------------- --------------- -----------
1 HAAS 152750.00
2 HEMMINGER 46500.00
3 LUCCHESSI 66500.00
4 O'CONNELL 49250.00
5 ORLANDO 39250.00
6 THOMPSON 94250.00
7 KWAN 98250.00
8 NATZ 68420.00
9 NICHOLLS 68420.00
10 QUINTANA 73800.00
11 ADAMSON 55280.00
12 BROWN 57740.00
13 JOHN 69840.00
14 JONES 68270.00
15 LUTZ 49840.00
16 PIANKA 62250.00
17 SCOUTTEN 51340.00
18 STERN 72250.00
19 WALKER 50450.00
20 YAMAMOTO 64680.00
21 YOSHIMURA 44680.00
22 JEFFERSON 42180.00
23 JOHNSON 47250.00
24 MARINO 48760.00
25 MONTEVERDE 37760.00
26 PEREZ 37380.00
27 PULASKI 96170.00
28 SMITH 49180.00
29 GEYER 80175.00
30 HENDERSON 89750.00
31 PARKER 35340.00
32 SCHNEIDER 36250.00
33 SCHWARTZ 46250.00
34 SETRIGHT 35900.00
35 SMITH 37750.00
36 SPRINGER 35900.00
37 ALONZO 31840.00
38 GOUNOT 43840.00
39 LEE 45370.00
40 MEHTA 39950.00
41 SPENSER 86150.00
42 WONG 35370.00
42 record(s) selected.
It is always good to review your current applications and see where these OLAP functions might help your performance. These OLAP functions usually improve performance where any histograms, moving sums, moving averages, or complex calculations take place within your various reports or other applications. Since the OLAP DB2 engine features are faster than any application program process, these new powerful SQL features can easily improve your application response time while reducing your overall CPU demand.
|
As I referenced in an earlier DW post, DB2 Version 9 for z/OS introduced more than 50 great new performance features. Many of these enabled companies to reduce CPU demand on system and application environments.
One of the features introduced in Version 9, selecting modified data, helps your applications minimize the number of SQL calls and the number of trips to the DB2 system. This feature provides the ability to get data values with an SQL SELECT clause wrapped around your standard UPDATE, INSERT, MERGE, and DELETE SQL statements.
By providing for the ability to perform an SQL SELECT from an UPDATE, INSERT, MERGE or DELETE SQL statement, the application is able to retrieve many important values previously retrieved in additional SQL statements. By retrieving these data values when the UPDATE, INSERT, MERGE or DELETE SQL statement is executed, the application can get data values for ROWIDs, identity, sequence, timestamp columns and other values or defaults in the database.
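A minimal sketch of the syntax is shown below; the ORDERS table, its identity column ORDER_ID, and the CREATE_TS timestamp column are hypothetical:
SELECT ORDER_ID, CREATE_TS
  FROM FINAL TABLE
       (INSERT INTO ORDERS (CUSTOMER_ID, AMOUNT)
        VALUES (1001, 250.00))
One statement inserts the row and returns the generated identity and timestamp values that would otherwise require a second SQL call.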
What is also great about using a SQL statement to select the modified data is that the application can get the data values of columns modified by BEFORE INSERT triggers. By getting these column data values, your application can use them in other parts of the application immediately. This is especially valuable for maintaining referential integrity relationships that need the values of ROWIDs, identity or sequence column values.
The new feature of selecting modified data, or SQL SELECT from an UPDATE, INSERT, MERGE, or DELETE SQL statement, is also very powerful because it can eliminate a large amount of network traffic and additional SQL calls. Processes and their related SQL that were done in multiple calls to DB2 can now be done in a single call. Additionally, before this enhancement, those old multiple SQL calls sometimes had to transmit large chunks of data back and forth across the network. Now all that extra data transmission traffic is eliminated.
By implementing this powerful feature of selecting modified data, or SQL SELECT from an UPDATE, INSERT, MERGE, or DELETE SQL statement, you can dramatically reduce the number of SQL calls, reduce your network traffic, and improve your application response time while reducing your overall CPU demand.
|
This is a continuation of last week's post.
Avoid Inappropriate Tablespace Scans
There are many DB2 access paths available. The most expensive of these is a tablespace scan because it causes DB2 to scan all the entries of a table to retrieve the rows desired by the application. During a recent system and application performance review at one client, there were 1,869 distinct tablespace scans in their production environment. Almost one tenth of the applications were doing tablespace scans. This was significantly worse than what is found at other clients or in industry standards.
In addition to accessing every row within a table, these tablespace scans hold additional database locks, consume additional CPU, and hinder application throughput performance. Each of these 1,869 tablespace scans needed to be evaluated to see if the access path could be improved. Improving all of your production tablespace scans will reduce overall CPU and lock consumption.
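If your packages are bound with EXPLAIN(YES), a PLAN_TABLE query like the following sketch is one way to locate these scans; qualify PLAN_TABLE with your own schema:
SELECT PROGNAME, QUERYNO, TNAME, ACCESSTYPE, PREFETCH
FROM PLAN_TABLE
WHERE ACCESSTYPE = 'R'
ORDER BY PROGNAME, QUERYNO
ACCESSTYPE = 'R' flags a tablespace scan; the PREFETCH column is also worth watching for the list prefetch access discussed next.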
List Prefetch and Index Scans
Other expensive DB2 access paths are List Prefetch and Index scans. These access paths can very often be improved and optimized to direct index access, resulting in CPU savings and improved elapsed runtimes. Analyzing the production environment at a recent financial bank showed a large number (737) of the available programs using these types of SQL access.
When these List Prefetch and Index Scans are used, they push a number of Record Identifiers (RIDs) to the RID pool within the DB2 system. When a large number of applications push RIDs into the RID pool, sometimes the pool requirements become too big, causing RID pool failures. When RID pool failures happen, the access path is degraded, causing more I/O and CPU to be used to retrieve the database information. To avoid this situation, each of the index scans should be evaluated to determine whether it is overflowing or causing major utilization of the RID pool.
These List Prefetch and Index Scan access types can sometimes be improved by using more SQL WHERE clause predicates or by verifying that the index definitions are appropriate. By improving the SQL, monitoring the number of rows referenced, and monitoring RID pool failures, the efficiency of the programs will be improved and the CPU reduced for each application execution.
Summary
Review applications in your company for:
-
Excessive use of tablespace scans
-
Excessive use of list prefetch and index scans
-
Lack of use of more recent SQL techniques such as limited fetch or table expressions.
________________________________________
Dave Beulke is an internationally recognized DB2 consultant, DB2 trainer and education instructor. Dave helps his clients improve their strategic direction, dramatically improve DB2 performance and reduce their CPU demand saving millions in their systems, databases and application areas within their mainframe, UNIX and Windows environments.
|
Let's delve into the SQL used in the applications of a major DB2 system.
Taking Advantage of Relational Technology
During reviews of various client systems, analyzing the SQL used is very helpful for understanding how sophisticated the shop and system are at leveraging relational technology. When the shop and system are well designed, the tables, indexes and application properly utilize object designs through effective object reuse and consistent design patterns. All of these application components are exposed when the application access methods are designed and formulated into SQL. Usually if a system is designed properly the SQL contains good access paths. SQL with good access paths use the following:
-
SQL Joins
-
SQL Sub queries
-
Table expressions
-
CASE expressions
-
Limited fetch
The more work done within the DB2 relational technology engine, the faster the data can usually be turned into information for application reports, web pages, etc. At a recent client there were zero table expressions, no limited fetches, and very few of the other SQL options that leverage DB2 and relational technology. Rewriting one of the applications with an SQL Join and an SQL table expression dramatically reduced elapsed processing time and CPU requirements, taking the application processing from hours to only a few minutes.
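As a simplified sketch of that kind of rewrite, using the DB2 sample EMPLOYEE and DEPARTMENT tables, a join against a table expression with a limited fetch pushes the aggregation and the row limit into the DB2 engine:
SELECT D.DEPTNO, D.DEPTNAME, DEPT_SAL.AVG_SAL
FROM DEPARTMENT D
     INNER JOIN
     (SELECT WORKDEPT, AVG(SALARY + BONUS) AS AVG_SAL
        FROM EMPLOYEE
       GROUP BY WORKDEPT) AS DEPT_SAL
     ON D.DEPTNO = DEPT_SAL.WORKDEPT
ORDER BY DEPT_SAL.AVG_SAL DESC
FETCH FIRST 10 ROWS ONLY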
Use SQL Joins
Analysis of another client's system showed that the application programmers were not using SQL Joins in any of their applications. Most of the application data processing focused on retrieving one row at a time and often from only one table.
The basis of relational technology is set processing, or processing a large number of rows through a single SQL statement. Application programmers need to understand and leverage relational technology so that the application programs process more data with each interaction with the database. Often when SQL is used to process single database rows, the application program can be rewritten with SQL Joins of tables using indexes and proper WHERE clauses, dramatically cutting the CPU and elapsed processing time. Using any of the several SQL Join methods also optimizes the I/O and CPU usage of the application by pushing the work from the program into the DB2 engine.
Experience with several past clients has shown significant elapsed time improvement and tremendous CPU savings from rewriting applications with SQL Joins. For example, an application designed with poor SQL was rewritten with SQL Joins and went from 12 hours of execution time to only 5 minutes.
Experience has shown that applications that yield the biggest improvements are those that are executed often and open cursors to access a large number of rows. Once these applications are identified, an applications team can redesign these programs to use more efficient relational technology and SQL join set processing.
Summary
It is important to drive processing to the SQL engine as much as possible. Review applications in your company for:
-
Lack of SQL Joins
-
SQL Sub queries
-
Table expressions
-
CASE expressions
-
Limited fetch
________________________________________
Dave Beulke is an internationally recognized DB2 consultant, DB2 trainer and education instructor. Dave helps his clients improve their strategic direction, dramatically improve DB2 performance and reduce their CPU demand saving millions in their systems, databases and application areas within their mainframe, UNIX and Windows environments.
|
Last time I wrote about some of the application performance aspects. This week we’ll delve deeper into DB2 Checkpoint aspects for recoveries and index definitions over your busiest tables and their potential performance impact.
DB2 Checkpoint System Frequency
The DB2 system runs transactions and takes system wide recovery checkpoints for system consistency and integrity. These system wide checkpoints synchronize the system across all applications and can be disruptive because they hold up transactions to get a synchronization point. It is a common industry rule-of-thumb to take these checkpoints every ten to fifteen minutes to have good recovery points and not hold up processing.
Within the examined system these checkpoints were being taken very infrequently because the active log data sets were very large. This frequency, while good for processing throughput, exposes the DB2 system to an elongated recovery window should a disaster happen. The system checkpoint frequency needed to be evaluated and adjusted to provide better recoverability.
This situation also can cause issues during the batch cycle as programs write over 250,000 or 500,000 log records without the system taking a checkpoint.
Too Many Indexes Defined on Busy Tables
Looking at the workload for the overall system showed that one of the main insert, update, and delete processing tables also had a large number of indexes defined on it. Querying the DB2 catalog found 11 indexes defined on this very busy table. Analysis also showed that the majority of the processing was inserts, updates, and deletes as opposed to select activity.
Other very busy tables also had 7 indexes. These tables and each of their indexes needed to be analyzed further to weigh their usage against their overhead.
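One hedged starting point for that analysis is a simple catalog count like the sketch below; the five-index threshold is arbitrary, and you will want to filter on your own creators or databases:
SELECT TBCREATOR, TBNAME, COUNT(*) AS INDEX_COUNT
FROM SYSIBM.SYSINDEXES
GROUP BY TBCREATOR, TBNAME
HAVING COUNT(*) > 5
ORDER BY INDEX_COUNT DESC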
Too Many Indexes Not Being Used
Looking further into the indexes that were defined on the application database also showed many indexes that were not being used. Querying the DB2 catalog I often find a large number of indexes defined that are not being used. Sometimes these indexes that aren’t being used are even on the largest or the busiest tables.
Further research showed that some of these indexes were defined for one time processes and weren’t deleted after the processing was complete. One other index was being used and was critical for the dynamic SQL workload. So just because you queried the DB2 Catalog with the query below to find the index plan and package dependencies does not mean that you can delete the index.
SELECT DISTINCT TBNAME, NAME AS INDEXNAME
FROM SYSIBM.SYSINDEXES
WHERE DBNAME LIKE '<database-name>%'
AND NAME NOT IN
(SELECT BNAME
FROM SYSIBM.SYSPACKDEP
UNION
SELECT BNAME
FROM SYSIBM.SYSPLANDEP)
Run the query and use the indexes listed as your starting point for further analysis. Each index listed could be critical for dynamic SQL performance, so be very careful before deleting any indexes from your system.
In Summary
Verify the checkpoint frequency of your systems. Having the correct checkpoint frequency can provide the proper checkpoints for disaster and regular recoveries of your databases and system. The recommended checkpoint frequency is about every 10 to 15 minutes, so depending on how busy your system is, adjust the number of log records accordingly.
Having extra indexes over busy tables can be a huge performance overhead. Deleting two or three extra indexes can sometimes cut one-quarter or one-half of a second off your transaction response time. So make sure you only have the indexes that are being used defined on your tables.
________________________________________
Dave Beulke is an internationally recognized DB2 consultant, DB2 trainer and education instructor. Dave helps his clients improve their strategic direction, dramatically improve DB2 performance and reduce their CPU demand saving millions in their systems, databases and application areas within their mainframe, UNIX and Windows environments.
|
We’ve examined numerous areas for performance. Now it’s time to see how the application performance is doing within the DB2 system. In this post, I’ll talk about some of the other standard places to check for performance improvements.
Locking Tables and Lock Escalations
Looking into the DB2 system and application monitor showed huge spikes of CPU during the morning hours. These CPU spikes were happening when a number of applications were experiencing lock escalation in the system. Lock escalation can be caused by numerous situations and needs to be minimized for the best processing throughput possible. For example, in a production performance report there were LOCK TABLES = 14,667 and LOCK ESCALATIONS = 25 during a given day.
The high number of LOCK TABLES is a major consideration because it prevents concurrent work. These situations need further research to determine whether they could be modified and eliminated.
The lock escalations were also troubling in terms of system performance. Twenty-five lock escalations during the daytime online window could cause throughput and CPU performance problems. These lock escalations needed to be further researched to determine their origin and the resources that were being locked.
Write Engine Not Available
Looking into the DB2 system and application monitor showed extended read and write times during peak processing. The disk configuration showed normal disk I/O rates, and no particular data sets had particularly bad read or write times or performance issues.
Looking further into the DB2 monitor statistic showed that in some of the buffer pools the amount of write activity was very robust during certain peak periods. During these extreme peak periods there were some instances where the write activity was so robust that all of DB2’s write engines assigned to that buffer pool were busy.
This situation is denoted in various performance monitor reports under each individual buffer pool report as Write Engine not available. The condition is caused by updated or dirty pages remaining in the buffer longer than necessary and reducing the number of buffer pages available for other processes.
Once the number of updated buffer pool pages becomes too big, the write engines try to write all the updated pages to disk. Since the activity was so robust, DB2 could not keep up and all the write engines were busy trying to keep up. This situation can be fixed in a variety of ways depending on your existing configuration.
The first method can be to expand the buffer pool, allowing more write cushion; another is to change the deferred write threshold settings to write more frequently; a third is to change the workload to spread out the peak timeframe. All of these methods can potentially improve the situation and allow DB2 to keep up with the write activity before it runs out of write engines.
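As a rough sketch of the first two options on a DB2 for z/OS subsystem, with a purely illustrative buffer pool name and values, the commands might look like this:
-ALTER BUFFERPOOL(BP2) VPSIZE(40000)
-ALTER BUFFERPOOL(BP2) DWQT(30) VDWQT(5,0)
A larger VPSIZE gives more write cushion, while lower DWQT and VDWQT thresholds trigger deferred writes sooner and more frequently.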
In Summary
-
Do you have too many locks against your tables? How many lock escalations happened today? The answer to these questions lies in looking at the order of your application processing and your programming methodologies so that concurrent transactions can occur easily.
-
Check to see if there are lots of times that a write engine is not available. Look at the size of your buffer pools, look at the deferred write threshold and look at the timing of your jobs and limit the times write engines are not available.
_______________________________________
Dave Beulke is an internationally recognized DB2 consultant, DB2 trainer and education instructor. Dave helps his clients improve their strategic direction, dramatically improve DB2 performance and reduce their CPU demand saving millions in their systems, databases and application areas within their mainframe, UNIX and Windows environments.
|
Last time I wrote about DB2 Checkpoint and index definitions over your busiest tables. This week I will discuss the old style of tablespace definitions and their potential performance impact.
Tablespaces Need to Be Reorganized
The first thing to look at is the current state of the database tables and indexes. Usually I find that there are many database objects in need of reorganization and normal on-going maintenance. This is indicated by the number of extents found on each of the various tablespaces and indexspaces, which can usually be obtained through a standard data set extent report. It is also discovered by querying the DB2 catalog information and highlighting the number of FAROFFPOS rows. FAROFFPOS indicates that retrieving these data rows requires extra I/Os when they are referenced in the system because they are not stored in their optimum position within the database tablespace or indexspace. These FAROFFPOS rows are usually found within several large tablespaces and their indexes in databases that have not had normal maintenance done on them during the normal business cycle. All database tables and indexes need to be analyzed to understand their daily activity (insert, update and delete frequency) so that their free space can be adjusted through the FREEPAGE and PCTFREE DB2 allocation parameters. These parameters and space allocations need to be adjusted to handle the workload for a period of six months without database reorganization. The analysis and adjustments will pay off in better system and application performance, while freeing staff from database object reorganizations for an extended period of time.
Simple Tablespaces Should Be Changed to Segmented Tablespaces
Simple tablespaces are not allowed in DB2 Version 10. Usually when I am doing a performance review on an old DB2 Version 9 system, I find a small number of tablespaces that are still defined as the simple tablespace type. These tablespaces need to be redefined as segmented or Universal Table Space tablespaces to leverage the better space management and processing features of these tablespace types. Given that the simple tablespace definition is going away, it is a good idea to change these database tablespaces before the migration requirement creeps up on you and constrains your schedule.
Tablespace Segment Size Is Too Small
Sometimes when I am looking into segmented tablespace definitions I find bad definitions or tablespace definitions that have taken the defaults. When this happens I usually find a database tablespace segment size of 4 pages, which is too small for normal databases. This segment size is used in secondary allocation efforts and prefetch operations. Having a small segment size limits the efficiency of prefetch operations and could require additional I/O during normal operations. It is best to have a large segment size of 32, if possible, depending on the number of partitions and whether or not you are using the new Universal tablespace type. When defining a regular segmented tablespace, use a large segment size whenever possible to improve I/O and prefetch operations.
Minimal Partitioning
In some shops partitioning is not really used for the databases. In another performance review I found that a majority of the current database tablespaces used a segmented tablespace definition for handling data. There were a number of tables that had a large number of rows. Those tablespaces needed to be analyzed to determine whether they should be redefined as partitioned tablespaces.
Partitioning these tablespaces would split up the data into a number of smaller data sets and spread out the I/O and locking within the database. Partitioning allows better data set and space management for isolating processing or data set secondary extents. Partitioning also enables the processing to possibly use parallel access to reduce the elapsed time of any long running batch processes. It is best to partition tables that have over one million rows. The DB2 Catalog can easily be queried to determine tables with over one million rows that aren't partitioned.
Table Reorganization Schedule
Maintenance procedures need to be put in place to regularly review and schedule database utilities and database tools against the production database environment. These schedules and tools are very important to the ongoing efficiency of the database system. Performance problems can be very disruptive to the business environment. Analysis of the environment uncovered tablespaces and indexes in multiple extents in need of reorganization. Workflow analysis and database change statistics need to be gathered to develop an efficient maintenance schedule. Also, standard jobs, downtime maintenance windows, and operational procedures are needed to minimize the business disruption impact.
Summary
Honing in on your tablespace and table structures can give you lots of places to improve DB2 performance. Take a look at these areas in your own shop:
- Are there tablespaces and tables that need to be reorganized? Check for a large number of data set extents and a large number of rows that are in a FAROFFPOS position.
- Do you have any simple tablespaces that should be changed to segmented tablespaces for better performance? Are your segmented tablespaces properly sized?
- Do you have frequently updated tables with a large number of indexes? Double check to see if any of these indexes can be eliminated to improve performance.
- Are there large segmented tablespaces that really should be redefined as partitioned? Check for tables with over a million rows.
- Do you have a reorganization schedule and appropriate FREEPAGE and PCTFREE freespace settings that keep tables cleaned up with a minimum of outage?
See you when we continue on with our exploration of this fascinating case study.
________________________________________
Dave Beulke is an internationally recognized DB2 consultant, DB2 trainer and education instructor. Dave helps his clients improve their strategic direction, dramatically improve DB2 performance and reduce their CPU demand saving millions in their systems, databases and application areas within their mainframe, UNIX and Windows environments.
|