Working with a client's SOA environment showed several interesting DB2 performance issues. One that was quite stunning was the large number of connections that the .NET and Java applications were making to DB2 and other systems. Researching the system and applications further uncovered a wide disparity in how connections were handled and in the number of connections each of the many application modules was using. Proper connection handling is very important to DB2 performance for three main reasons: acquiring new connections is expensive, application connections hold database locks, and each connection scopes a unit-of-work transaction.
First, getting a database connection is expensive because of all the great things a database provides, such as security and integrity. Because the database is important, every connection request must have its security checked and authorized. This security authorization against the database system and the data being requested is quick, but it takes time. Next, the database guarantees integrity through transaction logging of the unit of work. Starting a new database unit of work is also fast, but it must be managed within the database so that it can be backed out should the transaction fail.
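Because of that acquisition cost, most Java and .NET applications amortize it with a connection pool instead of opening a new connection for every request. Below is a minimal Java sketch against the standard javax.sql.DataSource interface; the pool configuration, table and column names are illustrative assumptions, not details from the client system described above.

import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class AccountLookup {
    private final DataSource pool; // pool built by the app server or the DB2 JCC driver

    public AccountLookup(DataSource pool) {
        this.pool = pool;
    }

    public String findOwner(int accountId) throws SQLException {
        // Borrow a pooled connection for one unit of work; the expensive
        // authorization work was done once, when the pool created it.
        try (Connection con = pool.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "SELECT OWNER_NAME FROM ACCOUNT WHERE ACCOUNT_ID = ?")) {
            ps.setInt(1, accountId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        } // close() returns the connection to the pool instead of destroying it
    }
}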
Second, within the database connection the SQL processing selects, inserts, updates, or deletes data. These actions hold locks against the referenced data and prevent other applications from updating the same data or referencing the deleted data. DB2 applications have several mechanisms to control and handle this locking within the application and system. The best practice for DB2 performance is to bind the application against the database using the BIND parameters for cursor stability, ISOLATION(CS) and CURRENTDATA(NO). This minimizes the locks held and allows other transactions more concurrency against the data. If the application is read-only and is not concerned with other transactions manipulating the data, then use uncommitted read, ISOLATION(UR). The ISOLATION(UR) setting is preferred for applications referencing data that doesn't change.
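ISOLATION(CS) and ISOLATION(UR) are static BIND options; for dynamic JDBC work the same intent can be expressed per connection. A hedged sketch of the rough JDBC analogs, with an illustrative table name:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public final class ReadOnlyReport {
    // Rough JDBC analogs of the BIND options above:
    //   ISOLATION(CS) -> Connection.TRANSACTION_READ_COMMITTED
    //   ISOLATION(UR) -> Connection.TRANSACTION_READ_UNCOMMITTED
    public static int countOrders(Connection con) throws SQLException {
        con.setReadOnly(true); // hint that this unit of work only reads
        con.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM ORDER_HISTORY")) {
            rs.next();
            return rs.getInt(1);
        }
    }
}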
Third, the application unit of work must maintain the connection. Large application workloads that perform too many updates, inserts or deletes within a unit of work hold too many locks, can cause extended back-out times when an application fails, and impact DB2 performance. It is very important to have the proper transaction commit scope and to issue appropriate commits to minimize the number of locks and the amount of work that the database may have to back out. It is also critical for applications to reference the database tables and perform their updates in the same sequence. Referencing the data in the same order acquires and releases locks in a consistent order, avoiding deadlocks and allowing more application concurrency. Since your application wants to minimize the number of locks and the time those locks are held, it is always best to do your data updates and inserts right before your application commits or ends the transaction. This minimizes the time the locks are held and again provides more workload concurrency.
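As a hedged illustration of that commit scope, the sketch below performs its inserts and commits immediately afterward so the insert locks are held only briefly; the table, columns and the 500-row commit interval are illustrative assumptions, not a recommendation for any particular workload.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public final class TransactionLoader {
    private static final int COMMIT_INTERVAL = 500; // keep each unit of work small

    public static void load(Connection con, List<int[]> rows) throws SQLException {
        con.setAutoCommit(false); // manage the unit of work explicitly
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO DAILY_TRAN (TRAN_ID, AMOUNT) VALUES (?, ?)")) {
            int inFlight = 0;
            for (int[] row : rows) {
                ps.setInt(1, row[0]);
                ps.setInt(2, row[1]);
                ps.executeUpdate();      // lock acquired here ...
                if (++inFlight == COMMIT_INTERVAL) {
                    con.commit();        // ... and released almost immediately
                    inFlight = 0;
                }
            }
            con.commit();                // commit the final partial interval
        } catch (SQLException e) {
            con.rollback();              // back out a small, cheap unit of work
            throw e;
        }
    }
}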
All of this information is standard practice for most applications, but within the new SOA architectures the services may not know much about the unit of work or connection situation. Within one client SOA architecture, recent research showed that a particular module had seven different connections active within its service: DB2 for z/OS, DB2 for LUW, Oracle, MQSeries inbound and outbound queues, and connections to application and web servers for AJAX activities. It is a bit much to have all of these connections within a single service, and when some minor changes caused this module to fail, many processes could not function. Debugging was also very difficult, because one connection failure caused all the connection participants to back out their transactions, causing more locking and data integrity issues.
Make sure your application handles connections properly, because they are expensive to acquire and impact DB2 performance. Minimize the number of database activities within a transaction to minimize locks, and understand the number of connections involved within a particular unit of work so that you can get the best DB2 performance from your applications.
Dave Beulke is an internationally recognized DB2 consultant, DB2 training and DB2 education instructor. Dave helps his clients improve their strategic direction, dramatically improve DB2 performance and reduce their CPU demand, saving millions in their systems, databases and application areas within their mainframe, UNIX and Windows environments. Search for more of Dave Beulke's DB2 Performance blogs on DeveloperWorks and at www.davebeulke.com.
Every time a new version of DB2 for z/OS comes out, there are always projects that need justification for going to the next DB2 version. The huge number of DB2 performance features in DB2 Version 9 for z/OS should be an imperative to get to the DB2 9 or DB2 10 release. This large number of DB2 performance features in DB2 V9 for z/OS can make a huge difference in your system, database and application performance.
The large number of performance enhancements in DB2 Version 9 for z/OS is also accompanied by a large number of operational and availability enhancements that provide new functionality. The combination of all these new Version 9 features can make your life as a DBA or an application developer dramatically better.
All of these features are discussed in my new presentation, “Improving Performance with DB2 V9 for z/OS,” which has been posted for download. The presentation also includes a full list of all the Version 9 enhancements, which is great to include as a reference for the justification that management wants for your migration to DB2 Version 9 for z/OS.
In July 2010 IBM announced the zEnterprise with its new integrated IBM POWER7 and IBM System x blade environments. This provides users with the ability to run traditional mainframe applications alongside any UNIX and Intel-based applications. These new capabilities provide a one-stop cloud environment for businesses to deploy and scale any application from any platform.
In August 2010 IBM announced the new deployment options for the DB2
pureScale system to the IBM System x environments. The ability to leverage the
DB2 pureScale environment provides the same multi-member DB2 data sharing
mainframe capabilities within a new SUSE Linux based solution. Testing shows
that the DB2 pureScale environment performs and scales almost linearly, like
the z/OS DB2 Data Sharing mainframe environment.
Providing these DB2 Data Sharing type capabilities within the System x
architecture provides another great performance and scalability option within
the DB2 LUW family. The DB2 pureScale option with its tremendous scalability
provides a great expansion option for existing DB2 LUW systems that are running
out of capacity within their single computing environment footprint. And now
with the ability to deploy them to the zEnterprise, companies that have DB2 LUW
systems can get the reliability and performance management of the mainframe. This DB2 pureScale on System x also provides another DB2 LUW open solution for companies that need to
consolidate UNIX or Intel based environments and applications. So regardless of
the application or system requirements, the DB2 family provides the most open, cost-efficient options. Check out the full IBM DB2
pureScale announcement at IBM.
Within the zEnterprise announcement in July 2010 there were several exciting
comments about the new query performance within the IBM Smart Analytics
Optimizer. This smart system leverages the integrated platform, providing
industry-leading scalability for data warehouse and business intelligence workloads. Queries that scan entire terabytes of data and that were previously avoided can now be executed in seconds through the IBM Smart Analytics Optimizer
processing. This capability provides data mining features for quickly getting
answers to the clustering, associations, classification and predictions within
your data warehousing environment.
Within some of the documentation, IBM testing of the Smart Analytics Optimizer shows huge performance boosts. Queries that previously scanned large amounts of data and were subsequently optimized sometimes showed an improvement of 54 times, and the cost of the query improved 711 times. This type of improvement is very exciting and is especially what new data warehousing and business intelligence workloads need on the mainframe.
Check out all the
zEnterprise documentation—especially the information on the IBM Smart Analytics
Optimizer. This was just the beginning of a resurgence of the new mainframe
that supports any mainframe, UNIX or Windows workload. Known as a company's
“private cloud,” it allows IT departments to leverage tremendous performance
improvements while reducing query cost substantially.
The zEnterprise IBM computer was announced in July 2010 and it can run any workload. Yes, any workload! Your company can now run mainframe, UNIX and some Windows applications within the new environment. The new zEnterprise is designed with the performance and capacity for large-scale consolidation of any workload. As has been done recently with consolidating UNIX systems into the mainframe environment, it is now possible to consolidate the myriad of Windows-based systems with the integration of z/VM 6.1. This new computing architecture helps consolidate performance, integrate data better and improve overall management of the computing environment, with unified standard performance, security and optimization facilities for better overall availability.
The zEnterprise takes consolidation one step further: any workload can be implemented within its standard environment with dramatic energy savings, using workload optimizers and the zEnterprise's integration of the IBM POWER7 and IBM System x blades to consolidate diverse application workloads.
This helps deliver the mainframe's reliability, availability and security to these other platforms while helping your company lower risk and reduce overall computing and energy costs. The new architecture also helps centralize all the data from these diverse platforms into a central hub, eliminating the many copies and remote islands of data. This improves integration, reduces latency and increases performance as heterogeneous architectures perform more end-to-end enterprise transactions.
Consolidated computing through the new zEnterprise system is just beginning. Now all architectures, systems and applications can truly be evaluated for their performance. Regardless of the operating system, programming language or network protocol, your company can get it all done with the lowest total cost of ownership, with 96 of the newest and fastest CPUs (5.2 GHz) available on any computer in the world through the new zEnterprise system.
Check out all the zEnterprise information at the links below and the interesting comments about z/VM from PCWorld. Then imagine all your enterprise data integrated into a high performance, energy efficient and optimized system. ibm.com/systems/zenterprise/
Within all database systems there are tables that hold transactions or some other type of time-based data. These transactions unfortunately build up over time, kept to satisfy regulatory compliance, to serve infrequent access or to support development of historical and trending reports.
Within your database design, this time-based transaction data builds up quickly, and managing it so the volume does not affect performance is critical. Sometimes the frequency or grain of transactions such as web clicks, banking transactions, stock transactions or other transactions builds up to terabytes of data per day, quickly overwhelming robust storage arrays.
Separating out the real-time transaction data from the old data within your database requires planning and design steps. These planning and design steps help the separation of the old data from the new data and guarantee that application and SQL performance does not suffer when your database is fully populated. A separation of the old and new data also helps performance management, so more resources can be delegated to maintaining transaction performance for business operational success.
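One common way to implement that separation, shown here only as a hedged sketch with illustrative names and boundary dates, is to range-partition the time-based transaction table on its date column so the older partitions can be managed, rotated or archived without touching the partitions the current transactions use.

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public final class TransactionTableDdl {
    // Illustrative only: names and boundary dates are assumptions, and the
    // partitioning clause follows DB2 for z/OS table-controlled partitioning.
    public static void create(Connection con) throws SQLException {
        String ddl =
            "CREATE TABLE WEB_CLICK " +
            "( CLICK_ID   BIGINT       NOT NULL, " +
            "  CLICK_TS   TIMESTAMP    NOT NULL, " +
            "  CLICK_DATE DATE         NOT NULL, " +
            "  URL        VARCHAR(500) NOT NULL ) " +
            "PARTITION BY RANGE (CLICK_DATE) " +
            "( PARTITION 1 ENDING AT ('2010-03-31'), " +
            "  PARTITION 2 ENDING AT ('2010-06-30'), " +
            "  PARTITION 3 ENDING AT ('2010-09-30'), " +
            "  PARTITION 4 ENDING AT ('2010-12-31') )";
        try (Statement st = con.createStatement()) {
            st.executeUpdate(ddl); // old partitions can later be rotated or archived
        }
    }
}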
DB2 10 Temporal Tables with their built-in functionality automatically understand the business time or system time at which the data was entered into the system. This functionality is great for finding out the condition of the business as of a certain time. There are two new table period definitions available, BUSINESS_TIME and SYSTEM_TIME. These new time period definitions are used in the new DB2 temporal table definitions to provide system-maintained, business-maintained or bitemporal (both system- and business-maintained) database tables.
Many systems today have manual applications or utilities that migrate their real-time data to history tables. These new temporal tables, with their built-in system and business time columns, can be defined so that DB2 automatically migrates prior versions of transactional temporal table data to another user-defined HISTORY table. Having these facilities built into the database greatly improves regulatory compliance, operations and overall performance.
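As a hedged sketch of what that looks like in practice, the DDL below creates a system-period temporal table, a matching history table and then enables versioning; the table and column names are illustrative, not from any client system, and it is shown being issued through JDBC from a Java utility.

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public final class PolicyTemporalDdl {
    public static void create(Connection con) throws SQLException {
        String[] ddl = {
            // Base table with a SYSTEM_TIME period that DB2 maintains itself
            "CREATE TABLE POLICY " +
            "( POLICY_ID INTEGER NOT NULL, " +
            "  COVERAGE  INTEGER NOT NULL, " +
            "  SYS_START TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW BEGIN, " +
            "  SYS_END   TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW END, " +
            "  TRANS_ID  TIMESTAMP(12) GENERATED ALWAYS AS TRANSACTION START ID, " +
            "  PERIOD SYSTEM_TIME (SYS_START, SYS_END) )",

            // User-defined history table with matching columns
            "CREATE TABLE POLICY_HIST " +
            "( POLICY_ID INTEGER NOT NULL, " +
            "  COVERAGE  INTEGER NOT NULL, " +
            "  SYS_START TIMESTAMP(12) NOT NULL, " +
            "  SYS_END   TIMESTAMP(12) NOT NULL, " +
            "  TRANS_ID  TIMESTAMP(12) )",

            // Tell DB2 to move prior row versions to the history table automatically
            "ALTER TABLE POLICY ADD VERSIONING USE HISTORY TABLE POLICY_HIST"
        };
        try (Statement st = con.createStatement()) {
            for (String stmt : ddl) {
                st.executeUpdate(stmt);
            }
        }
    }
}

Once versioning is enabled, a time-travel query such as SELECT ... FROM POLICY FOR SYSTEM_TIME AS OF <timestamp> reads the base and history tables transparently.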
More information can be found on DB2 10 Temporal Tables and other DB2 10 features in the IBM White Paper: IBM DB2 10 for z/OS Beta -- Reduce Costs with Improved Performance.
The North American IDUG Conference and the IOD EMEA conference highlighted all the functionality and features within DB2 10. The more you hear about these functions and features, the more you realize they are really going to save your company money right away.
Justifying a DB2 upgrade is always necessary for management, and for the new DB2 10 it is going to be a lot easier with its many CPU-saving features. With the new features saving 5-7% right out of the box in conversion mode, DB2 10 offers existing systems and applications immediate savings. Add in all the new options such as Hash Access, Index Include Columns and other features, and developer and DBA activities become a lot easier, possibly reducing your overall DB2 CPU capacity requirements by up to 20%. Think of getting a 20% decrease in your costs and the costs of the associated third-party software, and you can see why management is going to push for getting DB2 10 installed as soon as possible.
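To make two of those options concrete, here is a hedged sketch of the DDL involved; the table, column and index names are purely illustrative, and your own key choices and HASH SPACE sizing would of course differ.

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public final class Db2TenFeatureDdl {
    public static void create(Connection con) throws SQLException {
        try (Statement st = con.createStatement()) {
            // Hash access: DB2 can find a row by its key with a single getpage,
            // which suits tables accessed almost only by primary-key lookups.
            st.executeUpdate(
                "CREATE TABLE CUSTOMER " +
                "( CUST_ID   INTEGER      NOT NULL, " +
                "  CUST_NAME VARCHAR(100) NOT NULL ) " +
                "ORGANIZE BY HASH UNIQUE (CUST_ID) HASH SPACE 64 M");

            // Index INCLUDE columns: one unique index both enforces ORDER_ID and
            // makes queries that also fetch ORDER_STATUS index-only, so a second
            // redundant index can be dropped. ORDERS is an assumed existing table.
            st.executeUpdate(
                "CREATE UNIQUE INDEX ORDER_IX1 ON ORDERS (ORDER_ID) " +
                "INCLUDE (ORDER_STATUS)");
        }
    }
}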
One of the first ways DB2 10 reduces your CPU within your systems and applications is through a new access path called Range-List Index Scan. This new access path should immediately assist many of your applications that have multiple WHERE predicates that reference the same index. In DB2 9 the access paths scanned the index for each of the WHERE predicate references. Within DB2 10 the Range-List Index Scan scans the index only once to retrieve all the RIDs, saving the extra CPU and I/Os of multiple separate scans, RID consolidation and elimination of duplicate RID entries. This can be a huge improvement for applications that have paging or other types of searching SQL WHERE predicates.
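A hedged example of the kind of paging predicate that benefits, with illustrative table and column names: both OR branches range over the same index on (LASTNAME, FIRSTNAME), so DB2 10 can resolve them with a single probe of that index instead of two separate scans.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public final class CustomerPager {
    // Fetch the next page of names after the last (lastName, firstName) displayed;
    // an index on (LASTNAME, FIRSTNAME) is assumed.
    public static void nextPage(Connection con, String lastName, String firstName)
            throws SQLException {
        String sql =
            "SELECT LASTNAME, FIRSTNAME, PHONE " +
            "FROM CUSTOMER " +
            "WHERE (LASTNAME = ? AND FIRSTNAME > ?) " + // finish the current last name ...
            "   OR (LASTNAME > ?) " +                   // ... then move on to later names
            "ORDER BY LASTNAME, FIRSTNAME " +
            "FETCH FIRST 25 ROWS ONLY";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, lastName);
            ps.setString(2, firstName);
            ps.setString(3, lastName);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%s, %s  %s%n",
                            rs.getString(1), rs.getString(2), rs.getString(3));
                }
            }
        }
    }
}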
When your applications are being prepared for migration to DB2 10, the plan stability features started in Version 9 and improved with DB2 10 will assist you. Within DB2 10 this enhancement lets the application manager lock down any package to a particular access path. If the access path changes through an attempt to rebind the package, the process can supply an error or provide a warning within the BIND process. This protects your packages against access path problems and alerts your developers to a possible issue.
So if you want a 20% decrease in your CPU demand, or just want to keep up with your competition, start your plans for DB2 10 now.
DB2 upgrades are not the same in any two companies and may not even be the same within one company. As multiple DB2 systems sometimes support different applications with varying requirements, the Return on Investment (ROI) is completely different. Comparing DB2 z/OS or DB2 LUW to Oracle system upgrades also does not even begin to be similar or comparable, as all these databases have completely different contract clauses, ROI requirements and upgrade factors. Apples and oranges are both edible, but that is where the comparison and the similarities stop.
DB2 upgrades depend on the technology. First-hand experience has shown that bad research, poor database design, poor application development techniques, bad choices in third-party vendor tools and an assortment of other factors make ROI equations for upgrades complex or almost impossible. Having helped fix bad technology decisions, redesign bad database designs and fix SQL application performance problems, I know that reducing costs is always on everyone's mind. Upgrading any software especially raises cost consciousness.
To minimize upgrade costs, strive for better understanding and knowledge of the DB2 technology. This will help everyone avoid unnecessary costs and bad decisions. Work on minimizing the DB2 10 upgrade costs through better management of your DB2 systems today. I’ve mentioned the considerations of SMS environments and improving your security definitions to get ready to separate security data access.
This week the suggestion for preparing for DB2 10 is to begin analyzing your DB2 application techniques. The first thing to analyze is how your plans and packages are put together. IBM's DB2 engineers have been telling everyone for many releases to get to DB2 packages within their environment. In DB2 10 the old plan management technique of binding DBRMs directly into plans needs to be replaced by collecting them into package collections (COLLIDs) and then binding those into your DB2 plans. It is a small difference, but it has big implications. If your shop still uses the technique of putting DBRMs directly into plans, it is time to start a process to migrate to the new method.
Next, the compilers that your applications utilize also need to be current. DB2 10 changes within the COBOL precompiler generate new SQLCA and SQLDA structures along with generating COMP-5 binary data attributes instead of COMP-4 or COMP. Even though there are still some ways to handle old compilers, it is best to review your COBOL compiler versions to make sure they will be ready for your eventual DB2 migration.
Whether you are currently using DB2 Version 8 or Version 9, getting ready for all the CPU efficiencies in DB2 10 is easy. Your company may not be thinking about migrating to DB2 10 yet, but DB2 10 is getting huge industry-wide support as the cost savings begin to be realized by management. Everyone wants to save money, and with everyone trying to squeeze their budgets, a faster adoption of DB2 10 is in your future. It is best to be prepared, so here are a few things you can do in DB2 Version 8 that will make it easier to get to DB2 10.
Embrace System Managed Storage (SMS)
No, I am not Darth Vader telling you to embrace the dark side, but the DB2 10 catalog and directory do use SMS. Using SMS is a hard thing to do for many companies, and personally I am not a real big fan of SMS. DB2 10 and additional improvements in storage have made me think twice, and SMS may be more help than problem. Being a performance geek, I recommend limiting or not sharing storage volumes within the DB2 SMS definitions. The SMS performance key, if possible, is to segregate the DB2 SMS definitions completely, yes completely, away from other storage definitions. SMS horror stories about I/O troubles are well known, and through my consulting clients I see SMS at the center of many I/O problems. Embrace SMS, but be diligent guarding your DB2 system's I/O performance and especially your DB2 catalog and directory data sets.
Sharpen Your Security
I am a big supporter of compliance, protecting data assets, security audits and all the security team's efforts to protect the company and all the business information. DB2 10 offers more privilege and authority granularity in your DB2 systems, and security experts will need to embrace these features to separate out your data access, utility operations and day-to-day work. Start auditing the security usage of your systems area, your various DBA support groups and your different local and distributed applications, and analyze their needs. Do the activities require data access? Could the DB2 DBA work perhaps be done using a different security level such as DB2 SYSOPR or another security authority? The DB2 10 security privilege and authority granularity improvements are going to affect your security profiles. Get ready and start the security analysis now.
Learn more about DB2 10 through the replay of Roger Miller’s and my 2010 webcast: DB2 10 for z/OS – Helping you improve operational efficiencies and gain competitive advantage. The webcast replay is available here…
After the original post of “Reminding Management of the Advantages of System Z,” many people commented on how their management is increasingly out of touch with the mainframe. Comments also stated that the System Z environment is really processing almost all of the transactions in their companies and that the Windows platform systems continue to have scalability issues.
Although the mainframe revenue for IBM suffered in 2009 because of its upgrade cycle, the introduction of the z10 System platform demonstrates it is the best open system. Yes, that is correct: the mainframe is the most open system available because it runs all types of workloads, not only the legacy standards of Assembler, COBOL, PL/I and so on, but also C++, C#, Java, PHP and the rest of the languages that run on UNIX and Windows boxes.
Also, some people are starting to run “virtualized” Windows on mainframe environments; PCWorld highlighted this capability in early 2009. With the System Z speed, scalability and network, the mainframe continues to be the best solution for all types of workloads. A nice short demo of z/Vos is on YouTube, along with many other videos that you can show to your iPhone-obsessed boss, demonstrating that consolidating those hundreds of MS SQL Server instances is also possible.
The story of virtualization continues to drive UNIX consolidation to the mainframe. In 2009 Allianz consolidated 60 servers into a single mainframe, saving substantial operating, licensing and energy costs while improving scalability. This story, detailed in a ComputerWorld article, is being repeated at many companies as the mainframe IFL, zIIP and zAAP specialty engines continue to bring processing power at PC or minimized prices. This consolidation activity has a very short-term return on investment, as these efforts usually pay for themselves in the first year and reduce power consumption dramatically, making them “green” projects.
The next time the hundreds of Windows or UNIX server configurations need an OS, database or other software upgrade, make sure to mention how System Z is saving other companies time, operating costs and overall costs by consolidating these environments to the best cost alternative -- the mainframe System Z.
The previous Part 1 and Part 2 discussions highlighted and discussed the various DB2 data warehouse DSNZPARMs for z/OS within Versions 8 and 9. With the new DB2 Temporal Tables, the system needs all the DB2 data warehouse DSNZPARMs enabled, along with all the other new and improved DB2 10 DSNZPARMs, to maximize performance. To ensure the new DB2 DSNZPARMs are enabled, below is a listing of the deprecated, improved and new DB2 DSNZPARMs to verify so that your data warehouse and other applications get the best performance available within your DB2 10 environment.
Since the storage model of the new DB2 10 system has moved many of the processing components above the 2GB memory line, most of these new and improved DB2 DSNZPARMs deal with new maximum memory settings. If you are monitoring system paging, running on hardware that has enough memory or running on one of the new z196 hardware platforms, leverage the new DB2 memory capabilities and the new hardware as soon as possible.
Remember that even with a number of the DB2 10 components moving above the 2GB memory bar, there are still many components (sometimes as much as 25%) below the bar. Monitor your system paging and the size of the DB2 memory footprint, and adjust your settings incrementally and carefully.
Remember that monitoring, analyzing, improving and repeating in small increments is the best way to provide the best DB2 Data Warehouse DSNZPARMS and a stable high performance environment.
To access the DB2 10 DSNZPARM data warehouse chart in PDF format, please go to DB2 Information on www.davebeulke.com.
Previously we talked about the first alphabetic group of DB2 data warehouse DSNZPARMs that can improve your access paths and overall application performance. This week the second set of DSNZPARMs is discussed. Many of the data warehouse DSNZPARMs discussed are somewhat hidden within the regular DSNZPARM install panels. All of the DSNZPARMs discussed are available in DB2 Version 9 for z/OS; some are also available in DB2 Version 8 or DB2 Version 10.
Caution needs to be taken with all system settings, and especially with these data warehouse DSNZPARMs. These DSNZPARMs are meant to change access paths and improve them, but each data warehouse design is unique, along with each application access path, so results will vary. If the data warehouse DB2 subsystem is shared with other OLTP or operational applications, I highly recommend fully documenting and setting up a full PLAN STABILITY plan and package management structure for your current access paths before changing any DSNZPARMs. This documentation, along with a good PLAN STABILITY DB2 plan and package management implementation and back-out practices, helps your environment quickly react to and back out any detrimental access paths encountered through an unexpected rebind of any program.
Some of the comments I've received regarding this issue highlighted the resurgence of data warehousing on the z/OS platform and why running a data warehouse on z/OS provides many advantages over other platforms. One advantage noted by several people is that when your data warehouse runs on z/OS, the huge ETL processes usually don't have to transmit the data over a network. Even though the network bandwidth is robust, avoiding this extra bottleneck can sometimes save hours of extra overhead, guaranteeing that your refresh jobs have enough time every day to provide critical refreshes of your data within your data warehouse.
Additionally, most of your source data warehouse data comes from the z/OS operational systems and can quickly be put into operational business intelligence data warehouses. This fresh data increases sales, provides real-time inventory or product availability updates and, the most important factor, removes latency from your critical single-point master data source of record for the enterprise.
Improve your system and application performance by adjusting these data warehouse DSNZPARMs to improve your access paths, leveraging the superior DB2 optimizer technology for the most efficient performance available.
Previously, I’ve talked about the new DB2 10 temporal tables and how they are great for data warehousing applications. To leverage the temporal tables and data warehousing applications within the DB2 for z/OS environment the system needs the proper DSNZPARM settings. Since everyone has too many systems, some organizations only review system DSNZPARMs during migrations, meaning that many of the new settings might not be set up or enabled. It’s a good idea to review the DSNZPARMs on a regular basis and get them set up to maximize the performance of your data warehousing application SQL by leveraging all the DB2 optimizer capabilities.
First, get a listing of the current DSNZPARMs for your data warehouse system's settings. This can be done a number of different ways: through your performance monitor, through the JCL that creates the DSNZPARMs or through the IBM-supplied REXX exec that calls the DB2 stored procedure DSNWZP. This DSNWZP stored procedure routine is one of the many DB2 administration procedures and is set up during the system install.
The data warehouse DSNZPARMs affect the performance of the DB2 system and its applications in many different ways, from the DB2 optimizer’s access path choices to the number of parallel processes DB2 will perform during data retrieval. To make sure that all DB2’s performance features and functions are available to improve performance I've created an alphabetic list of DB2 data warehouse DSNZPARMs. Making sure these data warehouse DSNZPARMs are enabled and adjusted for your system will help your application fully exploit DB2 and get the best performance possible for your data warehouse.
Well, the votes are in: 22 billion rows is big enough data. It's not the billions of web log rows of a Google or Facebook, but it's big enough for everyone. One of the comments that struck me was that one in a million happens 22,000 times. So whatever your criterion is for big data, it is more a state of mind about the amount of data as opposed to the actual terabyte amounts or the number of rows. Regardless of what database systems you work with, big is a relative term. Just ask your SQL Server, Oracle and Sybase DBA friends what they consider a big system. Usually the answer is nowhere near what you get for DB2 z/OS or even DB2 LUW systems. I talked about this last year in my blog (“Performance is Relative”).
Other comments and questions received about last week’s blog asked for more clarification on the idea of keeping a database design simple. So below are three different ways to keep your big data data warehouse design simple.
First: There are reasons Bill Inmon's and Ralph Kimball's decentralized and centralized data warehouse ideas are so popular: those design patterns work. Design patterns for all types of IT applications, Java/.NET MVC (model view controller), various business models and standard processes have been extensively analyzed and endorsed over many years through design pattern books, conferences and government studies. The decentralized and centralized data warehouse design patterns work, and your design should use them for your data warehouse performance. Big data or not, there is no reason to do something more complex. Starting with these design patterns, using and optimizing the simple pattern of Fact table(s) surrounded by Dimension table(s) will provide your data warehouse performance; a skeleton of this pattern appears after the third point below. Decentralizing or extending these design patterns with as many Fact tables and slowly changing Dimension tables as needed will optimize and minimize the amount of big data referenced in a typical transaction, and your data warehouse performance won't be an issue.
Second: Make sure to normalize your big data design. It's typical to try to consolidate everything within a data warehouse performance design. Unfortunately, having too many elements in a table forces too much data into an application transaction, and data warehouse performance can suffer. Just as decentralized and centralized data warehouse design patterns have been used for years, database table normalization has been around even longer because it logically optimizes your design. The database design normalization process has been documented everywhere over the years, and it is effective for making sure the table elements relate to the whole database table key(s). Combining table keys or designs causes excessive repeating data or data groups, while over-normalization leads to excessive application joins. Normalization is about striking a balance, and no one does it perfectly the first time. Normalize your data warehouse design several times and your transaction performance can strike a balance for all critical applications.
Third: Design, test, redesign, test and repeat. Schedule enough database and application meetings and testing time to understand all the application transactions and reports. Data warehouse performance and modeling big data can get unwieldy, so testing your design early is vital. Sometimes big data table population numbers cause tools to abort; cut the number of zeros down, model all your application transactions against your database design, build it and run applications against it. Data warehouse performance requires real-life testing and actually running the code or SQL that interfaces with the design. No one has time to do it perfectly, but everyone will be mad if you have to redesign and do it all over right before implementation. Know your performance before production through thorough testing. Big data and data warehouse performance require design and testing; make sure to do these several times during your development with as much big data as possible.
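To make the first point concrete, here is a hedged skeleton of the Fact and Dimension pattern with illustrative names only; a real design would have more dimensions, more measures and partitioning behind it.

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public final class StarSchemaDdl {
    public static void create(Connection con) throws SQLException {
        String[] ddl = {
            "CREATE TABLE DATE_DIM " +
            "( DATE_KEY  INTEGER  NOT NULL PRIMARY KEY, " +
            "  CAL_DATE  DATE     NOT NULL, " +
            "  CAL_MONTH SMALLINT NOT NULL, " +
            "  CAL_YEAR  SMALLINT NOT NULL )",

            "CREATE TABLE PRODUCT_DIM " +
            "( PRODUCT_KEY  INTEGER      NOT NULL PRIMARY KEY, " +
            "  PRODUCT_NAME VARCHAR(100) NOT NULL )",

            // Narrow fact table: only surrogate keys and additive measures, so a
            // typical transaction touches the minimum amount of big data.
            "CREATE TABLE SALES_FACT " +
            "( DATE_KEY    INTEGER       NOT NULL REFERENCES DATE_DIM, " +
            "  PRODUCT_KEY INTEGER       NOT NULL REFERENCES PRODUCT_DIM, " +
            "  QUANTITY    INTEGER       NOT NULL, " +
            "  AMOUNT      DECIMAL(11,2) NOT NULL )"
        };
        try (Statement st = con.createStatement()) {
            for (String s : ddl) {
                st.executeUpdate(s);
            }
        }
    }
}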
These are only some of the simple truths that ensure your data warehouse performance for your big data system is a success.
Was 22 billion rows Big Data? As I began the new data warehouse project, I knew that the data warehouse performance needed to be designed in from the very first data element. The new project was very interesting, with complex calculations, cross-referencing reports, Big Data implications and big data warehouse performance challenges. Many standard reports that affected whole industries and government policies were depending on data warehouse performance for timely reports and 100%, not even 99.999%, accuracy of every bit of the Big Data.
Another consideration was that the implementation deadline for this high performance Big Data data warehouse system was only 120 days away and wasn't movable. All of these reasons made this Big Data system quite the challenge. Our small team of business users, SQL analysts and technical staff was committed, and we were successful in completing the implementation using agile scrum development techniques. The data warehouse performance was so improved that some of the processes went from 37 hours to under 7 minutes, and some reports come back in seconds. Complex calculations and report verification done during our data quality testing found bugs in the old system's existing reports that had been trusted for over 20 years, so we knew everything was 100%. Our overall performance, 100% accuracy and overall quality implementation were a testament to the commitment of a winning team effort.
There were many success factors that drove the team, the design, development and successful implementation of this 22 billion row Big Data system. Three success factors drove the data warehouse performance.
First: Keep it simple. Yes, I know everyone talks about keeping it simple in every development and system design. Designing it and doing it simply can be much harder. Instead of endless discussions and meetings, design the big picture and then see how all the reports, every type of interface and the users interact with the warehouse. Data warehouse performance is achieved through computing power. Simple designs that leverage parallelism and that drive I/O and calculation computing power for quickly producing reports are key.
Second: Leverage all existing infrastructure and processes. The tight 120-day schedule from start to finish did not allow the team to have issues, only time-delayed solutions. Since we used an agile scrum type of development methodology, the agile scrum stories and the data warehouse performance of the Big Data reports were tested several times during the short development period. All used the same standard corporate testing procedures, documentation and approval processes that had been used for years. The team contacted everyone who was needed to guarantee the data warehouse performance, got them involved early and got their requirements, feedback, improvements and eventual approval for all the components and the implementation. Everyone knew the Big Data project schedule ahead of time, so all infrastructure and processes were ready. Issues were worked on until solutions were approved and implemented.
Third: Don't be intimidated by Big Data. I have been fortunate to design, develop, deploy and study data warehouse performance since 1988, when I implemented my first “Big Data” system for the financial firm E.F. Hutton. At least back then it was considered Big Data, and it seems that every data warehouse system I have dealt with over the years has had bigger data than the previous one. Big Data will keep getting bigger, and it is just another couple of zeros on the end of your table population and design considerations. Data warehouse performance is built on the same principles as many years ago. As I mentioned with the first success factor, design for both I/O and CPU parallelism in every single part of your system. Only then can your Big Data system have the data warehouse performance it needs to speed up your processes and reports from hours to just minutes or even seconds, making everyone, especially your users, happy.