Table (or range) partitioning is a powerful DB2 feature that supports good database design and can lead to easier maintenance operations, increased data availability, and better-optimized queries.
But why is table partitioning so good in a warehousing environment? Here are some reasons:
1. Range-specific maintenance operations
Where data partitions (ranges of data within a table) are placed in individual table spaces, maintenance operations can be targeted at active data only.
Many DB2 commands, for example REORG and BACKUP, can be specified to execute against specific table spaces or data partitions. This can significantly reduce maintenance times.
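As a minimal sketch of range-targeted maintenance (the table, partition, table space, and path names below are illustrative, not from the paper):

```sql
-- Reorganize the indexes for a single data partition only
REORG INDEXES ALL FOR TABLE sales ON DATA PARTITION part_2013q4;

-- Back up only the table space that holds the active range
BACKUP DATABASE salesdb TABLESPACE (tbsp_2013q4) ONLINE TO /db2backup;
```

Because both operations are scoped to one range, the rest of the table remains untouched and the maintenance window shrinks accordingly.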
2. Data lifecycle management
Partitioning large fact tables by date means that older data can be detached from the table as an online operation. This can eliminate the need for costly DELETE statements.
In addition, as data ages, it can be moved as an online operation to less costly storage. In DB2 V10, this multi-temperature data management is facilitated by the new storage groups feature.
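As a sketch, rolling out an aged range is a single online DDL statement (the table and partition names are hypothetical):

```sql
-- Detach the oldest quarter into an independent table (online operation)
ALTER TABLE sales DETACH PARTITION part_2012q4 INTO TABLE sales_2012q4;

-- The detached table can then be archived, exported to cheaper storage,
-- or dropped, without running an expensive DELETE against the fact table.
```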
3. Partition elimination
Range partitioning benefits queries that reference one range, or a subset of the ranges, of the partitioning key.
The DB2 optimizer can then eliminate entire data partitions from the query, and this reduction in rows read (I/O) can improve query performance.
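A minimal sketch of a range-partitioned table, with a query whose date predicate lets the optimizer skip the other quarters entirely (the schema is illustrative):

```sql
CREATE TABLE sales (
    sale_date DATE          NOT NULL,
    store_id  INTEGER,
    amount    DECIMAL(12,2)
)
PARTITION BY RANGE (sale_date)
  (STARTING FROM ('2013-01-01') ENDING AT ('2013-12-31') EVERY (3 MONTHS));

-- The range predicate allows the optimizer to eliminate three of the
-- four quarterly data partitions from the scan:
SELECT SUM(amount)
FROM sales
WHERE sale_date BETWEEN '2013-10-01' AND '2013-12-31';
```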
4. Local indexes
Local indexes can help to significantly reduce index maintenance and increase query performance where significant sorting is not required.
In addition, local indexes can be placed in separate table spaces which provides more flexibility in building a backup schedule and a recovery strategy.
For example, in a restore scenario you have the choice between restore or rebuild of indexes.
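For example, a local (partitioned) index is created with the PARTITIONED keyword; one index partition is created per data partition (the names here are hypothetical):

```sql
-- One index partition per data partition; index maintenance during
-- ATTACH and DETACH operations is local to the affected range.
CREATE INDEX ix_sales_store ON sales (store_id) PARTITIONED;
```

The INDEX IN clause on each partition of the CREATE TABLE statement can additionally direct each index partition into its own table space.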
5. Backup performance
Backup performance can be improved by backing up just those table spaces (table ranges) that are active.
By balancing the average size of your table spaces, parallelism within the BACKUP operations can also increase, helping to reduce the elapsed time of your backup operations.
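As an illustrative sketch, a table-space-level backup can name only the active ranges and request explicit parallelism (all names and values below are assumptions):

```sql
-- Back up only the active table spaces, with explicit parallelism
BACKUP DATABASE salesdb
  TABLESPACE (tbsp_2013q3, tbsp_2013q4)
  ONLINE TO /db2backup
  WITH 4 BUFFERS BUFFER 4096 PARALLELISM 4;
```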
These and other warehouse design issues are discussed in our many papers on warehousing. If you have any comments or experiences you would like to share with the authors, please leave a comment.
The table below, an excerpt from the best practices paper “Transforming IBM Industry Models into a production data warehouse”, describes guidelines for implementing an intelligent table space design strategy that gives you the flexibility you need to meet your service level objectives for all workloads: not just query, but backup, archive, maintenance, recovery, and ETL.
For this and more best practice guidelines for data warehousing, visit DB2 LUW Best Practices on developerWorks.
You can use DB2 silent installation and uninstallation to install or uninstall DB2 products and components without user interaction. Silent installation is useful for large-scale deployments of DB2 product editions. It is also useful when you need to embed the DB2 installation and uninstallation processes within the installation process of solutions that include DB2 products.
This paper covers the following tasks:
- silent DB2 installation and uninstallation
- silent DB2 fix pack updates
- silent DB2 upgrades to later product versions.
In addition to providing detailed recommendations for each task, the paper also includes practical scenarios to help you implement silent DB2 installation and uninstallation in your own environments.
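As a sketch of how a silent installation is typically driven, a response file is passed to db2setup (the file path, keywords, and values below are illustrative assumptions, not taken from the paper):

```shell
# Sample response file (db2server.rsp) -- keywords shown are illustrative:
#   PROD          = DB2_SERVER_EDITION
#   LIC_AGREEMENT = ACCEPT
#   INSTALL_TYPE  = TYPICAL

# Run the installation silently, writing a log for later review
./db2setup -r /tmp/db2server.rsp -l /tmp/db2setup.log
```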
Share your impressions and questions about this paper by adding a comment on the paper's web page: https://ibm.biz/Bdx8Hr
You will need to log in to developerWorks with your IBM ID first.
The best practice paper Managing data growth provides a wealth of recommendations to help you design and manage a database environment for efficient data growth, including tips on how to choose the right distribution key for a partitioned database:
Database partitioning helps you to adapt to data growth by providing a way to expand the capacity of the system and scale for performance. A distribution key is a column (or group of columns) that is used to determine the database partition in which a particular row of data is stored. The following guidelines will help you to choose a distribution key.
- Choose the distribution key from those columns having the highest cardinality. Unique keys are good candidates. Columns with uneven data distribution or columns with a small number of distinct values might result in skew, where query processing involves more work on a subset of database partitions and less work on others.
- Choose the distribution key from columns with simple data types, such as integer or fixed length character; this will improve hashing performance.
- Choose the distribution key to be a subset of join columns to facilitate join collocation.
- Avoid choosing a distribution key with columns that are updated frequently.
- In an online transaction processing (OLTP) environment, ensure that all columns in the distribution key participate in transactions through equality predicates. This ensures that an OLTP transaction is processed within a single database partition and avoids the communication overhead inherent with multiple database partitions.
- Include columns that often participate in a GROUP BY clause in the distribution key.
- Unique index key columns must include all of the distribution key columns. The DB2 database manager creates unique indexes to support unique constraints and primary key constraints, and these system defined unique indexes have the same requirement that the constraint key columns must be a superset of the distribution key.
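The guidelines above can be sketched in DDL. Here the high-cardinality, rarely updated join column customer_id is chosen as the distribution key, and the primary key includes it to satisfy the unique-index requirement (the schema is hypothetical):

```sql
CREATE TABLE orders (
    order_id    BIGINT  NOT NULL,
    customer_id INTEGER NOT NULL,
    order_date  DATE,
    -- The unique index that enforces this constraint must include all
    -- distribution key columns, so customer_id is part of the key:
    PRIMARY KEY (order_id, customer_id)
)
DISTRIBUTE BY HASH (customer_id);
```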
If you have any comments or questions for the authors of this best practice paper, feel free to log a comment on the paper's summary page and we will respond. You need to log in with your IBM ID to be able to enter comments. Registering your ID is free and easy at developerWorks.
For this and more best practice guidelines for managing DB2 products, visit DB2 for Linux, UNIX, and Windows Best Practices
A new supplement to the popular DB2 best practices paper "Implementing DB2 Workload Management" has just been published. The supplement will help you set the DB2 client information fields for a variety of common middleware applications.
You can find it, along with other useful supplements, on the paper's information web page: https://ibm.biz/Bdx2n6
The DB2 client information fields are available on each connection to a database. These fields enable an external application that is using a connection to provide additional information to the DB2 database server that can be used to discriminate among connections based on end-user identification. The values in the client information fields are reported by DB2 for Linux®, UNIX®, and Windows® and other members of the DB2 family through various database monitoring and auditing interfaces. They are also leveraged by the DB2 workload definition in DB2 for Linux, UNIX, and Windows Version 9.5 and later as another way to aggregate connections to the database for purposes of monitoring and control.
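For example, an application can set these fields on its current connection through the WLM_SET_CLIENT_INFO procedure (the values shown are illustrative):

```sql
-- Tag the current connection with end-user identification;
-- AUTOMATIC leaves the client workload behavior unchanged.
CALL SYSPROC.WLM_SET_CLIENT_INFO(
    'jsmith',          -- client user ID
    'ws-finance-042',  -- client workstation name
    'QuarterlyReport', -- client application name
    'ACCT-7701',       -- client accounting string
    'AUTOMATIC');      -- client workload behavior
```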
Share your impressions and questions about the paper and supplements by adding comments to the web page (you need to join developerWorks and log in first).
You are a busy professional and you don't always have the time and resources to travel to a technical conference, or call into a live web presentation, to listen to technical experts give great presentations about the products and technology you care about. Recorded webcasts offer you the benefits of listening to the same experts, at your own convenience, at the office or at home.
In this new DB2 best practices webcast, DB2 pureScale performance and monitoring
, watch and listen as Steve Rees, Senior Technical Staff Member and performance expert at the IBM lab, explains a wide array of configuration and tuning best practices to make your DB2 pureScale environments perform in an optimal fashion.
The tips and techniques presented in this webcast reflect information validated through the DB2 team's internal performance testing, as well as performance benchmark tests and customer engagements in real life DB2 pureScale environments.
In addition to the webcast video, you can download the presentation slides and transcript for easy offline viewing and reading.
If you have questions for the authors please add a comment to this blog or on the webcast's web page.
The examples in this paper are based on DB2 V10 fix pack 2 and GPFS 188.8.131.52 efix13 installed on AIX 6.1 TL6 SP5, but they can be extended to more recent versions and other supported platforms. All versions of GPFS are supported with DB2 for Linux, UNIX, and Windows; however, the latest supported fix packs are recommended to ensure the best quality experience.
Technical paper summary:
In today’s highly competitive marketplace, it is important to deploy a data processing architecture that not only meets your immediate tactical needs, but that also provides the flexibility to grow and change to adapt to your future strategic requirements. To help reduce management costs, add flexibility, and simplify the storage management of your DB2® for Linux®, UNIX®, and Windows® installation, you need to choose a file system that is designed to provide a dynamic and scalable platform. The IBM® General Parallel File System™ (GPFS™) is a powerful platform on which to build this type of relational database architecture. This paper describes why GPFS is the right file system to use with DB2 databases by outlining the benefits and providing best practices for deploying GPFS with DB2 software. In addition, a section has been added to this paper to describe the DB2 pureScale feature, and how it configures and uses GPFS.
The new video from the DB2 team, "Getting up and running with HADR", provides a demonstration of how straightforward it is to set up HADR.
As we set up HADR in the video, we provide insight into some of the more important configuration decisions we are making, hopefully heading off some of the more common issues users face when setting up HADR.
Announcing a new best practice paper: "Building a data migration strategy with IBM InfoSphere Optim High Performance Unload".
This paper addresses the topic of data migration and how you can use HPU to build a data migration strategy in which data is migrated automatically, on a schedule, from source to target database with no manual steps.
No longer do you have to grapple with reserving large amounts of storage capacity on the source or target database to stage data; no longer do you have to worry about preserving identity (surrogate) keys; no longer do you have to worry about generating subsets (ranges) of data to be migrated; and no longer do you have to worry about different DB2 software levels or distribution maps.
This newly published paper, the second on HPU (the first looked at using HPU as part of a recovery strategy), shows how you can build and implement a data migration strategy using HPU. In testing the recommendations in this paper, we used both an IBM Smart Analytics System and an IBM PureData for Operational Analytics System.
If you have questions for the authors please add a comment to this blog or against the relevant paper.
We are happy to announce the publication of a new best practices paper to help you understand and get the most from the new BLU Acceleration technology introduced in DB2 for Linux, UNIX, and Windows V10.5: Optimizing analytic workloads using DB2 10.5 with BLU Acceleration (https://ibm.biz/BdDrnq)
BLU Acceleration is a new collection of technologies for analytic queries that are introduced in DB2 for Linux, UNIX, and Windows Version 10.5. At its heart, BLU Acceleration is about providing faster answers to more questions and analyzing more data at a lower cost. DB2 with BLU Acceleration is about providing order-of-magnitude benefits in performance, storage savings, and time to value.
This paper gives you an overview of these technologies, recommendations on hardware and software selection, guidelines for identifying the optimal workloads for BLU Acceleration, and information about capacity planning, memory, and I/O.
Welcome to the new DB2 Best Practices developerWorks community group.
You will find here technical papers, videos, and webcasts that offer expert advice on a variety of topics, to help you get the most out of your DB2 solutions, including DB2 for Linux, UNIX, and Windows, DB2 pureScale, and InfoSphere Warehouse products.
The best practices are written and tested by teams of technical experts who work in the IBM DB2 development and quality assurance teams, as well as with customers just like you, to determine and document the practical recommendations that can help you save time and resources on your information management projects.
If you are already familiar with our DB2 best practices on developerWorks, this new group is the next step and replaces the previous site. It includes all the great technical papers that we have published in the past, with the addition of new papers and videos on an ongoing basis.
Although you can already browse and download all the best practices in this group, we encourage you to join the group for added interaction with our teams. As a member, you are able to rate individual papers and leave comments about them, to help us make the best practices better. You can also subscribe to individual pages so that you will get notified when they get updated or when new comments are added.
If you like a particular best practice, you can also use developerWorks' sharing feature to easily share pages with friends and colleagues on a variety of social networks such as Facebook, Twitter, and LinkedIn.
One of our most popular best practices papers is now completely revised and updated to provide recommendations for the latest DB2 environments, including DB2 V10.1, DB2 V10.5, and PureData System for Operational Analytics.
Tuning and monitoring database system performance (https://ibm.biz/Bdx2nt) is available for download from our DB2 best practices community.
Most DB2 systems go through something of a “performance evolution”. The system must first be configured, both from hardware and software perspectives. In many ways, this sets the stage for how the system behaves when it is in operation. Then, after the system is deployed, a diligent DBA monitors system performance, in order to detect any problems that might develop. If such problems develop, we come to the next phase – troubleshooting. Each phase depends on the previous ones, in that without proper preparation in the previous phase, we are much more likely to have difficult problems to solve in the current phase.
This paper presents DB2 system performance best practices following this same progression. We begin by touching on a number of important principles of hardware and software configuration that can help ensure good system performance. Then we discuss various monitoring techniques that help you understand system performance under both operational and troubleshooting conditions. Lastly, because performance problems can occur despite our best preparations, we talk about how to deal with them in a stepwise, methodical fashion.
We have just published a new best practices paper for IBM Smart Analytics System and IBM PureData System for Operational Analytics customers: Performance monitoring in a data warehouse.
This best practices paper covers real-time monitoring of the IBM Smart Analytics System and IBM PureData System for Operational Analytics. You can apply most of the content to other types of clusters of servers running a data warehouse system with DB2 software and database partitioning under AIX and Linux operating systems. The focus of this paper is finding the reasons for performance problems. These can be bottlenecks that are in the operating system, are in the DB2 database software, or are related to a single query. The focus is on data warehouse systems with long-running queries rather than transactional systems with mostly short queries.
A main goal of this paper is to provide a set of key performance indicators (KPIs) or metrics for the operating system and DB2 software, along with a methodology for analyzing performance problems in a distributed DB2 environment. This paper describes scenarios to help you gather the right information depending on the symptoms of the performance problem.
This paper first provides an overview of the approach and what to consider in general when monitoring the performance of a data warehouse. It then describes the most important operating system and DB2 metrics for multiserver data warehouse systems. The last section describes in detail several performance problem scenarios that are related to data warehouse or BI workloads and explains how to use the metrics for analyzing the problems.
Most of the information about KPIs that are described in the paper has sample commands that extract actual values. However, these examples are not intended to provide comprehensive tooling. You can use this best practices paper as a guideline for selecting the metrics to focus on when using monitoring tools such as IBM InfoSphere® Optim™ Performance Manager.
We are pleased to announce the release of a new DB2 best practices paper: Troubleshooting DB2 servers
Even in a perfectly engineered world, things can break. Hardware that is not redundant can fail, or software can encounter a condition that requires intervention. You can automate some of this intervention. For example, you can enable your DB2 server to automatically collect diagnostic data when it encounters a significant problem. Eventually, however, a human being must look at the data to diagnose and resolve the issue. When the need arises, you can use several DB2 troubleshooting tools that provide highly granular access to diagnostic data.
The information and scenarios in this paper show how you can use the DB2 troubleshooting tools to diagnose problems on your server.
In large database environments, the collection of diagnostic data can introduce an unwanted impact to the system. This paper shows how you can minimize this impact by tailoring the values of a few basic troubleshooting configuration parameters such as diagpath, DUMPDIR, and FODCPATH and by collecting data more selectively.
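As a sketch, the diagnostic data path and size can be tailored with the update dbm cfg command, and FODC output redirected through the DB2FODC registry variable (the paths and sizes below are placeholders):

```shell
# Redirect diagnostic data and cap its total size (DIAGSIZE in MB)
db2 update dbm cfg using DIAGPATH /db2diag/ DIAGSIZE 2048

# Direct first-occurrence data capture (FODC) output to its own path
db2set DB2FODC="FODCPATH=/db2fodc"
```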
The result? When things do break, you are well prepared to make troubleshooting as quick and painless as possible.
The following DB2 troubleshooting scenarios are covered in this paper:
Troubleshooting high processor usage spikes
Troubleshooting sort overflows
Troubleshooting locking issues
For each scenario, this paper shows you how to identify the problem symptoms, how to collect the diagnostic data with minimal impact to your database environment, and how to diagnose the cause of the problem.
The target audience for this paper is database and system administrators who have some familiarity with operating system and DB2 commands.
This paper applies to DB2 V10.1 FP2 and later, but many of the features that are described here are available in earlier DB2 versions as well.
Help us provide you with the best practices you need the most.
What business scenarios do you need to implement with DB2 products that you need practical recommendations to help you?
What issues are you facing for which you have not found practical recommendations to help solve them?
Let us know what topics would be the most helpful for us to write and publish best practices in this community. Simply add a comment to this blog post with your ideas. The more detailed the better!
We will review and evaluate them with our product experts, so that we can prioritize our projects to help as many of you as possible.
We cannot promise to publish best practices that will answer all of your questions, but we will definitely review all of them so that we can prioritize our efforts.
P.S. You will need to log in with your IBM developerWorks user ID in order to add comments.
Are you also a DB2 for z/OS user? If so, don't miss the latest best practices webcasts published by the DB2 for z/OS team:
Check out these new titles and many more at the DB2 for z/OS Best Practice community --https://ibm.biz/BdxkKb
Have you downloaded and read any of the DB2 LUW best practices papers and videos in the past year? Has any of them helped you solve a DB2 problem, implement a new scenario, or helped you learn something new? If so, which ones?
Let us know what you think! Post your comments here to help us continue to write and publish best practices that are useful to you.
We are pleased to announce the release of a new DB2 and Optim best practice webcast: Deep-dive performance analysis using InfoSphere Optim Performance Manager V5.3
This new webcast by IBM DB2 performance expert Steve Rees provides an in-depth tutorial to help you use IBM InfoSphere Optim Performance Manager V5.3 (OPM) to diagnose, analyze, and correct potential performance problems with your DB2 system. Some of the topics covered in the webcast include:
Overview of the new and improved features in OPM V5.3
Setting a baseline for performance metrics
Investigating system bottlenecks
Recognizing a system-level disk bottleneck
Recognizing a system-level CPU bottleneck
Recognizing a system-level locking bottleneck
Drilling down – diagnosing slow queries with OPM
... and more.
Let us know what you think by leaving feedback on the webcast's description page.
IBM Knowledge Center Open Beta is available!
We are very happy to announce the availability of our open IBM Knowledge Center Beta, live on ibm.com.
You can access IBM Knowledge Center here:
The Beta will run until the end of February 2014.
Improving your technical content experience
IBM Knowledge Center is our new technology designed to bring IBM's technical publications together in a single location, and will replace our individual IBM Information Centers.
For this release, we simplified the user experience, improved search, and refined the overall experience with many other enhancements. As always, you can get help on IBM Knowledge Center from the information icon in the upper right corner of the pane (also linked here):
Send us your feedback!
After you've worked with IBM Knowledge Center, sign in with your IBM ID and take a few moments to complete the survey on IBM Knowledge Center located here:
Known Beta limitations
We are still:
Fine tuning IBM Knowledge Center, so you might experience some minor functional issues
Configuring and adding content to IBM Knowledge Center, so the content you see might not be exactly what you expect
Configuring and indexing content for search, so search results might not be exactly what you expect, or might not be in all the languages you expect.
If you are currently using IBM PureData System for Analytics, powered by Netezza technology, or are interested in learning more about it, you will be happy to know that we have a new best practices community for it:
In this community you will find a number of useful articles and papers to help you learn about the product's features, and get valuable recommendations and tips to get the most from it.
Provide us with feedback by leaving comments after you have downloaded and read the papers.
We are pleased to announce that the best practices paper Writing and tuning queries for optimal performance has been updated.
The old PDF paper has been replaced by a roadmap to the most up-to-date information in the DB2 V10.5 Information Center. The best practices information covers the most current DB2 V10.5 products.
An updated version of the best practices paper entitled "Expanding an IBM Smart Analytics System database and redistributing data" is now available: https://ibm.biz/Bdx2eF
The paper incorporates updated information based on user testing and additional information for IBM PureData System for Operational Analytics.
The trend in programming today is toward greater diversity in data stores that can be applied to a broad set of applications. Developers and data architects require the ability to work not only with traditional relational databases but also with document-based databases.
A Meetup is scheduled for November 6 at the Mandalay Bay Convention Center in Las Vegas to highlight the significance of open interfaces and open source in the vibrant and rapidly evolving world of NoSQL, MongoDB, and Big Data in the Cloud. Come meet with us to learn how open technologies are changing the face of computing and how they participate in the evolving open architecture.
This is a three hour event with a panel, demos, lightning talks, stimulating discussions, networking and refreshments. Register now for the Big Data Developers Meetup and, after registering, you will see the meetup location at the Mandalay Bay Convention Center in Las Vegas.
More information on the meetup can be found at: http://bit.ly/MeetupIOD
Interact with industry experts. Challenge your knowledge of open technologies. Join the discussion.
Come to the first Cloud Foundry Meetup in the Waltham area this coming Wednesday, December 11th!
This meetup is your opportunity to learn more about Cloud Foundry and meet people excited about the technology.
On the agenda is an Introduction to Cloud Foundry: the technology and the community by Chris Ferris of IBM.
This will be followed by a talk by Renat Khasanshyn of Altoros on Implementing Cloud Foundry 2.0.
More information at: http://bit.ly/1azS5PX
Access to IBM DB2 information centers will be redirected to IBM Knowledge Center at ibm.biz/IBMKCgo very soon. We encourage you to explore IBM Knowledge Center now, as we are nearing the end of its open beta period.
If you haven't already heard, IBM Knowledge Center offers many benefits that clients are excited about. One of the biggest causes for excitement is that the product documentation for all IBM products is now on one website rather than on 800 or more separate information centers. The interface lets you search, filter, and browse through this information efficiently so that you aren't slowed down by the volume of information that is there.
So, if you haven't had time to check out IBM Knowledge Center, take some time to try it this week. You can find useful information about how to get the most out of IBM Knowledge Center by following this Technical Content blog @ ibm.biz/IBMKCTCBlog.
After you have spent a little time exploring your favorite IBM product documentation in IBM Knowledge Center, you are invited to take a short survey about your first impressions: https://www-950.ibm.com/survey/oid/wsb.dll/s/ag554. We will use feedback that we get from you and other respondents as we consider possible future enhancements to the IBM Knowledge Center user experience.