DFSMS striping was first introduced in 1993, and it was a success. Striping has served a dual purpose: first, to avoid hot spots on individual disks, and second, to increase sequential throughput for individual data streams. Since 1993, alternative techniques have been developed to avoid or dissipate a “hot spot”. Some examples:
Consequently, DFSMS striping is perhaps not needed or is not the best technique for avoiding hot spots.
IBM DB2 for z/OS Development used to recommend striping because parallelism helped over slow channels, but channel speed is now so high that the disadvantage of doing more I/Os tends to outweigh any advantage of parallelism.
Just because one particular workload might benefit from log striping does not mean that no workloads will suffer.
DB2 uses various data transfer sizes. In all current versions of DB2, dynamic prefetch reads at least 128K, but DB2 utilities read or write at least 256K. Depending on the buffer pool size, sequential prefetch may read up to 256K and utilities may read and write up to 512K. However, if a DB2 query, insert stream, or utility is CPU bound, then striping will not make it faster. This is a common situation, except where creating or restoring image copies is concerned.
Striping also will not help if any of the components in the data paths are saturated. These paths consist of channels, host adapters, and the bus in each control unit. Take for example the DB2 RECOVER utility. When you want to recover an individual table space, the paths are not saturated and striping will reduce the time to restore the table space. However, if you want to recover the entire system, there will be many RECOVER jobs running in parallel, and the paths may get saturated, in which case striping may well not reduce the time to restore the system.
A category of I/O that can possibly benefit from striping is list prefetch. Log apply makes extensive use of list prefetch. If a table space is striped across a single RAID rank, list prefetch may improve modestly if the devices consist of HDDs, and may show more improvement if the devices consist of SSDs. Heavy parallelism tends to reduce the striping benefits with HDDs, but striping continues to do well with SSDs under heavy amounts of parallelism.
To summarize the value of striping for table spaces, the most benefit may occur with list prefetch when using SSDs, and the applications that benefit the most will likely be long running queries and utilities. Striping may help with RECOVER: restore from image copies, and to then apply log records. However, everyday workloads will not necessarily get any benefit from striping.
During normal OLTP operations, DB2 active logs are written, and are read only to archive the log records and for software replication. Most log writes are synchronous I/Os and each I/O consists of a small number of 4K pages. Faster channels have increased log bandwidth. However, most OLTP workloads never push the log throughput beyond 10 MB/sec. I/O response times for online transactions gradually degrade when log buffers queue up waiting for the previous I/O to complete.
As technology has evolved, many customers have used remote DASD replication to protect themselves against disasters. Remote replication is often the cause of I/O performance problems. Since all log writes caused by OLTP are synchronous, log I/O is very susceptible to problems caused by remote DASD replication. Each I/O introduces a risk of another delay, and since striping causes more I/Os, striping increases the risk of a delay. Remote replication is not the only thing that causes stress on a control unit, but it is one of the biggest causes.
Striping the active log may help the performance of active log reads as of DB2 9, because the I/O quantity for active log reads was increased from 12 pages to 120 pages. However, given 4 Gbps and faster links, the risks of striping the active logs will likely outweigh the benefits - see below.
DFSMS striping of the DB2 active log is no longer a tuning option that DB2 for z/OS Development is actively testing and recommending. There have been 'hole in the recovery log stream' issues with the use of striping with DB2 10 for z/OS in failure scenarios, albeit for a tiny number of customers. The problem is that there is no encapsulation of a striped I/O, i.e., some stripes make it and some may not when there are failures. DFSMS striping has always worked that way. So we in DB2 for z/OS Development are now actively discouraging customers from using striping when the benefits are speculative at best. For customers who are already striping active log datasets there is a performance risk in removing striping, and on the other hand the chance of hitting the above problem is very small. Nevertheless, there is risk. There is also risk in using striping with table spaces with a large page size, i.e., a torn page because not all of the stripes made it.
As has been true in the past, hardware technology is forever changing and the value of DFSMS striping will probably continue to erode. Just because one particular workload may possibly benefit does not mean that no workloads will suffer.
Many customers experience problems when allocating large datasets, because the datasets often end up being allocated in many extents. This can affect performance and availability through high-frequency extent allocation or, in extreme cases, lead to extent allocation failure. The following is a partial rework of information provided by John Campbell, IBM DB2 for z/OS Distinguished Engineer, in response to a real customer problem.
Here is the story:
Now, what follows is NOT meant to sound like a sales pitch, but the correct technical solution is to add more zIIPs. It would be prudent to have the zIIPs run in the 30-50% CPU busy range on average (peaks would obviously be higher). zIIPs should be thought of as assist processors and they are not intended to be run as hard as GCPs. This recommendation is not specific to DB2 10, but with DB2 10, it has become even more critical. This is the option that Customer X decided to take - they have now refreshed their processor technology and increased the ratio of zIIP engines to GCP engines. Consequently the original elapsed time performance problems have gone away.
Michael_D. 1100004WAH Visits (814)
The technical white paper, IMW14819USEN-00, IBM DB2 11 extended LRSN and RBA format support enhances availability and addresses business risk, has been published.
Whether you are installing a new IBM DB2 application, designing your recovery strategy or preparing for increased workloads, your plans must account for the impact on the DB2 log. This vital feature keeps a record of every change to data within DB2 and provides the functionality needed for DB2 restart, transaction backout, and data and object recovery. Since the release of DB2 Version 1 for IBM MVS™, an RBA has been used to identify the location of records within the DB2 log. The original design of the RBA, as a 6-byte value, meant that DB2 could address up to 256 TB of log data. Thirty years ago, that amount of data seemed huge, enough for the foreseeable future.
However, over recent years, it has become clear that the amount of data stored in DB2 has increased exponentially, as has the transaction volume and the number of concurrent users. At the same time, disk performance and capacity have evolved dramatically to accommodate the volume of data and to allow applications speedy and reliable access to it, with increased storage capacity and improved data transfer times. With the introduction of DB2 10 for z/OS and the elimination of almost all virtual storage constraints, customers have taken the opportunity to vertically scale their DB2 systems even further.
These developments have made it clear that the current design of the DB2 log with a 6-byte RBA—and additionally a 6-byte LRSN in data sharing—no longer provides sufficient log capacity. The solution to this problem, as described in this white paper, is to extend the RBA and the LRSN to 10 bytes. This optional feature was delivered in DB2 11 for z/OS, giving customers the ability to choose whether to implement the extended RBA and LRSN during the lifetime of DB2 11 for z/OS.
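The capacity figures follow directly from the width of the RBA: an n-byte value can address 2^(8n) bytes. The short sketch below works through that arithmetic. The function names and the sample logging rate are assumptions chosen for illustration and are not taken from the white paper.

```python
TIB = 2 ** 40  # bytes in one tebibyte

def addressable_bytes(rba_width_bytes: int) -> int:
    """Number of log bytes that an RBA of the given width can address."""
    return 2 ** (8 * rba_width_bytes)

def days_to_exhaust(rba_width_bytes: int, log_bytes_per_sec: float) -> float:
    """Days until the RBA range is used up at a constant logging rate.

    The constant rate is a simplifying assumption; real logging rates vary.
    """
    seconds = addressable_bytes(rba_width_bytes) / log_bytes_per_sec
    return seconds / 86_400

basic = addressable_bytes(6)      # original 6-byte RBA
extended = addressable_bytes(10)  # extended 10-byte RBA

print(f"6-byte RBA range : {basic // TIB} TiB")
print(f"10-byte RBA range: {extended // basic:,} times the 6-byte range")

# Hypothetical sustained rate of 10 MB/sec, used only to show the scale.
print(f"Days to exhaust a 6-byte RBA at 10 MB/sec: {days_to_exhaust(6, 10_000_000):.0f}")
```

Even a modest sustained logging rate works through the 6-byte range in a time span that matters for planning, which is the motivation for the extended format.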
This white paper was written primarily for DB2 for z/OS systems programmers and DBAs who are responsible for implementing the extended RBA and LRSN. It is also intended for IT executives and project managers who are involved in the decision-making and planning processes.
agburke 060001QPDN Visits (1334)
DB2 10 re-introduced the ability for remote connections to benefit from the dynamic JCC packages being bound with RELE
1) The package that the remote application will use must be bound with RELEASE(DEALLOCATE), i.e. SYSLN200 for cursor stability without HOLD and SYSLH200 for cursor stability WITH HOLD. We would not suggest rebinding all of the dynamic JCC packages that are used by all applications to use DEALLOCATE. You should bind a copy of the packages under another collection (other than NULLID), and use the currentPackageSet special register to point those privileged applications to those packages.
2) To enable or disable DDF from treating the packages as DEALLOCATE you would use the -MODIFY DDF PKGREL(BNDOPT) command to turn on and -MODIFY DDF PKGREL(COMMIT) to turn off the high performance DBAT behavior. This allows you to force the packages to behave as RELEASE(COMMIT) when you need DDL, BINDs, or utilities to be able to break-in.
As a safety valve the high performance DBATs could only be reused 200 times before they were destroyed, in order to avoid thread footprint growth, and release certain locks.
None of this is new, but IBM released PI20352 (UI19900) in mid-2014 to help scenarios where the connections were dropping before the thread was reused 200 times. When this occurred, the DIST non-preemptible SRB CPU time could be driven up and the performance benefit of high performance DBATs would degrade. PI20352 allowed high performance DBATs to be pooled and reused.
However, it was discovered that when these pooled DBATs were reused by a new request from a different application, some of the locks could persist from the previous thread. Because these package allocation and table space intent locks were hanging around, IBM has now introduced the -MODIFY DDF PKGREL(BNDPOOL) option with PI31597. This option is for customers that want the DBATs to be pooled in case the connection goes away. In other words, you want the same application to come in and reuse that DBAT, because the locks could persist on that thread.
Here is a description for the 3 options for PKGREL.
With PI31597 there is another option for the -MODIFY DDF PKGREL() command:
BNDPOOL - New - The rules of the RELEASE bind option that was specified when the package was bound are applied to any package that is used for remote client processing. The high performance DBAT used by a terminating client connection will be pooled.
BNDOPT - The rules of the RELEASE bind option that was specified when the package was bound are applied to any package that is used for remote client processing. The high performance DBAT used by a terminating client connection will be deallocated. BNDOPT is the default value of the MODIFY DDF PKGREL command. Note: if you have PI20352 applied WITHOUT PI31597, then the BNDOPT behavior is like that of BNDPOOL, where the DBAT will be pooled, possibly leading to the accumulation of locks.
COMMIT - The rules of the RELEASE(COMMIT) bind option are applied to any package that is used for remote client processing. COMMIT is the default value when the CMTSTAT subsystem parameter is set to ACTIVE. If the MODIFY DDF PKGREL command has never been issued, then COMMIT is the default value. No high performance DBATs exist while the PKGREL option is set to COMMIT.
So how do I know which option to use?
The 2 key points here are whether or not the application does more than 200 commits before deallocating or disconnecting, AND whether or not you have segregated different applications to different collections for the DEALLOCATE packages.
To determine such candidates you can look at IFCID 365, as below and compare the 'INITIATED FROM REMOTE SITE' vs. the 'DEALLOCATED CONVERSATIONS'. If the ratio is 200:1 or greater then it is a great candidate for DEALLOCATE in general.
You can also look in the accounting short report and compare the #DDFS (occurrences) vs. the CONV (conversations). Here we see 244 occurrences and 0 conversations ended, so it would be a prime candidate.
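The 200:1 comparison can be expressed as a small check. This is only a sketch: the function name and the way the two counters are passed in are assumptions for illustration, and the values would have to be read manually from the IFCID 365 data or the accounting report.

```python
def is_deallocate_candidate(initiated: int, deallocated: int, threshold: int = 200) -> bool:
    """Return True when the reuse ratio suggests a RELEASE(DEALLOCATE) candidate.

    initiated   -- count such as 'INITIATED FROM REMOTE SITE' or #DDFS occurrences
    deallocated -- count such as 'DEALLOCATED CONVERSATIONS' or CONV ended
    threshold   -- the 200:1 ratio mentioned in the text
    """
    if initiated <= 0:
        return False  # no activity observed, nothing to conclude
    if deallocated == 0:
        return True   # work was done and no conversation ended in the interval
    return initiated / deallocated >= threshold

# The accounting example above: 244 occurrences, 0 conversations ended.
print(is_deallocate_candidate(244, 0))

# Hypothetical counters for comparison.
print(is_deallocate_candidate(2_000, 10))  # ratio 200:1
print(is_deallocate_candidate(1_000, 10))  # ratio 100:1
```

A single interval can be misleading, so it is worth applying the same check to several reporting periods before rebinding anything.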
Given you know the behavior of the application as shown above the settings for PKGREL() could be:
COMMIT - is the default and you are not utilizing High Performance DBATs
BNDOPT - if your applications do not deal
BNDPOOL - if the applications do not reuse the thread 200 times as shown above, but there is only 1 application per JCC package collection (not using the default NULLID collection name) then it would not matter as much that the locks persist across threads, because they are accessing the same objects. However you would get the CPU savings of having the threads pooled so the thread could actually be reused 200 times before it is destroyed.
agburke 060001QPDN Visits (1120)
There are two APARs to point out when looking to upgrade DB2 for z/OS when you are using QRep or SQL Replication (now referred to as InfoSphere Replication Server).
The first, PM96954, addresses several issues customers have seen, most notably high CPU overhead in Qapply when the same row is being updated repeatedly, as well as Qapply ending with ASN0543E due to changes occurring in the DB2 catalog during an online migration.
This informational APAR tracks PTFs for InfoSphere Replication Server which constitute the recommended service level during an upgrade of DB2 for z/OS.
Remember that if you plan on going to DB2 11 you must be on InfoSphere Replication Server V10.2.1 prior to migrating to V11 CM due to the 10-byte extended RBA values. The IFCID 306 record is changed in CM, and any product which reads these records will need to be on the appropriate maintenance level.
agburke 060001QPDN Visits (1458)
There are 2 APARs for DB2 10 which could affect Class 3 wait time in DB2.
The first was a fix for an ISO(UR) application not returning recently updated rows. If the updates to a GBP-dependent object create overflow records, the side effect of this APAR is that each overflow page will result in a forced log write and a forced write of the overflow page to the coupling facility. If there are many occurrences of this, in a batch application for instance, then the Log Write I/O suspension time could become a noticeable performance degradation. The way to avoid this Log Write I/O is to avoid the writing of overflow records by increasing PCTFREE on the object. Overflow records can occur with compressed rows or VARCHAR fields which change length after the update and will no longer fit back into the row's original place on the data page.
Here is a simple SELECT statement to determine the number of overflow records in an object. By determining the percent of rows that have overflowed you can determine the amount of free space that should be preserved in the object to avoid them. In DB2 11 the PCTFREE FOR UPDATE clause can help with this issue and avoid the need for a REORG.
SELECT name,partition, (DE
'SYSSTATS' and partition = 0 WITH UR;
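Once the overflow and total row counts are known, the percentage calculation is simple. The sketch below shows one way to turn it into a starting PCTFREE value. The function names and sample counts are hypothetical, and rounding the overflow percentage up to a PCTFREE value is an assumption made for illustration, not an IBM formula.

```python
import math

def overflow_percent(overflow_rows: int, total_rows: int) -> float:
    """Percentage of rows in the object that have overflowed."""
    if total_rows <= 0:
        return 0.0
    return 100.0 * overflow_rows / total_rows

def suggested_pctfree(overflow_rows: int, total_rows: int, current_pctfree: int = 0) -> int:
    """Starting point for PCTFREE: the overflow percentage rounded up.

    Never suggests less than the current setting, and caps at 99,
    the largest value PCTFREE accepts.
    """
    needed = math.ceil(overflow_percent(overflow_rows, total_rows))
    return min(99, max(current_pctfree, needed))

# Hypothetical counts for one partition.
print(overflow_percent(50, 1_000))
print(suggested_pctfree(50, 1_000, current_pctfree=0))
```

Treat the result as a first estimate to validate after the next REORG, since update patterns change over time.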
This APAR addresses high Other Write I/O due to space map pages taking up space on the vertical deferred write queue and causing more frequent writes. The space map pages themselves are not changed and will not be written out, but a page p-lock is taken against them especially in cases where Member Cluster is used and the object is GBP dependent. The APAR ensures those spacemap pages are not left on the VDWQT queue.
agburke 060001QPDN Visits (1759)
If an application uses multi-row insert against a partition-by-growth universal table space, the getpage count could be unusually high. This APAR adjusts the space search algorithm as it pertains to multi-row insert.
"During the exhaustive search prior to the physical extend of the data set, the Multi-Row insert operation encounters a high get page count. In this case, insert operation fails to find available space to insert and will search the same set of space map pages or data pages for each insert operation within the same Muli-Row insert statement."
agburke 060001QPDN Visits (1688)
Customers requested a new object state that would allow readers access to data and eliminate the need for, or the effect of, a -STOP or -START command to place the object in Read Only status. This Persistent Read Only status causes UPDA
Recommended procedure for setting the PRO restricted status on a table space partition:
1. -STOP DB() SP() PART() - wait for the command to complete
successfully, verify that the object is in STOP status
(not STOPP) with the -DIS DB command.
2. -START DB() SP() PART() ACCESS(UT)
3. Create two full image copies of the table space partition
with COPY SHRLEVEL REFERENCE
4. Use REPAIR utility to turn on PRO status
5. -START DB() SP() PART() ACCESS(RW)
Along with the preceding APAR the lab has modified the DSNACCOX stored procedure to be able to handle the new PRO status to avoid unnecessary Utility suggestions.
agburke 060001QPDN Visits (2074)
With the increased attention on auditing capabilities around the DB2 engine this new feature adds the capability to do some simple auditing using the system time capability of temporal tables, as well as with some generated expression columns.
APAR text: " The PTFs for PM99683 (the preconditioning APAR), PI15298 (the enabling APAR), and PI15666 (the LOAD utility feature APAR) deliver integrated auditing support using non-deterministic generated expression columns to allow for automatic tracking of some audit information including: a. who modified the data in the table b. what SQL operation modified the data in the table."
As a rudimentary example if you are looking at a table of account balances and on Monday an account is inserted with a Balance of $10,000 you would see it enter the base table.
BASE_TABLE on Monday
AUDIT_TABLE on Monday (empty)
Now someone else comes in and updates the balance raising it to $20,000 on Tuesday.
BASE_TABLE on Tuesday
AUDIT_TABLE on Tuesday
I have left off the mandatory beginning timestamp, ending timestamp, and trans_id columns for the system time to simplify the example.
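The Monday and Tuesday scenario can be mimicked with a small in-memory model. This is an illustration of the behavior described, not DB2 syntax: the column names (ACCOUNT, BALANCE, USER_ID, OP_CODE), the account key, and the user IDs are all hypothetical, and the system-time columns are omitted just as in the example.

```python
# In-memory stand-ins for the base table and its audit (history) table.
base_table = {}
audit_table = []

def insert_account(account: str, balance: int, user: str) -> None:
    """Insert a row; the audit columns record who did it and the operation."""
    base_table[account] = {
        "ACCOUNT": account,
        "BALANCE": balance,
        "USER_ID": user,   # who modified the data
        "OP_CODE": "I",    # which SQL operation modified the data
    }

def update_balance(account: str, new_balance: int, user: str) -> None:
    """Update a row; the previous version is preserved in the audit table."""
    audit_table.append(dict(base_table[account]))
    base_table[account].update(BALANCE=new_balance, USER_ID=user, OP_CODE="U")

# Monday: the account is inserted, so the audit table stays empty.
insert_account("ACCT1", 10_000, "USER_A")
print("Monday base :", base_table["ACCT1"])
print("Monday audit:", audit_table)

# Tuesday: someone else raises the balance; the old row moves to the audit table.
update_balance("ACCT1", 20_000, "USER_B")
print("Tuesday base :", base_table["ACCT1"])
print("Tuesday audit:", audit_table)
```

The model keeps the current row in the base table and each prior version in the audit table, so both the current state and who produced each earlier state remain available.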
agburke 060001QPDN Visits (2094)
High Performance DBATs were introduced in DB2 10. In order to utilize this feature you must have the JCC packages (e.g. SYSLN200) bound with RELEASE(DEALLOCATE) as well as the -MODIFY DDF PKGREL(BNDOPT) option in place.
This allows distributed requests to benefit from the performance aspects of not going through deallocation after each commit. Caution should be used when employing this option, as you would not want every distributed application coming in as RELEASE(DEALLOCATE) and using up all of the available DBATs. You can control this more granularly if you bind the dynamic JCC packages into an alternate collection and then allow specific applications to use them by having them specify this collection in currentPackageSet.
APAR PI20352 was opened because increased DDF SRB time was sometimes seen when the thread was deallocated prior to the 200 uses. In order to alleviate this, code has been modified to allow the High Performance DBAT to be pooled if it has not reached the 200-use mark. The POOLINAC timeout value can be used to limit the time the DBAT remains in the pool.
agburke 060001QPDN Visits (1302)
This only affects new installations of DB2 11, not migrations.
After a new Version 11 installation of DB2 has created approximately 6534 archive log data set pairs, the offload task can begin failing with message DSNJ116I ERROR ATTEMPTING TO ADD ARCHIVE ENTRY TO BSDS. To bypass this problem, you can reduce the maximum number of archive log entries in the BSDS data sets by changing the DSN6LOGP MAXARCH value to 6500 in DSNZPARM.
From the APAR text, and how to determine if you are affected:
DSNJU102 does not create enough log data set records in the
agburke 060001QPDN Visits (1470)
DB2 10 added a new column, REORGCLUSTERSENS, to RTS tablespace SYSI
agburke 060001QPDN Visits (2408)
Many customers these days are utilizing DASD mirroring solutions as well as Hyperswap technology to automate fail-over to an alternate site or to local DASD hardware in the event of a failure or disaster. z/OS APAR OA31707 was released to aid fail-over by ensuring that any pages needed during the swap are not paged out to AUX.
From OA31707: "During a Hyperswap, it is possible for the system to require page fault resolution via page devices that may be part of the scope of devices being recovered by the Hyperswap. If this occurs, it is possible that a page fault will not be able to be resolved leading to deadlock and Hyperswap failures."
The downside of this is a massive amount of page fixed storage which includes the following:
The purpose of this entry is to ensure customers are aware of the effect on real storage when this function is enabled, and to plan for it in advance. A system that is already running lean on REAL storage may see increased demand paging once this function is enabled, which can lead to DB2 entering DISCARD MODE (contraction) due to the REAL
If you have page fixed your buffer pools then the vast majority of the DBM1 PRIVATE address space will never be paged to AUX either, so you could end up with a severe shortage of REAL storage on the LPAR.
You can issue the D XCF,COUPLE command to determine if the function is enabled.
Further important information about the protection provided by this APAR and the service it introduces can be found as follows:
Two other APARs relate to the REAL storage growth seen in DB2 due to z/OS not reclaiming frames when CRITICAL PAGING was enabled.
mjparker 120000QPNA Visits (1520)
Some of you might have noticed that IBM Knowledge Center does not include topic footers that provide a link to the information in PDF format. However, the DB2 for z/OS information is still available in PDF format.