It is a common belief that DB2 10 and 11 for z/OS can only use 1MB pageable large pages, other than for buffers, if the CEC has Flash Express installed. For example, the text for APAR PM85944 implies just that. However, this is not true. The only requirement is that the CEC be SCM-capable (SCM = Storage-Class Memory). In other words, it does not matter whether Flash Express is actually installed or not. So if a customer is running on a zEC12 CEC without Flash Express, DB2 could request, and be given, a 1MB pageable large page, residing in a 1MB large page frame. However, if that page needs to be paged out for some reason, it will at that point be broken down into 4KB page frames. To put it another way, pageable large pages are available on an SCM-capable CEC such as the zEC12, with the caveat that if Flash Express is not installed, any of those pages that are paged out will be demoted to 4KB page frames. 1MB pageable large pages that are demoted to 4KB page frames as they are paged out are never coalesced again: they remain 4KB page frames for the remaining life of the IPL.
Watch out for open APAR PI18475. This describes a situation where IRLM V2.3 fails with ABEND0C4 in DXRRL770 after PI07853/UI14551 is applied.
The error occurs during P-lock processing. The full range of symptoms is not known, but they include DB2 termination with reason code 00C90093, indicating an invalid request to update an object allocated to a utility, preceded by message DSNT501I with reason code 00C202AA.
However, be aware of the warning in the APAR text indicating that "other symptoms may be possible". It is strongly recommended that you monitor this APAR and apply the corrective maintenance when it becomes available.
The last thing you want to deal with when in a recovery situation is a failed RECOVER, so keep an eye on APAR PI17986, which was closed very recently. This affects DB2 10 and DB2 11 customers who have APAR PM88455 (PTF UK97229/UK97230) applied. The APAR abstract indicates that it affects customers who use RECOVER TOCOPY, but closer examination of the APAR text reveals that it can also affect customers who use RECOVER TOLOGPOINT or TORBA; in other words, it can affect any point-in-time (PIT) recovery.
The RECOVER can fail with RC 8, and something like the following message is issued:
DSNU556I - RECOVER CANNOT PROCEED FOR TABLESPACE db.ts DSNUM x
TOCOPY copy_data_set BECAUSE A SYSIBM.SYSCOPY RECORD HAS BEEN
ENCOUNTERED WHICH HAS DBNAME=db TSNAME=ts DSNUM=y ICTYPE=M
STYPE=R STARTRBA=X'nnnnnnnnnnnn' LOWDSNUM=0 HIGHDSNUM=0
Although the APAR is closed, the PTFs are not yet available. It is strongly advised that you monitor this APAR and apply the corrective maintenance when available.
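For reference, this is the kind of point-in-time recovery that is affected. The following is a minimal sketch of a RECOVER TOLOGPOINT utility control statement; the database name, table space name, and log point are placeholders:

RECOVER TABLESPACE db.ts TOLOGPOINT X'00000551BE7D'

The same exposure applies to the equivalent TORBA and TOCOPY forms.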
Without this fix applied, when REORG FORCE cancels a thread, the cancel operation picks up a control block address without holding a latch, and by the time that control block address is used for thread cancellation, it might have been reused by a different thread. So there are two possible consequences: the wrong thread may get cancelled, and, depending on what that wrongly cancelled thread is doing, DB2 may be brought down. The fix causes a latch to be acquired when picking up the control block address, ensuring that the right thread is cancelled and protecting you from a DB2 outage.
Many customers experience problems when allocating large datasets, because the datasets often end up being allocated in many extents. This can affect performance and availability, through high-frequency extent allocation, or in extreme cases lead to extent allocation failure. The following is a partial rework of information provided by John Campbell, IBM DB2 for z/OS Distinguished Engineer, in response to a real customer problem.
At a high level, the recommended best practice for managing large datasets is for customers to use both Space Constraint Relief and Extent Constraint Removal. That is the easy part of the answer. The difficult part is understanding the implication of setting PRIQTY=-1 for large objects.
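For reference, this is the setting being discussed, shown as a minimal sketch (dbname.tsname is a placeholder):

ALTER TABLESPACE dbname.tsname PRIQTY -1 SECQTY -1;

With -1, DB2 uses its default primary quantity and the sliding scale for secondary allocations.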
In terms of volume maintenance, the vast majority of customers do not DEFRAG their volumes; sometimes this is just impossible because of 24x7 operations, but sometimes customers do not even know they need to. Consider the impact if you have fragmented volumes and DB2 is using the sliding scale for secondary allocation. DB2 might, for example, allocate 7 cylinders, then work out that it needs to allocate 8 cylinders on the next allocation using the sliding scale, but the volume is so fragmented that another extent has to be found because the adjacent space cannot hold 8 cylinders, resulting in many extents.
If you find yourself in this situation, there are some questions you need to answer and considerations to take into account:
What 3390 model type are you emulating? You might be struggling to obtain a PRIQTY of 36000, for example, which might seem quite big but is only 500 cylinders and is not, in fact, very large. This opens up the possibility that you are still emulating 3390 Mod-3s, which are, by today's standards, very small. The larger the logical model size you are emulating, the less of an issue finding 500 cylinders should be.
What is the HIGH specification in ISMF for the SMS Storage Group? Many customers are using 99% and forcing datasets to go multi-volume, causing additional DB2 performance degradation because of the additional synchronous request to the ICF catalog for each new volume used.
Are there enough volumes in the SMS Storage Group? Many customers are running their systems lean and mean, and do not have enough logical volumes. Even if they understand that, they are often unwilling to add more volumes because of the cost.
The DSSIZE you use determines how many extents it will take to get to that 500 cylinder point, because the sliding scale steps up gradually (a 'stepping stone' approach) towards the maximum data set size.
For many customers there will be a 90/10 split between small and large data sets. If you assume 300 cylinders as the dividing line (an arbitrary value used for the purposes of this discussion), then 90% of objects would be below 300 cylinders and 10% above. It would therefore make sense to provide two SMS Storage Groups, one for small objects and one for large objects. The 'large' pool should have some additional logical volumes as a 'buffer' for doing such things as parallel Online REORGs. This way the small objects do not introduce fragmentation into the Storage Group for the large data sets, and therefore cause far fewer problems.

The problem with PRIQTY -1, or a very low value, is that it will not trigger the Storage Group ACS routine to allocate the datasets in the 'large' pool, because the allocations are based on an initial space allocation which is too small. You could take individual data sets, or a set of them, and have the ACS routines place them specifically in the 'large' pool, but that would be a manual approach. An alternative is for the installation to allow DBAs to specify one of the SMS classes on CREATE/ALTER STOGROUP; the Storage Administrator can then code the Storage Group ACS routine to assign the datasets to the 'large' pool based on the class name provided, as shown in the sketch below. This takes away the manual effort and gives more control to the DBAs.
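A minimal sketch of that alternative, assuming an installation-defined storage class name (SCLARGE) that the Storage Group ACS routine maps to the 'large' pool; all names here are illustrative:

CREATE STOGROUP LARGESG
  VOLUMES ('*')
  VCAT DSNCAT
  STORCLAS SCLARGE;

The Storage Administrator's ACS routine would then route data sets allocated through LARGESG into the 'large' Storage Group based on the SCLARGE storage class name.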
When John was helping to design sliding secondary allocation, he never envisaged PRIQTY=SECQTY=-1 as a 'one size fits all' approach to be used for all objects. If you know the allocation for an object will be large, a common approach is to allocate 1000 cylinders primary and 100 cylinders secondary, if there is enough room. This is also true for Online REORG of large objects where REUSE is not used and the space is not ALTERed before the REORG: in that case DB2 Data Space Manager goes through the sliding scale all over again, so a 2900+ cylinder 2GB data set slides from the beginning even though we know what the potential high-end size will be.
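As a hedged sketch of that explicit approach (dbname.tsname is a placeholder, and the conversion assumes 3390 geometry with roughly 720KB of usable space per cylinder), 1000 cylinders primary and 100 cylinders secondary would be coded in KB as:

ALTER TABLESPACE dbname.tsname PRIQTY 720000 SECQTY 72000;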
Be aware that every time a new extent is taken there is a synchronous request to the VTOC, plus the sliding scale calculation has to be performed, and if the object goes multi-volume there is an additional synchronous request to the ICF catalog to add the new volume. Very few customers are aware of this performance impact.
So in summary, I will make the following recommendations:
Enable both Space Constraint Relief and Extent Constraint Removal. For more information, see z/OS DFSMSdfp Storage Administration, SC23-6860-01.
Have a segregated SMS Storage Group with additional volumes for large allocations.
Use larger emulated logical volumes.
Use a reasonable value for the ISMF Storage Group HIGH value. For more information, see z/OS DFSMS Implementing System-Managed Storage, SC23-6849-00.
This new informational APAR has been opened to document recommended APARs and tuning to improve DB2 INSERT performance, based on recent observations from analysis of, and solutions for, customers using SAP applications - but these recommendations should also apply to customers in general.
Migrating DB2 for z/OS with the business application set active
Increasingly, customers must keep their applications online and available all hours of every day, 24X7. Fortunately, the DB2 migration process has been designed specifically to allow the business application set to continue to run while you migrate. This article by Jay Yothers, IBM Senior Technical Staff Member, provides hints and tips to make your online migration successful.
This new IBM Redbooks publication discusses the performance and possible impacts of the most important functions in DB2 11 for z/OS. It includes performance measurements that were made in the laboratory and provides estimates of the improvements you can expect when moving from DB2 10.
APAR II13538: DEALING WITH HUNG COUPLING FACILITY CONNECTIONS IXLCONN REASON CODE 02010C27 0C27 02010C09 APAR status http://www-01.ibm.com/support/docview.wss?uid=isg1II13538
When a DB2 member abnormally terminates, its connections to the coupling facility structures are put into a FAILING state by cross-system extended services for z/OS (XES). The FAILING DB2 member remains in this state until all surviving members of the group have responded to the XES Disconnected/Failed Connection (DiscFailConn) event for each structure. XES sends this event to each surviving member of the group so that the necessary recovery actions can be taken in response to the failed member.
For this reason it is important to perform the actions described in this informational APAR to recover from hung CF structure connections.
Other important informational APARs in this area are APAR II14016 (DB2 RA10 R910 R810 HANG WAIT SUSPEND LOOP PROBLEM SUMMARY); for HANG or WAIT problems in the DB2 DISTRIBUTED address space, also see II08215 and II11164.
There are 56 Best Practices located at this site covering DB2 for z/OS, DB2 Tools, and QMF.
From the best practice pages on this site you can view the videos (streamed from YouTube), download a copy (MP4) to your local machine, download a transcript, and download slides when available.
DB2 for z/OS Best Practice: FlashCopy and DB2 for z/OS https://ibm.biz/BdRK4R
This Best Practice covers FlashCopy and its uses in conjunction with DB2 for z/OS. It gives you an overview of FlashCopy, how the DB2 utilities use it, and how you can affect its behavior in your environment. How to use FlashCopy to take backups of data outside of DB2's control is also discussed. Finally, it covers how FlashCopy fits in with DASD-based replication solutions such as Metro Mirror (PPRC), z/OS Global Mirror (XRC), and Global Mirror.
In this Best Practice, Sheryl Larsen, the DB2 for z/OS world-wide evangelist, discusses structures and appliances that can be used in conjunction with DB2 to improve performance. She presents information on base table indexes, index on expression, Materialized Query Tables (MQTs), zIIPs, and Accelerated Query Tables (AQTs). Decisions and cases regarding when to use these structures are presented.
DB2 for z/OS Best Practice: Advanced SQL Performance Insights https://ibm.biz/BdRKr7
In this Best Practice, Sheryl Larsen, the DB2 for z/OS world-wide evangelist, shares many of the insights she has gained in writing efficient and well-performing SQL. She explains the stages of SQL processing and filtering and gives practical tips on which predicates might obtain faster results. The steps to review your SQL are outlined. Sheryl also reviews some of the analytic capabilities of SQL statements such as PACK, GROUP BY GROUPING SETS, GROUP BY ROLLUP, and GROUP BY CUBE.
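As a hedged illustration of the grouping-set style of SQL mentioned above (the table and column names are hypothetical):

SELECT REGION, PRODUCT, SUM(SALES_AMT) AS TOTAL_SALES
  FROM SALES_HISTORY
  GROUP BY ROLLUP (REGION, PRODUCT);

GROUP BY GROUPING SETS and GROUP BY CUBE follow the same pattern, producing different sets of aggregation groups.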
RTS REORGLASTTIME is set to the creation time without a REORG ever having been executed:
RTS tracks the last time a REORG utility was executed against a table space or partition in SYSIBM.SYSTABLESPACESTATS.REORGLASTTIME.
However, for a newly created table space with no REORG ever executed against it, SYSIBM.SYSTABLESPACESTATS.REORGLASTTIME shows the same timestamp as the creation time.
Why is that?
Actually this behaviour is intended and works as designed. The rationale behind it is that a newly created, empty table space is considered perfectly reorganised.
That behaviour is also documented in the DB2 documentation about SYSIBM.SYSTABLESPACESTATS.REORGLASTTIME (http://www-01.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/com.ibm.db2z11.doc.sqlref/src/tpc/db2z_sysibmsystablespacestatstable.dita?cp=SSEPEK_11.0.0%2F10-0-117&lang=en):
The timestamp the REORG utility was last run on the table space or partition, or when the REORG utility has not been run, the time when the table space or partition was created. A null value indicates that the timestamp is unknown.
A value of NULL would indicate an UNKNOWN point in time. One could argue that this is theoretically true, since no REORG utility was ever executed against the table space. However, the other rationale is that the DB2-supplied stored procedure DSNACCOX has to identify the need for a REORG based on the REORG insert/update/delete values in the SYSIBM.SYSTABLESPACESTATS table. This is basically done based on the SYSIBM.SYSTABLESPACESTATS.UPDATESTATSTIME value and the number of SYSIBM.SYSTABLESPACESTATS.REORGINSERTS/REORGDELETES/REORGUPDATES compared against SYSIBM.SYSTABLESPACESTATS.REORGLASTTIME. So if SYSIBM.SYSTABLESPACESTATS.REORGLASTTIME were set to NULL at CREATE time, which also means UNKNOWN, DSNACCOX could not pick up on it. In the past we had to run REORG explicitly after create time to enable DSNACCOX.
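As a hedged illustration of the kind of comparison involved (the database name is a placeholder):

SELECT DBNAME, NAME, PARTITION, REORGLASTTIME, UPDATESTATSTIME,
       REORGINSERTS, REORGUPDATES, REORGDELETES
  FROM SYSIBM.SYSTABLESPACESTATS
  WHERE DBNAME = 'MYDB01';

A non-null REORGLASTTIME gives DSNACCOX a baseline against which to compare the REORG counters, even for objects that have never actually been reorganised.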
With that said, there is also an exception. APAR PM37511 (http://www-01.ibm.com/support/docview.wss?uid=swg1PM37511) introduced a change whereby COPY or RUNSTATS triggers setting SYSIBM.SYSTABLESPACESTATS.REORGLASTTIME to NULL if the RTS row for the affected object did not exist before. This situation can occur when, at CREATE TABLESPACE or CREATE INDEXSPACE time, the RTS table space was offline or not accessible, or when migrating from DB2 UDB for z/OS Version 8.1 and the table space was created while RTS support was disabled.
A couple of recent DB2 Connect APARs have been identified as highly pervasive and can impact customers planning to migrate to DB2 Connect 10.5 fix pack 2 in preparation for their migration to DB2 11.
When discussing DB2 11 migration plans, please review the following DB2 Connect APARs:
APAR IC99419: CLI-BASED APPLICATIONS RECEIVE SQL0501N AGAINST DB2 Z/OS WHEN STORED PROCEDURE CALL HAS MULTIPLE CURSORS
APAR IC98222: COMPLETE SPECIAL REGISTER SUPPORT FOR DB2 Z/OS V11
APAR IC99735: GETJCCSPECIALREGISTERPROPERTIES () RETURNS SQLCODE -4743 AGAINST V11NFM WITH V10R1 APPLCOMPAT PACKAGES
If you are planning to deploy DB2 Connect 10.5 fix pack 2 and are susceptible to the reported problems, the recommendation is either to request a special build with the APAR fix or to wait until the APAR fix becomes available in a future fix pack.
Customers who are using VSAM log striping for the DB2 active log datasets with DB2 10 for z/OS or DB2 11 for z/OS should pay attention to recently opened APAR PI10353, which is now marked HIPER. There is no exposure for DB2 for z/OS customers running DB2 V9.1 or earlier. Neither is there any cause for alarm, as the exposure for DB2 10 or 11 is very small. There is no data loss involved and no loss of data integrity.
So what is the underlying problem? It concerns a concept called dependent writes: in the event of a DB2 restart after a crash, or when restarting DB2 from a DASD mirror or a restored FlashCopy image, I/Os must be applied in the order in which they were initiated, not in the order in which they completed. Unfortunately, VSAM striping does not guarantee the correct ordering of dependent writes for in-flight I/Os. In some rare scenarios this could result in a missing log CI stripe. Just to be clear: VSAM striping has always worked this way.
Why are DB2 10 for z/OS and DB2 11 for z/OS exposed, and not earlier releases? Forced log writes (for example, those scheduled as a result of an application commit) can mean that a partially full log CI is written to disk. That log CI has to be rewritten in place when the next log write I/O is scheduled. Prior to DB2 10 for z/OS, DB2 would perform the rewrite of the log CI to log copy 1 and then to log copy 2 serially - ensuring that the write to log copy 1 is complete before initiating the write to log copy 2. With DB2 9 for z/OS and earlier releases, DB2 restart automatically detected the hole in the log, truncated the log at that point and continued the restart process. However, a performance enhancement delivered with DB2 10 for z/OS caused DB2 to rewrite log CIs (those which had been partially full when previously written) to both log copy 1 and log copy 2 in parallel. Unfortunately, the existing logic in DB2 10 for z/OS and DB2 11 for z/OS cannot deal with the new situation, that is, a missing log CI stripe.
A possible consequence of this is that one or both copies of the current DB2 active log pair could be damaged at the end, causing a DB2 crash restart to fail and leading to advanced non-standard recovery, which has to be done under guidance of the IBM DB2 service team. This could theoretically happen in the following circumstances:
A local site crash, such as a CEC or LPAR failure, or an address space failure
A DB2 restart off an XRC mirror or PPRC mirror
A DB2 restart off a restored system level FlashCopy
No customer has ever hit the problem following a local site failure in over 3 years of DB2 10 for z/OS production experience. Only one customer has ever hit the problem, and only when performing disaster recovery restart testing from an XRC mirror. This happened on two occasions. The chance of hitting the problem during DB2 crash restart following a local site failure is tiny. The chance of hitting the problem when restarting from a DASD mirror or from a restored system level FlashCopy is marginally greater, because the timing window for this sort of event (a missing log CI stripe) is larger. APAR PI10353 will provide a fix (circumvention) such that DB2 will automatically detect this type of problem, repair the log and allow DB2 crash restart to proceed.
For further guidance, please see the problem relief described in the APAR, but be cautious if you are considering the option of disabling VSAM striping: it is difficult to implement and may well result in serious performance problems.