Increasingly, customers must keep their applications online and available around the clock, 24x7. Fortunately, the DB2 migration process has been designed specifically to allow the business application set to continue to run while you migrate. In this article, get hints and tips to make your online migration successful. While the general points in this article apply to migrating to any release since V8, the focus here is on migrating to V10.
Before talking about online migration, it must be said that the best way to avoid conflicts between migration and the business application set is to run the migration jobs while the application set is not running. In other words, if you can afford to take an application outage, the recommendation is to run the migration jobs while nothing else is running. In a data sharing group, an outage here means a complete outage across the entire data sharing group. Running one member in ACCESS(MAINT) while there are other members running normally means you haven't taken an outage at all and that your migration is actually an online migration.
Additionally, while almost everyone uses the term CATMAINT to talk about migration, I'm afraid I cannot. CATMAINT is used in so many ways these days that if you said you ran CATMAINT, I'd have to ask you which one. So to avoid confusion, I use the names of the migration jobs. DSNTIJTC is the first job you run on the new release to enter Conversion Mode (CM). DSNTIJEN is the job you run to enable new function. This job comprises the ENFM process.
The migration jobs, DSNTIJTC and DSNTIJEN, use normal DDL and online REORG processing. As such, these jobs are designed to be run while the normal business workload for the time of day is active. Ideally, that is a time of reduced activity, such as early Sunday morning. For non-data sharing, it is necessary to stop V8 or V9 and start V10 in order to run DSNTIJTC. However, that is not the case for data sharing. A single member can be started (or restarted) in V10, where DSNTIJTC can run while the rest of the group handles the normal business workload for the time of day on the down-level systems. DSNTIJEN, on the other hand, can be run in either non-data sharing or data sharing while the rest of the system is running. By normal workload, I mean the business application set. I do not mean running utilities, DDL, GRANTs or REVOKEs, or binding plans or packages, all of which should be avoided during a migration process. In fact, all DBA activity should be avoided during a migration process.
In a normally operating DB2 system, most of what is needed to run the business workload is cached in memory. The EDM pool caches static SQL, and the dynamic statement cache caches dynamic SQL. For SQL that isn't cached, access to the catalog and directory is read only, for the purpose of preparing static or dynamic SQL for execution. This catalog and directory access is fleeting. It is released before the application gets control to execute SQL. During migration, the DDL is committed quickly and the REORGs are SHRLEVEL REFERENCE. There should be no noticeable interference between the business application set and the migration process.
Avoiding -904 with 00C900A6
When V10 is first started, it will reject all requests with the resource unavailable SQLCODE of -904 and a reason code of 00C900A6. This means that the DSNTIJTC job has not yet completed successfully. To avoid this, you should temporarily remove the member on which you intend to initially start V10 from any workload balancing or sysplex routing scheme you have established until after you have successfully run the DSNTIJTC job. This will prevent applications from being routed to the V10 member until it is ready to handle normal activity. You should also avoid starting DDF on the V10 member until DSNTIJTC has completed successfully. After the DSNTIJTC job has completed successfully, you can restore your workload balancing or sysplex routing scheme to its normal behavior and start DDF if so desired. Note that changing your workload balancing or sysplex routing scheme and avoiding DDF is not necessary when running the DSNTIJEN job to handle the ENFM process.
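The routing precaution described above can be sketched as a simple eligibility rule. This is only a conceptual model, not a representation of any real workload balancing or sysplex routing configuration; the `Member` type, the member names, and the `routable_members` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str     # hypothetical member name
    release: str  # for example "V9" or "V10"


def routable_members(members, new_release, dsntijtc_done):
    """Return the names of the members that should receive routed work.

    A member on the new release is left out until DSNTIJTC has completed,
    because until then it would answer requests with SQLCODE -904.
    """
    return [
        m.name
        for m in members
        if m.release != new_release or dsntijtc_done
    ]


group = [Member("DB1A", "V9"), Member("DB1B", "V9"), Member("DB1C", "V10")]

print(routable_members(group, "V10", dsntijtc_done=False))  # ['DB1A', 'DB1B']
print(routable_members(group, "V10", dsntijtc_done=True))   # ['DB1A', 'DB1B', 'DB1C']
```

The down-level members stay eligible throughout, which is what keeps the business workload running while DSNTIJTC executes on the new-release member.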
Another way to avoid having the workload run on the V10 system before DSNTIJTC is complete is to start V10 the first time in ACCESS(MAINT). This does not interfere with the other members that are running normally, continuing to satisfy the normal business workload for the time of day. After DSNTIJTC has completed successfully in ACCESS(MAINT), this DB2 member can be stopped and restarted normally. Having mentioned ACCESS(MAINT), I am compelled to reiterate that running DSNTIJEN in a data sharing member started ACCESS(MAINT) has very little effect. Because there is a single copy of the Catalog and Directory, activity on the other members will have as strong an effect on the ENFM process, and vice versa, as if they had been running on the very same DB2 as DSNTIJEN. Running ENFM in a member started with ACCESS(MAINT) is actually an online migration if other members of the data sharing group are concurrently running workload.
SMS managed data sets
V10 requires that all new data sets for table spaces or indexes for the Catalog and Directory be SMS managed. DB2 assumes that there is an ACS routine that assigns these new data sets to a storage class that has the Extended Function (EF) and Extended Addressability (EA) attributes. Note that only new data sets will be placed in this storage class. Data sets for existing Catalog and Directory table spaces and indexes that are untouched by the migration process remain where they are. If you should run a REORG against any of these table spaces, in any V10 mode, the output data sets will also be placed in the SMS storage class. My recommendation for sizing this storage class is to begin with as much space as the Catalog and Directory currently occupy. Once in NFM, you can adjust the size if necessary.
What about failures?
The DB2 migration process, in both the DSNTIJTC and DSNTIJEN jobs, is designed so that should anything go awry, the catalog and directory are left completely operational. No recovery is necessary. Should either of these jobs fail, find and correct the reason for the failure, then restart the process by simply rerunning the failing job, unchanged. It will pick up where it left off and continue on to completion.
The ENFM process, job DSNTIJEN, works its way through the Catalog and Directory one table space at a time. Should the ENFM process fail for any reason, it will go back to the last commit point, which is always a point where the catalog and directory are completely operational. In fact, if the reason for the failure cannot be discovered and corrected in a reasonable amount of time, the catalog and directory can be left as they are, part way through ENFM, until this can be accomplished. This is true even if that means waiting until the next maintenance window. No recovery is necessary.
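The restart behavior described above can be pictured with a small model of a process that commits its progress after each unit of work. This is a conceptual sketch only and does not reflect how DSNTIJEN is implemented; `run_enfm`, `StepFailed`, and the step names are hypothetical placeholders.

```python
class StepFailed(Exception):
    """Raised when a step cannot complete (for example, a drain that times out)."""


def run_enfm(steps, completed, fail_on=None):
    """Process steps in order, skipping any already recorded in `completed`.

    `completed` stands in for a durable progress record. It is updated only
    after a step finishes, so a failure leaves it at the last commit point.
    """
    for step in steps:
        if step in completed:
            continue  # finished in an earlier run; nothing to redo
        if step == fail_on:
            raise StepFailed(step)  # earlier steps stay committed
        completed.append(step)  # commit point: this step is done
    return completed


steps = ["SPACE1", "SPACE2", "SPACE3"]  # placeholder names, not the real ENFM order
progress = []

try:
    run_enfm(steps, progress, fail_on="SPACE2")
except StepFailed as exc:
    print(f"stopped at {exc}; completed so far: {progress}")

# After the cause is corrected, the same call is made again, unchanged.
run_enfm(steps, progress)
print(f"completed: {progress}")
```

Because the caller never edits the list of steps, the rerun is identical to the first run, and the progress record alone decides where work resumes.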
Given sufficient space for the new catalog and directory table spaces and index spaces, the most common stumbling block folks have run into is contention on the catalog caused by some monitor or similar process. Typically, such a process runs continuously. Sometimes it queries something in the catalog, usually something in SYSDBASE, and never closes the cursor or commits, thus holding a claim forever. When the ENFM process attempts to REORG SYSDBASE in DSNTIJEN, it is unable to acquire the drain and DSNTIJEN fails. Again, this leaves SYSDBASE completely operational. No recovery is necessary. After the offending monitor or similar process is identified and terminated, DSNTIJEN can be rerun, unchanged, where it will pick up with processing SYSDBASE and move on to completion of the ENFM process. At this point, the monitor can be restarted.
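The interaction between a long-held claim and a drain can be illustrated with a deliberately simplified model. It ignores claim classes, timeouts, and held cursors, and it is not how DB2 implements claims and drains; the `TableSpace` class and the agent names are hypothetical.

```python
class TableSpace:
    """Toy model of claim and drain bookkeeping for one table space."""

    def __init__(self, name):
        self.name = name
        self.claimers = set()

    def claim(self, agent):
        """Record that an agent is accessing the table space."""
        self.claimers.add(agent)

    def commit(self, agent):
        """In this simplified model, a commit releases the agent's claim."""
        self.claimers.discard(agent)

    def drain(self):
        """Return the agents blocking a drain; an empty list means it succeeds."""
        return sorted(self.claimers)


sysdbase = TableSpace("SYSDBASE")
sysdbase.claim("APPL1")
sysdbase.commit("APPL1")      # well-behaved application: its claim is fleeting
sysdbase.claim("MONITOR1")    # monitor reads and never commits

print(sysdbase.drain())       # ['MONITOR1']

sysdbase.commit("MONITOR1")   # monitor terminated, claim released
print(sysdbase.drain())       # []
```

The application that commits promptly never shows up as a blocker; only the agent that holds its claim indefinitely prevents the drain.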
If you are interested in finding potential contention prior to running DSNTIJTC or DSNTIJEN, you can run a normal SHRLEVEL REFERENCE REORG on the catalog and directory table spaces you are concerned about, such as SYSDBASE, as part of your pre-migration activities. If the REORG utility cannot break into the normally running activity, then the migration jobs will also be unable to break in. Should the REORG time out, you can identify the process that is holding locks or claims against the catalog table space without concern about holding up a migration job. Beyond that, running a REORG against a catalog or directory table space has the added benefit of organizing the data so that the REORG of that table space during the ENFM process can execute more quickly.
When customers get into trouble, it is usually because they attempt to recover some part of the catalog and directory to a prior point in time, leaving the set as a whole inconsistent and out of sync. The only way to recover any part of the catalog or directory to some prior point in time is to do so as part of recovering the entire catalog and directory, all to the very same prior point in time. Also, if any REORG or DDL has been run since that point in time, all the affected user data has to be recovered to the very same point in time as well. Very quickly, this turns into recovering everything to that point in time. For a failure of the migration process, it is better to simply take a deep breath, leave everything in the catalog and directory as it is, and begin to discover the reason for the failure so that it can be corrected and the migration process restarted.
Just a note about processes that impede the progress of the DSNTIJTC or DSNTIJEN migration jobs. If you discover that an IBM product is impeding any part of the migration process, please open a PMR to let us know. Although we may not be able to make changes to the products quickly to avoid the contention, we should be able to provide guidance on how to deal with the conflict. We will also be able to warn other customers of the contention and provide them guidance on how to avoid it.
Stopping in the middle of ENFM
I've mentioned several times that the jobs should not be changed when they are restarted. In particular, do not remove job steps from DSNTIJEN or reorder them in any way. If you find that you are running close to the end of your migration window and it doesn't look like DSNTIJEN will complete in time, you can run a job called DSNTIJNH. It will reach out and tap DSNTIJEN on the shoulder to cause DSNTIJEN to stop when it completes the table space it is currently working on. When another migration window rolls around you can run DSNTIJEN again, unchanged, and it will pick up where it left off and continue on.
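The "tap on the shoulder" can be thought of as a halt request that is honored only at a table space boundary. The sketch below is a conceptual model under that assumption, not the actual mechanism used by DSNTIJNH and DSNTIJEN; `run_steps`, `work`, and the step names are hypothetical.

```python
import threading


def run_steps(steps, completed, halt, work):
    """Run the steps not yet in `completed`, honoring `halt` at step boundaries."""
    for step in steps:
        if step in completed:
            continue  # done in an earlier window
        work(step)  # the step in progress always runs to completion
        completed.append(step)
        if halt.is_set():
            break  # stop before starting the next step
    return completed


steps = ["SPACE1", "SPACE2", "SPACE3"]  # placeholder names
progress = []
halt = threading.Event()


def work(step):
    # Simulate the halt request arriving while SPACE2 is being processed.
    if step == "SPACE2":
        halt.set()


run_steps(steps, progress, halt, work)
print(progress)  # ['SPACE1', 'SPACE2']

halt.clear()  # the next migration window
run_steps(steps, progress, halt, work)
print(progress)  # ['SPACE1', 'SPACE2', 'SPACE3']
```

The halt is checked only after a step is recorded as complete, so no step is ever abandoned part way through, and the same unchanged list of steps is used in both windows.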
A few customers have noticed automatic rebinding activity during the execution of the DSNTIJEN job. Automatic rebinds would normally point to static SQL that references something in the Catalog, and such SQL should not be running as part of the business application set. We narrowed this activity down to references to Created Global Temporary Tables (CGTTs). A CGTT has a permanent definition and, as such, has a row in SYSTABLES. The RI associated with SYSTABLES requires that a valid table space name be in the TSNAME column, but because CGTTs have no permanent table space, we had to choose some table space we knew would always exist. In prior releases we chose SYSPKAGE. Because SYSPKAGE is dropped during ENFM, in V10 we choose SYSTSTAB.
So far so good. However, we also discovered that for references to CGTTs we had erroneously been recording a dependency on SYSPKAGE because of the contents of the CGTT's SYSTABLES TSNAME column. Therefore, when DSNTIJEN dropped SYSPKAGE, all references to CGTTs were unnecessarily invalidated, forcing succeeding usage of these CGTT references to be rebound. PM81189 has corrected this so that we no longer record a table space dependency because of a CGTT reference. However, the CGTT dependencies recorded prior to applying PM81189 will remain and still cause the automatic rebinding during ENFM. Therefore, PM81189 also includes a change to the premigration job DSNTIJPM (DSNTIJPA) to add a report that identifies dependencies on SYSPKAGE. These packages can be rebound prior to running DSNTIJEN to avoid the automatic rebinds when SYSPKAGE is dropped.
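As an illustration of what such a report boils down to, the sketch below filters package dependency rows for those that point at a given table space. The row layout is an assumption loosely modeled on the package dependency catalog table (including the assumption that a type of 'R' denotes a table space), and the sample rows are invented. The report produced by the premigration job remains the authoritative source.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    bqualifier: str  # assumed: database of the object depended on
    bname: str       # assumed: name of the object depended on
    btype: str       # assumed: object type, with 'R' meaning table space
    dcollid: str     # assumed: collection of the dependent package
    dname: str       # assumed: name of the dependent package


def packages_depending_on(rows, database, tablespace):
    """Return the distinct (collection, package) pairs that depend on a table space."""
    found = {
        (r.dcollid, r.dname)
        for r in rows
        if r.btype == "R" and r.bqualifier == database and r.bname == tablespace
    }
    return sorted(found)


# Invented sample rows for illustration only.
rows = [
    Dependency("DSNDB06", "SYSPKAGE", "R", "COLLB", "PGM7"),
    Dependency("DSNDB06", "SYSPKAGE", "R", "COLLA", "PGM1"),
    Dependency("DSNDB06", "SYSPKAGE", "R", "COLLA", "PGM1"),  # duplicate row
    Dependency("PAYROLL", "TSEMP", "R", "COLLA", "PGM2"),
]

for collid, name in packages_depending_on(rows, "DSNDB06", "SYSPKAGE"):
    print(f"{collid}.{name}")
```

The resulting list is the set of candidates to rebind before DSNTIJEN runs, so that dropping SYSPKAGE no longer triggers automatic rebinds for them.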
Practice for success
The customers who have had the most success with migration in an online fashion have made a clone of the DB2 subsystems they were going to migrate and practiced the migration process with the clone. This allows them to discover impediments to the migration process for that particular DB2 subsystem in an isolated and controlled environment. Any stumbling blocks can be handled without concern for running out of time in a migration window. Using a clone also gives them experience with migrating the particular DB2 subsystem and confidence that they can manage the migration. It also provides them a very good estimate of how long the migration steps would take, giving them the information they need to size the migration windows appropriately and again giving them confidence that they can accomplish the migration in the allotted time. Not all customers have the luxury of being able to clone an entire production data sharing group. However, the closer you can come to this ideal, the closer you will come to a problem-free online migration.
The DB2 migration process has always been designed to allow migration to proceed while the business application set is active. Some customers have exploited this capability to keep a production data sharing group servicing their business needs continuously for many years, including across multiple DB2 migrations. In recent years, this design has been improved to reduce the possibility of contention wherever it has been found. With proper preparation, following the hints and tips outlined in this article, you too can perform your next DB2 migration without a business application outage.