Exploring IMS disaster recovery solutions, Part 2: IMS Base and IMS Tools recovery solutions

Every customer needs a Disaster Recovery (DR) plan. The strategies used differ from one customer to another and they differ in time to recovery and loss of data. For IMS®, there are five types of DR solutions: restart, recovery, recovery and restart, coordinated IMS and DB2® restart, and coordinated IMS and DB2 disaster recovery and restart. Here in Part 2, we explore the recovery solutions that use only the IMS base functions and some of the functions in the IMS Tools.

Glenn Galler (gallerg@us.ibm.com), Certified IT IMS Specialist, IBM China

Glenn Galler is a certified IT specialist for the IMS product in the IBM Advanced Technical Skills (ATS) group. He is a senior programmer specializing in disaster recovery. He joined the ATS group in March 2007. Galler is also the campus recruiting manager for the IBM Software Group for the University of Michigan, holding this position since 1998. He joined IBM in 1982 receiving his bachelor's degree in computer science from the University of Michigan. In 1989, he received a master's degree in computer engineering from the University of Santa Clara. Galler has worked in many areas of IMS, including testing, development, marketing and management. From 1992 to 1997, he held an international assignment in England as the European program manager for the IMS Quality Partnership Program (QPP).



Ron Bisceglia (RBisceglia@rocketsoftware.com), Lead Software Developer, Rocket Software

Ron Bisceglia is a lead software developer for Rocket Software, based in Houston. He has worked with IMS for more than 24 years, and for the past 20 years has been involved in the design, development, and support of a range of IMS tools. He has been involved in the development of database reorganization utilities, data propagation tools, database monitoring and analysis solutions, data replication, and backup and recovery products.



12 April 2012

Before you start

About this series

This "Exploring IMS Disaster Recovery Solutions" series covers non-storage mirroring disaster recovery solutions for the IMS environment. Methods described show how various backup and recovery resources can be combined to reduce time and data loss in the event of a disaster. Storage-based fast replication techniques (like FlashCopy technology) are also shown here with storage-aware IMS Tools that allow a coordinated and consistent recovery point between IMS and DB2 databases.

About this tutorial

Part 2 describes the IMS Disaster Recovery (DR) solutions that require backup resources like image copies, change-accumulation data sets and, in some cases, the archived log data sets.

The three IMS Base solutions and one IMS Tools Disaster Recovery solution discussed in this tutorial are:

  • Timestamp Recovery to Image Copies Only (TSR to IC)
  • Timestamp Recovery Using Change Accums (TSR to CA)
  • Full Database Recovery to Last Good Archived Log (Full DB Recovery)
  • Timestamp Recovery Using Incremental Image Copies Only (TSR to IIC)

This tutorial is the second in a series of four. It shows you how to use two forms of recovery: timestamp recovery (TSR) to a specific recovery point (RP), and full database recovery to the end of the last good archived log data set sent to the remote site. In the fourth solution, the tools in the IMS Recovery Solution Pack are used to create an incremental image copy (IIC) offline by applying log updates to an existing image copy up through an RP, resulting in a disaster recovery scheme that employs only clean image copies. Each solution in this tutorial requires a certain amount of RECON conditioning.

If you haven't done so, consider downloading the IMS Disaster Recovery Demonstrations, which go hand in hand with this series.

Objectives

After completing the tutorial, you should be able to:

  • Understand how to create RPs
  • Understand how TSR and full database recovery work
  • Create IICs using the IMS Recovery Solution Pack
  • Execute GENJCL to create recovery jobs
  • Identify uncommitted updates in the archived log data sets
  • Create backup copies of the RECON data set
  • Manually condition the RECON data set for recovery at the remote site

Prerequisites

You should have basic knowledge of the following:

  • z/OS® environment
  • IMS operating environment
  • IMS commands and procedures
  • Internet browser, such as Firefox or Internet Explorer

IMS Base (TSR to IC)

IMS Base Timestamp Recovery to Image Copies Only

Primary site activities

With TSR, it is necessary to create an RP that is consistent for the databases being recovered. An RP is a period of time when the database is not allocated. Generally, the ALLOC records in the RECON data set can be analyzed to determine periods of time when the database is not allocated. An RP can be created for a database by taking it offline with a /DBR command or the equivalent UPDATE DB STOP(ACCESS) command or with a /DBD command or the equivalent UPDATE DB STOP(UPDATES) command. In IMS 11, the DB QUIESCE function provided the ability to create an RP without taking the database data sets offline. This function is provided by the UPDATE DB START(QUIESCE) command, and it allows transaction activity to be paused at commit points. In the RECON data set, these DB QUIESCE RPs are indicated with ALLOC records that have been updated with a DEALLOC time and the QUIESCE flag turned on. A new ALLOC record will be written when the database data set is updated again after the DB QUIESCE RP.

For this first TSR solution, we use DB QUIESCE with the HOLD option to pause transaction activity at commit points, and we hold the QUIESCE until the clean image copy is created. Then we explicitly release the DB QUIESCE with an UPDATE DB STOP(QUIESCE) command. The image copy is registered as a Batch Image Copy in the RECON data set. The RECON data set is backed up and transmitted to the remote site with the image copies.
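The idea of reading ALLOC and DEALLOC times to find periods when a database is not allocated can be sketched in a few lines of Python. This is only an illustration: the tuple layout, the function name, and the sample times are assumptions made for the example, not a RECON record format or a DBRC interface.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical layout: one (alloc_time, dealloc_time) pair per ALLOC record.
# dealloc_time is None while the database data set is still allocated.
Alloc = Tuple[datetime, Optional[datetime]]
Window = Tuple[datetime, datetime]


def unallocated_windows(allocs: List[Alloc], now: datetime) -> List[Window]:
    """Return the time spans in which no allocation is active (candidate RPs)."""
    windows: List[Window] = []
    previous_end: Optional[datetime] = None
    still_allocated = False
    for start, end in sorted(allocs, key=lambda a: a[0]):
        # A gap exists if this allocation starts after every earlier one ended.
        if previous_end is not None and start > previous_end:
            windows.append((previous_end, start))
        if end is None:
            # Open allocation: the database is in use from here onward.
            still_allocated = True
            break
        previous_end = end if previous_end is None else max(previous_end, end)
    # After the last DEALLOC the database stays unallocated up to "now".
    if not still_allocated and previous_end is not None and previous_end < now:
        windows.append((previous_end, now))
    return windows
```

Any timestamp that falls inside one of the returned windows would be a candidate recovery point for that database under this simplified model.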

In Figure 1, the DB QUIESCE command is shown, followed by a clean image copy and backup of the RECON data set.

Figure 1. Create Clean Image Copies using DB QUIESCE and Backup of RECON Data Set
Image shows Create Recovery Point with DB QUIESCE

Remote site activities

At the remote site, the backup copy of the RECON data set must be conditioned before it can be used for recovery. When the backup RECON was created, it showed a primary system that was up and running and healthy. The conditioning will change the RECON data set to reflect that a disaster has occurred and IMS was abnormally terminated.

The first step is to delete the IMS subsystems that were active at the time the RECON was created. A LIST.SUBSYS DBRC command will show which IMS subsystems were active, and then a series of commands is required to flag each subsystem as abnormally terminated and delete it.

Listing 1. Commands to flag subsystems as abnormally terminated
CHANGE.SUBSYS SSID(ssidname) ABNORMAL
CHANGE.SUBSYS SSID(ssidname) STARTRCV
CHANGE.SUBSYS SSID(ssidname) ENDRECOV 
DELETE.SUBSYS SSID(ssidname)

By deleting the IMS subsystem records, it is no longer possible to emergency-restart the IMS systems at the remote site. For this solution, this is not a problem because the recoveries are to RPs created by clean image copies, and dynamic back-out is not required for any uncommitted updates.
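When several subsystems were active, the four commands in Listing 1 have to be repeated for each one. A minimal sketch of generating that command text is shown here. The function name and the subsystem IDs IMSA and IMSB are hypothetical, and the script only builds strings; it does not read LIST.SUBSYS output or invoke DBRC.

```python
from typing import List


def subsys_delete_commands(ssids: List[str]) -> List[str]:
    """Build the Listing 1 command sequence for each subsystem ID."""
    commands: List[str] = []
    for ssid in ssids:
        # Same order as Listing 1: flag abnormal, start and end recovery, delete.
        commands.append(f"CHANGE.SUBSYS SSID({ssid}) ABNORMAL")
        commands.append(f"CHANGE.SUBSYS SSID({ssid}) STARTRCV")
        commands.append(f"CHANGE.SUBSYS SSID({ssid}) ENDRECOV")
        commands.append(f"DELETE.SUBSYS SSID({ssid})")
    return commands


if __name__ == "__main__":
    # IMSA and IMSB stand in for the IDs reported by LIST.SUBSYS.
    for line in subsys_delete_commands(["IMSA", "IMSB"]):
        print(line)
```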

The second conditioning step is to flag the primary image copies as invalid. This is only necessary if dual image copies are being created at the primary site and only the secondary image copy is being transmitted to the remote site. It is common to use dual image copies in this way. The command for this is CHANGE.IC DBD(dbname) DDN(ddname) INVALID.

Once this initial RECON conditioning has been done, the GENJCL.RECOV command can be issued to create the DBRC recovery JCL, and the resulting jobs can be executed. The timestamp used with GENJCL.RECOV is the image copy runtime.

The final conditioning step before IMS is cold-started is to close and archive the open log data sets in the RECON data set. The Online Log Data Set (OLDS) data sets are not transmitted to the remote site, but at the time the backup RECON was created, it would have shown an active OLDS data set. To close and archive the data set, an empty OLDS data set must be allocated, and the active OLDS in the RECON must be flagged for archiving. The archive utility will not care that the OLDS is empty, and it will archive it appropriately. As a result, the PRILOG, PRISLD, and PRIOLD records in the RECON will change from having all zeros in the stop times to having valid timestamps. The command to identify the active OLDS for this is LIST.LOG ALLOLDS.

After the OLDS is allocated, the command to notify the RECON that the OLDS needs archiving is NOTIFY.PRILOG OLDS(oldsdd) SSID(newssid) -STARTIME(start) … RUNTIME(start+1).

These RECON conditioning steps at the remote site are shown in Figure 2. After these steps have been completed, the conditioned RECON data set is restored to the production RECON and used to restart the IMS subsystems with a cold start.

Figure 2. Manually condition the RECON data set
Image shows manually conditioning the RECON data set (remote)

IMS Base (TSR to CA)

IMS Timestamp Recovery using Change Accums to a Recovery Point

Primary site activities

With TSR, it is necessary to create an RP that is consistent for the databases being recovered. An RP is a period of time when the database is not allocated. Generally, the ALLOC records in the RECON data set can be analyzed to determine periods of time when the database is not allocated. An RP can be created for a database by taking it offline with a /DBR command or the equivalent UPDATE DB STOP(ACCESS) command or with a /DBD command or the equivalent UPDATE DB STOP(UPDATES) command. In IMS 11, the DB QUIESCE function provided the ability to create an RP without taking the database data sets offline. This function is provided by the UPDATE DB START(QUIESCE) command, and it allows transaction activity to be paused at commit points. In the RECON data set, these DB QUIESCE RPs are indicated with ALLOC records that have been updated with a DEALLOC time and the QUIESCE flag turned on. A new ALLOC record will be written when the database data set is updated again after the DB QUIESCE RP.

For this second TSR solution, we use DB QUIESCE with the default NOHOLD option to pause transaction activity at commit points. This creates an RP, and by default, the OLDS are switched and archived as part of the DB QUIESCE command. GENJCL.CA is used to create the jobs that will create the CA data sets. The CA data sets contain changes up to the RP created by the DB QUIESCE command. The RECON data set is backed up and transmitted to the remote site with the image copies and CA data sets.

In Figure 3, the steps to run GENJCL to create the CA data sets and the step to back up the RECON data set are shown.

Figure 3. Create Change Accum up to recovery point (primary site activity)
Image shows building GENJCL for change accum, run change accum to RP, show RP, and create backup RECON

Remote site activities

At the remote site, the backup copy of the RECON data set must be conditioned before it can be used for recovery. When the backup RECON was created, it showed a primary system that was up and running and healthy. The conditioning will change the RECON data set to reflect that a disaster has occurred and IMS was abnormally terminated.

The first conditioning step is to flag the primary image copies as invalid. This is only necessary if dual image copies are being created at the primary site and only the secondary image copy is being transmitted to the remote site. It is common to use dual image copies in this way. The command for this is CHANGE.IC DBD(dbname) DDN(ddname) INVALID.

The second step is to delete the IMS subsystems that were active at the time the RECON was created. A LIST.SUBSYS DBRC command will show which IMS subsystems were active, then a series of four commands is required to delete each subsystem ID.

Listing 2. Commands to delete each subsystem ID
CHANGE.SUBSYS SSID(ssidname) ABNORMAL
CHANGE.SUBSYS SSID(ssidname) STARTRCV
CHANGE.SUBSYS SSID(ssidname) ENDRECOV 
DELETE.SUBSYS SSID(ssidname)

By deleting the IMS subsystem records, it is no longer possible to emergency-restart the IMS systems at the remote site. For this solution, this is not a problem because the recoveries are to RPs created by clean image copies, and dynamic back-out is not required for any uncommitted updates.

The third conditioning step before IMS is cold-started is to close and archive the open log data sets in the RECON data set. The OLDS data sets are not transmitted to the remote site. However, at the time the backup RECON was created, it would have shown an active OLDS data set. To close and archive the data set, an empty OLDS data set must be allocated, and the active OLDS in the RECON must be flagged for archiving. The archive utility will not care that the OLDS is empty, and it will archive it appropriately. As a result, the PRILOG, PRISLD, and PRIOLD records in the RECON will change from having all zeros in the stop times to having valid timestamps. The command to identify the active OLDS for this is LIST.RECON.

After the OLDS is allocated, the command to notify the RECON that the OLDS needs archiving is NOTIFY.PRILOG OLDS(oldsdd) SSID(newssid) -STARTIME(start) … RUNTIME(start+1).

Prior to running the recovery jobs, the database data sets must be deleted and reallocated as empty data sets. Once this is done, the GENJCL.RECOV command can be issued to create the DBRC recovery JCL and the resulting jobs can be executed. The timestamp used with GENJCL.RECOV is the timestamp of the last CA data set that contains the RP created by the DB QUIESCE command. These RECON conditioning steps are shown in Figure 4.

Figure 4. Manually Condition the RECON Data Set (Remote Site Activity)
Image shows flagging primary IC as invalid, show non-zero stop times for PRILOG, PRISLD, PRIOLD, delete SSYS record, and set non-zero stop times in PRILOG, PRISLD, PRIOLD

The two final steps are to delete the PRIOLD records and allocate the Restart Data Set (RDS). The OLDS and the RDS were not transmitted to the remote site. Finally, the IMS subsystems can be cold-started.


IMS Base (full DB recovery)

IMS base full database recovery to end of last good log

Primary site activities

With full database recovery, the updates in the archived logs are applied to the clean or concurrent image copies through to the end of the last good log data set available at the remote site. It is unnecessary to create an RP for full database recovery as the updates are applied to the database data sets, including the uncommitted updates. IMS is then emergency-restarted to allow dynamic back-out to back out the uncommitted updates. If emergency restart fails, it is also possible to run batch back-out to back out the uncommitted updates.

It must be noted that full database recovery will not work in a data-sharing environment because the archived logs do not end at the same timestamp across the data-sharing systems. This means that when change accumulation is run, there could be spaces between the logs where updates could exist. These spaces are called spill records, and change accumulation indicates this by showing COMPLETE=NO for the resulting CA data set. When a CA data set is flagged this way, it cannot be used for recovery.
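One way to picture the constraint is the simplified model below. It is not a description of how the change accumulation utility works internally: it only compares the end time of the last archived log from each sharing system and reports the accumulation as complete when they all match. The function name and the system names are assumptions for the example.

```python
from datetime import datetime
from typing import Dict, Tuple


def change_accum_coverage(last_log_end: Dict[str, datetime]) -> Tuple[datetime, bool]:
    """Return (time up to which every system's logs are present, complete flag)."""
    if not last_log_end:
        raise ValueError("at least one subsystem is required")
    # Updates can only be merged up to the earliest log end time.
    merge_end = min(last_log_end.values())
    # Any log that runs past merge_end leaves a span the other logs do not cover.
    complete = all(end == merge_end for end in last_log_end.values())
    return merge_end, complete
```

With a single IMS system the logs trivially end at one timestamp, which matches the observation that the problem arises only with data sharing.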

Each time a log data set is archived, the RECON data set is backed up, and the log and backup RECON are transmitted to the remote site. The full database recovery will use the last good log data set received at the remote site.

Full database recovery and data-sharing constraint

Full database recovery is not supported in a data-sharing environment because the log data sets do not end at the same time. The failure occurs in change accumulation prior to recovery.

Remote site activities

At the remote site, the backup copy of the RECON data set must be conditioned before it can be used for recovery. When the backup RECON was created, it showed a primary system that was up and running and healthy. The conditioning will change the RECON data set to reflect that a disaster has occurred and IMS was abnormally terminated.

The first conditioning step is to flag the primary image copies as invalid. This is only necessary if dual image copies are being created at the primary site and only the secondary image copy is being transmitted to the remote site. It is common to use dual image copies in this way. The command for this is CHANGE.IC DBD(dbname) DDN(ddname) INVALID.

The second conditioning step before IMS is restarted is to close and archive the open log data sets in the RECON data set. The OLDS data sets are not transmitted to the remote site. However, at the time the backup RECON was created, it would have shown an active OLDS data set. To close and archive the data set, an empty OLDS data set must be allocated, and the active OLDS in the RECON must be flagged for archiving. The archive utility will not care that the OLDS is empty, and it will archive it appropriately. As a result, the PRILOG, PRISLD, and PRIOLD records in the RECON will change from having all zeros in the stop times to having valid timestamps. The command to identify the active OLDS for this is LIST.RECON.

After the OLDS is allocated, the command to notify the RECON that the OLDS needs archiving is NOTIFY.PRILOG OLDS(oldsdd) SSID(newssid) -STARTIME(start) … RUNTIME(start+1).

The third step is to flag the IMS subsystems as abnormally terminated. A LIST.SUBSYS DBRC command will show which IMS subsystems were active, then a series of commands is required to flag each subsystem ID:

Listing 3. Commands to flag subsystems as abnormally terminated
CHANGE.SUBSYS SSID(ssidname) ABNORMAL
CHANGE.SUBSYS SSID(ssidname) STARTRCV

It is important not to delete the IMS subsystem DBRC records so emergency restart can be done for these IMS systems. Prior to running the recovery jobs, the database data sets must be deleted and reallocated as empty data sets. Once this is done, the GENJCL.RECOV command can be issued to create the DBRC recovery JCL, and the resulting jobs can be executed. There is no timestamp specified with GENJCL.RECOV because the recovery is to the end of the log. The last step is to allocate an empty RDS data set. The RDS contains checkpoint information, but it was not sent to the remote site. The emergency restart will work without the RDS data set, but will take longer because the restart will need to look in the logs to find the correct checkpoint timestamp. The emergency restart command includes FORMAT WA RS to allow the Write-Ahead Data Set (WADS) and RDS data sets to be formatted. Finally, the IMS subsystems are emergency-restarted, and dynamic back-out removes any uncommitted updates.

The RECON conditioning steps are shown in Figure 5.

Figure 5. Manually Condition the RECON Data Set (Remote Site Activity)
Image shows flagging primary IC as invalid, close and archive the missing open OLDS, and flag subsys records for recovery

IBM Tools IMS Recovery Pack (TSR to IIC)

IMS Tools Timestamp Recovery to IICs Only

With TSR, it is necessary to create an RP that is consistent for the databases being recovered. An RP is a period of time when the database is not allocated. Generally, the ALLOC records in the RECON data set can be analyzed to determine periods of time when the database is not allocated. An RP can be created for a database by taking it offline with a /DBR command or the equivalent UPDATE DB STOP(ACCESS) command, or with a /DBD command or the equivalent UPDATE DB STOP(UPDATES) command. In IMS 11, the DB QUIESCE function provided the ability to create an RP without taking the database data sets offline. This function is provided by the UPDATE DB START(QUIESCE) command, and it allows transaction activity to be paused at commit points. In the RECON data set, these DB QUIESCE RPs are indicated with ALLOC records that have been updated with a DEALLOC time and the QUIESCE flag turned on. A new ALLOC record will be written when the database data set is updated again after the DB QUIESCE RP.

In this IMS Tools disaster recovery scenario, an RP is created using the IMS 11 DB QUIESCE command with the default NOHOLD option, which pauses transaction activity at commit points. As soon as the RP is created by IMS, the transactions are released to continue processing. In this disaster recovery scenario, the IMS Recovery Solution Pack and High Performance Pointer Checker products are used. For instance, after the RP is created, the Recovery Point Identification (RPID) function is used in the DRF/XF product. This function shows the RP time spans for the set of databases specified. This is illustrated in Figure 6.

Figure 6. IMS DRF/XF Recovery Point Identification (RPID) (primary site activity)
Image shows the recovery time spans common to all entries in the DBLIST
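Finding time spans that are common to a set of databases amounts to intersecting each database's list of unallocated windows. The sketch below illustrates that idea only. It is not the RPID implementation, and the names and data layout are assumptions. Each input list is assumed to contain non-overlapping windows.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Window = Tuple[datetime, datetime]


def intersect(a: List[Window], b: List[Window]) -> List[Window]:
    """Intersect two sorted lists of non-overlapping windows."""
    result: List[Window] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever window finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def common_windows(per_db: Dict[str, List[Window]]) -> List[Window]:
    """Return the windows during which every database in the list is unallocated."""
    lists = [sorted(w) for w in per_db.values()]
    if not lists:
        return []
    common = lists[0]
    for windows in lists[1:]:
        common = intersect(common, windows)
    return common
```

An empty result would mean the databases share no recovery point in the period examined, so a new one would have to be created.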

The DRF product is used to create the IIC. An IIC is created using an offline utility without taking the databases offline or affecting IMS (outside of registering the IIC with DBRC). Effectively, a TSR is performed for one or more databases using an image copy (or IIC) as input along with one or more archived logs. The value for the OUTPUT parameter specified to DRF is ICR, indicating that the resulting data set is an IIC. An IIC is itself a stand-alone image copy, meaning that it contains a copy of the entire database data set, and it does not need to be combined with other IIC or IC data sets when needed for recovery. It is necessary to use the DRF product to restore an IIC. In this disaster recovery scenario, the last archived log used to create the IIC ended in an RP created by the IMS 11 DB QUIESCE command, so the IIC is registered to DBRC as a batch IC. If the archived log had not ended in an RP, the IIC would have been registered to DBRC as a concurrent IC. This is important to this disaster recovery scenario because it means that a TSR can be performed at the remote site using only IIC data sets. The creation of the IIC by the DRF product is shown in Figure 7.
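The way an IIC is built, and the rule for how it is registered, can be illustrated with a toy model. This is a conceptual sketch, not the DRF implementation: the database is reduced to a dictionary, log records are hypothetical (timestamp, key, value) tuples, and timestamps are plain integers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical log record: (timestamp, key, new value).
LogRecord = Tuple[int, str, str]


@dataclass
class ImageCopy:
    taken_at: int
    data: Dict[str, str]
    kind: str  # "BATCH" or "CONCURRENT"


def build_iic(base: ImageCopy, log: List[LogRecord], log_end: int,
              recovery_points: List[int]) -> ImageCopy:
    """Apply logged updates to a copy of the base image copy, up to log_end."""
    data = dict(base.data)  # the base image copy itself is left untouched
    for ts, key, value in sorted(log):
        if base.taken_at < ts <= log_end:
            data[key] = value
    # Registered as a batch IC only if the last log used ended in an RP.
    kind = "BATCH" if log_end in recovery_points else "CONCURRENT"
    return ImageCopy(taken_at=log_end, data=data, kind=kind)
```

Because the result holds the whole data set rather than a delta, it can in turn serve as the base for the next IIC in this model.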

Figure 7. IMS DRF Incremental Image Copy (primary site activity)
Image shows creating incremental IC, DRF master address space, recovery sort subspace

It is still necessary to condition the RECON data set for it to be used at the remote site. In this case, the RECON Cleanup (RCU) function of the IMS DRF/XF product, which is included in the IMS Recovery Solution Pack, is used to make a copy of the RECON and condition it, too. It is necessary to make sure the conditioned copy of the RECON created by RCU gets sent to the remote site. This is done at the primary site after the IIC is created. This is shown in Figure 8.

Figure 8. IMS DRF/XF RECON Cleanup (RCU) (primary site activity)
Image shows cleaning up Recon Backup (IMS DRF/XF) (RCU) (primary)

At the remote site, a TSR can be performed immediately using only the IIC data sets and the conditioned RECON, since the RECON data set was conditioned just after the IICs were created at the primary site. The TSR is performed in parallel for all of the database data sets in the recovery group by the IMS DRF product using only the IIC data sets. This is shown in Figure 9.

Figure 9. IMS DRF Timestamp Recovery to Incremental Image Copies (Remote Site Activity)
Image shows IMS DRF Timestamp Recovery to Incremental Image Copies (remote)

After the database data sets are restored using TSR, the indices are rebuilt using IMS Index Builder, an image copy is created using IMS High Performance Image Copy (HPIC), and pointer-checking is performed using IMS High Performance Pointer Checker. IMS is cold-started following the recovery of the databases and the recreation of the RDS data set since there are no uncommitted updates to back out with dynamic back-out following a TSR to an RP.


Conclusion

This tutorial shows how disaster recovery could be accomplished using backup resources like image copies, change accumulation data sets and, in some cases, the archived log data sets. There were three IMS Base and one IMS Tools Disaster Recovery solutions discussed, with examples showing several types of timestamp recovery and full database recovery. Each solution in this tutorial requires a certain amount of RECON conditioning.
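As a compact recap, the four solutions can be laid out as data. The field names and the helper function are chosen for this sketch only, and the values restate what each section above describes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Solution:
    name: str
    recovery_inputs: str
    recon_conditioning: str
    restart: str


SOLUTIONS: List[Solution] = [
    Solution("TSR to IC", "clean image copies",
             "manual, at the remote site", "cold start"),
    Solution("TSR to CA", "image copies and CA data sets",
             "manual, at the remote site", "cold start"),
    Solution("Full DB Recovery", "image copies and archived logs",
             "manual, at the remote site", "emergency restart"),
    Solution("TSR to IIC", "incremental image copies",
             "RCU, at the primary site", "cold start"),
]


def needing_dynamic_backout(solutions: List[Solution]) -> List[str]:
    """Only an emergency restart backs out uncommitted updates."""
    return [s.name for s in solutions if s.restart == "emergency restart"]
```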

In Part 3, you will learn how the IMS Tools product, IMS Recovery Expert, can automate many of the procedures, including the conditioning of the RECON data set. The IMS Recovery Expert product introduces the recovery resource called System Level Backup (SLB), which is used for disaster and local application recovery, significantly reducing the number of image copies needed at the production and remote sites.


Downloads

Description | Name | Size
IMS Disaster Recovery Demonstrations (see note 1) | IMS_Backup_and_Recovery_Demos.zip | 3330KB
Download Instructions for Demonstrations (see note 2) | IMSBackupandRecoveryDemoInstructions.pdf | 768KB

Notes

  1. This download is the same for all parts of this article/tutorial series.
  2. This PDF file is the same for all parts of this series. If you have downloaded it for a previous article or tutorial in the series, there is no need to download it again.

