Optimizing parallel apply processing and PSB scheduling
To optimize parallel processing, ensure that the target server and IMS are configured to support the maximum number of program specification blocks (apply PSBs) that your subscriptions are designed to use.
About this task
You set the maximum number of apply PSBs that IMS can schedule for the target server by specifying the MAXTHRD parameter when you set up the interface for the database resource adapter (DRA).
You use the SCHEDULEBEHAVIOR configuration parameter to identify how you want the writer services to manage apply PSBs. This parameter sets the number of PSBs that are concurrently scheduled and how long a PSB remains scheduled. It affects the DRA MAXTHRD setting and the sizes of the IMS work pools that you need to configure to support IMS replication.
SCHEDULEBEHAVIOR has these settings:
- 0 (default)
- IMS Replication acts similarly to a CICS® application. Apply PSBs are scheduled on demand, used to process a single unit of recovery (UOR), and then unscheduled. When you use this model, the DRA MAXTHRD value must be set to the value of the apply service MAXWRITERTHREADS configuration parameter plus the number of subscriptions. The extra threads above MAXWRITERTHREADS are required because, when replication starts for a subscription, the apply service must schedule an apply PSB to obtain bookmark information and to validate PCBs.
- 1
- This setting is like using IMS resident PSBs and allows the writer services to schedule the apply PSB limit (Maximum Parallel Apply PSBs) for each subscription.
When a writer service schedules an apply PSB, the PSB remains scheduled until replication ends for the subscription. Apply PSBs are scheduled on demand based on workload and parallelism characteristics. Once scheduled, any writer service can use an idle PSB to apply work for that subscription.
When you use this model and MAXWRITERTHREADS is greater than or equal to the largest Maximum Parallel Apply PSBs value, set the DRA MAXTHRD value to the sum of the Maximum Parallel Apply PSBs values for all subscriptions. Setting MAXWRITERTHREADS lower than one or more of the Maximum Parallel Apply PSBs values limits the number of apply PSBs that are scheduled. In this configuration, there is no need to reserve DRA threads for the validation processing that occurs at start of replication. The target IMS work pools must be configured to support these permanently scheduled PSBs.
- 2
- This setting creates a compromise configuration that allows the number of apply PSBs to increase during peak processing periods where they remain scheduled, and then when the workload subsides apply PSBs are unscheduled after they have been idle for a period.
The target IMS work pools and DRA MAXTHRD values must be configured to support the maximum load possible. Idle periods might occur if the source becomes idle and after a period the apply PSBs are unscheduled. During these idle periods, you can perform target side recovery operations while replication is still active, provided there is no source workload flowing through the system for the subscriptions that reference those databases.
The goal is to unschedule an apply PSB after it has not been used to apply changes to the target database for a certain period. The PSBIDLETIMEOUT value serves two purposes: it is the polling frequency that determines how often checks are made to see whether an apply PSB has been idle, and it is the amount of time that an apply PSB must be idle before it is eligible to be unscheduled. Because a PSB that becomes idle just after a check is not yet eligible at the next check, a PSB can be idle for almost twice the PSBIDLETIMEOUT value before it is unscheduled.
This approach does not consume a lot of CPU cycles. Given that the target IMS must be configured to support the peak load, freeing up IMS resources is not that high a priority. You can set a larger PSBIDLETIMEOUT value while replication is most active and then reduce the PSBIDLETIMEOUT after peak load processing is complete to allow apply PSBs to be unscheduled faster.
You can also create more intricate behaviors because the SCHEDULEBEHAVIOR and PSBIDLETIMEOUT values are dynamic and can be changed while replication is active. You can switch between scheduling apply PSBs on demand (SCHEDULEBEHAVIOR 0), leaving apply PSBs scheduled indefinitely (SCHEDULEBEHAVIOR 1), and the compromise between the two (SCHEDULEBEHAVIOR 2).
Regardless of which method of apply PSB management you choose, ensure that the DRA MAXTHRD value is equal to or larger than the number of apply PSBs that IMS Replication attempts to schedule concurrently. When a schedule request is issued and the number of scheduled PSBs has already reached MAXTHRD, IMS does not return control from the schedule request until an existing apply PSB is unscheduled.
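The sizing rules above can be sketched as a small calculation. This is a minimal Python sketch and not part of the product: the function name `required_maxthrd` and its arguments are hypothetical, and capping each subscription at MAXWRITERTHREADS for SCHEDULEBEHAVIOR 1 and 2 is an interpretation of the text rather than a documented formula.

```python
def required_maxthrd(schedule_behavior, max_writer_threads, max_parallel_apply_psbs):
    """Estimate the smallest DRA MAXTHRD value for one target server.

    max_parallel_apply_psbs holds the Maximum Parallel Apply PSBs value
    of each subscription (one list entry per subscription).
    """
    if schedule_behavior == 0:
        # On-demand scheduling: one thread per writer thread, plus one per
        # subscription for the PSB scheduled at start of replication.
        return max_writer_threads + len(max_parallel_apply_psbs)
    if schedule_behavior in (1, 2):
        # Assumption: a subscription never holds more apply PSBs than there
        # are writer threads, so each limit is capped at MAXWRITERTHREADS.
        # For SCHEDULEBEHAVIOR 2 this is the peak-load case.
        return sum(min(limit, max_writer_threads)
                   for limit in max_parallel_apply_psbs)
    raise ValueError("SCHEDULEBEHAVIOR must be 0, 1, or 2")


# Three subscriptions, each allowed 4 apply PSBs, with 8 writer threads.
print(required_maxthrd(0, 8, [4, 4, 4]))  # 11
print(required_maxthrd(1, 8, [4, 4, 4]))  # 12
```

Treat the result as a lower bound for planning and confirm it by testing against your own workload.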
The value that you set for Maximum Parallel Apply PSBs when you create or modify a subscription identifies the maximum number of apply PSBs that the subscription can use to apply changes in parallel.
Consider the Maximum Parallel Apply PSBs value as a theoretical maximum. Several factors can determine whether the system can schedule the maximum number of PSBs that are specified and how frequently that occurs. If the maximum value is greater than MAXWRITERTHREADS, then the practical limit is the MAXWRITERTHREADS value because each writer thread only processes a single UOR at a time.
If you have multiple subscriptions and set the Maximum Parallel Apply PSBs for each subscription to the MAXWRITERTHREADS value, dependency analysis tries to keep all of the writer threads busy and all of the subscriptions roughly point-in-time consistent. If multiple UORs need to be applied for a subscription and writer threads are available but dependencies exist between the UORs, then some delay occurs because updates to the same resource must be applied in chronological order to preserve integrity.
Dependency analysis uses different levels of granularity depending on how many updates are in the source UORs that need to be applied at the target server. If there are only a few updates, dependency analysis is done at the database and key level. For DEDB databases each area is considered a different database from a dependency analysis perspective.
If a UOR contains too many updates, dependency analysis performs only database- or area-level analysis, and potentially just serializes the UOR if too many databases or areas were updated in a single unit-of-recovery. Use the apply service DEPGRAPHUORLIMIT, DEPGRAPHHASHSZ, and DEPGRAPHKEYS configuration parameters to control how dependency analysis works.
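The idea behind the two levels of granularity can be illustrated with a minimal Python sketch. It is not the product's algorithm: the `Uor` class, the `KEY_LEVEL_LIMIT` threshold, and the `plan_waves` function are hypothetical, and the threshold does not correspond to any of the DEPGRAPH parameters. UORs placed in the same wave have no overlapping resources and could be applied in parallel; a UOR that overlaps an earlier one is placed in a later wave so that chronological order is preserved.

```python
from dataclasses import dataclass

KEY_LEVEL_LIMIT = 3  # hypothetical: above this many updates, track databases only


@dataclass(frozen=True)
class Uor:
    uor_id: int
    updates: tuple  # tuple of (database_or_area, key) pairs


def resources(uor, key_level_limit=KEY_LEVEL_LIMIT):
    """Return the resources a UOR touches, at key or database granularity."""
    if len(uor.updates) <= key_level_limit:
        return {(db, key) for db, key in uor.updates}
    # Too many updates: fall back to database-level (or area-level) analysis.
    return {(db, None) for db, _ in uor.updates}


def conflicts(res_a, res_b):
    """Two resource sets conflict on the same database and the same key,
    or on the same database when either side is tracked at database level."""
    return any(
        db_a == db_b and (key_a is None or key_b is None or key_a == key_b)
        for db_a, key_a in res_a
        for db_b, key_b in res_b
    )


def plan_waves(uors):
    """Place each UOR, in source commit order, in the first wave that follows
    every earlier UOR it conflicts with."""
    waves, placed = [], []
    for uor in uors:
        res = resources(uor)
        wave = 0
        for earlier_res, earlier_wave in placed:
            if conflicts(res, earlier_res):
                wave = max(wave, earlier_wave + 1)
        placed.append((res, wave))
        while len(waves) <= wave:
            waves.append([])
        waves[wave].append(uor.uor_id)
    return waves


uors = [
    Uor(1, (("DB1", "A"),)),
    Uor(2, (("DB1", "B"),)),
    Uor(3, (("DB1", "A"),)),  # same key as UOR 1, so it must wait
    Uor(4, (("DB1", "W"), ("DB1", "X"), ("DB1", "Y"), ("DB1", "Z"))),  # database level
]
print(plan_waves(uors))  # [[1, 2], [3], [4]]
```

In this example, UOR 4 is serialized behind every other UOR only because it is analyzed at the database level, which is the cost of coarser granularity.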
To improve throughput you can also potentially group multiple source UORs into a single target UOR in the source server by specifying a value larger than 1 for the capture service UORGROUPCOUNT configuration parameter. This parameter identifies the maximum number of changes that can be grouped together into a single target UOR. Unless you are processing historical changes, it is difficult to get multiple source UORs grouped into a single target UOR because grouping operations are terminated when the capture cache for a subscription contains no more committed changes or when subscriptions have different apply requirements.
In theory, grouping multiple source UORs into a single target UOR reduces the cost of applying those source changes. A certain amount of overhead is associated with processing a target UOR, and that overhead increases if each target UOR requires scheduling an apply PSB. In practice, grouping often has a negative impact on parallelism because of key overlaps. Typically you can group only two to four source UORs together and still be able to apply changes for that subscription in parallel.
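The effect of key overlaps on grouped UORs can be illustrated with a minimal Python sketch. It rests on simplifying assumptions: each source UOR is reduced to the set of keys that it updates, `group_count` is treated as the number of source UORs merged into one target UOR, and the function names are hypothetical. The sketch is not how the capture service groups changes.

```python
def group_uors(uor_keys, group_count):
    """Merge consecutive source UORs (sets of keys, in commit order) into
    target UORs that each cover up to group_count source UORs."""
    return [
        set().union(*uor_keys[i:i + group_count])
        for i in range(0, len(uor_keys), group_count)
    ]


def count_blocked(groups):
    """Count target UORs that share a key with any earlier target UOR and
    therefore cannot be applied in parallel with it."""
    return sum(
        1
        for i, keys in enumerate(groups)
        if any(not keys.isdisjoint(earlier) for earlier in groups[:i])
    )


source = [{"A"}, {"B"}, {"C"}, {"A"}, {"D"}, {"E"}]
for group_count in (1, 2, 3):
    groups = group_uors(source, group_count)
    free = len(groups) - count_blocked(groups)
    print(group_count, len(groups), free)
# 1 6 5
# 2 3 2
# 3 2 1
```

With no grouping, five of the six UORs are independent of everything before them. With a group count of 3, only one target UOR is, because the single repeated key now ties the two groups together.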
If you are using a non-zero SCHEDULEBEHAVIOR value, you might find that grouping with its general loss in parallelism is not worthwhile because the source changes can be applied without requiring an apply PSB schedule and unschedule operation.
Recommendations
Most sites require that the system be able to process a backlog of (typically) historical changes at an average rate that is faster than the rate at which work normally occurs in the source system. This allows replication to catch up to real time after a planned or unplanned outage.
Ideally you have an idea of the average source UOR size and the goal rate and simply divide the goal rate by the average source UOR size to identify the number of apply PSBs that need to be used to achieve that goal. Often it is more complicated. The number of updates that are performed by application programs is not uniform, but ideally you can come up with a rough estimate of the number of apply PSBs that are required for a subscription. If that number is less than the MAXWRITERTHREADS limit, using this limit as an initial starting point and performing subsequent testing should indicate whether that number needs to be increased or decreased.
If the estimated number of apply PSBs that are required for catch-up exceeds the MAXWRITERTHREADS limit, you can test anyway to see how well the configuration performs. Based on these tests, you might determine that it is necessary to create a second target server and divide the subscription into two or more subscriptions so that more changes can be applied in parallel, which increases the total system throughput rate.
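The rough estimate described above can be written out as a small calculation. This is a minimal Python sketch under stated assumptions: the function name and arguments are hypothetical, and `uors_per_sec_per_psb` (the UOR rate that one apply PSB sustains) is a value that you would have to measure in your own testing. With its default of 1, the result reduces to the plain division of the goal rate by the average source UOR size.

```python
import math


def estimate_apply_psbs(goal_changes_per_sec, avg_changes_per_uor,
                        max_writer_threads, uors_per_sec_per_psb=1.0):
    """Return (estimated apply PSBs, practical limit for one subscription)."""
    # Target UOR rate implied by the catch-up goal.
    uors_per_sec = goal_changes_per_sec / avg_changes_per_uor
    # Assumption: apply PSBs scale linearly, each at the measured rate.
    estimated = math.ceil(uors_per_sec / uors_per_sec_per_psb)
    # Each writer thread processes one UOR at a time, so MAXWRITERTHREADS
    # caps what a single subscription can use in practice.
    return estimated, min(estimated, max_writer_threads)


# Goal of 5000 changes per second, 25 changes per UOR on average,
# 8 writer threads, and a measured 20 UORs per second per apply PSB.
print(estimate_apply_psbs(5000, 25, 8, uors_per_sec_per_psb=20))  # (10, 8)
```

When the first number is larger than the second, as in this example, the estimate exceeds what one subscription can achieve on the target server, which is the situation where testing and possibly a second target server come into play.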
IMS is very efficient at scheduling PSBs, but there is a cost associated with each schedule and unschedule operation that is charged to both IMS address spaces and the IMS Replication target server address space. Using the SCHEDULEBEHAVIOR configuration parameter you have some control over how often schedules and unschedules occur.
Using the default SCHEDULEBEHAVIOR (0) is appropriate if you have many subscriptions and each subscription contains a small number of databases. In this kind of configuration, using default scheduling makes IMS Replication act similarly to a high-volume CICS OLTP application and reduces the work pool requirements on the target IMS because apply PSBs are demand scheduled.
If you have one or two subscriptions and at least one subscription contains many databases, using a SCHEDULEBEHAVIOR of 1 is more cost effective than using demand scheduling. This option most likely requires the most IMS work pool resources because the number of apply PSBs that are scheduled increases to the maximum parallelism that can be achieved and stays at that limit.
Using a SCHEDULEBEHAVIOR of 2 represents a middle ground that saves PSB scheduling costs when the workload ramps up and then releases IMS resources when activity slows. Setting the PSBIDLETIMEOUT value to one minute causes apply PSBs to be unscheduled in the one- to two-minute time frame when the source workload starts to slow down.
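The one- to two-minute window follows from PSBIDLETIMEOUT acting as both the polling interval and the idle threshold. The following minimal Python sketch models that behavior under simplifying assumptions: times are whole seconds, idle checks run at exact multiples of the timeout, and the function name `unschedule_delay` is hypothetical. It is a model of the described timing, not the product's implementation.

```python
def unschedule_delay(last_used, timeout):
    """Seconds between the last use of an apply PSB and its unschedule.

    Idle checks are assumed to run at 0, timeout, 2 * timeout, and so on.
    A PSB is unscheduled at the first check where it has been idle for at
    least `timeout` seconds.
    """
    first_eligible = last_used + timeout
    # Round up to the next check time (integer ceiling division).
    check_time = (first_eligible + timeout - 1) // timeout * timeout
    return check_time - last_used


# PSBIDLETIMEOUT of 60 seconds.
print(unschedule_delay(0, 60))   # 60: last used exactly at a check
print(unschedule_delay(1, 60))   # 119: last used just after a check
print(unschedule_delay(59, 60))  # 61: last used just before a check
```

The delay always falls between the timeout and just under twice the timeout, depending on where the last use falls relative to the checks.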
Using a SCHEDULEBEHAVIOR of 2 is recommended when you have one or two subscriptions and might be an appropriate choice if you have more subscriptions, provided that the target IMS has been configured to support the larger number of apply PSBs that can be scheduled during peak workloads.
Special considerations when SCHEDULEBEHAVIOR is non-zero
If you attempt to issue an IMS database recovery command (/DBR) for a full-function database, the command fails if a PSB that references that database is currently scheduled. Because the main intent of a non-zero SCHEDULEBEHAVIOR is to reduce the number of times that the apply PSB must be scheduled, attempts to issue a /DBR command for a full-function database that is referenced by the apply PSB are expected to fail while replication is active and an apply PSB is scheduled.
When you stop replication for a subscription, or replication fails because of an error, any apply PSBs that are scheduled at that time should be unscheduled. If for some reason an apply PSB is not unscheduled, you can use the IMS /STOP REGION ABDUMP command to cause IMS to remove the PSB.
If you issue a /STOP REGION command against a scheduled apply PSB while replication is active, replication fails the next time there is an attempt to use the PSB that is associated with the region number that was stopped. The following message is issued twice:
CECI0025E An error occurred during IMS DRA processing. The IMS PAPLRETC=1C.
Replication ends in error for the subscription. You can simply restart replication because all tracking data for the apply PSBs (including the one that was stopped) was removed when replication failed, causing new apply PSBs to be scheduled when replication resumes.