Deciding on monitoring activities and techniques
To develop a master plan for monitoring and analyzing performance, determine whether dynamic, daily, or detailed monitoring suits your needs best, and plan accordingly.
When you develop a master plan for monitoring and analyzing performance, you should establish:
- A master schedule of monitoring activity
Coordinate monitoring with operations procedures to allow feedback of online events to be incorporated into instructions for daily or detailed data gathering.
- Which tools are to be used for monitoring
The tools used for data gathering should provide for dynamic monitoring, a daily collection of statistics, and more detailed monitoring.
- The kinds of analysis to be performed
Document what data is to be extracted from the monitoring output, identifying the source and usage of the data. Although the formatted reports provided by the monitoring tools help organize the volume of data, you should probably design worksheets to assist in data extraction and reduction.
- A list of the personnel that are to be included in any review of the findings
The results and conclusions from analyzing monitor data should be made known to the user liaison group and to system performance specialists.
- A strategy for implementing changes to the online IMS system design resulting from tuning recommendations
Coordinate this change implementation strategy with your installation's standards for testing and standards for frequency of production environment changes. Change management is described in Modifying the system design.
Plan for three broad levels of monitoring activity:
- Dynamic
Observe the system's operation continuously to discover any serious short-term deviation from performance objectives.
The output from the /DISPLAY or QUERY command is suitable for this level of monitoring, together with end-user feedback. One use of the Resource Measurement Facility (RMF) II is to collect information about processor, channel, and I/O device utilization.
The MTO is an important source of information about the behavior of the online IMS system. An important part of MTO feedback is the set of conditions during an IMS Monitor run. This information can help establish the validity of the monitor data.
With the status information that can be obtained using the /DISPLAY or QUERY command, you can arrange to get a processing status during online execution. The status can include the queue levels, active regions, active terminals, and the number and type of conversational transactions. Such a status can be obtained with the aid of an automated operator program invoked by the MTO. At prearranged milestones in the production cycle—such as before scheduling a message or BMP region, at shutdown of part of the network, or at peak loading—the transaction processing status and measures of system resource levels can be recorded.
To dynamically monitor paging rates, use one of the following methods:
- Total System Paging Rate
- The RMF Monitor II Paging report gives snapshots of paging activity by sampling interval.
- Paging Rate by User
- The RMF Monitor II Address Space Resource Data report gives counts of allocated frames and page faults by address space for each RMF measurement interval. The page fault count is cumulative and includes both local address space page faults (not page I/Os) and CSA/LPA page faults.
If the monitoring indicates that IMS is experiencing temporary delays, it might be possible to stop some TSO or batch initiators.
To monitor processor resources, use the following:
- Total System Utilization
- The RMF CPU Activity report gives WAIT TIME PERCENTAGE for the RMF report interval, which might typically be 10 to 30 minutes.
- Processor Overcommitment
- The RMF Monitor II Real Storage/Processor/SRM report gives percentage processor utilization for each RMF sampling interval. It shows a 101% figure if any address space is ready to be dispatched but must wait for processor cycles during the interval.
- Daily
Measure and record key system parameters daily. Record both the daily average and the peak period (usually one hour) average. Compare against major performance objectives and look for adverse trends.
This data usually consists of counts of events and gross-level timings. In some cases, the timings are averaged for the entire IMS system, for example, elapsed times for input queuing or program execution.
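As an illustration of the daily average and peak-period average, the following Python sketch assumes that one measure (for example, an input queue time) has already been extracted into timestamped samples for a single day. The function name and the input layout are hypothetical and are not part of any IMS utility. The peak period is assumed to be the clock hour with the highest mean.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_and_peak_hour_average(samples):
    """Return (daily_average, peak_hour, peak_hour_average).

    samples: iterable of (datetime, float) pairs for one day.
    This layout is an assumption for illustration only.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("no samples supplied")

    # Group the values by clock hour (0-23).
    by_hour = defaultdict(list)
    for timestamp, value in samples:
        by_hour[timestamp.hour].append(value)

    # Mean per hour, then pick the hour with the highest mean.
    hourly_mean = {hour: mean(values) for hour, values in by_hour.items()}
    peak_hour = max(hourly_mean, key=hourly_mean.get)

    daily_average = mean(value for _, value in samples)
    return daily_average, peak_hour, hourly_mean[peak_hour]
```

Recording both figures each day makes it possible to compare them against the performance objectives and to spot adverse trends over several days.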
You can use the IMS system log as input to offline processing to produce statistics on a daily or regular basis. Two utilities, IMS Log Transaction Analysis and IMS Statistical Analysis, are suitable for this level of monitoring, because they impose no additional processing load on the online system.
If the analysis of dynamic RMF Paging Activity reports shows that paging I/O has increased, relate the increase to changes in the workload during the corresponding period. More detailed analysis is usually required to decide on tuning actions.
If analysis of the RMF CPU Activity reports shows consistently high CPU utilization during a period of poor response times, examine the elapsed and processor times for transactions by IMS message region. If the ratio of elapsed time to processor time is significantly higher in the lower-priority regions, raise their dispatching priorities relative to other non-IMS work.
Related reading: If all regions are equally affected, see Minimizing path length for information on reducing path lengths. If appropriate, adjust priorities of competing work to move it below IMS message regions.
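The comparison of elapsed time to processor time by region can be sketched as follows. This is a minimal Python illustration, assuming the elapsed and processor times per message region have already been totaled from your reports. The function names, the region names used in any call, and the `factor` threshold that stands in for "significantly higher" are all hypothetical choices, not IMS or RMF values.

```python
def elapsed_to_cpu_ratios(regions):
    """Return {region_name: elapsed_time / processor_time}.

    regions: dict mapping a region name to (elapsed_seconds, cpu_seconds).
    Regions with no processor time are skipped to avoid dividing by zero.
    """
    ratios = {}
    for name, (elapsed, cpu) in regions.items():
        if cpu > 0:
            ratios[name] = elapsed / cpu
    return ratios


def regions_with_high_ratio(ratios, factor=2.0):
    """Return the regions whose ratio is at least `factor` times the lowest.

    `factor` is an assumed threshold; choose a value that suits your
    installation.
    """
    if not ratios:
        return []
    baseline = min(ratios.values())
    return sorted(name for name, ratio in ratios.items()
                  if ratio >= factor * baseline)
```

If only the lower-priority regions are returned, that pattern is consistent with a dispatching priority problem. If every region shows a similar ratio, the list is empty and path length is the more likely area to examine.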
To monitor utilization by subsystem for processor overcommitment, use the RMF Monitor II Address Space State Data report. This report gives processor units consumed by each address space for each RMF measurement interval. IMS control and message regions can be identified by job name.
- Detailed
Periodically collect detailed statistics on system operation for performance analysis against system-oriented objectives and workload profiles.
Data at this level is much more voluminous. It typically contains sequences of events and tabulations. The timings reported are at a detailed level.
At this level of monitoring, special trace tools such as the IMS Monitor and Generalized Trace Facility (GTF) are useful. They collect a detailed sample of the online processing and distinguish between activity in dependent regions, asynchronous processing for terminals and message queues, buffer pool usage, and system data set I/O.
Additional information on using these monitoring tools is included in other topics of this section. The use of monitoring and tools to detect performance problems is explained in Identifying and correcting performance problems.
You can use the following methods for detailed monitoring of paging rates:
- Paging Rate by User
- If analysis at the daily level is insufficient, plan to run a GTF trace. If multiple page data sets are being used, private and global paging can be identified. Examine the GTF Detail Trace report to evaluate the impact on IMS of any paging by calculating the elapsed time delays due to page faults, particularly for the control region. The GTF Page Fault Summary report can be used to discover which area of IMS or system code is being affected by page faults. The report also indicates the type of page faults.
Analyze this type of data to help you decide whether to tune real storage usage.
Related reading: For other factors to consider when tuning real storage usage, see Trade-offs between I/O controlled by IMS and paging.
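The calculation of elapsed time delays due to page faults can be sketched in Python as follows. The sketch assumes that you have already transcribed each page fault from the trace output into a tuple of job name, fault time, and resolution time, in seconds. That tuple layout and the function name are hypothetical and do not represent the GTF record format.

```python
from collections import defaultdict


def page_fault_delay_by_job(faults):
    """Return {job_name: total_seconds_spent_waiting_on_page_faults}.

    faults: iterable of (job_name, fault_time, resolved_time) tuples,
    with both times in seconds. This layout is assumed for illustration.
    """
    totals = defaultdict(float)
    for job_name, fault_time, resolved_time in faults:
        if resolved_time < fault_time:
            raise ValueError(f"resolution precedes fault for {job_name}")
        # Each fault delays the job for the time taken to resolve it.
        totals[job_name] += resolved_time - fault_time
    return dict(totals)
```

Comparing the total for the control region with the length of the trace interval gives a rough measure of how much paging is delaying IMS.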
- Examining NOT-IWAIT Time during Scheduling/Termination
- The IMS Monitor Region Summary report can help you make an initial assessment of dispatching priority problems by examining the Scheduling or DL/I NOT-IWAIT times, that is, the elapsed time not accounted for by IWAIT time. Increases in the NOT-IWAIT times can be caused by:
- Paging delays
- Dispatching for a higher-priority task
If minimal paging is occurring, the portion of elapsed wait time that occurs during the scheduling and termination of a region is fairly consistent from system to system. This elapsed wait time is not accounted for by IWAIT time. This time is recorded in the IMS Monitor Region Summary report and tabulated under the heading NOT-IWAIT TIME. DL/I call NOT-IWAIT times can also include dynamic logging I/O delays. Any delays caused by paging or dispatching for a higher-priority task result in an increase in the NOT-IWAIT times.
If the total mean NOT-IWAIT TIME is excessive, the machine resource is probably inadequate for IMS. If no higher-priority tasks are present, the cause is probably a high paging rate for IMS scheduler code, control blocks, and PSB pool.
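The relationship between elapsed time, IWAIT time, and NOT-IWAIT time described above can be written out as a short Python sketch. The function names are hypothetical, and the inputs are assumed to be elapsed and IWAIT figures copied from the report in any one consistent unit. What counts as an excessive mean is left to your own objectives.

```python
from statistics import mean


def not_iwait_time(elapsed, iwait):
    """Return the elapsed time that is not accounted for by IWAIT time."""
    if iwait < 0 or iwait > elapsed:
        raise ValueError("IWAIT time must be between 0 and the elapsed time")
    return elapsed - iwait


def mean_not_iwait_time(intervals):
    """Return the mean NOT-IWAIT time over (elapsed, iwait) pairs."""
    return mean(not_iwait_time(elapsed, iwait) for elapsed, iwait in intervals)
```

Tracking this mean across monitor runs shows whether paging or dispatching delays are growing, because both increase the NOT-IWAIT portion without increasing IWAIT time.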
The SVC Mapping Summary can assist in determining where an SVC is being issued.
Related reading: For more information on reducing path lengths, see Minimizing path length.