Central Monitoring Function

Enter the subcommand FCONRMT in Performance Toolkit basic mode to invoke the remote performance monitoring facility. You will see an initial menu similar to the following:

Figure 1. System Load Overview Screen
 FCX198         Performance Toolkit Remote Monitoring Facility     Remote Data       

 Sel Node-ID   Time   ------------- Exceptions & CPU Load -------------> AvExcp
  _  EHNIVM0A  15:12   >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> :    .00
  _  WALTERVM  15:12   >>>>>>>>>>>>>>>>>====                           :    .04
  _  BREUVM    15:12   %SPSL>>>>>>>>>>>>>>>>=====                      :    .08
  _  DUFAVM1   15:12   >>>>>>>>>>>>>                                   :      0
  _  EHNIVM01  15:12   Cache-Sub>>>>>>>>>>>>>>>>>>>>                   :   1.05
  _  STUTVM1   15:12   >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>        :    .35
  _  SDFVM003  15:12   Usrlimit>>>>>>>>>>>>>>>>>>>>>>>>>               :    .44
  _  CHVM1     15:12   CP-Read>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           :   1.74
  _  VIEVMA    15:12   C1ES,_Usrlimit>>>>>>>>>>>>>>>>>>>>>             :   3.52
  _  GERMAZ01  15:12   Usrlimit>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     :    .40
  _  GERMAZ06  15:12   >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>                  :    .47
  _  BOEVM3    15:12   >>>>>>>>>>>>>>>>>>>>>>========                  :    .14
  _  SDFVM2    15:12   WSS-Loop>>>>>>>>>>>>>>>>>>>                     :    .56
  _  HERVM1    15:12   >>>>>>>>>>>>>>>>>>>>>>>>>                       :    .00
  _  BOEVMN    03:50  Not updated for  683 min.                        :    ...
  _  STUTVM4   15:12   >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>           :    .89
  _  GERMAZ05  15:12   >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>       :    .53

 Select  e: exception log  h: history data  m: monitor
 Command ===>
 F1=Help  F4=Top  F5=Bot  F7=Bkwd  F8=Fwd  F12=Return
The contents of the initial menu depend on the system definitions you prepared in the file FCONRMT SYSTEMS: one line will be shown for each of the defined systems, where
Sel
is the column where you can enter further selection commands. Enter
e
to display the accumulated exception log file for the selected system
h
to display the history data overview for the selected system
m
to display the accumulated summary performance data for the selected system (the same data as on the performance REDISP screen of that system). If an APPC/VM session is established to the target system (controlled by an APPC/VM nickname entry appended to the system's FCONRMT SYSTEMS entry), the performance monitor MENU screen is shown initially instead.
Pressing ENTER without inserting one of the above characters in the Sel column corresponds to:
  • an 'e' selection when exceptions are currently indicated in the load bar of the system,
  • an 'm' selection when no exceptions are indicated at the moment.
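
The selection rule for the Sel column can be summarized in a few lines. The following Python sketch is illustrative only: the function name `resolve_selection` and its arguments are hypothetical and are not part of Performance Toolkit, and it assumes an untouched Sel field is either empty or still holds the `_` placeholder shown on the screen.

```python
VALID_SELECTIONS = {"e", "h", "m"}

def resolve_selection(entered: str, exceptions_indicated: bool) -> str:
    """Return the effective selection for one line of the overview screen.

    entered              -- contents of the Sel field when ENTER is pressed
    exceptions_indicated -- True if the load bar currently shows exceptions
    """
    choice = entered.strip()
    if choice in VALID_SELECTIONS:
        return choice
    if choice in ("", "_"):
        # No selection character entered: fall back to the default rule.
        return "e" if exceptions_indicated else "m"
    raise ValueError(f"unsupported selection character: {choice!r}")
```

For example, `resolve_selection("_", True)` yields `'e'`, while `resolve_selection("h", True)` keeps the explicit `'h'`.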
Node-ID
is the RSCS node-ID of the VM system
The remaining fields are intended for the remote monitoring function only. They will not provide useful data unless the remote systems have been set up to continuously send performance information to this central monitor machine (command 'FC MONCOLL REMSEND ...', see Implementing Remote Performance for details).
Time
is a time stamp in the format hh:mm which indicates the time, in hours and minutes, when the last summary exception data record was received from the remote system. Note that the time stamp used is generated on this system so you can easily see whether an entry is being continuously updated. See the initial summary performance data display (selection 'm') for time stamps with the actual remote system's time.
Exceptions & CPU Load
will show:
  • The CPU load on the system, indicated by a bar of greater-than (>) signs (displayed in reverse video on displays which support extended highlighting). The length of the bar is proportional to the CPU load: a CPU load of 100% is indicated by a bar extending up to the end of the arrowhead (--->) in the header line. The number of greater-than (>) signs indicates the real CPU load of the system. For VM systems which are running in an LPAR or second level, the 'logical' load (which includes suspended time) may be higher: the difference is then indicated by a corresponding number of equal (=) signs on the right of the load bar.
  • The names of performance variables which have exceeded the limits set on the remote system.
The CPU load bar will be overlaid by exception information. The inserted exception names can be
  • Any redisplay variable name used with the FC LIMIT command
  • The following expressions:
    I/O cont
    indicating at least one I/O device has exceeded the limit set for I/O contention
    I/O resp
    indicating the response time for at least one I/O device has exceeded the set limit
    CH cont
    indicating at least one channel of a VM system has exceeded the set contention limit
    CH busy
    indicating at least one channel of a VM system has exceeded the set 'channel busy' threshold
    INT miss
    indicates a missing interrupt for at least one I/O device
    RSRV pend
    indicates a RESERVE is pending for a device which has not run any I/O operations during the last measuring interval
    CPU-loop
    indicates at least one user is in a CPU loop
    I/O-loop
    indicates at least one user is in an I/O loop
    WSS-loop
    indicates at least one user appears to be in a general loop, with a practically constant working set size
    Usrlimit
    indicates at least one user has exceeded one of the thresholds set for user resource consumption (CPU, I/O rate, or UR I/O rate).
  • The string 'none' if no exceptions have been found, and if the CPU load is lower than the minimum that can be shown
  • The string 'Collect Error' if a data collect problem occurred on the remote system
  • The string 'no data received' if no data have been received so far from the remote system
  • The string 'not updated for nnn min.' when a remote system has ceased transmitting updates for a period of at least 10 minutes
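
If the overview screen is captured as text (for example from a console log), the contents of the Exceptions & CPU Load field can be interpreted roughly as sketched below. This Python fragment is not part of Performance Toolkit. The names `LoadField` and `parse_load_field` are hypothetical, and `bar_width` (the number of columns that represents 100% load) must be supplied by the caller. The sketch also assumes that the field text passed in starts at the first column of the bar, that overlaid exception names occupy the left end of the bar, and that the right end of the bar remains visible.

```python
from dataclasses import dataclass
from typing import Optional

# Status strings described in the text, compared in lowercase.
EXACT_STATUS = ("none", "collect error", "no data received")
STALE_PREFIX = "not updated for"

@dataclass
class LoadField:
    status: Optional[str]         # status string, or None when a load bar is shown
    exceptions: str               # overlaid exception names, "" if none
    real_pct: Optional[float]     # load implied by the last '>' sign
    logical_pct: Optional[float]  # load implied by the last '=' sign, if any

def parse_load_field(field: str, bar_width: int) -> LoadField:
    text = field.strip()
    lowered = text.lower()
    if lowered in EXACT_STATUS or lowered.startswith(STALE_PREFIX):
        return LoadField(text, "", None, None)

    real_end = text.rfind(">") + 1                    # 0 if no '>' is visible
    logical_end = max(real_end, text.rfind("=") + 1)
    if logical_end == 0:
        # No bar characters visible: only exception text (or nothing) is shown.
        return LoadField(None, text, None, None)

    first_bar = min(i for i, ch in enumerate(text) if ch in ">=")
    return LoadField(
        None,
        text[:first_bar],
        100.0 * real_end / bar_width,
        100.0 * logical_end / bar_width,
    )
```

With an assumed `bar_width` of 50, a field consisting of `Usrlimit` followed by 17 greater-than signs and 5 equal signs is read as a real load of 50% and a logical load of 60%, with `Usrlimit` reported as the exception text.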
AvExcp
is the average exception severity code since the last RESET. The summary counters are automatically reset at midnight. Other reset times can be specified using the 'FC MONCOLL RESET ...' command.

Format: Averages will be shown as a single '0' if no exceptions have been logged since the last RESET. They will be shown as a number with one or two decimals if at least one severity code > 0 has been found, that is, decimals always indicate that exception log data are available for display, even if the value shown is '.00'.
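
The format rule means that the presence of a decimal point, not the numeric value, tells you whether exception log data exist. A minimal Python sketch of that rule follows. The function name `parse_avexcp` is hypothetical, and treating a field that contains only dots (such as the `...` shown for a system that is no longer updated in the sample screen) as "no value" is an assumption.

```python
from typing import Optional, Tuple

def parse_avexcp(field: str) -> Tuple[Optional[float], bool]:
    """Return (average severity, exception_log_available) for an AvExcp field."""
    text = field.strip()
    if not text or set(text) == {"."}:
        # Empty or dots only: no average shown (assumed meaning).
        return None, False
    # A decimal point signals that exception log data are available,
    # even when the value itself is '.00'.
    return float(text), "." in text
```

Here `parse_avexcp("0")` returns `(0.0, False)`, while `parse_avexcp(".00")` returns `(0.0, True)`.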

The following colors will be used for the data in the Exceptions & CPU Load and AvExcp fields:
green
for an exception severity code of 0
red
for an exception severity code of 1 or 2
pink
for an exception severity code of 3 or 4
yellow
for an exception severity code of 5 or 6
white
for an exception severity code of more than 6
The severity codes are determined by dividing the sum of the 'weight' factors of all exceptions by 10. It is the system programmer's responsibility to set these weight factors so that color changes match the importance that your installation places on the occurrence of specific exceptions. See the FCONTROL FORCEUSR, FCONTROL LIMIT, and FCONTROL USRLIMIT subcommands in the z/VM: Performance Toolkit Reference for more information on how to specify 'weights' for the different exception messages.
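
The relationship between weight factors, severity codes, and colors can be illustrated as follows. This Python sketch is not taken from the product: the function names are hypothetical, the use of integer division (discarding the remainder) is an assumption, because the text states only that the sum of the weights is divided by 10, and the color for codes above 6 is assumed to be white.

```python
from typing import Iterable

# (highest severity code in the band, color), in ascending order
COLOR_BANDS = [(0, "green"), (2, "red"), (4, "pink"), (6, "yellow")]
TOP_BAND_COLOR = "white"  # assumed reading of the color listed for codes above 6

def severity_code(weights: Iterable[int]) -> int:
    """Sum the weight factors of all current exceptions and divide by 10.

    Integer division is an assumption made for this sketch.
    """
    return sum(weights) // 10

def severity_color(code: int) -> str:
    """Map a severity code to the display color listed above."""
    for upper, color in COLOR_BANDS:
        if code <= upper:
            return color
    return TOP_BAND_COLOR
```

For example, two concurrent exceptions with weights 10 and 25 give a severity code of 3 under this assumption, which falls in the pink band. Raising the weight of an exception that matters to your installation moves the display into a higher band sooner.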

Usage notes: This is essentially a centralized exception monitoring facility. It will automatically show exceptions detected by any of the remote systems: new system exception data will be inserted into the initial screen at 1-minute intervals, or when the screen is updated following the entry of a command, whichever occurs first.

Different colors will be used for displaying the Exceptions & CPU Load data fields, depending on the exception severity code found. Provided that you have suitably set up threshold and user monitoring in your remote systems, you need only watch the initial systems overview screen for highlighted status field information to be alerted to problems.

For any of the systems where exceptions have been logged, you can then:
  • View the exception log for the system: insert an 'e' in front of the corresponding system and press ENTER. You will be shown the system's Exception Log Display with the day's exception messages for the system.
  • View the collected summary performance data for the system: insert an 'm' in front of the corresponding system on the initial menu. The Remote Performance Log Display for the system will then be shown.
  • View previously collected performance data for systems where the append flag has been set to 'Y': insert an 'h' in front of the corresponding system on the initial menu. The History Data Selection Display will be displayed.
See the description on the following pages for more detailed information on each of these displays.
Once you have selected a specific system for which to display data, by either the 'e', 'h', or 'm' selection characters, you need not return to the initial menu for selecting another type of data for the same system: use the commands
EXCept
to display the system's exception log
HIStory
for displaying performance history data (previously retrieved performance data displays)
MONitor
or any valid data retrieval command to display either the system's performance log, or the retrieved performance data display
Entering a QUIT command, or pressing the corresponding PF-key, will bring you back to the initial systems overview display, where you can select any of the other systems for displaying data. You can also switch directly between different systems by just entering another system's node-ID.