Situations
This chapter describes the predefined situations of the product.
Overview of situations
- Definition of a predefined situation
- A situation is a logical expression involving one or more system conditions. IBM Z OMEGAMON AI for Storage uses situations to monitor the systems in your network. To help you begin using IBM Z OMEGAMON AI for Storage quickly, the product provides situations that check for system conditions common to many enterprises. You can examine and, if necessary, change the conditions or values being monitored to those best suited to your enterprise. Be sure to start the situations that you want to run in your environment.
- Using situations
- You manage situations from the Tivoli® management portal using the Situation editor. Using the Situation editor, you can perform the following tasks:
- Create a situation
- Save a situation
- Display a situation
- Edit a situation
- Start, stop, or delete a situation
- Investigate the situation event workspace for a situation
When you open the Situation editor, the left frame initially lists the situations associated with the Navigator item you selected. When you click a situation name or create a new situation, the right frame of the Situation editor opens to provide the following information about the situation and allow you to further define that situation:
- Condition
- View, add to, and edit the condition being tested.
- Distribution
- View the systems to which the situation is assigned and assign the situation to systems.
- Expert advice
- Write comments or instructions to be read in the situation event workspace.
- Action
- Specify a command to be sent to the system.
You can also specify a Storage Toolkit request to be run when a situation becomes true if IBM Z OMEGAMON AI for Storage is installed and a storage table is enabled for Storage Toolkit commands.
- Until
- Reset a true situation when another situation becomes true or a specified time interval elapses.
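Many of the predefined situations that follow come in warning/critical pairs whose conditions split an attribute's value range into non-overlapping bands (for example, KS3_Applic_Resp_Time_Warning covers GE 40 AND LT 50, while its critical counterpart covers GE 50). A minimal sketch of how such a pair of predicates classifies a sampled value; Python and the function name are illustrative only, not a product API:

```python
# Illustrative sketch: how a warning/critical situation pair partitions a
# sampled attribute value. Thresholds follow the KS3_Applic_Resp_Time pair
# (warning: GE 40 AND LT 50; critical: GE 50); the function is hypothetical.

def classify_high_dataset_msr(msr: float) -> str:
    """Return which situation, if any, is true for this sample."""
    if msr >= 50:            # KS3_Applic_Resp_Time_Critical
        return "critical"
    if 40 <= msr < 50:       # KS3_Applic_Resp_Time_Warning
        return "warning"
    return "none"            # neither situation fires

print(classify_high_dataset_msr(55.0))  # critical
print(classify_high_dataset_msr(42.0))  # warning
print(classify_high_dataset_msr(10.0))  # none
```

Because the bands do not overlap, at most one of the paired situations is true for a given sample, so an event escalates from warning to critical as the monitored value crosses the upper threshold.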
Predefined situations descriptions
The following predefined situations are included in the IBM Z OMEGAMON AI for Storage product.
KS3_Applic_Resp_Time_Critical
If VALUE S3_Application_Monitoring.High_Dataset_MSR GE 50
Monitors
the response time components to determine the reason for a poor response
time when an application is accessing a dataset and the response
time is greater than the critical threshold. Also examine the volume
for over-utilization, cache settings, and the response time components
at the volume level.
KS3_Applic_Resp_Time_Warning
If VALUE S3_Application_Monitoring.High_Dataset_MSR GE 40 AND
VALUE S3_Application_Monitoring.High_Dataset_MSR LT 50
Monitors
the response time components to determine the reason for a poor response
time when an application is accessing a dataset and the response
time is greater than the warning threshold. Also examine the volume
for over-utilization, cache settings, and the response time components
at the volume level.
KS3_Cachecu_Cache_Stat_Critical
If VALUE S3_Cache_Control_Unit.Cache_Status NE Active
Monitors
for the condition where caching is not active for the control unit.
Use the SETCACHE command to activate caching, if appropriate.
KS3_Cachecu_DFW_Retry_Critical
If VALUE S3_Cache_Control_Unit.DFW_Retry_Percent GE 2
Monitors
for the condition where the percent of DASD fast write attempts that
cannot be satisfied because of a shortage of available nonvolatile storage
(NVS) space exceeds the critical threshold. Check for pinned NVS and
correct the problem if NVS is pinned. Otherwise, if the impact on
performance is not acceptable, you need to move a volume or dataset
to another cache control unit or add NVS to this control unit.
KS3_Cachecu_DFW_Retry_Warning
If VALUE S3_Cache_Control_Unit.DFW_Retry_Percent GE 1 AND
VALUE S3_Cache_Control_Unit.DFW_Retry_Percent LT 2
Monitors
for the condition where the percent of DASD fast write attempts that
cannot be satisfied because of a shortage of available nonvolatile storage
(NVS) space has exceeded the warning threshold. Check for pinned NVS
and correct the problem if NVS is pinned. Otherwise, if the impact
on performance is not acceptable, move a volume or dataset to another
cache control unit or add NVS to this control unit.
KS3_Cachecu_Inact_Vols_Critical
If VALUE S3_Cache_Control_Unit.Deactivated_Volumes GE 15
Monitors
for the condition where the number of deactivated volumes on the control
unit exceeds the critical threshold. You can use the SETCACHE command
to activate caching on the volumes, if necessary.
KS3_Cachecu_Inact_Vols_Warning
If VALUE S3_Cache_Control_Unit.Deactivated_Volumes GE 10 AND
VALUE S3_Cache_Control_Unit.Deactivated_Volumes LT 15
Monitors
for the condition where the number of deactivated volumes on the control
unit exceeds the warning threshold. You can use the SETCACHE command
to activate caching on the volumes, if necessary.
KS3_Cachecu_NVS_Stat_Critical
If VALUE S3_Cache_Control_Unit.NVS_Status NE Active
Monitors
for the condition where nonvolatile storage is not active for the
control unit. All writes to volumes on the control unit are written
directly to the hard disk drive. Use the SETCACHE command to activate
NVS (nonvolatile storage), if appropriate.
KS3_Cachecu_Read_HitP_Critical
If VALUE S3_Cache_Control_Unit.Read_Hit_Percent LE 50 AND
VALUE S3_Cache_Control_Unit.Read_Hit_Percent GT 0
Monitors
for the condition where the percent of read I/O requests resolved
from cache has fallen below the critical threshold. If performance
is a problem, look for volumes with a low read hit percent and consider
moving them to another control unit to balance the load. This condition
can be caused by cache-unfriendly applications or a shortage of cache.
KS3_Cachecu_Read_HitP_Warning
If VALUE S3_Cache_Control_Unit.Read_Hit_Percent LE 60 AND
VALUE S3_Cache_Control_Unit.Read_Hit_Percent GT 50
Monitors
for the condition where the percent of read I/O requests resolved
from cache has fallen below the warning threshold. If performance
is a problem, look for volumes with a low read hit percent and consider
moving them to another control unit to balance the load. This condition
can be caused by cache-unfriendly applications or a shortage of cache.
KS3_Cachecu_Trk_Dstg_Critical
If VALUE S3_Cache_Control_Unit.Track_Destaging_Rate GE 70
Monitors
for the condition where the rate at which tracks are being removed
from cache and written to DASD exceeds the critical threshold. If
performance is being impacted, you need to migrate datasets or volumes
to another cache control unit. An alternative is to increase the cache
capacity.
KS3_Cachecu_Trk_Dstg_Warning
If VALUE S3_Cache_Control_Unit.Track_Destaging_Rate GE 50 AND
VALUE S3_Cache_Control_Unit.Track_Destaging_Rate LT 70
Monitors
for the condition where the rate at which tracks are being removed
from cache and written to DASD exceeds the warning threshold. If performance
is being impacted, you need to migrate datasets or volumes to another
cache control unit. An alternative is to increase the cache capacity.
KS3_Cachecu_Trk_Stag_Critical
If VALUE S3_Cache_Control_Unit.Track_Staging_Rate GE 70
Monitors
for the condition where the movement of tracks from the physical device
to cache has exceeded the critical threshold. If performance is impacted,
you might need to move the logical volume that is causing the excessive
activity or to move datasets on the logical volume.
KS3_Cachecu_Trk_Stag_Warning
If VALUE S3_Cache_Control_Unit.Track_Staging_Rate GE 50 AND
VALUE S3_Cache_Control_Unit.Track_Staging_Rate LT 70
Monitors
for the condition where the movement of tracks from the physical device
to cache has exceeded the warning threshold. If performance is impacted,
you might need to move the logical volume that is causing the excessive
activity or to move datasets on the logical volume.
KS3_Cachecu_Write_HitP_Critical
If VALUE S3_Cache_Control_Unit.Write_Hit_Percent LE 45 AND
VALUE S3_Cache_Control_Unit.Write_Hit_Percent GE 0
Monitors for the condition where the
percent of DASD/Cache fast write commands that were successfully processed without accessing
the volume is below the critical threshold. If performance is impacted, you might need to move
a volume or dataset to another control unit to balance the workload.
KS3_Cachecu_Write_HitP_Warning
If VALUE S3_Cache_Control_Unit.Write_Hit_Percent LE 50 AND
VALUE S3_Cache_Control_Unit.Write_Hit_Percent GT 45
Monitors for the condition where the
percent of DASD/Cache fast write commands that were successfully processed without accessing
the volume is below the warning level. If performance is impacted, you might need to move a
volume or dataset to another control unit to balance the workload.
KS3_Channel_Busy_Pct_Critical
If VALUE S3_Channel_Path.Complex_Percent_Utilized GE 85
Monitors
high response time for I/O requests to volumes being serviced by the
channel due to over-utilization of that channel. You might need to
balance the workload between channels by moving volumes or datasets.
KS3_Channel_Busy_Pct_Warning
If VALUE S3_Channel_Path.Complex_Percent_Utilized GE 70 AND
VALUE S3_Channel_Path.Complex_Percent_Utilized LT 85
Monitors high response time for I/O
requests to volumes being serviced by the channel due to over-utilization of that channel. You
might need to balance the workload between channels by moving volumes or datasets.
KS3_FICON_Port_Frame_Pacing_W
Situation is true if the port is not a switch (the port is connected to a CU or channel path) and the average frame pacing is greater than 100.
This situation is written against the S3_FICON_Director_Ports attribute group.
KS3_FICON_Switch_Frame_Pacing_W
Situation is true if the port is a switch (the port is connected to another switch) and the average frame pacing is greater than 100.
This situation is written against the S3_FICON_Director_Ports attribute group.
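The two frame-pacing situations above combine a port-type check with a numeric threshold. A hedged sketch of that compound condition, assuming a hypothetical encoding of what the port is connected to (not the actual attribute-group fields):

```python
# Hypothetical sketch of the FICON frame-pacing check: true only when the
# port is not a switch-to-switch link AND average frame pacing exceeds 100.
# The connected_to encoding is invented for illustration.

def port_frame_pacing_warning(connected_to: str, avg_frame_pacing: float) -> bool:
    """connected_to: 'CU', 'CHPID', or 'SWITCH' (illustrative values)."""
    is_switch_link = connected_to == "SWITCH"
    return (not is_switch_link) and avg_frame_pacing > 100

print(port_frame_pacing_warning("CU", 150))      # True
print(port_frame_pacing_warning("SWITCH", 150))  # False
print(port_frame_pacing_warning("CHPID", 80))    # False
```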
KS3_HSM_Backup_Held_Critical
If VALUE S3_HSM_Function_Summary.Function_Status EQ Held AND
VALUE S3_HSM_Function_Summary.Function EQ Backup
Monitors
the HSM backup function to see if it is being held. If the hold is
inadvertent, issue the HSM RELEASE BACKUP command to allow the backup
function to continue processing.
KS3_HSM_Backup_Queue_Critical
If VALUE S3_HSM_Function_Summary.Waiting_Requests GE 50 AND
VALUE S3_HSM_Function_Summary.Function EQ Backup
Monitors
the HSM backup queue for a condition where the number of backup requests
waiting exceeds the critical threshold. If the number of backup tasks
is not at the maximum, issue the HSM SETSYS MAXBACKUPTASKS command
to increase the number of backup tasks, thus increasing the processing
rate. Keep in mind that the number of available backup volumes serves
as a constraint on the number of active backup tasks.
KS3_HSM_Backup_Queue_Warning
If VALUE S3_HSM_Function_Summary.Waiting_Requests GE 15 AND
VALUE S3_HSM_Function_Summary.Waiting_Requests LT 50 AND
VALUE S3_HSM_Function_Summary.Function EQ Backup
Monitors
the HSM backup queue for a condition where the number of backup requests
waiting exceeds the warning threshold. If the number of backup tasks
is not at the maximum, issue the HSM SETSYS MAXBACKUPTASKS command
to increase the number of backup tasks, thus increasing the processing
rate. Keep in mind that the number of available backup volumes serves
as a constraint on the number of active backup tasks.
KS3_HSM_CRQ_Element_Full_Warn
If VALUE S3_HSM_CRQplex.Element_Percent_Full GT 80
Monitors the percentage
of elements on the Common Recall Queue that are currently in use. HSM throttles the use of the
CRQ when the percent used reaches 95%. To expand the CRQ structure, issue the SETXCF
START,ALTER command.
KS3_HSM_CRQ_Entry_Full_Warning
If VALUE S3_HSM_Cross_System_CRQplex.Entry_Percent_Full GT 80
Monitors the
percentage of entries on the Common Recall Queue that are currently in use. HSM throttles the
use of the CRQ when the percent used reaches 95%. To expand the CRQ structure, issue the SETXCF
START,ALTER command.
KS3_HSM_CRQ_Host_Critical
If VALUE S3_HSM_Cross_System_CRQ_Hosts.HSM_Host_CRQ_State NE Connected
AND VALUE S3_HSM_Cross_System_CRQ_Hosts.CRQplex_Base_Name NE n/a
Monitors
the state of the host with regard to the Common Recall Queue. To connect
an HSM host to the CRQ, issue the HSM SETSYS command.
KS3_HSM_CRQ_Host_Disconn_Crit
If VALUE S3_HSM_Cross_System_CRQplex.HSM_Hosts_Not_Connected GT 0
Monitors
the number of HSM hosts currently not connected to the Common Recall
Queue.
KS3_HSM_CRQ_Host_Held_Critical
If VALUE S3_Cross_System_HSM_CRQ_Hosts.Host_CRQ_Held EQ Yes
Monitors
the commonqueue status for this host. This condition can occur if
the HOLD COMMONQUEUE command has been issued. To resolve this condition,
issue a RELEASE COMMONQUEUE command.
KS3_HSM_CRQ_Host_Place_Crit
If VALUE S3_Cross_System_HSM_CRQ_Hosts.Host_CRQ_Recall_Place_Held EQ Internal OR
VALUE S3_Cross_System_HSM_CRQ_Hosts.Host_CRQ_Recall_Place_Held EQ External OR
VALUE S3_Cross_System_HSM_CRQ_Hosts.Host_CRQ_Recall_Place_Held EQ Both
Monitors
the commonqueue status for this host and whether requests can be placed
on the common recall queue. This condition can occur if the HOLD COMMONQUEUE(RECALL(PLACEMENT))
command has been issued or inferred because a HOLD COMMONQUEUE or
HOLD COMMONQUEUE(RECALL) was issued. To resolve this condition, issue
a RELEASE COMMONQUEUE(RECALL(PLACEMENT)) command.
KS3_HSM_CRQ_Host_Recall_Crit
If VALUE S3_Cross_System_HSM_CRQ_Hosts.Host_CRQ_Recall_Held EQ Internal OR
VALUE S3_Cross_System_HSM_CRQ_Hosts.Host_CRQ_Recall_Held EQ External OR
VALUE S3_Cross_System_HSM_CRQ_Hosts.Host_CRQ_Recall_Held EQ Both
Monitors
the commonqueue status for this host and whether requests can be recalled
from the common recall queue. This condition can occur if the HOLD
COMMONQUEUE(RECALL) command has been issued or inferred because a
HOLD COMMONQUEUE was issued. To resolve this condition, issue a RELEASE
COMMONQUEUE(RECALL) command.
KS3_HSM_CRQ_Host_Select_Crit
If VALUE S3_Cross_System_HSM_CRQ_Hosts.Host_CRQ_Recall_Select_Held EQ Internal OR
VALUE S3_Cross_System_HSM_CRQ_Hosts.Host_CRQ_Recall_Select_Held EQ External OR
VALUE S3_Cross_System_HSM_CRQ_Hosts.Host_CRQ_Recall_Select_Held EQ Both
Monitors
the commonqueue status for this host and whether requests can be pulled
from the common recall queue. This condition can occur if the HOLD
COMMONQUEUE(RECALL(SELECT)) command has been issued or inferred because
a HOLD COMMONQUEUE or HOLD COMMONQUEUE(RECALL) was issued. To resolve
this condition, issue a RELEASE COMMONQUEUE(RECALL(SELECT)) command.
KS3_HSM_Dump_Held_Critical
If VALUE S3_HSM_Function_Summary.Function_Status EQ Held AND
VALUE S3_HSM_Function_Summary.Function EQ Dump
Monitors
the HSM dump function to see if it is being held. If the hold is inadvertent,
issue the HSM RELEASE DUMP command to allow dump processing to continue.
KS3_HSM_Dump_Queue_Critical
If VALUE S3_HSM_Function_Summary.Waiting_Requests GE 50 AND
VALUE S3_HSM_Function_Summary.Function EQ Dump
Monitors
the HSM dump queue for a condition where the number of dump requests
waiting exceeds the critical threshold. If the number of dump tasks
is not at the maximum, use the HSM SETSYS MAXDUMPTASKS command to
increase the number of dump tasks, thus increasing the processing
rate. Keep in mind that the number of available tape drives serves
as a constraint on the number of active dump tasks.
KS3_HSM_Dump_Queue_Warning
If VALUE S3_HSM_Function_Summary.Waiting_Requests GE 15 AND
VALUE S3_HSM_Function_Summary.Function EQ Dump AND
VALUE S3_HSM_Function_Summary.Waiting_Requests LT 50
Monitors
the HSM dump queue for a condition where the number of dump requests
waiting exceeds the warning threshold. If the number of dump tasks
is not at the maximum, use the HSM SETSYS MAXDUMPTASKS command to
increase the number of dump tasks, thus increasing the processing
rate. Keep in mind that the number of available tape drives serves
as a constraint on the number of active dump tasks.
KS3_HSM_Inactive_Host_Warning
If VALUE S3_HSM_Status.Inactive_HSM_Hosts GT 0
Monitors
when an inactive HSM host has been detected. The event workspace for
this situation has a link to the DFSMShsm Host Details workspace.
KS3_HSM_Migrate_Held_Critical
If VALUE S3_HSM_Function_Summary.Function_Status EQ Held AND
VALUE S3_HSM_Function_Summary.Function EQ Migration
Monitors
the migrate function to see if it is being held. If the hold on the
function is inadvertent, issue the HSM RELEASE MIGRATION command to
allow migration to continue.
KS3_HSM_Migrate_Queue_Critical
If VALUE S3_HSM_Function_Summary.Waiting_Requests GE 50 AND
VALUE S3_HSM_Function_Summary.Function EQ Migration
Monitors
the HSM migration queue for a condition where the number of migration
requests waiting exceeds the critical threshold. If the number of
migrate tasks is not at the maximum, use the HSM SETSYS MAXMIGRATIONTASKS
command to increase the number of migration tasks, thus increasing
the processing rate. Note that this affects only those migrations
requested by automatic functions. Only one task is available to process
command migration requests.
KS3_HSM_Migrate_Queue_Warning
If VALUE S3_HSM_Function_Summary.Waiting_Requests GE 15 AND
VALUE S3_HSM_Function_Summary.Waiting_Requests LT 50 AND
VALUE S3_HSM_Function_Summary.Function EQ Migration
Monitors
the HSM migration queue for a condition where the number of migration
requests waiting exceeds the warning threshold. If the number of migrate
tasks is not at the maximum, use the HSM SETSYS MAXMIGRATIONTASKS
command to increase the number of migration tasks, thus increasing
the processing rate. Note that this affects only those migrations
requested by automatic functions. Only one task is available to process
command migration requests.
KS3_HSM_Recall_Held_Critical
If VALUE S3_HSM_Function_Summary.Function_Status EQ Held AND
VALUE S3_HSM_Function_Summary.Function EQ Recall
Monitors
the recall function to see if it is being held. If the hold on the
function is inadvertent, issue the HSM RELEASE RECALL command to allow
recalls to resume.
KS3_HSM_Recall_Queue_Critical
If VALUE S3_HSM_Function_Summary.Waiting_Requests GE 50 AND
VALUE S3_HSM_Function_Summary.Function EQ Recall
Monitors
the HSM recall queue for a condition where the number of recall requests
waiting exceeds the critical threshold. If the number of recall tasks
is not at the maximum, use the HSM SETSYS MAXRECALLTASKS command
to increase the number of recall tasks, thus increasing the processing
rate.
KS3_HSM_Recall_Queue_Warning
If VALUE S3_HSM_Function_Summary.Waiting_Requests GE 15 AND
VALUE S3_HSM_Function_Summary.Waiting_Requests LT 50 AND
VALUE S3_HSM_Function_Summary.Function EQ Recall
Monitors
the HSM recall queue for a condition where the number of recall requests
waiting exceeds the warning threshold. If the number of recall tasks
is not at the maximum, use the HSM SETSYS MAXRECALLTASKS command
to increase the number of recall tasks, thus increasing the processing
rate.
KS3_HSM_Recovery_Held_Critical
If VALUE S3_HSM_Function_Summary.Function_Status EQ Held AND
VALUE S3_HSM_Function_Summary.Function EQ Recovery
Monitors
the recovery function to see if it is being held. If the hold on the
function is inadvertent, issue the HSM RELEASE RECOVER command to
allow the recovery function to resume.
KS3_HSM_Recovery_Queue_Critical
If VALUE S3_HSM_Function_Summary.Waiting_Requests GE 50 AND
VALUE S3_HSM_Function_Summary.Function EQ Recovery
Monitors
the HSM recovery queue for a condition where the number of recover
requests waiting exceeds the critical threshold. If the number of
recovery tasks is not at the maximum, use the HSM SETSYS MAXDSRECOVERTASKS
command to increase the number of recover tasks, thus increasing the
processing rate. Keep in mind that the number of backup tape cartridges
serves as a constraint on the number of active recovery tasks.
KS3_HSM_Recovery_Queue_Warning
If VALUE S3_HSM_Function_Summary.Waiting_Requests GE 15 AND
VALUE S3_HSM_Function_Summary.Waiting_Requests LT 50 AND
VALUE S3_HSM_Function_Summary.Function EQ Recovery
Monitors
the HSM recovery queue for a condition where the number of recover
tasks waiting exceeds the warning threshold. If the number of recovery
tasks is not at the maximum, use the HSM SETSYS MAXDSRECOVERTASKS
command to increase the number of recover tasks, thus increasing the
processing rate. Keep in mind that the number of backup tape cartridges
serves as a constraint on the number of active recovery tasks.
KS3_HSM_Status_Inactive_Crit
If VALUE S3_HSM_Status.HSM_Status EQ InActive
Monitors
the status of the HSM. If the status is not active, restart HSM.
KS3_LCU_Av_Delay_Q_Critical
If VALUE S3_Logical_Control_Unit.Average_Delay_Queue GE 0.500
Monitors
for the condition where the average number of requests queued to devices
assigned to a logical control unit due to busy conditions on physical
paths has exceeded the critical threshold. If performance is impacted,
you might be able to balance the workload across multiple LCUs by
moving a volume or dataset. Otherwise, you need to add physical paths
to the LCU.
KS3_LCU_Av_Delay_Q_Warning
If VALUE S3_Logical_Control_Unit.Average_Delay_Queue GE 0.2 AND
VALUE S3_Logical_Control_Unit.Average_Delay_Queue LT 0.500
Monitors
for the condition where the average number of requests queued to devices
assigned to a logical control unit due to busy conditions on physical
paths has exceeded the warning threshold. If performance is impacted,
you might be able to balance the workload across multiple LCUs by
moving a volume or dataset. Otherwise, you need to add physical paths
to the LCU.
KS3_LCU_Cont_Rate_Critical
If VALUE S3_Logical_Control_Unit.Contention_Rate GE 1.001
Monitors
for the condition where the rate at which I/O requests are being queued
to devices on a logical control unit (LCU) due to busy conditions
on physical paths has exceeded the critical threshold. If performance
is impacted, you need to migrate volumes or datasets to another LCU;
otherwise, you need to add physical paths to the LCU.
KS3_LCU_Cont_Rate_Warning
If VALUE S3_Logical_Control_Unit.Contention_Rate GE 0.2 AND
VALUE S3_Logical_Control_Unit.Contention_Rate LT 1.001
Monitors
for the condition where the rate at which I/O requests are being queued
to devices on a logical control unit (LCU) due to busy conditions
on physical paths has exceeded the warning threshold. If performance
is impacted, you need to migrate volumes or datasets to another LCU;
otherwise, you need to add physical paths to the LCU.
KS3_LCU_IO_Rate_Sec_Critical
If VALUE S3_Logical_Control_Unit.Channel_Path_I/O_Rate GE 600
Monitors
for the condition where the I/O rate per second to volumes in the
logical control unit (LCU) has exceeded the critical threshold. If
performance is impacted, you need to balance the workload across multiple
LCUs by moving volumes or datasets.
KS3_LCU_IO_Rate_Sec_Warning
If VALUE S3_Logical_Control_Unit.Channel_Path_I/O_Rate GE 200 AND
VALUE S3_Logical_Control_Unit.Channel_Path_I/O_Rate LT 600
Monitors
for the condition where the I/O rate per second to volumes in the
logical control unit (LCU) has exceeded the warning threshold. If
performance is impacted, you need to balance the workload across multiple
LCUs by moving volumes or datasets.
KS3_RLS_Accelerated_Mode_Warn
If VALUE S3_RLS_Buffer_LSU_Summary.BMF_Accelerated_Mode_Pct GT 75
Monitors
for the warning condition where the BMF Accelerated Mode Pct is greater
than 75.
KS3_RLS_Dataset_Avg_Resp_Time_C
If VALUE S3_RLS_Dataset_Group_Details.Average_Response_Time GT 3
Monitors
for the critical condition where the Average Response Time is greater
than 3.
KS3_RLS_Dataset_Avg_Resp_Time_W
If VALUE S3_RLS_Dataset_Group_Details.Average_Response_Time GT 2 AND
VALUE S3_RLS_Dataset_Group_Details.Average_Response_Time LT 3
Monitors
for the warning condition where the Average Response Time is between
2 and 3.
KS3_RLS_DSG_Max_Avg_Resp_Time_C
If VALUE S3_RLS_Dataset_Group_Summary.Max_Average_Response_Time GT 3
Monitors
for the critical condition where the Max Average Response Time is
greater than 3.
KS3_RLS_DSG_Max_Avg_Resp_Time_W
If VALUE S3_RLS_Dataset_Group_Summary.Max_Average_Response_Time GT 2 AND
VALUE S3_RLS_Dataset_Group_Summary.Max_Average_Response_Time LT 3
Monitors
for the warning condition where the Max Average Response Time is between
2 and 3.
KS3_RLS_Panic_Mode_Critical
If VALUE S3_RLS_Buffer_LSU_Summary.BMF_Panic_Mode_Pct GT 75
Monitors
for the critical condition where the BMF Panic Mode Pct is greater
than 75.
KS3_RLS_Panic_Mode_Warning
If VALUE S3_RLS_Buffer_LSU_Summary.BMF_Panic_Mode_Pct GT 50 AND
VALUE S3_RLS_Buffer_LSU_Summary.BMF_Panic_Mode_Pct LT 75
Monitors
for the warning condition where the BMF Panic Mode Pct is between
50 and 75.
KS3_RLS_StorCls_Avg_Resp_Time_C
If VALUE S3_RLS_Storage_Class.Average_Response_Time GT 3
Monitors
for the critical condition where the Average Response Time is greater
than 3.
KS3_RLS_StorCls_Avg_Resp_Time_W
If VALUE S3_RLS_Storage_Class.Average_Response_Time GT 2 AND
VALUE S3_RLS_Storage_Class.Average_Response_Time LT 3
Monitors
for the warning condition where the Average Response Time is between
2 and 3.
KS3_RMM_CDS_Backup_Critical
If VALUE S3_RMM_Control_Dataset.Days_Since_Last_Backup GT 3
The
number of days since the last backup of the DFSMSrmm CDS or Journal exceeded
the critical threshold.
KS3_RMM_CDS_Backup_Warning
If VALUE S3_RMM_Control_Dataset.Days_Since_Last_Backup GT 1 AND
VALUE S3_RMM_Control_Dataset.Days_Since_Last_Backup LE 3
The number of days since the last backup of the DFSMSrmm CDS
or Journal exceeded the warning threshold.
KS3_RMM_CDS_Space_Critical
If VALUE S3_RMM_Control_Dataset.RMM_Percent_Used GT 90
The
percentage of space used by the DFSMSrmm CDS or Journal is greater than
the critical threshold.
KS3_RMM_CDS_Space_Warning
If VALUE S3_RMM_Control_Dataset.RMM_Percent_Used GE 80 AND
VALUE S3_RMM_Control_Dataset.RMM_Percent_Used LE 90
The percentage of space used by the DFSMSrmm CDS or Journal is greater
than the warning threshold.
KS3_RMM_Exit_Status_Critical
If ( ( VALUE S3_RMM_Config.EDGUX200_Status NE Enabled ) OR
( VALUE S3_RMM_Config.EDGUX100_Status NE Enabled ) )
The DFSMSrmm EDGUX100
or EDGUX200 exit is not Enabled.
KS3_RMM_Journal_Status_Critical
If VALUE S3_RMM_Config.Journal_Status NE Enabled
The DFSMSrmm Journal
is either Disabled or Locked. DFSMSrmm does not allow further updates
to the journal until BACKUP is run to back up the DFSMSrmm control dataset and
to clear the journal. If the Journal is Locked, DFSMSrmm fails any requests
that result in an update to the DFSMSrmm control dataset. Message EDG2103D
might also have been issued to the DFSMSrmm operator console.
KS3_RMM_Operating_Mode_Warning
If VALUE S3_RMM_Config.Operating_Mode NE Protect
DFSMSrmm
is not operating in Protect mode. Certain actions that should be rejected
are permitted if DFSMSrmm is
not operating in protect mode, for example, attempting to read a scratch
tape volume.
KS3_RMM_Scratch_Tape_Critical
If VALUE S3_RMM_Summary.Type EQ 0 AND
VALUE S3_RMM_Summary.Scratch_Volumes LT 100
The number of Scratch volumes is below the critical threshold.
KS3_RMM_Scratch_Tape_Warning
If VALUE S3_RMM_Summary.Type EQ 0 AND
VALUE S3_RMM_Summary.Scratch_Volumes LT 200 AND
VALUE S3_RMM_Summary.Scratch_Volumes GE 100
The number of Scratch volumes is below the warning threshold.
KS3_RMM_Inactive_Critical
If VALUE S3_RMM_Config.Subsystem_Status EQ Inactive
The DFSMSrmm subsystem
is inactive.
KS3_Stg_Toolkit_Result_Critical
If VALUE S3_Storage_Toolkit_Result_Summary.Return_Code GT 4
The
batch job submitted by the Storage Toolkit to execute a command or
user-defined JCL returns a value greater than 4, or the Storage Toolkit
encounters an error while attempting to process a command or user-defined
JCL. A value that is greater than 4, and is not specific to the Storage
Toolkit, typically denotes that a command failed to complete. If you
elected to save the results of the batch job, go to the Storage Toolkit
Result Detail workspace to determine whether the error requires further
attention.
KS3_Stg_Toolkit_Result_Warning
If VALUE S3_Storage_Toolkit_Result_Summary.Return_Code EQ 4
The
batch job submitted by the Storage Toolkit to execute a command or
user-defined JCL returns the value 4. A value of 4 typically denotes
a warning. If you elected to save the results of the batch job, go
to the Storage Toolkit Result Detail workspace to determine whether
the warning requires further attention.
KS3_Storage_Gr_Pct_Free_Crit
If VALUE S3_Volume_Group_Summary.Free_Space_Percent LT 5.0 AND
VALUE S3_Volume_Group_Summary.Group_Type EQ SMSGROUP AND
VALUE S3_Volume_Group_Summary.Free_Space_Percent GE 0.0
Monitors
the percentage of free space available for allocation in the storage
group and detects when free space has dropped below the critical threshold.
To prevent allocation failures, you might have to either add one or
more logical volumes to the storage group or move datasets off
the logical volumes in the storage group.
KS3_Storage_Gr_Pct_Free_Warning
If VALUE S3_Volume_Group_Summary.Free_Space_Percent LT 10.0 AND
VALUE S3_Volume_Group_Summary.Group_Type EQ SMSGROUP AND
VALUE S3_Volume_Group_Summary.Free_Space_Percent GE 5.0
Monitors
the percentage of free space available for allocation in the storage
group and detects when free space has dropped below the warning threshold.
To prevent allocation failures, you might have to either
add one or more logical volumes to the storage group or migrate
datasets off the logical volumes in the storage group.
KS3_TDS_Array_Degraded_Crit
If VALUE S3_TotalStorageDS_Array.RAID_Degraded EQ Yes
Monitors
the arrays in a TotalStorageDS storage facility for a degraded condition
where one or more arrays need rebuilding.
KS3_TDS_Array_Prob_Crit
If VALUE S3_TotalStorageDS_Configuration.Number_of_arrays_with_problems GT 0
Monitors
for the condition where the number of arrays in the TotalStorageDS
storage facility running degraded, throttled, or with an RPM exception
exceeds the threshold. The RAID Degraded condition indicates that
one or more DDMs in the array need rebuilding. The DDM Throttling
condition indicates that a near-line DDM in the array is throttling
performance due to temperature or workload. The RPM Exception condition
indicates that a DDM with a slower RPM than the normal array DDMs
is a member of the array as a result of a sparing action.
KS3_TDS_Array_RPM_Crit
If VALUE S3_TotalStorageDS_Array.RPM_Exception EQ Yes
Monitors
the arrays in a TotalStorageDS for a condition where a DDM with a
slower RPM than the normal array DDMs is a member of the array as
a result of a sparing action.
KS3_TDS_Array_Throttled_Crit
If VALUE S3_TotalStorageDS_Array.DDM_Throttling EQ Yes
Monitors
the arrays in a TotalStorageDS for a condition where the array is
throttling performance due to overload or temperature.
KS3_TDS_ExtPool_Array_Prob_Crit
If VALUE S3_TotalStorageDS_Extent_Pool.Number_of_arrays_with_problems GT 0
Monitors
for the condition where the number of arrays in the extent pool running
degraded, throttled, or with an RPM exception exceeds the threshold.
The RAID Degraded condition indicates that one or more DDMs in the
array need rebuilding. The DDM Throttling condition indicates that
a near-line DDM in the array is throttling performance due to temperature
or workload. The RPM Exception condition indicates that a DDM with
a slower RPM than the normal array DDMs is a member of the array as
a result of a sparing action.
KS3_TDS_Rank_Array_Prob_Crit
If VALUE S3_TotalStorageDS_Rank.Number_of_arrays_with_problems GT 0
Monitors
for the condition where the number of arrays in the rank running degraded,
throttled, or with an RPM exception exceeds the threshold. The RAID
Degraded condition indicates that one or more DDMs in the array need
rebuilding. The DDM Throttling condition indicates that a near-line
DDM in the array is throttling performance due to temperature or workload.
The RPM Exception condition indicates that a DDM with a slower RPM
than the normal array DDMs is a member of the array as a result of
a sparing action.
KS3_Vol_Cache_DFW_Retry_Critical
If VALUE S3_Cache_Devices.DFW_Retry_Percent GE 2 AND
VALUE S3_Cache_Devices.I/O_Count GE 25
Monitors for the condition where the percentage of DASD fast write attempts for a volume that cannot be satisfied due to a shortage of available nonvolatile storage (NVS) space exceeds the critical threshold. Check for pinned NVS and correct the problem if NVS is pinned. Otherwise, if the impact on performance is not acceptable, move a volume or dataset to another cache control unit or add NVS to this control unit.
KS3_Vol_Cache_DFW_Retry_Warning
If VALUE S3_Cache_Devices.DFW_Retry_Percent GE 1 AND
VALUE S3_Cache_Devices.DFW_Retry_Percent LT 2 AND
VALUE S3_Cache_Devices.I/O_Count GE 25
Monitors for the condition where the percentage of DASD fast write attempts for a volume that cannot be satisfied due to a shortage of available nonvolatile storage (NVS) space exceeds the warning threshold. Check for pinned NVS and correct the problem if NVS is pinned. Otherwise, if the impact on performance is not acceptable, move a volume or dataset to another cache control unit or add NVS to this control unit.
KS3_Vol_Cache_Read_HitP_Critical
If VALUE S3_Cache_Devices.Read_Hit_Percent LE 45 AND
VALUE S3_Cache_Devices.Read_Hit_Percent GE 0 AND
VALUE S3_Cache_Devices.I/O_Count GE 25
Monitors for the condition where the cache read hit percent is below the critical threshold. If performance is impacted, determine the reason for the low read hit percent. Common problems are cache-unfriendly applications and over-utilization of the control unit.
KS3_Vol_Cache_Read_HitP_Warning
If VALUE S3_Cache_Devices.Read_Hit_Percent LE 55 AND
VALUE S3_Cache_Devices.Read_Hit_Percent GT 45 AND
VALUE S3_Cache_Devices.I/O_Count GE 25
Monitors for the condition where the cache read hit percent is below the warning threshold. If performance is impacted, determine the reason for the low read hit percent. Common problems are cache-unfriendly applications and over-utilization of the control unit.
KS3_Vol_Cache_Writ_HitP_Critical
If VALUE S3_Cache_Devices.Write_Hit_Percent LE 20 AND
VALUE S3_Cache_Devices.Write_Hit_Percent GE 0 AND
VALUE S3_Cache_Devices.I/O_Count GE 25
Monitors for the condition where the cache write hit percent for a volume is below the critical threshold. Check the status of the nonvolatile storage in the cache control unit. You can move volumes or datasets to balance the workload.
KS3_Vol_Cache_Writ_HitP_Warning
If VALUE S3_Cache_Devices.Write_Hit_Percent LE 30 AND
VALUE S3_Cache_Devices.Write_Hit_Percent GT 20 AND
VALUE S3_Cache_Devices.I/O_Count GE 25
Monitors for the condition where the cache write hit percent for a volume is below the warning threshold. Check the status of the nonvolatile storage in the cache control unit. You can move volumes or datasets to balance the workload.
KS3_Vol_Disabled_VTOC_Critical
If VALUE S3_DASD_Volume_Space.VTOC_Index_Status EQ Disabled
Monitors for the condition where a VTOC index has been disabled. This condition can degrade performance on the volume. Enable the VTOC index.
KS3_Vol_EAV_Fragment_Index_Crit
If VALUE S3_DASD_Volume_Space.Extended_Address_Volume EQ Yes AND
VALUE S3_DASD_Volume_Space.Track_Managed_Fragmentation_Index GE 850
Monitors for the condition where the fragmentation index in the track managed area of an Extended Address Volume exceeds the critical threshold.
KS3_Vol_EAV_Fragment_Index_Warn
If VALUE S3_DASD_Volume_Space.Extended_Address_Volume EQ Yes AND
VALUE S3_DASD_Volume_Space.Track_Managed_Fragmentation_Index GE 650 AND
VALUE S3_DASD_Volume_Space.Track_Managed_Fragmentation_Index LT 850
Monitors for the condition where the fragmentation index in the track managed area of an Extended Address Volume exceeds the warning threshold.
KS3_Vol_EAV_Free_Space_Pct_Crit
If VALUE S3_DASD_Volume_Space.Track_Managed_Percent_Free LE 5.0 AND
VALUE S3_DASD_Volume_Space.Track_Managed_Percent_Free GE 0.0 AND
VALUE S3_DASD_Volume_Space.Extended_Address_Volume EQ Yes
Monitors for the condition where the percentage of free space in the track managed area of an Extended Address Volume is below the critical threshold.
KS3_Vol_EAV_Free_Space_Pct_Warn
If VALUE S3_DASD_Volume_Space.Track_Managed_Percent_Free LE 10.0 AND
VALUE S3_DASD_Volume_Space.Track_Managed_Percent_Free GT 5.0 AND
VALUE S3_DASD_Volume_Space.Extended_Address_Volume EQ Yes
Monitors for the condition where the percentage of free space in the track managed area of an Extended Address Volume is below the warning threshold.
KS3_Vol_Fragment_Index_Critical
If VALUE S3_DASD_Volume_Space.Fragmentation_Index GE 850
Monitors for the condition where a volume has a fragmentation index that exceeds the critical threshold. Defragment the volume so that free extents are combined to help prevent dataset allocation failures.
KS3_Vol_Fragment_Index_Warning
If VALUE S3_DASD_Volume_Space.Fragmentation_Index GE 650 AND
VALUE S3_DASD_Volume_Space.Fragmentation_Index LT 850
Monitors for the condition where a volume has a fragmentation index that exceeds the warning threshold. Defragment the volume so that free extents are combined to help prevent dataset allocation failures.
KS3_Vol_Free_Space_Pct_Critical
If VALUE S3_DASD_Volume_Space.Percent_Free_Space LE 5 AND
VALUE S3_DASD_Volume_Space.Percent_Free_Space GE 0
Monitors for the condition where the percentage of free space on a volume is below the critical threshold. If datasets on the volume require more space, then either migrate some datasets to another volume or release space from datasets that might be over-allocated.
KS3_Vol_Free_Space_Pct_Warning
If VALUE S3_DASD_Volume_Space.Percent_Free_Space LE 10 AND
VALUE S3_DASD_Volume_Space.Percent_Free_Space GT 5
Monitors for the condition where the percentage of free space on a volume is below the warning threshold. If datasets on the volume require more space, then either migrate some datasets to another volume or release space from datasets that might be over-allocated.
KS3_Vol_Perf_Resp_Time_Critical
If VALUE S3_DASD_Volume_Performance.Response_Time GE 55 AND
VALUE S3_DASD_Volume_Performance.I/O_Count GE 25
Monitors for the condition where response time for the volume exceeds the critical threshold. Look at the volume to see whether high utilization is a problem. If so, it might be necessary to migrate datasets from the volume to reduce utilization. Also check the cache status of the volume. Look at the components of I/O to determine where the time is being spent and address the problem accordingly.
KS3_Vol_Perf_Resp_Time_Warning
If VALUE S3_DASD_Volume_Performance.Response_Time GE 35 AND
VALUE S3_DASD_Volume_Performance.Response_Time LT 55 AND
VALUE S3_DASD_Volume_Performance.I/O_Count GE 25
Monitors for the condition where response time for the volume exceeds the warning threshold. Look at the volume to see whether high utilization is a problem. If so, you can migrate datasets from the volume to reduce utilization. Also check the cache status of the volume. Look at the components of I/O to determine where the time is being spent and address the problem accordingly.
KS3_VTS_Disconnect_Time_Crit
If VALUE S3_VTS_Overview.Virtual_Disconnect_Time GE 500
Monitors for the condition where the logical control unit disconnect time for the virtual tape server exceeds the critical threshold. This condition is often an indication that the tape volume cache capacity is being exceeded.
KS3_VTS_Host_GB_Warning
If VALUE S3_VTS_Overview.Host_Channel_Activity_GB GE 18
Monitors for the condition where the activity between the MVS system and the virtual tape server on the host channels exceeds 18 GB over the hour interval. This condition can be an indication that the virtual tape server is being overloaded.
KS3_VTS_Pct_Copy_Throt_Warn
If VTSTPVOLC.PCTCPT GT 50
Monitors for the condition where copy is the predominant reason for throttling.
KS3_VTS_Pct_Wr_Over_Throt_Warn
If VTSTPVOLC.PCTWROT GT 50
Monitors for the condition where write overrun is the predominant reason for throttling.
KS3_VTS_Recall_Pct_Warning
If VALUE S3_VTS_Overview.Volume_Recall_Percent GE 20
Monitors for the condition where the percent of virtual tape mounts that required a physical tape mount to be satisfied exceeds the warning threshold. This condition can lead to unacceptably large virtual mount times. If so, then investigate the reason for the recalls. If rescheduling or removing the application workload is not possible, you need to increase the cache capacity of the VTS.
KS3_VTS_Virt_MtPend_Av_Warning
If VALUE S3_VTS_Overview.Average_Virtual_Mount_Pend_Time GE 300
Monitors for the condition where the average seconds required to satisfy a virtual mount in the virtual tape subsystem exceeds the warning threshold. If this condition persists, then further study is required to determine the cause for the elongated mount times. The condition might be due to VTS-hostile applications or to a shortage of VTS resources.
KS3_VTS_Virt_MtPend_Mx_Warning
If VALUE S3_VTS_Overview.Maximum_Virtual_Mount_Pend_Time GE 900
Monitors for the condition where the maximum seconds required to satisfy a virtual mount in the virtual tape subsystem exceeds the warning threshold. If this condition persists, then further study is required to determine the cause for the elongated mount times. The condition might be due to VTS-hostile applications or to a shortage of VTS resources.
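Many of the situations above come in warning/critical pairs whose formulas partition a metric's range into mutually exclusive bands, so at most one of the pair fires for a given sample. The following Python sketch is illustrative only and is not part of IBM Z OMEGAMON AI for Storage; the function name is hypothetical, while the band values are taken from the KS3_Vol_Fragment_Index_Warning (GE 650 AND LT 850) and KS3_Vol_Fragment_Index_Critical (GE 850) formulas.

```python
# Illustrative sketch of how a paired warning/critical predicate
# partitions a metric's range. Thresholds come from the
# KS3_Vol_Fragment_Index situations; the function itself is hypothetical.

WARNING_THRESHOLD = 650   # GE 650 AND LT 850 raises the warning situation
CRITICAL_THRESHOLD = 850  # GE 850 raises the critical situation

def classify_fragmentation(index: int) -> str:
    """Return which situation, if any, a fragmentation index would raise."""
    if index >= CRITICAL_THRESHOLD:
        return "KS3_Vol_Fragment_Index_Critical"
    if index >= WARNING_THRESHOLD:
        return "KS3_Vol_Fragment_Index_Warning"
    return "none"

print(classify_fragmentation(700))  # falls in the warning band (650-849)
print(classify_fragmentation(900))  # falls in the critical band (>= 850)
```

Because the warning formula includes the upper bound LT 850, the two bands never overlap; if you edit one threshold in the Situation editor, adjust the matching bound in the paired situation to keep the bands exclusive.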