Once every monitoring interval, the resolver calculates the percentage
of queries to a name server that failed in the previous 30 seconds
or 5 minutes, and then compares this percentage to the threshold value
that you set in the UNRESPONSIVETHRESHOLD statement to determine whether
that DNS name server is unresponsive. If the resolver sends a query
to a name server multiple times and more than one of those attempts
goes unanswered, each unanswered attempt is counted as a separate failure
to respond. When you specify the UNRESPONSIVETHRESHOLD value, consider
the following factors that have an impact on the effectiveness of
your setting:
- If you specify a small percentage for this value, an excessive
number of operator notifications might occur. Short network disruptions
that occur during the 30-second or 5-minute monitoring interval might
result in some undeliverable resolver queries or name server responses,
and a low threshold value might cause the resolver to alert the operator,
and possibly stop using the name server, unnecessarily.
- If you specify a large percentage for this value, persistent issues
with the network or the name server might be undetected even though
a significant portion of resolver queries are not being processed
by the name server.
- The setting on the RESOLVERTIMEOUT statement in the TCPIP.DATA
file also affects the value that you should specify for the UNRESPONSIVETHRESHOLD
setting. If you set a very short timeout value, even slight network
disruptions might cause name server responses to be delayed longer
than the amount of time specified by the RESOLVERTIMEOUT value. These
delays are considered to be non-responses from the name server, which
might cause unnecessary messages to be generated for this name server.
A less aggressive (higher) percentage setting for the UNRESPONSIVETHRESHOLD
value might be warranted in such a situation.
- The settings of the RESOLVERUDPRETRIES, SEARCH, and NAMESERVER
statements in the TCPIP.DATA file can also contribute to high numbers
of apparent failures on the part of the name server. See "Examples of
resolver monitoring of DNS name servers" for information
about how these settings can influence the statistics that are collected
by the resolver.
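The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the resolver's actual implementation: the function names are hypothetical, treating an interval with no queries as a 0% failure rate is an assumption, and so is treating a failure rate that exactly equals the threshold as unresponsive.

```python
def failure_percentage(queries_sent: int, responses_received: int) -> float:
    """Percentage of queries sent in one interval that received no response.

    Every transmission counts, so a retried query that is never answered
    contributes one failure per attempt.
    """
    if queries_sent == 0:
        # Assumption: no traffic in the interval means no failures to report.
        return 0.0
    failed = queries_sent - responses_received
    return 100.0 * failed / queries_sent


def is_unresponsive(queries_sent: int, responses_received: int,
                    threshold_percent: float) -> bool:
    """Compare the interval's failure rate with the configured threshold.

    Assumption: a rate equal to the threshold is treated as unresponsive.
    """
    return failure_percentage(queries_sent, responses_received) >= threshold_percent


# Ten lookups: eight are answered on the first attempt, and two are sent
# three times each without ever being answered.
sent = 8 + 2 * 3       # 14 transmissions
answered = 8
rate = failure_percentage(sent, answered)
flagged = is_unresponsive(sent, answered, 25)
```

In this example only 2 of the 10 lookups failed, but because each unanswered attempt is counted separately, 6 of the 14 queries failed. The resulting rate is about 43%, which is above the default threshold of 25%. This is how retry settings can inflate the apparent failure rate.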
Guideline: If you choose the threshold by measuring the error rate
for a given name server, measure that error rate before you activate
the autonomic quiescing of unresponsive name servers function.
One strategy that you can use to select an optimal threshold
value is to start with the default setting, which is 25%, and determine
how many network operator messages are issued, if any, during normal
operation of the network.
- If your network is operating in an acceptable manner (for example,
no performance issues are detected and no host name or IP address
resolution delays are detected), examine the number of network operator
alerts that are generated by the resolver:
  - If the number of network operator messages is zero or insignificant,
  leave the setting at the default value, or even decrease the threshold
  value slightly.
  - If the number of network operator messages is excessive, which
  suggests that the resolver detected many false positive conditions,
  increase the threshold setting until the number of messages
  that is generated is appropriate for your network.
- If the name server is now responsive, but the failure rate is
just slightly below the threshold value, the name server will probably
become unresponsive again with a minor disruption in the network.
If your network is currently operating in a satisfactory manner, consider
increasing the threshold setting so that the resolver issues EZZ9308E
messages only when your network conditions change significantly. Use
the statistics that are displayed when message EZZ9309I is issued
to adjust the threshold setting to a more suitable value.
- If your network is experiencing performance issues to which resolver
delays might be contributing (for example, unexplained application
delays), consider decreasing the threshold setting
to determine whether the resolver is detecting name server failures
that are too few for the name server to be reported as unresponsive. If this
lower threshold value causes the resolver to generate network operator
messages that identify name servers that are unresponsive and that
are impacting network operations, consider using this lower value
for normal operations to provide more timely identification of name
server issues.
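A small helper can make it easier to spot a name server whose failure rate sits just below the threshold. The following Python sketch is illustrative only: the function name and the default `margin_points` value are hypothetical, and it assumes that you read the failure percentage yourself from the statistics that accompany message EZZ9309I.

```python
def is_close_to_threshold(failure_percent: float, threshold_percent: float,
                          margin_points: float = 2.0) -> bool:
    """Return True when the failure rate is below the threshold but
    within margin_points percentage points of it.

    margin_points is a hypothetical tuning value, not a resolver setting.
    """
    gap = threshold_percent - failure_percent
    return 0 < gap <= margin_points


# A name server that fails 24% of queries against a 25% threshold is likely
# to cross the threshold again after a minor network disruption.
close = is_close_to_threshold(24, 25)
comfortable = is_close_to_threshold(10, 25)
```

If the check returns True while the network is otherwise operating satisfactorily, that suggests the threshold might need more headroom.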
A second strategy that you can use to select an optimal threshold
value is to start with the lowest threshold setting, which is 1%.
If your name servers are failing to respond to a small percentage
of the overall resolver queries that are being sent, the resolver
generates EZZ9308E messages. At 5-minute intervals, the resolver also
generates EZZ9310I messages, which indicate the percentage of failures
for the most recent 5-minute sliding window. Use the EZZ9310I messages
to determine the highest failure rate during normal operation of the
network, and then set the threshold value to that rate, or to a value
slightly above that rate. For example, if the highest failure percentage
displayed on the EZZ9310I messages is 4%, set the threshold value
to 5% for your network. This value ensures that the resolver considers
name servers to be unresponsive only when they experience a failure
rate that is greater than the rate that typically occurs in your network.
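The second strategy amounts to taking the highest failure percentage observed during normal operation and adding a small margin. The following Python sketch shows one way to do that. The function name and the `margin` parameter are hypothetical, the percentages are assumed to have been copied by hand from EZZ9310I messages, and the upper bound of 100 is an assumption. The lower bound of 1 is the lowest threshold setting mentioned above.

```python
import math
from typing import Iterable


def suggest_threshold(observed_percentages: Iterable[float], margin: int = 1) -> int:
    """Suggest a threshold slightly above the highest observed failure rate.

    observed_percentages: failure percentages recorded during normal
    operation (assumed to be taken manually from EZZ9310I messages).
    margin: hypothetical number of percentage points to add on top.
    """
    rates = list(observed_percentages)
    if not rates:
        raise ValueError("at least one observed failure percentage is required")
    candidate = math.ceil(max(rates)) + margin
    # Keep the result between 1 (lowest setting mentioned in the text)
    # and 100 (assumed upper bound).
    return max(1, min(100, candidate))


# Matches the example in the text: a highest observed rate of 4% gives 5%.
suggested = suggest_threshold([1, 4, 2, 3])
```

Pass `margin=0` to set the threshold to the highest observed rate itself, which the text also allows.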