Optimizing the UNRESPONSIVETHRESHOLD value for your network

Once every monitoring interval, the resolver calculates the percentage of queries to a name server that failed in the previous 30 seconds or 5 minutes, and then compares this percentage to the threshold value that you set on the UNRESPONSIVETHRESHOLD statement to determine whether that name server is unresponsive. If the resolver sends a query to a name server multiple times and the name server fails to respond to more than one of those attempts, each unanswered attempt is counted as a separate failure. When you specify the UNRESPONSIVETHRESHOLD value, consider the following factors that affect the effectiveness of your setting:
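The comparison that is described above can be illustrated with a short sketch. This sketch is not resolver code. The function names are hypothetical, and the use of a greater-than-or-equal comparison is an assumption, because the text above does not state whether a failure rate that is exactly equal to the threshold causes the name server to be considered unresponsive.

```python
def failure_percentage(failed: int, sent: int) -> float:
    """Percentage of queries in the monitoring window that went unanswered.

    Each unanswered attempt is counted separately, so a query that is sent
    three times without a response contributes three failures.
    """
    if sent == 0:
        # No queries were sent in the window, so there is nothing to judge.
        return 0.0
    return 100.0 * failed / sent


def is_unresponsive(failed: int, sent: int, threshold_percent: float) -> bool:
    """Compare the failure percentage with the configured threshold.

    ASSUMPTION: the comparison is 'greater than or equal to'; the source
    text does not specify how an exact match is treated.
    """
    return failure_percentage(failed, sent) >= threshold_percent


if __name__ == "__main__":
    # 3 failures out of 12 queries is 25%, which meets a 25% threshold.
    print(is_unresponsive(3, 12, 25))
    # 2 failures out of 12 queries is about 16.7%, which is below 25%.
    print(is_unresponsive(2, 12, 25))
```

Because every unanswered attempt is counted, a name server that receives only a few queries in a window can reach a high failure percentage after a small number of retries.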

Guideline: If you choose the threshold by determining the error rate for a given name server, determine that error rate before you activate the autonomic quiescing of unresponsive name servers function.

One strategy that you can use to select the optimal threshold value is to start with the default setting, which is 25%, and determine how many network operator messages are issued, if any, during normal operation of the network.

A second strategy that you can use to select the optimal threshold value is to start with the lowest threshold setting, which is 1%. If your name servers are failing to respond to even a small percentage of the resolver queries that are being sent, the resolver generates EZZ9308E messages. At 5-minute intervals, the resolver also generates EZZ9310I messages, which indicate the percentage of failures for the most recent 5-minute sliding window. Use the EZZ9310I messages to determine the highest failure rate during normal operation of the network, and then set the threshold value to that rate, or to a value slightly above that rate. For example, if the highest failure percentage reported in the EZZ9310I messages is 4%, set the threshold value to 5% for your network. This value ensures that the resolver considers name servers to be unresponsive only when they experience a failure rate that is greater than the rate that typically occurs in your network.
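The arithmetic of the second strategy can be sketched as follows. This is an illustration only, under stated assumptions: the function name and the margin parameter are hypothetical, the input is a list of failure percentages that you have already collected from the EZZ9310I messages (the sketch does not parse message text), and the upper bound of 100 is assumed because the value is a percentage.

```python
import math


def suggest_threshold(observed_percentages, margin=1):
    """Suggest a threshold slightly above the highest observed failure rate.

    observed_percentages: failure percentages noted from EZZ9310I messages
    during normal operation (collected by hand or by your own tooling).
    margin: hypothetical number of percentage points to add above the peak.
    """
    if not observed_percentages:
        raise ValueError("at least one observed failure percentage is required")
    peak = max(observed_percentages)
    # Round the peak up to a whole percent, add the margin, and keep the
    # result between 1 (the lowest setting named in the text) and 100
    # (ASSUMPTION: the natural upper bound for a percentage).
    return min(100, max(1, math.ceil(peak) + margin))


if __name__ == "__main__":
    # The highest rate during normal operation was 4%, so the suggestion is 5.
    print(suggest_threshold([2, 4, 3]))
```

With a margin of 0, the function returns the peak rate itself, which corresponds to setting the threshold value to the highest observed rate instead of slightly above it.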