Examples of resolver monitoring of DNS name servers

Values in the TCPIP.DATA file can affect the statistics that the resolver collects when it monitors DNS name servers. For example, consider the following settings from TCPIP.DATA:

NAMESERVER 10.43.25.200 10.43.125.203 10.43.25.200
RESOLVERUDPRETRIES 2
RESOLVERTIMEOUT 0.075
RESOLVEVIA UDP

In this example, one name server (10.43.25.200) appears twice in the list of name servers that the resolver searches. The resolver makes one additional pass through that list of name servers before it considers the name servers to be unresponsive. Assume that the resolver generates a query to resolve the host name user.ibm.com as part of gethostbyname processing. The following example sequence occurs:

  1. The resolver sends the query to name server 10.43.25.200, which times out after 75 milliseconds (based on the RESOLVERTIMEOUT value).
  2. The resolver forwards the request to name server 10.43.125.203, which also times out.
  3. The request goes to name server 10.43.25.200 a second time (as the last name server in the list), which times out again.

    The first pass through the list of name servers is complete.

  4. The resolver begins at the top of the list again and sends the request to name server 10.43.25.200 for a third time. A response arrives from this name server, possibly as the result of name server delays.
  5. The resolver stops searching for the resource.

Based on the search that the resolver performed, the system-wide total request count for name server 10.43.25.200 is incremented by 3, and the total failure count is incremented by 2. If the searches that are shown in this example are all the activity for this name server over the course of a 5-minute sliding window or the 30-second monitoring interval, the failure rate for this name server is 66% (2 of 3 requests). The system-wide total request count and the total failure count for name server 10.43.125.203 are both incremented by 1. If the resolver does not send any more queries to this name server during the 5-minute sliding window or the 30-second monitoring interval, the failure rate for name server 10.43.125.203 is 100%.
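The counting in this example can be reproduced with a short Python sketch. It is an illustration only, not the resolver's implementation: the names NAMESERVERS, OUTCOMES, tally, and failure_rate are hypothetical, and the outcomes are taken directly from the four steps in the walkthrough. Note that 2 failures out of 3 requests is 66.7%, which the text reports as 66%.

```python
from collections import Counter

# Name servers in the order coded on the NAMESERVER statement.
NAMESERVERS = ["10.43.25.200", "10.43.125.203", "10.43.25.200"]

# Outcome of each query in the walkthrough, in order: True means timed out.
# Steps 1-3 are the first pass through the list; step 4 starts the second.
OUTCOMES = [True, True, True, False]


def tally(nameservers, outcomes):
    """Return (requests, failures) Counters keyed by name server address."""
    requests, failures = Counter(), Counter()
    for i, timed_out in enumerate(outcomes):
        # Walk the list in order, wrapping to the top for the next pass.
        server = nameservers[i % len(nameservers)]
        requests[server] += 1
        if timed_out:
            failures[server] += 1
    return requests, failures


def failure_rate(requests, failures, server):
    """Failure rate as a percentage of the requests sent to one server."""
    return 100.0 * failures[server] / requests[server]


requests, failures = tally(NAMESERVERS, OUTCOMES)
for server in requests:
    rate = failure_rate(requests, failures, server)
    print(server, requests[server], failures[server], f"{rate:.1f}%")
```

Because 10.43.25.200 is coded twice, two of the three queries in each pass count against the same address, which is why its request count grows faster than that of 10.43.125.203.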

If you are using the network operator notification function and the threshold percentage is less than 66%, the resolver reports both of these name servers as unresponsive, but continues to send DNS queries to both name servers.

If you are automatically quiescing unresponsive name servers, the resolver does not consider either of these name servers to be unresponsive because fewer than 10 queries were directed to each name server during the 30-second interval. The name servers are not considered unresponsive regardless of the setting of the UNRESPONSIVETHRESHOLD value.
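The difference between the two functions can be sketched as a single decision function. This is a simplified model under stated assumptions, not the resolver's logic: the function name, its parameters, and the use of a greater-than-or-equal comparison against the threshold are assumptions for illustration. The 10-query minimum for automatic quiescing comes from the text above.

```python
# From the text: with automatic quiescing, a name server that received
# fewer than 10 queries in the interval is never considered unresponsive.
MIN_QUERIES_FOR_QUIESCE = 10


def is_unresponsive(requests, failures, threshold_pct, autoquiesce):
    """Decide whether a name server counts as unresponsive for one interval.

    requests, failures: counts for the name server in the interval.
    threshold_pct: the configured threshold percentage.
    autoquiesce: True to model automatic quiescing, False to model
                 network operator notification.
    """
    if requests == 0:
        return False  # nothing was sent, so there is nothing to judge
    if autoquiesce and requests < MIN_QUERIES_FOR_QUIESCE:
        return False  # sample too small, whatever the threshold is
    # Assumption: the comparison is inclusive of the threshold value.
    return 100.0 * failures / requests >= threshold_pct


# 10.43.25.200 from the example: 3 requests, 2 failures, threshold of 50%.
print(is_unresponsive(3, 2, 50, autoquiesce=False))
print(is_unresponsive(3, 2, 50, autoquiesce=True))
```

With the counts from the example, the notification case evaluates to True and the quiescing case to False, which matches the behavior described above.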

Consider these different TCPIP.DATA file settings:

NAMESERVER 10.43.25.200
SEARCH raleigh.ibm.com
RESOLVERTIMEOUT 0.075
RESOLVEVIA UDP

In this example, only one name server is coded, and only one domain name can be appended to the input host name as an additional search attempt. Assume that an application issues getaddrinfo() for host name user, and that ai_family=AF_UNSPEC is specified. The following example sequence occurs:

  1. The resolver searches for domain name user.raleigh.ibm.com and requests AAAA records.
  2. One of the following actions occurs:
    • If the resolver obtains resource information, the search ends.
    • If the resolver does not obtain resource information, the resolver continues to request AAAA records, but searches the next domain in the sequence, which is user.
  3. One of the following actions occurs:
    • If the resolver obtains resource information, the search ends.
    • If the resolver does not obtain resource information, the resolver searches for domain name user.raleigh.ibm.com and requests A records.
  4. One of the following actions occurs:
    • If the resolver obtains resource information, the search ends.
    • If the resolver does not obtain resource information, the resolver continues to request A records, but searches the next domain in the sequence, which is user.
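
The query order in the steps above can be sketched in Python as follows. This covers only the case in this example (a single-label host name and one SEARCH domain) and is not a model of the resolver's full search rules. The names query_sequence, resolve, and lookup are hypothetical; lookup stands in for whatever sends one query and returns None when no resource information is obtained.

```python
def query_sequence(host, search_domains, record_types=("AAAA", "A")):
    """Yield (query_name, record_type) pairs in the order of the walkthrough."""
    # Each SEARCH domain is appended first; the bare host name is tried last.
    candidates = [f"{host}.{domain}" for domain in search_domains] + [host]
    for record_type in record_types:
        for name in candidates:
            yield name, record_type


def resolve(host, search_domains, lookup):
    """Send queries in order and stop at the first one that returns data.

    Returns (result, queries_sent). result is None if every query fails.
    """
    sent = 0
    for name, record_type in query_sequence(host, search_domains):
        sent += 1
        result = lookup(name, record_type)
        if result is not None:
            return result, sent
    return None, sent


for name, record_type in query_sequence("user", ["raleigh.ibm.com"]):
    print(record_type, name)

# A name server that never answers causes all four queries to be sent.
print(resolve("user", ["raleigh.ibm.com"], lambda name, rtype: None))
```

The number of queries sent per application call is the number of record types multiplied by the number of candidate names, which is 4 in this configuration.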

If the name server at 10.43.25.200 fails to respond to all of the queries, the system-wide total request count and the total failure count for this name server are each incremented by 4. The network operator notification function would consider this name server to be unresponsive (because it experienced a 100% failure rate), but the autonomic quiescing of unresponsive name servers function would not (because fewer than 10 requests were directed to the name server). If this query were issued three times within the 30-second monitoring interval, so that the total request count and total failure count for this name server were 12 instead of 4, then the autonomic quiescing of unresponsive name servers function would consider this name server to be unresponsive because more than 10 queries were directed to the name server.
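
The arithmetic behind the 4 and 12 figures can be checked with a small sketch. The constant and function names are hypothetical, and the sketch assumes that exactly 10 queries would already be a large enough sample, which the text implies by excluding only counts of fewer than 10.

```python
import math

QUERIES_PER_CALL = 4  # 2 record types x 2 candidate names, as in the example
MIN_QUERIES = 10      # minimum sample size the text gives for quiescing


def calls_needed(queries_per_call, min_queries=MIN_QUERIES):
    """Smallest number of fully failing calls that reaches the minimum sample."""
    return math.ceil(min_queries / queries_per_call)


calls = calls_needed(QUERIES_PER_CALL)
print(calls, calls * QUERIES_PER_CALL)
```

Two failing getaddrinfo() calls produce only 8 queries, so three calls (12 queries) is the smallest number that puts this name server within reach of the quiescing function during one interval.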