Monitoring the responsiveness of Domain Name System name servers
The resolver supports the following two functions for monitoring name server responsiveness:
- Network operator notification
The resolver alerts the network operator about name servers that fail to respond to a significant percentage of resolver queries, but continues to send application-generated DNS queries to the unresponsive name server. You can use these alerts to better manage the list of name servers that the system uses and to avoid unnecessary delays when a host name or IP address is being resolved.
- Autonomic quiescing of unresponsive name servers
The resolver alerts the network operator about name servers that fail to respond to a significant percentage of resolver queries, and stops sending application-generated DNS queries to the unresponsive name server. While the name server remains unresponsive, the resolver periodically sends DNS polling queries to it. When the name server responds to the polling queries, the resolver resumes sending application-generated DNS queries to the name server.
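The quiescing behavior can be pictured as a small two-state model. The following Python sketch is illustrative only: the class name, method names, and action strings are hypothetical, and it assumes that a single responsive verdict based on polling queries is enough to resume application queries, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class MonitoredNameServer:
    """Hypothetical per-server state for the autonomic quiescing function."""
    address: str
    quiesced: bool = False

    def accepts_application_queries(self) -> bool:
        # Application-generated queries are sent only while not quiesced.
        return not self.quiesced

    def interval_ended(self, unresponsive: bool) -> str:
        """Apply the verdict for the interval that just ended.

        For a quiesced server the verdict is assumed to come from the
        periodic polling queries. Returns a description of the action.
        """
        if not self.quiesced and unresponsive:
            self.quiesced = True
            return "alert operator, stop application queries, start polling"
        if self.quiesced and not unresponsive:
            self.quiesced = False
            return "resume application queries"
        if self.quiesced:
            return "keep polling"
        return "no change"


server = MonitoredNameServer("192.0.2.53")  # documentation-range address
print(server.interval_ended(unresponsive=True))
print(server.accepts_application_queries())
print(server.interval_ended(unresponsive=False))
```

Under the network operator notification function, the same verdict would produce only the alert; the server would continue to receive application queries.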
To determine name server responsiveness, the resolver collects statistics at 30-second or 1-minute intervals, depending on the monitoring function that is being performed. During each monitoring interval, the resolver keeps system-wide statistics about the total number of resolver queries that it sent to a name server and the number of those queries that the name server did not answer. At the end of the monitoring interval, the resolver calculates the percentage of queries that the name server did not answer over the last interval or the last five intervals, depending on the monitoring function that is being performed. The resolver compares this percentage to the value of the UNRESPONSIVETHRESHOLD resolver setup statement. If the percentage of failures equals or exceeds the threshold value, the resolver considers the name server to be unresponsive. For information about the UNRESPONSIVETHRESHOLD statement and how to set its value, see z/OS Communications Server: IP Configuration Reference and Optimizing the UNRESPONSIVETHRESHOLD value for your network.
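The threshold comparison can be sketched in a few lines of Python. This is a simplified model, not the resolver's implementation: the names are hypothetical, the threshold of 25 is only an example value, and two details that the text does not state are assumed here, namely that the counts from multiple intervals are pooled before the percentage is calculated, and that a window with no queries sent yields 0 percent.

```python
from dataclasses import dataclass


@dataclass
class IntervalStats:
    """Counts for one monitoring interval (hypothetical structure)."""
    queries_sent: int
    queries_unanswered: int


def failure_percentage(intervals):
    """Percentage of unanswered queries across the given intervals.

    Assumption: counts are pooled across intervals, and an empty
    window (no queries sent) is treated as 0 percent failures.
    """
    sent = sum(i.queries_sent for i in intervals)
    unanswered = sum(i.queries_unanswered for i in intervals)
    if sent == 0:
        return 0.0
    return 100.0 * unanswered / sent


def is_unresponsive(intervals, threshold_percent):
    # "Equals or exceeds" the threshold, as described in the text.
    return failure_percentage(intervals) >= threshold_percent


# One interval: 25 of 100 queries unanswered, example threshold of 25.
last_interval = [IntervalStats(queries_sent=100, queries_unanswered=25)]
print(is_unresponsive(last_interval, threshold_percent=25))  # True

# Five intervals pooled: 40 of 500 queries unanswered (8 percent).
last_five = [IntervalStats(100, 8) for _ in range(5)]
print(is_unresponsive(last_five, threshold_percent=25))  # False
```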
In the context of name server responsiveness, the phrase "resolver queries" does not mean the same thing as "resolver API calls". A single resolver API call, such as getaddrinfo() or gethostbyname(), can generate multiple resolver queries to one or more DNS name servers, depending on retry counts, the domain names that are appended during a search, and the type of information that the API requests. Conversely, a resolver API call might not generate any resolver queries if the requested resource is already in the resolver cache. See Examples of resolver monitoring of DNS name servers for examples of how different TCPIP.DATA file settings can influence name server responsiveness statistics.
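To see why one API call can account for many resolver queries, consider a rough worst-case count. The Python sketch below is illustrative arithmetic only and does not reproduce the resolver's actual search or retry logic. The function and parameter names are hypothetical, and it assumes that every attempt fails, that the name is tried as given plus once per search domain, and that each record type is queried separately.

```python
def max_resolver_queries(search_domains, record_types, attempts_per_server,
                         name_server_count, answered_from_cache=False):
    """Upper bound on resolver queries for one API call (hypothetical model)."""
    if answered_from_cache:
        # A cache hit generates no queries to any name server.
        return 0
    # Assumption: the name as given, plus one candidate per search domain.
    candidate_names = 1 + len(search_domains)
    return (candidate_names * len(record_types)
            * attempts_per_server * name_server_count)


worst_case = max_resolver_queries(
    search_domains=["example.com", "example.net"],
    record_types=["A", "AAAA"],
    attempts_per_server=2,
    name_server_count=2,
)
print(worst_case)  # 24
print(max_resolver_queries([], ["A"], 1, 1, answered_from_cache=True))  # 0
```

In this model, a single failing API call contributes up to 24 queries to the statistics, so a small number of API calls can move the failure percentage substantially.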
The resolver considers the following failures to be indicative of an unresponsive name server:
- The resolver sends a UDP or TCP query to a name server and never receives a response.
- The resolver sends a UDP query to a name server and receives a response after the RESOLVERTIMEOUT value has expired.
- The resolver attempts to send data to a name server using UDP, but the data cannot be sent to the target IP address (for example, because of an error in the route configuration).
- The resolver attempts to connect to a name server using TCP, but the connection attempt times out.
- In some situations, the BIND 9 DNS utilities (for example, dig or nsupdate) issue getaddrinfo() API calls to resolve a host name that represents a remote DNS name server, and those API calls invoke z/OS® resolver processing. If any of the previously mentioned failures occur during these getaddrinfo() calls, the failures are included in the name server responsiveness statistics.
The resolver does not consider the following failures to be indicative of an unresponsive name server:
- The resolver cannot open a socket (UDP or TCP) to send a request to a name server, including instances in which the system is IPv4-only capable and an IPv6 name server IP address is coded on the NSINTERADDR statement.
- The resolver sends a UDP query to a name server to determine whether the name server is EDNS0-capable, but does not receive a response to that UDP query; see Extension Mechanisms for DNS standards and the resolver for more information about EDNS0 processing.
- The resolver sends a UDP query to a name server and the name server responds with a DNS return code (such as SERVFAIL or NOTIMPL) that indicates that the name server is active and responding but is unable to process the request that was sent.
- Timeouts or failures occur during SMTP processing (SMTP uses its own resolver services to send queries to a name server).
- Timeouts or failures occur during BIND 9 DNS utility processing that does not involve getaddrinfo() calls (that processing uses BIND 9 resolver services to send queries to a name server).
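The two lists above amount to a classification of query outcomes. The following Python sketch restates that classification; the enum members and function names are hypothetical labels for the cases described in the text, not resolver identifiers. Failures in SMTP processing or in BIND 9 utility processing that does not involve getaddrinfo() calls are omitted because they never pass through the z/OS resolver.

```python
from enum import Enum, auto


class Outcome(Enum):
    """Hypothetical labels for the query outcomes described above."""
    ANSWERED = auto()
    NO_RESPONSE = auto()                 # UDP or TCP query never answered
    UDP_RESPONSE_AFTER_TIMEOUT = auto()  # answer arrived after RESOLVERTIMEOUT
    UDP_SEND_FAILED = auto()             # for example, a routing error
    TCP_CONNECT_TIMEOUT = auto()
    SOCKET_OPEN_FAILED = auto()          # for example, IPv6 address on an IPv4-only system
    EDNS0_PROBE_NO_RESPONSE = auto()
    DNS_ERROR_RCODE = auto()             # for example, SERVFAIL or NOTIMPL


# Outcomes that the text lists as indicating an unresponsive name server.
COUNTED_AS_UNRESPONSIVE = frozenset({
    Outcome.NO_RESPONSE,
    Outcome.UDP_RESPONSE_AFTER_TIMEOUT,
    Outcome.UDP_SEND_FAILED,
    Outcome.TCP_CONNECT_TIMEOUT,
})


def counts_as_unresponsive(outcome):
    return outcome in COUNTED_AS_UNRESPONSIVE


def count_failures(outcomes):
    """Number of outcomes that would count against the name server."""
    return sum(1 for o in outcomes if counts_as_unresponsive(o))


observed = [
    Outcome.ANSWERED,
    Outcome.DNS_ERROR_RCODE,
    Outcome.NO_RESPONSE,
    Outcome.TCP_CONNECT_TIMEOUT,
    Outcome.SOCKET_OPEN_FAILED,
]
print(count_failures(observed))  # 2
```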