Steps for responding to message EZZ9308E

Message EZZ9308E is displayed when all of the following conditions are true: the resolver is monitoring the responsiveness of name servers, the network operator notification function is active, and the resolver has determined that a name server did not respond to a significant percentage of queries during a 5-minute interval. The UNRESPONSIVETHRESHOLD resolver setup statement defines what percentage is significant. When you see message EZZ9308E, you might want to take action for the affected name server.
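The threshold test described above can be illustrated with a small sketch. This is not the resolver's implementation: the function names are hypothetical, and the comparison rule (failure percentage at or above the threshold) is an assumption made for illustration. The value 25 is only an example threshold.

```python
def failure_percentage(total_queries: int, failed_queries: int) -> float:
    """Return the share of queries that received no response, as a percentage."""
    if total_queries <= 0:
        return 0.0  # no queries in the interval, so nothing to evaluate
    return 100.0 * failed_queries / total_queries


def is_unresponsive(total_queries: int, failed_queries: int, threshold_pct: int) -> bool:
    """Assumed rule: unresponsive when the failure percentage is at or above the threshold.

    A threshold of 0 is treated as "monitoring turned off", as the procedure
    states for UNRESPONSIVETHRESHOLD.
    """
    if threshold_pct == 0:
        return False
    return failure_percentage(total_queries, failed_queries) >= threshold_pct


if __name__ == "__main__":
    # 50 failures out of 200 queries in one 5-minute interval, example threshold 25
    print(is_unresponsive(200, 50, 25))
```

A sketch like this can help when choosing a threshold: plug in query and failure counts from your own environment to see which values would have triggered the message.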

Procedure

Use the following steps to determine your course of action, if any, in response to message EZZ9308E:

  1. Evaluate the scope of the problem with the name server.
    Answer the following questions to determine whether there is a significant issue with the name server:
    Table 1. Message EZZ9308E name server resolution, part 1
    Question Implication
    Question: Is the monitoring function enabled when it should not be enabled?
    Implication: The resolver might have detected syntax errors when processing the resolver setup file, and might have used the default setting for UNRESPONSIVETHRESHOLD.

    Question: Are there known network problems that are contributing to the situation?
    Implication: If known network problems exist, message EZZ9308E is likely a result of those problems, and the name server issues should clear up after the network problem is resolved.

    Question: Is only one name server unresponsive, or are multiple name servers unresponsive?
    Implication: The resolver generates message EZZ9308E for each name server that is unresponsive.
    • If multiple name servers are reported as unresponsive at roughly the same time, the cause is more likely a systemic network problem than a problem with any one name server.
    • If only one name server is reported as unresponsive, the cause is more likely an issue with that specific name server.

    Question: Is this name server the primary, mission-critical name server, or is it the backup name server?
    Implication: Problems with the primary name server are likely to cause more disruption to the network and to your system, and thus are more likely to require intervention, than problems with a secondary name server.

    Question: Is the reported IP address not a valid address for contacting a name server?
    Implication: If the address is not valid, one or more TCPIP.DATA data sets being used by applications on the system have an incorrect IP address coded on an NSINTERADDR or NAMESERVER statement, and the resolver is repeatedly attempting to send queries to that IP address.

    Question: Is the volume of requests that are failing significant?
    Implication: The resolver reports the total number of resolver queries and the total number of failures associated with this name server in message EZZ9310I, which is displayed when the name server is first identified as being unresponsive, and again at 5-minute intervals for as long as the name server remains unresponsive. High numbers of failures, or a significant percentage of a high number of requests, represent a larger disruption to your system than a small number of failures to a name server that is seldom used.

    Question: Is the name server responding, but not fast enough to be considered responsive by the resolver?
    Implication: Coding a very small RESOLVERTIMEOUT value might cause the resolver to treat late-arriving name server responses as failures to respond, even though the resolver uses the response from the name server to satisfy the API call. A less aggressive RESOLVERTIMEOUT value might alleviate the situation, or the situation could be ignored if slight network disruptions are thought to be the issue.
    Issue dig commands to query information at the name server being reported as unresponsive. Use the +time operand on the dig command to override the setting for RESOLVERTIMEOUT in order to determine whether the RESOLVERTIMEOUT setting is possibly at fault.

    Question: Is the number of failures artificially high due to TCPIP.DATA settings?
    Implication: The settings for RESOLVERUDPRETRIES and SEARCH might cause the resolver to increment the failure count multiple times for a single resolver API call, thereby possibly exaggerating the impact of the failures on your system.
    • If multiple domain names are coded on the SEARCH statement, multiple searches for one host name, each with a different domain name appended, might be attempted.
    • If a value greater than 1 is coded for RESOLVERUDPRETRIES, multiple attempts to contact the same name server for the same resource might be made.
    In either case, a small number of API calls could result in a significantly higher failure count than would be expected for a lightly used name server. If these settings are combined with a small RESOLVERTIMEOUT value, the count could potentially be very high even though the name server is running normally.
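To make the amplification effect concrete, the following sketch estimates an upper bound on the failure count that a handful of API calls could produce. It is a simplified model, not a description of the resolver's actual counting: it assumes every SEARCH domain is tried for every call and that every try uses all RESOLVERUDPRETRIES attempts. The function and parameter names are hypothetical.

```python
def worst_case_failures(api_calls: int, search_domains: int, udp_retries: int) -> int:
    """Estimate an upper bound on failures counted against one name server.

    Assumed model (for illustration only):
      - each API call produces one lookup per domain on the SEARCH statement
        (at least one lookup even if no SEARCH domains are coded)
      - each lookup is attempted udp_retries times (at least once)
      - every attempt fails and is counted as a failure
    """
    lookups_per_call = max(1, search_domains)
    attempts_per_lookup = max(1, udp_retries)
    return api_calls * lookups_per_call * attempts_per_lookup


if __name__ == "__main__":
    # 10 API calls, 3 SEARCH domains, RESOLVERUDPRETRIES of 2
    print(worst_case_failures(10, 3, 2))  # prints 60
```

Under these assumptions, 10 failing API calls could be reported as up to 60 failures, which is why the raw count in EZZ9310I should be read alongside the TCPIP.DATA settings.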
  2. Based on your answers to the questions in Step 1, take the following actions:
    Table 2. Message EZZ9308E name server resolution, part 2
    Answer Actions required
    The problem is related to errors in the resolver setup file.
    1. Identify and correct the errors in the resolver setup file.
    2. Issue the MODIFY RESOLVER,REFRESH,SETUP=setup_filename command to correct the resolver configuration settings.
    The problem is network-related.
    1. Identify and correct the network problem.
    2. Optionally, clear the eventual action messages from the operator console, or leave the message on the operator console and wait for the resolver to clear the message when the name server is again responsive.
    The problem is related to a valid name server.
    1. Identify and correct the problem with the name server.
    2. Optionally, enable the autonomic quiescing of unresponsive name servers function so that the resolver automatically stops forwarding DNS queries generated by an application to the valid name server while it is unresponsive. The autonomic quiescing function requires you to define a global TCPIP.DATA file. Review the information about the resolver and the global TCPIP.DATA file in the z/OS Communications Server: IP Configuration Guide to determine whether you can use a global TCPIP.DATA file. If you can use a global TCPIP.DATA file, enable the autonomic quiescing function by performing the following steps:
      • If you do not have a resolver setup file, create one.
      • If you do not have a global TCPIP.DATA file, create one. Code one or more NSINTERADDR statements in the global TCPIP.DATA file, specifying the IP addresses of the name servers to be used in your environment.
      • Code the GLOBALTCPIPDATA statement in the resolver setup file, and specify the name of your global TCPIP.DATA file on the statement.
      • Code the UNRESPONSIVETHRESHOLD statement in the resolver setup file, specifying a threshold percentage and also specifying the AUTOQUIESCE operand.
      • Issue the MODIFY RESOLVER,REFRESH,SETUP=setup_filename command to cause the resolver to use the new threshold value and to automatically stop forwarding DNS queries generated by an application to unresponsive name servers.
    3. Optionally, clear the eventual action messages from the operator console, or alternatively leave the message on the operator console and wait for the resolver to clear the message when the name server is again responsive.
    The problem is related to an aggressive timeout value for resolver queries.
    1. Identify the TCPIP.DATA data sets that have the small RESOLVERTIMEOUT.
    2. Modify the TCPIP.DATA data sets to increase the timeout value.
    3. Issue MODIFY RESOLVER,REFRESH to cause resolver to read the updated TCPIP.DATA data sets the next time any application using the data set issues a resolver request.
    4. Optionally, clear the eventual action messages from the operator console, or alternatively leave the message on the operator console and wait for resolver to clear the message when the name server is again responsive.
    The problem is related to an incorrect IP address.
    1. Clear the eventual action messages from the operator console.
    2. Identify the TCPIP.DATA data sets that have the incorrect IP address in the list of name servers to contact.
    3. Modify the TCPIP.DATA data sets to eliminate the IP address.
    4. Issue MODIFY RESOLVER,REFRESH to cause resolver to read the updated TCPIP.DATA data sets the next time any application using the data set issues a resolver request.
    The problem is not considered to be an error, or is considered to be insignificant, or is a result of TCPIP.DATA settings that are adding to the problem.
    1. Clear the eventual action messages from the operator console.
    2. Optionally, if TCPIP.DATA settings are thought to be exaggerating the issue, and those settings can be modified without disrupting normal network processing, perform the following steps:
      • Identify the TCPIP.DATA data sets that have the settings that need to be changed.
      • Modify the TCPIP.DATA data sets to correct the RESOLVERUDPRETRIES or SEARCH settings.
      • Issue MODIFY RESOLVER,REFRESH to cause the resolver to use the new settings.
    3. Optionally, modify the threshold value for declaring a name server to be unresponsive in order to reduce the likelihood of future EZZ9308E messages being displayed for this name server. To modify the threshold value, perform the following steps:
      • If you do not have a resolver setup file, create one.
      • Code the desired percentage value for UNRESPONSIVETHRESHOLD in the setup file. Coding a value of zero will turn off the monitoring function.
      • Issue the MODIFY RESOLVER,REFRESH,SETUP=setup_filename command to cause the resolver to use the new threshold value.
  3. Manually clearing message EZZ9308E does not mean that the name server is now responsive, so you might need to continue to monitor the state of the name server. Based on your actions in Step 2, do one of the following:
    Table 3. Message EZZ9308E name server resolution, part 3
    Monitoring function status Actions required
    Monitoring function status: You did not turn off the monitoring function.
    Actions required: Monitor the operator console for these resolver messages:
    1. EZZ9310I, which the resolver issues to provide statistics for unresponsive name servers at 5-minute intervals. Use the statistics provided in EZZ9310I to ensure that the problem with this name server does not become a more serious problem, for instance that the percentage of failed messages to the name server does not become significantly higher. If the name server continues to fail to respond to messages for a longer period of time than expected, or the percentage of queries receiving no response increases significantly, re-evaluate the problem by using the questions in Step 1.
    2. EZZ9309I, which the resolver issues to indicate that a previously unresponsive name server is now responding to queries at an acceptable level. One last instance of EZZ9310I is displayed with the EZZ9309I message. If EZZ9308E had not already been cleared by the operator, the resolver will clear the message from the console at this time.

    Monitoring function status: You turned on the autonomic quiescing of unresponsive name servers function.
    Actions required: Issue the MODIFY RESOLVER,DISPLAY command to verify that the UNRESPONSIVETHRESHOLD statement is set to the percentage that you specified in the resolver setup file and that AUTOQUIESCE is set. If the name server remains unresponsive, the resolver will now issue message EZZ9311E, and will stop forwarding DNS queries generated by an application to the unresponsive name server.

    Monitoring function status: You turned off the monitoring function.
    Actions required: Issue the MODIFY RESOLVER,DISPLAY command to verify that the UNRESPONSIVETHRESHOLD statement is set to 0. You will receive no new unresponsive messages, including EZZ9308E or EZZ9311E, until you restart the monitoring function.
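If you record the query and failure totals from successive EZZ9310I messages, a sketch like the following can flag when the failure percentage has risen enough to warrant returning to Step 1. The function name, the input shape, and the 10-point default for what counts as a significant increase are all assumptions chosen for illustration; the procedure itself does not define a number.

```python
def needs_reevaluation(samples, max_increase_pct=10.0):
    """Decide whether the failure percentage has risen significantly.

    samples: list of (total_queries, failed_queries) tuples, oldest first,
             for example copied by hand from successive EZZ9310I messages.
    max_increase_pct: hypothetical cutoff, in percentage points, for a
                      "significant" increase between the first and last sample.
    """
    # Convert each interval to a failure percentage, skipping empty intervals
    rates = [100.0 * failed / total for total, failed in samples if total > 0]
    if len(rates) < 2:
        return False  # not enough data to see a trend
    return rates[-1] - rates[0] > max_increase_pct


if __name__ == "__main__":
    history = [(100, 30), (110, 40), (120, 66)]
    print(needs_reevaluation(history))
```

Comparing only the first and last samples keeps the sketch simple; a longer history might call for a moving average instead.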

Results

You know you are done when EZZ9308E no longer appears on the operator console and one of the following is true: the name server is responsive again, the monitoring function is disabled, or the autonomic quiescing function is active. If the monitoring function is still enabled and the resolver subsequently determines that the name server is again unresponsive, a new message EZZ9308E is displayed.
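To confirm responsiveness independently of the console messages, the dig probe mentioned in Step 1 can be scripted. The sketch below only assembles the dig command line with the +time operand and runs it; the helper names, the example address 192.0.2.53, and the host name are hypothetical, and treating a zero exit status as "the server answered" is an assumption that should be checked against the dig documentation for your platform.

```python
import subprocess


def build_dig_command(server_ip: str, hostname: str, timeout_seconds: int) -> list:
    """Build a dig command that queries one name server with an explicit timeout.

    The +time operand overrides the timeout for this query, so a larger value
    than RESOLVERTIMEOUT can show whether the server is merely slow.
    """
    return ["dig", f"@{server_ip}", hostname, f"+time={timeout_seconds}"]


def probe_name_server(server_ip: str, hostname: str, timeout_seconds: int) -> bool:
    """Run dig and report whether it completed successfully.

    Assumption: dig is on the PATH and exits with status 0 when it gets a reply.
    """
    command = build_dig_command(server_ip, hostname, timeout_seconds)
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Show the command that would be issued, without running it
    print(" ".join(build_dig_command("192.0.2.53", "host.example.com", 10)))
```

Running the probe once with a timeout close to your RESOLVERTIMEOUT setting and once with a larger value can help separate a slow name server from one that is not answering at all.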