Configuring automatic failover

DNS failover refers to a traffic steering configuration in which the platform automatically diverts DNS traffic away from down or unavailable endpoints, opting for alternatives to ensure the high availability of your applications and services during an outage. This configuration is recommended if multiple endpoints host the same application or provide the same service. Automating this process makes it easy to adapt quickly to changing network conditions, minimizing both downtime and the manual effort required to maintain the configuration.

On the IBM® NS1 Connect® platform, critical components of an automatic failover configuration include:

An NS1 Connect monitor or third-party data source tracks an endpoint's up/down status.
A data feed connects the monitor to the up/down status of the corresponding DNS answer, enabling automatic updates.
A Filter Chain containing the Up filter eliminates unavailable answers when making traffic steering decisions. Typically, the Up filter is combined with other traffic steering filters. Some examples:
- Up + Priority + Select First N supports an active-passive failover configuration.
- Up + Shuffle + Select First N supports active-active failover with round-robin traffic distribution.
- Up + Geotarget Regional + Select First N supports active-active failover with geographic-based distribution.

How it works

Suppose you have an A record with multiple answers—each specifying the IPv4 address of a host on which an application or service is accessible. To configure automatic failover, you create a monitor for each endpoint, connect each monitor to its corresponding DNS answer, and create a Filter Chain within the record that includes the Up filter.

Each monitor frequently probes its designated endpoint from one or more monitoring regions to determine whether it should be considered up or available based on the up conditions defined in the monitor settings. If the results of a probe fail to meet these conditions, the endpoint is considered down. In response, the data feed connecting the monitor to its corresponding answer pushes an update, automatically changing the answer's up metadata value to false.
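The flow described above can be sketched in a few lines of Python. This is a simplified model, not the platform's implementation: the `Answer` class, the `endpoint_is_up` and `push_feed_update` functions, the region names, and the "at least N regions must pass" rule are all assumptions chosen for illustration. The actual up conditions are whatever you define in the monitor settings.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    """Hypothetical stand-in for a DNS answer and its metadata."""
    address: str
    meta: dict = field(default_factory=lambda: {"up": True})


def endpoint_is_up(region_results: dict, min_regions_ok: int) -> bool:
    """Assumed up condition: enough monitoring regions report a passing probe."""
    return sum(region_results.values()) >= min_regions_ok


def push_feed_update(answer: Answer, is_up: bool) -> None:
    """Stand-in for the data feed: copy the monitor's verdict into the answer metadata."""
    answer.meta["up"] = is_up


answer = Answer("192.0.2.10")

# Only one of three (hypothetical) regions can reach the endpoint.
probe = {"region-a": False, "region-b": False, "region-c": True}

push_feed_update(answer, endpoint_is_up(probe, min_regions_ok=2))
print(answer.meta)  # {'up': False}
```

Because the probe passes in only one region and the assumed condition requires two, the answer's up metadata value changes to false without any manual edit.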

As the platform receives incoming queries for the record's domain and type, it references the Filter Chain to determine the best answer(s) to return. The Filter Chain must include the Up filter to facilitate automatic failover, but it is typically used in conjunction with other filters to apply secondary processing. Without additional filters, you risk directing all traffic to the same answer as long as it is up.

For example, if the Filter Chain contains the Up, Shuffle, and Select First N filters (in that order), then incoming queries would be processed as follows:

The Up filter eliminates any answer marked as down from the answer pool before passing the list to the next filter. Note that because you have automatic updates configured, each answer's up/down status reflects the monitored endpoint.
(Optional) The Shuffle filter randomizes the order of answers in the list. Note that many filters can be used after the Up filter to achieve more even traffic distribution among the available endpoints or to favor specific endpoints over others based on some conditions.
The Select First N filter eliminates all but the first N (number) of answers in the list. In most cases, and by default, N is set to 1, meaning only the first answer in the list remains. This filter is placed at the end of most Filter Chain configurations to ensure only one answer is returned to requesting clients.
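The three steps above can be modeled as plain functions applied in order. This is a minimal sketch for intuition only. The function names, the dictionary layout of each answer, and the use of a seeded random generator are assumptions, and the platform's own filters may differ in detail.

```python
import random


def up_filter(answers):
    """Drop every answer whose up metadata is false."""
    return [a for a in answers if a["meta"]["up"]]


def shuffle_filter(answers, rng):
    """Return the answers in random order without modifying the input list."""
    shuffled = list(answers)
    rng.shuffle(shuffled)
    return shuffled


def select_first_n(answers, n=1):
    """Keep only the first n answers."""
    return answers[:n]


def resolve(answers, rng, n=1):
    """Apply Up, then Shuffle, then Select First N."""
    pool = up_filter(answers)
    pool = shuffle_filter(pool, rng)
    return select_first_n(pool, n)


answers = [
    {"address": "192.0.2.10", "meta": {"up": True}},
    {"address": "192.0.2.20", "meta": {"up": False}},  # endpoint is down
    {"address": "192.0.2.30", "meta": {"up": True}},
]

rng = random.Random(0)  # seeded only to make the sketch repeatable
result = resolve(answers, rng)
print(result[0]["address"])  # one of the two available addresses
```

Repeated calls return 192.0.2.10 or 192.0.2.30 at random, and never 192.0.2.20 while it is marked as down.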

At a minimum, the Filter Chain must include the Up filter to support automatic failover, but most Filter Chains use additional filters based on the desired outcome. For example, the Select First N filter is typically placed at the end of the Filter Chain so that only one answer is returned to the requesting client. Further, additional filters, such as randomization or geographic-based filters, can be used to achieve more even traffic distribution among the available endpoints or to favor specific endpoints over others based on some conditions.

Step 1: Create or connect a monitoring job

Create an NS1 Connect monitoring job or connect an external monitoring job from a supported monitoring integration to collect the up/down status of an application endpoint or service.

Step 2: Connect the monitoring job feeds to the corresponding answers

Configure automatic updates from the monitoring jobs to their corresponding answers by connecting each job feed to the Up/down metadata field for each answer in the DNS record. Refer to Connecting a monitoring job to a DNS answer for instructions.
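Conceptually, connecting a feed replaces a fixed up/down value on the answer with a pointer to the feed that supplies it. The following sketch shows that idea with plain dictionaries. The field names (`answer`, `meta`, `up`, `feed`), the `connect_feed` helper, and the feed identifiers are assumptions for illustration, so confirm the actual payload shape in the API reference before relying on it.

```python
def connect_feed(answer: dict, feed_id: str) -> dict:
    """Return a copy of the answer whose up metadata points at a feed."""
    linked = dict(answer)
    # Keep any existing metadata and overwrite only the up field.
    linked["meta"] = {**answer.get("meta", {}), "up": {"feed": feed_id}}
    return linked


answers = [
    {"answer": ["192.0.2.10"], "meta": {}},
    {"answer": ["192.0.2.20"], "meta": {}},
]

# Hypothetical feed identifiers, one per monitoring job.
feed_ids = ["feed-for-endpoint-1", "feed-for-endpoint-2"]

linked_answers = [connect_feed(a, f) for a, f in zip(answers, feed_ids)]

# Sanity check: list any answer that still has no feed attached.
unlinked = [a["answer"] for a in linked_answers
            if "feed" not in a["meta"].get("up", {})]
print(unlinked)  # []
```

A check like the last one is a quick way to catch an answer that was added to the record later but never connected to a monitoring job.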

Step 3: Create a Filter Chain containing the Up filter

Follow these steps to configure a Filter Chain that supports an active-active or active-passive failover configuration.

Note: The order of filters in the Filter Chain indicates the order in which they are applied when processing incoming queries to the record.
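To see why the order matters, consider a simplified model in which each filter is a function and the chain is applied left to right. The function names and answer layout below are assumptions for illustration, and the sketch does not model what the platform does when no answers remain.

```python
def up_filter(answers):
    """Drop answers that are marked as down."""
    return [a for a in answers if a["up"]]


def select_first_n(answers, n=1):
    """Keep only the first n answers."""
    return answers[:n]


def apply_chain(answers, chain):
    """Apply each filter in the order it appears in the chain."""
    for filter_fn in chain:
        answers = filter_fn(answers)
    return answers


answers = [
    {"address": "192.0.2.10", "up": False},  # first in the record, but down
    {"address": "192.0.2.20", "up": True},
]

up_first = apply_chain(answers, [up_filter, select_first_n])
up_last = apply_chain(answers, [select_first_n, up_filter])

print([a["address"] for a in up_first])  # ['192.0.2.20']
print([a["address"] for a in up_last])   # []
```

With the Up filter first, the available answer is returned. With the order reversed, the only answer selected is the down one, and the Up filter then removes it.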

On the record details page, click Create Filter Chain.
Click the plus sign (+) next to the Up filter.
(Recommended) Add one or more filters to the middle of the Filter Chain to apply secondary processing. Doing so can help prevent one endpoint from being overloaded when multiple or all answers are available.
- If configuring an active-passive configuration with one primary endpoint and one or more backup endpoints, use the Priority filter and enter a priority metadata value for each answer. Note that lower numbers indicate a higher priority—for example, 1 is the highest priority.
  Attention: The order of answers on the record details page indicates the priority order unless the priority is defined in the answer metadata. If you do not override the priority or add another filter after the Up filter in this chain, the platform always returns the first answer that appears on this page if it is available.
- If configuring an active-active configuration where all endpoints should share DNS traffic, use another filter to apply secondary filtering. For example, use the Shuffle filter to distribute traffic evenly across your endpoints, a Weighted Shuffle filter to skew traffic toward specific endpoints more often, a geographic filter to favor endpoints that are geographically proximate to the requester, or any of the other filters to achieve the desired outcome.
  Note: If you apply a filter that references answer metadata, you must edit the answer metadata manually or connect a data source to update that field automatically.
(Recommended) Add the Select First N filter at the end of the Filter Chain to control the number of responses returned to the requesting client.
Click Save Filter Chain.
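As a sanity check on an active-passive setup, the following sketch models an Up + Priority + Select First N chain. It assumes the Priority filter behaves like a stable sort on the priority metadata value, lowest number first. That assumption, the function names, and the answer layout are illustrative only.

```python
def up_filter(answers):
    """Drop answers that are marked as down."""
    return [a for a in answers if a["meta"]["up"]]


def priority_filter(answers):
    """Order answers by priority; lower numbers come first (assumed behavior)."""
    return sorted(answers, key=lambda a: a["meta"]["priority"])


def select_first_n(answers, n=1):
    """Keep only the first n answers."""
    return answers[:n]


def resolve(answers, n=1):
    """Apply Up, then Priority, then Select First N."""
    return select_first_n(priority_filter(up_filter(answers)), n)


answers = [
    {"address": "192.0.2.20", "meta": {"up": True, "priority": 2}},  # backup
    {"address": "192.0.2.10", "meta": {"up": True, "priority": 1}},  # primary
]

before = resolve(answers)[0]["address"]
print(before)  # 192.0.2.10

# Simulate a feed update that marks the primary endpoint as down.
answers[1]["meta"]["up"] = False

after = resolve(answers)[0]["address"]
print(after)  # 192.0.2.20
```

The primary is returned while it is up, regardless of where it appears in the record, and traffic moves to the backup as soon as the primary's up metadata value becomes false.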

This completes the automatic failover configuration process. When a client queries the DNS record, any endpoints marked as down are removed from the answer pool to ensure the requester can connect to your application or service.