Designing your application network for high availability

When you design an application network with Red Hat® Service Interconnect for high availability, you must consider load balancing, multipath routing, link cost, and layer 7 protocol. In particular, you need to think about how application connections are routed between local and remote routers, and how the remote router distributes the load across service instances.

Load balancing behavior

When a router sends a request to a service instance, the load balancing method depends on the configuration of the host Kubernetes cluster. The load is balanced across pods for cluster endpoints or IP addresses for external endpoints.

The primary factor is the Kubernetes network proxy kube-proxy. You can configure kube-proxy in several modes, each of which offers different features and performance characteristics. IBM® Hybrid Cloud Mesh (Mesh) cannot control the kube-proxy mode, so consider carefully the effect of the kube-proxy mode on the load balancing behavior that your system requires.

The following are some common kube-proxy modes and their effects on load balancing:

  • iptables - The service instance is selected in a statistically random way by using iptables rules.
  • ipvs - The service instance is selected by using the round-robin algorithm by default, but other options are available. This mode is also called IP Virtual Server (IPVS) mode.

These effects on load balancing are also relevant to alternatives to kube-proxy in some Container Network Interfaces (CNIs).
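The practical difference between the two selection behaviors can be sketched in Python. This is only a model of the selection behavior, not of kube-proxy itself, and the endpoint names are hypothetical:

```python
import random
from itertools import cycle

# Hypothetical endpoints backing one Kubernetes service.
endpoints = ["pod-a", "pod-b", "pod-c"]

def select_iptables():
    """iptables mode: each new connection lands on a random endpoint."""
    return random.choice(endpoints)

_rotation = cycle(endpoints)

def select_ipvs():
    """ipvs mode (default round-robin scheduler): endpoints rotate in order."""
    return next(_rotation)

# Round-robin spreads connections evenly from the start; random selection
# is even only in a statistical sense, over many connections.
print([select_ipvs() for _ in range(3)])
```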

Routing across multipath application networks

Application networks can span multiple sites, clouds, and clusters. Often, multiple paths and routers are set up for network redundancy. Traffic might flow across some or all of these paths. It's important to understand the factors that affect what path a connection takes from the local router to the remote router.

In Red Hat Service Interconnect, the route is determined when the client application makes a TCP connection. The decision of the routing path is static: it is made one time and doesn't change until the client breaks the connection, a router link fails, or the service becomes unreachable. This behavior works well when routers or service instances become unavailable because when the connection breaks, the client can reconnect around network failures through redundant paths.

However, this behavior also prevents requests from failing back to the best route when routers or service instances are restored. To fail back, client applications must close and re-create their service connections. This behavior is application-dependent, so it is important to consider the retry logic of your application in these scenarios.
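For example, one simple client-side pattern is to retry on a fresh connection, which gives the router a chance to select a different path. This is a minimal sketch; the host, port, and payload are placeholders:

```python
import socket
import time

def request_with_retry(host, port, payload, attempts=3, backoff=0.5):
    """Send payload and return the reply, retrying on connection failure.

    Each attempt opens a NEW TCP connection, so a retry after a broken
    connection lets the router route around a failed path.
    """
    last_err = None
    for attempt in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=5) as conn:
                conn.sendall(payload)
                return conn.recv(4096)
        except OSError as err:
            last_err = err
            time.sleep(backoff * 2 ** attempt)  # exponential backoff
    raise last_err
```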

Link cost in routing

Many application networks use multiple router paths through the network to help ensure highly available connectivity between applications. Red Hat Service Interconnect uses only one route per client connection, and it typically selects the route with the lowest link cost.

Link cost is an arbitrary measure of the cost that is associated with sending data over a specific network link. It helps routers to determine the most efficient path for forwarding requests to services. The link cost can be based on various factors, such as the bandwidth, delay, or reliability of the link. By assigning a cost to each link, you help routers to calculate the total cost of a path according to your needs and select the path with the lowest cost. This behavior can help ensure optimal network performance and resource usage.

On any TCP connection between two routers, Red Hat Service Interconnect might queue connections that have not yet been delivered to the target service. Connections are queued for a service when the service is busy processing other connections and is unable to process any more. Queued connections remain queued until the service completes its current requests and becomes available again, at which point it processes the oldest connection in the queue first.

When multiple paths to a service exist, the router selects a path by using the sum of the connection queue depth and the link cost. In short, Red Hat Service Interconnect considers traffic congestion when it selects a path.

For example, consider two router paths, p1 and p2, from application A to service B. The link cost of p1 is 10, and the link cost of p2 is 25. Initially, p1 is selected for new connections because it is the lower-cost path. Path p2 is selected for new connections only when service B on p1 has 15 queued connections, because at that point the sum of link cost and queued connections on p1 reaches the sum on p2. If new connections continue to arrive and the sum of queued connections and link cost on p2 grows higher than the sum on p1, p1 is selected again for new connections.
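The selection rule in this example can be expressed as a small sketch. This is a simplification that assumes the router simply minimizes link cost plus queue depth:

```python
def select_path(paths):
    """Return the path with the lowest (link cost + queued connections).

    `paths` maps a path name to a (link_cost, queued_connections) pair.
    """
    return min(paths, key=lambda name: sum(paths[name]))

# The example from the text: p1 costs 10, p2 costs 25.
print(select_path({"p1": (10, 0), "p2": (25, 0)}))    # p1 is cheaper
print(select_path({"p1": (10, 16), "p2": (25, 0)}))   # p1 is congested, p2 wins
```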

As a result, it is important to consider the purpose of the alternative router paths. If the purpose is to provide extra capacity for handling connections, then the delta cost between alternative paths can be small. If the purpose is to provide a backup path, then make sure that the delta cost between alternative paths is large. A large delta cost helps to make sure that the alternative path is not chosen except in rare circumstances, such as if the primary path is unavailable.

If the network can't deliver a request on the route with the lowest link cost, Red Hat Service Interconnect breaks the connection and forces the application to create a new connection. That new connection can be routed along the higher-cost link. Even if the low-cost path becomes available again, the traffic remains on the high-cost path until the application closes the connection. As a result, applications that cache service connections might need to close persistent connections so that they can take advantage of changes in the available route paths.

Routing for HTTP requests

When an application sends multiple HTTP requests, you might expect each request to be routed independently. That is typically not the case, because routing decisions occur when the connection is established.

HTTP uses TCP as its layer 4 transport. HTTP/1.1 and HTTP/2 use persistent TCP connections to multiplex multiple HTTP requests across a single connection. In this case, the routing decision happens at the start of the TCP connection, and every HTTP request during that connection is routed along the same path to the destination gateway. The duration of connection timeouts depends on the exact systems in question but can be 5-15 seconds.

Some older systems use the older HTTP/1.0 protocol. This protocol creates a new TCP connection for every HTTP request, so each request is routed independently.

This behavior is similar to a Network Load Balancer (NLB), which balances traffic based on layer 4 connections. However, this behavior is not similar to an Application Load Balancer (ALB), which balances traffic based on layer 7 requests.
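The reuse of one connection, and therefore one routed path, by several requests can be observed with Python's standard library. In this local sketch, the server exists only to record which client socket each request arrived on:

```python
import http.client
import http.server
import threading

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # enable keep-alive
    client_ports = []               # client TCP port seen per request

    def do_GET(self):
        Handler.client_ports.append(self.client_address[1])
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):   # keep the output quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
for path in ("/a", "/b"):           # two requests, one persistent connection
    conn.request("GET", path)
    conn.getresponse().read()
conn.close()
server.shutdown()

# Both requests arrived over the same TCP connection (same client port).
print(len(set(Handler.client_ports)) == 1)
```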

Key points for routing path design

To summarize, it's important to understand the following when you design routing paths for a Red Hat Service Interconnect network:

  • Load balancing behavior by a particular router depends on your cluster configuration, but the selection of a service instance is typically either random or round-robin.
  • The router path across the application network is determined when the client makes a TCP connection.
  • The router path that is selected is the lowest cost path that is available, considering both the assigned link cost and traffic congestion.
  • Multiple HTTP requests within a persistent connection take the same router path.

Therefore, consider the following best practices when you design routing paths:

  • Configure more than one route path from any application to the services that it uses.
  • Colocate multiple service instances in the same namespace to achieve high availability.
  • Deploy service instances in more than one namespace to provide the highest possible availability.
  • Set link costs such that the cost of the preferred router path is much less than the cost of a backup router path.
  • Design applications to regularly close and re-create their connections so that changes in service or route availability take effect.
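One way to apply the last point is to cap the lifetime of cached connections, as in this sketch. The factory and clock parameters are illustrative; any connection object with a close() method works:

```python
import time

class RecyclingConnection:
    """Cache one connection, but rebuild it after max_age seconds.

    Closing and reopening a long-lived connection forces a fresh
    routing decision, so traffic can return to a preferred path
    after that path recovers.
    """

    def __init__(self, factory, max_age=300.0, clock=time.monotonic):
        self._factory = factory
        self._max_age = max_age
        self._clock = clock
        self._conn = None
        self._opened_at = 0.0

    def get(self):
        expired = self._clock() - self._opened_at >= self._max_age
        if self._conn is None or expired:
            if self._conn is not None:
                self._conn.close()  # the next connection is routed anew
            self._conn = self._factory()
            self._opened_at = self._clock()
        return self._conn
```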