Implement a scalable architecture that is resilient to node and availability zone failures.
This blog post presents an example architecture that utilizes a network load balancer (NLB) and is resilient to a zonal failure:
IBM Cloud Internet Services (ICS) provides security, reliability and performance to external web content. A global load balancer (GLB), as seen in the diagram above, can be configured to provide high availability by spreading load across zones.
IBM Cloud VPC load balancers
The right side of the diagram shows a VPC in an MZR with three zones. Health checks will allow the NLB to distribute connections to the healthy servers. In this example, the servers are in the same zone as the NLB, but it is possible to accept members across all zones using multi-zone support.
Why use a network load balancer instead of an application load balancer?
A network load balancer (NLB) works in Layer 4 and is best fit for workloads that require high throughput and low latency.
You may be asking why a separate network load balancer is needed if the application load balancer supports Layer 4 traffic. Often, a client will submit a request that is fairly small in size, with little performance impact on the load balancer; however, the information returned from the backend targets (virtual servers or container workloads) can be significant — perhaps several times larger than the client request.
With Direct Server Return (DSR), the information processed by the backend targets is sent directly back to the client, thus minimizing latency and optimizing throughput performance.
Additionally, network load balancers have the following unique characteristics when compared to an application load balancer (for more information, see the Load Balancer Comparison Chart):
- Source IP preservation: Network load balancers don’t NAT the client IP address. Instead, it is forwarded to the target server.
- Fixed IP address: Network load balancers have a fixed IP address.
IBM Cloud Internet Services (CIS) global load balancer
Global load balancer (GLB) health checks allow for the distribution of requests across healthy NLB/servers:
Each red ‘X’ in the diagram above shows an unhealthy scenario (i.e., an unhealthy server detected by an NLB health check and an NLB or zonal failure that is detected by the CIS GLB health check).
This next diagram shows more concretely how the CIS GLB performs load balancing via DNS name resolution:
- The client requests the DNS name
- The client computer has a DNS resolver that contacts a web of DNS resolvers to determine the corresponding IP addresses. The diagram shows the client’s DNS resolver contacting an on-premises DNS resolver that will reach Cloudflare as the authoritative DNS Server for the IBM Cloud Internet Services and, therefore, the GLB
- A list of the NLB load balancers is returned, and one of those is used by the client. The order and weight of the origin pool members can be adjusted by configuring a global load balancer.
- The client uses the IP address to establish a TCP connection directly to a server through the NLB.
Provisioning the VPC instance
The first step is to use the IBM Cloud Console to create a Cloud Internet Services (CIS) instance if one is not available. A number of pricing plans are available, including a free trial. The provisioning process of a new CIS will explain how to configure your existing DNS registrar (probably outside of IBM) to use the CIS-provided domain name servers. The post uses
ibmom.com for the DNS name.
Follow the instructions in the companion GitHub repository to provision the VPC, VSI, NLB and CIS Configuration on your desktop or in IBM Cloud Schematics. After provisioning is complete, the Terraform output will show
test curl commands that can be executed to verify load is being balanced across zones via the GLB and across servers via the NLB.
Visit the IBM Cloud Console Resource list. Find and click on the name of the Internet Services product to open the CIS instance and navigate to the Reliability section of the CIS instance. Check out the Load balancer, origin pools and health checks. Navigate to the VPC Infrastructure and find the VPC, subnets, NLBs, etc. Verify that the CIS GLB is connected to the IP addresses of the VPC NLBs.
Kubernetes and OpenShift
Follow the instructions in the companion GitHub repository to provision the IKS, NLB, CIS Configuration on your desktop or in IBM Cloud Schematics. While the Kubernetes Service cluster is being provisioned, read on to understand the Kubernetes resources configured.
Kubernetes deployments, by default, will spread pods evenly across worker nodes (and zones). The example configuration uses a nodeSelector to place the pods on zone-specific worker nodes (like those in us-south-1) using an IBM Cloud node attribute shown in the cutdown below:
The service ingress is configured to keep the load in the worker node that receives the network request using
externalTrafficPolicy: Local. The Kubernetes default policy will balance the load across all selected pods in all workers in all zones. The default may be preferred for your workload:
After the Terraform provision is complete, visit the IBM Cloud Console Resource list. Find and click on the name of the Internet Services product to open the CIS instance and navigate to the Reliability section of the CIS instance. Check out the Load balancer, origin pools and health checks. Note that the origins contain IP addresses of the Kubernetes Service VPC NLBs.
This can be verified using the cli. The Terraform output has a
test_kubectl output that can be used to initialize the Kubernetes kubectl command-line tool. After initialization, get the services to see output like this:
Summary and next steps
The IBM Cloud Internet Services GLB is probing for health checks through the NLB to the server computers. This health check is a path very similar to a client accessing the servers. Under the extremely unlikely event of a zone failure, this architecture will continue to balance load across the remaining zones/workers. Each NLB has a static public IP address that remains fixed for the lifetime of the NLB, so the GLB will not need to be updated.
The TCP traffic in the example is not TLS encrypted. The TLS will need to be managed by the worker applications. IBM Cloud Secrets Manager can be used to automate the distribution of TLS certificates.