Implement a scalable architecture that is resilient to node and availability zone failures.

IBM Cloud has a global network of multizone regions (MZRs) distributed around the world. Each zone has isolated power, cooling and network infrastructures.

This blog post presents an example architecture that utilizes a network load balancer (NLB) and is resilient to a zonal failure:

IBM Cloud Internet Services (CIS) provides security, reliability and performance to external web content. A global load balancer (GLB), as seen in the diagram above, can be configured to provide high availability by spreading load across zones.

IBM Cloud VPC load balancers

IBM Cloud Virtual Private Cloud (VPC) supports two types of load balancers: an application load balancer (ALB) and a network load balancer (NLB).

The right side of the diagram shows a VPC in an MZR with three zones. Health checks will allow the NLB to distribute connections to the healthy servers. In this example, the servers are in the same zone as the NLB, but it is possible to accept members across all zones using multi-zone support.
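
To make the health-check idea concrete, here is a minimal Python sketch. It is not how the NLB is implemented; it only assumes that a health check is a TCP connection attempt with a timeout. The function names `is_healthy` and `healthy_members` are invented for this illustration.

```python
import socket


def is_healthy(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable and timed-out connections all count as unhealthy.
        return False


def healthy_members(members, check=is_healthy):
    """Keep only the (host, port) pool members that pass the health check."""
    return [member for member in members if check(*member)]
```

A load balancer would then distribute new connections only across the list returned by `healthy_members`.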

Why use a network load balancer instead of an application load balancer?

A network load balancer (NLB) operates at Layer 4 and is best suited to workloads that require high throughput and low latency.

You may be asking why a separate network load balancer is needed if the application load balancer supports Layer 4 traffic. Often, a client submits a fairly small request that has little performance impact on the load balancer; however, the response returned from the backend targets (virtual servers or container workloads) can be significant, perhaps several times larger than the client request.

With Direct Server Return (DSR), the information processed by the backend targets is sent directly back to the client, thus minimizing latency and optimizing throughput performance.
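
A rough back-of-the-envelope model shows why this matters. The sketch below assumes that a load balancer without DSR relays both the request and the response, and it ignores protocol overhead such as TCP acknowledgements. The byte sizes are illustrative, not measurements.

```python
def bytes_through_lb(request_bytes: int, response_bytes: int, dsr: bool) -> int:
    """Payload bytes the load balancer forwards for one request/response exchange."""
    # With DSR, only the client request passes through the load balancer;
    # the backend target sends the response straight to the client.
    return request_bytes if dsr else request_bytes + response_bytes


request, response = 1_000, 10_000  # illustrative sizes only
proxied = bytes_through_lb(request, response, dsr=False)
direct = bytes_through_lb(request, response, dsr=True)
print(proxied, direct)  # 11000 1000
```

In this simplified model, a response ten times larger than the request means the load balancer handles roughly one-eleventh of the payload when DSR is used.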

Additionally, network load balancers have the following unique characteristics when compared to an application load balancer (for more information, see the Load Balancer Comparison Chart):

  • Source IP preservation: Network load balancers don’t NAT the client IP address. Instead, it is forwarded to the target server.
  • Fixed IP address: Network load balancers have a fixed IP address.

IBM Cloud Internet Services (CIS) global load balancer

Global load balancer (GLB) health checks allow requests to be distributed across healthy NLBs and servers:

Each red ‘X’ in the diagram above shows an unhealthy scenario (i.e., an unhealthy server detected by an NLB health check and an NLB or zonal failure that is detected by the CIS GLB health check).
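
The selection logic can be sketched in a few lines of Python. This is a simplified model, not the CIS implementation: it assumes each origin carries a weight and a health flag, and that unhealthy origins are simply skipped. The origin names and the 203.0.113.x addresses (a documentation range) are placeholders.

```python
import random


def pick_origin(origins, rng=random):
    """Pick one healthy origin with probability proportional to its weight."""
    healthy = [origin for origin in origins if origin["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy origins available")
    weights = [origin["weight"] for origin in healthy]
    return rng.choices(healthy, weights=weights, k=1)[0]


# Hypothetical origin pool: one NLB per zone, with zone 2 marked unhealthy.
origins = [
    {"name": "nlb-zone-1", "address": "203.0.113.10", "weight": 1.0, "healthy": True},
    {"name": "nlb-zone-2", "address": "203.0.113.20", "weight": 1.0, "healthy": False},
    {"name": "nlb-zone-3", "address": "203.0.113.30", "weight": 1.0, "healthy": True},
]
```

Calling `pick_origin(origins)` repeatedly never returns `nlb-zone-2` while it is flagged unhealthy, which mirrors the red 'X' on an NLB or zone in the diagram.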

This next diagram shows more concretely how the CIS GLB performs load balancing via DNS name resolution:

  1. The client requests the DNS name.
  2. The client computer has a DNS resolver that contacts a chain of DNS resolvers to determine the corresponding IP addresses. The diagram shows the client’s DNS resolver contacting an on-premises DNS resolver that reaches Cloudflare, the authoritative DNS server for IBM Cloud Internet Services and, therefore, for the GLB.
  3. A list of NLB IP addresses is returned, and the client uses one of them. The order and weight of the origin pool members can be adjusted by configuring the global load balancer.
  4. The client uses the IP address to establish a TCP connection directly to a server through the NLB.
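
From the client's point of view, the steps above amount to "resolve the name, then connect to one of the returned addresses." The following Python sketch shows that flow using the standard `socket` module. The helper names are invented for this example, and the hostname in the usage comment is a placeholder for your own GLB DNS name.

```python
import socket


def resolve_ipv4(hostname: str, port: int = 80):
    """Return the distinct IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def connect_first_reachable(addresses, port: int = 80, timeout: float = 3.0):
    """Try each address in order; return (ip, socket) for the first that accepts."""
    for ip in addresses:
        try:
            return ip, socket.create_connection((ip, port), timeout=timeout)
        except OSError:
            continue
    raise ConnectionError("none of the resolved addresses accepted a connection")


# Usage (placeholder hostname):
#   ips = resolve_ipv4("glb.example.com")
#   ip, conn = connect_first_reachable(ips)
#   conn.close()
```

Note that resolvers along the path may cache answers, so consecutive lookups from one client can return the same address until the record's TTL expires.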

Provisioning the VPC instance

The first step is to use the IBM Cloud Console to create a Cloud Internet Services (CIS) instance if one is not available. A number of pricing plans are available, including a free trial. The provisioning process of a new CIS will explain how to configure your existing DNS registrar (probably outside of IBM) to use the CIS-provided domain name servers. The post uses for the DNS name.

Follow the instructions in the companion GitHub repository to provision the VPC, VSI, NLB and CIS Configuration on your desktop or in IBM Cloud Schematics. After provisioning is complete, the Terraform output will show test curl commands that can be executed to verify load is being balanced across zones via the GLB and across servers via the NLB.
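
If you prefer a script over repeated curl invocations, a small Python sketch like the one below can tally which backend answered each request. It assumes that each server returns a response body identifying itself, which may not match the companion repository exactly; the helper names are invented for illustration.

```python
from collections import Counter
from urllib.request import urlopen


def fetch_body(url: str, timeout: float = 5.0) -> str:
    """GET the URL and return the stripped response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace").strip()


def tally_responses(url: str, attempts: int = 20, fetch=fetch_body) -> Counter:
    """Count how often each distinct response body is returned."""
    return Counter(fetch(url) for _ in range(attempts))
```

Pointing `tally_responses` at an NLB address should show the spread across servers in one zone. Pointing it at the GLB DNS name may show fewer distinct responders than expected, because DNS answers can be cached between requests.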

Visit the IBM Cloud Console Resource list. Find and click on the name of the Internet Services product to open the CIS instance and navigate to the Reliability section of the CIS instance. Check out the Load balancer, origin pools and health checks. Navigate to the VPC Infrastructure and find the VPC, subnets, NLBs, etc. Verify that the CIS GLB is connected to the IP addresses of the VPC NLBs.

Kubernetes and OpenShift

The same architecture can be used for Red Hat OpenShift on IBM Cloud or IBM Cloud Kubernetes Service. The IBM Cloud Kubernetes Service worker nodes replace the servers in the original diagram:

Follow the instructions in the companion GitHub repository to provision the IKS, NLB and CIS Configuration on your desktop or in IBM Cloud Schematics. While the Kubernetes Service cluster is being provisioned, read on to understand the Kubernetes resources configured.

Kubernetes deployments, by default, will spread pods evenly across worker nodes (and zones). The example configuration uses a nodeSelector to place the pods on zone-specific worker nodes (like those in us-south-1) using an IBM Cloud node attribute, as shown in the cut-down example below:

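kind: Deployment
metadata:
  labels:
    app: cogs
  name: cogs-0
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cogs
  template:
    metadata:
      labels:
        app: cogs
    spec:
      nodeSelector:
        ibm-cloud.kubernetes.io/zone: us-south-1
      ... pod code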

A Kubernetes service is configured to expose applications using load balancers for VPC. Each service is configured with a VPC NLB that can be accessed publicly. A service is created for each zone.

The service ingress is configured to keep the load in the worker node that receives the network request using externalTrafficPolicy: Local. The Kubernetes default policy will balance the load across all selected pods in all workers in all zones. The default may be preferred for your workload:

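apiVersion: v1
kind: Service
metadata:
  name: myloadbalancer1
  annotations:
    service.kubernetes.io/ibm-load-balancer-cloud-provider-enable-features: "nlb"
    service.kubernetes.io/ibm-load-balancer-cloud-provider-ip-type: "public"
    service.kubernetes.io/ibm-load-balancer-cloud-provider-zone: "us-south-1"
spec:
  type: LoadBalancer
  selector:
    app: cogs
  ports:
    - name: http
      protocol: TCP
      port: 80
  externalTrafficPolicy: Local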
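
The difference between the two traffic policies can be modeled with a short Python sketch. This is a conceptual illustration only, not Kubernetes code: it treats `Local` as "only pods on the node that received the connection" and `Cluster` (the Kubernetes default) as "any matching pod on any node." The pod and node names are made up.

```python
def eligible_pods(policy: str, receiving_node: str, pods):
    """Return the pods that may serve a connection arriving at receiving_node.

    pods is a list of (pod_name, node_name) tuples.
    """
    if policy == "Local":
        # Traffic stays on the node that received it.
        return [name for name, node in pods if node == receiving_node]
    if policy == "Cluster":
        # Default policy: any matching pod on any node may be chosen.
        return [name for name, _node in pods]
    raise ValueError(f"unknown externalTrafficPolicy: {policy}")


# Hypothetical placement of three pods on two worker nodes.
pods = [("cogs-0-a", "worker-1"), ("cogs-0-b", "worker-1"), ("cogs-1-a", "worker-2")]
```

With this placement, a connection arriving at `worker-1` under `Local` can only reach the two pods on that node, while `Cluster` makes all three pods eligible at the cost of a possible extra hop between nodes or zones.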

After the Terraform provisioning is complete, visit the IBM Cloud Console Resource list. Find and click on the name of the Internet Services product to open the CIS instance and navigate to the Reliability section of the CIS instance. Check out the Load balancer, origin pools and health checks. Note that the origins contain IP addresses of the Kubernetes Service VPC NLBs.

This can be verified using the CLI. The Terraform output has a test_kubectl output that can be used to initialize the Kubernetes kubectl command-line tool. After initialization, get the services to see output like this:

$ kubectl get services
NAME                       TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                      AGE
kubernetes                 ClusterIP       <none>           443/TCP                      7d
load-balancer-us-south-1   LoadBalancer   80:30656/TCP,443:31659/TCP   121m
load-balancer-us-south-2   LoadBalancer    80:32152/TCP,443:32683/TCP   121m
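
To compare these addresses with the CIS origins without reading the table by eye, the same data can be pulled as JSON. The Python sketch below assumes `kubectl` is on your PATH and already initialized for the cluster; the helper names are invented for this example.

```python
import json
import subprocess


def load_balancer_addresses(services_json: str):
    """Map each LoadBalancer service name to its external IPs or hostnames."""
    result = {}
    for item in json.loads(services_json).get("items", []):
        if item.get("spec", {}).get("type") != "LoadBalancer":
            continue
        ingress = item.get("status", {}).get("loadBalancer", {}).get("ingress", [])
        result[item["metadata"]["name"]] = [
            entry.get("ip") or entry.get("hostname") for entry in ingress
        ]
    return result


def kubectl_services_json() -> str:
    """Run `kubectl get services -o json` and return its standard output."""
    completed = subprocess.run(
        ["kubectl", "get", "services", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return completed.stdout


# Usage: print(load_balancer_addresses(kubectl_services_json()))
```

Each address reported for a LoadBalancer service should appear as an origin in the CIS origin pools.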

Summary and next steps

The IBM Cloud Internet Services GLB health checks probe through the NLB to the servers, so each check follows a path very similar to that of a client accessing the servers. In the unlikely event of a zone failure, this architecture will continue to balance load across the remaining zones/workers. Each NLB has a static public IP address that remains fixed for the lifetime of the NLB, so the GLB will not need to be updated.

The TCP traffic in the example is not TLS encrypted. TLS will need to be managed by the worker applications. IBM Cloud Secrets Manager can be used to automate the distribution of TLS certificates.

If you have feedback, suggestions or questions about this post, please email me or reach out to me on Twitter (@powellquiring).

