December 13, 2022 By Powell Quiring
Ahmed Osman
Arda Gumusalan
5 min read

Implement a scalable architecture that is resilient to node and availability zone failures.

IBM Cloud has a global network of multizone regions (MZRs) distributed around the world. Each zone has isolated power, cooling and network infrastructures.

This blog post presents an example architecture that utilizes a network load balancer (NLB) and is resilient to a zonal failure:

IBM Cloud Internet Services (CIS) provides security, reliability and performance to external web content. A global load balancer (GLB), as seen in the diagram above, can be configured to provide high availability by spreading load across zones.

IBM Cloud VPC load balancers

IBM Cloud Virtual Private Cloud (VPC) supports two types of load balancers: an application load balancer (ALB) and a network load balancer (NLB).

The right side of the diagram shows a VPC in an MZR with three zones. Health checks will allow the NLB to distribute connections to the healthy servers. In this example, the servers are in the same zone as the NLB, but it is possible to accept members across all zones using multi-zone support.
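The health-check behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not the NLB implementation: the function names, the plain TCP connect check and the round-robin rotation are all assumptions made for the example.

```python
import socket
from itertools import cycle


def tcp_health_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def healthy_members(members, check=tcp_health_check):
    """Keep only the pool members (host, port) that pass the health check."""
    return [member for member in members if check(*member)]


def distribute(connections, members):
    """Assign each connection to a healthy member in round-robin order."""
    if not members:
        raise RuntimeError("no healthy members in the pool")
    rotation = cycle(members)
    return {conn: next(rotation) for conn in connections}
```

A server that fails its check simply drops out of the list passed to `distribute`, so new connections only reach the remaining members.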

Why use a network load balancer instead of an application load balancer?

A network load balancer (NLB) operates at Layer 4 and is best suited to workloads that require high throughput and low latency.

You may be asking why a separate network load balancer is needed if the application load balancer supports Layer 4 traffic. Often, a client submits a fairly small request that has little performance impact on the load balancer; however, the response returned from the backend targets (virtual servers or container workloads) can be significant, perhaps several times larger than the client request. If that response has to pass back through the load balancer, the load balancer can become a bottleneck.

With Direct Server Return (DSR), the information processed by the backend targets is sent directly back to the client, thus minimizing latency and optimizing throughput performance.
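To make the effect concrete, here is a rough back-of-the-envelope sketch in Python. The request and response sizes are hypothetical, and the model ignores protocol overhead; it only counts the payload bytes that traverse the load balancer.

```python
def load_balancer_bytes(request_bytes: int, response_bytes: int, dsr: bool) -> int:
    """Payload bytes that pass through the load balancer for one exchange.

    With DSR the response bypasses the load balancer, so only the
    request is counted.
    """
    return request_bytes if dsr else request_bytes + response_bytes


# Hypothetical sizes: a 500-byte request and a 50,000-byte response.
request_size, response_size = 500, 50_000
without_dsr = load_balancer_bytes(request_size, response_size, dsr=False)
with_dsr = load_balancer_bytes(request_size, response_size, dsr=True)
print(without_dsr, with_dsr)  # 50500 500
```

In this made-up case the load balancer handles roughly 1% of the bytes it would otherwise carry.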

Additionally, network load balancers have the following unique characteristics when compared to an application load balancer (for more information, see the Load Balancer Comparison Chart):

  • Source IP preservation: Network load balancers don’t NAT the client IP address. Instead, the client IP address is forwarded to the target server.
  • Fixed IP address: Network load balancers have a fixed IP address.

IBM Cloud Internet Services (CIS) global load balancer

Global load balancer (GLB) health checks allow for the distribution of requests across healthy NLB/servers:

Each red ‘X’ in the diagram above shows an unhealthy scenario (i.e., an unhealthy server detected by an NLB health check and an NLB or zonal failure that is detected by the CIS GLB health check).

This next diagram shows more concretely how the CIS GLB performs load balancing via DNS name resolution:

  1. The client requests the DNS name cogs.ibmom.com.
  2. The client computer has a DNS resolver that contacts a chain of DNS resolvers to determine the corresponding IP addresses. The diagram shows the client’s DNS resolver contacting an on-premises DNS resolver that will reach Cloudflare as the authoritative DNS server for the IBM Cloud Internet Services and, therefore, the GLB cogs.ibmom.com.
  3. A list of the NLB load balancers is returned, and one of those is used by the client. The order and weight of the origin pool members can be adjusted by configuring a global load balancer.
  4. The client uses the IP address to establish a TCP connection directly to a server through the NLB.
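From the client's point of view, the steps above amount to a DNS lookup followed by a TCP connect. The following Python sketch uses only the standard library; the function names are made up for this example, and trying the addresses in the order returned is an assumption about client behavior rather than something the GLB dictates.

```python
import socket


def resolve_ipv4(name: str, port: int = 80) -> list[str]:
    """Return the distinct IPv4 addresses that the resolver returns for name."""
    infos = socket.getaddrinfo(
        name, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def connect_first_reachable(addresses, port: int = 80, timeout: float = 3.0):
    """Try each address in order; return (ip, socket) for the first that connects."""
    last_error = None
    for ip in addresses:
        try:
            return ip, socket.create_connection((ip, port), timeout=timeout)
        except OSError as err:
            last_error = err
    raise ConnectionError(f"no address reachable: {last_error}")
```

Once the example is provisioned, `resolve_ipv4("cogs.ibmom.com")` would be expected to return the NLB addresses that the GLB currently considers healthy.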

Provisioning the VPC instance

The first step is to use the IBM Cloud Console to create a Cloud Internet Services (CIS) instance if one is not available. A number of pricing plans are available, including a free trial. The provisioning process for a new CIS instance will explain how to configure your existing DNS registrar (probably outside of IBM) to use the CIS-provided domain name servers. This post uses ibmom.com as the DNS name.

Follow the instructions in the companion GitHub repository to provision the VPC, VSI, NLB and CIS configuration on your desktop or in IBM Cloud Schematics. After provisioning is complete, the Terraform output will show test curl commands that can be executed to verify that load is being balanced across zones via the GLB and across servers via the NLB.
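The same kind of check can be scripted. The Python sketch below repeats a request and tallies the response bodies; it assumes each backend returns some text that identifies it, which may not match what the example servers actually send, and the function names are hypothetical.

```python
from collections import Counter
from urllib.request import urlopen


def fetch_body(url: str, timeout: float = 5.0) -> str:
    """Fetch a URL and return the response body as stripped text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace").strip()


def tally_responses(url: str, attempts: int, fetch=fetch_body) -> Counter:
    """Request the URL several times and count how often each body appears."""
    return Counter(fetch(url) for _ in range(attempts))
```

For example, `tally_responses("http://cogs.ibmom.com", 20)` would show how the 20 requests were spread, keeping in mind that client-side DNS caching can pin consecutive requests to the same NLB.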

Visit the IBM Cloud Console Resource list. Find and click on the name of the Internet Services product to open the CIS instance and navigate to the Reliability section of the CIS instance. Check out the Load balancer, origin pools and health checks. Navigate to the VPC Infrastructure and find the VPC, subnets, NLBs, etc. Verify that the CIS GLB is connected to the IP addresses of the VPC NLBs.

Kubernetes and OpenShift

The same architecture can be used for Red Hat OpenShift on IBM Cloud or IBM Cloud Kubernetes Service. The IBM Cloud Kubernetes Service worker nodes replace the servers in the original diagram:

Follow the instructions in the companion GitHub repository to provision the IBM Cloud Kubernetes Service (IKS) cluster, NLB and CIS configuration on your desktop or in IBM Cloud Schematics. While the Kubernetes Service cluster is being provisioned, read on to understand the Kubernetes resources configured.

Kubernetes deployments, by default, will spread pods evenly across worker nodes (and zones). The example configuration uses a nodeSelector to place the pods on zone-specific worker nodes (like those in us-south-1) using an IBM Cloud node label, as shown in the abbreviated manifest below:

kind: Deployment
metadata:
  labels:
    app: cogs
  name: cogs-0
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cogs
  template:
    metadata:
      labels:
        app: cogs
    spec:
      nodeSelector:
        ibm-cloud.kubernetes.io/zone: us-south-1
      containers:
        ... pod code
        …
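The nodeSelector rule can be read as a simple label match: a node is eligible only if it carries every key/value pair in the selector. The Python sketch below models just that rule. The node names and their labels are hypothetical, and the real scheduler also weighs resources, taints and other constraints.

```python
def node_matches(node_selector: dict, node_labels: dict) -> bool:
    """A node is eligible only if it carries every label in the selector."""
    return all(
        node_labels.get(key) == value for key, value in node_selector.items()
    )


selector = {"ibm-cloud.kubernetes.io/zone": "us-south-1"}

# Hypothetical worker nodes and labels, for illustration only.
nodes = {
    "worker-a": {"ibm-cloud.kubernetes.io/zone": "us-south-1"},
    "worker-b": {"ibm-cloud.kubernetes.io/zone": "us-south-2"},
}

eligible = [name for name, labels in nodes.items() if node_matches(selector, labels)]
print(eligible)  # ['worker-a']
```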

A Kubernetes service is configured to expose applications using load balancers for VPC. Each service is configured with a VPC NLB that can be accessed publicly. A service is created for each zone.

The service ingress is configured with externalTrafficPolicy: Local to keep the load on the worker node that receives the network request. The Kubernetes default policy will balance the load across all selected pods in all workers in all zones. The default may be preferred for your workload:

apiVersion: v1
kind: Service
metadata:
    name: myloadbalancer1
    annotations:
        service.kubernetes.io/ibm-load-balancer-cloud-provider-enable-features: "nlb"
        service.kubernetes.io/ibm-load-balancer-cloud-provider-ip-type: "public"
        service.kubernetes.io/ibm-load-balancer-cloud-provider-zone: "us-south-1"
spec:
    type: LoadBalancer
    selector:
        app: cogs
    ports:
        - name: http
          protocol: TCP
          port: 80
    externalTrafficPolicy: Local
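The difference between the two values of externalTrafficPolicy can be modeled as a filter on which pods a request may reach. This Python sketch is a simplification for illustration; the pod and node names are hypothetical, and actual routing is performed by the cluster networking layer.

```python
def eligible_pods(pods, receiving_node: str, policy: str):
    """Return the pods that a request arriving at receiving_node may reach.

    pods is a list of (pod_name, node_name) tuples.
    """
    if policy == "Local":
        # Only pods on the node that received the request are candidates.
        return [name for name, node in pods if node == receiving_node]
    if policy == "Cluster":
        # The Kubernetes default: any selected pod on any node is a candidate.
        return [name for name, _node in pods]
    raise ValueError(f"unknown externalTrafficPolicy: {policy}")


# Hypothetical pod placement, for illustration only.
pods = [("cogs-0-a", "worker-a"), ("cogs-0-b", "worker-a"), ("cogs-1-a", "worker-b")]
print(eligible_pods(pods, "worker-a", "Local"))    # ['cogs-0-a', 'cogs-0-b']
print(eligible_pods(pods, "worker-a", "Cluster"))  # ['cogs-0-a', 'cogs-0-b', 'cogs-1-a']
```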

After the Terraform provision is complete, visit the IBM Cloud Console Resource list. Find and click on the name of the Internet Services product to open the CIS instance and navigate to the Reliability section of the CIS instance. Check out the Load balancer, origin pools and health checks. Note that the origins contain IP addresses of the Kubernetes Service VPC NLBs.

This can be verified using the CLI. The Terraform output has a test_kubectl output that can be used to initialize the Kubernetes kubectl command-line tool. After initialization, get the services to see output like this:

$ kubectl get services
NAME                       TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                      AGE
kubernetes                 ClusterIP      172.21.0.1       <none>           443/TCP                      7d
load-balancer-us-south-1   LoadBalancer   172.21.60.163    150.240.66.122   80:30656/TCP,443:31659/TCP   121m
load-balancer-us-south-2   LoadBalancer   172.21.185.234   52.116.196.75    80:32152/TCP,443:32683/TCP   121m
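To compare these addresses with the GLB origins without reading the table by eye, the output can be parsed. The Python sketch below is a minimal, whitespace-based parser written for this example (the function name is made up); for anything beyond a quick check, `kubectl get services -o json` gives a more robust format to work with.

```python
def external_ips(kubectl_output: str) -> dict[str, str]:
    """Map each LoadBalancer service name to its external IP address."""
    lines = kubectl_output.strip().splitlines()
    result = {}
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        name, service_type, _cluster_ip, external_ip = fields[:4]
        if service_type == "LoadBalancer" and not external_ip.startswith("<"):
            result[name] = external_ip
    return result


sample = """\
NAME                       TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                      AGE
kubernetes                 ClusterIP      172.21.0.1       <none>           443/TCP                      7d
load-balancer-us-south-1   LoadBalancer   172.21.60.163    150.240.66.122   80:30656/TCP,443:31659/TCP   121m
load-balancer-us-south-2   LoadBalancer   172.21.185.234   52.116.196.75    80:32152/TCP,443:32683/TCP   121m
"""
print(external_ips(sample))
```

The resulting addresses should match the origins listed in the CIS origin pools.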

Summary and next steps

The IBM Cloud Internet Services GLB performs health checks through the NLB to the servers. The health check follows a path very similar to that of a client accessing the servers. In the unlikely event of a zone failure, this architecture will continue to balance load across the remaining zones/workers. Each NLB has a static public IP address that remains fixed for the lifetime of the NLB, so the GLB will not need to be updated.
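The failover behavior can be illustrated with a small two-tier model in Python. Everything in it is hypothetical: the topology, the addresses (taken from the 192.0.2.0/24 documentation range) and the rule that an origin counts as healthy when its zone is up and at least one server behind its NLB responds.

```python
def healthy_origins(zones: dict, failed_zones=frozenset()) -> list[str]:
    """Return the NLB addresses that the GLB would keep in rotation.

    An origin is treated as healthy when its zone is up and at least one
    server behind the NLB passes its health check.
    """
    return [
        zone["nlb_ip"]
        for name, zone in zones.items()
        if name not in failed_zones and any(zone["servers_up"])
    ]


# Hypothetical topology, for illustration only.
zones = {
    "us-south-1": {"nlb_ip": "192.0.2.1", "servers_up": [True, True]},
    "us-south-2": {"nlb_ip": "192.0.2.2", "servers_up": [True, False]},
    "us-south-3": {"nlb_ip": "192.0.2.3", "servers_up": [False, False]},
}

print(healthy_origins(zones))                               # ['192.0.2.1', '192.0.2.2']
print(healthy_origins(zones, failed_zones={"us-south-1"}))  # ['192.0.2.2']
```

Because the NLB addresses are fixed, a zone that recovers rejoins the rotation as soon as its health checks pass, with no change to the GLB configuration.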

The TCP traffic in the example is not TLS encrypted. TLS will need to be managed by the worker applications. IBM Cloud Secrets Manager can be used to automate the distribution of TLS certificates.

If you have feedback, suggestions or questions about this post, please email me or reach out to me on Twitter (@powellquiring).
