IBM Cloud Kubernetes Service: Deployment Patterns for Maximizing Throughput and Availability

By: Arpad Kun

Exploring the new IBM Cloud Kubernetes Service LoadBalancer 2.0

With the upcoming release of Kubernetes version 1.12 on IBM Cloud Kubernetes Service (IKS), we are releasing the new IKS LoadBalancer 2.0 as a public beta so that customers can test it. This article discusses the capabilities of this LoadBalancer service and a few deployment patterns built around it, with examples along the way.

Efficiency and availability

At IBM, we want you to get the most out of your investment in IBM Cloud Kubernetes Service clusters, in terms of both efficiency and availability. There are multiple ways to deploy IBM Cloud Kubernetes Service clusters and the applications running in them, depending on your applications’ requirements. In this article, we go through a few deployment patterns, with examples, so that you can pick the one closest to your requirements and refine it further.

Load balancing types

Before we jump into the deployment patterns, let’s quickly go through a few different load balancing types since I will be referring to these concepts later. Feel free to skip these sections if you are familiar with DNS RR, ALB, NLB, and DSR concepts.

DNS-based load balancing (round robin)

DNS round robin is one of the simplest ways to distribute traffic between servers that have different individual IP addresses. When multiple A (or AAAA) records are registered for the same host name, the DNS resolver returns all of them, usually as a list that is permuted on each query. The client usually connects to the first address in the list.

There are a lot of limitations and drawbacks with this technique that I will not go into in this article. The fact that it relies completely on the client-side implementation is both an advantage and a disadvantage. (Note: Some DNS providers have implemented features that mitigate a few of these drawbacks.)
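
The basic round-robin mechanism can be sketched in a few lines of Python. This is a toy model under our own assumptions: the hypothetical `RoundRobinZone` class rotates its answer on every query (rotation is one common way the list gets permuted), the hypothetical `naive_client_pick` stands in for a client that always tries the first address, and the host name and documentation-range IPs are placeholders.

```python
from collections import deque
from typing import List


class RoundRobinZone:
    """Toy answer set: one host name mapped to several A records.

    Every query returns the full list, rotated by one position, so
    successive clients see a different address first.
    """

    def __init__(self, hostname: str, a_records: List[str]) -> None:
        self.hostname = hostname
        self._records = deque(a_records)

    def resolve(self, name: str) -> List[str]:
        if name != self.hostname:
            raise KeyError(f"NXDOMAIN: {name}")
        answer = list(self._records)
        self._records.rotate(-1)  # the next query starts with the next address
        return answer


def naive_client_pick(answer: List[str]) -> str:
    """Most clients simply try the first address in the answer."""
    return answer[0]


if __name__ == "__main__":
    zone = RoundRobinZone("app.example.com", ["192.0.2.10", "192.0.2.11", "192.0.2.12"])
    for _ in range(4):
        print(naive_client_pick(zone.resolve("app.example.com")))
```

Successive clients land on different first addresses, but only as long as they honor the order and do not cache the answer for long; that dependence on client behavior is exactly the weakness mentioned above.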

Application load balancing (ALB)

An ALB typically works like a proxy: a client that connects to a backend service has to go through a reverse proxy, which understands the protocol and the application and can make smart decisions based on the content. Because microservices typically communicate over HTTP, an HTTP reverse proxy comes in handy for making decisions based on the HTTP location (path), headers, cookies, parameters, and so on.

The client opens a TCP connection to the ALB, the ALB opens another TCP connection to the backends, and based on the load balancing policy, scheduler, and settings, it routes the requests between the available backends. Notice that there is a single IP address in the DNS response—the ALB handles all incoming requests from the clients.
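
To illustrate the kind of per-request decision an ALB makes, here is a minimal Python sketch of content-based routing: longest matching path prefix first, an optional header match (for example, a hypothetical canary header), and round-robin among the backends of the chosen pool. `Route`, `ToyALB`, and the backend names are illustrative only; a real ALB also handles connection pooling, health checks, cookies, and much more.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class Route:
    """One pool of backends, selected by path prefix and an optional header."""

    def __init__(self, path_prefix: str, backends: List[str],
                 header: Optional[Tuple[str, str]] = None) -> None:
        self.path_prefix = path_prefix
        self.header = header  # (name, value) that must match, if set
        self._cycle = itertools.cycle(backends)  # round-robin scheduler

    def matches(self, path: str, headers: Dict[str, str]) -> bool:
        if not path.startswith(self.path_prefix):
            return False
        if self.header is None:
            return True
        name, value = self.header
        return headers.get(name.lower()) == value

    def next_backend(self) -> str:
        return next(self._cycle)


class ToyALB:
    """Layer 7 routing: inspect the request, pick a pool, then a backend."""

    def __init__(self, routes: List[Route]) -> None:
        # Longest prefix first; among equal prefixes, header-specific routes first.
        self.routes = sorted(
            routes,
            key=lambda r: (len(r.path_prefix), r.header is not None),
            reverse=True,
        )

    def pick(self, path: str, headers: Dict[str, str]) -> str:
        # HTTP header names are case-insensitive, so normalize them once.
        normalized = {k.lower(): v for k, v in headers.items()}
        for route in self.routes:
            if route.matches(path, normalized):
                return route.next_backend()
        raise LookupError(f"no route for {path}")
```

Because the ALB terminates the client connection and opens its own connection to the chosen backend, it can make this decision per request rather than per connection.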

Network load balancing (NLB)

The NLB concept is typically used for horizontal scale-out at Layer 4 of the OSI model. This means that TCP and UDP applications can run on multiple servers (physical or virtual) and still share the same IP address. An NLB typically does not terminate the TCP connection; it just sits in the middle and acts as a “forwarder” of packets.
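
Because an NLB forwards packets rather than terminating connections, it must send every packet of a given TCP or UDP flow to the same backend. One common way to do that, shown here purely as an illustration and not as the IKS NLB's actual scheduler, is to hash the flow's 5-tuple. `FiveTuple`, `pick_backend`, and all addresses are hypothetical.

```python
import hashlib
from typing import List, NamedTuple


class FiveTuple(NamedTuple):
    protocol: str  # "tcp" or "udp"
    src_ip: str
    src_port: int
    dst_ip: str    # the shared virtual IP (VIP) of the NLB
    dst_port: int


def pick_backend(flow: FiveTuple, backends: List[str]) -> str:
    """Map a flow to a backend deterministically.

    Every packet of one connection carries the same 5-tuple, so it lands on
    the same backend without the forwarder storing per-connection state.
    """
    if not backends:
        raise ValueError("no healthy backends")
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

Plain modulo hashing remaps many flows whenever the backend list changes, which is why production load balancers typically add connection tracking or consistent hashing on top.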

With an NLB, we can also enable the Direct Server Return (DSR) concept, which means that the NLB is not in the return path of the packet. The backend server sees the original source IP address and port of the client and responds directly back without going through the NLB. It is the backend server that actually terminates the TCP connection and returns with the source IP of the NLB (MUX) directly to the client. For some, this might sound like black magic at first, but fundamentally, it is pretty simple. Here is a diagram explaining the concept (follow the packets from the client):

Direct Server Return (DSR)

The NLB (MUX) encapsulates the original packet in an IPIP tunnel packet and sends it over to a backend server. The backend server decapsulates it and responds directly to the client IP, using the NLB’s public IP as the source address.

This lets us preserve the original client IP for protocols that cannot pass it along in headers (as HTTP can) and still scale horizontally behind the same IP address. Typical, but not exclusive, examples are binary protocols like MQTT, RTMP, MySQL, and PostgreSQL.
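
The encapsulation and direct return path can be made concrete with a byte-level Python sketch. It builds a minimal 20-byte IPv4 header (RFC 791 layout, no options), wraps the client's original packet in an outer header with protocol number 4 (IP-in-IP) on the MUX side, strips it again on the backend, and shows which addresses the backend uses for its direct reply. All addresses and function names are illustrative assumptions; in practice the kernel's tunnel interfaces do this work, not application code.

```python
import socket
import struct
from typing import Tuple

IPPROTO_TCP = 6   # IANA protocol number for TCP
IPPROTO_IPIP = 4  # IANA protocol number for IP-in-IP encapsulation


def ipv4_checksum(header: bytes) -> int:
    """RFC 791 one's-complement checksum over 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) + header[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, protocol: int, payload_len: int, ttl: int = 64) -> bytes:
    """Build a minimal 20-byte IPv4 header (version 4, IHL 5, no options)."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + payload_len,  # version/IHL, TOS, total length
        0, 0,                       # identification, flags/fragment offset
        ttl, protocol, 0,           # TTL, protocol, checksum placeholder
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]


def mux_encapsulate(inner_packet: bytes, mux_ip: str, backend_ip: str) -> bytes:
    """NLB (MUX) side: wrap the client's packet, unchanged, in an outer IPv4 header."""
    outer = ipv4_header(mux_ip, backend_ip, IPPROTO_IPIP, len(inner_packet))
    return outer + inner_packet


def backend_decapsulate(packet: bytes) -> bytes:
    """Backend side: check the outer header and recover the original packet."""
    if packet[9] != IPPROTO_IPIP:
        raise ValueError("not an IP-in-IP packet")
    outer_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    return packet[outer_len:]


def dsr_reply_addresses(inner_packet: bytes) -> Tuple[str, str]:
    """Addresses for the backend's direct reply: (source, destination).

    The inner packet is addressed to the NLB's public VIP, which is typically
    also configured on the backend's tunnel interface, so the backend replies
    from the VIP straight to the client, bypassing the MUX.
    """
    client_ip = socket.inet_ntoa(inner_packet[12:16])
    vip = socket.inet_ntoa(inner_packet[16:20])
    return vip, client_ip
```

Note that the client's source IP and port survive untouched inside the tunnel, which is why the backend sees the real client address without any protocol-level header.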

IBM Cloud Kubernetes Service cluster deployment patterns

There are multiple ways to deploy IBM Cloud Kubernetes Service clusters, and there is no silver bullet. You have to understand the following factors:

  • The requirements of the application you are running (including scale)

  • The SLA target

  • The budget

All of these factors influence which pattern you ultimately choose.

Zone = A fault domain—like a data center—in a metropolitan area (e.g., Dallas, Texas: DAL10).

Region = A combination of multiple zones within the same metropolitan area. For example, Dallas, Texas: US-South can be seen as a combination of DAL10, DAL12, and DAL13, each of which is an individual data center/fault domain.

Single-zone cluster

The simplest pattern is deploying an IBM Cloud Kubernetes Service cluster in a single zone within a region. Step-by-step example to deploy a single-zone cluster.

There is one A record associated with my ALB’s host name, which is exposed via the LoadBalancer service (NLB). In a single-zone cluster setup, IBM Cloud Kubernetes Service does not configure health checks for the ALBs by default.

In this diagram, you can see both an HTTP(S) application exposed via the IKS ALB and a binary-protocol (MQTT) application that is exposed directly via the LoadBalancer service. Notice the DSR operation: both the ALB and the MQTT app return traffic directly to the client.

Multi-zone cluster—ALB only

In this pattern, you can observe the default behavior of a multi-zone IBM Cloud Kubernetes Service cluster and the ALB. In my example, I am running in three zones in US-South (DAL10, DAL12 and DAL13). Step-by-step example to deploy multi-zone, ALB-use only pattern.

There are three A records associated with my ALB’s host name, which is exposed via a LoadBalancer service (so the traffic flows through the NLB). Health checks are configured for the ALBs in each zone. If a zone fails, its ALB’s IP address is automatically removed from the DNS response within ~60 seconds.

Notice the DSR behavior of the ALB. By default, the ALBs treat all backend pods equally, regardless of whether they run in the local zone or in a remote zone. This way, the computing capacity of the whole region is aggregated.
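
The health-check-driven DNS behavior of a multi-zone cluster can be modeled as follows. This is a simplified sketch under our own assumptions: `HealthCheckedRecordSet` and `ZoneALB` are hypothetical, the 60-second `removal_delay` mirrors the ~60 seconds mentioned above, and the real IKS health-check intervals and mechanics are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ZoneALB:
    zone: str
    ip: str
    failing_since: Optional[float] = None  # time of the first failed check, if failing


class HealthCheckedRecordSet:
    """Serve only the ALB IPs whose health checks pass.

    A failing ALB is pulled from DNS answers once it has been failing for
    `removal_delay` seconds; it comes back as soon as a check passes again.
    """

    def __init__(self, albs: List[ZoneALB], removal_delay: float = 60.0) -> None:
        self.albs: Dict[str, ZoneALB] = {a.zone: a for a in albs}
        self.removal_delay = removal_delay

    def report_check(self, zone: str, healthy: bool, now: float) -> None:
        alb = self.albs[zone]
        if healthy:
            alb.failing_since = None
        elif alb.failing_since is None:
            alb.failing_since = now  # start the removal clock

    def answer(self, now: float) -> List[str]:
        return [
            a.ip for a in self.albs.values()
            if a.failing_since is None or now - a.failing_since < self.removal_delay
        ]
```

During the window before removal, the failed zone's IP is still handed out, which is where the client-side timeouts described in the first bullet below come from.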

In the case of a zone failure, requests are lost in two places:

  • The failed zone’s ALBs are removed from DNS within ~60 seconds via the health checks. Meanwhile, the ALBs in the other (still operational) zones keep connecting to the configured endpoints. Clients connecting to the IP that is down will experience a TCP timeout and reconnect to a subsequent IP in the list (the exact behavior depends on the client-side implementation).

  • Once the worker nodes in the failing zone go NotReady, their endpoints are removed from the still-operational ALBs’ configurations within 40 seconds. During those 40 seconds, the ALBs in the healthy zones still try to send requests to the failed zone’s endpoints. Once an ALB hits a failure (for example, an endpoint that does not respond, which results in a 502 response), it holds off sending another request to that same endpoint for 10 seconds. The number of failed requests therefore depends on how many endpoints you have in the failing zone and how many ALB pods you are running. As a rule of thumb, you can use the following calculation: [Number of endpoints in the failing zone] * [Number of ALB pods] * 4.
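
The rule of thumb in the second bullet can be written as a small function. Our reading of the factor of 4 is that each ALB pod can hit each dead endpoint roughly once per 10-second hold-off during the 40-second removal window (40 / 10 = 4); treat the result as a rough estimate, not a guarantee. The function name and parameters are hypothetical.

```python
def estimated_failed_requests(endpoints_in_failing_zone: int,
                              alb_pods: int,
                              removal_window_s: float = 40.0,
                              holdoff_s: float = 10.0) -> int:
    """Rule-of-thumb count of requests that fail before endpoints are removed.

    Each ALB pod can hit each dead endpoint about once per hold-off period
    during the removal window (40 s / 10 s = 4 attempts by default).
    """
    attempts_per_endpoint = int(removal_window_s // holdoff_s)
    return endpoints_in_failing_zone * alb_pods * attempts_per_endpoint
```

For example, with 3 endpoints in the failing zone and 2 ALB pods, the estimate is 3 * 2 * 4 = 24 failed requests.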

Multi-zone cluster—NLB

There are two major ways to expose applications via the NLB:

  • Keep the application endpoints that the NLB sends requests to within the same zone as the NLB.

  • Aggregate the capacity of the whole region by allowing the NLBs to distribute traffic to all zones within the region.

Let’s take a closer look at each.

Keeping traffic local within a zone

Step-by-step example to deploy multi-zone, local-endpoints only patterns.

There are three A records associated with each LoadBalancer service that lives in each zone. Incoming traffic to the NLBs is sent only to the endpoints within the local zone, and the return traffic leaves through the local default gateway of the IBM Cloud Kubernetes Service worker nodes. This is not the default behavior; to achieve it, you have to prepare your application deployment by using node selectors and annotations.
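
The preparation described above amounts to one Deployment and one LoadBalancer Service per zone. The Python sketch below generates such manifests as plain dictionaries. It rests on assumptions we have not verified against this beta: the annotation keys, the `"ipvs"` feature value, the zone node label, and the use of `externalTrafficPolicy: Local` are our best understanding of the relevant IKS and Kubernetes names, and `zone_local_manifests` and the image name are hypothetical. Check the IKS documentation for your cluster version before using anything like this.

```python
import json
from typing import Dict

# ASSUMPTION: names below reflect our understanding of IKS/Kubernetes; verify them.
ZONE_ANNOTATION = "service.kubernetes.io/ibm-load-balancer-cloud-provider-zone"
FEATURES_ANNOTATION = "service.kubernetes.io/ibm-load-balancer-cloud-provider-enable-features"
ZONE_NODE_LABEL = "failure-domain.beta.kubernetes.io/zone"


def zone_local_manifests(app: str, zone: str, port: int, replicas: int = 2) -> Dict[str, dict]:
    """Hypothetical helper: one Deployment plus one LoadBalancer Service for one zone."""
    name = f"{app}-{zone}"
    labels = {"app": app, "zone": zone}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Schedule these pods only onto worker nodes in this zone.
                    "nodeSelector": {ZONE_NODE_LABEL: zone},
                    "containers": [{
                        "name": app,
                        "image": f"registry.example.com/{app}:latest",  # placeholder image
                        "ports": [{"containerPort": port}],
                    }],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": {
                ZONE_ANNOTATION: zone,        # ASSUMPTION: places the NLB in this zone
                FEATURES_ANNOTATION: "ipvs",  # ASSUMPTION: requests the LoadBalancer 2.0 NLB
            },
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": labels,                # matches only this zone's pods
            "externalTrafficPolicy": "Local",  # keep traffic on nodes with local endpoints
            "ports": [{"protocol": "TCP", "port": port, "targetPort": port}],
        },
    }
    return {"deployment": deployment, "service": service}


if __name__ == "__main__":
    print(json.dumps(zone_local_manifests("mqtt-broker", "dal10", 1883)["service"], indent=2))
```

You would call the helper once per zone (for example, dal10, dal12, and dal13); since kubectl accepts JSON as well as YAML, each generated object can be saved to its own file and applied with `kubectl apply -f`.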

Aggregating capacity of the region (all zones)

There are three A records associated with each LoadBalancer service that lives in each zone. Incoming traffic to the NLBs is sent across all service endpoints within the cluster, regardless of whether the endpoints live in the same zone or in another zone within the region. Step-by-step example to deploy aggregating region capacity pattern.
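
The difference between the two NLB patterns comes down to which endpoints each zone's NLB may forward to. Here is a minimal Python sketch of that rule; `Endpoint` and `eligible_endpoints` are hypothetical names, and in a real cluster the decision follows from the service configuration (selectors, node placement, traffic policy), not from application code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Endpoint:
    pod_ip: str
    zone: str


def eligible_endpoints(endpoints: List[Endpoint], nlb_zone: str,
                       aggregate_region: bool) -> List[Endpoint]:
    """Which endpoints may an NLB living in `nlb_zone` forward to?

    - Zone-local pattern: only endpoints in the NLB's own zone.
    - Region-aggregation pattern: every endpoint in the cluster, any zone.
    """
    if aggregate_region:
        return list(endpoints)
    return [e for e in endpoints if e.zone == nlb_zone]
```

With the zone-local rule, each zone needs its own ready replicas, because an NLB whose zone has no endpoints has nowhere to send traffic; with the aggregation rule, any zone's NLB can draw on the pods of the whole region.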

Important: Unfortunately, this pattern requires a manual step. You have to open a ticket on the portal (Technical > Infrastructure > Public Network Question) and add the following to the request:

  • “Please set up the network to allow capacity aggregation on the VLANs associated with my account. The reference ticket for this request is:” (It is even better if you list your VLANs in the request.)

You have to repeat this if a new VLAN is created under your account (like ordering a new IBM Cloud Kubernetes Service cluster in a different region where you had no VLAN before). We fully understand this is a suboptimal user experience, and we are working to improve it as soon as possible.

This is really powerful for aggregating the available outgoing bandwidth (per worker node) on the return path. It is also an excellent technique for aggregating the computing capacity of your whole region: for example, if you run something CPU-intensive that keeps a single worker node very busy, you can scale out horizontally to multiple worker nodes behind the same IP address. Think of running complicated Lua code or turning short cat videos upside down by re-rendering the clips.

Multiple single-zone deployments in one region

It is also a completely valid deployment pattern if you want to run a single IBM Cloud Kubernetes Service cluster per zone, resulting in multiple clusters in a region.

In this pattern, there are two zones (DAL10 and DAL12), each with a single-zone IBM Cloud Kubernetes Service cluster. With this setup, you lose features provided by IKS such as the automated health checks and DNS failover across zones. The ALB host names are also different, and the SSL certificate that IBM Cloud Kubernetes Service generates automatically is not distributed across the clusters; each cluster becomes its own mini fault domain.

With CIS (Cloud Internet Services) on the IaaS dashboard, however, you can build your own health checks and use the GLB function to configure behavior very similar to what IBM Cloud Kubernetes Service provides automatically with a multi-zone cluster. CIS is powered by Cloudflare and offers services such as authoritative DNS servers, global and local load balancing, a web application firewall (WAF), DDoS protection, caching, and page rules. You can order CIS through this link. For further documentation on CIS, refer to this page.

The global view—GLB enabled

Once you have deployed your application to multiple IBM Cloud Kubernetes Service clusters around the globe, you can use CIS to enable global load balancing across them.

In this example, the hypothetical website is CNAME’d to the ALB’s host name, which already has health checks enabled for all three zones.

With the GLB, you can send end users from Europe to the local EU-DE cluster while directing end users from North America to the cluster in the US-SOUTH region. This works both in DNS-only mode (end users are sent directly to the IBM Cloud Kubernetes Service cluster) and in proxy mode (traffic is proxied via Cloudflare).
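
The steering decision itself is easy to picture. The Python sketch below picks, for a client's continent, the first healthy pool in a preference list, falling back to the next region when the preferred one fails its health check. It is not the CIS or Cloudflare API; `POOLS`, `STEERING`, `DEFAULT_ORDER`, `steer`, and the host names are hypothetical stand-ins for the pools, origins, and health monitors you would configure in CIS.

```python
from typing import Dict, List

# Hypothetical pools: each one points at a single IKS cluster's ALB host name.
POOLS: Dict[str, str] = {
    "eu-de": "alb-eu-de.example.com",
    "us-south": "alb-us-south.example.com",
}

# Hypothetical steering table: pool preference order per client continent.
STEERING: Dict[str, List[str]] = {
    "EU": ["eu-de", "us-south"],
    "NA": ["us-south", "eu-de"],
}
DEFAULT_ORDER: List[str] = ["us-south", "eu-de"]


def steer(client_continent: str, healthy: Dict[str, bool]) -> str:
    """Return the host name to hand this client: the first healthy pool
    in its preference order (unknown continents use DEFAULT_ORDER)."""
    for pool in STEERING.get(client_continent, DEFAULT_ORDER):
        if healthy.get(pool, False):
            return POOLS[pool]
    raise RuntimeError("no healthy pool available")
```

In DNS-only mode the returned host name is what the client resolves and connects to; in proxy mode the proxy makes the same choice on the client's behalf.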

Finding the right pattern

As you learn more about your workload, you can adjust and even switch between patterns as needed. Different applications will require different patterns; please let us help you decide which is right!

You can learn more about the various deployment patterns in the following posts:

Contact us

If you have questions, engage our team via Slack by registering here and joining the discussion in the #general channel on our public IBM Cloud Kubernetes Service Slack.
