As of 14 June 2023, PROXY protocol is supported for Ingress Controllers in Red Hat OpenShift on IBM Cloud clusters hosted on VPC infrastructure.

Introduction

Modern software architectures often include multiple layers of proxies and load balancers. Preserving the IP address of the original client through these layers is challenging, but might be required for your use cases. A potential solution to this problem is the PROXY protocol.

Starting with Red Hat OpenShift on IBM Cloud version 4.13, PROXY protocol is supported for Ingress Controllers in clusters hosted on VPC infrastructure.

If you are interested in using PROXY protocol for Ingress Controllers on IBM Cloud Kubernetes Service clusters, you can find more information in our previous blog post.

Setting up PROXY protocol for OpenShift Ingress Controllers

When using PROXY protocol for source address preservation, every proxy in the chain that terminates TCP connections must be configured to send and receive PROXY protocol headers after initiating L4 connections. In Red Hat OpenShift on IBM Cloud clusters running on VPC infrastructure, there are two such proxies: the VPC Application Load Balancer (ALB) and the Ingress Controller.
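To illustrate what the header looks like on the wire, here is a minimal, hypothetical parser for a PROXY protocol v1 header line (the human-readable text variant defined by the HAProxy specification). The function name and return shape are illustrative, not part of any product API:

```python
# Minimal sketch: parse a PROXY protocol v1 header line, the text
# a sending proxy prepends to the TCP stream right after connecting.
def parse_proxy_v1(header: bytes) -> dict:
    """Parse e.g. b'PROXY TCP4 192.0.2.42 10.240.128.45 56324 443\r\n'."""
    if not header.endswith(b"\r\n"):
        raise ValueError("PROXY v1 header must end with CRLF")
    parts = header[:-2].decode("ascii").split(" ")
    if parts[0] != "PROXY":
        raise ValueError("not a PROXY protocol v1 header")
    if parts[1] == "UNKNOWN":
        # Sender could not determine the original addresses.
        return {"protocol": "UNKNOWN"}
    proto, src_ip, dst_ip, src_port, dst_port = parts[1:6]
    return {
        "protocol": proto,          # TCP4 or TCP6
        "src_ip": src_ip,           # original client address
        "dst_ip": dst_ip,           # address the client connected to
        "src_port": int(src_port),
        "dst_port": int(dst_port),
    }
```

The `src_ip` field is what makes source address preservation work: the receiving proxy reads it from the header instead of trusting the TCP source address of the immediate peer.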

On OpenShift clusters, the Ingress Operator is responsible for managing the Ingress Controller instances and the load balancers used to expose the Ingress Controllers. The operator watches IngressController resources on the cluster and makes adjustments to match the desired state.

Thanks to the Ingress Operator, we can enable PROXY protocol for both of our proxies at once. All we need to do is to change the endpointPublishingStrategy configuration on our IngressController resource:

endpointPublishingStrategy:
  type: LoadBalancerService
  loadBalancer:
    scope: External
    providerParameters:
      type: IBM
      ibm:
        protocol: PROXY

When you apply the previous configuration, the operator switches the Ingress Controller into PROXY protocol mode and adds the service.kubernetes.io/ibm-load-balancer-cloud-provider-enable-features: "proxy-protocol" annotation to the corresponding LoadBalancer typed Service resource, enabling PROXY protocol for the VPC ALB.
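You can verify that the operator applied the annotation by inspecting the router's Service resource. The following sketch assumes the default Ingress Controller, whose Service is named router-default in the openshift-ingress namespace:

```shell
# Print the feature annotation the Ingress Operator added to the
# LoadBalancer Service; expect "proxy-protocol" once the change applies.
oc -n openshift-ingress get service router-default \
  -o jsonpath='{.metadata.annotations.service\.kubernetes\.io/ibm-load-balancer-cloud-provider-enable-features}'
```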

Example

In this example, we deployed a test application in a single-zone Red Hat OpenShift on IBM Cloud 4.13 cluster that uses VPC generation 2 compute. The application accepts HTTP connections and returns information about the received requests, such as the client address. The application is exposed by the default-router created by the OpenShift Ingress Operator on the echo.example.com domain.

Client information without using PROXY protocol

By default, the PROXY protocol is not enabled. Let’s test accessing the application:

$ curl https://echo.example.com

Hostname: test-application-cd7cd98f7-9xbvm

Pod Information:
    -no pod information available-

Server values:
    server_version=nginx: 1.13.3 - lua: 10008

Request Information:
    client_address=172.24.84.165
    method=GET
    real path=/
    query=
    request_version=1.1
    request_scheme=http
    request_uri=http://echo.example.com:8080/

Request Headers:
    accept=*/*
    forwarded=for=10.240.128.45;host=echo.example.com;proto=https
    host=echo.example.com
    user-agent=curl/7.87.0
    x-forwarded-for=10.240.128.45
    x-forwarded-host=echo.example.com
    x-forwarded-port=443
    x-forwarded-proto=https

Request Body:
    -no body in request-

As you can see, the address in the x-forwarded-for header (10.240.128.45) is not the client's address. It is the address of the worker node that received the request from the VPC load balancer, which means the original client address cannot be recovered:

$ kubectl get nodes
NAME            STATUS   ROLES           AGE     VERSION
10.240.128.45   Ready    master,worker   5h33m   v1.26.3+b404935
10.240.128.46   Ready    master,worker   5h32m   v1.26.3+b404935
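The forwarded header in the echo output above follows the RFC 7239 Forwarded syntax. A small, hypothetical helper shows how its fields can be unpacked (it handles the simple single-element form seen here, not the full comma-separated list the RFC allows):

```python
# Minimal sketch: split a single RFC 7239 Forwarded element such as
# 'for=10.240.128.45;host=echo.example.com;proto=https' into a dict.
def parse_forwarded(value: str) -> dict:
    fields = {}
    for pair in value.split(";"):
        key, _, val = pair.strip().partition("=")
        # Values may be quoted (e.g. for="[2001:db8::1]:443"); strip quotes.
        fields[key.lower()] = val.strip('"')
    return fields
```

Without PROXY protocol, the `for` field here holds the worker node's address rather than the client's, so parsing it more carefully does not help; the information is already lost upstream.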

Enabling PROXY protocol on the default Ingress Controller

First, edit the Ingress Controller resource:

oc -n openshift-ingress-operator edit ingresscontroller/default

In the Ingress controller resource, find the spec.endpointPublishingStrategy.loadBalancer section and define the following providerParameters values:

endpointPublishingStrategy:
  loadBalancer:
    providerParameters:
      type: IBM
      ibm:
        protocol: PROXY
    scope: External
  type: LoadBalancerService

Then, save and apply the resource.
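If you prefer a non-interactive change, the same fields can be set with a merge patch. This is a sketch of an equivalent command, assuming the default Ingress Controller:

```shell
# Apply the same endpointPublishingStrategy change without opening an editor.
oc -n openshift-ingress-operator patch ingresscontroller/default \
  --type=merge \
  -p '{"spec":{"endpointPublishingStrategy":{"type":"LoadBalancerService","loadBalancer":{"scope":"External","providerParameters":{"type":"IBM","ibm":{"protocol":"PROXY"}}}}}}'
```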

Client information using PROXY protocol

Wait until the default-router pods are recycled and test access to the application again:

$ curl https://echo.example.com

Hostname: test-application-cd7cd98f7-9xbvm

Pod Information:
    -no pod information available-

Server values:
    server_version=nginx: 1.13.3 - lua: 10008

Request Information:
    client_address=172.24.84.184
    method=GET
    real path=/
    query=
    request_version=1.1
    request_scheme=http
    request_uri=http://echo.example.com:8080/

Request Headers:
    accept=*/*
    forwarded=for=192.0.2.42;host=echo.example.com;proto=https
    host=echo.example.com
    user-agent=curl/7.87.0
    x-forwarded-for=192.0.2.42
    x-forwarded-host=echo.example.com
    x-forwarded-port=443
    x-forwarded-proto=https

Request Body:
    -no body in request-

This time, the request headers contain the address 192.0.2.42, which is the public IP address of the original client.

Limitations

The PROXY protocol feature on Red Hat OpenShift on IBM Cloud is supported only for clusters on VPC generation 2 infrastructure that run OpenShift version 4.13 or later.

More information

For more information, check out our official documentation about exposing apps with load balancers, enabling PROXY protocol for Ingress Controllers, or the Red Hat OpenShift documentation.
