Keepalived and HAProxy

Find out how to install HAProxy and set up high availability with keepalived.

Red Hat OpenShift Container Platform (OpenShift) requires a load balancer for the following tasks:

  • API load balancing (configure and control the internals of OpenShift)

  • Application ingress load balancing (load balance requests to workloads running in OpenShift)

  • External access (make both load balancers externally accessible)

For load balancing, HAProxy is used: see haproxy.org and haproxy.com.

The HAProxy load balancer is critical for the reachability and health of OpenShift, which is why a second instance of HAProxy serves as a backup. For this kind of active/passive setup, keepalived is used to keep track of two floating IP addresses: one floating IP for the 10.128.0.0/14 network and one for the 172.18.0.0/16 network.

You can find a blog entry about keepalived in this context here: Pen Test Partners blog.

Set two DNS servers via systemd-resolved

Previously two KVM guests, loadbalancer-1 and loadbalancer-2, were installed.

Perform on loadbalancer-1 and loadbalancer-2.

Both loadbalancer guests must be able to resolve:

  • Internet domains: in the following, via the default NIC. Alternatively: via the custom DNS server 10.128.0.1.

  • OpenShift-internal domains ending in .sa.boe and .ocp0.sa.boe, which resolve to 10.128.0.0/14 IP addresses: via the custom DNS server 10.128.0.1.

For the following setup, two DNS servers are used. This requires a split-DNS setup, because by default the system can only use one DNS server.

systemd-resolved is used for the split-DNS setup. systemd-resolved automatically takes its configuration from the NetworkManager connection profiles (nmcli c show). As soon as systemd-resolved is the system-wide DNS resolver, DNS requests are handled as follows:

  • The routing table (ip route) and the DNS search domains (sa.boe ocp0.sa.boe) determine which interface is used for a DNS request.

  • The DNS server set on this specific interface is then used.

  • In this example the DNS server is automatically received via DHCP for all interfaces. This means that no further configuration changes are needed.
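
Once the setup below is complete, you can check which DNS server and search domains systemd-resolved assigned per link (a quick sanity check; the interface names enc1 and enc5 used later in this document will vary with your setup):

    resolvectl dns       # shows the DNS servers per link
    resolvectl domain    # shows the search/routing domains per link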

  1. Install systemd-resolved:

    yum install systemd-resolved
  2. Enable and start systemd-resolved, then check its status:

    systemctl enable systemd-resolved --now 
    systemctl status systemd-resolved
  3. Edit /etc/NetworkManager/NetworkManager.conf to make systemd-resolved the system-wide DNS resolver:

    [main]
    dns=systemd-resolved
  4. Reload NetworkManager to make the changes active:

    systemctl reload NetworkManager
  5. To verify that systemd-resolved is now used as the system-wide DNS resolver, run the following command:

    cat /etc/resolv.conf

    The output should look similar to:

    # Generated by NetworkManager
    search sa.boe ocp0.sa.boe
    nameserver 127.0.0.53

    If the file was not generated by NetworkManager as shown above, you can link /etc/resolv.conf to the systemd-resolved stub file as follows:

    mv /etc/resolv.conf /root/resolv.conf.backup
    ln -s /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf

    The custom DNS server running on 10.128.0.1 does not currently have DNSSEC enabled, so you might have to disable DNSSEC in the next step. Skip the next step if DNSSEC is configured correctly in your environment.

  6. Edit /etc/systemd/resolved.conf:

    DNSSEC=no
    Note: You don't have to set the DNS= option here because the settings are automatically taken from NetworkManager.
  7. Restart systemd-resolved to put the changes in effect:

    systemctl restart systemd-resolved.service
  8. Verify and display the configuration:

    yum provides dig
    yum install bind-utils
    dig bastion.sa.boe
    
    resolvectl status
    resolvectl query bastion.sa.boe
    resolvectl query control0.ocp0.sa.boe

Keepalived installation

The latest version of the keepalived documentation is hard to find; the man page is the only source of information that can be trusted.

Perform on loadbalancer-1 and loadbalancer-2:

  1. Install keepalived:

    yum install keepalived
    
    yum provides killall
    yum install psmisc
  2. Add a keepalived user for script executions and configure Linux networking. (A sketch for making the sysctl settings persistent across reboots follows after this list.)

    useradd keepalived_script
    
    # enable ip forwarding
    sysctl -w net.ipv4.ip_forward="1"
    
    # This option seems to be required; it allows
    # applications to bind to IP addresses which do not exist
    # on the system (yet), for example a service which might get the
    # virtual IP address later.
    sysctl -w net.ipv4.ip_nonlocal_bind="1"
  3. Copy the example configurations from the Example configuration files section below to /etc/keepalived/keepalived.conf. Copy the active configuration (keepalived-active.conf) to loadbalancer-1 and the passive configuration (keepalived-passive.conf) to loadbalancer-2.

  4. Configure the firewall to allow the VRRP protocol. (A way to observe the VRRP advertisements on the wire is shown after this list.)

    firewall-cmd --permanent --new-service=VRRP
    firewall-cmd --permanent --service=VRRP --set-description="Virtual Router Redundancy Protocol"
    firewall-cmd --permanent --service=VRRP --set-short=VRRP
    firewall-cmd --permanent --service=VRRP --add-protocol=vrrp
    firewall-cmd --permanent --service=VRRP --set-destination=ipv4:224.0.0.18
    firewall-cmd --add-service=VRRP --permanent
    firewall-cmd --reload
  5. Start and verify the keepalived service. It will probably fail because of SELinux permission issues:

    systemctl enable keepalived --now
  6. To fix the SELinux permissions for scripts that are run by keepalived, repeat the following process to create custom "Type Enforcement (TE) allow rules". Perform this on one of the load balancers, for example, loadbalancer-1:

    systemctl restart keepalived

    grep keepalived_t /var/log/audit/audit.log | audit2allow -M keepalived_custom

    # this generates two files: keepalived_custom.te and keepalived_custom.pp
    # install the custom allow rules with:
    semodule -i keepalived_custom.pp
  7. When you repeat the above process, the keepalived_custom.te file grows until it contains all the required allow rules. It should look similar to this example:

    module keepalived_custom 1.0;

    require {
        type init_t;
        type keepalived_t;
        type fs_t;
        type haproxy_unit_file_t;
        type systemd_systemctl_exec_t;
        class file { execute execute_no_trans getattr map open read };
        class filesystem getattr;
        class unix_stream_socket connectto;
        class service status;
    }

    #============= keepalived_t ==============
    allow keepalived_t fs_t:filesystem getattr;
    allow keepalived_t haproxy_unit_file_t:service status;
    allow keepalived_t init_t:unix_stream_socket connectto;

    allow keepalived_t systemd_systemctl_exec_t:file { execute execute_no_trans getattr map open read };

    The keepalived_custom.pp file is the compiled form of the module. You can copy the final version to loadbalancer-2 and install it there directly via semodule -i keepalived_custom.pp.

    Note: When MD5 (AH) authentication is enabled, the VRRP packets use IP protocol number 51 (AH) instead of 112 (VRRP), which can lead to packets being blocked despite the firewall rule added with --add-protocol=vrrp. Source: https://kb.juniper.net/InfoCenter/index?page=content&id=KB13332&cat=E_SERIES&actp=LIST.
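
As noted in step 2, sysctl -w only changes the running kernel and is lost on reboot. A minimal sketch to persist the two settings (the file name 90-keepalived.conf is an arbitrary choice):

    printf 'net.ipv4.ip_forward = 1\nnet.ipv4.ip_nonlocal_bind = 1\n' > /etc/sysctl.d/90-keepalived.conf

    # apply all configured sysctl settings and verify
    sysctl --system
    sysctl net.ipv4.ip_forward net.ipv4.ip_nonlocal_bind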
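
To verify that the VRRP advertisements from step 4 actually pass the firewall, inspect the service definition and watch for IP protocol 112 on the multicast interface (enc1, as in the example configurations below; adjust to your setup):

    firewall-cmd --info-service=VRRP
    tcpdump -i enc1 -n ip proto 112    # expect one advertisement per advert_int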

Additional information:

  • Avoid split brain with keepalived: taken from this discussion: use the same interface for the multicast traffic as for the virtual IP address. The solution in use here is that the virtual network created by KVM is always up, even when all OSA and RoCE cards fail.

  • Allow the non-root user (keepalived_script) to run a script which requires root privileges:

    # vim /etc/sudoers.d/systemctl4keepalived_script
    keepalived_script  ALL=(ALL)  NOPASSWD: /usr/bin/systemctl
    
    # test script (with correct user):
    /bin/su -c "/usr/bin/killall -0 haproxy" - "keepalived_script"
    /bin/su -c "/usr/bin/systemctl is-active --quiet haproxy" - "keepalived_script"
  • Test multicast connectivity:

    yum install omping
    omping -m 224.0.0.18 -p 1234 192.168.122.31 192.168.122.32
  • Use systemd-analyze blame after a reboot to verify the startup order. If necessary, change the requirements or startup priorities (a sketch using a systemd drop-in follows below).
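
For the last point, a sketch of a systemd drop-in that starts keepalived only after HAProxy (assuming this is the ordering you want; written in the same style as the sudoers file above):

    # systemctl edit keepalived
    [Unit]
    Wants=haproxy.service
    After=haproxy.service

    # reload systemd and verify the resulting ordering:
    systemctl daemon-reload
    systemctl show keepalived -p After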

Install HAProxy

Perform on loadbalancer-1 and loadbalancer-2:

Install HAProxy, back up the old configuration file, and copy the files from the Example configuration files section below to /etc/haproxy/haproxy.cfg. Use the active HAProxy configuration file for loadbalancer-1 and the passive HAProxy configuration file for loadbalancer-2.

  1. Install the HAProxy package:

    yum install haproxy -y
  2. Back up the HAProxy configuration and configure HAProxy as needed. Use the example configuration below:

    cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.old
    # use config from example
    vim /etc/haproxy/haproxy.cfg
  3. Set the required SELinux permission for HAProxy:

    setsebool -P haproxy_connect_any=1
  4. Set the required firewall rules:

    firewall-cmd --permanent --new-service=haproxy
    firewall-cmd --permanent --service=haproxy --set-description="haproxy firewall rules for openshift"
    firewall-cmd --permanent --service=haproxy --set-short="haproxy for openshift"
    firewall-cmd --permanent --service=haproxy --add-port=443/tcp
    firewall-cmd --permanent --service=haproxy --add-port=80/tcp
    firewall-cmd --permanent --service=haproxy --add-port=1936/tcp
    firewall-cmd --permanent --service=haproxy --add-port=6443/tcp
    firewall-cmd --permanent --service=haproxy --add-port=22623/tcp
    firewall-cmd --add-service=haproxy --permanent
    firewall-cmd --reload
  5. Start and enable the HAProxy service. Verify that the service runs correctly. (A few additional sanity checks are shown after this list.)

    systemctl enable haproxy.service --now
    systemctl status haproxy.service
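
In addition to systemctl status, a few quick sanity checks (the stats credentials admin:password are taken from the example configuration below):

    # check the configuration syntax
    haproxy -c -f /etc/haproxy/haproxy.cfg

    # are all frontends listening?
    ss -tlnp | grep haproxy

    # query the statistics page
    curl -u admin:password http://localhost:1936/stats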

Reference: HAProxy documentation

Example configuration files

haproxy-active.conf

global
    log         /dev/log local0 info
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    daemon
defaults
    mode                    http
    log                     global
    option                  dontlognull
    option                  http-server-close
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000
frontend stats
    bind *:1936
    mode http
    log  global
    maxconn 10
    stats enable
    stats hide-version
    stats refresh 10s
    stats show-node
    stats show-desc Stats for ocp0 cluster
    stats auth admin:password
    stats uri /stats
listen api-server-6443
    bind *:6443
    mode tcp
    balance roundrobin
    server bootstrap bootstrap.ocp0.sa.boe:6443 check inter 1s backup
    server control0 control0.ocp0.sa.boe:6443 check inter 1s
    server control1 control1.ocp0.sa.boe:6443 check inter 1s
    server control2 control2.ocp0.sa.boe:6443 check inter 1s
listen machine-config-server-22623
    bind *:22623
    mode tcp
    balance roundrobin
    server bootstrap bootstrap.ocp0.sa.boe:22623 check inter 1s backup
    server control0 control0.ocp0.sa.boe:22623 check inter 1s
    server control1 control1.ocp0.sa.boe:22623 check inter 1s
    server control2 control2.ocp0.sa.boe:22623 check inter 1s
listen ingress-router-443
    bind *:443
    mode tcp
    balance source
    server compute0 compute0.ocp0.sa.boe:443 check inter 1s
    server compute1 compute1.ocp0.sa.boe:443 check inter 1s
listen ingress-router-80
    bind *:80
    mode tcp
    balance source
    server compute0 compute0.ocp0.sa.boe:80 check inter 1s
    server compute1 compute1.ocp0.sa.boe:80 check inter 1s

haproxy-passive.conf

global
    log         /dev/log local0 info
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    daemon
defaults
    mode                    http
    log                     global
    option                  dontlognull
    option                  http-server-close
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000
frontend stats
    bind *:1936
    mode            http
    log             global
    maxconn 10
    stats enable
    stats hide-version
    stats refresh 10s
    stats show-node
    stats show-desc Stats for ocp0 cluster
    stats auth admin:password
    stats uri /stats
listen api-server-6443
    bind *:6443
    mode tcp
    balance roundrobin
    server bootstrap bootstrap.ocp0.sa.boe:6443 check inter 1s backup
    server control0 control0.ocp0.sa.boe:6443 check inter 1s
    server control1 control1.ocp0.sa.boe:6443 check inter 1s
    server control2 control2.ocp0.sa.boe:6443 check inter 1s
listen machine-config-server-22623
    bind *:22623
    mode tcp
    balance roundrobin
    server bootstrap bootstrap.ocp0.sa.boe:22623 check inter 1s backup
    server control0 control0.ocp0.sa.boe:22623 check inter 1s
    server control1 control1.ocp0.sa.boe:22623 check inter 1s
    server control2 control2.ocp0.sa.boe:22623 check inter 1s
listen ingress-router-443
    bind *:443
    mode tcp
    balance source
    server compute0 compute0.ocp0.sa.boe:443 check inter 1s
    server compute1 compute1.ocp0.sa.boe:443 check inter 1s
listen ingress-router-80
    bind *:80
    mode tcp
    balance source
    server compute0 compute0.ocp0.sa.boe:80 check inter 1s
    server compute1 compute1.ocp0.sa.boe:80 check inter 1s

keepalived-active.conf

# https://access.redhat.com/documentation/en-us/red_hat_cloudforms/4.6/html/high_availability_guide/configuring_haproxy
# new version https://access.redhat.com/documentation/en-us/red_hat_cloudforms/5.0/html-single/high_availability_guide/index
# https://readthedocs.org/projects/keepalived-pqa/downloads/pdf/latest/
# https://www.keepalived.org/manpage.html

# the man pages seem to be the only source which can be trusted

# version 3 supports ipv6, default is version 2

global_defs {
    process_names
    # check that the script can only be edited by root
    enable_script_security
    # systemctl only works as root
    script_user root
    # dynamic_interfaces
    vrrp_version 3

    # After switching to MASTER state, 5 gratuitous ARPs (GARPs) are sent,
    # and after 5 seconds another 5 GARPs are sent (so that the switches
    # update their ARP tables).
    # The following option disables the second round of 5 GARPs (as it is
    # not necessary with modern switches).
    vrrp_min_garp true

    # disables non compliant features (e.g. unicast_peers)
    vrrp_strict

    # default and assigned by IANA for VRRP
    # vrrp_multicast_group4 224.0.0.18

    # optimization option for advanced use
    # max_auto_priority
}

vrrp_script chk_haproxy {
    # Note: use su -c to check if that command works under the
    #       keepalived_script user.
    # simple way of checking haproxy process:
    # script "/usr/bin/killall -0 haproxy"
    # the more intelligent way of checking the haproxy process
    script "/usr/bin/systemctl is-active --quiet haproxy"

    fall 2                               # 2 fails required for failure
    rise 2                               # 2 OKs required to consider the
                                         #   process up after failure
    interval 5                           # check every 5 seconds
    weight 51                            # add 51 points when rc=0
}


vrrp_instance VI_1 {
    state MASTER                # MASTER on loadbalancer-1, BACKUP on loadbalancer-2

    interface enc1              # perform multicast over the KVM DHCP network

    virtual_router_id 1         # unique, same across peers

    priority 100                # most relevant for electing the master (for
                                #   the master 50 more than on the other machines)

    advert_int 5                # specify the advertisement interval in seconds

    # check that the 10.128.0.0/14 network is up
    track_interface {
        enc5 weight 50
    }

    # check that haproxy is up
    track_script {
        chk_haproxy
    }

    #authentication {            # non compliant but maybe good with unicast
    #    auth_type PASS
    #    auth_pass pass4kee      # 8 characters
    #}

    # Defaults to the primary ip on the interface.
    # Does not really matter, as the answer is received anyway with multicast.
    # You can hide the location of VRRPD by changing this source IP address.
    #mcast_src_ip 192.168.123.123

    # unicast is not compliant (therefore not used here, but it would be simpler)
    #unicast_src_ip 10.129.0.1   # unicast to talk with other instance
    #unicast_peer {              # VRRP adverts will be sent to the following peers
    #    10.130.0.1
    #}

    virtual_ipaddress {
        172.18.100.100/16 brd 172.18.255.255 dev enc6        # virtual ip address
        10.131.0.1/14 brd 10.131.255.255 dev enc5           # virtual ip address
    }
}
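
Why weight 51 and not 50: with the values from the active and passive configurations, the effective VRRP priorities work out as follows:

    loadbalancer-1 (MASTER): 100 + 51 (chk_haproxy OK) + 50 (enc5 up) = 201
    loadbalancer-2 (BACKUP):  50 + 51 (chk_haproxy OK) + 50 (enc5 up) = 151

    # haproxy fails on loadbalancer-1:
    loadbalancer-1: 100 + 50 = 150  <  151 = loadbalancer-2
    # loadbalancer-2 takes over although all interfaces on loadbalancer-1
    # are still up; with weight 50 both nodes would end up at 150 and the
    # haproxy failure alone would not trigger a failover.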

keepalived-passive.conf

# https://access.redhat.com/documentation/en-us/red_hat_cloudforms/4.6/html/high_availability_guide/configuring_haproxy
# new version https://access.redhat.com/documentation/en-us/red_hat_cloudforms/5.0/html-single/high_availability_guide/index
# https://readthedocs.org/projects/keepalived-pqa/downloads/pdf/latest/
# https://www.keepalived.org/manpage.html

# the man pages seem to be the only source which can be trusted

# version 3 supports ipv6, default is version 2

global_defs {
    process_names
    # check that the script can only be edited by root
    enable_script_security
    # systemctl only works as root
    script_user root
    # dynamic_interfaces
    vrrp_version 3

    # After switching to MASTER state, 5 gratuitous ARPs (GARPs) are sent,
    # and after 5 seconds another 5 GARPs are sent (so that the switches
    # update their ARP tables).
    # The following option disables the second round of 5 GARPs (as it is
    # not necessary with modern switches).
    vrrp_min_garp true

    # disables non compliant features (e.g. unicast_peers)
    vrrp_strict

    # default and assigned by IANA for VRRP
    # vrrp_multicast_group4 224.0.0.18

    # optimization option for advanced use
    # max_auto_priority
}

vrrp_script chk_haproxy {
    # Note: use su -c to check if that command works under the
    #       keepalived_script user.
    # simple way of checking haproxy process:
    # script "/usr/bin/killall -0 haproxy"
    # the more intelligent way of checking the haproxy process
    script "/usr/bin/systemctl is-active --quiet haproxy"

    fall 2                               # 2 fails required for failure
    rise 2                               # 2 OKs required to consider the
                                         #   process up after failure
    interval 5                           # check every 5 seconds
    weight 51                            # add 51 points when rc=0
}


vrrp_instance VI_1 {
    state BACKUP                # MASTER on loadbalancer-1, BACKUP on loadbalancer-2

    interface enc1              # perform multicast over the KVM DHCP network

    virtual_router_id 1         # unique, same across peers

    priority 50                 # most relevant for electing the master (for
                                #   the master 50 more than on the other machines)

    advert_int 5                # specify the advertisement interval in seconds

    # check that the 10.128.0.0/14 network is up
    track_interface {
        enc5 weight 50
    }

    # check that haproxy is up
    track_script {
        chk_haproxy
    }

    #authentication {            # non compliant but maybe good with unicast
    #    auth_type PASS
    #    auth_pass pass4kee      # 8 characters
    #}

    # Defaults to the primary ip on the interface.
    # Does not really matter, as the answer is received anyway with multicast.
    # You can hide the location of VRRPD by changing this source IP address.
    #mcast_src_ip 192.168.123.123

    # unicast is not compliant (therefore not used here, but it would be simpler)
    #unicast_src_ip 10.130.0.1   # unicast to talk with other instance
    #unicast_peer {              # VRRP adverts will be sent to the following peers
    #    10.129.0.1
    #}

    virtual_ipaddress {
        172.18.100.100/16 brd 172.18.255.255 dev enc6        # virtual ip address
        10.131.0.1/14 brd 10.131.255.255 dev enc5           # virtual ip address
    }
}
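
A simple way to test the failover behavior once keepalived and HAProxy run on both nodes (a sketch; interface names as in the examples above):

    # on loadbalancer-1: provoke a failover
    systemctl stop haproxy

    # on loadbalancer-2: the virtual IPs should appear after roughly
    # fall * interval = 10 seconds
    ip -br addr show dev enc5
    ip -br addr show dev enc6
    journalctl -u keepalived -f

    # on loadbalancer-1: restore and watch the VIPs move back
    systemctl start haproxy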