Deliver Network Parallelism to HPC Workloads in OpenShift on IBM Cloud with Multi-NIC CNI

How Multi-NIC CNI enables you to provide automated and scaled network parallelism to your HPC workloads in Red Hat OpenShift on IBM Cloud.

Red Hat OpenShift on IBM Cloud has offered a fast and secure way to automate the deployment and scaling of containerized enterprise workloads since 2019. Building on that continuous, rapid development, this year we announced that it can also serve as an HPC cluster for resource-intensive workloads, such as serverless computing, analytics, big data (Hadoop and Spark) and machine learning.

The key benefits of HPC infrastructure are high throughput with automated, scaled parallelism of the compute, storage and networking building blocks. However, challenges remain, and the most critical one is network performance. In this blog post, we will go into more detail about the network bottleneck and how to mitigate it.

HPC convergence on the cloud container platform and its network bottleneck barrier

The convergence of HPC workloads on the container platform in the cloud has attracted much attention, with promising flexibility, scalability and cost benefits. At the same time, the overlay network bottleneck is a big barrier. Although enabling access to multiple network devices (Multi-NIC) is a direct way to increase the aggregate network bandwidth, routing container IP packets across multiple network interfaces is complex. It requires many manual steps and considerable expertise, especially when we want to remove the overlay stack of the container platform and utilize the full bandwidth.

Furthermore, unlike dedicated static systems, cloud infrastructure changes dynamically at runtime. For example, the cluster can be automatically scaled on demand. Worker nodes may join and leave the cluster or become unavailable unexpectedly. Additional network devices or newly introduced network technology can be added to a node at any point in time. Even existing network devices can lose connectivity due to transient failures.

What is Multi-NIC CNI?

Multi-NIC CNI is an implementation of the Container Network Interface (CNI), an incubating project accepted by the Cloud Native Computing Foundation (CNCF). It delivers a simple, automated and scaled Multi-NIC network solution for container platforms on cloud infrastructure, starting with Red Hat OpenShift on top of IBM Cloud. Multi-NIC CNI helps you deal with multi-network complexity and dynamic changes, making your container networks on the cloud ready for HPC and AI workloads, much like a native system. Currently, Multi-NIC CNI is available on OperatorHub.io for the general Kubernetes community, and it is embedded in the OperatorHub in OpenShift and OKD.

Get started with Multi-NIC CNI

To get started with Multi-NIC CNI for HPC and AI on Red Hat OpenShift on IBM Cloud, complete the following steps.

Step 1: Get the multi-network cluster ready

The first step is to build an OpenShift cluster on IBM Cloud infrastructure with the openshift-installer. Check out this article for more information.
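
The installer flow typically looks like the following. This is a minimal sketch, assuming an IBM Cloud API key is exported and an install-config.yaml is generated for the IBM Cloud platform; the directory name is a placeholder:

$ export IC_API_KEY=<your-ibmcloud-api-key>
$ openshift-install create install-config --dir=ocp-cluster
$ openshift-install create cluster --dir=ocp-cluster --log-level=info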

After the cluster is ready, you can create and attach the secondary network interfaces with the provided Terraform script here.
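
The script follows the standard Terraform workflow; a minimal sketch, assuming the script's location and required input variables are as documented in its README:

$ cd <path-to-terraform-script>
$ terraform init
$ terraform plan
$ terraform apply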

Step 2: Install the operator from the OperatorHub

Search for Multi-NIC CNI in the OperatorHub catalog of the OpenShift web console and install the operator, or subscribe to it from the command line as sketched below.
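
A minimal Subscription sketch for a CLI-based installation (the package name, channel and catalog source below are assumptions; confirm them on the operator's OperatorHub entry):

$ cat <<EOF | kubectl apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: multi-nic-cni-operator
  namespace: openshift-operators
spec:
  # Assumed channel, package name and source; check the OperatorHub entry for exact values
  channel: alpha
  name: multi-nic-cni-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
EOF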

Step 3: Deploy MultiNicNetwork CR

Create MultiNicNetwork CR:

$ cat <<EOF > network.yaml
apiVersion: multinic.fms.io/v1
kind: MultiNicNetwork
metadata:
  name: multinic-ipvlanl3
spec:
  subnet: "192.168.0.0/16"
  ipam: |
    {
      "type": "multi-nic-ipam",
      "hostBlock": 6, 
      "interfaceBlock": 2,
      "vlanMode": "l3"
    }
  multiNICIPAM: true
  plugin:
    cniVersion: "0.3.0"
    type: ipvlan
    args: 
      mode: l3
  attachPolicy:
    strategy: none
EOF

Apply it to your cluster:

$ kubectl apply -f network.yaml

Wait until it gets ready:

$ kubectl wait multinicnetwork multinic-ipvlanl3 --for jsonpath='{.status.routeStatus}'=Success --timeout=120s

If it completes successfully, you should see something like this:

multinicnetwork.multinic.fms.io/multinic-ipvlanl3 condition met
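
With the network ready, a workload attaches to it by referencing the MultiNicNetwork name in the standard Multus network annotation. A minimal pod sketch (the pod name and container image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: multi-nic-test
  annotations:
    k8s.v1.cni.cncf.io/networks: multinic-ipvlanl3
spec:
  containers:
  - name: test
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "infinity"]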

Step 4: Test the connection

Deploy the provided connection-check job, which verifies connectivity between every pair of worker nodes over each secondary interface and measures the bandwidth:

$ kubectl create -f https://raw.githubusercontent.com/foundation-model-stack/multi-nic-cni/main/connection-check/concheck.yaml
$ kubectl wait job multi-nic-concheck --for condition=complete
$ kubectl logs job/multi-nic-concheck

Then, you should see something like this on your screen:

###########################################
## Connection Check: multinic-ipvlanl3
###########################################
2022/09/28 06:37:18 3/3 clients successfully finished
FROM                           TO                              CONNECTED/TOTAL IPs                            BANDWIDTHs
multi-nic-n7zf6-worker-2-86jfd multi-nic-n7zf6-worker-2-dbjpg  2/2             [192.168.0.1 192.168.64.1]     [ 10.1Gbits/sec 10.1Gbits/sec]
multi-nic-n7zf6-worker-2-86jfd multi-nic-n7zf6-worker-2-zt5l5  2/2             [192.168.0.65 192.168.64.65]   [ 10.09Gbits/sec 10.09Gbits/sec]
multi-nic-n7zf6-worker-2-dbjpg multi-nic-n7zf6-worker-2-86jfd  2/2             [192.168.0.129 192.168.64.129] [ 10.07Gbits/sec 10.09Gbits/sec]
multi-nic-n7zf6-worker-2-dbjpg multi-nic-n7zf6-worker-2-zt5l5  2/2             [192.168.0.65 192.168.64.65]   [ 11.07Gbits/sec 11.09Gbits/sec]
multi-nic-n7zf6-worker-2-zt5l5 multi-nic-n7zf6-worker-2-86jfd  2/2             [192.168.0.129 192.168.64.129] [ 10.07Gbits/sec 10.08Gbits/sec]
multi-nic-n7zf6-worker-2-zt5l5 multi-nic-n7zf6-worker-2-dbjpg  2/2             [192.168.0.1 192.168.64.1]     [ 10.08Gbits/sec 10.09Gbits/sec]
###########################################

Conclusion

This blog post highlighted the network bottleneck barrier to HPC convergence on Kubernetes-based cloud container platforms and introduced Multi-NIC CNI, a container network interface plugin that delivers network parallelism to satisfy the collective communication needs of HPC workloads.

Red Hat OpenShift on IBM Cloud pioneered the creation of HPC cloud solutions, and it can be further enhanced with Multi-NIC CNI without any modification to the routing tables or IP management of the underlay infrastructure. We look forward to expanding this solution to other platforms and seeing it widely adopted. The Multi-NIC CNI operator is now an open-source project and is integrated into the OperatorHub community catalog. Collaborations and contributions are more than welcome as we build this solution for the hybrid cloud era.

Try out the solution on an IBM Cloud OpenShift HPC cluster today.
