
IBM PureData System for Analytics - Creating a network bond in Red Hat Enterprise Linux (RHEL 5) on N1001 series appliances

Question & Answer


Question

How do I create a network bond in Red Hat Enterprise Linux (RHEL) 5?

Answer

This document applies to Red Hat Enterprise Linux (RHEL) 5. For the RHEL 4 procedure, see:
http://eclient.lenexa.ibm.com:9082/search?fetch=source/TechNote/1570931

By combining multiple Ethernet interfaces into one virtual interface, a technique known as "bonding" or "teaming," you can increase network throughput and/or achieve greater redundancy.

The following example uses eth4 and eth5 to create a bond1 interface. A bond0 interface already exists in this example because DL585 hosts running NPS 3.x on Mustang series hardware ship with one.

Work closely with your network administrator. Not all of the bonding driver modes listed below may be available, and some modes require specific settings or features of the network switch infrastructure that are not controlled or provided by Netezza.



Note: You must modify both ifcfg-eth4 and ifcfg-eth5 to enslave them to bond1. Then, create an ifcfg-bond1 file to specify the network/IP information of the bonded interface.

Steps to Create Bonds in RHEL5

1. Create a new bond1 (or bond2 on DRBD machines) ifcfg file and edit it to look as follows:

Note: The bonding options are now set in the ifcfg file.
    # vi /etc/sysconfig/network-scripts/ifcfg-bond1

    DEVICE=bond1
    BOOTPROTO=static
    IPADDR=10.63.86.42
    NETMASK=255.255.255.128
    ONBOOT=yes
    USERCTL=no
    TXQUEUELEN=100000
    BONDING_OPTS="mode=1 miimon=200”

or
    BONDING_OPTS="mode=4 miimon=200 lacp_rate=fast xmit_hash_policy=layer3+4 use_carrier=1”


2. If you are bonding eth4 and eth5, the ifcfg-eth4 and ifcfg-eth5 files can look nearly identical. The only parameters that need to be adjusted are HWADDR= and DEVICE=. Create the two ifcfg files as below:
    # vi /etc/sysconfig/network-scripts/ifcfg-eth4

    DEVICE=eth<N>                 # put your actual interface number here
    BOOTPROTO=none
    HWADDR=<MAC>                  # put this NIC's MAC address here
    ONBOOT=yes
    MASTER=bond1
    SLAVE=yes
    USERCTL=no
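
For illustration, a completed pair might look like the following, reusing the example MAC addresses shown in the bond1 output later in this document (substitute the values that ifconfig -a or ip link show reports for your own NICs):

    # vi /etc/sysconfig/network-scripts/ifcfg-eth4

    DEVICE=eth4
    BOOTPROTO=none
    HWADDR=00:17:36:7d:df:82
    ONBOOT=yes
    MASTER=bond1
    SLAVE=yes
    USERCTL=no

    # vi /etc/sysconfig/network-scripts/ifcfg-eth5

    DEVICE=eth5
    BOOTPROTO=none
    HWADDR=00:17:36:7d:df:81
    ONBOOT=yes
    MASTER=bond1
    SLAVE=yes
    USERCTL=no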

3. Add the alias for the new bond into /etc/modprobe.conf, with no options; the bonding options are set in the ifcfg file.
    # more /etc/modprobe.conf
    alias eth0 bnx2
    alias eth1 bnx2
    alias eth2 e1000e
    alias eth3 e1000e
    alias eth4 e1000e
    alias eth5 e1000e
    alias eth6 e1000e
    alias eth7 e1000e
    alias eth8 e1000e
    alias eth9 e1000e
    alias scsi_hostadapter cciss
    alias scsi_hostadapter1 lpfc
    alias bond0 bonding
    alias bond1 bonding
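
To pick up the new alias without rebooting, the bonding module can be loaded and checked by hand (a quick sanity check; restarting the network service accomplishes the same thing):

    # modprobe bonding
    # lsmod | grep bonding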

4. Bring the interface up and validate the configuration. Sometimes it is necessary to run ifdown and ifup on the slaves and on bond1 to get them to come up properly. The other option is to run service network restart (you cannot use a network restart during a hostname/IP change with a DRBD configuration).

Note: Replace eth4 and eth5 with your slave interfaces.
    # ifdown eth4
    # ifdown eth5
    # ifup eth4
    # ifup eth5
    # ifup bond1

    # ifconfig
    # cat /proc/net/bonding/bond1
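
One quick way to confirm that both slaves joined the bond and have link (eth4 and eth5 are this example's slaves):

    # grep -A1 "Slave Interface" /proc/net/bonding/bond1
    # ethtool eth4 | grep "Link detected"
    # ethtool eth5 | grep "Link detected"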

Bonding Options

The following sections summarize the available options for the bonding module, set on the BONDING_OPTS line in the ifcfg file.

max_bonds= — This should no longer be used. It specifies the total number of bonds on the system. Do not include the max_bonds parameter if you require multiple bonding devices with different parameters, as it forces all bonds to inherit the settings of bond0.

mode= — Specifies one of the seven policies allowed for the bonding module:
    0 — Sets a round-robin policy for fault tolerance and load balancing. Transmissions are received and sent out sequentially on each bonded slave interface, beginning with the first one available.
    This is the only mode that will stripe a single application socket transfer (for example, nzload). All other modes need two or more application socket connections to use the available bandwidth.

    1 — Sets an active-backup policy for fault tolerance. Transmissions are received and sent out via the first available bonded slave interface. Another bonded slave interface is only used if the active bonded slave interface fails.

    2 — Sets an XOR (exclusive-or) policy for fault tolerance and load balancing. Using this method, the interface matches up the incoming request's MAC address with the MAC address for one of the slave NICs. Once this link is established, transmissions are sent out sequentially beginning with the first available interface.

    3 — Sets a broadcast policy for fault tolerance. All transmissions are sent on all slave interfaces.

    4 — Sets an IEEE 802.3ad dynamic link aggregation policy. Creates aggregation groups that share the same speed and duplex settings. Transmits and receives on all slaves in the active aggregator. Requires a switch that is 802.3ad compliant.

    5 — Sets a Transmit Load Balancing (TLB) policy for fault tolerance and load balancing. The outgoing traffic is distributed according to the current load on each slave interface. Incoming traffic is received by the current slave. If the receiving slave fails, another slave takes over the MAC address of the failed slave.

    6 — Sets an Active Load Balancing (ALB) policy for fault tolerance and load balancing. Includes transmit and receive load balancing for IPV4 traffic. Receive load balancing is achieved through ARP negotiation.

Note: Modes 5 and 6 do not depend on the network vendor's switch, but both members of the bond must be on the same switch.
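
On RHEL 5 kernels with sysfs bonding support, the mode of a running bond can also be confirmed at runtime; for example, for an active-backup bond:

    # cat /sys/class/net/bond1/bonding/mode
    active-backup 1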

miimon= — Specifies (in milliseconds) how often MII link monitoring occurs. Normal values are 100-200; 100 is recommended.

downdelay= — Specifies (in milliseconds) how long to wait after link failure before disabling the link. The value must be a multiple of the value specified in the miimon parameter. The value is set to 0 by default, which disables it.

updelay= — Specifies (in milliseconds) how long to wait before enabling a link. The value must be a multiple of the value specified in the miimon parameter. The value is set to 0 by default, which disables it.

arp_interval= — Specifies (in milliseconds) how often ARP monitoring occurs. The value is set to 0 by default, which disables it.

If using this setting while in mode 0 or 2 (the two load-balancing modes), the network switch must be configured to distribute packets evenly across the NICs. For more information on how to accomplish this, refer to /usr/share/doc/kernel-doc-<kernel-version>/Documentation/networking/bonding.txt.

arp_ip_target= — Specifies the target IP address of ARP requests when the arp_interval parameter is enabled. Up to 16 IP addresses can be specified in a comma separated list.

primary= — Specifies the interface name, such as eth0, of the primary device. The primary device is the first of the bonding interfaces to be used and is not abandoned unless it fails. This setting is particularly useful when one NIC in the bonding interface is faster and, therefore, able to handle a bigger load.

This setting is only valid when the bonding interface is in active-backup mode. Refer to
/usr/share/doc/kernel-doc-<kernel-version>/Documentation/networking/bonding.txt for more information.
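
As a minimal sketch, an active-backup BONDING_OPTS line that pins eth4 as the primary slave might look like this (the interface name is illustrative):

    BONDING_OPTS="mode=1 miimon=100 primary=eth4"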
multicast= — Specifies an integer value for the type of multicast support desired. Acceptable values for this parameter are:
    0 — Disables multicast support.

    1 — Enables multicast support, but only on the active slave.

    2 — Enables multicast support on all slaves (the default).

Important: It is essential that either the miimon parameter or the arp_interval and arp_ip_target parameters are specified. Failure to do so can cause degradation of network performance in the event a link fails.
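
For example, an active-backup bond monitored by ARP instead of MII might use a line like the following (the target IP is a placeholder; point it at an always-reachable host such as the default gateway):

    BONDING_OPTS="mode=1 arp_interval=1000 arp_ip_target=10.63.86.1"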

Verifying Bonding Mode Settings

It is important that you verify that the bonding modes have been set properly. The example below is from a DL585 with bond0 as the NPS SPA fabric network and bond1 as the customer house bonded network:
    # cat /proc/net/bonding/bond0
    Ethernet Channel Bonding Driver: v2.6.1 (October 29, 2004)

    Bonding Mode: load balancing (xor)
    MII Status: up
    MII Polling Interval (ms): 200
    Up Delay (ms): 0
    Down Delay (ms): 0

    Slave Interface: eth0
    MII Status: up
    Link Failure Count: 0
    Permanent HW addr: 00:16:35:7c:de:81

    Slave Interface: eth3
    MII Status: up
    Link Failure Count: 0
    Permanent HW addr: 00:16:35:7c:de:80


    # cat /proc/net/bonding/bond1
    Ethernet Channel Bonding Driver: v2.6.1 (October 29, 2004)

    Bonding Mode: active load balancing (alb)
    MII Status: up
    MII Polling Interval (ms): 100
    Up Delay (ms): 0
    Down Delay (ms): 0

    Slave Interface: eth4
    MII Status: up
    Link Failure Count: 0
    Permanent HW addr: 00:17:36:7d:df:82

    Slave Interface: eth5
    MII Status: up
    Link Failure Count: 0
    Permanent HW addr: 00:17:36:7d:df:81
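
To check the mode of every bond at once, the relevant lines can be pulled out with a single grep; with the two bonds above, the output looks like this:

    # grep "Bonding Mode" /proc/net/bonding/bond*
    /proc/net/bonding/bond0:Bonding Mode: load balancing (xor)
    /proc/net/bonding/bond1:Bonding Mode: active load balancing (alb)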


Switch Configuration and Modes

Custom configuration of the customer's switch is needed for the different modes of bonding. (See the mode descriptions above.)

Only the active-backup (mode 1) and broadcast (mode 3) modes do not require both ports to be on the same switch. There are ways around this for the other modes, but the network administrator would have to configure it appropriately, and it is not a standard configuration.


Mode(s)          Switch-related Notes
Mode 1           The active-backup mode typically works with any Layer 2 switch.
Modes 1, 5, 6    The active-backup, balance-tlb, and balance-alb modes are controlled by software and do not need special switch configuration.
Mode 4           Requires 802.3ad link aggregation. For example, on Cisco, configure a single EtherChannel instance, then set the EtherChannel mode to "lacp" to enable 802.3ad.
Modes 0, 2, 3    The balance-rr, balance-xor, and broadcast modes require EtherChannel or trunking as above.


The switch's transmit (load-balancing) policy also needs to be configured; it is usually XOR by MAC address.

Detailed Bonding Modes Testing

Hardware
  • DL585G1 x 2
  • DL585G2
  • Cisco 3750

Benchmark Tools
  • Netcat

1. Set up a listening server:
    # nc -l -p 7778 > /dev/null

This listens on port 7778 and discards the received data to /dev/null, which removes the hard drive's write-speed cap from the measurement.

2. Transfer a file:
    # time cat test_500mb | nc -s 192.168.0.101 192.168.0.125 7778

This sends the file test_500mb through the local interface 192.168.0.101 to 192.168.0.125 on port 7778. Typically you transfer from a ramdisk, to avoid the hard drive's read-speed cap.
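
A minimal sketch of staging the test file on a tmpfs ramdisk so that disk reads do not cap the transfer (the mount point and sizes are illustrative):

    # mkdir -p /mnt/ramdisk
    # mount -t tmpfs -o size=600m tmpfs /mnt/ramdisk
    # dd if=/dev/zero of=/mnt/ramdisk/test_500mb bs=1M count=500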


Benchmarks
  • Loopback check: 235 MB/s (1882 Mb/s)

Round Robin mode= 0:

Round Robin will transmit (tx) a packet onto each slave in turn, dividing a single connection into multiple streams. The receive side is completely controlled by the switches/routers that the traffic passes through. Round Robin can also have packet-ordering issues, resulting in slower-than-expected speeds and more overhead. Round Robin always transmits packets across all active slave interfaces. (This is the only mode that will split one connection across two EtherChannel members.)

Routing:

Typically, switches route EtherChannels based on one of {src-mac | dst-mac | src-dst-mac | src-ip | dst-ip | src-dst-ip}. Therefore, if you are sending one connection to multiple destinations, you will get 2 Gb/s with dst-mac, dst-ip, or either of the XOR methods; src-mac and src-ip yield only 1 Gb/s, because a bond presents a single static MAC/IP for both slave devices. Likewise, traffic through a gateway or to a single host yields only 1 Gb/s (the EtherChannel treats a gateway the same as a single host). The only way around this is a switch that can route EtherChannels by port (src-port | dst-port | src-dst-port). Typically, the XOR methods give better bandwidth distribution, and the XOR-by-port method gives the best distribution on a round-robin bond.
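
As a simplified illustration of how an XOR policy picks a member link: the switch XORs fields of the two addresses and takes the result modulo the number of links. For the two hosts in the netcat example above, XORing the last octets of 192.168.0.101 and 192.168.0.125 over two links selects link 0 (real hashes use more address bits, but the principle is the same):

    # echo $(( (101 ^ 125) % 2 ))
    0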


Examples:

1 host> eth6/eth7 = Round Robin bond> cisco> 1000 hosts=2 gigabit
(xor,dst)

1000 hosts> eth6/eth7 = Round Robin bond> cisco> 1 host=2 gigabit
(xor only!)

(Requires an L4 switch)
1 host> eth6/eth7 = Round Robin bond> cisco> 1 host=2 gigabit
(src-port | dst-port | src-dst-port only!)

Results:
---------------------------------------------
RR, 2 transfers to 2 different IPs, src-xor-dst-ip hash, 2 slaves active:
Transferred 1024MB in 4.558s
224 MB/s
1797 Mb/s
---------------------------------------------
RR, 2 transfers, 1 slave active:
Transferred 1024MB in 8.513s
120 MB/s
962 Mb/s
----------------------------------------------

Switch Configuration:
-
port-channel load-balance src-dst-ip (if available, src-port | dst-port | src-dst-port is preferred)
-
interface Port-channel1
description rr channel bundle
switchport trunk encapsulation dot1q
switchport mode trunk
spanning-tree portfast trunk
-
interface GigabitEthernet1/0/1
switchport trunk encapsulation dot1q
switchport mode trunk
channel-group 1 mode on
spanning-tree portfast trunk
-
interface GigabitEthernet1/0/2
switchport trunk encapsulation dot1q
switchport mode trunk
channel-group 1 mode on
spanning-tree portfast trunk
-


ALB mode = 6

Software active load balancing mode. This mode routes packets by dst-ip and requires no switch configuration to work; it uses ARP "trickery" to achieve load balance. No throughput increase was observed in these tests.
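
One way to watch how the load spreads across the slaves during a transfer is to poll the per-interface byte counters (eth6 and eth7 are this section's example slaves):

    # watch -n1 'grep -E "eth6|eth7" /proc/net/dev'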

Examples:

1 host> eth6/eth7 = ALB bond> cisco> 1000 hosts=1 gigabit

1000 hosts> eth6/eth7 = ALB bond> cisco> 1 host=1 gigabit

1 host> eth6/eth7 = ALB bond> cisco> 1 host=1 gigabit


Results:
---------------------------------------------
ALB, 2 transfers to 2 different IPs, src-xor-dst-ip hash, 2 slaves active:
Transferred 1024MB in 8.558s
119 MB/s
960 Mb/s
---------------------------------------------
ALB, 2 transfers, 1 slave active:
Transferred 1024MB in 8.513s
120 MB/s
962 Mb/s
----------------------------------------------

Switch Configuration:
-
interface GigabitEthernet1/0/1
switchport trunk encapsulation dot1q
switchport mode trunk
-
interface GigabitEthernet1/0/2
switchport trunk encapsulation dot1q
switchport mode trunk
-

LACP mode = 4

A limitation of link aggregation is that all physical ports in the link aggregation group must reside on the same switch. Nortel's SMLT, DSMLT and RSMLT technologies, as well as Cisco StackWise, Juniper Virtual Chassis, and Extreme Networks XOS remove this limitation by allowing the physical ports to be split between two switches.

Examples:

1 host> eth6/eth7 = LACP bond> cisco> 1000 hosts=2 gigabit

1000 hosts> eth6/eth7 = LACP bond> cisco> 1 host=2 gigabit

1 host> eth6/eth7 = LACP bond> cisco> 1 host=1 gigabit


lacp_rate=<value> — Specifies the rate at which link partners transmit LACPDU packets in 802.3ad mode. Possible values are:
· slow or 0 — Default setting. Partners transmit LACPDUs every 30 seconds.
· fast or 1 — Partners transmit LACPDUs every second.
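
Once the bond is up, the negotiated 802.3ad details (including the LACP rate) can be read from the same /proc file; this is a sketch, and the exact fields vary by driver version:

    # grep -A4 "802.3ad info" /proc/net/bonding/bond1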


Switch Configuration:
-
interface GigabitEthernet1/0/1
switchport trunk encapsulation dot1q
switchport mode trunk
-
interface GigabitEthernet1/0/2
switchport trunk encapsulation dot1q
switchport mode trunk

Note: If the switch's channel-group is not set to an LACP mode (channel-group 1 mode active or passive), you will see the following:
    [root@nps26002 ~]# cat /proc/net/bonding/bond1
            Partner Mac Address: 00:00:00:00:00:00

When correctly configured, you should see a MAC address in the Partner Mac Address field:
    [root@nps26002 ~]# cat /proc/net/bonding/bond1
            Partner Mac Address: 00:1f:26:d4:b1:80

Refer to the Matrix of Cisco Load Balancing Methods.

Enable src-dst-ip load balancing on the switch. Without src-dst-ip enabled, packets route based on MAC address, effectively disabling the bond's load balancing.

    Switch# configure
    Configuring from terminal, memory, or network [terminal]?
    Enter configuration commands, one per line.  End with CNTL/Z.
    Switch(config)#port-channel load-balance ?
      dst-ip       Dst IP Addr
      dst-mac      Dst Mac Addr
      src-dst-ip   Src XOR Dst IP Addr
      src-dst-mac  Src XOR Dst Mac Addr
      src-ip       Src IP Addr
      src-mac      Src Mac Addr

    Switch(config)#port-channel load-balance src-dst-ip  

    Switch#show etherchannel load-balance
    EtherChannel Load-Balancing Operational State (src-dst-ip):
    Non-IP: Source XOR Destination MAC address
      IPv4: Source XOR Destination IP address
      IPv6: Source XOR Destination IP address


Example: Set up a Cisco 3750 for LACP (mode 4 only) or PAgP bonding

This example is provided for Cisco IOS configuration (Cisco CatOS-based devices require alternate configuration).

We require:

1. A port-channel interface to represent the bonded/teamed interfaces.
2. Physical interfaces to join the port-channel.
3. The configurations below to add trunking information.

Note: For PAgP (modes 0, 2, 3), substitute the following settings:
  • channel-protocol pagp
  • channel-group 1 mode auto

Create the port-channel interface:
    configure terminal
    interface Port-channel1
    description LACP Channel Bundle for virt-host-1
    switchport
    switchport trunk encapsulation dot1q
    switchport trunk allowed vlan all
    switchport trunk native vlan 1
    switchport mode trunk
    no ip address
    no shutdown
    exit

Add the physical interfaces to the bonded interface (maximum of 8). Note that the "switchport" settings must match those configured on the port-channel.
    !
    configure terminal
    interface GigabitEthernet1/0/1
    description virt-host-1 port 1 (eth0)
    switchport
    switchport trunk encapsulation dot1q <<< sets trunk type, dot1q vs Cisco ISL
    switchport trunk allowed vlan all <<< select vlans to expose on trunk
    switchport trunk native vlan 1 <<< forces the native VLAN (see warning)
    switchport mode trunk <<< puts the interface into trunk (802.1q) mode
    no ip address
    no shutdown
    channel-protocol lacp <<< selects lacp as the bonding protocol (802.3ad)
    channel-group 1 mode active <<< channel-group # must match port-channel #
    !
    configure terminal
    interface GigabitEthernet1/0/2
    description virt-host-1 port 2 (eth1)
    switchport
    switchport trunk encapsulation dot1q
    switchport trunk allowed vlan all
    switchport trunk native vlan 1
    switchport mode trunk
    no ip address
    no shutdown
    channel-protocol lacp
    channel-group 1 mode active
    !
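
After both physical ports join the group, the bundle can be verified on the switch with show etherchannel summary; a healthy LACP bundle lists the port-channel flagged SU (Layer 2, in use) with each member port flagged (P):

    Switch# show etherchannel summary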

See the Cisco documentation for a description of the channel-group command.


[{"Product":{"code":"SSULQD","label":"IBM PureData System"},"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Component":null,"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"1.0.0","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Historical Number

NZ029377

Document Information

Modified date:
17 October 2019

UID

swg21586893