Ubuntu 14.04 LACP + VLAN Setup

Using LACP and tagged VLANs in Ubuntu 14.04

 

Setting up a load-balanced, VLAN-enabled link in Ubuntu isn't all roses.

We set this up ourselves a few days back, and on this page you'll find what we configured.

Some things are really quirky, but I don't think it's possible to completely fix this in Ubuntu.

This article is here to help you as much as possible.

 

A short overview

LACP is a modern protocol for combining multiple network interfaces into one. It actively exchanges status data between partners (switch and server) and is very easy to configure on most devices. Failures are detected reliably and quickly, and the traffic of a failed link is automatically redistributed among the remaining ones.

No bandwidth is wasted for a "failover" link.

Adding tagged VLAN interfaces on top of the link makes it easy to deploy new networks to a system without adding more cabling. If you want redundancy for many networks, it's a lot easier to deploy those networks on top of a single existing redundant link.

With better switches the configuration is automatic to the point that you only need to plug in your cables and the switch will match your server's cables into the same group.

The big picture 

 

The diagram shows the available interfaces (eth0-eth3), the bonding that merges them (the mess of water), the VLANs assigned on top of it, and the advanced direction of outgoing traffic across the links (the river).

 

Setup

Packages

APT package installation
 # apt-get install ifenslave vlan ethtool

 

All of these are needed to have it actually work; the error messages you get without them are highly misleading. bond0 will not come up if ifenslave isn't there to assign the slave interfaces, but it will *not* tell you so.

Kernel modules

/etc/modules
# cat /etc/modules 
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
# Parameters can be specified after the module name.
lp
rtc
bonding
8021q

 

The kernel modules are also needed to have it actually work. The error messages you get without them pre-loaded are also highly misleading :)
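If you'd rather not wait for a reboot, the modules can be loaded immediately. A quick sketch, assuming a stock Ubuntu 14.04 kernel with both modules available (needs root):

```shell
# load the bonding and 802.1q VLAN modules right away
modprobe bonding
modprobe 8021q

# confirm both are now loaded
lsmod | grep -E '^(bonding|8021q)'
```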

 

Kernel module options

/etc/modprobe.d/bonding.conf
options bonding max_bonds=4 mode=4 lacp_rate=1 miimon=100 xmit_hash_policy=layer3+4 use_carrier=0

 

In theory this should not be needed, because all of it could be configured in /etc/network/interfaces. The reality differs: the bonding module is loaded by the kernel, and the kernel sets the bonding default mode. That default is active-passive, and no matter what the documentation says, you'll end up with an active-passive bond (as seen in /proc/net/bonding/bond0). So to make sure your options aren't ignored, configure the bonding module correctly before it is loaded. Note that module parameter names use underscores (lacp_rate, xmit_hash_policy), not hyphens.
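To check whether the options actually took effect, the runtime state can be inspected via sysfs. A sketch, assuming bond0 already exists:

```shell
# should print "802.3ad 4" if mode=4 was honored (assumption: bond0 is up)
cat /sys/class/net/bond0/bonding/mode
cat /sys/class/net/bond0/bonding/lacp_rate   # expect "fast 1"
grep 'Bonding Mode' /proc/net/bonding/bond0
```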

max_bonds=4 creates additional bonding devices for future use; if you'd rather not have that, set it to max_bonds=1.

mode=4 sets the mode to active LACP (IEEE 802.3ad), where the server and the uplink switch start exchanging LACPDU data.

lacp_rate=1 sets the interval for LACPDU exchanges to "fast", meaning one is sent every second. On paper a failover should occur after 3 missed LACPDUs, but in reality it'll usually take just a single second.

miimon=100 tells Linux to query the "media independent interface" for link status every 100 ms.

use_carrier=0 tells Linux to not query the driver itself for the link status. Apparently some drivers don't implement that, but DON'T have a way of telling the client so. That means they'll report the link as down even if the card DOES HAVE LINK. Guess how I found out. So please test, and if necessary disable the feature.

xmit_hash_policy=layer3+4 tells the bonding driver to load-balance traffic based on the source and destination IP addresses AND the port numbers involved. This way, in an ideal world, you can exceed single-link throughput between two nodes. Any switches along the path would need to support this too; unfortunately it is NOT a feature found in lower-end switches, nor do all midrange switches implement it. It should still work for systems connected to the same switch, so let's do the best we can!
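To get a feel for how layer3+4 balancing spreads flows, here is a deliberately simplified toy version of such a hash in shell. The real kernel function mixes the headers differently, so treat this purely as an illustration:

```shell
# Toy flow hash (NOT the kernel's exact formula): XOR the port numbers
# and the last IP octets, then take the result modulo the slave count.
hash_flow() {
  # $1=src ip last octet  $2=dst ip last octet  $3=src port  $4=dst port  $5=slave count
  printf '%d\n' $(( ( ($3 ^ $4) ^ ($1 ^ $2) ) % $5 ))
}

hash_flow 10 1 45000 80 4   # prints 3 -> this flow sticks to slave 3
hash_flow 10 1 45001 80 4   # prints 2 -> a second flow may land on another slave
```

Each flow keeps all of its packets on a single slave (so no reordering), but different flows can land on different slaves, which is what lets the aggregate throughput exceed a single link.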

 

# update-initramfs -u

I'm not sure if this step is needed to include the module config in the initrd.

 

Networking configuration

The actual setup is done in /etc/network/interfaces.

Any error in the "interfaces" file breaks ALL your networking configuration. Please have emergency console access available before you start.

# The loopback network interface
auto lo
iface lo inet loopback


# all physical interfaces are set to manual
# and assigned to a master
auto eth0
  iface eth0 inet manual
  bond-master bond0

auto eth1
  iface eth1 inet manual
  bond-master bond0

auto eth2
  iface eth2 inet manual
  bond-master bond0

auto eth3
  iface eth3 inet manual
  bond-master bond0

# the bond master interface
# also set the bonding parameters for reference
# assign all slaves (this calls ifenslave)
auto bond0
iface bond0 inet manual
  bond-mode 4
  bond-miimon 100
  bond-lacp-rate 1
  bond-xmit-hash-policy layer3+4
  bond-slaves eth0 eth1 eth2 eth3

# set up a vlan interface
auto bond0.20
iface bond0.20 inet static
  address 192.168.20.10
  netmask 255.255.255.0
  network 192.168.20.0
  broadcast 192.168.20.255
  gateway 192.168.20.1
  dns-nameservers 192.168.20.1
  dns-search my.domain.tld
  up link set $IFACE up
  down link set $IFACE down
# not using those yet since I don't fully understand mstp behind bond.
##  mstpctl_ports bond0.20
##  mstpctl_stp on
auto bond0.30
iface bond0.30 inet static 
  address 192.168.30.10
  netmask 255.255.255.0
  network 192.168.30.0
  broadcast 192.168.30.255
  gateway 192.168.30.1
  dns-nameservers 192.168.30.1
  dns-search myother.domain.tld
  up link set $IFACE up
  down link set $IFACE down
##  mstpctl_ports bond0.30
##  mstpctl_stp on
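Since a typo in this file takes down all your networking, it may be worth sanity-checking it before an ifup. The sketch below checks one easy-to-get-wrong invariant: every ethN marked auto should also appear on the bond-slaves line. It runs against a trimmed inline sample so it works anywhere; point IFACES at the real /etc/network/interfaces on the server:

```shell
# trimmed sample stanza (copy of the config above)
cat > /tmp/interfaces.sample <<'EOF'
auto eth0
  iface eth0 inet manual
  bond-master bond0
auto bond0
iface bond0 inet manual
  bond-slaves eth0 eth1 eth2 eth3
EOF
IFACES=/tmp/interfaces.sample   # set to /etc/network/interfaces for real use

# interfaces on the bond-slaves line
slaves=$(awk '/bond-slaves/ { for (i = 2; i <= NF; i++) print $i }' "$IFACES")
# ethN interfaces brought up automatically
masters=$(awk '$1 == "auto" && $2 ~ /^eth/ { print $2 }' "$IFACES")

result=$(for m in $masters; do
  if echo "$slaves" | grep -qx "$m"; then
    echo "$m: listed in bond-slaves"
  else
    echo "$m: MISSING from bond-slaves"
  fi
done)
echo "$result"
```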

 

 

These two lines from the above file seem to cause an error on ifup/ifdown. They're straight from the documentation, but they appear to be missing the `ip` prefix (i.e. `up ip link set $IFACE up`); as written, the shell has no `link` command to run. The end result worked for me regardless. Just be warned.

  up link set $IFACE up
  down link set $IFACE down

 

There are a few parts of Ubuntu that just don't work, e.g. having a bridge or a bond like the above causes a long hang on shutdown/reboot, because the Ubuntu networking scripts are not able to handle advanced networking setups.

In this case we decided to live with it, since this is a main fileserver that is up almost all the time, and we can handle a timeout on the very rare reboots.

 

Checking if it works

 

Bonding status

LACP is a pretty smart protocol, so you can actually see your partner's info there. Yes, your switch has to be visible in the bonding status!

If it's showing a Partner Mac Address of 00:00:00:00:00:00, it's not working yet. Re-check that LACP is actually enabled on the switch, and specifically on the ports you are connected to.

On higher-end HP / Huawei switches it'd boil down to something like "configure, interface range 1-24, lacp enable".
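This "is my partner visible" check can be scripted. The sketch below parses the status format shown further down, using an inline sample so it can run anywhere; on the server, read /proc/net/bonding/bond0 instead:

```shell
# sample with a failed negotiation; replace with /proc/net/bonding/bond0
cat > /tmp/bond0.status <<'EOF'
Active Aggregator Info:
	Partner Mac Address: 00:00:00:00:00:00
EOF

partner=$(awk -F': ' '/Partner Mac Address/ { print $2 }' /tmp/bond0.status)
if [ "$partner" = "00:00:00:00:00:00" ]; then
  msg="LACP not negotiated - check switch-side configuration"
else
  msg="LACP partner: $partner"
fi
echo "$msg"
```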

 

So, here's the full status:

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
	Aggregator ID: 1
	Number of ports: 3
	Actor Key: 9
	Partner Key: 1539
	Partner Mac Address: 00:YY:YY:YY:YU:b2

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:XX:XX:XX:XX:X1
Aggregator ID: 1
Slave queue ID: 0


Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:XX:XX:XX:XX:X2
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:XX:XX:XX:XX:X3
Aggregator ID: 3
Slave queue ID: 0

Slave Interface: eth3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:XX:XX:XX:XX:X4
Aggregator ID: 1
Slave queue ID: 0
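One detail worth a second look in the output above: eth0 reports Aggregator ID 3 while the active aggregator is 1, so eth0 is not actually part of the active bundle (hence "Number of ports: 3" on a four-port bond). A sketch of a check for exactly that, run here against a trimmed inline sample (use /proc/net/bonding/bond0 for real):

```shell
# trimmed sample of the status above
cat > /tmp/bond0.status <<'EOF'
Active Aggregator Info:
	Aggregator ID: 1
Slave Interface: eth1
Aggregator ID: 1
Slave Interface: eth0
Aggregator ID: 3
EOF

# aggregator ID directly following the "Active Aggregator Info" header
active=$(grep -A1 'Active Aggregator Info' /tmp/bond0.status | awk '/Aggregator ID/ { print $3 }')

# compare each slave's aggregator ID against the active one
report=$(awk -v a="$active" '
  /^Slave Interface:/ { s = $3 }
  /Aggregator ID:/ && s { if ($3 == a) print s ": in active aggregator";
                          else print s ": NOT in active aggregator (ID " $3 ")";
                          s = "" }
' /tmp/bond0.status)
echo "$report"
```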

 

 

VLAN Config

You can view your VLAN config through /proc, and for managing VLANs, if needed, there's also the vconfig utility, which the vlan package uses under the hood.

vlan/config
# cat /proc/net/vlan/config 
VLAN Dev name	 | VLAN ID
Name-Type: VLAN_NAME_TYPE_RAW_PLUS_VID_NO_PAD
bond0.20       | 20  | bond0
bond0.30       | 30  | bond0
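For one-off experiments, a VLAN interface can also be added by hand (non-persistent, gone after a reboot). A sketch assuming root and the bond above; VLAN 40 and the 192.168.40.x address are made-up examples:

```shell
# either the legacy tool from the vlan package...
vconfig add bond0 40
# ...or the iproute2 equivalent (pick one, both create bond0.40):
# ip link add link bond0 name bond0.40 type vlan id 40

ip addr add 192.168.40.10/24 dev bond0.40
ip link set bond0.40 up
```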

 

As you can see, the interface is rather awkward, shabby, opaque and unreliable.

That's one of the reasons why Open vSwitch is a good thing to consider, and why /etc/network/interfaces should also die in a fire.

But this is what we have now, and this is how you verify it's working :)