Using LACP and tagged VLANs in Ubuntu 14.04
Setting up a load-balanced, VLAN-enabled link in Ubuntu isn't all roses.
We did this very thing a few days back, and on this page you'll find what we configured.
Some things are really quirky, but I don't think it's possible to completely fix this in Ubuntu.
This article is just there to help you as much as possible.
A short overview
LACP is a modern protocol for combining multiple network interfaces into one. It actively exchanges status data between partners (switch and server) and is very easy to configure on most devices. Failures are detected reliably and fast. Traffic from a failed part of the link is automatically redistributed among the remaining ones.
No bandwidth is wasted for a "failover" link.
Adding tagged VLAN interfaces on top of the link lets you easily deploy new networks to a system without adding more cabling. If you want redundancy for many networks, it's a lot easier to deploy those networks on top of an existing redundant link.
With better switches the configuration is so automatic that you only need to plug in your cables, and the switch will automatically put your server's ports into the same group.
The big picture
You see the available interfaces (eth0-eth3), the bonding (mess of water), the VLAN assignment (above the mess) and the advanced direction of outgoing traffic (river).
Both are needed for this to actually work. Error messages without them are highly misleading. bond0 will not come up if ifenslave isn't there to assign interfaces, but it will *not* tell you so.
The kernel modules are also needed for this to actually work. Error messages without them pre-loaded are also highly misleading :)
Kernel module options
In theory this should not be needed because it could all be configured in /etc/network/interfaces. But reality differs. The bonding module is loaded by the kernel, and the kernel sets the bonding default mode. The default we saw was active-passive, and no matter what the documentation says, you'll end up with an active-passive bond (as seen in /proc/net/bonding/bond0). So to make sure your options aren't ignored, set the bonding module's options before it is loaded.
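As a rough sketch, the options discussed in this section can be assembled into a single modprobe "options" line. The helper name `modprobe_options_line` is made up for illustration, and the target file name /etc/modprobe.d/bonding.conf is an assumption rather than something taken from this setup. The hash policy is spelled with underscores here because that is how `modinfo bonding` lists the parameter.

```python
# Sketch: build a modprobe "options" line for the bonding module.
# Values mirror the options explained in this article. The file name
# /etc/modprobe.d/bonding.conf is an assumption for illustration.

BONDING_OPTIONS = {
    "max_bonds": 4,
    "mode": 4,
    "lacp_rate": 1,
    "miimon": 100,
    "use_carrier": 0,
    "xmit_hash_policy": "layer3+4",  # spelling as listed by `modinfo bonding`
}


def modprobe_options_line(module, options):
    """Return one 'options <module> key=value ...' line for modprobe.d."""
    pairs = " ".join(f"{key}={value}" for key, value in options.items())
    return f"options {module} {pairs}"


if __name__ == "__main__":
    # Prints the line you would place in the modprobe.d file.
    print(modprobe_options_line("bonding", BONDING_OPTIONS))
```

The printed line would go into the modprobe.d file before the module is loaded, which is the point of this section: the options have to be in place first.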
max_bonds=4 generates additional bonding devices for future use. If you'd rather not have that, set it to max_bonds=1.
mode=4 sets the mode to active LACP (IEEE 802.3ad), where the server and the uplink switch start exchanging LACPDU data.
lacp_rate=1 sets the interval for LACPDU exchanges to "fast", meaning one is sent every second. At worst a failover should occur after 3 retries, but in reality it'll usually take just a single second.
miimon=100 tells Linux to query the "media independent interface" for link status every 100ms.
use_carrier=0 tells Linux not to query the driver itself for the link status. Apparently some Linux drivers don't implement that but DON'T have a way of telling the client so. That means they'll say the link is down even if the card DOES HAVE LINK. Guess how I found out. So please test, and if necessary disable the feature.
xmit-hash-policy=layer3+4 tells the bonding driver to load balance the traffic based on the source and destination IP addresses AND the port numbers involved. This way, in an ideal world, you can exceed single-line rate throughput between two nodes. If there are switches along the path, they would need to support this too, and unfortunately it is NOT a feature found in lower-end switches, nor do all midrange switches implement it. It should still work for systems connected to the same switch, and also, let's just try as well as we can!
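To make the layer3+4 idea more concrete, here is a simplified illustration of how such a hash could pick an outgoing interface. It follows the formula given in older kernel bonding documentation (ports XORed with the low 16 bits of the XORed IP addresses, modulo the number of slaves). The real kernel implementation differs between versions, so treat this as a model only; the function name `layer3_4_slave` is made up for this sketch.

```python
import ipaddress


def layer3_4_slave(src_ip, dst_ip, src_port, dst_port, slave_count):
    """Simplified layer3+4 style hash: return a slave index for one flow.

    Illustration only; not the exact hash used by any given kernel version.
    """
    ip_xor = int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    return ((src_port ^ dst_port) ^ (ip_xor & 0xFFFF)) % slave_count


if __name__ == "__main__":
    # Two connections between the same two hosts, differing only in source port.
    first = layer3_4_slave("10.0.0.1", "10.0.0.2", 40000, 445, 2)
    second = layer3_4_slave("10.0.0.1", "10.0.0.2", 40001, 445, 2)
    print(first, second)
```

The takeaway is that one connection always stays on one interface, so a single TCP stream never exceeds the rate of one link. Only several connections with different ports can spread across the bond.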
I'm not sure if this step is needed to include the module config in the initrd.
The actual setup is done in /etc/network/interfaces.
Any error in the "interfaces" file breaks ALL your networking configuration. Please have emergency console access available before you start.
These two lines from the above file seem to cause an error on ifup/ifdown. They're straight from the documentation, and the end result worked for me. Just be warned.
There are a few parts in Ubuntu that just don't work. For example, having a bridge or a bond like above will cause a long hang on shutdown/reboot because the Ubuntu networking scripts are not able to handle advanced networking.
In this case we decided to live with this, since it's a main fileserver that is almost always up and we can handle a timeout on the very rare reboots.
Checking if it works
LACP is a pretty smart protocol, so you can actually see your partner's info there. Yes, your switch has to be visible in the bonding status!
If it's showing a Partner Mac Address of 00:00:00:00:00:00, it's not working yet. You might need to re-check that LACP is actually enabled on the switch, and also on the ports you are connected to.
On high-end HP / Huawei it'd boil down to "configure, interface range 1-24, lacp enable"
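The partner check can also be scripted. The following is a minimal sketch that looks for the "Partner Mac Address" line in the text of /proc/net/bonding/bond0 and treats an all-zero address as "not negotiated yet". The function names are made up for illustration, and the exact layout of the status file may vary between kernel versions.

```python
import re

NULL_MAC = "00:00:00:00:00:00"


def read_bond_status(path="/proc/net/bonding/bond0"):
    """Read the bonding status text for one bond device."""
    with open(path) as handle:
        return handle.read()


def lacp_partner_mac(status_text):
    """Return the first 'Partner Mac Address' value, or None if absent."""
    match = re.search(r"Partner Mac Address:\s*([0-9A-Fa-f:]{17})", status_text)
    return match.group(1).lower() if match else None


def lacp_negotiated(status_text):
    """True if a partner MAC is present and is not the all-zero address."""
    mac = lacp_partner_mac(status_text)
    return mac is not None and mac != NULL_MAC
```

On the server you would call `lacp_negotiated(read_bond_status())`. A False result points back at the switch-side LACP settings described above.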
So, here's the full status:
You can view your VLAN config through /proc, and for managing them, if needed, there's also the vconfig utility, which is always used under the hood.
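If you want the VLAN list in a more usable form, a small parser helps. This sketch assumes the VLAN information is read from /proc/net/vlan/config and that data rows look like `device | id | parent`. Both the path and the row layout are assumptions about the 8021q module's output, and the sample text in the test run is made up.

```python
def parse_vlan_config(text):
    """Parse /proc/net/vlan/config style text.

    Returns a list of (device, vlan_id, parent_device) tuples.
    Assumes data rows have three '|' separated fields with a numeric ID.
    """
    entries = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Header lines have fewer fields or a non-numeric second field.
        if len(fields) == 3 and fields[1].isdigit():
            entries.append((fields[0], int(fields[1]), fields[2]))
    return entries


if __name__ == "__main__":
    # Made-up sample in the assumed layout, not output from a real system.
    sample = (
        "VLAN Dev name    | VLAN ID\n"
        "Name-Type: VLAN_NAME_TYPE_RAW_PLUS_VID_NO_PAD\n"
        "bond0.10         | 10  | bond0\n"
        "bond0.20         | 20  | bond0\n"
    )
    for device, vlan_id, parent in parse_vlan_config(sample):
        print(device, vlan_id, parent)
```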
As you can see, the interface is rather awkward, shabby, opaque and unreliable.
That's one of the reasons why Open vSwitch is a good thing to consider, and /etc/network/interfaces should also die in a fire.
But this is what we have now, and this is how you verify it's working :)