Chris Friesen
2013-Feb-13 00:30 UTC
[Bridge] how to handle bonding failover when using a bridge over the bond?
On 02/12/2013 06:02 PM, Jay Vosburgh wrote:
> Chris Friesen <chris.friesen at genband.com> wrote:
>
>> I've got a scenario that seems to be not well handled with the current
>> bonding code in linux, but maybe I'm missing something.
>>
>> I have a physical host with two ethernet links that are bonded together
>> (active/backup). Each link is connected to a separate L2 switch, which
>> are in turn connected with a crosslink for redundancy.
>>
>> The physical host is running multiple virtual machines each with a virtual
>> adapter. The virtual adapters and the bond are all bridged together to
>> allow communication between the virtual machines, the host, and the
>> outside world.
>>
>> Now suppose one of the slave links fails. The bond device will failover to
>> the other slave and send out a gratuitous arp on the newly active slave.
>> This will cause the L2 switches to update their lookup tables for the MAC
>> address associated with the bond (so it now points to the newly active
>> slave), but doesn't update the MAC addresses associated with the various
>> virtual machines. If someone on the network sends a packet to one of the
>> virtual machines, the switch will try to send it over the failed slave.
>
> If the link failure is such that there is no carrier on the
> switch port, the switch will drop the forwarding entry for the virtual
> machine's MAC address from that port. The traffic for the VM's MAC
> would then flood to all ports, presumably including the link to the
> other switch, which wouldn't have a forwarding entry for the MAC, either
> (or it would be the switch link port), and would also flood it to all
> ports, one of which is the correct one.

This makes sense, though it wouldn't cover the case where the link only loses carrier in one direction, or if the bond is using arp failover and something fails beyond the first hop.

> Is this actually failing for you, or is this a thought
> experiment?

It actually failed. During a customer demo. :)

From what I understand it was a physical link pull, which (based on what you say above) should have caused the switch to react appropriately. I'll see if I can get some more information. Maybe the switches weren't behaving properly or something.

>> What's the recommended solution for this? The logical solution would seem
>> to be to have something issue GARPs for each virtual machine when the bond
>> device fails over, but there doesn't seem to be any way to register for
>> notification (via rtnetlink for instance) when the bond fails over. I
>> could monitor for carrier loss, but that wouldn't work for the case where
>> bonding is using arp monitoring.
>
> There is a NETDEV_BONDING_FAILOVER notifier that is called for
> active-backup mode when a new active slave is assigned. The
> rtnetlink_event function is on that chain, and will send an rtnetlink
> message, although I don't see that the actual event is included in the
> message.

If I'm reading this right it will end up sending an RTM_NEWLINK message, which seems a bit odd.

> The bond doesn't track all of the MACs that go through it, but
> the bridge presumably does, and could respond to the FAILOVER notifier
> with something to notify the switch that the port assignments for the
> various MACs have changed.

That would probably make sense. I've added the bridging folks, maybe they'll have a suggestion how this sort of thing should be handled.

Chris
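
To make the "issue GARPs for each virtual machine" idea above concrete, here is a rough sketch of what such a helper might look like. It is only an illustration, not existing bonding or bridge code: the names build_garp and send_garps and the example MAC and IP address are hypothetical, and it assumes that whatever runs it already knows each VM's MAC and IP. The frame uses the ARP request form with a zeroed target hardware address, which is one common gratuitous ARP layout.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to its 6-byte form."""
    return bytes.fromhex(mac.replace(":", ""))


def build_garp(mac: str, ip: str) -> bytes:
    """Build a 42-byte broadcast gratuitous ARP request for (mac, ip).

    The Ethernet source is the VM's MAC, which is the field a learning
    switch uses to update its forwarding table.
    """
    sha = mac_to_bytes(mac)      # sender hardware address
    spa = socket.inet_aton(ip)   # sender protocol address
    eth = BROADCAST_MAC + sha + struct.pack("!H", ETHERTYPE_ARP)
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, oper=1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += sha + spa + b"\x00" * 6 + spa   # target IP equals sender IP
    return eth + arp


def send_garps(ifname: str, vms) -> None:
    """Send one gratuitous ARP per (mac, ip) pair out of interface ifname.

    Assumption: Linux AF_PACKET raw sockets and CAP_NET_RAW are available.
    """
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((ifname, 0))
        for mac, ip in vms:
            sock.send(build_garp(mac, ip))


# Hypothetical VM address, used only to show the frame layout.
frame = build_garp("52:54:00:12:34:56", "192.0.2.10")
```

Something would still have to call send_garps when the bond fails over, which is exactly the notification problem discussed above; the sketch covers only the frame itself.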
Chris Friesen
2013-Feb-13 17:14 UTC
[Bridge] how to handle bonding failover when using a bridge over the bond?
On 02/12/2013 06:30 PM, Chris Friesen wrote:
> On 02/12/2013 06:02 PM, Jay Vosburgh wrote:
>> Chris Friesen <chris.friesen at genband.com> wrote:

>>> I have a physical host with two ethernet links that are bonded
>>> together (active/backup). Each link is connected to a separate L2
>>> switch, which are in turn connected with a crosslink for
>>> redundancy.
>>>
>>> The physical host is running multiple virtual machines each with
>>> a virtual adapter. The virtual adapters and the bond are all
>>> bridged together to allow communication between the virtual
>>> machines, the host, and the outside world.
>>>
>>> Now suppose one of the slave links fails. The bond device will
>>> failover to the other slave and send out a gratuitous arp on the
>>> newly active slave. This will cause the L2 switches to update
>>> their lookup tables for the MAC address associated with the bond
>>> (so it now points to the newly active slave), but doesn't update
>>> the MAC addresses associated with the various virtual machines.
>>> If someone on the network sends a packet to one of the virtual
>>> machines, the switch will try to send it over the failed slave.
>>
>> If the link failure is such that there is no carrier on the switch
>> port, the switch will drop the forwarding entry for the virtual
>> machine's MAC address from that port. The traffic for the VM's MAC
>> would then flood to all ports, presumably including the link to
>> the other switch, which wouldn't have a forwarding entry for the
>> MAC, either (or it would be the switch link port), and would also
>> flood it to all ports, one of which is the correct one.

I talked with our networking guy. Apparently what is happening is that if we pull the link to switch A, it drops the forwarding entries for all MACs on the downed link, but switch B still has stale entries pointing to the inter-switch link.

If a packet destined for the VM arrives at switch B, B will send it across to switch A. (Which is pointless since A no longer has a working link to the MAC in question.)

If a packet destined for the VM arrives at switch A, A will broadcast it to all ports, including the inter-switch link to switch B. However, switch B still thinks the MAC address is connected to switch A, so it drops the packet.

Once the VMs send out packets, switch B will update its tables, but if the VMs are event-driven and mostly only respond to incoming packets, they could end up waiting a long time.

Chris
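
The stale-entry behaviour described above can be reproduced with a toy model of two learning switches. This is a simplified sketch, not a model of any particular switch: the Switch class, the port names ("host", "isl", "client"), and the VM MAC are all made up for illustration, and learning of the sender's own MAC is left out because only the VM's entry matters here.

```python
class Switch:
    """Minimal learning switch: a set of live ports and a MAC table."""

    def __init__(self, ports):
        self.ports = set(ports)   # ports that currently have carrier
        self.fdb = {}             # MAC -> port

    def learn(self, mac, port):
        self.fdb[mac] = port

    def link_down(self, port):
        """Carrier loss: drop the port and every entry that points at it."""
        self.ports.discard(port)
        self.fdb = {m: p for m, p in self.fdb.items() if p != port}

    def egress(self, dst, in_port):
        """Ports a frame for dst leaves on when received on in_port."""
        out = self.fdb.get(dst)
        if out is None:
            return sorted(self.ports - {in_port})   # unknown MAC: flood
        if out == in_port or out not in self.ports:
            return []                               # filtered
        return [out]


def reaches_vm(switches, start, dst, active):
    """True if a frame entering `start` on its client port gets out of
    the host-facing port of switch `active`, where the live slave is."""
    pending, seen = [(start, "client")], set()
    while pending:
        name, in_port = pending.pop()
        if (name, in_port) in seen:
            continue
        seen.add((name, in_port))
        for out in switches[name].egress(dst, in_port):
            if out == "host" and name == active:
                return True
            if out == "isl":
                pending.append(("B" if name == "A" else "A", "isl"))
    return False


VM = "52:54:00:00:00:01"   # hypothetical VM MAC
sw = {"A": Switch(["host", "isl", "client"]),
      "B": Switch(["host", "isl", "client"])}

# Before failover the active slave is on switch A.
sw["A"].learn(VM, "host")
sw["B"].learn(VM, "isl")

# Link to switch A is pulled; the bond fails over to the slave on B.
sw["A"].link_down("host")
before = (reaches_vm(sw, "B", VM, "B"), reaches_vm(sw, "A", VM, "B"))

# The VM finally transmits through the new active slave.
sw["B"].learn(VM, "host")
sw["A"].learn(VM, "isl")
after = (reaches_vm(sw, "B", VM, "B"), reaches_vm(sw, "A", VM, "B"))
```

In this model both entries of before are False, matching the two cases above, and both entries of after are True once the VM has sent a frame through the new active slave.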