Hi All,

Having a very strange problem where a VM's bridge will spontaneously stop bridging traffic. This only seems to occur on our 10gig interfaces (Intel x540 on the ixgbe driver, MTU 9000), which are 2x links bonded into bond0, then broken down into pvlan462/pvlan463/etc. before being bridged with the DomUs. Everything works great at first, but several hours after starting a large rsync, traffic stops crossing the bridge. Once it has stopped working, it only affects that single VM on that single interface. Other VMs on the same dom0 still have access to the same affected VLAN.

Layout is Nexenta NFS ---> 2x arista 10gig switches --> intel x540-t2 (ixgbe) on dom0 --802.3ad--> bond0 --vconfig--> vlan 462 --bridged--> pvlan 462 / vif4.1 / vif6.1.

Dom0 is running kernel 3.2.28 w/ xen 4.1.3, domU is kernel 2.6.32.27.

xen3 ~ # brctl show
bridge name     bridge id               STP enabled     interfaces
vlan462         8000.a0369f0eac2c       no              pvlan462
                                                        vif4.1
                                                        vif6.1
vlan463         8000.a0369f0eac2c       no              pvlan463
                                                        vif5.1

Once it breaks, running tcpdump inside the VM or on the dom0 against the vif shows the same ARP traffic from the VM (looking for the NFS server), but nothing incoming to the VM at all. Running tcpdump on the parent bridge shows the traffic as normal, and other VMs on this bridge still have regular access; only the single vif is affected.

I've tried toggling net.bridge.bridge-nf-call-(arp|ip|ip6)tables off and it didn't seem to make a difference (also flushed all ip/eb/arptables rules just in case).

It takes me several hours to reproduce just by copying data, and I haven't managed to figure out a nice small test case yet or what triggers the break. Considering I've found one bug in ixgbe already (reported + fixed!) I suspect the 10gig driver, but it seems like this problem would come from either xen or bridging. This feels like a xen net back/front issue?

Any ideas? Or suggestions on where to start looking?

Thanks!

- Nathan
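For anyone retracing the checks described in this post, here is a rough sketch of them as a script. It assumes the interface names from the `brctl show` output (vif4.1 and vlan462). The helper names `run`, `disable_bridge_nf` and `capture_arp`, the `DRY_RUN` variable, and the 20-packet capture limit are made up for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Turn off the three bridge netfilter hooks named in the post.
disable_bridge_nf() {
    for t in arptables iptables ip6tables; do
        run sysctl -w "net.bridge.bridge-nf-call-$t=0"
    done
}

# Capture ARP on one interface. -c 20 is an arbitrary limit so that
# the command returns on its own; -e prints link-level headers.
capture_arp() {
    run tcpdump -n -e -c 20 -i "$1" arp
}

disable_bridge_nf
capture_arp vif4.1     # the affected guest interface
capture_arp vlan462    # the parent bridge
```

Running it with `DRY_RUN=0` as root executes the commands for real. Comparing the two captures shows whether ARP replies reach the bridge but never show up on the vif.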
Andrew Cooper
2012-Sep-18 18:40 UTC
Re: VM spontaneously losing network on 10gig interface
On 18/09/2012 00:00, Nathan March wrote:
> [...]
> Any ideas? Or suggestions on where to start looking?

What happens if you detach the vif from the bridge and reattach it - does the problem go away?

~Andrew
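The detach/reattach test can be done by hand with `brctl delif` and `brctl addif`. Below is a minimal sketch, assuming the bridge and vif names from the `brctl show` output in the original post. The helper names `run` and `reattach_vif` and the `DRY_RUN` variable are hypothetical. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Remove a vif from its bridge, then add it back.
# $1 = bridge name, $2 = vif name
reattach_vif() {
    bridge=$1
    vif=$2
    run brctl delif "$bridge" "$vif"
    run brctl addif "$bridge" "$vif"
}

reattach_vif vlan462 vif4.1
```

If traffic resumes after running this with `DRY_RUN=0`, that would point more toward bridge state than toward the guest's netfront.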
On 9/17/2012 4:00 PM, Nathan March wrote:
> Having a very strange problem where a VM's bridge will spontaneously
> stop bridging traffic.
> [...]

In case anyone else runs into something like this, it appears to be a bug in ixgbe. Switching to the latest driver release, built as a module, instead of the driver built into the kernel resolved this.

- Nathan
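To tell which flavour of the driver a dom0 is running, one heuristic is to look under /sys/module. Loadable modules normally expose an `initstate` file there, while built-in drivers do not. The sketch below assumes that sysfs behaviour. The function name `driver_origin` and its optional second argument are made up for illustration.

```shell
#!/bin/sh
# Heuristic: report how a driver is provided to the running kernel.
# $1 = driver name, $2 = sysfs module directory (default /sys/module)
driver_origin() {
    drv=$1
    sysmod=${2:-/sys/module}
    if [ ! -d "$sysmod/$drv" ]; then
        # Not loaded, or built in without a sysfs entry.
        echo "not-in-sysfs"
    elif [ -e "$sysmod/$drv/initstate" ]; then
        # initstate is expected only for loadable modules.
        echo "module"
    else
        echo "built-in"
    fi
}

driver_origin ixgbe
```

`ethtool -i <interface>` also reports the driver name and version for a given NIC, which helps confirm that the newer release is actually the one in use.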