Gavin
2013-Feb-18 10:04 UTC
[Pkg-xen-devel] Recent hypervisor update on Debian Wheezy breaks domU networking
Hi,

Firstly I apologise for the cross-post; however, I don't expect to get as quick a response from the package maintainers as I do from the Debian community, and this issue affects a service that I've got scheduled to go live at midnight this evening. :(

A recent update of xen-hypervisor-4.1-amd64 from version 4.1.3-7 to version 4.1.3-8 on Debian Wheezy has caused all VMs on this host to stop receiving their ARP replies, so they cannot reach their gateways and are now isolated from the network.

There was a more recent update as well (4.1.4-2), which I have since applied, but this particular issue persists.

The ARP replies are received by the host and passed all the way up to the bridge (br200) being used by Xen, however they are not seen on the vif (vif2.0) created for the particular VM.

If I statically add the ARP entry to the VM everything starts working, i.e. the VM is no longer isolated and is connected to the world, but we all know that this is not an ideal workaround.

This was working perfectly before this update. :(

1) Please let me know if I should roll back this particular Xen update, kernel and all, and what those steps may be, or if this is a known issue with a particular workaround that I can apply.

2) Would moving to openvswitch be another possible workaround?

My config:-

Bonded ethernet connected to trunks on Cisco 3750 stack with connection as follows:-

    eth0 --> bond0
    eth1 --> bond0 --> br200 --> vif2.0

/etc/network/interfaces:-

    iface bond0 inet manual
        slaves eth0 eth1
        bond_mode 5
        bond-miimon 100
        bond-downdelay 200
        bond-updelay 200

    auto br200
    iface br200 inet static
        address 172.31.1.66
        gateway 172.31.1.65
        netmask 255.255.255.240
        bridge_ports bond0
        bridge_maxwait 0
        bridge_fd 9
        bridge_hello 2
        bridge_maxage 12
        bridge_stp off

root@scjhb01:/home/gavin# brctl show

    bridge name     bridge id               STP enabled     interfaces
    br200           8000.d4bed9f309a1       no              bond0
                                                            vif2.0

root@scjhb01:/home/gavin# tcpdump -i bond0 'arp'

    tcpdump: WARNING: bond0: no IPv4 address assigned
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on bond0, link-type EN10MB (Ethernet), capture size 65535 bytes
    11:26:00.287489 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
    11:26:00.287524 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
    11:26:00.287669 ARP, Reply 172.31.1.49 is-at 00:09:0f:09:21:0e (oui Unknown), length 46
    11:26:01.303484 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
    11:26:01.303518 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
    11:26:01.303674 ARP, Reply 172.31.1.49 is-at 00:09:0f:09:21:0e (oui Unknown), length 46
    11:26:02.303484 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
    11:26:02.303518 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
    11:26:02.303579 ARP, Reply 172.31.1.49 is-at 00:09:0f:09:21:0e (oui Unknown), length 46

root@scjhb01:/home/gavin# tcpdump -i br200 'arp'

    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on br200, link-type EN10MB (Ethernet), capture size 65535 bytes
    11:26:15.367489 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
    11:26:15.367514 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
    11:26:15.367580 ARP, Reply 172.31.1.49 is-at 00:09:0f:09:21:0e (oui Unknown), length 46
    11:26:16.383476 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
    11:26:16.383511 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
    11:26:16.383592 ARP, Reply 172.31.1.49 is-at 00:09:0f:09:21:0e (oui Unknown), length 46
    11:26:17.383486 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
    11:26:17.383520 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
    11:26:17.383616 ARP, Reply 172.31.1.49 is-at 00:09:0f:09:21:0e (oui Unknown), length 46

root@scjhb01:/home/gavin# tcpdump -i vif2.0 'arp'

    tcpdump: WARNING: vif2.0: no IPv4 address assigned
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on vif2.0, link-type EN10MB (Ethernet), capture size 65535 bytes
    11:26:31.463481 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
    11:26:31.463521 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
    11:26:32.463480 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
    11:26:32.463521 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
    11:26:33.463477 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
    11:26:33.463515 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
    11:26:34.479482 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
    11:26:34.479523 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
    11:26:35.479478 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
    11:26:35.479515 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46

Thanks and Regards,
Gavin
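The hop-by-hop check in this message (capture ARP on bond0, then br200, then vif2.0, and see where the replies stop) can be scripted once the captures are saved as text. What follows is only a minimal Python sketch, assuming tcpdump's default one-line ARP summaries as they appear in the captures here; `count_arp` and `first_hop_missing_replies` are names made up for illustration and are not part of tcpdump, Xen or any Debian tool.

```python
import re

# Matches tcpdump ARP summary lines such as:
# "11:26:00.287669 ARP, Reply 172.31.1.49 is-at 00:09:0f:09:21:0e (oui Unknown), length 46"
ARP_LINE = re.compile(r"^\S+ ARP, (Request|Reply) ")


def count_arp(capture_text):
    """Return (requests, replies) counted in one tcpdump text capture."""
    requests = replies = 0
    for line in capture_text.splitlines():
        match = ARP_LINE.match(line.strip())
        if match is None:
            continue  # banner lines, warnings, blank lines
        if match.group(1) == "Request":
            requests += 1
        else:
            replies += 1
    return requests, replies


def first_hop_missing_replies(captures):
    """captures: list of (interface, text), ordered from the wire towards the guest.

    Return the first interface that carries ARP requests but no replies,
    or None if every hop that saw requests also saw replies.
    """
    for interface, text in captures:
        requests, replies = count_arp(text)
        if requests > 0 and replies == 0:
            return interface
    return None


# Hypothetical sample built from a few lines of the captures in this thread.
REQUEST = "11:26:00.287489 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28\n"
REPLY = ("11:26:00.287669 ARP, Reply 172.31.1.49 is-at 00:09:0f:09:21:0e "
         "(oui Unknown), length 46\n")
SAMPLE = [
    ("bond0", REQUEST + REPLY),
    ("br200", REQUEST + REPLY),
    ("vif2.0", REQUEST),
]

if __name__ == "__main__":
    print(first_hop_missing_replies(SAMPLE))  # prints vif2.0
```

The ordering of the list matters: it encodes the path eth0/eth1, bond0, br200, vif2.0, so the first interface reported is the point where the replies are lost. A hop with requests and no replies only narrows the search; it does not say why the frames were dropped.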
Ian Campbell
2013-Feb-18 10:50 UTC
[Pkg-xen-devel] Recent hypervisor update on Debian Wheezy breaks domU networking
On Mon, 2013-02-18 at 12:04 +0200, Gavin wrote:
> Firstly I apologise for the cross-post,

I've added xen-users since you also bounced this there.

> however I don't expect to get as quick a response from the package
> maintainers as I do from the Debian community, and this issue affects
> a service that I've got scheduled to go live at midnight this
> evening. :(
>
> A recent update from xen-hypervisor-4.1-amd64 version 4.1.3-7, to
> version 4.1.3-8 on Debian Wheezy has caused all vm's on this host to
> not receive their arp replies anymore and as such they cannot reach
> their gateways and are now isolated from the network.
>
> There was a more recent update as well (4.1.4-2) which I have now
> since applied however this particular issue persists.

Networking level stuff is all done by the dom0 (or driver domain) kernel rather than the hypervisor so it is far more likely that a kernel level change rather than a hypervisor change would be responsible. What kernel version are you running? Did it also change?

> The arp replies are received by the host and passed all the way up to
> the bridge (br200) being used by Xen, however they are not seen on the
> vif (vif2.0) created for the particular vm.

Do you have any firewall or ebfilter entries which might have either been discarded or reintroduced by the reboot? (i.e. a manual settings modification which wasn't propagated to the startup scripts). Or perhaps sysctl tweaks?

> 1) Please let me know if I should roll-back this particular xen
> update, kernel and all, and what those steps may be, or if this is a
> known issue with a particular workaround that I can apply.

I'd certainly be tempted to try the older kernel, assuming that was also upgraded. It may even still be installed and in your grub menu already.

> 2) Would moving to openvswitch be another possible workaround?

Without knowing what the underlying issue is it is hard to predict whether it will also affect ovs.

> My config:-

Looks correct to me.

Ian.
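One rough way to answer the "did the kernel also change?" question is to look at what apt recorded for the upgrade run. The sketch below is Python and rests on two assumptions: that the log line has the usual apt history shape, `Upgrade: name:arch (old, new), ...`, and that Debian kernel packages are named with a `linux-image` prefix. `parse_upgrade_line` and `kernel_upgrades` are hypothetical helper names, not part of apt.

```python
import re

# One entry on an "Upgrade:" line looks like "name:arch (old_version, new_version)".
ENTRY = re.compile(
    r"(?P<name>[^\s:,]+):(?P<arch>\S+) \((?P<old>[^,]+), (?P<new>[^)]+)\)"
)


def parse_upgrade_line(line):
    """Return {package: (old_version, new_version)} for one 'Upgrade:' line."""
    if not line.startswith("Upgrade:"):
        return {}
    return {
        match.group("name"): (match.group("old"), match.group("new"))
        for match in ENTRY.finditer(line)
    }


def kernel_upgrades(upgrades, prefixes=("linux-image",)):
    """Keep only packages whose name starts with a kernel package prefix.

    The default prefix is an assumption about Debian's kernel package naming.
    """
    return {
        name: versions
        for name, versions in upgrades.items()
        if name.startswith(prefixes)
    }


if __name__ == "__main__":
    # Shortened, hypothetical example line in apt history format.
    line = (
        "Upgrade: xen-hypervisor-4.1-amd64:amd64 (4.1.3-7, 4.1.3-8), "
        "firmware-linux-free:amd64 (3.1, 3.2)"
    )
    upgrades = parse_upgrade_line(line)
    print(sorted(upgrades))
    print(kernel_upgrades(upgrades))
```

An empty result from `kernel_upgrades` only says that no kernel image package was replaced in that run. It says nothing about whether the host was rebooted into a different, already installed kernel.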
Gavin
2013-Feb-18 11:40 UTC
Re: [Pkg-xen-devel] Recent hypervisor update on Debian Wheezy breaks domU networking
On 18 February 2013 12:50, Ian Campbell <ijc@hellion.org.uk> wrote:
> On Mon, 2013-02-18 at 12:04 +0200, Gavin wrote:
> > Firstly I apologise for the cross-post,
>
> I've added xen-users since you also bounced this there.

Thanks. :-/

Thanks too for your quick reply.

> > however I don't expect to get as quick a response from the package
> > maintainers as I do from the Debian community, and this issue affects
> > a service that I've got scheduled to go live at midnight this
> > evening. :(
> >
> > A recent update from xen-hypervisor-4.1-amd64 version 4.1.3-7, to
> > version 4.1.3-8 on Debian Wheezy has caused all vm's on this host to
> > not receive their arp replies anymore and as such they cannot reach
> > their gateways and are now isolated from the network.
> >
> > There was a more recent update as well (4.1.4-2) which I have now
> > since applied however this particular issue persists.
>
> Networking level stuff is all done by the dom0 (or driver domain) kernel
> rather than the hypervisor so it is far more likely that a kernel level
> change rather than a hypervisor change would be responsible. What kernel
> version are you running? Did it also change?

This makes sense, although when I did the apt-get upgrade, there was no kernel update, however there may have been packages/drivers that required a kernel mod.

Here is the apt history which details what was upgraded when this broke:-

Upgrade:
    dnsmasq-base:amd64 (2.62-3, 2.62-3+deb7u1),
    tasksel-data:amd64 (3.14, 3.14+nmu1),
    xen-hypervisor-4.1-amd64:amd64 (4.1.3-7, 4.1.3-8),
    xen-utils-common:amd64 (4.1.3-7, 4.1.3-8),
    perl:amd64 (5.14.2-16, 5.14.2-17),
    firmware-linux-free:amd64 (3.1, 3.2),
    perl-base:amd64 (5.14.2-16, 5.14.2-17),
    xen-utils-4.1:amd64 (4.1.3-7, 4.1.3-8),
    libgnutls26:amd64 (2.12.20-2, 2.12.20-4),
    perl-modules:amd64 (5.14.2-16, 5.14.2-17),
    psmisc:amd64 (22.19-1, 22.19-1+deb7u1),
    python2.6:amd64 (2.6.8-0.2, 2.6.8-1.1),
    libxenstore3.0:amd64 (4.1.3-7, 4.1.3-8),
    python2.6-minimal:amd64 (2.6.8-0.2, 2.6.8-1.1),
    coreutils:amd64 (8.13-3.4, 8.13-3.5),
    libvirt0:amd64 (0.9.12-5, 0.9.12-6),
    libcurl3:amd64 (7.26.0-1, 7.26.0-1+wheezy1),
    manpages:amd64 (3.42-1, 3.44-1),
    tasksel:amd64 (3.14, 3.14+nmu1),
    libperl5.14:amd64 (5.14.2-16, 5.14.2-17),
    libsystemd-login0:amd64 (44-7, 44-8),
    libxen-4.1:amd64 (4.1.3-7, 4.1.3-8),
    libcurl3-gnutls:amd64 (7.26.0-1, 7.26.0-1+wheezy1),
    host:amd64 (9.8.4.dfsg.P1-1, 9.8.4.dfsg.P1-4),
    libvirt-bin:amd64 (0.9.12-5, 0.9.12-6),
    rinse:amd64 (2.0-1, 2.0.1-1),
    ca-certificates:amd64 (20120623, 20130119),
    xenstore-utils:amd64 (4.1.3-7, 4.1.3-8)

The kernel I am using is: 3.2.0-2-amd64, also tried 3.2.0-4-amd64 on another host with no success.

Would the upgrade above of xen-hypervisor-4.1-amd64 on this Debian system not cause the Dom0 kernel to be changed in any way?

> > The arp replies are received by the host and passed all the way up to
> > the bridge (br200) being used by Xen, however they are not seen on the
> > vif (vif2.0) created for the particular vm.
>
> Do you have any firewall or ebfilter entries which might have either
> been discarded or reintroduced by the reboot? (i.e. a manual settings
> modification which wasn't propagated to the startup scripts). Or perhaps
> sysctl tweaks?

Nope, not using iptables/ebtables, this was working 100% until the apt upgrade above.

After this broke I did try and add the following to /etc/sysctl.conf but it made no difference:-

    net.bridge.bridge-nf-call-ip6tables = 0
    net.bridge.bridge-nf-call-iptables = 0
    net.bridge.bridge-nf-call-arptables = 0

It did add iptables rules but made no difference to this issue.

> > 1) Please let me know if I should roll-back this particular xen
> > update, kernel and all, and what those steps may be, or if this is a
> > known issue with a particular workaround that I can apply.
>
> I'd certainly be tempted to try the older kernel, assuming that was also
> upgraded. It may even still be installed and in your grub menu already.

The problem is now we are using grub2 and it appears that on boot grub loads a Linux menu, then the Xen Menu with configs in /etc/grub.d/ so I'm battling to figure out how to do this. I also do not have physical access to this host at the moment so need to set the boot order 'correctly' prior to a reboot.

> > 2) Would moving to openvswitch be another possible workaround?
>
> Without knowing what the underlying issue is it is hard to predict
> whether it will also affect ovs.

Agreed.

> > My config:-
>
> Looks correct to me.
>
> Ian.

Thanks Ian.
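Entries in /etc/sysctl.conf only matter once they have been applied to the running kernel (for example by `sysctl -p` or at boot), so it may be worth confirming what the kernel actually reports for the three bridge-nf-call settings. A minimal Python sketch, assuming the values are exposed under /proc/sys/net/bridge, which is only present when the kernel's bridge netfilter support is loaded; `read_bridge_nf` and `keys_filtering_bridged_traffic` are illustrative names only.

```python
from pathlib import Path

# File names matching the three net.bridge.* sysctls quoted in this thread.
BRIDGE_NF_KEYS = (
    "bridge-nf-call-arptables",
    "bridge-nf-call-iptables",
    "bridge-nf-call-ip6tables",
)


def read_bridge_nf(base="/proc/sys/net/bridge"):
    """Return {key: int value, or None if the file is missing or unreadable}."""
    values = {}
    for key in BRIDGE_NF_KEYS:
        path = Path(base) / key
        try:
            values[key] = int(path.read_text().strip())
        except (OSError, ValueError):
            values[key] = None
    return values


def keys_filtering_bridged_traffic(values):
    """Return the keys still set to 1, i.e. bridged frames are handed to that filter."""
    return [key for key, value in values.items() if value == 1]


if __name__ == "__main__":
    current = read_bridge_nf()
    for key, value in current.items():
        print(key, "=", value)
    print("still enabled:", keys_filtering_bridged_traffic(current))
```

A value of None here means the file could not be read at all, which is a different situation from 0 and is reported separately on purpose.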