Gavin
2013-Feb-18 10:04 UTC
[Pkg-xen-devel] Recent hypervisor update on Debian Wheezy breaks domU networking
Hi,
Firstly I apologise for the cross-post, however I don't expect to get as
quick a response from the package maintainers as I do from the Debian
community, and this issue affects a service that I've got scheduled to go
live at midnight this evening. :(
A recent update of xen-hypervisor-4.1-amd64 from version 4.1.3-7 to version
4.1.3-8 on Debian Wheezy has caused all VMs on this host to stop receiving
their ARP replies, so they cannot reach their gateways and are now isolated
from the network.
There was a more recent update as well (4.1.4-2), which I have since
applied; however, this particular issue persists.
The ARP replies are received by the host and passed all the way up to the
bridge (br200) being used by Xen, however they are not seen on the vif
(vif2.0) created for the particular VM.
If I statically add the ARP entry in the VM, everything starts working, i.e.
the VM is no longer isolated and is connected to the world, but we all know
that this is not an ideal workaround.
This was working perfectly before this update. :(
1) Please let me know if I should roll back this particular Xen update,
kernel and all, and what those steps may be, or if this is a known issue
with a particular workaround that I can apply.
2) Would moving to openvswitch be another possible workaround?
My config:-
Bonded ethernet connected to trunks on Cisco 3750 stack with connection as
follows:-
eth0 --> bond0
eth1 --> bond0 --> br200 --> vif2.0
/etc/network/interfaces:-
iface bond0 inet manual
    slaves eth0 eth1
    bond_mode 5
    bond-miimon 100
    bond-downdelay 200
    bond-updelay 200
auto br200
iface br200 inet static
    address 172.31.1.66
    gateway 172.31.1.65
    netmask 255.255.255.240
    bridge_ports bond0
    bridge_maxwait 0
    bridge_fd 9
    bridge_hello 2
    bridge_maxage 12
    bridge_stp off
root@scjhb01:/home/gavin# brctl show
bridge name     bridge id               STP enabled     interfaces
br200           8000.d4bed9f309a1       no              bond0
                                                        vif2.0
root@scjhb01:/home/gavin# tcpdump -i bond0 'arp'
tcpdump: WARNING: bond0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond0, link-type EN10MB (Ethernet), capture size 65535 bytes
11:26:00.287489 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
11:26:00.287524 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
11:26:00.287669 ARP, Reply 172.31.1.49 is-at 00:09:0f:09:21:0e (oui Unknown), length 46
11:26:01.303484 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
11:26:01.303518 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
11:26:01.303674 ARP, Reply 172.31.1.49 is-at 00:09:0f:09:21:0e (oui Unknown), length 46
11:26:02.303484 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
11:26:02.303518 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
11:26:02.303579 ARP, Reply 172.31.1.49 is-at 00:09:0f:09:21:0e (oui Unknown), length 46
root@scjhb01:/home/gavin# tcpdump -i br200 'arp'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br200, link-type EN10MB (Ethernet), capture size 65535 bytes
11:26:15.367489 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
11:26:15.367514 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
11:26:15.367580 ARP, Reply 172.31.1.49 is-at 00:09:0f:09:21:0e (oui Unknown), length 46
11:26:16.383476 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
11:26:16.383511 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
11:26:16.383592 ARP, Reply 172.31.1.49 is-at 00:09:0f:09:21:0e (oui Unknown), length 46
11:26:17.383486 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
11:26:17.383520 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
11:26:17.383616 ARP, Reply 172.31.1.49 is-at 00:09:0f:09:21:0e (oui Unknown), length 46
root@scjhb01:/home/gavin# tcpdump -i vif2.0 'arp'
tcpdump: WARNING: vif2.0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vif2.0, link-type EN10MB (Ethernet), capture size 65535 bytes
11:26:31.463481 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
11:26:31.463521 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
11:26:32.463480 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
11:26:32.463521 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
11:26:33.463477 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
11:26:33.463515 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
11:26:34.479482 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
11:26:34.479523 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
11:26:35.479478 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 28
11:26:35.479515 ARP, Request who-has 172.31.1.49 tell 172.31.1.50, length 46
Thanks and Regards.
Gavin
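
The three captures above can be compared more quickly by counting requests and replies per interface. The following is only a rough sketch: the helper name `arp_summary` is made up for illustration, and the commented-out loop assumes the interface names from this post (bond0, br200, vif2.0) and root access on the host.

```shell
#!/bin/sh
# arp_summary (hypothetical helper): read tcpdump text output on stdin and
# print how many ARP requests and ARP replies it contains.
arp_summary() {
    awk '/ARP, Request/ { req++ }
         /ARP, Reply/   { rep++ }
         END { printf "requests=%d replies=%d\n", req, rep }'
}

# Possible usage on the dom0 (needs root; each capture stops after 10 packets):
# for ifc in bond0 br200 vif2.0; do
#     printf '%s: ' "$ifc"
#     tcpdump -l -n -c 10 -i "$ifc" arp 2>/dev/null | arp_summary
# done
```

An interface that reports replies=0 while the hop before it reports a non-zero count is where the replies are being lost, which in the captures above would be between br200 and vif2.0.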
Ian Campbell
2013-Feb-18 10:50 UTC
[Pkg-xen-devel] Recent hypervisor update on Debian Wheezy breaks domU networking
On Mon, 2013-02-18 at 12:04 +0200, Gavin wrote:
> Firstly I apologise for the cross-post,

I've added xen-users since you also bounced this there.

> however I don't expect to get as quick a response from the package
> maintainers as I do from the Debian community, and this issue affects
> a service that I've got scheduled to go live at midnight this
> evening. :(
>
> A recent update from xen-hypervisor-4.1-amd64 version 4.1.3-7, to
> version 4.1.3-8 on Debian Wheezy has caused all vm's on this host to
> not receive their arp replies anymore and as such they cannot reach
> their gateways and are now isolated from the network.
>
> There was a more recent update as well (4.1.4-2) which I have now
> since applied however this particular issue persists.

Networking level stuff is all done by the dom0 (or driver domain) kernel
rather than the hypervisor so it is far more likely that a kernel level
change rather than a hypervisor change would be responsible. What kernel
version are you running? Did it also change?

> The arp replies are received by the host and passed all the way up to
> the bridge (br200) being used by Xen, however they are not seen on the
> vif (vif2.0) created for the particular vm.

Do you have any firewall or ebfilter entries which might have either
been discarded or reintroduced by the reboot? (i.e. a manual settings
modification which wasn't propagated to the startup scripts). Or perhaps
sysctl tweaks?

> 1) Please let me know if I should roll-back this particular xen
> update, kernel and all, and what those steps may be, or if this is a
> known issue with a particular workaround that I can apply.

I'd certainly be tempted to try the older kernel, assuming that was also
upgraded. It may even still be installed and in your grub menu already.

> 2) Would moving to openvswitch be another possible workaround?

Without knowing what the underlying issue is it is hard to predict
whether it will also affect ovs.

> My config:-

Looks correct to me.

Ian.
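
One way to answer the "did the kernel also change?" question is to look for a kernel image package in the apt history. This is a minimal sketch, assuming the usual Debian log location /var/log/apt/history.log and the `linux-image-` package name prefix; the helper name `kernel_in_upgrade` is made up for illustration.

```shell
#!/bin/sh
# kernel_in_upgrade (hypothetical helper): succeed (exit 0) if the apt
# "Upgrade:" lines supplied on stdin mention a linux-image-* package.
kernel_in_upgrade() {
    grep -q 'linux-image-'
}

# Possible usage on the dom0:
# uname -r    # kernel that is running right now
# if grep '^Upgrade:' /var/log/apt/history.log | kernel_in_upgrade; then
#     echo "a kernel image package was upgraded at some point"
# else
#     echo "no kernel image package appears in the logged upgrades"
# fi
```

Note that this only shows whether a kernel package was upgraded, not whether the host has been rebooted into it, so comparing against `uname -r` is still worthwhile.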
Gavin
2013-Feb-18 11:40 UTC
Re: [Pkg-xen-devel] Recent hypervisor update on Debian Wheezy breaks domU networking
On 18 February 2013 12:50, Ian Campbell <ijc@hellion.org.uk> wrote:
> On Mon, 2013-02-18 at 12:04 +0200, Gavin wrote:
> > Firstly I apologise for the cross-post,
>
> I've added xen-users since you also bounced this there.

Thanks. :-/
Thanks too for your quick reply.

> > however I don't expect to get as quick a response from the package
> > maintainers as I do from the Debian community, and this issue affects
> > a service that I've got scheduled to go live at midnight this
> > evening. :(
> >
> > A recent update from xen-hypervisor-4.1-amd64 version 4.1.3-7, to
> > version 4.1.3-8 on Debian Wheezy has caused all vm's on this host to
> > not receive their arp replies anymore and as such they cannot reach
> > their gateways and are now isolated from the network.
> >
> > There was a more recent update as well (4.1.4-2) which I have now
> > since applied however this particular issue persists.
>
> Networking level stuff is all done by the dom0 (or driver domain) kernel
> rather than the hypervisor so it is far more likely that a kernel level
> change rather than a hypervisor change would be responsible. What kernel
> version are you running? Did it also change?

This makes sense, although when I did the apt-get upgrade, there was no
kernel update, however there may have been packages/drivers that required a
kernel mod.

Here is the apt history which details what was upgraded when this broke:-

Upgrade: dnsmasq-base:amd64 (2.62-3, 2.62-3+deb7u1),
 tasksel-data:amd64 (3.14, 3.14+nmu1),
 xen-hypervisor-4.1-amd64:amd64 (4.1.3-7, 4.1.3-8),
 xen-utils-common:amd64 (4.1.3-7, 4.1.3-8),
 perl:amd64 (5.14.2-16, 5.14.2-17),
 firmware-linux-free:amd64 (3.1, 3.2),
 perl-base:amd64 (5.14.2-16, 5.14.2-17),
 xen-utils-4.1:amd64 (4.1.3-7, 4.1.3-8),
 libgnutls26:amd64 (2.12.20-2, 2.12.20-4),
 perl-modules:amd64 (5.14.2-16, 5.14.2-17),
 psmisc:amd64 (22.19-1, 22.19-1+deb7u1),
 python2.6:amd64 (2.6.8-0.2, 2.6.8-1.1),
 libxenstore3.0:amd64 (4.1.3-7, 4.1.3-8),
 python2.6-minimal:amd64 (2.6.8-0.2, 2.6.8-1.1),
 coreutils:amd64 (8.13-3.4, 8.13-3.5),
 libvirt0:amd64 (0.9.12-5, 0.9.12-6),
 libcurl3:amd64 (7.26.0-1, 7.26.0-1+wheezy1),
 manpages:amd64 (3.42-1, 3.44-1),
 tasksel:amd64 (3.14, 3.14+nmu1),
 libperl5.14:amd64 (5.14.2-16, 5.14.2-17),
 libsystemd-login0:amd64 (44-7, 44-8),
 libxen-4.1:amd64 (4.1.3-7, 4.1.3-8),
 libcurl3-gnutls:amd64 (7.26.0-1, 7.26.0-1+wheezy1),
 host:amd64 (9.8.4.dfsg.P1-1, 9.8.4.dfsg.P1-4),
 libvirt-bin:amd64 (0.9.12-5, 0.9.12-6),
 rinse:amd64 (2.0-1, 2.0.1-1),
 ca-certificates:amd64 (20120623, 20130119),
 xenstore-utils:amd64 (4.1.3-7, 4.1.3-8)

The kernel I am using is: 3.2.0-2-amd64, also tried 3.2.0-4-amd64 on another
host with no success.

Would the upgrade above of xen-hypervisor-4.1-amd64 on this Debian system
not cause the Dom0 kernel to be changed in any way ??

> > The arp replies are received by the host and passed all the way up to
> > the bridge (br200) being used by Xen, however they are not seen on the
> > vif (vif2.0) created for the particular vm.
>
> Do you have any firewall or ebfilter entries which might have either
> been discarded or reintroduced by the reboot? (i.e. a manual settings
> modification which wasn't propagated to the startup scripts). Or perhaps
> sysctl tweaks?

Nope, not using iptables/ebtables, this was working 100% until the apt
upgrade above.

After this broke I did try and add the following to /etc/sysctl.conf but it
made no difference:-

net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

It did add iptables rules but made no difference to this issue.

> > 1) Please let me know if I should roll-back this particular xen
> > update, kernel and all, and what those steps may be, or if this is a
> > known issue with a particular workaround that I can apply.
>
> I'd certainly be tempted to try the older kernel, assuming that was also
> upgraded. It may even still be installed and in your grub menu already.

The problem is now we are using grub2 and it appears that on boot grub loads
a Linux menu, then the Xen Menu with configs in /etc/grub.d/ so I'm battling
to figure out how to do this. I also do not have physical access to this
host at the moment so need to set the boot order 'correctly' prior to a
reboot.

> > 2) Would moving to openvswitch be another possible workaround?
>
> Without knowing what the underlying issue is it is hard to predict
> whether it will also affect ovs.

Agreed.

> > My config:-
>
> Looks correct to me.
>
> Ian.

Thanks Ian.
_______________________________________________
Xen-users mailing list
Xen-users@lists.xen.org
http://lists.xen.org/xen-users
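
For picking a boot entry on a remote host, one possible approach is to list the entry titles GRUB 2 has generated and then select one for the next boot only with `grub-reboot`, so that a later reboot falls back to the normal default. This is a sketch under assumptions, not a tested recipe for this host: it assumes the Debian default path /boot/grub/grub.cfg, that titles are single-quoted as Debian's generator writes them, and that GRUB_DEFAULT=saved is set in /etc/default/grub (followed by update-grub). The helper name `list_menuentries` is made up for illustration.

```shell
#!/bin/sh
# list_menuentries (hypothetical helper): print the title of every
# single-quoted "menuentry" line in the grub.cfg file given as $1,
# including entries nested inside a submenu block.
list_menuentries() {
    sed -n "s/^[[:space:]]*menuentry '\([^']*\)'.*/\1/p" "$1"
}

# Possible usage on the dom0 (needs root):
# list_menuentries /boot/grub/grub.cfg
# grub-reboot 'TITLE-COPIED-FROM-THE-LIST'   # applies to the next boot only
# reboot
```

If the chosen entry sits inside a submenu, newer GRUB versions expect the submenu title and the entry title joined with `>`; whether that applies depends on the GRUB version installed, so it is worth checking the grub-reboot documentation on the host before rebooting.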