Ingo Juergensmann
2013-Feb-26 17:42 UTC
[Pkg-xen-devel] Bug#701744: [xen] Update to hypervisor 4.0.1-5.6 or linux-image-2.6.32-5-xen-amd64 2.6.32-48 causes networking (VIF) failures
Package: xen
Version: 4.0.1-5.5
Severity: critical

--- Please enter the report below this line. ---

Hi!

Since the update last weekend in stable/squeeze I'm experiencing problems running Xen on amd64, with multiple domUs losing their network connections/VIFs.

From http://blog.windfluechter.net/content/blog/2013/02/26/1597-xen-problems-vms-2632-5-xen-amd64

Unfortunately this update appears to be problematic on my Xen hosting server. Last night it happened for the second time that some of the virtual network interfaces disappeared or stopped working. For example, I have two VMs: one running the webserver and one running the databases. Between these two VMs there's a bridge on the dom0, and both VMs have a VIF on that (internal) bridge. What happens is that this bridge becomes inaccessible from within the webserver VM. Sadly there's not much to see in the log files. I just spotted this on dom0:

Feb 26 01:01:29 gate kernel: [12697.907512] vif3.1: Frag is bigger than frame.
Feb 26 01:01:29 gate kernel: [12697.907550] vif3.1: fatal error; disabling device
Feb 26 01:01:29 gate kernel: [12697.919921] xenbr1: port 3(vif3.1) entering disabled state
Feb 26 01:22:00 gate kernel: [13928.644888] vif2.1: Frag is bigger than frame.
Feb 26 01:22:00 gate kernel: [13928.644920] vif2.1: fatal error; disabling device
Feb 26 01:22:00 gate kernel: [13928.663571] xenbr1: port 2(vif2.1) entering disabled state
Feb 26 01:40:44 gate kernel: [15052.629280] vif7.1: Frag is bigger than frame.
Feb 26 01:40:44 gate kernel: [15052.629314] vif7.1: fatal error; disabling device
Feb 26 01:40:44 gate kernel: [15052.641725] xenbr1: port 6(vif7.1) entering disabled state

This corresponds to the number of VMs that have lost their internal connection to the bridge.
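The dom0 lines above can be matched against the list of affected VMs mechanically. Below is a minimal Python sketch, not part of any Xen tool: `disabled_vifs` and `DisabledVif` are hypothetical names, and it assumes the usual backend naming where `vifD.N` is network interface N of domain D. It collects every VIF that netback disabled, together with the bridge and port it occupied.

```python
import re
from typing import Dict, List, NamedTuple, Tuple

# "vif3.1: fatal error; disabling device" -> netback shut the interface down
FATAL_RE = re.compile(r"\b(vif(\d+)\.(\d+)): fatal error; disabling device")
# "xenbr1: port 3(vif3.1) entering disabled state" -> the bridge port went down
PORT_RE = re.compile(r"\b(\w+): port (\d+)\((vif\d+\.\d+)\) entering disabled state")


class DisabledVif(NamedTuple):
    vif: str
    domid: int   # D in vifD.N (assumed): ID of the guest domain owning the interface
    devid: int   # N in vifD.N (assumed): index of the interface inside that guest
    bridge: str  # bridge the port belonged to ("" if no bridge line was seen)
    port: int    # bridge port number (-1 if no bridge line was seen)


def disabled_vifs(lines: List[str]) -> List[DisabledVif]:
    """Return every VIF netback disabled, in log order, with its bridge port."""
    ports: Dict[str, Tuple[str, int]] = {}
    fatal: List[Tuple[str, int, int]] = []
    for line in lines:
        m = FATAL_RE.search(line)
        if m:
            fatal.append((m.group(1), int(m.group(2)), int(m.group(3))))
            continue
        m = PORT_RE.search(line)
        if m:
            ports[m.group(3)] = (m.group(1), int(m.group(2)))
    return [
        DisabledVif(vif, domid, devid, *ports.get(vif, ("", -1)))
        for vif, domid, devid in fatal
    ]


# The dom0 lines quoted in the report
SAMPLE = """\
Feb 26 01:01:29 gate kernel: [12697.907512] vif3.1: Frag is bigger than frame.
Feb 26 01:01:29 gate kernel: [12697.907550] vif3.1: fatal error; disabling device
Feb 26 01:01:29 gate kernel: [12697.919921] xenbr1: port 3(vif3.1) entering disabled state
Feb 26 01:22:00 gate kernel: [13928.644888] vif2.1: Frag is bigger than frame.
Feb 26 01:22:00 gate kernel: [13928.644920] vif2.1: fatal error; disabling device
Feb 26 01:22:00 gate kernel: [13928.663571] xenbr1: port 2(vif2.1) entering disabled state
Feb 26 01:40:44 gate kernel: [15052.629280] vif7.1: Frag is bigger than frame.
Feb 26 01:40:44 gate kernel: [15052.629314] vif7.1: fatal error; disabling device
Feb 26 01:40:44 gate kernel: [15052.641725] xenbr1: port 6(vif7.1) entering disabled state
"""

if __name__ == "__main__":
    for v in disabled_vifs(SAMPLE.splitlines()):
        print(f"{v.vif}: domain {v.domid}, interface {v.devid}, {v.bridge} port {v.port}")
```

On the sample this reports domains 3, 2 and 7, each on interface 1 of xenbr1; mapping those domain IDs back to guest names with `xm list` shows which VMs lost their internal link.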
On the webserver VM I see this output:

Feb 26 01:59:01 vserv1 kernel: [16113.539767] IPv6: sending pkt_too_big to self
Feb 26 01:59:01 vserv1 kernel: [16113.539794] IPv6: sending pkt_too_big to self
Feb 26 02:30:54 vserv1 kernel: [18026.407517] IPv6: sending pkt_too_big to self
Feb 26 02:30:54 vserv1 kernel: [18026.407546] IPv6: sending pkt_too_big to self
Feb 26 02:30:54 vserv1 kernel: [18026.434761] IPv6: sending pkt_too_big to self
Feb 26 02:30:54 vserv1 kernel: [18026.434787] IPv6: sending pkt_too_big to self
Feb 26 03:39:16 vserv1 kernel: [22128.768214] IPv6: sending pkt_too_big to self
Feb 26 03:39:16 vserv1 kernel: [22128.768240] IPv6: sending pkt_too_big to self
Feb 26 04:39:51 vserv1 kernel: [25764.250170] IPv6: sending pkt_too_big to self
Feb 26 04:39:51 vserv1 kernel: [25764.250196] IPv6: sending pkt_too_big to self

Rebooting the VMs results in a non-working VM: it gets paused on creation, the Xen scripts complain about non-working hotplug scripts, and the Xen log shows this:

[2013-02-25 13:06:34 5470] DEBUG (XendDomainInfo:101) XendDomainInfo.create(['vm', ['name', 'vserv1'], ['memory', '2048'], ['on_poweroff', 'destroy'], ['on_reboot', 'restart'], ['on_crash', 'restart'], ['on_xend_start', 'ignore'], ['on_xend_stop', 'ignore'], ['vcpus', '2'], ['oos', 1], ['bootloader', '/usr/lib/xen-4.0/bin/pygrub'], ['bootloader_args', ''], ['image', ['linux', ['root', '/dev/xvdb '], ['videoram', 4], ['tsc_mode', 0], ['nomigrate', 0]]], ['s3_integrity', 1], ['device', ['vbd', ['uname', 'phy:/dev/lv/vserv1-boot'], ['dev', 'xvda'], ['mode', 'w']]], ['device', ['vbd', ['uname', 'phy:/dev/lv/vserv1-disk'], ['dev', 'xvdb'], ['mode', 'w']]], ['device', ['vbd', ['uname', 'phy:/dev/lv/vserv1-swap'], ['dev', 'xvdc'], ['mode', 'w']]], ['device', ['vbd', ['uname', 'phy:/dev/lv/vserv1mirror'], ['dev', 'xvdd'], ['mode', 'w']]]])
[2013-02-25 13:06:34 5470] DEBUG (XendDomainInfo:2508) XendDomainInfo.constructDomain
[2013-02-25 13:06:34 5470] DEBUG (balloon:220) Balloon: 2100000 KiB free; need 16384; done.
[2013-02-25 13:06:34 5470] DEBUG (XendDomain:464) Adding Domain: 39
[2013-02-25 13:06:34 5470] DEBUG (XendDomainInfo:2818) XendDomainInfo.initDomain: 39 256
[2013-02-25 13:06:34 5781] DEBUG (XendBootloader:113) Launching bootloader as ['/usr/lib/xen-4.0/bin/pygrub', '--args=root=/dev/xvdb ', '--output=/var/run/xend/boot/xenbl.6040', '/dev/lv/vserv1-boot'].
[2013-02-25 13:06:39 5470] DEBUG (XendDomainInfo:2845) _initDomain:shadow_memory=0x0, memory_static_max=0x80000000, memory_static_min=0x0.
[2013-02-25 13:06:39 5470] INFO (image:182) buildDomain os=linux dom=39 vcpus=2
[2013-02-25 13:06:39 5470] DEBUG (image:721) domid = 39
[2013-02-25 13:06:39 5470] DEBUG (image:722) memsize = 2048
[2013-02-25 13:06:39 5470] DEBUG (image:723) image /var/run/xend/boot/boot_kernel.xj7W_t
[2013-02-25 13:06:39 5470] DEBUG (image:724) store_evtchn = 1
[2013-02-25 13:06:39 5470] DEBUG (image:725) console_evtchn = 2
[2013-02-25 13:06:39 5470] DEBUG (image:726) cmdline root=UUID=ed71a39f-fd2e-4035-8557-493686baa151 ro root=/dev/xvdb
[2013-02-25 13:06:39 5470] DEBUG (image:727) ramdisk /var/run/xend/boot/boot_ramdisk.QavuAo
[2013-02-25 13:06:39 5470] DEBUG (image:728) vcpus = 2
[2013-02-25 13:06:39 5470] DEBUG (image:729) features
[2013-02-25 13:06:39 5470] DEBUG (image:730) flags = 0
[2013-02-25 13:06:39 5470] DEBUG (image:731) superpages = 0
[2013-02-25 13:06:40 5470] INFO (XendDomainInfo:2367) createDevice: vbd : {'uuid': '04d99772-cf27-aecf-2d1b-c73eaf657410', 'bootable': 1, 'driver': 'paravirtualised', 'dev': 'xvda', 'uname': 'phy:/dev/lv/vserv1-boot', 'mode': 'w'}
[2013-02-25 13:06:40 5470] DEBUG (DevController:95) DevController: writing {'virtual-device': '51712', 'device-type': 'disk', 'protocol': 'x86_64-abi', 'backend-id': '0', 'state': '1', 'backend': '/local/domain/0/backend/vbd/39/51712'} to /local/domain/39/device/vbd/51712.
[2013-02-25 13:06:40 5470] DEBUG (DevController:97) DevController: writing {'domain': 'vserv1', 'frontend': '/local/domain/39/device/vbd/51712', 'uuid': '04d99772-cf27-aecf-2d1b-c73eaf657410', 'bootable': '1', 'dev': 'xvda', 'state': '1', 'params': '/dev/lv/vserv1-boot', 'mode': 'w', 'online': '1', 'frontend-id': '39', 'type': 'phy'} to /local/domain/0/backend/vbd/39/51712.
[2013-02-25 13:06:40 5470] INFO (XendDomainInfo:2367) createDevice: vbd : {'uuid': 'e46cb89f-3e54-41d2-53bd-759ed6c690d2', 'bootable': 0, 'driver': 'paravirtualised', 'dev': 'xvdb', 'uname': 'phy:/dev/lv/vserv1-disk', 'mode': 'w'}
[2013-02-25 13:06:40 5470] DEBUG (DevController:95) DevController: writing {'virtual-device': '51728', 'device-type': 'disk', 'protocol': 'x86_64-abi', 'backend-id': '0', 'state': '1', 'backend': '/local/domain/0/backend/vbd/39/51728'} to /local/domain/39/device/vbd/51728.
[2013-02-25 13:06:40 5470] DEBUG (DevController:97) DevController: writing {'domain': 'vserv1', 'frontend': '/local/domain/39/device/vbd/51728', 'uuid': 'e46cb89f-3e54-41d2-53bd-759ed6c690d2', 'bootable': '0', 'dev': 'xvdb', 'state': '1', 'params': '/dev/lv/vserv1-disk', 'mode': 'w', 'online': '1', 'frontend-id': '39', 'type': 'phy'} to /local/domain/0/backend/vbd/39/51728.
[2013-02-25 13:06:40 5470] INFO (XendDomainInfo:2367) createDevice: vbd : {'uuid': 'e2d61860-7448-1843-3935-6b63c5d2878e', 'bootable': 0, 'driver': 'paravirtualised', 'dev': 'xvdc', 'uname': 'phy:/dev/lv/vserv1-swap', 'mode': 'w'}
[2013-02-25 13:06:40 5470] DEBUG (DevController:95) DevController: writing {'virtual-device': '51744', 'device-type': 'disk', 'protocol': 'x86_64-abi', 'backend-id': '0', 'state': '1', 'backend': '/local/domain/0/backend/vbd/39/51744'} to /local/domain/39/device/vbd/51744.
[2013-02-25 13:06:40 5470] DEBUG (DevController:97) DevController: writing {'domain': 'vserv1', 'frontend': '/local/domain/39/device/vbd/51744', 'uuid': 'e2d61860-7448-1843-3935-6b63c5d2878e', 'bootable': '0', 'dev': 'xvdc', 'state': '1', 'params': '/dev/lv/vserv1-swap', 'mode': 'w', 'online': '1', 'frontend-id': '39', 'type': 'phy'} to /local/domain/0/backend/vbd/39/51744.
[2013-02-25 13:06:40 5470] INFO (XendDomainInfo:2367) createDevice: vbd : {'uuid': 'd314a46e-1ce9-0e8d-b009-3f08e29735f5', 'bootable': 0, 'driver': 'paravirtualised', 'dev': 'xvdd', 'uname': 'phy:/dev/lv/vserv1mirror', 'mode': 'w'}
[2013-02-25 13:06:40 5470] DEBUG (DevController:95) DevController: writing {'virtual-device': '51760', 'device-type': 'disk', 'protocol': 'x86_64-abi', 'backend-id': '0', 'state': '1', 'backend': '/local/domain/0/backend/vbd/39/51760'} to /local/domain/39/device/vbd/51760.
[2013-02-25 13:06:40 5470] DEBUG (DevController:97) DevController: writing {'domain': 'vserv1', 'frontend': '/local/domain/39/device/vbd/51760', 'uuid': 'd314a46e-1ce9-0e8d-b009-3f08e29735f5', 'bootable': '0', 'dev': 'xvdd', 'state': '1', 'params': '/dev/lv/vserv1mirror', 'mode': 'w', 'online': '1', 'frontend-id': '39', 'type': 'phy'} to /local/domain/0/backend/vbd/39/51760.
[2013-02-25 13:06:40 5470] DEBUG (XendDomainInfo:3400) Storing VM details: {'on_xend_stop': 'ignore', 'shadow_memory': '0', 'uuid': '04541225-6d3c-3cae-a4c4-0b6d4ccfac7a', 'on_reboot': 'restart', 'start_time': '1361794000.37', 'on_poweroff': 'destroy', 'bootloader_args': '', 'on_xend_start': 'ignore', 'on_crash': 'restart', 'xend/restart_count': '0', 'vcpus': '2', 'vcpu_avail': '3', 'bootloader': '/usr/lib/xen-4.0/bin/pygrub', 'image': "(linux (kernel ) (args 'root=/dev/xvdb ') (superpages 0) (tsc_mode 0) (videoram 4) (pci ()) (nomigrate 0) (notes (HV_START_LOW 18446603336221196288) (FEATURES '!writable_page_tables|pae_pgdir_above_4gb') (VIRT_BASE 18446744071562067968) (GUEST_VERSION 2.6) (PADDR_OFFSET 0) (GUEST_OS linux) (HYPERCALL_PAGE 18446744071578882048) (LOADER generic) (SUSPEND_CANCEL 1) (PAE_MODE yes) (ENTRY 18446744071584289280) (XEN_VERSION xen-3.0)))", 'name': 'vserv1'}
[2013-02-25 13:06:40 5470] DEBUG (XendDomainInfo:1804) Storing domain details: {'console/ring-ref': '2143834', 'image/entry': '18446744071584289280', 'console/port': '2', 'store/ring-ref': '2143835', 'image/loader': 'generic', 'vm': '/vm/04541225-6d3c-3cae-a4c4-0b6d4ccfac7a', 'control/platform-feature-multiprocessor-suspend': '1', 'image/hv-start-low': '18446603336221196288', 'image/guest-os': 'linux', 'cpu/1/availability': 'online', 'image/virt-base': '18446744071562067968', 'memory/target': '2097152', 'image/guest-version': '2.6', 'image/pae-mode': 'yes', 'description': '', 'console/limit': '1048576', 'image/paddr-offset': '0', 'image/hypercall-page': '18446744071578882048', 'image/suspend-cancel': '1', 'cpu/0/availability': 'online', 'image/features/pae-pgdir-above-4gb': '1', 'image/features/writable-page-tables': '0', 'console/type': 'xenconsoled', 'name': 'vserv1', 'domid': '39', 'image/xen-version': 'xen-3.0', 'store/port': '1'}
[2013-02-25 13:06:40 5470] DEBUG (DevController:95) DevController: writing {'protocol': 'x86_64-abi', 'state': '1', 'backend-id': '0', 'backend': '/local/domain/0/backend/console/39/0'} to /local/domain/39/device/console/0.
[2013-02-25 13:06:40 5470] DEBUG (DevController:97) DevController: writing {'domain': 'vserv1', 'frontend': '/local/domain/39/device/console/0', 'uuid': 'c8819aed-c78f-02b8-0ef7-1600abd15add', 'frontend-id': '39', 'state': '1', 'location': '2', 'online': '1', 'protocol': 'vt100'} to /local/domain/0/backend/console/39/0.
[2013-02-25 13:06:40 5470] DEBUG (XendDomainInfo:1891) XendDomainInfo.handleShutdownWatch
[2013-02-25 13:06:40 5470] DEBUG (DevController:139) Waiting for devices vif2.
[2013-02-25 13:06:40 5470] DEBUG (DevController:139) Waiting for devices vif.
[2013-02-25 13:06:40 5470] DEBUG (DevController:139) Waiting for devices vscsi.
[2013-02-25 13:06:40 5470] DEBUG (DevController:139) Waiting for devices vbd.
[2013-02-25 13:06:40 5470] DEBUG (DevController:144) Waiting for 51712.
[2013-02-25 13:06:40 5470] DEBUG (DevController:628) hotplugStatusCallback /local/domain/0/backend/vbd/39/51712/hotplug-status.

From my point of view, either the Xen hypervisor or the kernel seems to be broken, but it's hard for me to tell which. I suspect the problem lies in the VIF code of the Xen dom0 kernel, because a reboot of the dom0 solves the problem temporarily without touching the domUs. But within a few hours (<6 hrs) the issue reappears. Although I assume that xend is responsible for adding/removing VIFs, restarting xend doesn't help at all. That's why I suspect a kernel problem in the dom0.

I'm running 8 domUs at the moment; each of them is connected to the outside world through xenbr0 and to the internal network through xenbr1 with RFC1918 addresses. I'm running a mixed setup of routed and bridged config:

(vif-script vif-bridge)
(network-script network-route)

But the server has run for several years with that setup without any problems, so I don't think that's the issue. For now I'm forced to go back to a working kernel, as I need to keep the server up and running.

--- System information. ---
Architecture: amd64
Kernel: Linux 2.6.32-5-xen-amd64

gate:~# dpkg -l | grep xen
ii libxenstore3.0 4.0.1-5.6 Xenstore communications library for Xen
ii linux-image-2.6.32-5-xen-amd64 2.6.32-48 Linux 2.6.32 for 64-bit PCs, Xen dom0 support
ii xen-hypervisor-4.0-amd64 4.0.1-5.6 The Xen Hypervisor on AMD64
ii xen-linux-system-2.6-xen-amd64 2.6.32+29 Xen system with Linux 2.6 for 64-bit PCs (meta-package)
ii xen-linux-system-2.6.32-5-xen-amd64 2.6.32-48 Xen system with Linux 2.6.32 on 64-bit PCs (meta-package)
ii xen-tools 4.2-1 Tools to manage Xen virtual servers
ii xen-utils-4.0 4.0.1-5.6 XEN administrative tools
ii xen-utils-common 4.0.0-1 XEN administrative tools - common files
ii xenstore-utils 4.0.1-5.6 Xenstore utilities for Xen
ii xenwatch 0.5.4-2 Virtualization utilities, mostly for Xen

-- 
Ciao... // Fon: 0381-2744150
Ingo \X/ http://blog.windfluechter.net

Please don't share this address with Facebook or Google!
gpg pubkey: http://www.juergensmann.de/ij_public_key.asc
Ian Campbell
2013-Feb-26 18:19 UTC
[Pkg-xen-devel] Bug#701744: Bug#701744: [xen] Update to hypervisor 4.0.1-5.6 or linux-image-2.6.32-5-xen-amd64 2.6.32-48 causes networking (VIF) failures
On Tue, 2013-02-26 at 18:42 +0100, Ingo Juergensmann wrote:
> 
> Since the update last weekend in stable/squeeze I'm experiencing
> problems with running Xen on amd64 and multiple domUs losing their
> network connection/VIFs.

The hypervisor's involvement in the specifics of the networking is pretty minimal; a kernel bug is much more likely IMHO. In particular, the messages you are seeing look a lot like those which would result from http://wiki.xen.org/wiki/Security_Announcements#XSA_39_Linux_netback_DoS_via_malicious_guest_ring..

So, was the hypervisor upgrade also accompanied by a kernel update, in either the dom0 or the guest domains? If so, what versions were involved and where?

Thanks,
Ian

-- 
Ian Campbell

pain, n.:
	One thing, at least it proves that you're alive!
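The XSA-39 advisory linked in the reply above concerns how netback validates the transmit requests a guest places on the shared ring. As a rough illustration only, here is a simplified Python model of the kind of size check that produces "Frag is bigger than frame.", assuming (as the Xen netif protocol describes) that the first request of a packet carries the total packet length and each following request carries one fragment. `count_requests` and `FatalTxError` are hypothetical names; this is a sketch, not the kernel source.

```python
from typing import List


class FatalTxError(Exception):
    """Stands in for netback giving up on the whole VIF ("fatal error; disabling device")."""


def count_requests(sizes: List[int]) -> int:
    """Simplified, hypothetical model of an XSA-39 style per-packet check.

    sizes[0] is the length announced in the packet's first TX request,
    assumed here to be the total packet size; sizes[1:] are the fragment
    requests that follow. Returns the number of fragment requests accepted.
    """
    remaining = sizes[0]  # bytes of the announced packet not yet claimed by a fragment
    for frag in sizes[1:]:
        if frag > remaining:
            # A fragment claims more data than the packet can still hold:
            # in this model the ring is treated as hostile and the VIF is disabled.
            raise FatalTxError("Frag is bigger than frame.")
        remaining -= frag
    return len(sizes) - 1


if __name__ == "__main__":
    print(count_requests([1514, 1000, 400]))  # consistent packet: 2 fragments accepted
    try:
        count_requests([1000, 600, 500])      # 600 + 500 exceeds the announced 1000
    except FatalTxError as err:
        print("vif disabled:", err)
```

In this model a single inconsistent request takes the whole interface down rather than dropping one packet, which matches the dom0 log in the report, where every "Frag is bigger than frame." line is immediately followed by "fatal error; disabling device" for the same VIF.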