Andrew Perry
2015-May-26 22:44 UTC
[Pkg-xen-devel] Bug#786936: xen-hypervisor-4.4-amd64: Upgrade dom0 from wheezy to jessie on Dell R610 results in dom0 unaccessible with xen_netback issue
Package: xen-hypervisor-4.4-amd64
Version: 4.4.1-9
Severity: critical
Justification: breaks the whole system

Dear Maintainer,

After upgrading the R610 server from Debian 7 to Debian 8, the dom0 becomes unresponsive via ssh after an hour or so, although the domUs still remain accessible.

Initially we thought it might be a disk space issue on / or /boot, so we increased those partition sizes, but that had no effect.

We get the following trace in /var/log/syslog:

May 26 09:18:59 servername kernel: [31526.937788] BUG: unable to handle kernel paging request at ffffc90013a4b158
May 26 09:18:59 servername kernel: [31526.937798] IP: [<ffffffffa06802a0>] xenvif_get_ethtool_stats+0x50/0x80 [xen_netback]
May 26 09:18:59 servername kernel: [31526.937807] PGD b243c067 PUD b243d067 PMD 8a56c067 PTE 0
May 26 09:18:59 servername kernel: [31526.937813] Oops: 0000 [#1] SMP
May 26 09:18:59 servername kernel: [31526.937817] Modules linked in: dm_snapshot dm_bufio binfmt_misc xt_tcpudp xt_physdev iptable_filter ip_tables x_tables xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge stp llc nls_utf8 nls_cp437 vfat fat joydev intel_powerclamp coretemp crc32_pclmul ghash_clmulni_intel ttm evdev aesni_intel ipmi_devintf iTCO_wdt iTCO_vendor_support aes_x86_64 drm_kms_helper acpi_power_meter dcdbas lrw gf128mul glue_helper tpm_tis tpm drm i2c_algo_bit ablk_helper processor i2c_core lpc_ich ipmi_si ipmi_msghandler i7core_edac thermal_sys cryptd mfd_core button psmouse pcspkr serio_raw shpchp wmi edac_core loop autofs4 ext4 crc16 mbcache jbd2 dm_mod hid_generic usbhid hid sg sr_mod cdrom ses sd_mod enclosure ata_generic crc32c_intel lpfc crc_t10dif crct10dif_generic ehci_pci uhci_hcd crct10dif_pclmul ata_piix ehci_hcd scsi_transport_fc libata megaraid_sas scsi_tgt usbcore scsi_mod usb_common crct10dif_common bnx2
May 26 09:18:59 servername kernel: [31526.937917] CPU: 0 PID: 1311 Comm: snmpd Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt9-3~deb8u1
May 26 09:18:59 servername kernel: [31526.937922] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 6.4.0 07/23/2013
May 26 09:18:59 servername kernel: [31526.937927] task: ffff88008a86a250 ti: ffff880002b4c000 task.ti: ffff880002b4c000
May 26 09:18:59 servername kernel: [31526.937931] RIP: e030:[<ffffffffa06802a0>] [<ffffffffa06802a0>] xenvif_get_ethtool_stats+0x50/0x80 [xen_netback]
May 26 09:18:59 servername kernel: [31526.937939] RSP: e02b:ffff880002b4fd70 EFLAGS: 00010283
May 26 09:18:59 servername kernel: [31526.937942] RAX: ffffc90013a14f38 RBX: 000000000230f940 RCX: ffff92008ea28c88
May 26 09:18:59 servername kernel: [31526.937946] RDX: ffff88008ecadc00 RSI: ffffc90013a4b190 RDI: ffff88008da7c000
May 26 09:18:59 servername kernel: [31526.937949] RBP: ffff880002b4fe10 R08: ffffffffa06827e0 R09: 0000000000000006
May 26 09:18:59 servername kernel: [31526.937953] R10: 000000000010ebb8 R11: 0000000000000246 R12: 0000000000000005
May 26 09:18:59 servername kernel: [31526.937957] R13: ffff88008da7c000 R14: ffffffffa0682640 R15: ffff88008ecadc00
May 26 09:18:59 servername kernel: [31526.937965] FS: 00007f93bcc9e700(0000) GS:ffff8800b2a00000(0000) knlGS:0000000000000000
May 26 09:18:59 servername kernel: [31526.937969] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
May 26 09:18:59 servername kernel: [31526.937973] CR2: ffffc90013a4b158 CR3: 00000000899ff000 CR4: 0000000000002660
May 26 09:18:59 servername kernel: [31526.937977] Stack:
May 26 09:18:59 servername kernel: [31526.937979] ffffffff814225f1 0000000400114813 00007fff3fff32a8 0000000000000000
May 26 09:18:59 servername kernel: [31526.937985] ffff880002b4ff18 0000001d3fff32a0 ffff880002b4fde0 ffffffff814039a6
May 26 09:18:59 servername kernel: [31526.937990] 000000050000001d ffff880000000005 ffffffff81420455 00007fff3fff3280
May 26 09:18:59 servername kernel: [31526.937995] Call Trace:
May 26 09:18:59 servername kernel: [31526.938003] [<ffffffff814225f1>] ? dev_ethtool+0x921/0x1ac0
May 26 09:18:59 servername kernel: [31526.938009] [<ffffffff814039a6>] ? ___sys_recvmsg+0x136/0x2a0
May 26 09:18:59 servername kernel: [31526.938014] [<ffffffff81420455>] ? netdev_run_todo+0x55/0x2f0
May 26 09:18:59 servername kernel: [31526.938020] [<ffffffff8143310f>] ? dev_ioctl+0x19f/0x590
May 26 09:18:59 servername kernel: [31526.938026] [<ffffffff8118e148>] ? kfree+0x118/0x220
May 26 09:18:59 servername kernel: [31526.938033] [<ffffffff811e330a>] ? fsnotify_clear_marks_by_inode+0x2a/0x110
May 26 09:18:59 servername kernel: [31526.938038] [<ffffffff814011fd>] ? sock_do_ioctl+0x3d/0x50
May 26 09:18:59 servername kernel: [31526.938043] [<ffffffff81401718>] ? sock_ioctl+0x1e8/0x2c0
May 26 09:18:59 servername kernel: [31526.938048] [<ffffffff811ba2ff>] ? do_vfs_ioctl+0x2cf/0x4b0
May 26 09:18:59 servername kernel: [31526.938054] [<ffffffff8108510c>] ? task_work_run+0x9c/0xd0
May 26 09:18:59 servername kernel: [31526.938059] [<ffffffff811ba561>] ? SyS_ioctl+0x81/0xa0
May 26 09:18:59 servername kernel: [31526.938065] [<ffffffff8151110a>] ? int_signal+0x12/0x17
May 26 09:18:59 servername kernel: [31526.938070] [<ffffffff81510e4d>] ? system_call_fast_compare_end+0x10/0x15
May 26 09:18:59 servername kernel: [31526.938073] Code: 41 0f b7 30 48 8b 8f f8 08 00 00 48 8d 04 f5 00 00 00 00 48 c1 e6 06 48 29 c6 48 8d 04 31 4a 8d 8c 11 58 62 03 00 48 01 ce 31 c9 <48> 03 88 20 62 03 00 48 05 58 62 03 00 48 39 f0 75 ee 49 83 c0
May 26 09:18:59 servername kernel: [31526.938109] RIP [<ffffffffa06802a0>] xenvif_get_ethtool_stats+0x50/0x80 [xen_netback]
May 26 09:18:59 servername kernel: [31526.938116] RSP <ffff880002b4fd70>
May 26 09:18:59 servername kernel: [31526.938118] CR2: ffffc90013a4b158
May 26 09:18:59 servername kernel: [31526.938124] ---[ end trace b709685b97b0c981 ]

---- System Information:
Debian Release: 8.0
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/24 CPU cores)
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.us-ascii (charmap=UTF-8) (ignored: LC_ALL set to en_AU.UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

xen-hypervisor-4.4-amd64 depends on no packages.

Versions of packages xen-hypervisor-4.4-amd64 recommends:
ii xen-utils-4.4 4.4.1-9

xen-hypervisor-4.4-amd64 suggests no packages.

-- no debconf information
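The register dump and the "Code:" line in the trace above are enough to check where the bad address came from. The following is a small sketch, not part of the original report: it takes RAX, RSI and CR2 from the oops, reads the two 32-bit constants out of the faulting instruction (48 03 88 disp32, which encodes "add rcx, [rax + disp32]") and the instruction after it (48 05 imm32, "add rax, imm32"), and confirms the arithmetic. The variable and function names are made up for illustration.

```python
# Values copied from the oops in the report.
RAX = 0xFFFFC90013A14F38
RSI = 0xFFFFC90013A4B190
CR2 = 0xFFFFC90013A4B158  # faulting address

# From the "Code:" line: the byte in <..> is where RIP pointed.
FAULTING_INSN = bytes.fromhex("48 03 88 20 62 03 00")  # add rcx, [rax + disp32]
NEXT_INSN = bytes.fromhex("48 05 58 62 03 00")         # add rax, imm32


def le32(insn: bytes, offset: int) -> int:
    """Read a little-endian 32-bit constant starting at the given byte offset."""
    return int.from_bytes(insn[offset:offset + 4], "little")


load_offset = le32(FAULTING_INSN, 3)  # displacement of the memory operand
stride = le32(NEXT_INSN, 2)           # amount RAX advances per loop iteration

print(f"load offset      = {load_offset:#x}")
print(f"stride           = {stride:#x}")
print(f"RAX + offset     = {RAX + load_offset:#x}")
print(f"matches CR2      = {RAX + load_offset == CR2}")
print(f"RSI - RAX        = {RSI - RAX:#x}")
```

The faulting address is exactly RAX plus the displacement, and RSI sits exactly one stride beyond RAX. That is consistent with a loop that walks an array of large fixed-size records and sums one field from each, with the record at RAX not mapped (PTE 0). This reading is inferred from the instruction bytes alone; nothing in the thread confirms which structure is involved.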
Ian Campbell
2015-May-30 08:35 UTC
[Pkg-xen-devel] Bug#786936: Bug#786936: xen-hypervisor-4.4-amd64: Upgrade dom0 from wheezy to jessie on Dell R610 results in dom0 unaccessible with xen_netback issue
Control: reassign -1 linux-image-3.16.0-4-amd64 3.16.7-ckt9-3~deb8u1

On Wed, 2015-05-27 at 08:44 +1000, Andrew Perry wrote:
> Package: xen-hypervisor-4.4-amd64
> Version: 4.4.1-9
> Severity: critical
> Justification: breaks the whole system
>
> Dear Maintainer,
>
> After upgrading the R610 server from Debian 7 to Debian 8, the dom0
> becomes unresponsive via ssh after an hour or so, although the domUs
> still remain accessible.
>
> Initially we thought it may be a disk space issue on / or /boot so
> action was taken to increase those partition sizes but it has no
> effect.
>
> We get the following trace in /var/log/syslog:
>
> May 26 09:18:59 servername kernel: [31526.937788] BUG: unable to handle kernel paging request at ffffc90013a4b158
> May 26 09:18:59 servername kernel: [31526.937798] IP: [<ffffffffa06802a0>] xenvif_get_ethtool_stats+0x50/0x80 [xen_netback]

This appears to be a dom0 kernel issue rather than a hypervisor issue, so I've (hopefully) reassigned it accordingly.

While we work out a proper fix: since the error appears to be in the ethtool stats gathering code, I suspect there is a workaround, which would be to disable whichever code in dom0 (a monitoring daemon like nagios, perhaps?) is calling this path.
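To find out which dom0 process is calling this path, the oops itself already names it: the "CPU: ... PID: ... Comm: ..." line of the trace in the original report records the process that was running when the fault hit (snmpd, PID 1311, in this case). Below is a minimal sketch of pulling that out of syslog lines together with the faulting symbol. The regular expressions and the summarize_oops helper are illustrative assumptions written for this log format, not an existing tool.

```python
import re

# Two lines copied from the trace in the original report.
OOPS_LINES = [
    "May 26 09:18:59 servername kernel: [31526.937798] IP: [<ffffffffa06802a0>] "
    "xenvif_get_ethtool_stats+0x50/0x80 [xen_netback]",
    "May 26 09:18:59 servername kernel: [31526.937917] CPU: 0 PID: 1311 Comm: snmpd "
    "Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt9-3~deb8u1",
]

# "IP: [<addr>] symbol+0xoff/0xlen [module]"; the module part is optional.
IP_RE = re.compile(
    r"\bIP: \[<[0-9a-f]+>\] (?P<symbol>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)
# Simplification: assumes the process name contains no spaces.
COMM_RE = re.compile(r"\bPID: (?P<pid>\d+) Comm: (?P<comm>\S+)")


def summarize_oops(lines):
    """Return the faulting symbol/module and the process named in the oops."""
    summary = {}
    for line in lines:
        ip = IP_RE.search(line)
        if ip and "symbol" not in summary:
            summary["symbol"] = ip.group("symbol")
            summary["module"] = ip.group("module")
        comm = COMM_RE.search(line)
        if comm and "comm" not in summary:
            summary["pid"] = int(comm.group("pid"))
            summary["comm"] = comm.group("comm")
    return summary


summary = summarize_oops(OOPS_LINES)
print(summary)
```

If the process reported this way is a monitoring agent, stopping it or excluding the vif interfaces from its polling would be the workaround to try first. Whether that avoids the crash on this system has not been confirmed in the thread.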
Debian Bug Tracking System
2015-May-30 08:39 UTC
[Pkg-xen-devel] Processed: Re: Bug#786936: xen-hypervisor-4.4-amd64: Upgrade dom0 from wheezy to jessie on Dell R610 results in dom0 unaccessible with xen_netback issue
Processing control commands:

> reassign -1 linux-image-3.16.0-4-amd64 3.16.7-ckt9-3~deb8u1
Bug #786936 [xen-hypervisor-4.4-amd64] xen-hypervisor-4.4-amd64: Upgrade dom0 from wheezy to jessie on Dell R610 results in dom0 unaccessible with xen_netback issue
Bug reassigned from package 'xen-hypervisor-4.4-amd64' to 'linux-image-3.16.0-4-amd64'.
No longer marked as found in versions xen/4.4.1-9.
Ignoring request to alter fixed versions of bug #786936 to the same values previously set
Bug #786936 [linux-image-3.16.0-4-amd64] xen-hypervisor-4.4-amd64: Upgrade dom0 from wheezy to jessie on Dell R610 results in dom0 unaccessible with xen_netback issue
Marked as found in versions linux/3.16.7-ckt9-3~deb8u1.

--
786936: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=786936
Debian Bug Tracking System
Contact owner at bugs.debian.org with problems