I was experimenting with DomU redundancy and load balancing,
and I think this GPF started to show up after a couple of DomUs
with CARP and HAProxy were added that constantly generate
a heavy flow of network traffic by pinging the target machines
and each other. Or maybe it is not related to CARP
and the pinging at all, but simply depends on traffic volume: the more
VMs are added and running, the higher the chance that Dom0-DomU
networking will collapse. The critical point seems to be 8 guest
domains, while I need 10.
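To give an idea of the traffic involved: the balancer DomUs run an
HAProxy configuration with frequent health checks, roughly like the
fragment below. This is only an illustration, not my actual config;
the addresses, ports and check intervals are made up:

  defaults
      mode http
      timeout connect 5s
      timeout client  30s
      timeout server  30s

  frontend www
      bind *:80
      default_backend web_pool

  backend web_pool
      balance roundrobin
      # health-check each backend once a second, so a constant
      # trickle of traffic flows between the DomUs even when idle
      option httpchk GET /
      server web1 192.0.2.11:80 check inter 1000
      server web2 192.0.2.12:80 check inter 1000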
I can't give exact steps to reproduce, as it happens randomly,
usually without any correlated user activity, after several hours
(or several minutes) of normal operation. But sometimes
it happens shortly after a balancer DomU's startup or shutdown.
After the GPF happens, all VMs lose their network connectivity.
Dom0 is openSUSE 12.1 on AMD64 (Linux 3.1.0-1.2-xen)
with Xen version 4.1.2_05-1.9, which is patched as described
in openSUSE bug 727081 (bugzilla.novell.com/show_bug.cgi?id=727081).
Supposedly "offending" DomU is paravirtualized NetBSD 5.1.1
for AMD64 with recompiled kernel (CARP enabled, no more changes).
Other VMs are openSUSE 11.4 and 12.1 for AMD64.
The trace log in /var/log/messages always looks similar (varying digits
replaced with asterisks ***):
general protection fault: 0000 [#1] SMP
CPU {core-number}
Modules linked in: 8250 8250_pnp af_packet asus_wmi ata_generic
blkback_pagemap blkbk blktap bridge btrfs button cdrom dm_mod
domctl drm drm_kms_helper edd eeepc_wmi ehci_hcd evtchn fuse
gntdev hid hwmon i2c_algo_bit i2c_core i2c_i801 i915
iTCO_vendor_support iTCO_wdt linear llc lzo_compress mei(C)
microcode netbk parport parport_pc pata_via pci_hotplug pcspkr
ppdev processor r8169 rfkill serial_core [serio_raw] sg
snd snd_hda_codec snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_intel snd_hwdep snd_mixer_oss snd_page_alloc snd_pcm
snd_pcm_oss snd_seq snd_seq_device snd_timer soundcore
sparse_keymap sr_mod stp thermal_sys uas usbbk usbcore
usbhid usb_storage video wmi xenblk xenbus_be xennet zlib_deflate
Pid: {process-id}, comm: netback/{0/1} Tainted: G
C 3.1.0-1.2-xen #1 System manufacturer System Product Name/P8H67-M
RIP: e030:[<ffffffff803e7451>] [<ffffffff803e7451>]
skb_release_data.part.47+0x61/0xc0
RSP: e02b:ffff880******d40 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff880********0 RCX: ffff880******000
RDX: {..RCX.+.0e80..} RSI: 00000000000000** RDI: 00***c**00000000
RBP: {.....RBX......} R08: {..RCX.-.cff0..} R09: 0000000*********
R10: 000000000000000* R11: {.task.+.0470..} R12: ffff880026a51000
R13: ffff880********0 R14: ffffc900048****0 R15: 0000000000000001
FS: 00007f*******7*0(0000) GS:ffff880******000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000***********0 CR3: 0000000******000 CR4: 0000000000042660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process netback/{0/1} (pid: {process-id}, threadinfo ffff880******000,
task ffff880********0)
Stack:
0000000000000000 {.....RBX......} 0000000000000000 ffffffff803e7511
{.....RBX......} ffffffffa0***d2c {.....task.....} {thread.+.1e00.}
{thread.+.1db0.} {.R14.-.22a40..} ffffc9000000000* 0000000000000000
Call Trace:
[<ffffffff803e7511>] __kfree_skb+0x11/0x20
[<ffffffffa0***d2c>] net_rx_action+0x66c/0x9c0 [netbk]
[<ffffffffa0***72a>] netbk_action_thread+0x5a/0x270 [netbk]
[<ffffffff8006438e>] kthread+0x7e/0x90
[<ffffffff8050f814>] kernel_thread_helper+0x4/0x10
Code: 48 8b 7c 02 08 e8 90 69 cf ff 8b 95 d0 00 00 00
48 8b 8d d8 00 00 00 48 01 ca 0f b7 02 39 c3 7c
d1 f6 42 0c 10 74 1e 48 8b 7a 30
RIP [<ffffffff803e7451>] skb_release_data.part.47+0x61/0xc0
RSP <ffff880******d40>
---[ end trace **************** ]---
Preceding and subsequent messages don't seem to be related to the GPF;
the time gap is anywhere from minutes to half an hour or even more. But
if they could give some insight, I will post them, too.
On Fri, Feb 03, 2012 at 07:32:40PM +0300, Anton Samsonov wrote:
> Dom0 is openSUSE 12.1 on AMD64 (Linux 3.1.0-1.2-xen)

Do you get the same issue with a pv-ops dom0? So also 3.1, but from
kernel.org?

> with Xen version 4.1.2_05-1.9, which is patched as described
> in openSUSE bug 727081 (bugzilla.novell.com/show_bug.cgi?id=727081).
> Supposedly "offending" DomU is paravirtualized NetBSD 5.1.1
> for AMD64 with recompiled kernel (CARP enabled, no more changes).

What is CARP?

[...]

> Stack:
> 0000000000000000 {.....RBX......} 0000000000000000 ffffffff803e7511
> {.....RBX......} ffffffffa0***d2c {.....task.....} {thread.+.1e00.}
> {thread.+.1db0.} {.R14.-.22a40..} ffffc9000000000* 0000000000000000

Hm, that is a pretty neat stack output. Wonder which patch of theirs
does that.
2012/2/10 Konrad Rzeszutek Wilk <konrad@darnok.org>:
AS>> Dom0 is openSUSE 12.1 on AMD64 (Linux 3.1.0-1.2-xen)
KRW> Do you get the same issue with a pv-ops dom0? So also 3.1, but
KRW> from kernel.org?
Unfortunately, I'm not skilled at compiling the kernel myself. I tried
building the newest 3.2.6 with all Xen options (which I could find by
the "Xen" keyword) enabled, but the resulting system didn't have the
netback.ko module at all, barely booted, and xm was not able to
communicate with the hypervisor.

As for the vanilla kernel package provided by openSUSE, it is not
Xen-enabled. Meanwhile, an update was released, so I have been testing
3.1.9-1.4-xen for about a week, though the outcome is still negative.
AS>> Supposedly "offending" DomU is paravirtualized NetBSD 5.1.1
AS>> for AMD64 with recompiled kernel (CARP enabled, no more changes).
KRW> What is CARP?
CARP is the Common Address Redundancy Protocol, a "non-patented version
of VRRP", used for high availability and load balancing. It is
supported in the NetBSD kernel (although a user-space implementation,
uCARP, exists as well), but is not compiled in by default. All my work
to enable it was a quite simple recompilation with the following
config:

include "arch/amd64/conf/XEN3_DOMU"
pseudo-device carp
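In case it helps to reproduce: the build was just the standard NetBSD
kernel-config procedure, and the carp interface is then configured in
the guest the usual way. The commands below are only a sketch; the
config name, password and addresses are placeholders, not my real
settings:

  # on the NetBSD 5.1.1 DomU, with /usr/src populated
  cd /usr/src/sys/arch/amd64/conf
  printf 'include "arch/amd64/conf/XEN3_DOMU"\npseudo-device carp\n' \
      > XEN3_DOMU_CARP
  config XEN3_DOMU_CARP
  cd ../compile/XEN3_DOMU_CARP
  make depend && make
  # install the resulting kernel as /netbsd and reboot the DomU

  # then bring up a CARP virtual address shared with the peer balancer
  ifconfig carp0 create
  ifconfig carp0 vhid 1 pass examplepass 192.0.2.10 netmask 255.255.255.0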
It looks like GPFs happen only when those load-balancing DomUs are
running: when they are shut off, no fault is observed in a whole day,
but when they run, a fault can happen even within minutes of Dom0
uptime, especially while DomUs are stopping or starting.
KRW> Hm, that is a pretty neat stack output. Wonder which patch of
KRW> theirs does that.
It was not a verbatim dump, but generalized text. If you are
interested, here is an excerpt from /var/log/messages for the
penultimate GPF (with date and hostname removed):
===[ Preceding entries ]==
(These may be completely unrelated to the GPF, but all 3 recent faults
after the kernel update happened during VM shutdown, either of many
VMs at once or of a single one.)
21:18:33 avahi-daemon[3086]: Withdrawing workstation service for vif10.0.
21:18:33 kernel: [29722.267359] br1: port 10(vif10.0) entering forwarding state
21:18:33 kernel: [29722.267443] br1: port 10(vif10.0) entering disabled state
21:18:33 logger: /etc/xen/scripts/vif-bridge: offline type_if=vif
XENBUS_PATH=backend/vif/10/0
21:18:33 logger: /etc/xen/scripts/vif-bridge: brctl delif br1 vif10.0 failed
21:18:33 logger: /etc/xen/scripts/vif-bridge: ifconfig vif10.0 down failed
21:18:33 logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge
offline for vif10.0, bridge br1.
21:18:33 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vkbd/10/0
21:18:33 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/console/10/0
21:18:33 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vfb/10/0
21:18:33 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/10/51712
21:18:33 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vif/10/0
21:18:33 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vbd/10/51712
21:18:53 avahi-daemon[3086]: Withdrawing workstation service for vif9.0.
21:18:53 kernel: [29742.222676] br1: port 9(vif9.0) entering forwarding state
21:18:53 kernel: [29742.222779] br1: port 9(vif9.0) entering disabled state
21:18:53 logger: /etc/xen/scripts/vif-bridge: offline type_if=vif
XENBUS_PATH=backend/vif/9/0
21:18:53 logger: /etc/xen/scripts/vif-bridge: brctl delif br1 vif9.0 failed
21:18:53 logger: /etc/xen/scripts/vif-bridge: ifconfig vif9.0 down failed
21:18:53 logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge
offline for vif9.0, bridge br1.
21:18:53 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vkbd/9/0
21:18:53 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/console/9/0
21:18:53 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vfb/9/0
21:18:53 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vif/9/0
21:18:53 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/9/51712
21:18:53 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vbd/9/51712
21:19:13 avahi-daemon[3086]: Withdrawing workstation service for vif8.0.
21:19:13 kernel: [29762.605500] br1: port 8(vif8.0) entering forwarding state
21:19:13 kernel: [29762.605572] br1: port 8(vif8.0) entering disabled state
21:19:13 logger: /etc/xen/scripts/vif-bridge: offline type_if=vif
XENBUS_PATH=backend/vif/8/0
21:19:13 logger: /etc/xen/scripts/vif-bridge: brctl delif br1 vif8.0 failed
21:19:13 logger: /etc/xen/scripts/vif-bridge: ifconfig vif8.0 down failed
21:19:13 logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge
offline for vif8.0, bridge br1.
21:19:13 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vkbd/8/0
21:19:13 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/console/8/0
21:19:13 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vfb/8/0
21:19:13 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vif/8/0
21:19:13 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/8/51712
21:19:13 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vbd/8/51712
21:19:26 avahi-daemon[3086]: Withdrawing workstation service for vif7.0.
21:19:26 kernel: [29775.558990] br1: port 7(vif7.0) entering forwarding state
21:19:26 kernel: [29775.559105] br1: port 7(vif7.0) entering disabled state
21:19:26 logger: /etc/xen/scripts/vif-bridge: offline type_if=vif
XENBUS_PATH=backend/vif/7/0
21:19:26 logger: /etc/xen/scripts/vif-bridge: brctl delif br1 vif7.0 failed
21:19:26 logger: /etc/xen/scripts/vif-bridge: ifconfig vif7.0 down failed
21:19:26 logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge
offline for vif7.0, bridge br1.
21:19:26 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vkbd/7/0
21:19:26 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/console/7/0
21:19:26 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vfb/7/0
21:19:26 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/7/51712
21:19:26 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vif/7/0
21:19:26 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vbd/7/51712
===[ Fault alert itself ]==
21:19:37 kernel: [29786.610984] general protection fault: 0000 [#1] SMP
21:19:37 kernel: [29786.610992] CPU 0
21:19:37 kernel: [29786.610993] Modules linked in: fuse ip6t_LOG
xt_tcpudp xt_pkttype xt_physdev ipt_LOG xt_limit nfsd lockd nfs_acl
auth_rpcgss sunrpc usbbk netbk blkbk blkback_pagemap blktap domctl
xenbus_be gntdev evtchn af_packet bridge stp llc edd ip6t_REJECT
nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT
iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns
nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables
xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables
microcode snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device eeepc_wmi
asus_wmi sparse_keymap rfkill usb_storage ppdev pci_hotplug uas
8250_pnp sg i2c_i801 sr_mod wmi parport_pc snd_hda_codec_hdmi
snd_hda_codec_realtek parport r8169 pcspkr mei(C) 8250 serial_core
iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_codec snd_hwdep
snd_pcm snd_timer snd soundcore snd_page_alloc usbhid hid dm_mod
linear btrfs zlib_deflate lzo_compress i915 drm_kms_helper drm
i2c_algo_bit ehci_hcd usbcor
21:19:37 kernel: e i2c_core button video processor thermal_sys hwmon
xenblk cdrom xennet ata_generic pata_via
21:19:37 kernel: [29786.611076]
21:19:37 kernel: [29786.611078] Pid: 3461, comm: netback/1 Tainted: G
C 3.1.9-1.4-xen #1 System manufacturer System Product
Name/P8H67-M
21:19:37 kernel: [29786.611084] RIP: e030:[<ffffffff803e7f81>]
[<ffffffff803e7f81>] skb_release_data.part.46+0x61/0xc0
21:19:37 kernel: [29786.611092] RSP: e02b:ffff8802c8339d40 EFLAGS: 00010202
21:19:37 kernel: [29786.611095] RAX: 0000000000000000 RBX:
ffff88007ccf39c0 RCX: ffff8800e70db000
21:19:37 kernel: [29786.611098] RDX: ffff8800e70dbe80 RSI:
000000000000001f RDI: 0028f49c00000000
21:19:37 kernel: [29786.611101] RBP: ffff88007ccf39c0 R08:
ffff8800e70d0010 R09: 000000000000004e
21:19:37 kernel: [29786.611103] R10: 0000000000000003 R11:
ffff8802d0074c30 R12: ffff8802bb22f000
21:19:37 kernel: [29786.611106] R13: ffff88027ea382c0 R14:
ffffc900048cb960 R15: 0000000000000001
21:19:37 kernel: [29786.611114] FS: 00007f303913f700(0000)
GS:ffff8802de3c2000(0000) knlGS:0000000000000000
21:19:37 kernel: [29786.611117] CS: e033 DS: 0000 ES: 0000 CR0:
000000008005003b
21:19:37 kernel: [29786.611119] CR2: 00000000006b6e30 CR3:
00000002c93fb000 CR4: 0000000000042660
21:19:37 kernel: [29786.611126] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
21:19:37 kernel: [29786.611131] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
21:19:37 kernel: [29786.611136] Process netback/1 (pid: 3461,
threadinfo ffff8802c8338000, task ffff8802d00747c0)
21:19:37 kernel: [29786.611140] Stack:
21:19:37 kernel: [29786.611144] 0000000000000000 ffff88007ccf39c0
0000000000000000 ffffffff803e8041
21:19:37 kernel: [29786.611151] ffff88007ccf39c0 ffffffffa059fd3c
ffff8802d00747c0 ffff8802c8339e00
21:19:37 kernel: [29786.611157] ffff8802c8339db0 ffffc900048a8f20
ffffc90000000002 0000000000000000
21:19:37 kernel: [29786.611164] Call Trace:
21:19:37 kernel: [29786.611173] [<ffffffff803e8041>]
__kfree_skb+0x11/0x20
21:19:37 kernel: [29786.611182] [<ffffffffa059fd3c>]
net_rx_action+0x66c/0x9c0 [netbk]
21:19:37 kernel: [29786.611201] [<ffffffffa05a173a>]
netbk_action_thread+0x5a/0x270 [netbk]
21:19:37 kernel: [29786.611211] [<ffffffff8006444e>] kthread+0x7e/0x90
21:19:37 kernel: [29786.611220] [<ffffffff80510d24>]
kernel_thread_helper+0x4/0x10
21:19:37 kernel: [29786.611225] Code: 48 8b 7c 02 08 e8 a0 60 cf ff 8b
95 d0 00 00 00 48 8b 8d d8 00 00 00 48 01 ca 0f b7 02 39 c3 7c d1 f6
42 0c 10 74 1e 48 8b 7a 30
21:19:37 kernel: [29786.611265] RIP [<ffffffff803e7f81>]
skb_release_data.part.46+0x61/0xc0
21:19:37 kernel: [29786.611271] RSP <ffff8802c8339d40>
21:19:37 kernel: [29786.671491] ---[ end trace 6875b40b2a9f1d46 ]---
(Note that the offsets after "+" in the call trace did not change after
the kernel update, as compared to the previously posted trace, although
the absolute addresses did change.)
===[ Subsequent entries ]==
(Again, these can be completely unrelated to the GPF, and sometimes
they happen minutes later.)
21:19:38 avahi-daemon[3086]: Withdrawing workstation service for vif6.0.
21:19:38 kernel: [29787.904571] br1: port 6(vif6.0) entering forwarding state
21:19:38 kernel: [29787.904649] br1: port 6(vif6.0) entering disabled state
21:19:38 logger: /etc/xen/scripts/vif-bridge: offline type_if=vif
XENBUS_PATH=backend/vif/6/0
21:19:38 logger: /etc/xen/scripts/vif-bridge: brctl delif br1 vif6.0 failed
21:19:38 logger: /etc/xen/scripts/vif-bridge: ifconfig vif6.0 down failed
21:19:38 logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge
offline for vif6.0, bridge br1.
21:19:39 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vkbd/6/0
21:19:39 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/console/6/0
21:19:39 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vfb/6/0
21:19:39 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/6/51712
21:19:39 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vif/6/0
21:19:39 logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vbd/6/51712
21:19:58 kernel: [29807.860561] br1: port 5(vif5.0) entering forwarding state
On Wed, Feb 15, 2012 at 07:29:54PM +0300, Anton Samsonov wrote:
> Unfortunately, I'm not skilled at compiling the kernel myself. I tried
> building the newest 3.2.6 with all Xen options (which I could find by
> the "Xen" keyword) enabled, but the resulting system didn't have the
> netback.ko module at all, barely booted, and xm was not able to
> communicate with the hypervisor.

See this wiki page for all the common troubleshooting steps when xend
does not start:
http://wiki.xen.org/wiki/Xen_Common_Problems#Starting_xend_fails.3F

About compiling the dom0 kernel, see this wiki page:
http://wiki.xen.org/wiki/XenParavirtOps#Configure_kernel_for_dom0_support

Hopefully those help..

-- Pasi
2012/2/15 Pasi Kärkkäinen <pasik@iki.fi>:

AS>> Unfortunately, I'm not skilled at compiling the kernel myself.
AS>> I tried building the newest 3.2.6 with all Xen options enabled,
AS>> but the resulting system didn't have netback.ko module at all,
AS>> barely booted, and xm was not able to communicate with the VMM.

PK> About compiling the dom0 kernel, see this wiki page:
PK> http://wiki.xen.org/wiki/XenParavirtOps#Configure_kernel_for_dom0_support
Thanks, it looks like I just mixed up some "y" with "m" and vice versa
("m" is presented as a meaningless bullet in the GUI). By the way, that
how-to contains dubious lines for CONFIG_XEN_DEV_EVTCHN and
CONFIG_XEN_GNTDEV.
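For reference, my understanding (from that wiki page and from what was
missing in my first attempt) is that the dom0/backend side needs
roughly the following options; I am not certain this list is complete,
or that every symbol name matches every kernel version, so treat it
only as a sketch:

  # core dom0 support
  CONFIG_XEN=y
  CONFIG_XEN_DOM0=y
  # backend drivers (netback is what was missing in my first build)
  CONFIG_XEN_BACKEND=y
  CONFIG_XEN_NETDEV_BACKEND=m
  CONFIG_XEN_BLKDEV_BACKEND=m
  # userspace interfaces the toolstack (xend/xm) relies on
  CONFIG_XEN_DEV_EVTCHN=m
  CONFIG_XEN_GNTDEV=m
  CONFIG_XENFS=y
  CONFIG_XEN_COMPAT_XENFS=y
  CONFIG_XEN_SYS_HYPERVISOR=y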
Well, now the system boots more eagerly, although the kernel still
seems to be slightly incompatible with the distro's environment and my
hardware. But at least xend is now responding and is able to run DomUs
as usual.
I started and stopped the whole swarm of VMs several times (without
letting them run for long), and observed no GPF. But instead I get
screen garbling: while a DomU is starting or stopping, the whole
graphical desktop is sometimes painted with either black or
not-so-random garbage, and even the mouse pointer can become garbled;
I have to move / resize windows to get them repainted. Network
connectivity between Dom0 and [subsequently started] DomUs does not
break, though.
On one hand, I am not sure whether the video driver is to blame for
the glitches, because graphics already does not work as usual: it is
not hardware-accelerated with my custom kernel (while it is with the
stock kernel), and the screen is garbled on Xorg startup, before the
login prompt is displayed. On the other hand, this is not in any way
normal, as Xen operations must not interfere with Dom0's desktop
(or was it direct VRAM corruption?).
This happens even when "suspicious" domains (NetBSD with CARP) are not
started: on a freshly booted Dom0, just having 4 essential DomUs is enough
to get that screen garbling when shutting down 1 or 2 of them for the
first time.
But when I return to the stock kernel, I can run a dozen such DomUs
(including those NetBSD load balancers), starting and stopping them
many times without a problem. Recently, no GPF has occurred when only
1 of the 2 balancers is started, or when neither is started at all; or
it just needs much more uptime to accumulate memory corruption for
a GPF.
On Tue, Feb 21, 2012 at 06:06:14PM +0300, Anton Samsonov wrote:
[...]
> On one hand, I am not sure whether the video driver is to blame for
> the glitches, because graphics already does not work as usual: it is
> not hardware-accelerated with my custom kernel (while it is with the
> stock kernel), and the screen is garbled on Xorg startup, before the
> login prompt is displayed.

So... I am curious, what graphic card do you have and do you get any of
these Red Hat BZs? RH BZ# 742032, 787403, and 745574?

> On the other hand, this is not in any way normal, as Xen operations
> must not interfere with Dom0's desktop (or was it direct VRAM
> corruption?).

It is complicated. There is a bug in 3.2 when using radeon or nouveau
for a lengthy period of time that ends up "corrupting" memory. The
workaround is to provide 'nopat' on the argument line.

> This happens even when "suspicious" domains (NetBSD with CARP) are not
> started: on a freshly booted Dom0, just having 4 essential DomUs is
> enough to get that screen garbling when shutting down 1 or 2 of them
> for the first time.

Hmm, that is weird. Never seen that before. Can you include more
details on your machine?
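In case it is not obvious where that argument goes: with the GRUB
legacy setup that openSUSE 12.1 still uses (I believe), 'nopat' is
appended to the dom0 kernel's 'module' line of the Xen stanza in
menu.lst, roughly like this; the paths and the other arguments here are
placeholders for whatever your stanza already contains:

  title Xen -- openSUSE 12.1
      root (hd0,1)
      kernel /boot/xen.gz
      module /boot/vmlinuz-3.1.0-1.2-xen root=/dev/sda2 nopat
      module /boot/initrd-3.1.0-1.2-xen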
2012/2/21 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:

AS>> With the custom-built kernel, I didn't yet see any GPF, but screen
AS>> garbling happens almost every time a DomU is starting or stopping:
AS>> the whole graphical desktop is painted with either black or
AS>> not-so-random garbage.

KRW> So... I am curious, what graphic card do you have and do you get
KRW> any of these Red Hat BZs? RH BZ# 742032, 787403, and 745574?
KRW> There is a bug in 3.2 when using radeon or nouveau for a lengthy
KRW> period of time that ends up "corrupting" memory. The workaround is
KRW> the 'nopat' kernel arg.

My idea is that my custom-built kernel can't be considered a trusted
proving ground, as it is of very low quality. The video issue is just
the most obvious example. Another indisputable example is how Dom0
reboots: instead of a simple CPU restart, the whole system goes into
soft-off for several seconds, then wakes back up. When I boot this
kernel in bare-metal mode (without the Xen VMM), none of this happens:
the GUI is accelerated (at least in 2D; I don't use an OpenGL desktop),
the screen is not garbled at the login and logout dialogs, and the
system reboots quickly.

Anyway, I tried your solution with "nopat". It didn't work: with
4 DomUs running for a minute and then shut down in reverse order
(4th, 3rd, 2nd, 1st), the screen went black right between the 3rd VM
being completely shut down and the 2nd VM being asked to shut down.
There was no "lengthy time" of Dom0 running, my video adapter is
neither nVidia nor ATi but an integrated Intel HD Graphics 2000 using
the i915 driver, and I see no similarities to the Red Hat bugs you
mentioned.

KRW> Can you include more details on your machine?

My guess is that it is not just my hardware that causes the GPF, but
either a bug in the netback module, or a compiler issue for a specific
combination of Xen (and/or particularly netback) together with the
openSUSE build technology.

As an example of the latter, look again at the Novell BZ #727081
mentioned in the original post: comment #30 says, "The compiler
apparently makes use of the 128-byte area called 'red zone' in the ABI,
and this is incompatible with xc_cpuid_x86.c:cpuid() using pushes and
pops around the cpuid instruction". The consequence is that, on some
machines, libxenguest segfaults when you try to start a DomU. With
a Core i7-920 there is no problem, but with a Core i5-2300 I faced that
issue, and I wonder whether the same incompatibility can take place in
the netback module. I thought the traceback might give some hints on
where to debug.

My specs are:
MB: Asus P8H67-M (Intel H67 chipset)
CPU: Intel Core i5 model 2300 (Turbo mode disabled)
RAM: 12GB DDR3-1333 non-ECC (recently checked by MemTest86+ 4.20)
Video: Intel HD Graphics 2000 (integrated into CPU)
Network: dedicated soft-bridge for most DomUs,
         + bridged Realtek RTL8111E for gateway DomU (not with CARP)
On Wed, Feb 22, 2012 at 03:17:24PM +0300, Anton Samsonov wrote:
[...]
> My specs are:
> MB: Asus P8H67-M (Intel H67 chipset)
> CPU: Intel Core i5 model 2300 (Turbo mode disabled)
> RAM: 12GB DDR3-1333 non-ECC (recently checked by MemTest86+ 4.20)

Do you have CONFIG_DMAR enabled in your config?

> Video: Intel HD Graphics 2000 (integrated into CPU)
> Network: dedicated soft-bridge for most DomUs,
>          + bridged Realtek RTL8111E for gateway DomU (not with CARP)
>>> Anton Samsonov <avscomputing@gmail.com> 02/22/12 11:46 PM >>>
> As an example of the latter, look again at the Novell BZ #727081
> mentioned in the original post: comment #30 says, "The compiler
> apparently makes use of the 128-byte area called 'red zone' in the
> ABI, and this is incompatible with xc_cpuid_x86.c:cpuid() using pushes
> and pops around the cpuid instruction". The consequence is that, on
> some machines, libxenguest segfaults when you try to start a DomU.
> With a Core i7-920 there is no problem, but with a Core i5-2300
> I faced that issue, and I wonder whether the same incompatibility can
> take place in the netback module. I thought the traceback might give
> some hints on where to debug.

That's impossible - use of the red zone is disallowed in the kernel via
a compiler option. And the problem you cite was a source-code problem,
not a compiler one (the fact that it had an effect only on some systems
was attributed to the function in question only getting run when
a specific hardware feature was available, iirc).

Jan