Hi,
After recently installing a CentOS 5 domU (PV) on an snv_89 dom0, the guest
seems rather unstable.  Yesterday I got two core dumps in /var/xen/dump, and
noticed in the guest's console a couple of soft lockups:

BUG: soft lockup detected on CPU#1!

Call Trace:
 <IRQ>  [<ffffffff802aae32>] softlockup_tick+0xd5/0xe7
 [<ffffffff8026cb4a>] timer_interrupt+0x396/0x3f2
 [<ffffffff80210afe>] handle_IRQ_event+0x2d/0x60
 [<ffffffff802ab1ba>] __do_IRQ+0xa4/0x105
 [<ffffffff80288753>] _local_bh_enable+0x61/0xc5
 [<ffffffff8026a90e>] do_IRQ+0xe7/0xf5
 [<ffffffff80396a89>] evtchn_do_upcall+0x86/0xe0
 [<ffffffff8025d8ce>] do_hypervisor_callback+0x1e/0x2c
 <EOI>  [<ffffffff802619bd>] .text.lock.spinlock+0x2/0x30
 [<ffffffff8044936f>] inet6_hash_connect+0xcb/0x2ea
 [<ffffffff88115fb6>] :ipv6:tcp_v6_connect+0x530/0x6f6
 [<ffffffff802335d9>] lock_sock+0xa7/0xb2
 [<ffffffff80258914>] inet_stream_connect+0x94/0x236
 [<ffffffff8020ab49>] kmem_cache_alloc+0x62/0x6d
 [<ffffffff8020ab49>] kmem_cache_alloc+0x62/0x6d
 [<ffffffff803f6405>] sys_connect+0x7e/0xae
 [<ffffffff802a84a6>] audit_syscall_entry+0x14d/0x180
 [<ffffffff8025d2f1>] tracesys+0xa7/0xb2

BUG: soft lockup detected on CPU#0!

Call Trace:
 <IRQ>  [<ffffffff802aae32>] softlockup_tick+0xd5/0xe7
 [<ffffffff8026cb4a>] timer_interrupt+0x396/0x3f2
 [<ffffffff80210afe>] handle_IRQ_event+0x2d/0x60
 [<ffffffff802ab1ba>] __do_IRQ+0xa4/0x105
 [<ffffffff8026a90e>] do_IRQ+0xe7/0xf5
 [<ffffffff80396a89>] evtchn_do_upcall+0x86/0xe0
 [<ffffffff8025d8ce>] do_hypervisor_callback+0x1e/0x2c
 <EOI>  [<ffffffff802619bd>] .text.lock.spinlock+0x2/0x30
 [<ffffffff8041e10e>] inet_hash_connect+0xc8/0x41c
 [<ffffffff80427780>] tcp_v4_connect+0x372/0x69f
 [<ffffffff80230882>] sock_recvmsg+0x101/0x120
 [<ffffffff88115c4a>] :ipv6:tcp_v6_connect+0x1c4/0x6f6
 [<ffffffff80219c31>] vsnprintf+0x559/0x59e
 [<ffffffff802335d9>] lock_sock+0xa7/0xb2
 [<ffffffff8025b5fe>] cache_alloc_refill+0x13c/0x4ba
 [<ffffffff80258914>] inet_stream_connect+0x94/0x236
 [<ffffffff8020ab49>] kmem_cache_alloc+0x62/0x6d
 [<ffffffff803f6405>] sys_connect+0x7e/0xae
 [<ffffffff802a84a6>] audit_syscall_entry+0x14d/0x180
 [<ffffffff8025d2f1>] tracesys+0xa7/0xb2

I've been googling around for answers, but the Red Hat bug most frequently
linked seems to relate to live migration, which I've not done.  This guest
was installed directly using virt-install.

Right now I cannot get into my guest -- it's not responding on the network or
the console.  A 'virsh shutdown' looks like it worked, but the console remains
unresponsive.  It looks like it's just spinning, based on the Time value in
'xm list':

# xm list
Name                              ID   Mem VCPUs      State   Time(s)
Domain-0                           0 12051     8     r-----   1497.0
zimbra                             5  4096     2     r-----  67931.2

The last time this happened I had to reboot the whole server, which seems
drastic.  Is there a better way to regain control over the guest?

I also need to figure out how to fix the soft lockups.  I'm running the latest
available mainline CentOS kernel via yum update.  My research so far seems to
indicate that this occurs when interrupt handling takes too long to complete.
Maybe I need to pin the guest to particular CPUs, instead of letting dom0
dynamically assign them?  Any advice in this area would be appreciated.

Thanks,
Eric
On Thu, May 29, 2008 at 10:41:45AM -0400, Eric Sproul wrote:

> After recently installing a CentOS 5 domU (PV) on an snv_89 dom0, the guest
> seems rather unstable.  Yesterday I got two core dumps in /var/xen/dump, and
> noticed in the guest's console a couple of soft lockups:

Looks like a Linux bug.

> # xm list
> Name                              ID   Mem VCPUs      State   Time(s)
> Domain-0                           0 12051     8     r-----   1497.0
> zimbra                             5  4096     2     r-----  67931.2
>
> The last time this happened I had to reboot the whole server, which seems
> drastic.  Is there a better way to regain control over the guest?

xm destroy zimbra

regards
john
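For anyone who hits the same hang: 'xm destroy' forcibly terminates the domain
without attempting a clean shutdown, after which the guest can simply be
started again.  A rough sequence (the domain name comes from this thread; the
config file path below is only a placeholder and depends on how the guest was
created):

# xm destroy zimbra
# virsh start zimbra

or, if the guest is not managed through libvirt:

# xm create /path/to/zimbra.cfg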
These calls are almost all Linux kernel calls.  Bug(s) in their kernel PV
drivers perhaps.

James

On May 29, 2008, at 9:41 AM, Eric Sproul wrote:

> Hi,
> After recently installing a CentOS 5 domU (PV) on an snv_89 dom0, the guest
> seems rather unstable.  Yesterday I got two core dumps in /var/xen/dump, and
> noticed in the guest's console a couple of soft lockups:
>
> BUG: soft lockup detected on CPU#1!
>
> Call Trace:
>  <IRQ>  [<ffffffff802aae32>] softlockup_tick+0xd5/0xe7
>  [<ffffffff8026cb4a>] timer_interrupt+0x396/0x3f2
>  [<ffffffff80210afe>] handle_IRQ_event+0x2d/0x60
>  [<ffffffff802ab1ba>] __do_IRQ+0xa4/0x105
>  [<ffffffff80288753>] _local_bh_enable+0x61/0xc5
>  [<ffffffff8026a90e>] do_IRQ+0xe7/0xf5
>  [<ffffffff80396a89>] evtchn_do_upcall+0x86/0xe0
>  [<ffffffff8025d8ce>] do_hypervisor_callback+0x1e/0x2c
>  <EOI>  [<ffffffff802619bd>] .text.lock.spinlock+0x2/0x30
>  [<ffffffff8044936f>] inet6_hash_connect+0xcb/0x2ea
>  [<ffffffff88115fb6>] :ipv6:tcp_v6_connect+0x530/0x6f6
>  [<ffffffff802335d9>] lock_sock+0xa7/0xb2
>  [<ffffffff80258914>] inet_stream_connect+0x94/0x236
>  [<ffffffff8020ab49>] kmem_cache_alloc+0x62/0x6d
>  [<ffffffff8020ab49>] kmem_cache_alloc+0x62/0x6d
>  [<ffffffff803f6405>] sys_connect+0x7e/0xae
>  [<ffffffff802a84a6>] audit_syscall_entry+0x14d/0x180
>  [<ffffffff8025d2f1>] tracesys+0xa7/0xb2
>
> BUG: soft lockup detected on CPU#0!
>
> Call Trace:
>  <IRQ>  [<ffffffff802aae32>] softlockup_tick+0xd5/0xe7
>  [<ffffffff8026cb4a>] timer_interrupt+0x396/0x3f2
>  [<ffffffff80210afe>] handle_IRQ_event+0x2d/0x60
>  [<ffffffff802ab1ba>] __do_IRQ+0xa4/0x105
>  [<ffffffff8026a90e>] do_IRQ+0xe7/0xf5
>  [<ffffffff80396a89>] evtchn_do_upcall+0x86/0xe0
>  [<ffffffff8025d8ce>] do_hypervisor_callback+0x1e/0x2c
>  <EOI>  [<ffffffff802619bd>] .text.lock.spinlock+0x2/0x30
>  [<ffffffff8041e10e>] inet_hash_connect+0xc8/0x41c
>  [<ffffffff80427780>] tcp_v4_connect+0x372/0x69f
>  [<ffffffff80230882>] sock_recvmsg+0x101/0x120
>  [<ffffffff88115c4a>] :ipv6:tcp_v6_connect+0x1c4/0x6f6
>  [<ffffffff80219c31>] vsnprintf+0x559/0x59e
>  [<ffffffff802335d9>] lock_sock+0xa7/0xb2
>  [<ffffffff8025b5fe>] cache_alloc_refill+0x13c/0x4ba
>  [<ffffffff80258914>] inet_stream_connect+0x94/0x236
>  [<ffffffff8020ab49>] kmem_cache_alloc+0x62/0x6d
>  [<ffffffff803f6405>] sys_connect+0x7e/0xae
>  [<ffffffff802a84a6>] audit_syscall_entry+0x14d/0x180
>  [<ffffffff8025d2f1>] tracesys+0xa7/0xb2
>
> I've been googling around for answers, but the Red Hat bug most frequently
> linked seems to relate to live migration, which I've not done.  This guest
> was installed directly using virt-install.
>
> Right now I cannot get into my guest -- it's not responding on the network
> or the console.  A 'virsh shutdown' looks like it worked, but the console
> remains unresponsive.  It looks like it's just spinning, based on the Time
> value in 'xm list':
>
> # xm list
> Name                              ID   Mem VCPUs      State   Time(s)
> Domain-0                           0 12051     8     r-----   1497.0
> zimbra                             5  4096     2     r-----  67931.2
>
> The last time this happened I had to reboot the whole server, which seems
> drastic.  Is there a better way to regain control over the guest?
>
> I also need to figure out how to fix the soft lockups.  I'm running the
> latest available mainline CentOS kernel via yum update.  My research so far
> seems to indicate that this occurs when interrupt handling takes too long to
> complete.  Maybe I need to pin the guest to particular CPUs, instead of
> letting dom0 dynamically assign them?  Any advice in this area would be
> appreciated.
>
> Thanks,
> Eric
Eric Sproul wrote:

> # xm list
> Name                              ID   Mem VCPUs      State   Time(s)
> Domain-0                           0 12051     8     r-----   1497.0
> zimbra                             5  4096     2     r-----  67931.2

Did you already try to use the zimbra domU with only one VCPU?  Does it lock
up when only one VCPU is used for the linux domU?
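A quick way to try that, as a sketch: 'xm vcpu-set' changes the number of
active VCPUs in a running PV guest, provided the guest kernel supports VCPU
hot-unplug:

# xm vcpu-set zimbra 1

For a persistent change, set vcpus = 1 in the guest's configuration and
restart the domain.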
Jürgen Keil wrote:

> Did you already try to use the zimbra domU with only one VCPU?  Does it
> lock up when only one VCPU is used for the linux domU?

I have not tried that yet.  I'm currently reconfiguring dom0 to use only 2GB
(my guests are on zvols) and 2 pinned vcpus, as recommended in another thread.
I'm also pinning the "zimbra" guest's 2 vcpus, keeping them separate from
dom0.  I need more than 1 vcpu in this guest, so even if it eliminates the
problem, it won't be a good solution for my needs.  Hopefully the pinning and
the dom0 limits will take care of it, otherwise I'll go bug the CentOS
people. :)

Thanks,
Eric
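For reference, runtime pinning with xm would look something like the lines
below; the physical CPU numbers are only an example layout for the 8-CPU box
shown in 'xm list' earlier.  Each line binds one virtual CPU of a domain to
one physical CPU, so dom0 and the guest stop competing for the same cores:

# xm vcpu-pin Domain-0 0 0
# xm vcpu-pin Domain-0 1 1
# xm vcpu-pin zimbra 0 2
# xm vcpu-pin zimbra 1 3

As the next message notes, pinning applied this way does not survive a reboot.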
Eric Sproul wrote:

> I have not tried that yet.  I'm currently reconfiguring dom0 to use only
> 2GB (my guests are on zvols) and 2 pinned vcpus, as recommended in another
> thread.  I'm also pinning the "zimbra" guest's 2 vcpus, keeping them
> separate from dom0.

Had some interesting results from trying to pin vcpus for dom0.  I initially
used 'xm vcpu-pin' for Domain-0, which appeared to work, but did not persist
across a reboot.  Then I found the Xen boot options, so now I do:

kernel$ /boot/$ISADIR/xen.gz dom0_mem=4G dom0_max_vcpus=2 dom0_vcpus_pin=true

So far, so good.  I haven't (yet) seen any more soft lockups in the guest, but
time will tell. :)

Eric
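The boot options above only make the dom0 side persistent.  To keep the
guest's pinning across restarts as well, one option (a sketch, not something
confirmed in this thread) is to express it in the domU configuration:

vcpus = 2
cpus = "2-3"

With dom0_max_vcpus=2 and dom0_vcpus_pin=true, dom0's two VCPUs sit on
physical CPUs 0 and 1, so restricting the guest to CPUs 2-3 keeps it off the
cores dom0 is using.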