Hi,
After recently installing a CentOS 5 domU (PV) on an snv_89 dom0, the guest
seems rather unstable.  Yesterday I got two core dumps in /var/xen/dump, and
noticed in the guest's console a couple of soft lockups:

BUG: soft lockup detected on CPU#1!

Call Trace:
 <IRQ>  [<ffffffff802aae32>] softlockup_tick+0xd5/0xe7
 [<ffffffff8026cb4a>] timer_interrupt+0x396/0x3f2
 [<ffffffff80210afe>] handle_IRQ_event+0x2d/0x60
 [<ffffffff802ab1ba>] __do_IRQ+0xa4/0x105
 [<ffffffff80288753>] _local_bh_enable+0x61/0xc5
 [<ffffffff8026a90e>] do_IRQ+0xe7/0xf5
 [<ffffffff80396a89>] evtchn_do_upcall+0x86/0xe0
 [<ffffffff8025d8ce>] do_hypervisor_callback+0x1e/0x2c
 <EOI>  [<ffffffff802619bd>] .text.lock.spinlock+0x2/0x30
 [<ffffffff8044936f>] inet6_hash_connect+0xcb/0x2ea
 [<ffffffff88115fb6>] :ipv6:tcp_v6_connect+0x530/0x6f6
 [<ffffffff802335d9>] lock_sock+0xa7/0xb2
 [<ffffffff80258914>] inet_stream_connect+0x94/0x236
 [<ffffffff8020ab49>] kmem_cache_alloc+0x62/0x6d
 [<ffffffff8020ab49>] kmem_cache_alloc+0x62/0x6d
 [<ffffffff803f6405>] sys_connect+0x7e/0xae
 [<ffffffff802a84a6>] audit_syscall_entry+0x14d/0x180
 [<ffffffff8025d2f1>] tracesys+0xa7/0xb2

BUG: soft lockup detected on CPU#0!

Call Trace:
 <IRQ>  [<ffffffff802aae32>] softlockup_tick+0xd5/0xe7
 [<ffffffff8026cb4a>] timer_interrupt+0x396/0x3f2
 [<ffffffff80210afe>] handle_IRQ_event+0x2d/0x60
 [<ffffffff802ab1ba>] __do_IRQ+0xa4/0x105
 [<ffffffff8026a90e>] do_IRQ+0xe7/0xf5
 [<ffffffff80396a89>] evtchn_do_upcall+0x86/0xe0
 [<ffffffff8025d8ce>] do_hypervisor_callback+0x1e/0x2c
 <EOI>  [<ffffffff802619bd>] .text.lock.spinlock+0x2/0x30
 [<ffffffff8041e10e>] inet_hash_connect+0xc8/0x41c
 [<ffffffff80427780>] tcp_v4_connect+0x372/0x69f
 [<ffffffff80230882>] sock_recvmsg+0x101/0x120
 [<ffffffff88115c4a>] :ipv6:tcp_v6_connect+0x1c4/0x6f6
 [<ffffffff80219c31>] vsnprintf+0x559/0x59e
 [<ffffffff802335d9>] lock_sock+0xa7/0xb2
 [<ffffffff8025b5fe>] cache_alloc_refill+0x13c/0x4ba
 [<ffffffff80258914>] inet_stream_connect+0x94/0x236
 [<ffffffff8020ab49>] kmem_cache_alloc+0x62/0x6d
 [<ffffffff803f6405>] sys_connect+0x7e/0xae
 [<ffffffff802a84a6>] audit_syscall_entry+0x14d/0x180
 [<ffffffff8025d2f1>] tracesys+0xa7/0xb2

I've been googling around for answers, but the Red Hat bug most frequently
linked seems to relate to live migration, which I've not done.  This guest
was installed directly using virt-install.

Right now I cannot get into my guest -- it's not responding on the network or
the console.  A 'virsh shutdown' looks like it worked, but the console remains
unresponsive.  It looks like it's just spinning, based on the Time value in
'xm list':

# xm list
Name                              ID   Mem VCPUs      State   Time(s)
Domain-0                           0 12051     8     r-----   1497.0
zimbra                             5  4096     2     r-----  67931.2

The last time this happened I had to reboot the whole server, which seems
drastic.  Is there a better way to regain control over the guest?

I also need to figure out how to fix the soft lockups.  I'm running the latest
available mainline CentOS kernel via yum update.  My research so far seems to
indicate that this occurs when interrupt handling takes too long to complete.
Maybe I need to pin the guest to particular CPUs, instead of letting dom0
dynamically assign them?  Any advice in this area would be appreciated.

Thanks,
Eric
On Thu, May 29, 2008 at 10:41:45AM -0400, Eric Sproul wrote:

> After recently installing a CentOS 5 domU (PV) on an snv_89 dom0, the guest
> seems rather unstable.  Yesterday I got two core dumps in /var/xen/dump, and
> noticed in the guest's console a couple of soft lockups:

Looks like a Linux bug.

> # xm list
> Name                              ID   Mem VCPUs      State   Time(s)
> Domain-0                           0 12051     8     r-----   1497.0
> zimbra                             5  4096     2     r-----  67931.2
>
> The last time this happened I had to reboot the whole server, which seems
> drastic.  Is there a better way to regain control over the guest?

xm destroy zimbra

regards
john
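For anyone who hits the same hang: 'xm destroy' forcibly terminates the domain
without attempting a clean shutdown, after which the guest can simply be
started again.  A rough sequence (the domain name comes from this thread; the
config file path below is only a placeholder and depends on how the guest was
created):

# xm destroy zimbra
# virsh start zimbra

or, if the guest is not managed through libvirt:

# xm create /path/to/zimbra.cfg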
These calls are almost all Linux kernel calls.  Bug(s) in their kernel PV
drivers perhaps.

James

On May 29, 2008, at 9:41 AM, Eric Sproul wrote:

> Hi,
> After recently installing a CentOS 5 domU (PV) on an snv_89 dom0, the guest
> seems rather unstable.  Yesterday I got two core dumps in /var/xen/dump, and
> noticed in the guest's console a couple of soft lockups:
>
> BUG: soft lockup detected on CPU#1!
>
> Call Trace:
>  <IRQ>  [<ffffffff802aae32>] softlockup_tick+0xd5/0xe7
>  [<ffffffff8026cb4a>] timer_interrupt+0x396/0x3f2
>  [<ffffffff80210afe>] handle_IRQ_event+0x2d/0x60
>  [<ffffffff802ab1ba>] __do_IRQ+0xa4/0x105
>  [<ffffffff80288753>] _local_bh_enable+0x61/0xc5
>  [<ffffffff8026a90e>] do_IRQ+0xe7/0xf5
>  [<ffffffff80396a89>] evtchn_do_upcall+0x86/0xe0
>  [<ffffffff8025d8ce>] do_hypervisor_callback+0x1e/0x2c
>  <EOI>  [<ffffffff802619bd>] .text.lock.spinlock+0x2/0x30
>  [<ffffffff8044936f>] inet6_hash_connect+0xcb/0x2ea
>  [<ffffffff88115fb6>] :ipv6:tcp_v6_connect+0x530/0x6f6
>  [<ffffffff802335d9>] lock_sock+0xa7/0xb2
>  [<ffffffff80258914>] inet_stream_connect+0x94/0x236
>  [<ffffffff8020ab49>] kmem_cache_alloc+0x62/0x6d
>  [<ffffffff8020ab49>] kmem_cache_alloc+0x62/0x6d
>  [<ffffffff803f6405>] sys_connect+0x7e/0xae
>  [<ffffffff802a84a6>] audit_syscall_entry+0x14d/0x180
>  [<ffffffff8025d2f1>] tracesys+0xa7/0xb2
>
> BUG: soft lockup detected on CPU#0!
>
> Call Trace:
>  <IRQ>  [<ffffffff802aae32>] softlockup_tick+0xd5/0xe7
>  [<ffffffff8026cb4a>] timer_interrupt+0x396/0x3f2
>  [<ffffffff80210afe>] handle_IRQ_event+0x2d/0x60
>  [<ffffffff802ab1ba>] __do_IRQ+0xa4/0x105
>  [<ffffffff8026a90e>] do_IRQ+0xe7/0xf5
>  [<ffffffff80396a89>] evtchn_do_upcall+0x86/0xe0
>  [<ffffffff8025d8ce>] do_hypervisor_callback+0x1e/0x2c
>  <EOI>  [<ffffffff802619bd>] .text.lock.spinlock+0x2/0x30
>  [<ffffffff8041e10e>] inet_hash_connect+0xc8/0x41c
>  [<ffffffff80427780>] tcp_v4_connect+0x372/0x69f
>  [<ffffffff80230882>] sock_recvmsg+0x101/0x120
>  [<ffffffff88115c4a>] :ipv6:tcp_v6_connect+0x1c4/0x6f6
>  [<ffffffff80219c31>] vsnprintf+0x559/0x59e
>  [<ffffffff802335d9>] lock_sock+0xa7/0xb2
>  [<ffffffff8025b5fe>] cache_alloc_refill+0x13c/0x4ba
>  [<ffffffff80258914>] inet_stream_connect+0x94/0x236
>  [<ffffffff8020ab49>] kmem_cache_alloc+0x62/0x6d
>  [<ffffffff803f6405>] sys_connect+0x7e/0xae
>  [<ffffffff802a84a6>] audit_syscall_entry+0x14d/0x180
>  [<ffffffff8025d2f1>] tracesys+0xa7/0xb2
>
> I've been googling around for answers, but the Red Hat bug most frequently
> linked seems to relate to live migration, which I've not done.  This guest
> was installed directly using virt-install.
>
> Right now I cannot get into my guest -- it's not responding on the network
> or the console.  A 'virsh shutdown' looks like it worked, but the console
> remains unresponsive.  It looks like it's just spinning, based on the Time
> value in 'xm list':
>
> # xm list
> Name                              ID   Mem VCPUs      State   Time(s)
> Domain-0                           0 12051     8     r-----   1497.0
> zimbra                             5  4096     2     r-----  67931.2
>
> The last time this happened I had to reboot the whole server, which seems
> drastic.  Is there a better way to regain control over the guest?
>
> I also need to figure out how to fix the soft lockups.  I'm running the
> latest available mainline CentOS kernel via yum update.  My research so far
> seems to indicate that this occurs when interrupt handling takes too long to
> complete.  Maybe I need to pin the guest to particular CPUs, instead of
> letting dom0 dynamically assign them?  Any advice in this area would be
> appreciated.
>
> Thanks,
> Eric
Eric Sproul wrote:

> # xm list
> Name                              ID   Mem VCPUs      State   Time(s)
> Domain-0                           0 12051     8     r-----   1497.0
> zimbra                             5  4096     2     r-----  67931.2

Did you already try to use the zimbra domU with only one VCPU?  Does it lock
up when only one VCPU is used for the linux domU?
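A quick way to try that, as a sketch: 'xm vcpu-set' changes the number of
active VCPUs in a running PV guest, provided the guest kernel supports VCPU
hot-unplug:

# xm vcpu-set zimbra 1

For a persistent change, set vcpus = 1 in the guest's configuration and
restart the domain.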
Jürgen Keil wrote:

> Did you already try to use the zimbra domU with only one VCPU?  Does it
> lock up when only one VCPU is used for the linux domU?

I have not tried that yet.  I'm currently reconfiguring dom0 to use only 2GB
(my guests are on zvols) and 2 pinned vcpus, as recommended in another thread.
I'm also pinning the "zimbra" guest's 2 vcpus, keeping them separate from
dom0.  I need more than 1 vcpu in this guest, so even if it eliminates the
problem, it won't be a good solution for my needs.  Hopefully the pinning and
the dom0 limits will take care of it, otherwise I'll go bug the CentOS
people. :)

Thanks,
Eric
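For reference, runtime pinning with xm would look something like the lines
below; the physical CPU numbers are only an example layout for the 8-CPU box
shown in 'xm list' earlier.  Each line binds one virtual CPU of a domain to
one physical CPU, so dom0 and the guest stop competing for the same cores:

# xm vcpu-pin Domain-0 0 0
# xm vcpu-pin Domain-0 1 1
# xm vcpu-pin zimbra 0 2
# xm vcpu-pin zimbra 1 3

As the next message notes, pinning applied this way does not survive a reboot.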
Eric Sproul wrote:

> I have not tried that yet.  I'm currently reconfiguring dom0 to use only
> 2GB (my guests are on zvols) and 2 pinned vcpus, as recommended in another
> thread.  I'm also pinning the "zimbra" guest's 2 vcpus, keeping them
> separate from dom0.

Had some interesting results from trying to pin vcpus for dom0.  I initially
used 'xm vcpu-pin' for Domain-0, which appeared to work, but did not persist
across a reboot.  Then I found the Xen boot options, so now I do:

kernel$ /boot/$ISADIR/xen.gz dom0_mem=4G dom0_max_vcpus=2 dom0_vcpus_pin=true

So far, so good.  I haven't (yet) seen any more soft lockups in the guest, but
time will tell. :)

Eric
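The boot options above only make the dom0 side persistent.  To keep the
guest's pinning across restarts as well, one option (a sketch, not something
confirmed in this thread) is to express it in the domU configuration:

vcpus = 2
cpus = "2-3"

With dom0_max_vcpus=2 and dom0_vcpus_pin=true, dom0's two VCPUs sit on
physical CPUs 0 and 1, so restricting the guest to CPUs 2-3 keeps it off the
cores dom0 is using.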