Graham, Simon
2007-Apr-21 21:45 UTC
[Xen-devel] 3.0.4: Soft lockup in netfront in SMP build
Just run into a (real) soft lockup running 3.0.4 - stack is at the end of this, but basically:

. network_open acquires the rx spin lock with spin_lock() and then checks for work on the queue and calls (I think) netif_rx_schedule with the lock still held, which can call into the hypervisor.
. An interrupt is delivered to the bottom half of netfront which ends up calling netif_poll, which blocks attempting to acquire the rx spin lock.

Oops!

I see from the unstable tree that this code was recently modified to use spin_lock_bh() instead of spin_lock() as part of a mega-merge of IA64 code - clearly we can't merge this changeset into 3.0.4.

I haven't looked too closely at all of the code yet, but I'm wondering if a judicious change of spin_lock to spin_lock_bh in netfront would be the best approach?

Simon

BUG: soft lockup detected on CPU#0!
Pid: 2973, comm: ifconfig
EIP: 0061:[<c035868a>] CPU: 0
EIP is at _spin_lock+0xa/0x20
EFLAGS: 00000286 Tainted: GF (2.6.16.38-xen #1)
EAX: c13b840c EBX: c13b8000 ECX: 00000040 EDX: cf783dc0
ESI: 00000000 EDI: c1233274 EBP: cf783d28
DS: 007b ES: 007b
CR0: 8005003b CR2: b7eea198 CR3: 103f2000 CR4: 00000660
 [<c010595d>] show_trace+0xd/0x10
 [<c010344f>] show_regs+0x18f/0x1c0
 [<c01491f4>] softlockup_tick+0xe4/0x100
 [<c012c713>] do_timer+0x43/0xf0
 [<c010936a>] timer_interrupt+0x1fa/0x670
 [<c01494da>] handle_IRQ_event+0x6a/0xb0
 [<c01495a9>] __do_IRQ+0x89/0xf0
 [<c0106ee8>] do_IRQ+0x38/0x70
 [<c028dee0>] evtchn_do_upcall+0x90/0x110
 [<c01056a1>] hypervisor_callback+0x3d/0x48
 [<c02a5db2>] netif_poll+0x32/0x630
 [<c02dace8>] net_rx_action+0x148/0x230
 [<c01279d5>] __do_softirq+0x95/0x130
 [<c0127af5>] do_softirq+0x85/0xa0
 [<c0127bda>] irq_exit+0x3a/0x50
 [<c0106eed>] do_IRQ+0x3d/0x70
 [<c028dee0>] evtchn_do_upcall+0x90/0x110
 [<c01056a1>] hypervisor_callback+0x3d/0x48
 [<c02a461b>] network_open+0x13b/0x210
 [<c02d9564>] dev_open+0x74/0x90
 [<c02db482>] dev_change_flags+0x52/0x110
 [<c0320187>] devinet_ioctl+0x4f7/0x5a0
 [<c0322174>] inet_ioctl+0x94/0xc0
 [<c02cf84d>] sock_ioctl+0xed/0x250
 [<c01842d6>] do_ioctl+0x76/0x90
 [<c0184479>] vfs_ioctl+0x59/0x1d0
 [<c0184657>] sys_ioctl+0x67/0x80
 [<c01054cd>] syscall_call+0x7/0xb
crash>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
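[Editor's note: the deadlock described in the message above can be sketched as kernel-style C. This is a simplified illustration only, not the actual netfront source; the struct name `netfront_info`, the field `rx_lock`, and the surrounding details are assumptions reconstructed from the stack trace.]

```
/* Buggy pattern: network_open() takes the rx lock with plain
 * spin_lock(), leaving bottom halves (softirqs) enabled.  If an
 * interrupt lands on the same CPU while the lock is held, the
 * softirq path runs netif_poll(), which spins forever on the same
 * lock -> soft lockup, exactly as in the trace above. */
static int network_open(struct net_device *dev)
{
	struct netfront_info *np = netdev_priv(dev);

	spin_lock(&np->rx_lock);          /* BUG: BHs still enabled */
	if (/* responses pending on the rx ring */ 1)
		netif_rx_schedule(dev);   /* raises NET_RX softirq   */
	spin_unlock(&np->rx_lock);
	return 0;
}

/* The fix Simon suggests (and what the unstable-tree change does):
 * take the lock with spin_lock_bh(), which disables softirq
 * processing on this CPU, so netif_poll() cannot run and contend
 * for rx_lock until we release it. */
static int network_open_fixed(struct net_device *dev)
{
	struct netfront_info *np = netdev_priv(dev);

	spin_lock_bh(&np->rx_lock);       /* softirqs off locally */
	if (/* responses pending on the rx ring */ 1)
		netif_rx_schedule(dev);
	spin_unlock_bh(&np->rx_lock);
	return 0;
}
```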
Jeremy Fitzhardinge
2007-Apr-21 22:14 UTC
Re: [Xen-devel] 3.0.4: Soft lockup in netfront in SMP build
Graham, Simon wrote:
> Just run into a (real) soft lockup running 3.0.4 - stack is at the end of this, but basically:
>
> . network_open acquires the rx spin lock with spin_lock() and then checks for
> work on the queue and calls (I think) netif_rx_schedule with the lock still held which
> can call into the hypervisor.
> . An interrupt is delivered to the bottom half of netfront which ends up calling netif_poll
> which blocks attempting to acquire the rx spin lock.
>
> Oops!
>
> I see from the unstable tree that this code was recently modified to use spin_lock_bh() instead of spin_lock() as part of a mega-merge of IA64 code - clearly we can't merge this changeset into 3.0.4.
>
> I haven't looked too closely at all of the code yet, but I'm wondering if a judicious change of spin_lock to spin_lock_bh in netfront would be the best approach?

I found a few locking problems when I ran netfront with lockdep enabled. Fixes were committed to xen-unstable in 14844:abea8d171503 and 14851:22460cfaca71. I was wondering if there had been any real cases of these deadlocking.

    J
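[Editor's note: the lockdep validator Jeremy mentions is enabled through standard kernel configuration options, sketched below. This assumes a kernel recent enough to include lockdep (mainline 2.6.18 and later); the exact option set may differ in the Xen-patched trees of this era.]

```
# Kernel .config fragment to enable lock dependency checking
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
```

With these set, taking a lock in process context with plain spin_lock() while the same lock is also taken from softirq context is reported at runtime as a possible deadlock, without needing the race to actually fire.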
Keir Fraser
2007-Apr-22 09:55 UTC
Re: [Xen-devel] 3.0.4: Soft lockup in netfront in SMP build
On 21/4/07 22:45, "Graham, Simon" <Simon.Graham@stratus.com> wrote:
> Oops!
>
> I see from the unstable tree that this code was recently modified to use
> spin_lock_bh() instead of spin_lock() as part of a mega-merge of IA64 code -
> clearly we can't merge this changeset into 3.0.4.
>
> I haven't looked too closely at all of the code yet, but I'm wondering if a
> judicious change of spin_lock to spin_lock_bh in netfront would be the best
> approach?

Actually the fixes are localised in changesets 14844 and 14851. These should be easily portable to the kernel(s) of your choice. Thanks are due again to Jeremy for finding these nasty lurking deadlocks (albeit with automated assistance :-).

 -- Keir