Graham, Simon
2007-Apr-21 21:45 UTC
[Xen-devel] 3.0.4: Soft lockup in netfront in SMP build
Just run into a (real) soft lockup running 3.0.4 - stack is at the end of this, but basically:

. network_open acquires the rx spin lock with spin_lock() and then checks for work on the queue and calls (I think) netif_rx_schedule with the lock still held, which can call into the hypervisor.
. An interrupt is delivered to the bottom half of netfront which ends up calling netif_poll, which blocks attempting to acquire the rx spin lock.

Oops!

I see from the unstable tree that this code was recently modified to use spin_lock_bh() instead of spin_lock() as part of a mega-merge of IA64 code - clearly we can't merge this changeset into 3.0.4.

I haven't looked too closely at all of the code yet, but I'm wondering if a judicious change of spin_lock to spin_lock_bh in netfront would be the best approach?

Simon

BUG: soft lockup detected on CPU#0!
Pid: 2973, comm: ifconfig
EIP: 0061:[<c035868a>] CPU: 0
EIP is at _spin_lock+0xa/0x20
EFLAGS: 00000286 Tainted: GF (2.6.16.38-xen #1)
EAX: c13b840c EBX: c13b8000 ECX: 00000040 EDX: cf783dc0
ESI: 00000000 EDI: c1233274 EBP: cf783d28
DS: 007b ES: 007b
CR0: 8005003b CR2: b7eea198 CR3: 103f2000 CR4: 00000660
 [<c010595d>] show_trace+0xd/0x10
 [<c010344f>] show_regs+0x18f/0x1c0
 [<c01491f4>] softlockup_tick+0xe4/0x100
 [<c012c713>] do_timer+0x43/0xf0
 [<c010936a>] timer_interrupt+0x1fa/0x670
 [<c01494da>] handle_IRQ_event+0x6a/0xb0
 [<c01495a9>] __do_IRQ+0x89/0xf0
 [<c0106ee8>] do_IRQ+0x38/0x70
 [<c028dee0>] evtchn_do_upcall+0x90/0x110
 [<c01056a1>] hypervisor_callback+0x3d/0x48
 [<c02a5db2>] netif_poll+0x32/0x630
 [<c02dace8>] net_rx_action+0x148/0x230
 [<c01279d5>] __do_softirq+0x95/0x130
 [<c0127af5>] do_softirq+0x85/0xa0
 [<c0127bda>] irq_exit+0x3a/0x50
 [<c0106eed>] do_IRQ+0x3d/0x70
 [<c028dee0>] evtchn_do_upcall+0x90/0x110
 [<c01056a1>] hypervisor_callback+0x3d/0x48
 [<c02a461b>] network_open+0x13b/0x210
 [<c02d9564>] dev_open+0x74/0x90
 [<c02db482>] dev_change_flags+0x52/0x110
 [<c0320187>] devinet_ioctl+0x4f7/0x5a0
 [<c0322174>] inet_ioctl+0x94/0xc0
 [<c02cf84d>] sock_ioctl+0xed/0x250
 [<c01842d6>] do_ioctl+0x76/0x90
 [<c0184479>] vfs_ioctl+0x59/0x1d0
 [<c0184657>] sys_ioctl+0x67/0x80
 [<c01054cd>] syscall_call+0x7/0xb
crash>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
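[Editor's note: the deadlock described in the message above can be sketched as kernel-style C. This is a simplified illustration only, not the actual netfront source; the struct name `netfront_info`, the field `rx_lock`, and the surrounding details are assumptions reconstructed from the stack trace.]

```
/* Buggy pattern: network_open() takes the rx lock with plain
 * spin_lock(), leaving bottom halves (softirqs) enabled.  If an
 * interrupt lands on the same CPU while the lock is held, the
 * softirq path runs netif_poll(), which spins forever on the same
 * lock -> soft lockup, exactly as in the trace above. */
static int network_open(struct net_device *dev)
{
	struct netfront_info *np = netdev_priv(dev);

	spin_lock(&np->rx_lock);          /* BUG: BHs still enabled */
	if (/* responses pending on the rx ring */ 1)
		netif_rx_schedule(dev);   /* raises NET_RX softirq   */
	spin_unlock(&np->rx_lock);
	return 0;
}

/* The fix Simon suggests (and what the unstable-tree change does):
 * take the lock with spin_lock_bh(), which disables softirq
 * processing on this CPU, so netif_poll() cannot run and contend
 * for rx_lock until we release it. */
static int network_open_fixed(struct net_device *dev)
{
	struct netfront_info *np = netdev_priv(dev);

	spin_lock_bh(&np->rx_lock);       /* softirqs off locally */
	if (/* responses pending on the rx ring */ 1)
		netif_rx_schedule(dev);
	spin_unlock_bh(&np->rx_lock);
	return 0;
}
```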
Jeremy Fitzhardinge
2007-Apr-21 22:14 UTC
Re: [Xen-devel] 3.0.4: Soft lockup in netfront in SMP build
Graham, Simon wrote:
> Just run into a (real) soft lockup running 3.0.4 - stack is at the end of this, but basically:
>
> . network_open acquires the rx spin lock with spin_lock() and then checks for
> work on the queue and calls (I think) netif_rx_schedule with the lock still held which
> can call into the hypervisor.
> . An interrupt is delivered to the bottom half of netfront which ends up calling netif_poll
> which blocks attempting to acquire the rx spin lock.
>
> Oops!
>
> I see from the unstable tree that this code was recently modified to use spin_lock_bh() instead of spin_lock() as part of a mega-merge of IA64 code - clearly we can't merge this changeset into 3.0.4.
>
> I haven't looked too closely at all of the code yet, but I'm wondering if a judicious change of spin_lock to spin_lock_bh in netfront would be the best approach?

I found a few locking problems when I ran netfront with lockdep enabled. Fixes were committed to xen-unstable in 14844:abea8d171503 and 14851:22460cfaca71. I was wondering if there had been any real cases of these deadlocking.

    J
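[Editor's note: the lockdep validator Jeremy mentions is enabled through standard kernel configuration options, sketched below. This assumes a kernel recent enough to include lockdep (mainline 2.6.18 and later); the exact option set may differ in the Xen-patched trees of this era.]

```
# Kernel .config fragment to enable lock dependency checking
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
```

With these set, taking a lock in process context with plain spin_lock() while the same lock is also taken from softirq context is reported at runtime as a possible deadlock, without needing the race to actually fire.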
Keir Fraser
2007-Apr-22 09:55 UTC
Re: [Xen-devel] 3.0.4: Soft lockup in netfront in SMP build
On 21/4/07 22:45, "Graham, Simon" <Simon.Graham@stratus.com> wrote:
> Oops!
>
> I see from the unstable tree that this code was recently modified to use
> spin_lock_bh() instead of spin_lock() as part of a mega-merge of IA64 code -
> clearly we can't merge this changeset into 3.0.4.
>
> I haven't looked too closely at all of the code yet, but I'm wondering if a
> judicious change of spin_lock to spin_lock_bh in netfront would be the best
> approach?

Actually the fixes are localised in changesets 14844 and 14851. These should be easily portable to the kernel(s) of your choice. Thanks are due again to Jeremy for finding these nasty lurking deadlocks (albeit with automated assistance :-).

 -- Keir