Christian Borntraeger
2008-Feb-05 05:36 UTC
[PATCH] virtio_net: Fix open <-> interrupt race
I got the following oops during interface ifup. Unfortunately its not easily reproducable so I cant say for sure that my fix fixes this problem, but I am confident and I think its correct anyway: <2>kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:234! <4>illegal operation: 0001 [#1] PREEMPT SMP <4>Modules linked in: <4>CPU: 0 Not tainted 2.6.24zlive-guest-07293-gf1ca151-dirty #91 <4>Process swapper (pid: 0, task: 0000000000800938, ksp: 000000000084ddb8) <4>Krnl PSW : 0404300180000000 0000000000466374 (vring_disable_cb+0x30/0x34) <4> R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3 <4>Krnl GPRS: 0000000000000001 0000000000000001 0000000010003800 0000000000466344 <4> 000000000e980900 00000000008848b0 000000000084e748 0000000000000000 <4> 000000000087b300 0000000000001237 0000000000001237 000000000f85bdd8 <4> 000000000e980920 00000000001137c0 0000000000464754 000000000f85bdd8 <4>Krnl Code: 0000000000466368: e3b0b0700004 lg %r11,112(%r11) <4> 000000000046636e: 07fe bcr 15,%r14 <4> 0000000000466370: a7f40001 brc 15,466372 <4> >0000000000466374: a7f4fff6 brc 15,466360 <4> 0000000000466378: eb7ff0500024 stmg %r7,%r15,80(%r15) <4> 000000000046637e: a7f13e00 tmll %r15,15872 <4> 0000000000466382: b90400ef lgr %r14,%r15 <4> 0000000000466386: a7840001 brc 8,466388 <4>Call Trace: <4>([<000201500f85c000>] 0x201500f85c000) <4> [<0000000000466556>] vring_interrupt+0x72/0x88 <4> [<00000000004801a0>] kvm_extint_handler+0x34/0x44 <4> [<000000000010d22c>] do_extint+0xbc/0xf8 <4> [<0000000000113f98>] ext_no_vtime+0x16/0x1a <4> [<000000000010a182>] cpu_idle+0x216/0x238 <4>([<000000000010a162>] cpu_idle+0x1f6/0x238) <4> [<0000000000568656>] rest_init+0xaa/0xb8 <4> [<000000000084ee2c>] start_kernel+0x3fc/0x490 <4> [<0000000000100020>] _stext+0x20/0x80 <4> <4> <0>Kernel panic - not syncing: Fatal exception in interrupt <4> After looking at the code and the dump I think the following scenario happened: Ifup was running on cpu2 and the interrupt arrived on cpu0. Now virtnet_open on cpu 2 managed to execute napi_enable and disable_cb but did not execute rx_schedule. Meanwhile on cpu 0 skb_recv_done was called by vring_interrupt, executed netif_rx_schedule_prep, which succeeded and therefore called disable_cb. This triggered the BUG_ON, as interrupts were already disabled by cpu 2. I think the proper solution is to make the call to disable_cb depend on the atomic update of NAPI_STATE_SCHED by using netif_rx_schedule_prep in the same way as skb_recv_done. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> --- drivers/net/virtio_net.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) Index: kvm/drivers/net/virtio_net.c ==================================================================--- kvm.orig/drivers/net/virtio_net.c +++ kvm/drivers/net/virtio_net.c @@ -321,10 +321,12 @@ static int virtnet_open(struct net_devic /* If all buffers were filled by other side before we napi_enabled, we * won't get another interrupt, so process any outstanding packets - * now. virtnet_poll wants re-enable the queue, so we disable here. */ - vi->rvq->vq_ops->disable_cb(vi->rvq); - netif_rx_schedule(vi->dev, &vi->napi); - + * now. virtnet_poll wants re-enable the queue, so we disable here. + * We synchronize against interrupts via NAPI_STATE_SCHED */ + if (netif_rx_schedule_prep(dev, &vi->napi)) { + vi->rvq->vq_ops->disable_cb(vi->rvq); + __netif_rx_schedule(dev, &vi->napi); + } return 0; } -- IBM Deutschland Entwicklung GmbH Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Herbert Kircher Sitz der Gesellschaft: B?blingen Registergericht: Amtsgericht Stuttgart, HRB 243294
On Wednesday 06 February 2008 00:36:10 Christian Borntraeger wrote:> I got the following oops during interface ifup. Unfortunately its not > easily reproducable so I cant say for sure that my fix fixes this > problem, but I am confident and I think its correct anyway:Looks fine, and certainly shouldn't hurt. Please push this out. Acked-by: Rusty Russell <rusty@rustcorp.com.au> Thanks, Rusty.
Possibly Parallel Threads
- [PATCH] virtio_net: Fix open <-> interrupt race
- virtio_net: another race with virtio_net and enable_cb
- virtio_net: another race with virtio_net and enable_cb
- [PATCH] virtio_net/virtio_ring: fix race in enable_cb
- [PATCH] virtio_net/virtio_ring: fix race in enable_cb