On 01/25/2012 08:49 PM, Roderick Colenbrander wrote:> Hi Daniel and Konrad,
>
> In an email thread at the end of last year (''Questions about GPLPV
> stability tests''), I mentioned a certain bug in the gntdev driver.
I
> think I found the bug and would like your opinion. Since Daniel did a
> lot of work on gntdev recently, I think he is most familiar with the
> code.
>
> The systems on which I see this bug run Xen 4.1.2 and Linux 2.6.32.48
> (pvops), but the same bug should be in Linux 3.x since the code is
> similar.
>
> Let me summarize the problem. At some point in time a DomU is
> shutdown. Apparently shutdown takes too long according to Xend,
> causing it to SIGKILL qemu-dm:
> [2012-01-16 14:54:34 3419] INFO (XendDomainInfo:2078) Domain has
> shutdown: name=win7-1 id=7581 reason=poweroff.
> [2012-01-16 14:54:34 3419] DEBUG (XendDomainInfo:3071)
> XendDomainInfo.destroy: domid=7581
> [2012-01-16 14:54:36 3419] DEBUG (XendDomainInfo:2401) Destroying device
model
> [2012-01-16 14:54:46 3419] WARNING (image:649) DeviceModel 31742 took
> more than 10s to terminate: sending SIGKILL
>
> This triggers the following Oops:
> [533804.785589] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000008
> [533804.793913] IP: [<ffffffff814bdba9>] _spin_lock+0x5/0x15
> [533804.799556] PGD 0
> [533804.801896] Oops: 0002 [#1] SMP
> [533804.805448] last sysfs file:
> /sys/devices/pci0000:00/0000:00:03.0/0000:03:00.0/class
> [533804.813736] CPU 0
> [533804.816066] Modules linked in: dm_snapshot dm_mod ipmi_si ipmi_devintf
> [533804.822914] Pid: 21632, comm: qemu-dm Not tainted 2.6.32.48 #1 X8STi
> [533804.829595] RIP: e030:[<ffffffff814bdba9>]
[<ffffffff814bdba9>]
> _spin_lock+0x5/0x15
> [533804.837873] RSP: e02b:ffff88005f187c50 EFLAGS: 00010202
> [533804.843513] RAX: 0000000000000100 RBX: ffff88007d6923c0 RCX:
> 0000000000000003
> [533804.851192] RDX: ffff88007deb3180 RSI: ffff88007d6923c0 RDI:
> 0000000000000008
> [533804.858871] RBP: ffff88007e260128 R08: 0000000000000000 R09:
> 0000000000000000
> [533804.866552] R10: ffff88007121eb40 R11: ffffffff811b862b R12:
> ffff88007fac1e40
> [533804.874541] R13: ffff88007e3c3340 R14: ffff88007e3c3340 R15:
> ffff88007fdfbc00
> [533804.882243] FS: 00007f5cdcefe6f0(0000) GS:ffff880028038000(0000)
> knlGS:0000000000000000
> [533804.890874] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [533804.896948] CR2: 0000000000000008 CR3: 0000000001001000 CR4:
> 0000000000002660
> [533804.904627] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [533804.912306] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [533804.919985] Process qemu-dm (pid: 21632, threadinfo
> ffff88005f186000, task ffff880074f4e270)
> [533804.928971] Stack:
> [533804.931312] ffffffff810a9ad0 ffff88007deb3180 ffff88007e260100
> ffff88007e260100
> [533804.938762] <0> ffffffff8124222d 0000000000000008
0000000000000008
> ffff88007deb3180
> [533804.946900] <0> ffffffff810b245e ffff88007fac1e40
ffff88007deb3180
> 0000000000000000
> [533804.955465] Call Trace:
> [533804.958244] [<ffffffff810a9ad0>] ?
mmu_notifier_unregister+0x27/0xa5
> [533804.965019] [<ffffffff8124222d>] ? gntdev_release+0xc3/0xd1
> [533804.971007] [<ffffffff810b245e>] ? __fput+0x100/0x1af
> [533804.976477] [<ffffffff810af998>] ? filp_close+0x5b/0x62
> [533804.982119] [<ffffffff81042989>] ? put_files_struct+0x64/0xc1
> [533804.988280] [<ffffffff810441f0>] ? do_exit+0x209/0x68d
> [533804.993836] [<ffffffff810441f8>] ? do_exit+0x211/0x68d
> [533804.999390] [<ffffffff810446e9>] ? do_group_exit+0x75/0x9c
> [533805.005294] [<ffffffff8104cf1d>] ?
get_signal_to_deliver+0x30d/0x338
> [533805.012063] [<ffffffff81010f7a>] ? do_notify_resume+0x8a/0x6b1
> [533805.018310] [<ffffffff810bdb3a>] ?
poll_select_copy_remaining+0xd0/0xf3
> [533805.025339] [<ffffffff81011c8e>] ? int_signal+0x12/0x17
> [533805.030980] Code: 00 00 00 01 74 05 e8 67 1c d2 ff 48 89 d0 5e c3
> ff 14 25 20 2d 6a 81 f0 81 2f 00 00 00 01 74 05 e8 4d 1c d2 ff c3 b8
> 00 01 00 00 <f0> 66 0f c1 07 38 e0 74 06 f3 90 8a 07 eb f6 c3 f0 81
2f
> 00 00
> [533805.050454] RIP [<ffffffff814bdba9>] _spin_lock+0x5/0x15
> [533805.056182] RSP <ffff88005f187c50>
> [533805.059997] CR2: 0000000000000008
> [533805.063638] ---[ end trace 85ee7cbec9ce72ac ]---
> [533805.068584] Fixing recursive fault but reboot is needed!
>
>
> Below I attached the in my opinion relevant part of the gntdev driver:
> static int gntdev_open(struct inode *inode, struct file *flip)
> {
> ...
> ...
> priv->mm = get_task_mm(current);
> if (!priv->mm) {
> kfree(priv);
> return -ENOMEM;
> }
> priv->mn.ops = &gntdev_mmu_ops;
> mmu_notifier_register(&priv->mn, priv->mm);
> mmput(priv->mm);
>
> flip->private_data = priv;
> ...
> }
>
> static int gntdev_release(struct inode *inode, struct file *flip)
> {
> struct gntdev_priv *priv = flip->private_data;
> ...
> mmu_notifier_unregister(&priv->mn, priv->mm);
> kfree(priv);
> return 0;
> }
>
> I think the bug is due to gntdev not handling reference counting
> (mm->mm_users) in combination with a race condition. The function
> gntdev_open calls get_task_mm (which increases mm_users) but straight
> after storing ''mm'' in ''priv'' it
performs a mmput which again decreases
> the reference count. Since mmu_notifier_register also increases the
> reference count, the count is definitely positive after gntdev_open. I
> suspect that due to a race condition (after 10 seconds, xend performed
> a SIGKILL) the mm structure already got freed somehow. This causes a
> bug in mmu_notifier_unregister.
>
> If this is correct then the the ''mmput'' line should be
moved to
> gntdev_release and placed after mmu_notifier_unregister. What do you
> think? If this is correct, I will send a patch for this.
>
> Regards,
> Roderick Colenbrander
If this is correct, I think it''s a bug in mmu_notifier_register - it
already increments mm->mm_count and there is a note in the function
description suggesting that this suffices to run mmu_notifier_unregister
later.
I''m not very familiar with the MMU notifier (my changes were made for
the
case where it was not used) so it might be assuming the extra reference
you are proposing - especially if that change fixes the bug.
--
Daniel De Graaf
National Security Agency