thr3ads.net - Xen devel - Crashing kernel with dom0/libxc gnttab/gntshr [Jul 2013]

Vincent Bernardoff

2013-Jul-30 10:50 UTC

Crashing kernel with dom0/libxc gnttab/gntshr

Hi,

The attached program makes my kernel (3.9.9-1-ARCH, stock Archlinux 
kernel) crash with the attached dmesg output.

The program just shares a page from dom0 to dom0, then maps the page, 
then unshares the page, and the unsharing makes the kernel crash. I ran 
into this issue while implementing a native OCaml vchan driver.

I'm very much interested in any advice/help.

Cheers,

Vincent

Attachment: libxc_gntshr_bug2.c

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <xenctrl.h>
#include <sys/mman.h>

int main(int argc, char** argv)
{
  void* map_shr;
  void* map_tab;
  uint32_t ref;
  int ret;

  xc_gntshr *shr_h = xc_gntshr_open(NULL, 0);
  if (shr_h == NULL)
    {
      perror("xc_gntshr_open");
      exit(EXIT_FAILURE);
    }

  xc_gnttab *tab_h = xc_gnttab_open(NULL, 0);
  if (tab_h == NULL)
    {
      perror("xc_gnttab_open");
      exit(EXIT_FAILURE);
    }

  map_shr = xc_gntshr_share_pages(shr_h, 0, 1, &ref, 1);
  if (map_shr == NULL)
    {
      perror("xc_gntshr_share_pages");
      exit(EXIT_FAILURE);
    }

  map_tab = xc_gnttab_map_grant_ref(tab_h, 0, ref, PROT_READ|PROT_WRITE);
  if (map_tab == NULL)
    {
      perror("xc_gnttab_map_grant_ref");
      exit(EXIT_FAILURE);
    }

  /* Now we unshare the page */
  ret = xc_gntshr_munmap(shr_h, map_shr, 1);
  if (ret != 0)
    {
      perror("xc_gntshr_munmap");
      exit(EXIT_FAILURE);
    }

  /* At this point, the kernel should complain… */

  return 0;
}

Attachment: dmesg.log

[  299.710029] FS:  00007fe69748f700(0000) GS:ffff88011ba40000(0000) knlGS:0000000000000000
[  299.710029] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[  299.710029] CR2: 00007fe696d78f30 CR3: 00000000c34fe000 CR4: 0000000000002660
[  299.710029] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  299.710029] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  299.876698] Process a.out (pid: 922, threadinfo ffff8800cc3c6000, task ffff8800c34829e0)
[  299.876698] Stack:
[  299.876698]  ffff8800cc2dc5b0 ffff8800cc3c7d88 ffff88000251bc60 ffff88000251b980
[  299.876698]  ffff88000251b960 ffff88000251b990 ffff8800c34829e0 ffff8800cc3c7dd8
[  299.876698]  ffffffffa03e847f ffff88000251b990 ffff880114d50a80 0000000000000000
[  299.876698] Call Trace:
[  299.876698]  [<ffffffffa03e847f>] ? mn_release+0x4f/0x130 [xen_gntdev]
[  299.876698]  [<ffffffff8116b0c4>] ? __mmu_notifier_release+0x44/0xc0
[  299.876698]  [<ffffffff81153d09>] ? exit_mmap+0x149/0x170
[  299.876698]  [<ffffffff814d2a8a>] ? _raw_spin_lock_irqsave+0x1a/0x50
[  299.876698]  [<ffffffff810b5c3a>] ? exit_robust_list+0x6a/0x130
[  299.876698]  [<ffffffff81055209>] ? mmput+0x59/0x120
[  299.876698]  [<ffffffff8105d97f>] ? do_exit+0x27f/0xab0
[  299.876698]  [<ffffffff81152b90>] ? do_munmap+0x2b0/0x3e0
[  299.876698]  [<ffffffff8105e22f>] ? do_group_exit+0x3f/0xa0
[  299.876698]  [<ffffffff8105e2a4>] ? sys_exit_group+0x14/0x20
[  299.876698]  [<ffffffff814da89d>] ? system_call_fastpath+0x1a/0x1f
[  299.876698] Code: 00 00 00 d8 02 3c cc 00 88 ff ff ff ff ff ff ff ff ff ff 60 7d 3c cc 00 88 ff ff 30 e0 00 00 00 00 00 00 82 02 01 00 00 00 00 00 <70> 7d 3c cc 00 88 ff ff 2b e0 00 00 00 00 00 00 b0 c5 2d cc 00
[  299.876698] RIP  [<ffff8800cc3c7d60>] 0xffff8800cc3c7d5f
[  299.876698]  RSP <ffff8800cc3c7d70>
[  299.964961] ---[ end trace 2cc41b9c64237359 ]---
[  299.964962] Fixing recursive fault but reboot is needed!
[  299.964963] BUG: scheduling while atomic: a.out/922/0x00000002
[  299.964985] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_analog snd_hda_intel snd_hda_codec iTCO_wdt gpio_ich iTCO_vendor_support ppdev evdev dcdbas radeon mperf psmouse tg3 coretemp microcode serio_raw pcspkr snd_hwdep snd_pcm ttm snd_page_alloc snd_timer drm_kms_helper i2c_i801 snd x38_edac edac_core ptp pps_core lpc_ich libphy drm i2c_algo_bit i2c_core soundcore parport_pc parport button processor xenfs xen_privcmd xen_pciback xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn nfs lockd sunrpc fscache ext4 crc16 mbcache jbd2 hid_generic usbhid hid sr_mod cdrom sd_mod ahci libahci libata scsi_mod ehci_pci uhci_hcd ehci_hcd usbcore usb_common
[  299.964987] Pid: 922, comm: a.out Tainted: G    B D      3.9.9-1-ARCH #1
[  299.964987] Call Trace:
[  299.964991]  [<ffffffff814cabcb>] __schedule_bug+0x4d/0x5b
[  299.964994]  [<ffffffff814d1ae6>] __schedule+0x936/0x940
[  299.964997]  [<ffffffff81059a29>] ? console_trylock+0x19/0x70
[  299.964999]  [<ffffffff814d2c86>] ? _raw_spin_unlock+0x36/0x40
[  299.965002]  [<ffffffff8105a3c6>] ? vprintk_emit+0x176/0x4c0
[  299.965004]  [<ffffffff814ca7ff>] ? printk+0x54/0x56
[  299.965007]  [<ffffffff814d1b19>] schedule+0x29/0x70
[  299.965009]  [<ffffffff8105e129>] do_exit+0xa29/0xab0
[  299.965012]  [<ffffffff8105b731>] ? kmsg_dump+0xc1/0xd0
[  299.965015]  [<ffffffff814d42c3>] oops_end+0xa3/0xe0
[  299.965019]  [<ffffffff81018deb>] die+0x4b/0x70
[  299.965021]  [<ffffffff814d3be0>] do_trap+0x60/0x170
[  299.965024]  [<ffffffff810163d5>] do_invalid_op+0x95/0xb0
[  299.965027]  [<ffffffff810085ec>] ? xen_batched_set_pte+0xdc/0x200
[  299.965030]  [<ffffffff814d2a8a>] ? _raw_spin_lock_irqsave+0x1a/0x50
[  299.965032]  [<ffffffff814d2ca2>] ? _raw_spin_unlock_irqrestore+0x12/0x50
[  299.965035]  [<ffffffff814dbb1e>] invalid_op+0x1e/0x30
[  299.965038]  [<ffffffffa03e847f>] ? mn_release+0x4f/0x130 [xen_gntdev]
[  299.965042]  [<ffffffff8116b0c4>] ? __mmu_notifier_release+0x44/0xc0
[  299.965045]  [<ffffffff81153d09>] ? exit_mmap+0x149/0x170
[  299.965047]  [<ffffffff814d2a8a>] ? _raw_spin_lock_irqsave+0x1a/0x50
[  299.965050]  [<ffffffff810b5c3a>] ? exit_robust_list+0x6a/0x130
[  299.965055]  [<ffffffff81055209>] ? mmput+0x59/0x120
[  299.965057]  [<ffffffff8105d97f>] ? do_exit+0x27f/0xab0
[  299.965060]  [<ffffffff81152b90>] ? do_munmap+0x2b0/0x3e0
[  299.965062]  [<ffffffff8105e22f>] ? do_group_exit+0x3f/0xa0
[  299.965065]  [<ffffffff8105e2a4>] ? sys_exit_group+0x14/0x20
[  299.965067]  [<ffffffff814da89d>] ? system_call_fastpath+0x1a/0x1f

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Ian Campbell

2013-Jul-30 10:59 UTC

Re: Crashing kernel with dom0/libxc gnttab/gntshr

On Tue, 2013-07-30 at 11:50 +0100, Vincent Bernardoff wrote:
> Hi,
> 
> The attached program makes my kernel (3.9.9-1-ARCH, stock Archlinux 
> kernel) crash with the attached dmesg output.
The dmesg output seems to start halfway through a crash message, which
means it is missing the PC etc and may not be the first crash in any
case.

Please could you configure a serial console and try and capture the
first crash message in its entirety. Bonus points if you can avoid
linewrapping the dmesg too ;-)
> The program just shares a page from dom0 to dom0,
Not just from dom0 to dom0 but actually within the same process. I'm not
sure that matters but it is a bit unusual. Are you able to repro this
with two separate processes acting as front vs. backend?

The reason I ask is that it isn't clear if the crash is the process with
its front or back "hat" on; separating the two out would be useful.
>  then map the page, 
> then unshare the page, and the unsharing makes the kernel crash. I ran 
> into this issue while implementing a native OCaml vchan driver.
> 
> I'm very much interested in advices/help.
> 
> Cheers,
> 
> Vincent

Vincent Bernardoff

2013-Jul-30 13:41 UTC

Re: Crashing kernel with dom0/libxc gnttab/gntshr

Vincent Bernardoff

2013-Jul-30 15:50 UTC

Re: Crashing kernel with dom0/libxc gnttab/gntshr

I also have a bug using tools/libvchan/vchan-node1:

When killing the server node (sudo ./vchan-node1 server read 0 
/local/domain/0/vchan) before the client node (sudo ./vchan-node1 client 
write 0 /local/domain/0/vchan), the following dmesg error appears.

I'm using Xen unstable (master branch) and the stock Archlinux 3.10.3-1-ARCH
kernel.

Use the following script (setup.sh) if you want to try reproducing it 
with vchan-node1; vchan-node1 needs some xenstore keys to be 
written in order to work correctly.




Ian Campbell

2013-Jul-30 15:55 UTC

Re: Crashing kernel with dom0/libxc gnttab/gntshr

Adding Daniel, who maintains vchan and, I think, the kernel side of the
driver in question too, to the CC.

On Tue, 2013-07-30 at 16:50 +0100, Vincent Bernardoff wrote:
> I also have a bug using tools/libvchan/vchan-node1:
> 
> When killing the server node (sudo ./vchan-node1 server read 0 
> /local/domain/0/vchan) before the client node (sudo ./vchan-node1 client 
> write 0 /local/domain/0/vchan), the following dmesg error appears.
> 
> I'm using Xen unstable (master branch) and stock Archlinux 3.10.3-1-ARCH
> kernel.
> 
> Use the following script (setup.sh) if you want to try reproducing it 
> with vchan-node1, vchan-node1 indeed needs some xenstore keys to be 
> written in order to work correctly.

David Vrabel

2013-Jul-30 16:58 UTC

Re: Crashing kernel with dom0/libxc gnttab/gntshr

On 30/07/13 16:50, Vincent Bernardoff wrote:
> I also have a bug using tools/libvchan/vchan-node1:
> 
> When killing the server node (sudo ./vchan-node1 server read 0
> /local/domain/0/vchan) before the client node (sudo ./vchan-node1 client
> write 0 /local/domain/0/vchan), the following dmesg error appears.
Does this only happen if both client and server are in the same domain?
Have you tested it using two domains? Did it work?

> I'm using Xen unstable (master branch) and stock Archlinux 3.10.3-1-ARCH
> kernel.
> 
> Use the following script (setup.sh) if you want to try reproducing it
> with vchan-node1, vchan-node1 indeed needs some xenstore keys to be
> written in order to work correctly.
[  902.729307] BUG: Bad page map in process vchan-node1  pte:12bfff167 pmd:b9b5c067
[  902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping:          (null) index:0xffffffffffffffff

I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
This has looked up the page using the PTE it is trying to clear.  Has
it found the correct page?  Since the MFN is currently mapped into the
same domain, has the m2p_override stuff confused the lookup, so that it is
checking the grantee's page rather than the granter's?

David
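
To illustrate the check David refers to, here is a minimal user-space model. It assumes the kernel convention that a page's internal `_mapcount` starts at -1 and that `page_mapcount()` reports `_mapcount + 1`; the `model_*` names are hypothetical and none of this is the kernel's actual code. It shows how tearing down a PTE that resolves to a page which was never accounted as mapped yields the `mapcount:-1` seen in the log.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for struct page; only the field this sketch needs. */
struct model_page {
    int _mapcount;   /* biased counter: -1 means "mapped by zero PTEs" */
};

/* Assumed convention: the reported count is the biased counter plus one. */
static int model_page_mapcount(const struct model_page *p)
{
    return p->_mapcount + 1;
}

/* Account for one PTE being installed for this page. */
static void model_map(struct model_page *p)
{
    p->_mapcount++;
}

/*
 * Model of clearing one PTE: drop the mapping count, then apply the
 * "mapcount < 0" sanity check. Returns 1 if the check fires, 0 otherwise.
 */
static int model_zap_one_pte(struct model_page *p)
{
    p->_mapcount--;
    if (model_page_mapcount(p) < 0) {
        printf("BUG: Bad page map (mapcount:%d)\n", model_page_mapcount(p));
        return 1;
    }
    return 0;
}
```

If the PTE lookup lands on the page that was actually mapped, `model_zap_one_pte()` returns 0; if it lands on a different page whose count was never raised, the check fires with a mapcount of -1.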

Daniel De Graaf

2013-Jul-30 21:03 UTC

Re: Crashing kernel with dom0/libxc gnttab/gntshr

On 07/30/2013 12:58 PM, David Vrabel wrote:
[...]
>
> [  902.729307] BUG: Bad page map in process vchan-node1  pte:12bfff167 pmd:b9b5c067
> [  902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping:          (null) index:0xffffffffffffffff
>
> I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
>   This has looked up the page using the PTE it is trying to clear.  Has
> it found the correct page?  Since the MFN is currently mapped into the
> same domain, has the m2p_override stuff confused the look up and it is
> checking the grantee page not the granter?
>
> David
I think something like this is happening, since while reproducing this
on my test system, some linked list corruption was found that I believe
to be the cause of this problem. The gnttab_map_refs function on PV uses
m2p_add_override on the page, which threads page->lru to an
m2p_overrides list. However, something else is using page->lru during
the use of gntdev, as shown by the following debug patch:

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 3c8803f..198e57e 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -294,6 +294,11 @@ static int map_grant_pages(struct grant_map *map)
  	if (err)
  		return err;
  
+	printk("map page0 lru: %p prev=%p:%p next=%p:%p\n",
+		&map->pages[0]->lru,
+		map->pages[0]->lru.prev, map->pages[0]->lru.prev->next,
+		map->pages[0]->lru.next, map->pages[0]->lru.next->prev);
+
  	for (i = 0; i < map->count; i++) {
  		if (map->map_ops[i].status)
  			err = -EINVAL;
@@ -320,6 +325,10 @@ static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
  		}
  	}
  
+	printk("unmap page0 lru: %p prev=%p:%p next=%p:%p\n",
+		&map->pages[0]->lru,
+		map->pages[0]->lru.prev, map->pages[0]->lru.prev->next,
+		map->pages[0]->lru.next, map->pages[0]->lru.next->prev);
  	err = gnttab_unmap_refs(map->unmap_ops + offset,
  			use_ptemod ? map->kmap_ops + offset : NULL, map->pages + offset,
			pages);

Output:
[   88.610644] map page0 lru: ffffea0001dee160 prev=ffffffff82f2d510:ffffea0001dee160 next=ffffffff82f2d510:ffffea0001dee160
[   88.611515] BUG: Bad page map in process a.out  pte:8000000077b85167 pmd:2541a067
[   88.611525] page:ffffea0001dee140 count:1 mapcount:-1 mapping:          (null) index:0xffffffffffffffff
[   88.611532] page flags: 0x1000000000000814(referenced|dirty|private)
[   88.611541] addr:00007f1adaef3000 vm_flags:140400fb anon_vma:          (null) mapping:ffff8800692974a0 index:0
[   88.611547] vma->vm_ops->fault:           (null)
[   88.611555] vma->vm_file->f_op->mmap: gntalloc_mmap+0x0/0x1d0
[...backtrace cropped...]
[   88.614301] unmap page0 lru: ffffea0001dee160 prev=ffff8800254c9d08:ffff88001ea0b120 next=ffff8800254c9d08:ffff88001ea0b938

The initial map is a linked list with only that element, so the address
0xffffffff82f2d510 is the m2p_overrides entry. This means the page being
found by zap_pte_range is not a valid struct page.

The struct page* being used by the gntalloc device was 0xffffea0000952740,
for reference; it's not a direct collision between the page used by the
gntdev and gntalloc devices.
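
The way to read the debug output above is that, for a node on an intact circular doubly linked list, `prev->next` and `next->prev` must both point back at the node itself. The following is a minimal user-space sketch of that invariant, not kernel code: `struct list_head` is re-declared locally, all `model_*` helpers are hypothetical, and the "second user" is simulated by simply overwriting the link fields, which is an assumption about the failure mode rather than a diagnosis.

```c
#include <assert.h>
#include <stdio.h>

/* Local re-declaration of a circular doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

static void model_list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Append n at the tail of the list rooted at head. */
static void model_list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* The invariant the debug printk lets you check by eye. */
static int model_links_consistent(const struct list_head *n)
{
    return n->prev->next == n && n->next->prev == n;
}

/* Prints the same fields, in the same order, as the debug patch. */
static void model_print_links(const char *tag, const struct list_head *n)
{
    printf("%s page0 lru: %p prev=%p:%p next=%p:%p\n", tag, (const void *)n,
           (void *)n->prev, (void *)n->prev->next,
           (void *)n->next, (void *)n->next->prev);
}

/*
 * Returns 1 if the node is consistent right after being threaded onto a
 * (hypothetical) overrides list and inconsistent after something else
 * stores unrelated pointers in the same field; returns 0 otherwise.
 */
static int model_demo_lru_reuse(void)
{
    struct list_head overrides, other, a, x, b, lru;
    int ok_at_map, ok_at_unmap;

    /* "map": lru is the only element on the overrides list. */
    model_list_init(&overrides);
    model_list_add_tail(&lru, &overrides);
    model_print_links("map", &lru);
    ok_at_map = model_links_consistent(&lru);

    /* An unrelated list that a second user of the field works with. */
    model_list_init(&other);
    model_list_add_tail(&a, &other);
    model_list_add_tail(&x, &other);
    model_list_add_tail(&b, &other);

    /* Second user overwrites the links without unlinking first. */
    lru.prev = &x;
    lru.next = &x;
    model_print_links("unmap", &lru);
    ok_at_unmap = model_links_consistent(&lru);

    return ok_at_map && !ok_at_unmap;
}
```

In the "map" line of the real output the pairs after `prev=` and `next=` both end in the node's own address, so the invariant holds; in the "unmap" line they do not, which is the pattern this sketch reproduces.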

Not sure what the best fix is for this at the moment.

-- 
Daniel De Graaf
National Security Agency

Stefano Stabellini

2013-Aug-02 13:50 UTC

Re: Crashing kernel with dom0/libxc gnttab/gntshr

On Tue, 30 Jul 2013, Daniel De Graaf wrote:
> On 07/30/2013 12:58 PM, David Vrabel wrote:
> [...]
> > 
> > [  902.729307] BUG: Bad page map in process vchan-node1  pte:12bfff167 pmd:b9b5c067
> > [  902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping:          (null) index:0xffffffffffffffff
> > 
> > I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
> >   This has looked up the page using the PTE it is trying to clear.  Has
> > it found the correct page?  Since the MFN is currently mapped into the
> > same domain, has the m2p_override stuff confused the look up and it is
> > checking the grantee page not the granter?
> > 
> > David
> 
> I think something like this is happening, since while reproducing this
> on my test system, some linked list corruption was found that I believe
> to be the cause of this problem. The gnttab_map_refs function on PV uses
> m2p_add_override on the page, which threads page->lru to an
> m2p_overrides list. However, something else is using page->lru during
> the use of gntdev, as shown by the following debug patch:

I have never managed to prove that something else is trying to use
page->lru while the m2p_override is using it.

Jeremy, at the time the code was written, you were pretty confident
that page->lru couldn't be used by anybody else.
Why was that?



Ian Campbell

2013-Aug-02 14:10 UTC

Re: Crashing kernel with dom0/libxc gnttab/gntshr

On Fri, 2013-08-02 at 14:50 +0100, Stefano Stabellini wrote:
> On Tue, 30 Jul 2013, Daniel De Graaf wrote:
> > On 07/30/2013 12:58 PM, David Vrabel wrote:
> > [...]
> > > 
> > > [  902.729307] BUG: Bad page map in process vchan-node1  pte:12bfff167 pmd:b9b5c067
> > > [  902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping:          (null) index:0xffffffffffffffff
> > > 
> > > I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
> > >   This has looked up the page using the PTE it is trying to clear.  Has
> > > it found the correct page?  Since the MFN is currently mapped into the
> > > same domain, has the m2p_override stuff confused the look up and it is
> > > checking the grantee page not the granter?
> > > 
> > > David
> > 
> > I think something like this is happening, since while reproducing this
> > on my test system, some linked list corruption was found that I believe
> > to be the cause of this problem. The gnttab_map_refs function on PV uses
> > m2p_add_override on the page, which threads page->lru to an
> > m2p_overrides list. However, something else is using page->lru during
> > the use of gntdev, as shown by the following debug patch:
> 
> I have never managed to prove that something else is trying to use
> page->lru while the m2p_override is using it.

Isn't it very much dependent on the actual original owner of the page?

A lot of these fields are free to use by the code which actually called
alloc_page, but for a facility like the m2p_override which can consume
pages from a variety of sources you'd need to be careful about what each
of those callers was doing.

Ian.

Jeremy Fitzhardinge

2013-Aug-02 16:49 UTC

Re: Crashing kernel with dom0/libxc gnttab/gntshr

On 08/02/2013 06:50 AM, Stefano Stabellini wrote:
> On Tue, 30 Jul 2013, Daniel De Graaf wrote:
>> On 07/30/2013 12:58 PM, David Vrabel wrote:
>> [...]
>>> [  902.729307] BUG: Bad page map in process vchan-node1  pte:12bfff167 pmd:b9b5c067
>>> [  902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping:          (null) index:0xffffffffffffffff
>>>
>>> I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
>>>   This has looked up the page using the PTE it is trying to clear.  Has
>>> it found the correct page?  Since the MFN is currently mapped into the
>>> same domain, has the m2p_override stuff confused the look up and it is
>>> checking the grantee page not the granter?
>>>
>>> David
>> I think something like this is happening, since while reproducing this
>> on my test system, some linked list corruption was found that I believe
>> to be the cause of this problem. The gnttab_map_refs function on PV uses
>> m2p_add_override on the page, which threads page->lru to an
>> m2p_overrides list. However, something else is using page->lru during
>> the use of gntdev, as shown by the following debug patch:
> I have never managed to prove that something else is trying to use
> page->lru while the m2p_override is using it.
>
> Jeremy, at the time the code was written, you were pretty confident
> that page->lru couldn't be used by anybody else.
> Why was that?

Hm. Probably the reasoning was that page->lru was only used for pages
which are in the pagecache, mapped from files, and m2p pages are never
mapped from files. But maybe something else has decided to use lru for
non-mapped pages (transparent hugepage? page dedup?), or are m2p pages
getting into the pagecache somehow?

    J

Stefano Stabellini

2013-Aug-02 17:02 UTC

Re: Crashing kernel with dom0/libxc gnttab/gntshr

On Fri, 2 Aug 2013, Jeremy Fitzhardinge wrote:
> On 08/02/2013 06:50 AM, Stefano Stabellini wrote:
> > On Tue, 30 Jul 2013, Daniel De Graaf wrote:
> >> On 07/30/2013 12:58 PM, David Vrabel wrote:
> >> [...]
> >>> [  902.729307] BUG: Bad page map in process vchan-node1  pte:12bfff167 pmd:b9b5c067
> >>> [  902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping:          (null) index:0xffffffffffffffff
> >>>
> >>> I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
> >>>   This has looked up the page using the PTE it is trying to clear.  Has
> >>> it found the correct page?  Since the MFN is currently mapped into the
> >>> same domain, has the m2p_override stuff confused the look up and it is
> >>> checking the grantee page not the granter?
> >>>
> >>> David
> >> I think something like this is happening, since while reproducing this
> >> on my test system, some linked list corruption was found that I believe
> >> to be the cause of this problem. The gnttab_map_refs function on PV uses
> >> m2p_add_override on the page, which threads page->lru to an
> >> m2p_overrides list. However, something else is using page->lru during
> >> the use of gntdev, as shown by the following debug patch:
> > I have never managed to prove that something else is trying to use
> > page->lru while the m2p_override is using it.
> >
> > Jeremy, at the time the code was written, you were pretty confident
> > that page->lru couldn't be used by anybody else.
> > Why was that?
> 
> Hm. Probably the reasoning was that page->lru was only used for pages
> which in the pagecache, mapped from files, and m2p pages are never
> mapped from files. But maybe something else has decided to use lru for
> non-mapped pages (transparent hugepage? page dedup?), or are m2p pages
> getting into the pagecache somehow?
> 
I think it could be the latter.
For example we have recently changed QEMU not to use O_DIRECT on foreign
grants to work around a network bug in the kernel.
It might be possible that these pages end up in the pagecache after they
have been already added to the m2p.



Ian Campbell

2013-Aug-03 10:06 UTC

Re: Crashing kernel with dom0/libxc gnttab/gntshr

On Fri, 2013-08-02 at 18:02 +0100, Stefano Stabellini wrote:
> On Fri, 2 Aug 2013, Jeremy Fitzhardinge wrote:
> > On 08/02/2013 06:50 AM, Stefano Stabellini wrote:
> > > Jeremy, at the time the code was written, you were pretty confident
> > > that page->lru couldn't be used by anybody else.
> > > Why was that?
> > 
> > Hm. Probably the reasoning was that page->lru was only used for pages
> > which in the pagecache, mapped from files, and m2p pages are never
> > mapped from files. But maybe something else has decided to use lru for
> > non-mapped pages (transparent hugepage? page dedup?), or are m2p pages
> > getting into the pagecache somehow?
> > 
> 
> I think it could be the latter.
> For example we have recently changed QEMU not to use O_DIRECT on foreign
> grants to work around a network bug in the kernel.
> It might be possible that these pages end up in the pagecache after they
> have been already added to the m2p.

Vincent's test programs (one posted at the root of this thread and
another, a multiprocess version, a few mails in) don't do any explicit
I/O on the shared pages at all; they literally don't touch them.

The test program is:
	allocate
	share
	map
	unmap
	crash

The second version moves the map/unmap/crash into a separate process
(achieved with fork). I suppose it might still be interesting to split
into two completely separate executables to check for weird cross talk
between share and map in related (i.e. parent-child) processes.
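
A rough sketch of the process split described above, with the Xen-specific steps left out: the parent would do the allocate/share step and keep the grant alive, while the map/unmap step runs in a forked child. This is not Vincent's program. `run_mapper_in_child`, `mapper_fn` and `placeholder_mapper` are hypothetical names, and the placeholder stands in for the real gnttab open/map/unmap calls so that the sketch compiles without libxc.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback type: performs the map/unmap step for one grant. */
typedef int (*mapper_fn)(uint32_t ref);

/*
 * Placeholder for the child's work. A real test would open the gnttab
 * device here, map `ref`, and unmap it again. For this sketch only,
 * UINT32_MAX is treated as an invalid reference so failure can be shown.
 */
static int placeholder_mapper(uint32_t ref)
{
    return ref == UINT32_MAX ? -1 : 0;
}

/*
 * Run `mapper` in a forked child and wait for it to finish.
 * Returns the child's exit status (0 on success, 1 on mapper failure),
 * or -1 if fork/waitpid failed or the child died abnormally.
 * The caller is expected to unshare the page only after this returns.
 */
static int run_mapper_in_child(mapper_fn mapper, uint32_t ref)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(mapper(ref) == 0 ? 0 : 1);   /* child: map side only */

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Waiting for the child before the parent unshares keeps the ordering of the original single-process test (share, map, unmap, then unshare), so a crash can be attributed to one side or the other. Note that a forked child still inherits the parent's mappings and file descriptors, which is why fully separate executables may still be worth trying.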

I hope the gntshr interface locks pages down so that we aren't worrying
about swapping etc, but this doesn't appear to be at all probabilistic
in any case.

Ian.
