Hi,

The attached program makes my kernel (3.9.9-1-ARCH, stock Archlinux kernel) crash with the attached dmesg output.

The program just shares a page from dom0 to dom0, then map the page, then unshare the page, and the unsharing makes the kernel crash. I ran into this issue while implementing a native OCaml vchan driver.

I'm very much interested in advices/help.

Cheers,

Vincent

Attachment: libxc_gntshr_bug2.c

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <xenctrl.h>
#include <sys/mman.h>

int main(int argc, char** argv)
{
    void* map_shr;
    void* map_tab;
    uint32_t ref;
    int ret;

    xc_gntshr *shr_h = xc_gntshr_open(NULL, 0);
    if (shr_h == NULL) {
        perror("xc_gntshr_open");
        exit(EXIT_FAILURE);
    }

    xc_gnttab *tab_h = xc_gnttab_open(NULL, 0);
    if (tab_h == NULL) {
        perror("xc_gnttab_open");
        exit(EXIT_FAILURE);
    }

    map_shr = xc_gntshr_share_pages(shr_h, 0, 1, &ref, 1);
    if (map_shr == NULL) {
        perror("xc_gntshr_share_pages");
        exit(EXIT_FAILURE);
    }

    map_tab = xc_gnttab_map_grant_ref(tab_h, 0, ref, PROT_READ|PROT_WRITE);
    if (map_tab == NULL) {
        perror("xc_gnttab_map_grant_ref");
        exit(EXIT_FAILURE);
    }

    /* Now we unshare the page */
    ret = xc_gntshr_munmap(shr_h, map_shr, 1);
    if (ret != 0) {
        perror("xc_gntshr_munmap");
        exit(EXIT_FAILURE);
    }

    /* At this point, the kernel should complain… */
    return 0;
}

Attachment: dmesg.log

[ 299.710029] FS: 00007fe69748f700(0000) GS:ffff88011ba40000(0000) knlGS:0000000000000000
[ 299.710029] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 299.710029] CR2: 00007fe696d78f30 CR3: 00000000c34fe000 CR4: 0000000000002660
[ 299.710029] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 299.710029] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 299.876698] Process a.out (pid: 922, threadinfo ffff8800cc3c6000, task ffff8800c34829e0)
[ 299.876698] Stack:
[ 299.876698] ffff8800cc2dc5b0 ffff8800cc3c7d88 ffff88000251bc60 ffff88000251b980
[ 299.876698] ffff88000251b960 ffff88000251b990 ffff8800c34829e0 ffff8800cc3c7dd8
[ 299.876698] ffffffffa03e847f ffff88000251b990 ffff880114d50a80 0000000000000000
[ 299.876698] Call Trace:
[ 299.876698] [<ffffffffa03e847f>] ? mn_release+0x4f/0x130 [xen_gntdev]
[ 299.876698] [<ffffffff8116b0c4>] ? __mmu_notifier_release+0x44/0xc0
[ 299.876698] [<ffffffff81153d09>] ? exit_mmap+0x149/0x170
[ 299.876698] [<ffffffff814d2a8a>] ? _raw_spin_lock_irqsave+0x1a/0x50
[ 299.876698] [<ffffffff810b5c3a>] ? exit_robust_list+0x6a/0x130
[ 299.876698] [<ffffffff81055209>] ? mmput+0x59/0x120
[ 299.876698] [<ffffffff8105d97f>] ? do_exit+0x27f/0xab0
[ 299.876698] [<ffffffff81152b90>] ? do_munmap+0x2b0/0x3e0
[ 299.876698] [<ffffffff8105e22f>] ? do_group_exit+0x3f/0xa0
[ 299.876698] [<ffffffff8105e2a4>] ? sys_exit_group+0x14/0x20
[ 299.876698] [<ffffffff814da89d>] ? system_call_fastpath+0x1a/0x1f
[ 299.876698] Code: 00 00 00 d8 02 3c cc 00 88 ff ff ff ff ff ff ff ff ff ff 60 7d 3c cc 00 88 ff ff 30 e0 00 00 00 00 00 00 82 02 01 00 00 00 00 00 <70> 7d 3c cc 00 88 ff ff 2b e0 00 00 00 00 00 00 b0 c5 2d cc 00
[ 299.876698] RIP [<ffff8800cc3c7d60>] 0xffff8800cc3c7d5f
[ 299.876698] RSP <ffff8800cc3c7d70>
[ 299.964961] ---[ end trace 2cc41b9c64237359 ]---
[ 299.964962] Fixing recursive fault but reboot is needed!
[ 299.964963] BUG: scheduling while atomic: a.out/922/0x00000002
[ 299.964985] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_analog snd_hda_intel snd_hda_codec iTCO_wdt gpio_ich iTCO_vendor_support ppdev evdev dcdbas radeon mperf psmouse tg3 coretemp microcode serio_raw pcspkr snd_hwdep snd_pcm ttm snd_page_alloc snd_timer drm_kms_helper i2c_i801 snd x38_edac edac_core ptp pps_core lpc_ich libphy drm i2c_algo_bit i2c_core soundcore parport_pc parport button processor xenfs xen_privcmd xen_pciback xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn nfs lockd sunrpc fscache ext4 crc16 mbcache jbd2 hid_generic usbhid hid sr_mod cdrom sd_mod ahci libahci libata scsi_mod ehci_pci uhci_hcd ehci_hcd usbcore usb_common
[ 299.964987] Pid: 922, comm: a.out Tainted: G B D 3.9.9-1-ARCH #1
[ 299.964987] Call Trace:
[ 299.964991] [<ffffffff814cabcb>] __schedule_bug+0x4d/0x5b
[ 299.964994] [<ffffffff814d1ae6>] __schedule+0x936/0x940
[ 299.964997] [<ffffffff81059a29>] ? console_trylock+0x19/0x70
[ 299.964999] [<ffffffff814d2c86>] ? _raw_spin_unlock+0x36/0x40
[ 299.965002] [<ffffffff8105a3c6>] ? vprintk_emit+0x176/0x4c0
[ 299.965004] [<ffffffff814ca7ff>] ? printk+0x54/0x56
[ 299.965007] [<ffffffff814d1b19>] schedule+0x29/0x70
[ 299.965009] [<ffffffff8105e129>] do_exit+0xa29/0xab0
[ 299.965012] [<ffffffff8105b731>] ? kmsg_dump+0xc1/0xd0
[ 299.965015] [<ffffffff814d42c3>] oops_end+0xa3/0xe0
[ 299.965019] [<ffffffff81018deb>] die+0x4b/0x70
[ 299.965021] [<ffffffff814d3be0>] do_trap+0x60/0x170
[ 299.965024] [<ffffffff810163d5>] do_invalid_op+0x95/0xb0
[ 299.965027] [<ffffffff810085ec>] ? xen_batched_set_pte+0xdc/0x200
[ 299.965030] [<ffffffff814d2a8a>] ? _raw_spin_lock_irqsave+0x1a/0x50
[ 299.965032] [<ffffffff814d2ca2>] ? _raw_spin_unlock_irqrestore+0x12/0x50
[ 299.965035] [<ffffffff814dbb1e>] invalid_op+0x1e/0x30
[ 299.965038] [<ffffffffa03e847f>] ? mn_release+0x4f/0x130 [xen_gntdev]
[ 299.965042] [<ffffffff8116b0c4>] ? __mmu_notifier_release+0x44/0xc0
[ 299.965045] [<ffffffff81153d09>] ? exit_mmap+0x149/0x170
[ 299.965047] [<ffffffff814d2a8a>] ? _raw_spin_lock_irqsave+0x1a/0x50
[ 299.965050] [<ffffffff810b5c3a>] ? exit_robust_list+0x6a/0x130
[ 299.965055] [<ffffffff81055209>] ? mmput+0x59/0x120
[ 299.965057] [<ffffffff8105d97f>] ? do_exit+0x27f/0xab0
[ 299.965060] [<ffffffff81152b90>] ? do_munmap+0x2b0/0x3e0
[ 299.965062] [<ffffffff8105e22f>] ? do_group_exit+0x3f/0xa0
[ 299.965065] [<ffffffff8105e2a4>] ? sys_exit_group+0x14/0x20
[ 299.965067] [<ffffffff814da89d>] ? system_call_fastpath+0x1a/0x1f

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
On Tue, 2013-07-30 at 11:50 +0100, Vincent Bernardoff wrote:
> Hi,
>
> The attached program makes my kernel (3.9.9-1-ARCH, stock Archlinux
> kernel) crash with the attached dmesg output.

The dmesg output seems to start halfway through a crash message, which means it is missing the PC etc and may not be the first crash in any case. Please could you configure a serial console and try and capture the first crash message in its entirety.

Bonus points if you can avoid linewrapping the dmesg too ;-)

> The program just shares a page from dom0 to dom0,

Not just from dom0 to dom0 but actually within the same process. I'm not sure that matters but it is a bit unusual.

Are you able to repro this with two separate processes acting as front vs. backend? The reason I ask is that it isn't clear if the crash is the process with its front or back "hat" on, separating the two out would be useful.

> then map the page,
> then unshare the page, and the unsharing makes the kernel crash. I ran
> into this issue while implementing a native OCaml vchan driver.
>
> I'm very much interested in advices/help.
>
> Cheers,
>
> Vincent
Vincent Bernardoff
2013-Jul-30 13:41 UTC
Re: Crashing kernel with dom0/libxc gnttab/gntshr
Vincent Bernardoff
2013-Jul-30 15:50 UTC
Re: Crashing kernel with dom0/libxc gnttab/gntshr
I also have a bug using tools/libvchan/vchan-node1:

When killing the server node (sudo ./vchan-node1 server read 0 /local/domain/0/vchan) before the client node (sudo ./vchan-node1 client write 0 /local/domain/0/vchan), the following dmesg error appears.

I'm using Xen unstable (master branch) and stock Archlinux 3.10.3-1-ARCH kernel.

Use the following script (setup.sh) if you want to try reproducing it with vchan-node1, vchan-node1 indeed needs some xenstore keys to be written in order to work correctly.
Adding Daniel who maintains vchan and I think the kernel side of the driver in question too to the CC.

On Tue, 2013-07-30 at 16:50 +0100, Vincent Bernardoff wrote:
> I also have a bug using tools/libvchan/vchan-node1:
>
> When killing the server node (sudo ./vchan-node1 server read 0
> /local/domain/0/vchan) before the client node (sudo ./vchan-node1 client
> write 0 /local/domain/0/vchan), the following dmesg error appears.
>
> I'm using Xen unstable (master branch) and stock Archlinux 3.10.3-1-ARCH
> kernel.
>
> Use the following script (setup.sh) if you want to try reproducing it
> with vchan-node1, vchan-node1 indeed needs some xenstore keys to be
> written in order to work correctly.
On 30/07/13 16:50, Vincent Bernardoff wrote:
> I also have a bug using tools/libvchan/vchan-node1:
>
> When killing the server node (sudo ./vchan-node1 server read 0
> /local/domain/0/vchan) before the client node (sudo ./vchan-node1 client
> write 0 /local/domain/0/vchan), the following dmesg error appears.

Does this only happen if both client and server are in the same domain? Have you tested it using two domains? Did it work?

> I'm using Xen unstable (master branch) and stock Archlinux 3.10.3-1-ARCH
> kernel.
>
> Use the following script (setup.sh) if you want to try reproducing it
> with vchan-node1, vchan-node1 indeed needs some xenstore keys to be
> written in order to work correctly.

[ 902.729307] BUG: Bad page map in process vchan-node1 pte:12bfff167 pmd:b9b5c067
[ 902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping: (null) index:0xffffffffffffffff

I think this is the test for page_mapcount(page) < 0 in zap_pte_range(). This has looked up the page using the PTE it is trying to clear. Has it found the correct page? Since the MFN is currently mapped into the same domain, has the m2p_override stuff confused the look up and it is checking the grantee page not the granter?

David
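As a rough illustration of the check David refers to: in the kernel, a page's _mapcount field starts at -1 and page_mapcount() reports that value plus one, so removing a mapping from a page whose mapping was never counted leaves the reported count negative. The sketch below is a userspace model only, not kernel code; struct model_page and the model_* helpers are made-up names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the one field of struct page that matters here. */
struct model_page {
    int _mapcount;   /* -1 means "mapped by no PTE", as in the kernel */
};

void model_page_init(struct model_page *p)
{
    assert(p != NULL);
    p->_mapcount = -1;
}

/* Number of mappings of the page: stored value plus one. */
int model_page_mapcount(const struct model_page *p)
{
    assert(p != NULL);
    return p->_mapcount + 1;
}

/* Account for one new mapping of the page. */
void model_map(struct model_page *p)
{
    assert(p != NULL);
    p->_mapcount++;
}

/* Drop one mapping. Returns false when the count goes negative, which is
 * the condition reported as "Bad page map" in the log above. */
bool model_unmap(struct model_page *p)
{
    assert(p != NULL);
    p->_mapcount--;
    return model_page_mapcount(p) >= 0;
}
```

Calling model_unmap() on a freshly initialised page returns false and leaves model_page_mapcount() at -1, the same value as the "mapcount:-1" in the log. That is what one would expect if the PTE being cleared led to a page that was never accounted as mapped.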
On 07/30/2013 12:58 PM, David Vrabel wrote:
[...]
>
> [ 902.729307] BUG: Bad page map in process vchan-node1 pte:12bfff167 pmd:b9b5c067
> [ 902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping: (null) index:0xffffffffffffffff
>
> I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
> This has looked up the page using the PTE it is trying to clear. Has
> it found the correct page? Since the MFN is currently mapped into the
> same domain, has the m2p_override stuff confused the look up and it is
> checking the grantee page not the granter?
>
> David

I think something like this is happening, since while reproducing this on my test system, some linked list corruption was found that I believe to be the cause of this problem. The gnttab_map_refs function on PV uses m2p_add_override on the page, which threads page->lru to an m2p_overrides list. However, something else is using page->lru during the use of gntdev, as shown by the following debug patch:

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 3c8803f..198e57e 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -294,6 +294,11 @@ static int map_grant_pages(struct grant_map *map)
 	if (err)
 		return err;
 
+	printk("map page0 lru: %p prev=%p:%p next=%p:%p\n",
+		&map->pages[0]->lru,
+		map->pages[0]->lru.prev, map->pages[0]->lru.prev->next,
+		map->pages[0]->lru.next, map->pages[0]->lru.next->prev);
+
 	for (i = 0; i < map->count; i++) {
 		if (map->map_ops[i].status)
 			err = -EINVAL;
@@ -320,6 +325,10 @@ static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
 		}
 	}
 
+	printk("unmap page0 lru: %p prev=%p:%p next=%p:%p\n",
+		&map->pages[0]->lru,
+		map->pages[0]->lru.prev, map->pages[0]->lru.prev->next,
+		map->pages[0]->lru.next, map->pages[0]->lru.next->prev);
 	err = gnttab_unmap_refs(map->unmap_ops + offset,
 			use_ptemod ? map->kmap_ops + offset : NULL, map->pages + offset,
 			pages);

Output:
[ 88.610644] map page0 lru: ffffea0001dee160 prev=ffffffff82f2d510:ffffea0001dee160 next=ffffffff82f2d510:ffffea0001dee160
[ 88.611515] BUG: Bad page map in process a.out pte:8000000077b85167 pmd:2541a067
[ 88.611525] page:ffffea0001dee140 count:1 mapcount:-1 mapping: (null) index:0xffffffffffffffff
[ 88.611532] page flags: 0x1000000000000814(referenced|dirty|private)
[ 88.611541] addr:00007f1adaef3000 vm_flags:140400fb anon_vma: (null) mapping:ffff8800692974a0 index:0
[ 88.611547] vma->vm_ops->fault: (null)
[ 88.611555] vma->vm_file->f_op->mmap: gntalloc_mmap+0x0/0x1d0
[...backtrace cropped...]
[ 88.614301] unmap page0 lru: ffffea0001dee160 prev=ffff8800254c9d08:ffff88001ea0b120 next=ffff8800254c9d08:ffff88001ea0b938

The initial map is a linked list with only that element, so the address 0xffffffff82f2d510 is the m2p_overrides entry. This means the page being found by zap_pte_range is not a valid struct page.

The struct page* being used by the gntalloc device was 0xffffea0000952740, for reference; it's not a direct collision between the page used by the gntdev and gntalloc devices.

Not sure what the best fix is for this at the moment.

--
Daniel De Graaf
National Security Agency
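The debug patch above prints, for each neighbour of page->lru, the pointer that neighbour holds back towards the node. On an intact doubly linked list both of those pointers equal the node's own address. A minimal userspace sketch of that invariant follows; struct list_node and the list_* helpers are illustrative names modelled loosely on the kernel's struct list_head, and this is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative circular doubly linked list node. */
struct list_node {
    struct list_node *next, *prev;
};

/* An empty list is a head that points at itself in both directions. */
void list_init(struct list_node *head)
{
    assert(head != NULL);
    head->next = head;
    head->prev = head;
}

/* Insert entry directly after head. */
void list_add_node(struct list_node *entry, struct list_node *head)
{
    assert(entry != NULL && head != NULL);
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* The invariant the printk output lets a reader check by eye:
 * both neighbours must point back at this node. */
bool list_node_consistent(const struct list_node *n)
{
    assert(n != NULL);
    return n->prev->next == n && n->next->prev == n;
}
```

With a single entry on the list, prev and next both equal the list head and both back-pointers equal the node, which matches the "map" line of the output. In the "unmap" line the back-pointers (ffff88001ea0b120 and ffff88001ea0b938) differ from the node address ffffea0001dee160, so list_node_consistent() would return false for that state.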
Stefano Stabellini
2013-Aug-02 13:50 UTC
Re: Crashing kernel with dom0/libxc gnttab/gntshr
On Tue, 30 Jul 2013, Daniel De Graaf wrote:
> On 07/30/2013 12:58 PM, David Vrabel wrote:
> [...]
> >
> > [ 902.729307] BUG: Bad page map in process vchan-node1 pte:12bfff167 pmd:b9b5c067
> > [ 902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping: (null) index:0xffffffffffffffff
> >
> > I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
> > This has looked up the page using the PTE it is trying to clear. Has
> > it found the correct page? Since the MFN is currently mapped into the
> > same domain, has the m2p_override stuff confused the look up and it is
> > checking the grantee page not the granter?
> >
> > David
>
> I think something like this is happening, since while reproducing this
> on my test system, some linked list corruption was found that I believe
> to be the cause of this problem. The gnttab_map_refs function on PV uses
> m2p_add_override on the page, which threads page->lru to an
> m2p_overrides list. However, something else is using page->lru during
> the use of gntdev, as shown by the following debug patch:

I have never managed to prove that something else is trying to use page->lru while the m2p_override is using it.

Jeremy, at the time the code was written, you were pretty confident that page->lru couldn't be used by anybody else.
Why was that?

> [...]
On Fri, 2013-08-02 at 14:50 +0100, Stefano Stabellini wrote:
> On Tue, 30 Jul 2013, Daniel De Graaf wrote:
> > On 07/30/2013 12:58 PM, David Vrabel wrote:
> > [...]
> > >
> > > [ 902.729307] BUG: Bad page map in process vchan-node1 pte:12bfff167 pmd:b9b5c067
> > > [ 902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping: (null) index:0xffffffffffffffff
> > >
> > > I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
> > > This has looked up the page using the PTE it is trying to clear. Has
> > > it found the correct page? Since the MFN is currently mapped into the
> > > same domain, has the m2p_override stuff confused the look up and it is
> > > checking the grantee page not the granter?
> > >
> > > David
> >
> > I think something like this is happening, since while reproducing this
> > on my test system, some linked list corruption was found that I believe
> > to be the cause of this problem. The gnttab_map_refs function on PV uses
> > m2p_add_override on the page, which threads page->lru to an
> > m2p_overrides list. However, something else is using page->lru during
> > the use of gntdev, as shown by the following debug patch:
>
> I have never managed to prove that something else is trying to use
> page->lru while the m2p_override is using it.

Isn't it very much dependent on the actual original owner of the page? A lot of these fields are free to use by the code which actually called alloc_page, but for a facility like the m2p_override which can consume pages from a variety of sources you'd need to be careful about what each of those callers was doing.

Ian.
Jeremy Fitzhardinge
2013-Aug-02 16:49 UTC
Re: Crashing kernel with dom0/libxc gnttab/gntshr
On 08/02/2013 06:50 AM, Stefano Stabellini wrote:
> On Tue, 30 Jul 2013, Daniel De Graaf wrote:
>> On 07/30/2013 12:58 PM, David Vrabel wrote:
>> [...]
>>> [ 902.729307] BUG: Bad page map in process vchan-node1 pte:12bfff167 pmd:b9b5c067
>>> [ 902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping: (null) index:0xffffffffffffffff
>>>
>>> I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
>>> This has looked up the page using the PTE it is trying to clear. Has
>>> it found the correct page? Since the MFN is currently mapped into the
>>> same domain, has the m2p_override stuff confused the look up and it is
>>> checking the grantee page not the granter?
>>>
>>> David
>> I think something like this is happening, since while reproducing this
>> on my test system, some linked list corruption was found that I believe
>> to be the cause of this problem. The gnttab_map_refs function on PV uses
>> m2p_add_override on the page, which threads page->lru to an
>> m2p_overrides list. However, something else is using page->lru during
>> the use of gntdev, as shown by the following debug patch:
> I have never managed to prove that something else is trying to use
> page->lru while the m2p_override is using it.
>
> Jeremy, at the time the code was written, you were pretty confident
> that page->lru couldn't be used by anybody else.
> Why was that?

Hm. Probably the reasoning was that page->lru was only used for pages which in the pagecache, mapped from files, and m2p pages are never mapped from files. But maybe something else has decided to use lru for non-mapped pages (transparent hugepage? page dedup?), or are m2p pages getting into the pagecache somehow?

J

>> [...]
Stefano Stabellini
2013-Aug-02 17:02 UTC
Re: Crashing kernel with dom0/libxc gnttab/gntshr
On Fri, 2 Aug 2013, Jeremy Fitzhardinge wrote:
> On 08/02/2013 06:50 AM, Stefano Stabellini wrote:
> > On Tue, 30 Jul 2013, Daniel De Graaf wrote:
> >> On 07/30/2013 12:58 PM, David Vrabel wrote:
> >> [...]
> >>> [ 902.729307] BUG: Bad page map in process vchan-node1 pte:12bfff167 pmd:b9b5c067
> >>> [ 902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping: (null) index:0xffffffffffffffff
> >>>
> >>> I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
> >>> This has looked up the page using the PTE it is trying to clear. Has
> >>> it found the correct page? Since the MFN is currently mapped into the
> >>> same domain, has the m2p_override stuff confused the look up and it is
> >>> checking the grantee page not the granter?
> >>>
> >>> David
> >> I think something like this is happening, since while reproducing this
> >> on my test system, some linked list corruption was found that I believe
> >> to be the cause of this problem. The gnttab_map_refs function on PV uses
> >> m2p_add_override on the page, which threads page->lru to an
> >> m2p_overrides list. However, something else is using page->lru during
> >> the use of gntdev, as shown by the following debug patch:
> > I have never managed to prove that something else is trying to use
> > page->lru while the m2p_override is using it.
> >
> > Jeremy, at the time the code was written, you were pretty confident
> > that page->lru couldn't be used by anybody else.
> > Why was that?
>
> Hm. Probably the reasoning was that page->lru was only used for pages
> which in the pagecache, mapped from files, and m2p pages are never
> mapped from files. But maybe something else has decided to use lru for
> non-mapped pages (transparent hugepage? page dedup?), or are m2p pages
> getting into the pagecache somehow?
>

I think it could be the latter.
For example we have recently changed QEMU not to use O_DIRECT on foreign grants to work around a network bug in the kernel.
It might be possible that these pages end up in the pagecache after they have been already added to the m2p.

> >> [...]
On Fri, 2013-08-02 at 18:02 +0100, Stefano Stabellini wrote:
> On Fri, 2 Aug 2013, Jeremy Fitzhardinge wrote:
> > On 08/02/2013 06:50 AM, Stefano Stabellini wrote:
> > > Jeremy, at the time the code was written, you were pretty confident
> > > that page->lru couldn't be used by anybody else.
> > > Why was that?
> >
> > Hm. Probably the reasoning was that page->lru was only used for pages
> > which in the pagecache, mapped from files, and m2p pages are never
> > mapped from files. But maybe something else has decided to use lru for
> > non-mapped pages (transparent hugepage? page dedup?), or are m2p pages
> > getting into the pagecache somehow?
> >
>
> I think it could be the latter.
> For example we have recently changed QEMU not to use O_DIRECT on foreign
> grants to work around a network bug in the kernel.
> It might be possible that these pages end up in the pagecache after they
> have been already added to the m2p.

Vincent's test programs (one posted at the root of this thread and another a multiprocess version a few mails in) doesn't do any explicit I/O on the shared pages at all, it literally doesn't touch them.

The test program is:
        allocate
        share
        map
        unmap
        crash

The second version moves the map/unmap/crash into a separate process (achieved with fork). I suppose it might still be interesting to split into two completely separate executables to check for weird cross talk between share and map in related (i.e. parent-child) processes.

I hope the gntshr interface locks pages down so that we aren't worrying about swapping etc, but this doesn't appear to be at all probabilistic in any case.

Ian.
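For readers who want to picture the fork-based split Ian describes, here is a minimal sketch of the process structure only. It is not Vincent's second test program, which is not included in this thread. The libxc calls are replaced by placeholder hooks so that the skeleton runs without Xen; stub_share, stub_map_unmap, run_split and the grant reference value 8 are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder for the granter side. A real test would share a page here
 * (e.g. with xc_gntshr_share_pages()) and store the grant reference. */
int stub_share(uint32_t *ref)
{
    *ref = 8;   /* arbitrary made-up grant reference */
    return 0;
}

/* Placeholder for the grantee side. A real test would map the grant
 * (e.g. with xc_gnttab_map_grant_ref()) and then unmap it here. */
int stub_map_unmap(uint32_t ref)
{
    return ref == 8 ? 0 : 1;
}

/* Run share() in the parent and map_unmap() in a forked child.
 * Returns the child's exit status, or -1 if share, fork or wait fails. */
int run_split(int (*share)(uint32_t *), int (*map_unmap)(uint32_t))
{
    uint32_t ref;
    int status;
    pid_t pid;

    assert(share != NULL && map_unmap != NULL);

    if (share(&ref) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(map_unmap(ref));   /* child wears only the grantee "hat" */

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because the child inherits ref across fork(), no extra channel is needed to pass the grant reference. Two completely separate executables, as suggested above, would need some other way to hand it over.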