Our QA team reported an issue that "Destroying VMX with 4G memory may
make xend hang on IA-32e" with xenlinux complains:
Eeek! page_mapcount(page) went negative! (-1)
  page->flags = 14
  page->count = 0
  page->mapping = 0000000000000000
This bug is caused by a race condition in xenlinux exit_mmap:
void exit_mmap(struct mm_struct *mm)
{
	struct mmu_gather *tlb;
	struct vm_area_struct *vma = mm->mmap;
	unsigned long nr_accounted = 0;
	unsigned long end;
#ifdef arch_exit_mmap
	arch_exit_mmap(mm);
#endif
	lru_add_drain();
	flush_cache_mm(mm);
	tlb = tlb_gather_mmu(mm, 1);
	/* Don''t update_hiwater_rss(mm) here, do_exit already did */
	/* Use -1 here to ensure all VMAs in the mm are unmapped */
	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
	vm_unacct_memory(nr_accounted);
	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0);
	tlb_finish_mmu(tlb, 0, end);
...
here, arch_exit_mmap will then unpin the pgtable of QEMU-DM and put the
pages residing in this pgtable. This leads to the pages mapped by
xc_map_foreign_range are returned back to xen heap. If these pages are
allocated by DOM0 before unmap_vmas is executed, the bug jumps out and
bites us since it will fail the sanity check in zap_pte_range and
page_remove_rmap.
2 possible solutions are:
1) call arch_exit_mmap after unmap_vmas.
2) unmap foreign mapped pages before calling arch_exit_mmap, and then we
can do the normal cleanup jobs.
Any comments?
thanks
-Xin
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 1 Aug 2006, at 10:21, Li, Xin B wrote:> 2 possible solutions are: > 1) call arch_exit_mmap after unmap_vmas. > 2) unmap foreign mapped pages before calling arch_exit_mmap, and then > we > can do the normal cleanup jobs.A longer term plan is to add an explicit flag to PTEs that are foreign mappings (and so are not refcounted). However, I think the easiest solution for now is to add a flag to mmu_context structure indicating whether this mm has ever had any foreign mappings (so you would set it in direct_remap_pfn_range()). If it has then we do not call mm_unpin() from _arch_exit_mmap() -- doing so is only an optimisation which we can simply not do for the very small number of mm''s that have foreign mappings. Do you mind creating a patch to do this? I can send you more details if you like. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi, Keir, Why not just reuse VM_FOREIGN(blktap uses this flag to represent foreign mapped vma) flag to unmap the foreign mapped pages before call arch_exit_mmap? This could be a simple alternative. _______________________________________________________ Best Regards, hanzhu Keir Fraser 写道:> > On 1 Aug 2006, at 10:21, Li, Xin B wrote: > >> 2 possible solutions are: >> 1) call arch_exit_mmap after unmap_vmas. >> 2) unmap foreign mapped pages before calling arch_exit_mmap, and then we >> can do the normal cleanup jobs. > > A longer term plan is to add an explicit flag to PTEs that are foreign > mappings (and so are not refcounted). However, I think the easiest > solution for now is to add a flag to mmu_context structure indicating > whether this mm has ever had any foreign mappings (so you would set it > in direct_remap_pfn_range()). If it has then we do not call mm_unpin() > from _arch_exit_mmap() -- doing so is only an optimisation which we > can simply not do for the very small number of mm''s that have foreign > mappings. > > Do you mind creating a patch to do this? I can send you more details > if you like. > > -- Keir > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 1 Aug 2006, at 11:24, hanzhu wrote:> Why not just reuse VM_FOREIGN(blktap uses this flag to represent > foreign mapped vma) flag to unmap the foreign mapped pages before call > arch_exit_mmap? This could be a simple alternative.VM_FOREIGN has its own semantics (e.g., need to fill in vma->vm_private_data appropriately). A mm-wide flag in mmu_context would be quite simple to add, and also entirely hidden in arch code. It''s then easily removed when we implement a solution based on software bits in PTEs. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>Do you mind creating a patch to do this? I can send you more >details if you like.Sure, pls send more info on this. Thanks -Xin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 1 Aug 2006, at 11:39, Li, Xin B wrote:>> Do you mind creating a patch to do this? I can send you more >> details if you like. > > Sure, pls send more info on this.1. Add an ''int has_foreign_mappings'' to mmu_context structure 2. Ensure the field is initialised in init_new_context() (e.g., memset-zero the whole structure; already done on x86/64) 3. Set the field in direct_remap_pfn_range() 4. Check the field in _arch_exit_mmap() to avoid calling mm_unpin(). That''s it. Just needs testing. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ben Thomas
2006-Aug-01  13:02 UTC
Re: [Xen-devel] Re: A race condition in xenlinux exit_mmap
Xin, I''m attaching a patch that we''ve been using since late May/early June to address an "Eeek" issues. Since we applied the patch, we haven''t seen the issue. As this was some time ago, I cannot recall if this is the same problem that you''re seeing now. The patch wasn''t submitted, as it isn''t particularly clean. It''s one of the many "some day soon" patches that we need to get resubmitted after a bit more work. I attach it here, not because I believe that it is "the answer", but as a data point for you. -b On 8/1/06, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:> > > On 1 Aug 2006, at 11:39, Li, Xin B wrote: > > >> Do you mind creating a patch to do this? I can send you more > >> details if you like. > > > > Sure, pls send more info on this. > > 1. Add an ''int has_foreign_mappings'' to mmu_context structure > 2. Ensure the field is initialised in init_new_context() (e.g., > memset-zero the whole structure; already done on x86/64) > 3. Set the field in direct_remap_pfn_range() > 4. Check the field in _arch_exit_mmap() to avoid calling mm_unpin(). > > That''s it. Just needs testing. > > -- Keir > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
This patch just fail the sanity check. It should fix the bug. However, it didn''t fix the root cause. I''m afraid Keir will not allow to add another VM_XXX flag. _______________________________________________________ Best Regards, hanzhu Ben Thomas 写道:> Xin, > > I''m attaching a patch that we''ve been using since late May/early June to > address an "Eeek" issues. Since we applied the patch, we haven''t seen the > issue. As this was some time ago, I cannot recall if this is the same > problem that you''re seeing now. The patch wasn''t submitted, as it isn''t > particularly clean. It''s one of the many "some day soon" patches that we > need to get resubmitted after a bit more work. I attach it here, not > because > I believe that it is "the answer", but as a data point for you. > > -b > > > On 8/1/06, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote: >> >> >> On 1 Aug 2006, at 11:39, Li, Xin B wrote: >> >> >> Do you mind creating a patch to do this? I can send you more >> >> details if you like. >> > >> > Sure, pls send more info on this. >> >> 1. Add an ''int has_foreign_mappings'' to mmu_context structure >> 2. Ensure the field is initialised in init_new_context() (e.g., >> memset-zero the whole structure; already done on x86/64) >> 3. Set the field in direct_remap_pfn_range() >> 4. Check the field in _arch_exit_mmap() to avoid calling mm_unpin(). >> >> That''s it. Just needs testing. >> >> -- Keir >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel >> > > ------------------------------------------------------------------------ > > # HG changeset patch > # User lively@dlively2 > # Node ID 2d63622421e93895f9f5c99d900e80e78943b0c6 > # Parent 0b79b3c194b17dcf4d7916e33628feb321cc1e05 > Robert''s workaround for the infamous Eeek! page_mapcount(page) went negative! bug. > > [Bug id:] 3914 > > [Reviewed By:] Ben & Dave L > > diff -r 0b79b3c194b1 -r 2d63622421e9 linux-2.6-xen-sparse/drivers/xen/privcmd/privcmd.c > --- a/linux-2.6-xen-sparse/drivers/xen/privcmd/privcmd.c Sat Jun 10 13:23:11 2006 -0400 > +++ b/linux-2.6-xen-sparse/drivers/xen/privcmd/privcmd.c Tue Jun 13 11:05:44 2006 -0400 > @@ -244,7 +244,7 @@ static int privcmd_mmap(struct file * fi > static int privcmd_mmap(struct file * file, struct vm_area_struct * vma) > { > /* DONTCOPY is essential for Xen as copy_page_range is broken. */ > - vma->vm_flags |= VM_RESERVED | VM_IO | VM_DONTCOPY; > + vma->vm_flags |= VM_RESERVED | VM_IO | VM_DONTCOPY | VM_PRIVCMD; > > return 0; > } > diff -r 0b79b3c194b1 -r 2d63622421e9 linux-2.6-xen-sparse/include/linux/mm.h > --- a/linux-2.6-xen-sparse/include/linux/mm.h Sat Jun 10 13:23:11 2006 -0400 > +++ b/linux-2.6-xen-sparse/include/linux/mm.h Tue Jun 13 11:05:44 2006 -0400 > @@ -169,6 +169,7 @@ extern unsigned int kobjsize(const void > #ifdef CONFIG_XEN > #define VM_FOREIGN 0x04000000 /* Has pages belonging to another VM */ > #endif > +#define VM_PRIVCMD 0x08000000 /* Pages belong to privcmd mmap */ > > #ifndef VM_STACK_DEFAULT_FLAGS /* arch can override this */ > #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS > diff -r 0b79b3c194b1 -r 2d63622421e9 linux-2.6-xen-sparse/mm/memory.c > --- a/linux-2.6-xen-sparse/mm/memory.c Sat Jun 10 13:23:11 2006 -0400 > +++ b/linux-2.6-xen-sparse/mm/memory.c Tue Jun 13 11:05:44 2006 -0400 > @@ -409,6 +409,10 @@ struct page *vm_normal_page(struct vm_ar > print_bad_pte(vma, pte, addr); > return NULL; > } > + > + /* This vma points to foreign pages */ > + if (vma->vm_flags & VM_PRIVCMD) > + return NULL; > > /* > * NOTE! We still have PageReserved() pages in the page >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ben Thomas
2006-Aug-01  13:23 UTC
Re: [Xen-devel] Re: A race condition in xenlinux exit_mmap
I agree. This is why we never submitted it, and I wanted you to understand that it was a "data point" and not an answer. regards, -b hanzhu wrote:> This patch just fail the sanity check. It should fix the bug. However, > it didn''t fix the root cause. > I''m afraid Keir will not allow to add another VM_XXX flag. > > _______________________________________________________ > Best Regards, > hanzhu > > > > Ben Thomas 写道: > >> Xin, >> >> I''m attaching a patch that we''ve been using since late May/early June to >> address an "Eeek" issues. Since we applied the patch, we haven''t seen the >> issue. As this was some time ago, I cannot recall if this is the same >> problem that you''re seeing now. The patch wasn''t submitted, as it isn''t >> particularly clean. It''s one of the many "some day soon" patches that we >> need to get resubmitted after a bit more work. I attach it here, not >> because >> I believe that it is "the answer", but as a data point for you. >> >> -b > /* >> * NOTE! We still have PageReserved() pages in the page-- ------------------------------------------------------------------------ Ben Thomas Virtual Iron Software bthomas@virtualiron.com Tower 1, Floor 2 978-849-1214 900 Chelmsford Street Lowell, MA 01851 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> 1. Add an ''int has_foreign_mappings'' to mmu_context structure > 2. Ensure the field is initialised in init_new_context() (e.g., >memset-zero the whole structure; already done on x86/64) > 3. Set the field in direct_remap_pfn_range() > 4. Check the field in _arch_exit_mmap() to avoid calling mm_unpin(). > >That''s it. Just needs testing. >Patch attached, and it passed on our patch test system. -Xin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Li, Xin B
2006-Aug-01  13:53 UTC
RE: [Xen-devel] Re: A race condition in xenlinux exit_mmap
yes, elegant fix is not easy. -Xin ________________________________ From: bjthomas3@gmail.com [mailto:bjthomas3@gmail.com] On Behalf Of Ben Thomas Sent: 2006年8月1日 21:03 To: Li, Xin B Cc: hanzhu@sjtu.edu.cn; xen-devel@lists.xensource.com Subject: Re: [Xen-devel] Re: A race condition in xenlinux exit_mmap Xin, I''m attaching a patch that we''ve been using since late May/early June to address an "Eeek" issues. Since we applied the patch, we haven''t seen the issue. As this was some time ago, I cannot recall if this is the same problem that you''re seeing now. The patch wasn''t submitted, as it isn''t particularly clean. It''s one of the many "some day soon" patches that we need to get resubmitted after a bit more work. I attach it here, not because I believe that it is "the answer", but as a data point for you. -b On 8/1/06, Keir Fraser <Keir.Fraser@cl.cam.ac.uk > wrote: On 1 Aug 2006, at 11:39, Li, Xin B wrote: >> Do you mind creating a patch to do this? I can send you more >> details if you like. > > Sure, pls send more info on this. 1. Add an ''int has_foreign_mappings'' to mmu_context structure 2. Ensure the field is initialised in init_new_context() (e.g., memset-zero the whole structure; already done on x86/64) 3. Set the field in direct_remap_pfn_range() 4. Check the field in _arch_exit_mmap() to avoid calling mm_unpin(). That''s it. Just needs testing. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 1 Aug 2006, at 14:46, Li, Xin B wrote:>> That''s it. Just needs testing. >> > > Patch attached, and it passed on our patch test system. >thanks! -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>---a/linux-2.6-xen-sparse/include/asm-x86_64/mach-xen/asm/mmu.h Sat Jul 29 14:05:59 2006 +0100>+++b/linux-2.6-xen-sparse/include/asm-x86_64/mach-xen/asm/mmu.h Tue Aug 01 20:32:25 2006 +0800>@@ -18,6 +18,7 @@ typedef struct { > #ifdef CONFIG_XEN > unsigned pinned:1; > struct list_head unpinned; >+ int has_foreign_mappings; > #endif > } mm_context_t;May I suggest using a bit field adjacent to the already existing one here? Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel