Konrad Rzeszutek Wilk
2011-Oct-25 20:07 UTC
[Xen-devel] [PATCH] MMU bug-fixes in generic code that are mostly used by Xen.
I was wondering if you could help. I''ve this bug-fix: [PATCH 1/3] x86/paravirt: PTE updates in k(un)map_atomic need to be that you picked up some time ago in your tree and then dropped. I am not sure why it was dropped but perhaps it is b/c I also had that patch in my linux-next and your tool decided to drop it. Anyhow, was wondering if you would be OK giving it your Ack or just pulling it in your tree for 3.2. These two: [PATCH 2/3] xen: use generic functions instead of xen_{alloc, [PATCH 3/3] xen: map foreign pages for shared rings by updating the remove what git commit d2fe97c3315a6a406540f74651e7430d9d51e671 Author: David Vrabel <david.vrabel@citrix.com> Date: Thu Sep 29 16:53:32 2011 +0100 xen: map foreign pages for shared rings by updating the PTEs directly added in 3.1 with a more selective way instead of using the big hammer. I was wondering if you would be OK ACK-ing those two or sticking them in your tree for 3.2. All of the patches are in: git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-akpm-3.2 and the last two have also been sitting for some time in linux-next (and the first one was there but it looks like I forgot to put it back there as I had thought that Linus had picked it up.. argh). arch/ia64/include/asm/xen/grant_table.h | 29 -------------- arch/ia64/xen/grant-table.c | 62 ------------------------------- arch/x86/include/asm/xen/grant_table.h | 7 --- arch/x86/mm/highmem_32.c | 2 + arch/x86/xen/grant-table.c | 2 +- drivers/xen/xenbus/xenbus_client.c | 15 +++++-- include/linux/vmalloc.h | 2 +- include/xen/grant_table.h | 1 - mm/vmalloc.c | 27 ++++++------- 9 files changed, 27 insertions(+), 120 deletions(-) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Oct-25 20:07 UTC
[Xen-devel] [PATCH 1/3] x86/paravirt: PTE updates in k(un)map_atomic need to be synchronous, regardless of lazy_mmu mode.
This patch fixes an outstanding issue that has been reported since 2.6.37. Under a heavy loaded machine processing "fork()" calls could keepover with: BUG: unable to handle kernel paging request at f573fc8c IP: [<c01abc54>] swap_count_continued+0x104/0x180 *pdpt = 000000002a3b9027 *pde = 0000000001bed067 *pte = 0000000000000000 Oops: 0000 [#1] SMP Modules linked in: Pid: 1638, comm: apache2 Not tainted 3.0.4-linode37 #1 EIP: 0061:[<c01abc54>] EFLAGS: 00210246 CPU: 3 EIP is at swap_count_continued+0x104/0x180 .. snip.. Call Trace: [<c01ac222>] ? __swap_duplicate+0xc2/0x160 [<c01040f7>] ? pte_mfn_to_pfn+0x87/0xe0 [<c01ac2e4>] ? swap_duplicate+0x14/0x40 [<c01a0a6b>] ? copy_pte_range+0x45b/0x500 [<c01a0ca5>] ? copy_page_range+0x195/0x200 [<c01328c6>] ? dup_mmap+0x1c6/0x2c0 [<c0132cf8>] ? dup_mm+0xa8/0x130 [<c013376a>] ? copy_process+0x98a/0xb30 [<c013395f>] ? do_fork+0x4f/0x280 [<c01573b3>] ? getnstimeofday+0x43/0x100 [<c010f770>] ? sys_clone+0x30/0x40 [<c06c048d>] ? ptregs_clone+0x15/0x48 [<c06bfb71>] ? syscall_call+0x7/0xb The problem is that in copy_page_range we turn lazy mode on, and then in swap_entry_free we call swap_count_continued which ends up in: map = kmap_atomic(page, KM_USER0) + offset; and then later we touch *map. Since we are running in batched mode (lazy) we don''t actually set up the PTE mappings and the kmap_atomic is not done synchronously and ends up trying to dereference a page that has not been set. Looking at kmap_atomic_prot_pfn, it uses ''arch_flush_lazy_mmu_mode'' and doing the same in kmap_atomic_prot and __kunmap_atomic makes the problem go away. Interestingly, git commit b8bcfe997e46150fedcc3f5b26b846400122fdd9 removed part of this to fix an interrupt issue - but it went to far and did not consider this scenario. CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: "H. Peter Anvin" <hpa@zytor.com> CC: x86@kernel.org CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: stable@kernel.org [v1: Redid the commit description per Jeremy''s apt suggestion] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/x86/mm/highmem_32.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c index b499626..f4f29b1 100644 --- a/arch/x86/mm/highmem_32.c +++ b/arch/x86/mm/highmem_32.c @@ -45,6 +45,7 @@ void *kmap_atomic_prot(struct page *page, pgprot_t prot) vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx); BUG_ON(!pte_none(*(kmap_pte-idx))); set_pte(kmap_pte-idx, mk_pte(page, prot)); + arch_flush_lazy_mmu_mode(); return (void *)vaddr; } @@ -88,6 +89,7 @@ void __kunmap_atomic(void *kvaddr) */ kpte_clear_flush(kmap_pte-idx, vaddr); kmap_atomic_idx_pop(); + arch_flush_lazy_mmu_mode(); } #ifdef CONFIG_DEBUG_HIGHMEM else { -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Oct-25 20:07 UTC
[Xen-devel] [PATCH 2/3] xen: use generic functions instead of xen_{alloc, free}_vm_area()
From: David Vrabel <david.vrabel@citrix.com> Replace calls to the Xen-specific xen_alloc_vm_area() and xen_free_vm_area() functions with the generic equivalent (alloc_vm_area() and free_vm_area()). On x86, these were identical already. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/ia64/include/asm/xen/grant_table.h | 29 -------------- arch/ia64/xen/grant-table.c | 62 ------------------------------- arch/x86/include/asm/xen/grant_table.h | 7 --- arch/x86/xen/grant-table.c | 2 +- drivers/xen/xenbus/xenbus_client.c | 6 +- include/xen/grant_table.h | 1 - 6 files changed, 4 insertions(+), 103 deletions(-) delete mode 100644 arch/ia64/include/asm/xen/grant_table.h delete mode 100644 arch/x86/include/asm/xen/grant_table.h diff --git a/arch/ia64/include/asm/xen/grant_table.h b/arch/ia64/include/asm/xen/grant_table.h deleted file mode 100644 index 2b1fae0..0000000 --- a/arch/ia64/include/asm/xen/grant_table.h +++ /dev/null @@ -1,29 +0,0 @@ -/****************************************************************************** - * arch/ia64/include/asm/xen/grant_table.h - * - * Copyright (c) 2008 Isaku Yamahata <yamahata at valinux co jp> - * VA Linux Systems Japan K.K. - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - * - */ - -#ifndef _ASM_IA64_XEN_GRANT_TABLE_H -#define _ASM_IA64_XEN_GRANT_TABLE_H - -struct vm_struct *xen_alloc_vm_area(unsigned long size); -void xen_free_vm_area(struct vm_struct *area); - -#endif /* _ASM_IA64_XEN_GRANT_TABLE_H */ diff --git a/arch/ia64/xen/grant-table.c b/arch/ia64/xen/grant-table.c index 48cca37..c182813 100644 --- a/arch/ia64/xen/grant-table.c +++ b/arch/ia64/xen/grant-table.c @@ -31,68 +31,6 @@ #include <asm/xen/hypervisor.h> -struct vm_struct *xen_alloc_vm_area(unsigned long size) -{ - int order; - unsigned long virt; - unsigned long nr_pages; - struct vm_struct *area; - - order = get_order(size); - virt = __get_free_pages(GFP_KERNEL, order); - if (virt == 0) - goto err0; - nr_pages = 1 << order; - scrub_pages(virt, nr_pages); - - area = kmalloc(sizeof(*area), GFP_KERNEL); - if (area == NULL) - goto err1; - - area->flags = VM_IOREMAP; - area->addr = (void *)virt; - area->size = size; - area->pages = NULL; - area->nr_pages = nr_pages; - area->phys_addr = 0; /* xenbus_map_ring_valloc uses this field! */ - - return area; - -err1: - free_pages(virt, order); -err0: - return NULL; -} -EXPORT_SYMBOL_GPL(xen_alloc_vm_area); - -void xen_free_vm_area(struct vm_struct *area) -{ - unsigned int order = get_order(area->size); - unsigned long i; - unsigned long phys_addr = __pa(area->addr); - - /* This area is used for foreign page mappping. - * So underlying machine page may not be assigned. */ - for (i = 0; i < (1 << order); i++) { - unsigned long ret; - unsigned long gpfn = (phys_addr >> PAGE_SHIFT) + i; - struct xen_memory_reservation reservation = { - .nr_extents = 1, - .address_bits = 0, - .extent_order = 0, - .domid = DOMID_SELF - }; - set_xen_guest_handle(reservation.extent_start, &gpfn); - ret = HYPERVISOR_memory_op(XENMEM_populate_physmap, - &reservation); - BUG_ON(ret != 1); - } - free_pages((unsigned long)area->addr, order); - kfree(area); -} -EXPORT_SYMBOL_GPL(xen_free_vm_area); - - /**************************************************************************** * grant table hack * cmd: GNTTABOP_xxx diff --git a/arch/x86/include/asm/xen/grant_table.h b/arch/x86/include/asm/xen/grant_table.h deleted file mode 100644 index fdbbb45..0000000 --- a/arch/x86/include/asm/xen/grant_table.h +++ /dev/null @@ -1,7 +0,0 @@ -#ifndef _ASM_X86_XEN_GRANT_TABLE_H -#define _ASM_X86_XEN_GRANT_TABLE_H - -#define xen_alloc_vm_area(size) alloc_vm_area(size) -#define xen_free_vm_area(area) free_vm_area(area) - -#endif /* _ASM_X86_XEN_GRANT_TABLE_H */ diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c index 49ba9b5..6bbfd7a 100644 --- a/arch/x86/xen/grant-table.c +++ b/arch/x86/xen/grant-table.c @@ -71,7 +71,7 @@ int arch_gnttab_map_shared(unsigned long *frames, unsigned long nr_gframes, if (shared == NULL) { struct vm_struct *area - xen_alloc_vm_area(PAGE_SIZE * max_nr_gframes); + alloc_vm_area(PAGE_SIZE * max_nr_gframes); BUG_ON(area == NULL); shared = area->addr; *__shared = shared; diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c index cdacf92..229d3ad 100644 --- a/drivers/xen/xenbus/xenbus_client.c +++ b/drivers/xen/xenbus/xenbus_client.c @@ -443,7 +443,7 @@ int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr) *vaddr = NULL; - area = xen_alloc_vm_area(PAGE_SIZE); + area = alloc_vm_area(PAGE_SIZE); if (!area) return -ENOMEM; @@ -453,7 +453,7 @@ int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr) BUG(); if (op.status != GNTST_okay) { - xen_free_vm_area(area); + free_vm_area(area); xenbus_dev_fatal(dev, op.status, "mapping in shared page %d from domain %d", gnt_ref, dev->otherend_id); @@ -552,7 +552,7 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr) BUG(); if (op.status == GNTST_okay) - xen_free_vm_area(area); + free_vm_area(area); else xenbus_dev_error(dev, op.status, "unmapping page at handle %d error %d", diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h index b1fab6b..8a8bb76 100644 --- a/include/xen/grant_table.h +++ b/include/xen/grant_table.h @@ -43,7 +43,6 @@ #include <xen/interface/grant_table.h> #include <asm/xen/hypervisor.h> -#include <asm/xen/grant_table.h> #include <xen/features.h> -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Oct-25 20:07 UTC
[Xen-devel] [PATCH 3/3] xen: map foreign pages for shared rings by updating the PTEs directly
From: David Vrabel <david.vrabel@citrix.com> When mapping a foreign page with xenbus_map_ring_valloc() with the GNTTABOP_map_grant_ref hypercall, set the GNTMAP_contains_pte flag and pass a pointer to the PTE (in init_mm). After the page is mapped, the usual fault mechanism can be used to update additional MMs. This allows the vmalloc_sync_all() to be removed from alloc_vm_area(). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/x86/xen/grant-table.c | 2 +- drivers/xen/xenbus/xenbus_client.c | 11 ++++++++--- include/linux/vmalloc.h | 2 +- mm/vmalloc.c | 27 +++++++++++++-------------- 4 files changed, 23 insertions(+), 19 deletions(-) diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c index 6bbfd7a..5a40d24 100644 --- a/arch/x86/xen/grant-table.c +++ b/arch/x86/xen/grant-table.c @@ -71,7 +71,7 @@ int arch_gnttab_map_shared(unsigned long *frames, unsigned long nr_gframes, if (shared == NULL) { struct vm_struct *area - alloc_vm_area(PAGE_SIZE * max_nr_gframes); + alloc_vm_area(PAGE_SIZE * max_nr_gframes, NULL); BUG_ON(area == NULL); shared = area->addr; *__shared = shared; diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c index 229d3ad..52bc57f 100644 --- a/drivers/xen/xenbus/xenbus_client.c +++ b/drivers/xen/xenbus/xenbus_client.c @@ -34,6 +34,7 @@ #include <linux/types.h> #include <linux/vmalloc.h> #include <asm/xen/hypervisor.h> +#include <asm/xen/page.h> #include <xen/interface/xen.h> #include <xen/interface/event_channel.h> #include <xen/events.h> @@ -435,19 +436,20 @@ EXPORT_SYMBOL_GPL(xenbus_free_evtchn); int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr) { struct gnttab_map_grant_ref op = { - .flags = GNTMAP_host_map, + .flags = GNTMAP_host_map | GNTMAP_contains_pte, .ref = gnt_ref, .dom = dev->otherend_id, }; struct vm_struct *area; + pte_t *pte; *vaddr = NULL; - area = alloc_vm_area(PAGE_SIZE); + area = alloc_vm_area(PAGE_SIZE, &pte); if (!area) return -ENOMEM; - op.host_addr = (unsigned long)area->addr; + op.host_addr = arbitrary_virt_to_machine(pte).maddr; if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1)) BUG(); @@ -526,6 +528,7 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr) struct gnttab_unmap_grant_ref op = { .host_addr = (unsigned long)vaddr, }; + unsigned int level; /* It''d be nice if linux/vmalloc.h provided a find_vm_area(void *addr) * method so that we don''t have to muck with vmalloc internals here. @@ -547,6 +550,8 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr) } op.handle = (grant_handle_t)area->phys_addr; + op.host_addr = arbitrary_virt_to_machine( + lookup_address((unsigned long)vaddr, &level)).maddr; if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1)) BUG(); diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 9332e52..1a77252 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -118,7 +118,7 @@ unmap_kernel_range(unsigned long addr, unsigned long size) #endif /* Allocate/destroy a ''vmalloc'' VM area. */ -extern struct vm_struct *alloc_vm_area(size_t size); +extern struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes); extern void free_vm_area(struct vm_struct *area); /* for /dev/kmem */ diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 5016f19..b5deec6 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2105,23 +2105,30 @@ void __attribute__((weak)) vmalloc_sync_all(void) static int f(pte_t *pte, pgtable_t table, unsigned long addr, void *data) { - /* apply_to_page_range() does all the hard work. */ + pte_t ***p = data; + + if (p) { + *(*p) = pte; + (*p)++; + } return 0; } /** * alloc_vm_area - allocate a range of kernel address space * @size: size of the area + * @ptes: returns the PTEs for the address space * * Returns: NULL on failure, vm_struct on success * * This function reserves a range of kernel address space, and * allocates pagetables to map that range. No actual mappings - * are created. If the kernel address space is not shared - * between processes, it syncs the pagetable across all - * processes. + * are created. + * + * If @ptes is non-NULL, pointers to the PTEs (in init_mm) + * allocated for the VM area are returned. */ -struct vm_struct *alloc_vm_area(size_t size) +struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes) { struct vm_struct *area; @@ -2135,19 +2142,11 @@ struct vm_struct *alloc_vm_area(size_t size) * of kernel virtual address space and mapped into init_mm. */ if (apply_to_page_range(&init_mm, (unsigned long)area->addr, - area->size, f, NULL)) { + size, f, ptes ? &ptes : NULL)) { free_vm_area(area); return NULL; } - /* - * If the allocated address space is passed to a hypercall - * before being used then we cannot rely on a page fault to - * trigger an update of the page tables. So sync all the page - * tables here. - */ - vmalloc_sync_all(); - return area; } EXPORT_SYMBOL_GPL(alloc_vm_area); -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Oct-26 13:56 UTC
[Xen-devel] Re: [PATCH] MMU bug-fixes in generic code that are mostly used by Xen.
On Tue, Oct 25, 2011 at 04:07:56PM -0400, Konrad Rzeszutek Wilk wrote:> I was wondering if you could help. I''ve this bug-fix: > [PATCH 1/3] x86/paravirt: PTE updates in k(un)map_atomic need to be > > that you picked up some time ago in your tree and then dropped. I am not sure why it > was dropped but perhaps it is b/c I also had that patch in my linux-next and your tool > decided to drop it. Anyhow, was wondering if you would be OK giving it your > Ack or just pulling it in your tree for 3.2. > > These two: > [PATCH 2/3] xen: use generic functions instead of xen_{alloc, > [PATCH 3/3] xen: map foreign pages for shared rings by updating the > > remove what git commit d2fe97c3315a6a406540f74651e7430d9d51e671 > Author: David Vrabel <david.vrabel@citrix.com> > Date: Thu Sep 29 16:53:32 2011 +0100 > > xen: map foreign pages for shared rings by updating the PTEs directly > > added in 3.1 with a more selective way instead of using the big hammer. > > I was wondering if you would be OK ACK-ing those two or sticking them > in your tree for 3.2.Grrr.. Don''t stick them in your tree. I forgot that they are dependent on two other patches to both blkback and netback - otherwise compile errors gallore ensures. I can: 1). Stick the other two patches (blkback + netback) in my tree. And chase down the sub-maintainers to get an Ack for it to go through my tree (got one Ack already). And then stick these two patches on top of it (with your Ack of course). 2). Get the other two patches in via the other maintainers and once they are in the Linus''s tree, then ask for you to pull this one. This might take though lot longer to orchestrate correctly. 3). Ask you to pick all of those patches :-) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel