David Vrabel
2011-Sep-01 11:51 UTC
[Xen-devel] [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
From: David Vrabel <david.vrabel@citrix.com>

Xen backend drivers (e.g., blkback and netback) would sometimes fail
to map grant pages into the vmalloc address space allocated with
alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
could not find the page (in the L2 table) containing the PTEs it
needed to update.

(XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000

netback and blkback were making the hypercall from a kernel thread
where task->active_mm != &init_mm and alloc_vm_area() was only
updating the page tables for init_mm.  The usual method of deferring
the update to the page tables of other processes (i.e., after taking a
fault) doesn't work as a fault cannot occur during the hypercall.

This would work on some systems depending on what else was using
vmalloc.

Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
(vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
comment to explain why it's needed.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 mm/vmalloc.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7ef0903..5016f19 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2140,6 +2140,14 @@ struct vm_struct *alloc_vm_area(size_t size)
 		return NULL;
 	}
 
+	/*
+	 * If the allocated address space is passed to a hypercall
+	 * before being used then we cannot rely on a page fault to
+	 * trigger an update of the page tables.  So sync all the page
+	 * tables here.
+	 */
+	vmalloc_sync_all();
+
 	return area;
 }
 EXPORT_SYMBOL_GPL(alloc_vm_area);
-- 
1.7.2.5

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Sep-01 16:11 UTC
[Xen-devel] [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>

Andrew,

I was wondering if you would be Ok with this patch for 3.1.

It is a revert (I can prepare a proper revert if you would like
that instead of this patch).

The users of this particular function (alloc_vm_area) are just
Xen.  There are no others.

> 
> Xen backend drivers (e.g., blkback and netback) would sometimes fail
> to map grant pages into the vmalloc address space allocated with
> alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
> could not find the page (in the L2 table) containing the PTEs it
> needed to update.
> 
> (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
> 
> netback and blkback were making the hypercall from a kernel thread
> where task->active_mm != &init_mm and alloc_vm_area() was only
> updating the page tables for init_mm.  The usual method of deferring
> the update to the page tables of other processes (i.e., after taking a
> fault) doesn't work as a fault cannot occur during the hypercall.
> 
> This would work on some systems depending on what else was using
> vmalloc.
> 
> Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
> (vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
> comment to explain why it's needed.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> ---
>  mm/vmalloc.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 7ef0903..5016f19 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2140,6 +2140,14 @@ struct vm_struct *alloc_vm_area(size_t size)
>  		return NULL;
>  	}
>  
> +	/*
> +	 * If the allocated address space is passed to a hypercall
> +	 * before being used then we cannot rely on a page fault to
> +	 * trigger an update of the page tables.  So sync all the page
> +	 * tables here.
> +	 */
> +	vmalloc_sync_all();
> +
>  	return area;
>  }
>  EXPORT_SYMBOL_GPL(alloc_vm_area);
> -- 
> 1.7.2.5
Jeremy Fitzhardinge
2011-Sep-01 20:37 UTC
[Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
On 09/01/2011 09:11 AM, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
> Andrew,
>
> I was wondering if you would be Ok with this patch for 3.1.
>
> It is a revert (I can prepare a proper revert if you would like
> that instead of this patch).
>
> The users of this particular function (alloc_vm_area) are just
> Xen.  There are no others.

I'd prefer to put explicit vmalloc_sync_all()s in the callsites where
necessary, and ultimately try to work out ways of avoiding it altogether
(like have some hypercall wrapper which touches the arg memory to make
sure it's mapped?).

    J
Andrew Morton
2011-Sep-01 21:17 UTC
[Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
On Thu, 01 Sep 2011 13:37:46 -0700
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> On 09/01/2011 09:11 AM, Konrad Rzeszutek Wilk wrote:
> > On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
> >> From: David Vrabel <david.vrabel@citrix.com>
> > Andrew,
> >
> > I was wondering if you would be Ok with this patch for 3.1.
> >
> > It is a revert (I can prepare a proper revert if you would like
> > that instead of this patch).

David's patch looks better than a straight reversion.

Problem is, I can't find David's original email anywhere.  Someone's
been playing games with To: headers?

> > The users of this particular function (alloc_vm_area) are just
> > Xen.  There are no others.
>
> I'd prefer to put explicit vmalloc_sync_all()s in the callsites where
> necessary,

What would that patch look like?  Bear in mind that we'll need
something suitable for 3.1 and for a 3.0 backport.

> and ultimately try to work out ways of avoiding it altogether
> (like have some hypercall wrapper which touches the arg memory to make
> sure its mapped?).
Ian Campbell
2011-Sep-02 07:22 UTC
Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
On Thu, 2011-09-01 at 21:37 +0100, Jeremy Fitzhardinge wrote:
> On 09/01/2011 09:11 AM, Konrad Rzeszutek Wilk wrote:
> > On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
> >> From: David Vrabel <david.vrabel@citrix.com>
> > Andrew,
> >
> > I was wondering if you would be Ok with this patch for 3.1.
> >
> > It is a revert (I can prepare a proper revert if you would like
> > that instead of this patch).
> >
> > The users of this particular function (alloc_vm_area) are just
> > Xen.  There are no others.
>
> I'd prefer to put explicit vmalloc_sync_all()s in the callsites where
> necessary, and ultimately try to work out ways of avoiding it altogether
> (like have some hypercall wrapper which touches the arg memory to make
> sure its mapped?).

That only syncs the current pagetable though.  If that is sufficient (and
it could well be) then perhaps just doing a vmalloc_sync_one on the
current page tables directly would be better than faulting to do it?

It's the sort of thing you could hide inside the gnttab_set_map_op type
helpers I guess?

Ian.
Keir Fraser
2011-Sep-02 07:31 UTC
Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
On 02/09/2011 08:22, "Ian Campbell" <Ian.Campbell@citrix.com> wrote:

> On Thu, 2011-09-01 at 21:37 +0100, Jeremy Fitzhardinge wrote:
>> On 09/01/2011 09:11 AM, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
>>>> From: David Vrabel <david.vrabel@citrix.com>
>>> Andrew,
>>>
>>> I was wondering if you would be Ok with this patch for 3.1.
>>>
>>> It is a revert (I can prepare a proper revert if you would like
>>> that instead of this patch).
>>>
>>> The users of this particular function (alloc_vm_area) are just
>>> Xen.  There are no others.
>>
>> I'd prefer to put explicit vmalloc_sync_all()s in the callsites where
>> necessary, and ultimately try to work out ways of avoiding it altogether
>> (like have some hypercall wrapper which touches the arg memory to make
>> sure its mapped?).
>
> That only syncs the current pagetable though.  If that is sufficient (and
> it could well be) then perhaps just doing a vmalloc_sync_one on the
> current page tables directly would be better than faulting to do it?

It's probably sufficient unless the kernel has non-voluntary preemption,
and we are not preempt_disable()d.

 -- Keir

> It's the sort of thing you could hide inside the gnttab_set_map_op type
> helpers I guess?
>
> Ian.
David Vrabel
2011-Sep-02 11:39 UTC
[Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
On 01/09/11 22:17, Andrew Morton wrote:
> On Thu, 01 Sep 2011 13:37:46 -0700
> Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>> On 09/01/2011 09:11 AM, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
>>>> From: David Vrabel <david.vrabel@citrix.com>
>>> Andrew,
>>>
>>> I was wondering if you would be Ok with this patch for 3.1.
>>>
>>> It is a revert (I can prepare a proper revert if you would like
>>> that instead of this patch).
>
> David's patch looks better than a straight reversion.
>
> Problem is, I can't find David's original email anywhere.  Someone's
> been playing games with To: headers?

Sorry, I should have Cc'd linux-kernel and others on the original patch.

From 6844ca07140e08f29454ca7b3fa459571c7ba428 Mon Sep 17 00:00:00 2001
From: David Vrabel <david.vrabel@citrix.com>
Date: Thu, 1 Sep 2011 11:42:50 +0100
Subject: [PATCH] mm: sync vmalloc address space page tables in
 alloc_vm_area()

Xen backend drivers (e.g., blkback and netback) would sometimes fail
to map grant pages into the vmalloc address space allocated with
alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
could not find the page (in the L2 table) containing the PTEs it
needed to update.

(XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000

netback and blkback were making the hypercall from a kernel thread
where task->active_mm != &init_mm and alloc_vm_area() was only
updating the page tables for init_mm.  The usual method of deferring
the update to the page tables of other processes (i.e., after taking a
fault) doesn't work as a fault cannot occur during the hypercall.

This would work on some systems depending on what else was using
vmalloc.

Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
(vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
comment to explain why it's needed.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 mm/vmalloc.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7ef0903..5016f19 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2140,6 +2140,14 @@ struct vm_struct *alloc_vm_area(size_t size)
 		return NULL;
 	}
 
+	/*
+	 * If the allocated address space is passed to a hypercall
+	 * before being used then we cannot rely on a page fault to
+	 * trigger an update of the page tables.  So sync all the page
+	 * tables here.
+	 */
+	vmalloc_sync_all();
+
 	return area;
 }
 EXPORT_SYMBOL_GPL(alloc_vm_area);
-- 
1.7.2.5
Andrew Morton
2011-Sep-02 22:32 UTC
[Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
On Fri, 2 Sep 2011 12:39:19 +0100
David Vrabel <david.vrabel@citrix.com> wrote:

> Xen backend drivers (e.g., blkback and netback) would sometimes fail
> to map grant pages into the vmalloc address space allocated with
> alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
> could not find the page (in the L2 table) containing the PTEs it
> needed to update.
> 
> (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
> 
> netback and blkback were making the hypercall from a kernel thread
> where task->active_mm != &init_mm and alloc_vm_area() was only
> updating the page tables for init_mm.  The usual method of deferring
> the update to the page tables of other processes (i.e., after taking a
> fault) doesn't work as a fault cannot occur during the hypercall.
> 
> This would work on some systems depending on what else was using
> vmalloc.
> 
> Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
> (vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
> comment to explain why it's needed.

oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
*think* that's the outcome of this discussion, for the short-term?
Jeremy Fitzhardinge
2011-Sep-02 23:04 UTC
[Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
On 09/02/2011 03:32 PM, Andrew Morton wrote:
> On Fri, 2 Sep 2011 12:39:19 +0100
> David Vrabel <david.vrabel@citrix.com> wrote:
>
>> Xen backend drivers (e.g., blkback and netback) would sometimes fail
>> to map grant pages into the vmalloc address space allocated with
>> alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
>> could not find the page (in the L2 table) containing the PTEs it
>> needed to update.
>>
>> (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
>>
>> netback and blkback were making the hypercall from a kernel thread
>> where task->active_mm != &init_mm and alloc_vm_area() was only
>> updating the page tables for init_mm.  The usual method of deferring
>> the update to the page tables of other processes (i.e., after taking a
>> fault) doesn't work as a fault cannot occur during the hypercall.
>>
>> This would work on some systems depending on what else was using
>> vmalloc.
>>
>> Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
>> (vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
>> comment to explain why it's needed.
>
> oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
> *think* that's the outcome of this discussion, for the short-term?

Sure, that will get things going for now.

Thanks,
    J
Konrad Rzeszutek Wilk
2011-Sep-06 16:35 UTC
Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
On Fri, Sep 02, 2011 at 03:32:04PM -0700, Andrew Morton wrote:
> On Fri, 2 Sep 2011 12:39:19 +0100
> David Vrabel <david.vrabel@citrix.com> wrote:
>
> > Xen backend drivers (e.g., blkback and netback) would sometimes fail
> > to map grant pages into the vmalloc address space allocated with
> > alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
> > could not find the page (in the L2 table) containing the PTEs it
> > needed to update.
> >
> > (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
> >
> > netback and blkback were making the hypercall from a kernel thread
> > where task->active_mm != &init_mm and alloc_vm_area() was only
> > updating the page tables for init_mm.  The usual method of deferring
> > the update to the page tables of other processes (i.e., after taking a
> > fault) doesn't work as a fault cannot occur during the hypercall.
> >
> > This would work on some systems depending on what else was using
> > vmalloc.
> >
> > Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
> > (vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
> > comment to explain why it's needed.
>
> oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
> *think* that's the outcome of this discussion, for the short-term?

<nods> Yup. Thanks!
Konrad Rzeszutek Wilk
2011-Nov-05 13:38 UTC
Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
On Tue, Sep 06, 2011 at 12:35:53PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 02, 2011 at 03:32:04PM -0700, Andrew Morton wrote:
> > On Fri, 2 Sep 2011 12:39:19 +0100
> > David Vrabel <david.vrabel@citrix.com> wrote:
> >
> > > Xen backend drivers (e.g., blkback and netback) would sometimes fail
> > > to map grant pages into the vmalloc address space allocated with
> > > alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
> > > could not find the page (in the L2 table) containing the PTEs it
> > > needed to update.
> > >
> > > (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
> > >
> > > netback and blkback were making the hypercall from a kernel thread
> > > where task->active_mm != &init_mm and alloc_vm_area() was only
> > > updating the page tables for init_mm.  The usual method of deferring
> > > the update to the page tables of other processes (i.e., after taking a
> > > fault) doesn't work as a fault cannot occur during the hypercall.
> > >
> > > This would work on some systems depending on what else was using
> > > vmalloc.
> > >
> > > Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
> > > (vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
> > > comment to explain why it's needed.
> >
> > oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
> > *think* that's the outcome of this discussion, for the short-term?
>
> <nods> Yup. Thanks!

Hey Andrew,

The long term outcome is the patchset that David worked on.  I've sent
a GIT PULL to Linus to pick up the Xen related patches that switch over
the users of the right API:

 (xen) stable/vmalloc-3.2 for Linux 3.2-rc0

 (https://lkml.org/lkml/2011/10/29/82)

And then on top of that use this patch:

[Note, I am still waiting for Linus to pull that patchset above.. so
not sure on the outcome.  Perhaps a better way would be for you to
pull all patches in your tree?]

Also, not sure what you thought of this patch below?

From b9acd3abc12972be0d938d7bc2466d899023e757 Mon Sep 17 00:00:00 2001
From: David Vrabel <david.vrabel@citrix.com>
Date: Thu, 29 Sep 2011 16:53:32 +0100
Subject: [PATCH] xen: map foreign pages for shared rings by updating the PTEs
 directly

When mapping a foreign page with xenbus_map_ring_valloc() with the
GNTTABOP_map_grant_ref hypercall, set the GNTMAP_contains_pte flag and
pass a pointer to the PTE (in init_mm).

After the page is mapped, the usual fault mechanism can be used to
update additional MMs.  This allows the vmalloc_sync_all() to be
removed from alloc_vm_area().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/grant-table.c         |    2 +-
 drivers/xen/xenbus/xenbus_client.c |   11 ++++++++---
 include/linux/vmalloc.h            |    2 +-
 mm/vmalloc.c                       |   27 +++++++++++++--------------
 4 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
index 6bbfd7a..5a40d24 100644
--- a/arch/x86/xen/grant-table.c
+++ b/arch/x86/xen/grant-table.c
@@ -71,7 +71,7 @@ int arch_gnttab_map_shared(unsigned long *frames, unsigned long nr_gframes,
 
 	if (shared == NULL) {
 		struct vm_struct *area =
-			alloc_vm_area(PAGE_SIZE * max_nr_gframes);
+			alloc_vm_area(PAGE_SIZE * max_nr_gframes, NULL);
 		BUG_ON(area == NULL);
 		shared = area->addr;
 		*__shared = shared;
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 229d3ad..52bc57f 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -34,6 +34,7 @@
 #include <linux/types.h>
 #include <linux/vmalloc.h>
 #include <asm/xen/hypervisor.h>
+#include <asm/xen/page.h>
 #include <xen/interface/xen.h>
 #include <xen/interface/event_channel.h>
 #include <xen/events.h>
@@ -435,19 +436,20 @@ EXPORT_SYMBOL_GPL(xenbus_free_evtchn);
 int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
 {
 	struct gnttab_map_grant_ref op = {
-		.flags = GNTMAP_host_map,
+		.flags = GNTMAP_host_map | GNTMAP_contains_pte,
 		.ref   = gnt_ref,
 		.dom   = dev->otherend_id,
 	};
 	struct vm_struct *area;
+	pte_t *pte;
 
 	*vaddr = NULL;
 
-	area = alloc_vm_area(PAGE_SIZE);
+	area = alloc_vm_area(PAGE_SIZE, &pte);
 	if (!area)
 		return -ENOMEM;
 
-	op.host_addr = (unsigned long)area->addr;
+	op.host_addr = arbitrary_virt_to_machine(pte).maddr;
 
 	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
 		BUG();
@@ -526,6 +528,7 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
 	struct gnttab_unmap_grant_ref op = {
 		.host_addr = (unsigned long)vaddr,
 	};
+	unsigned int level;
 
 	/* It'd be nice if linux/vmalloc.h provided a find_vm_area(void *addr)
 	 * method so that we don't have to muck with vmalloc internals here.
@@ -547,6 +550,8 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
 	}
 
 	op.handle = (grant_handle_t)area->phys_addr;
+	op.host_addr = arbitrary_virt_to_machine(
+		lookup_address((unsigned long)vaddr, &level)).maddr;
 
 	if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1))
 		BUG();
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 9332e52..1a77252 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -118,7 +118,7 @@ unmap_kernel_range(unsigned long addr, unsigned long size)
 #endif
 
 /* Allocate/destroy a 'vmalloc' VM area. */
-extern struct vm_struct *alloc_vm_area(size_t size);
+extern struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes);
 extern void free_vm_area(struct vm_struct *area);
 
 /* for /dev/kmem */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5016f19..b5deec6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2105,23 +2105,30 @@ void __attribute__((weak)) vmalloc_sync_all(void)
 
 static int f(pte_t *pte, pgtable_t table, unsigned long addr, void *data)
 {
-	/* apply_to_page_range() does all the hard work. */
+	pte_t ***p = data;
+
+	if (p) {
+		*(*p) = pte;
+		(*p)++;
+	}
 	return 0;
 }
 
 /**
  * alloc_vm_area - allocate a range of kernel address space
  * @size:	size of the area
+ * @ptes:	returns the PTEs for the address space
  *
  * Returns:	NULL on failure, vm_struct on success
  *
  * This function reserves a range of kernel address space, and
  * allocates pagetables to map that range.  No actual mappings
- * are created.  If the kernel address space is not shared
- * between processes, it syncs the pagetable across all
- * processes.
+ * are created.
+ *
+ * If @ptes is non-NULL, pointers to the PTEs (in init_mm)
+ * allocated for the VM area are returned.
  */
-struct vm_struct *alloc_vm_area(size_t size)
+struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
 {
 	struct vm_struct *area;
 
@@ -2135,19 +2142,11 @@ struct vm_struct *alloc_vm_area(size_t size)
 	 * of kernel virtual address space and mapped into init_mm.
 	 */
 	if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
-				area->size, f, NULL)) {
+				size, f, ptes ? &ptes : NULL)) {
 		free_vm_area(area);
 		return NULL;
 	}
 
-	/*
-	 * If the allocated address space is passed to a hypercall
-	 * before being used then we cannot rely on a page fault to
-	 * trigger an update of the page tables.  So sync all the page
-	 * tables here.
-	 */
-	vmalloc_sync_all();
-
 	return area;
 }
 EXPORT_SYMBOL_GPL(alloc_vm_area);
-- 
1.7.7.1
Konrad Rzeszutek Wilk
2011-Nov-07 20:36 UTC
Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
> > > oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
> > > *think* that's the outcome of this discussion, for the short-term?
> >
> > <nods> Yup. Thanks!
>
> Hey Andrew,
>
> The long term outcome is the patchset that David worked on.  I've sent
> a GIT PULL to Linus to pick up the Xen related patches that switch over
> the users of the right API:
>
>  (xen) stable/vmalloc-3.2 for Linux 3.2-rc0
>
>  (https://lkml.org/lkml/2011/10/29/82)

And Linus picked it up.

.. snip..
>
> Also, not sure what you thought of this patch below?

Patch included as attachment for easier review..

> From b9acd3abc12972be0d938d7bc2466d899023e757 Mon Sep 17 00:00:00 2001
> From: David Vrabel <david.vrabel@citrix.com>
> Date: Thu, 29 Sep 2011 16:53:32 +0100
> Subject: [PATCH] xen: map foreign pages for shared rings by updating the PTEs
>  directly
>
> When mapping a foreign page with xenbus_map_ring_valloc() with the
> GNTTABOP_map_grant_ref hypercall, set the GNTMAP_contains_pte flag and
> pass a pointer to the PTE (in init_mm).
>
> After the page is mapped, the usual fault mechanism can be used to
> update additional MMs.  This allows the vmalloc_sync_all() to be
> removed from alloc_vm_area().
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jeremy Fitzhardinge <jeremy@goop.org>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/xen/grant-table.c         |    2 +-
>  drivers/xen/xenbus/xenbus_client.c |   11 ++++++++---
>  include/linux/vmalloc.h            |    2 +-
>  mm/vmalloc.c                       |   27 +++++++++++++--------------
>  4 files changed, 23 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
> index 6bbfd7a..5a40d24 100644
> --- a/arch/x86/xen/grant-table.c
> +++ b/arch/x86/xen/grant-table.c
> @@ -71,7 +71,7 @@ int arch_gnttab_map_shared(unsigned long *frames, unsigned long nr_gframes,
>  
>  	if (shared == NULL) {
>  		struct vm_struct *area =
> -			alloc_vm_area(PAGE_SIZE * max_nr_gframes);
> +			alloc_vm_area(PAGE_SIZE * max_nr_gframes, NULL);
>  		BUG_ON(area == NULL);
>  		shared = area->addr;
>  		*__shared = shared;
> diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
> index 229d3ad..52bc57f 100644
> --- a/drivers/xen/xenbus/xenbus_client.c
> +++ b/drivers/xen/xenbus/xenbus_client.c
> @@ -34,6 +34,7 @@
>  #include <linux/types.h>
>  #include <linux/vmalloc.h>
>  #include <asm/xen/hypervisor.h>
> +#include <asm/xen/page.h>
>  #include <xen/interface/xen.h>
>  #include <xen/interface/event_channel.h>
>  #include <xen/events.h>
> @@ -435,19 +436,20 @@ EXPORT_SYMBOL_GPL(xenbus_free_evtchn);
>  int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
>  {
>  	struct gnttab_map_grant_ref op = {
> -		.flags = GNTMAP_host_map,
> +		.flags = GNTMAP_host_map | GNTMAP_contains_pte,
>  		.ref   = gnt_ref,
>  		.dom   = dev->otherend_id,
>  	};
>  	struct vm_struct *area;
> +	pte_t *pte;
>  
>  	*vaddr = NULL;
>  
> -	area = alloc_vm_area(PAGE_SIZE);
> +	area = alloc_vm_area(PAGE_SIZE, &pte);
>  	if (!area)
>  		return -ENOMEM;
>  
> -	op.host_addr = (unsigned long)area->addr;
> +	op.host_addr = arbitrary_virt_to_machine(pte).maddr;
>  
>  	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
>  		BUG();
> @@ -526,6 +528,7 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
>  	struct gnttab_unmap_grant_ref op = {
>  		.host_addr = (unsigned long)vaddr,
>  	};
> +	unsigned int level;
>  
>  	/* It'd be nice if linux/vmalloc.h provided a find_vm_area(void *addr)
>  	 * method so that we don't have to muck with vmalloc internals here.
> @@ -547,6 +550,8 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
>  	}
>  
>  	op.handle = (grant_handle_t)area->phys_addr;
> +	op.host_addr = arbitrary_virt_to_machine(
> +		lookup_address((unsigned long)vaddr, &level)).maddr;
>  
>  	if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1))
>  		BUG();
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 9332e52..1a77252 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -118,7 +118,7 @@ unmap_kernel_range(unsigned long addr, unsigned long size)
>  #endif
>  
>  /* Allocate/destroy a 'vmalloc' VM area. */
> -extern struct vm_struct *alloc_vm_area(size_t size);
> +extern struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes);
>  extern void free_vm_area(struct vm_struct *area);
>  
>  /* for /dev/kmem */
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 5016f19..b5deec6 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2105,23 +2105,30 @@ void __attribute__((weak)) vmalloc_sync_all(void)
>  
>  static int f(pte_t *pte, pgtable_t table, unsigned long addr, void *data)
>  {
> -	/* apply_to_page_range() does all the hard work. */
> +	pte_t ***p = data;
> +
> +	if (p) {
> +		*(*p) = pte;
> +		(*p)++;
> +	}
>  	return 0;
>  }
>  
>  /**
>   * alloc_vm_area - allocate a range of kernel address space
>   * @size:	size of the area
> + * @ptes:	returns the PTEs for the address space
>   *
>   * Returns:	NULL on failure, vm_struct on success
>   *
>   * This function reserves a range of kernel address space, and
>   * allocates pagetables to map that range.  No actual mappings
> - * are created.  If the kernel address space is not shared
> - * between processes, it syncs the pagetable across all
> - * processes.
> + * are created.
> + *
> + * If @ptes is non-NULL, pointers to the PTEs (in init_mm)
> + * allocated for the VM area are returned.
>   */
> -struct vm_struct *alloc_vm_area(size_t size)
> +struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
>  {
>  	struct vm_struct *area;
>  
> @@ -2135,19 +2142,11 @@ struct vm_struct *alloc_vm_area(size_t size)
>  	 * of kernel virtual address space and mapped into init_mm.
>  	 */
>  	if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
> -				area->size, f, NULL)) {
> +				size, f, ptes ? &ptes : NULL)) {
>  		free_vm_area(area);
>  		return NULL;
>  	}
>  
> -	/*
> -	 * If the allocated address space is passed to a hypercall
> -	 * before being used then we cannot rely on a page fault to
> -	 * trigger an update of the page tables.  So sync all the page
> -	 * tables here.
> -	 */
> -	vmalloc_sync_all();
> -
>  	return area;
>  }
>  EXPORT_SYMBOL_GPL(alloc_vm_area);
> -- 
> 1.7.7.1
Andrew Morton
2011-Nov-08 23:01 UTC
Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
On Mon, 7 Nov 2011 15:36:13 -0500
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> > > > > oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
> > > > > *think* that's the outcome of this discussion, for the short-term?
> > > >
> > > > <nods> Yup. Thanks!
> > >
> > > Hey Andrew,
> > >
> > > The long term outcome is the patchset that David worked on.  I've sent
> > > a GIT PULL to Linus to pick up the Xen related patches that switch over
> > > the users of the right API:
> > >
> > >  (xen) stable/vmalloc-3.2 for Linux 3.2-rc0
> > >
> > >  (https://lkml.org/lkml/2011/10/29/82)
> >
> > And Linus picked it up.

I've no idea what's going on here.

> .. snip..
> >
> > Also, not sure what you thought of this patch below?
>
> Patch included as attachment for easier review..

The patch "xen: map foreign pages for shared rings by updating the PTEs
directly" (which looks harmless enough) is not present in 3.2-rc1 or
linux-next.
Konrad Rzeszutek Wilk
2011-Nov-08 23:31 UTC
Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
> > And Linus picked it up.
>
> I've no idea what's going on here.

Heh. Sorry for being so confusing. Merge windows are a bit stressful and
I sometimes end up writing run-on sentences.

> > .. snip..
> > > > Also, not sure what you thought of this patch below?
> > >
> > > Patch included as attachment for easier review..
>
> The patch "xen: map foreign pages for shared rings by updating the PTEs
> directly" (which looks harmless enough) is not present in 3.2-rc1 or
> linux-next.

<nods>. That is b/c it does not have your Ack. The patch applies cleanly to
3.2-rc1 (as all the other patches that it depends on are now in 3.2-rc1).

I am humbly asking for you to:
 a) review the patch (which you did) and get an idea whether you are OK
    (sounds like you are)
 b) pick it up in your -mm tree.

or alternately:
 b) give an Ack on the patch so I can put it in my linux-next and push it
    for 3.2-rc1.

Thank you for your consideration!
Andrew Morton
2011-Nov-08 23:36 UTC
Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
On Tue, 8 Nov 2011 18:31:32 -0500 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> > > And Linus picked it up.
> >
> > I've no idea what's going on here.
>
> Heh. Sorry for being so confusing. Merge windows are a bit stressful and
> I sometimes end up writing run-on sentences.
>
> > > .. snip..
> > > > > Also, not sure what you thought of this patch below?
> > > >
> > > > Patch included as attachment for easier review..
> >
> > The patch "xen: map foreign pages for shared rings by updating the PTEs
> > directly" (which looks harmless enough) is not present in 3.2-rc1 or
> > linux-next.
>
> <nods>. That is b/c it does not have your Ack. The patch applies cleanly to
> 3.2-rc1 (as all the other patches that it depends on are now in 3.2-rc1).
>
> I am humbly asking for you to:
>  a) review the patch (which you did) and get an idea whether you are OK
>     (sounds like you are)

Yup.

>  b) pick it up in your -mm tree.

No added benefit there.

> or alternately:
>  b) give an Ack on the patch so I can put it in my linux-next and push it
>     for 3.2-rc1.

That's OK by me.
Jan Beulich
2011-Nov-28 09:36 UTC
Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
>>> On 07.11.11 at 21:36, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> Patch included as attachment for easier review..

I just noticed that this patch made it upstream meanwhile, and that I
should have paid more attention earlier: once again this adds x86-specific
bits to (supposedly) architecture-independent Xen code (lookup_address(),
use of GNTMAP_contains_pte). Unless everyone agrees that x86 is going to
remain the only architecture Xen will support going forward (no ia64, no
ARM, nothing else), patches doing so (at least outside proper #ifdef-s or
the like) should really be rejected.

Besides that, the patch also appears to close the road to running backends
in HVM - use of GNTMAP_contains_pte isn't permitted for
paging_mode_external() guests, so it's even a step backwards for x86.

Jan
Ian Campbell
2011-Nov-28 10:19 UTC
Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
On Mon, 2011-11-28 at 09:36 +0000, Jan Beulich wrote:

> >>> On 07.11.11 at 21:36, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > Patch included as attachment for easier review..
>
> I just noticed that this patch made it upstream meanwhile, and that I
> should have paid more attention earlier: once again this adds x86-specific
> bits to (supposedly) architecture-independent Xen code (lookup_address(),
> use of GNTMAP_contains_pte). Unless everyone agrees that x86 is going to
> remain the only architecture Xen will support going forward (no ia64, no
> ARM, nothing else), patches doing so (at least outside proper #ifdef-s or
> the like) should really be rejected.

I think it is up to those interested in such architectures to ensure that
a working baseline exists in the first place. The ARM stuff hasn't even
been submitted yet. When the ARM stuff is submitted it will naturally
include fixes for these sorts of issues as necessary, and at that point we
can talk about regressions and reviewing patches in order to avoid them
(until then we can't really know what a "regression" is). There is nothing
unusual about that, and nothing about the patches we take right now
commits us to never supporting another arch in the future, so let's not
get carried away here.

The IA64 support in mainline Linux does not appear to have anyone
interested in working on it. AFAICT it hasn't really been touched (other
than odd fixes and tree-wide cleanups) since it was first committed (circa
2.6.25 IIRC). It doesn't even build right now and looks like it hasn't
built since at least 2.6.36, based on the one failure I happened to look
at.
I know you've been working on fixing up the hypervisor side of the ia64
support recently, but it is not clear that there is an existing or viable
user or developer base for that port right now.

> Besides that, the patch also appears to close the road to running
> backends in HVM - use of GNTMAP_contains_pte isn't permitted for
> paging_mode_external() guests, so it's even a step backwards for
> x86.

That is a bigger concern.

Ian.
David Vrabel
2011-Nov-28 11:36 UTC
Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
On 28/11/11 10:19, Ian Campbell wrote:

> On Mon, 2011-11-28 at 09:36 +0000, Jan Beulich wrote:
>> Besides that, the patch also appears to close the road to running
>> backends in HVM - use of GNTMAP_contains_pte isn't permitted for
>> paging_mode_external() guests, so it's even a step backwards for
>> x86.
>
> That is a bigger concern.

Daniel de Graaf has a patch series for this.

http://lists.xen.org/archives/html/xen-devel/2011-10/msg01484.html

Specifically: [PATCH 1/6] xenbus: Support HVM backends

http://lists.xen.org/archives/html/xen-devel/2011-10/msg01486.html

David
Ian Campbell
2011-Nov-28 11:48 UTC
Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
On Mon, 2011-11-28 at 11:36 +0000, David Vrabel wrote:

> On 28/11/11 10:19, Ian Campbell wrote:
> > On Mon, 2011-11-28 at 09:36 +0000, Jan Beulich wrote:
> >> Besides that, the patch also appears to close the road to running
> >> backends in HVM - use of GNTMAP_contains_pte isn't permitted for
> >> paging_mode_external() guests, so it's even a step backwards for
> >> x86.
> >
> > That is a bigger concern.
>
> Daniel de Graaf has a patch series for this.
>
> http://lists.xen.org/archives/html/xen-devel/2011-10/msg01484.html
>
> Specifically: [PATCH 1/6] xenbus: Support HVM backends
>
> http://lists.xen.org/archives/html/xen-devel/2011-10/msg01486.html

I'd forgotten about that, thanks!

Ian.
Konrad Rzeszutek Wilk
2011-Nov-28 15:20 UTC
Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
On Mon, Nov 28, 2011 at 11:36:49AM +0000, David Vrabel wrote:

> On 28/11/11 10:19, Ian Campbell wrote:
> > On Mon, 2011-11-28 at 09:36 +0000, Jan Beulich wrote:
> >> Besides that, the patch also appears to close the road to running
> >> backends in HVM - use of GNTMAP_contains_pte isn't permitted for
> >> paging_mode_external() guests, so it's even a step backwards for
> >> x86.
> >
> > That is a bigger concern.
>
> Daniel de Graaf has a patch series for this.
>
> http://lists.xen.org/archives/html/xen-devel/2011-10/msg01484.html
>
> Specifically: [PATCH 1/6] xenbus: Support HVM backends

Hmm, and I somehow lost track of those.

Daniel,

Could you repost that patch series on top of v3.2-rc3 please? Or perhaps
all of the patches you have so far in one big shot?

> http://lists.xen.org/archives/html/xen-devel/2011-10/msg01486.html
>
> David
Reposting my current list of un-applied patches on top of v3.2-rc3 xen/{net,blk}back support for running in HVM: [PATCH 01/12] xenbus: Support HVM backends [PATCH 02/12] xenbus: Use grant-table wrapper functions [PATCH 03/12] xen/grant-table: Support mappings required by blkback [PATCH 04/12] xen/blkback: use grant-table.c hypercall wrappers [PATCH 05/12] xen/netback: Enable netback on HVM guests [PATCH 06/12] xen/blkback: Enable blkback on HVM guests Event channel reference counts (these are on konrad/stable/for-linus-3.3 already, just reposted for reference): [PATCH 07/12] xen/event: Add reference counting to event channels [PATCH 08/12] xen/gntalloc: Change gref_lock to a mutex [PATCH 09/12] xen/gnt{dev,alloc}: reserve event channels for notify Fix on top of #7-9 (this could also be merged into #7): [PATCH 10/12] xen/events: prevent calling evtchn_get on invalid channels Fixes for gntalloc reference/page leaks: [PATCH 11/12] xen/gntalloc: release grant references on page free [PATCH 12/12] xen/gntalloc: fix reference counts on multi-page mappings
Add HVM implementations of xenbus_(map,unmap)_ring_v(alloc,free) so that ring mappings can be done without using GNTMAP_contains_pte which is not supported on HVM. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/xenbus/xenbus_client.c | 155 +++++++++++++++++++++++++++++------- 1 files changed, 125 insertions(+), 30 deletions(-) diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c index 1906125..688c4b4 100644 --- a/drivers/xen/xenbus/xenbus_client.c +++ b/drivers/xen/xenbus/xenbus_client.c @@ -32,16 +32,27 @@ #include <linux/slab.h> #include <linux/types.h> +#include <linux/spinlock.h> #include <linux/vmalloc.h> #include <linux/export.h> #include <asm/xen/hypervisor.h> #include <asm/xen/page.h> #include <xen/interface/xen.h> #include <xen/interface/event_channel.h> +#include <xen/balloon.h> #include <xen/events.h> #include <xen/grant_table.h> #include <xen/xenbus.h> +struct xenbus_map_node { + struct list_head next; + struct page *page; + grant_handle_t handle; +}; + +static DEFINE_SPINLOCK(xenbus_valloc_lock); +static LIST_HEAD(xenbus_valloc_pages); + const char *xenbus_strstate(enum xenbus_state state) { static const char *const name[] = { @@ -420,21 +431,8 @@ int xenbus_free_evtchn(struct xenbus_device *dev, int port) EXPORT_SYMBOL_GPL(xenbus_free_evtchn); -/** - * xenbus_map_ring_valloc - * @dev: xenbus device - * @gnt_ref: grant reference - * @vaddr: pointer to address to be filled out by mapping - * - * Based on Rusty Russell''s skeleton driver''s map_page. - * Map a page of memory into this domain from another domain''s grant table. - * xenbus_map_ring_valloc allocates a page of virtual address space, maps the - * page to that address, and sets *vaddr to that address. - * Returns 0 on success, and GNTST_* (see xen/include/interface/grant_table.h) - * or -ENOMEM on error. If an error is returned, device will switch to - * XenbusStateClosing and the error message will be saved in XenStore. 
- */ -int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr) +static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev, + int gnt_ref, void **vaddr) { struct gnttab_map_grant_ref op = { .flags = GNTMAP_host_map | GNTMAP_contains_pte, @@ -469,6 +467,64 @@ int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr) *vaddr = area->addr; return 0; } + +static int xenbus_map_ring_valloc_hvm(struct xenbus_device *dev, + int gnt_ref, void **vaddr) +{ + struct xenbus_map_node *node; + int err; + void *addr; + + *vaddr = NULL; + + node = kzalloc(sizeof(*node), GFP_KERNEL); + if (!node) + return -ENOMEM; + + err = alloc_xenballooned_pages(1, &node->page, false); + if (err) + goto out_err; + + addr = pfn_to_kaddr(page_to_pfn(node->page)); + + err = xenbus_map_ring(dev, gnt_ref, &node->handle, addr); + if (err) + goto out_err; + + spin_lock(&xenbus_valloc_lock); + list_add(&node->next, &xenbus_valloc_pages); + spin_unlock(&xenbus_valloc_lock); + + *vaddr = addr; + return 0; + + out_err: + free_xenballooned_pages(1, &node->page); + kfree(node); + return err; +} + +/** + * xenbus_map_ring_valloc + * @dev: xenbus device + * @gnt_ref: grant reference + * @vaddr: pointer to address to be filled out by mapping + * + * Based on Rusty Russell''s skeleton driver''s map_page. + * Map a page of memory into this domain from another domain''s grant table. + * xenbus_map_ring_valloc allocates a page of virtual address space, maps the + * page to that address, and sets *vaddr to that address. + * Returns 0 on success, and GNTST_* (see xen/include/interface/grant_table.h) + * or -ENOMEM on error. If an error is returned, device will switch to + * XenbusStateClosing and the error message will be saved in XenStore. 
+ */ +int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr) +{ + if (xen_pv_domain()) + return xenbus_map_ring_valloc_pv(dev, gnt_ref, vaddr); + else + return xenbus_map_ring_valloc_hvm(dev, gnt_ref, vaddr); +} EXPORT_SYMBOL_GPL(xenbus_map_ring_valloc); @@ -510,20 +566,7 @@ int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref, } EXPORT_SYMBOL_GPL(xenbus_map_ring); - -/** - * xenbus_unmap_ring_vfree - * @dev: xenbus device - * @vaddr: addr to unmap - * - * Based on Rusty Russell''s skeleton driver''s unmap_page. - * Unmap a page of memory in this domain that was imported from another domain. - * Use xenbus_unmap_ring_vfree if you mapped in your memory with - * xenbus_map_ring_valloc (it will free the virtual address space). - * Returns 0 on success and returns GNTST_* on error - * (see xen/include/interface/grant_table.h). - */ -int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr) +static int xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev, void *vaddr) { struct vm_struct *area; struct gnttab_unmap_grant_ref op = { @@ -566,8 +609,60 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr) return op.status; } -EXPORT_SYMBOL_GPL(xenbus_unmap_ring_vfree); +static int xenbus_unmap_ring_vfree_hvm(struct xenbus_device *dev, void *vaddr) +{ + int rv; + struct xenbus_map_node *node; + void *addr; + + spin_lock(&xenbus_valloc_lock); + list_for_each_entry(node, &xenbus_valloc_pages, next) { + addr = pfn_to_kaddr(page_to_pfn(node->page)); + if (addr == vaddr) { + list_del(&node->next); + goto found; + } + } + node = NULL; + found: + spin_unlock(&xenbus_valloc_lock); + + if (!node) { + xenbus_dev_error(dev, -ENOENT, + "can''t find mapped virtual address %p", vaddr); + return -ENOENT; + } + + rv = xenbus_unmap_ring(dev, node->handle, addr); + + if (!rv) + free_xenballooned_pages(1, &node->page); + + kfree(node); + return rv; +} + +/** + * xenbus_unmap_ring_vfree + * @dev: xenbus device + * @vaddr: addr to unmap + 
* + * Based on Rusty Russell''s skeleton driver''s unmap_page. + * Unmap a page of memory in this domain that was imported from another domain. + * Use xenbus_unmap_ring_vfree if you mapped in your memory with + * xenbus_map_ring_valloc (it will free the virtual address space). + * Returns 0 on success and returns GNTST_* on error + * (see xen/include/interface/grant_table.h). + */ +int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr) +{ + if (xen_pv_domain()) + return xenbus_unmap_ring_vfree_pv(dev, vaddr); + else + return xenbus_unmap_ring_vfree_hvm(dev, vaddr); +} +EXPORT_SYMBOL_GPL(xenbus_unmap_ring_vfree); /** * xenbus_unmap_ring -- 1.7.7.3
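The bookkeeping that this patch adds for the HVM path can be modeled in user space: xenbus_map_ring_valloc_hvm() records a (page address, grant handle) node on a global list so that xenbus_unmap_ring_vfree_hvm(), which receives only the virtual address, can recover the grant handle needed for the unmap hypercall. The sketch below is illustrative only (no ballooned pages, no locking, no hypercalls); the function and type names merely mirror the patch.

```c
/* Toy user-space model of the xenbus_valloc_pages bookkeeping above.
 * Names mirror the patch; nothing here is the actual kernel code. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t grant_handle_t;

struct map_node {                 /* models struct xenbus_map_node */
	struct map_node *next;
	void *vaddr;              /* stands in for node->page's kernel address */
	grant_handle_t handle;
};

static struct map_node *valloc_pages;   /* models xenbus_valloc_pages */

/* Models the tail of xenbus_map_ring_valloc_hvm(): remember the mapping. */
static int record_mapping(void *vaddr, grant_handle_t handle)
{
	struct map_node *node = malloc(sizeof(*node));
	if (!node)
		return -1;        /* kernel returns -ENOMEM */
	node->vaddr = vaddr;
	node->handle = handle;
	node->next = valloc_pages;
	valloc_pages = node;
	return 0;
}

/* Models the lookup in xenbus_unmap_ring_vfree_hvm(): find the node by
 * address, unlink it, and hand back the grant handle. */
static int forget_mapping(void *vaddr, grant_handle_t *handle)
{
	struct map_node **pp;
	for (pp = &valloc_pages; *pp; pp = &(*pp)->next) {
		if ((*pp)->vaddr == vaddr) {
			struct map_node *node = *pp;
			*pp = node->next;
			*handle = node->handle;
			free(node);
			return 0;
		}
	}
	return -1;                /* kernel returns -ENOENT here */
}
```

The real code additionally serializes the list with xenbus_valloc_lock, since map and unmap can race from different backend threads.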
Daniel De Graaf
2011-Nov-28 16:49 UTC
[PATCH 02/12] xenbus: Use grant-table wrapper functions
The gnttab_set_{map,unmap}_op functions should be used instead of directly populating the fields of gnttab_map_grant_ref. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/xenbus/xenbus_client.c | 17 +++++++---------- 1 files changed, 7 insertions(+), 10 deletions(-) diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c index 688c4b4..a90f959 100644 --- a/drivers/xen/xenbus/xenbus_client.c +++ b/drivers/xen/xenbus/xenbus_client.c @@ -545,12 +545,10 @@ EXPORT_SYMBOL_GPL(xenbus_map_ring_valloc); int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref, grant_handle_t *handle, void *vaddr) { - struct gnttab_map_grant_ref op = { - .host_addr = (unsigned long)vaddr, - .flags = GNTMAP_host_map, - .ref = gnt_ref, - .dom = dev->otherend_id, - }; + struct gnttab_map_grant_ref op; + + gnttab_set_map_op(&op, (phys_addr_t)vaddr, GNTMAP_host_map, gnt_ref, + dev->otherend_id); if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1)) BUG(); @@ -677,10 +675,9 @@ EXPORT_SYMBOL_GPL(xenbus_unmap_ring_vfree); int xenbus_unmap_ring(struct xenbus_device *dev, grant_handle_t handle, void *vaddr) { - struct gnttab_unmap_grant_ref op = { - .host_addr = (unsigned long)vaddr, - .handle = handle, - }; + struct gnttab_unmap_grant_ref op; + + gnttab_set_unmap_op(&op, (phys_addr_t)vaddr, GNTMAP_host_map, handle); if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1)) BUG(); -- 1.7.7.3
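The point of the wrapper can be shown with a small user-space model: every caller fills the same fields the same way, so any later change to how host_addr is interpreted (e.g. for HVM) lands in one place instead of every open-coded initializer. The struct fields mirror the Xen interface; the setter body here is a sketch, not the kernel's.

```c
/* Illustrative model of a gnttab_set_map_op()-style setter. */
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;
typedef uint32_t grant_ref_t;
typedef uint16_t domid_t;

struct gnttab_map_grant_ref {
	phys_addr_t host_addr;
	uint32_t flags;
	grant_ref_t ref;
	domid_t dom;
};

/* One central place that knows how to populate a map op. */
static void gnttab_set_map_op(struct gnttab_map_grant_ref *op,
			      phys_addr_t addr, uint32_t flags,
			      grant_ref_t ref, domid_t domid)
{
	op->host_addr = addr;
	op->flags = flags;
	op->ref = ref;
	op->dom = domid;
}
```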
Daniel De Graaf
2011-Nov-28 16:49 UTC
[PATCH 03/12] xen/grant-table: Support mappings required by blkback
Allow mappings without GNTMAP_contains_pte and allow unmapping to specify if the PTEs should be cleared. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/gntdev.c | 3 ++- drivers/xen/grant-table.c | 23 ++++------------------- include/xen/grant_table.h | 2 +- 3 files changed, 7 insertions(+), 21 deletions(-) diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c index afca14d..65bff07 100644 --- a/drivers/xen/gntdev.c +++ b/drivers/xen/gntdev.c @@ -312,7 +312,8 @@ static int __unmap_grant_pages(struct grant_map *map, int offset, int pages) } } - err = gnttab_unmap_refs(map->unmap_ops + offset, map->pages + offset, pages); + err = gnttab_unmap_refs(map->unmap_ops + offset, map->pages + offset, + pages, true); if (err) return err; diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c index bf1c094..a02d139 100644 --- a/drivers/xen/grant-table.c +++ b/drivers/xen/grant-table.c @@ -472,24 +472,9 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops, (map_ops[i].host_addr & ~PAGE_MASK)); mfn = pte_mfn(*pte); } else { - /* If you really wanted to do this: - * mfn = PFN_DOWN(map_ops[i].dev_bus_addr); - * - * The reason we do not implement it is b/c on the - * unmap path (gnttab_unmap_refs) we have no means of - * checking whether the page is !GNTMAP_contains_pte. - * - * That is without some extra data-structure to carry - * the struct page, bool clear_pte, and list_head next - * tuples and deal with allocation/delallocation, etc. - * - * The users of this API set the GNTMAP_contains_pte - * flag so lets just return not supported until it - * becomes neccessary to implement. - */ - return -EOPNOTSUPP; + mfn = PFN_DOWN(map_ops[i].dev_bus_addr); } - ret = m2p_add_override(mfn, pages[i], &kmap_ops[i]); + ret = m2p_add_override(mfn, pages[i], kmap_ops ? 
&kmap_ops[i] : NULL); if (ret) return ret; } @@ -499,7 +484,7 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops, EXPORT_SYMBOL_GPL(gnttab_map_refs); int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops, - struct page **pages, unsigned int count) + struct page **pages, unsigned int count, bool clear_pte) { int i, ret; @@ -511,7 +496,7 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops, return ret; for (i = 0; i < count; i++) { - ret = m2p_remove_override(pages[i], true /* clear the PTE */); + ret = m2p_remove_override(pages[i], clear_pte); if (ret) return ret; } diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h index 11e2dfc..37da54d 100644 --- a/include/xen/grant_table.h +++ b/include/xen/grant_table.h @@ -158,6 +158,6 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops, struct gnttab_map_grant_ref *kmap_ops, struct page **pages, unsigned int count); int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops, - struct page **pages, unsigned int count); + struct page **pages, unsigned int count, bool clear_pte); #endif /* __ASM_GNTTAB_H__ */ -- 1.7.7.3
Daniel De Graaf
2011-Nov-28 16:49 UTC
[PATCH 04/12] xen/blkback: use grant-table.c hypercall wrappers
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/block/xen-blkback/blkback.c | 29 ++++------------------------- 1 files changed, 4 insertions(+), 25 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index 15ec4db..1e256dc 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -324,6 +324,7 @@ struct seg_buf { static void xen_blkbk_unmap(struct pending_req *req) { struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST]; + struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST]; unsigned int i, invcount = 0; grant_handle_t handle; int ret; @@ -335,25 +336,12 @@ static void xen_blkbk_unmap(struct pending_req *req) gnttab_set_unmap_op(&unmap[invcount], vaddr(req, i), GNTMAP_host_map, handle); pending_handle(req, i) = BLKBACK_INVALID_HANDLE; + pages[invcount] = virt_to_page(vaddr(req, i)); invcount++; } - ret = HYPERVISOR_grant_table_op( - GNTTABOP_unmap_grant_ref, unmap, invcount); + ret = gnttab_unmap_refs(unmap, pages, invcount, false); BUG_ON(ret); - /* - * Note, we use invcount, so nr->pages, so we can''t index - * using vaddr(req, i). 
- */ - for (i = 0; i < invcount; i++) { - ret = m2p_remove_override( - virt_to_page(unmap[i].host_addr), false); - if (ret) { - pr_alert(DRV_PFX "Failed to remove M2P override for %lx\n", - (unsigned long)unmap[i].host_addr); - continue; - } - } } static int xen_blkbk_map(struct blkif_request *req, @@ -381,7 +369,7 @@ static int xen_blkbk_map(struct blkif_request *req, pending_req->blkif->domid); } - ret = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, map, nseg); + ret = gnttab_map_refs(map, NULL, &blkbk->pending_page(pending_req, 0), nseg); BUG_ON(ret); /* @@ -401,15 +389,6 @@ static int xen_blkbk_map(struct blkif_request *req, if (ret) continue; - ret = m2p_add_override(PFN_DOWN(map[i].dev_bus_addr), - blkbk->pending_page(pending_req, i), NULL); - if (ret) { - pr_alert(DRV_PFX "Failed to install M2P override for %lx (ret: %d)\n", - (unsigned long)map[i].dev_bus_addr, ret); - /* We could switch over to GNTTABOP_copy */ - continue; - } - seg[i].buf = map[i].dev_bus_addr | (req->u.rw.seg[i].first_sect << 9); } -- 1.7.7.3
Daniel De Graaf
2011-Nov-28 16:49 UTC
[PATCH 05/12] xen/netback: Enable netback on HVM guests
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/net/xen-netback/netback.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 0cb594c..951c713 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -1638,7 +1638,7 @@ static int __init netback_init(void) int rc = 0; int group; - if (!xen_pv_domain()) + if (!xen_domain()) return -ENODEV; xen_netbk_group_nr = num_online_cpus(); -- 1.7.7.3
Daniel De Graaf
2011-Nov-28 16:49 UTC
[PATCH 06/12] xen/blkback: Enable blkback on HVM guests
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/block/xen-blkback/blkback.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index 1e256dc..fbffdf0 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -823,7 +823,7 @@ static int __init xen_blkif_init(void) int i, mmap_pages; int rc = 0; - if (!xen_pv_domain()) + if (!xen_domain()) return -ENODEV; blkbk = kzalloc(sizeof(struct xen_blkbk), GFP_KERNEL); -- 1.7.7.3
Daniel De Graaf
2011-Nov-28 16:49 UTC
[PATCH 07/12] xen/event: Add reference counting to event channels
Event channels exposed to userspace by the evtchn module may be used by other modules in an asynchronous manner, which requires that reference counting be used to prevent the event channel from being closed before the signals are delivered. The reference count on new event channels defaults to -1 which indicates the event channel is not referenced outside the kernel; evtchn_get fails if called on such an event channel. The event channels made visible to userspace by evtchn have a normal reference count. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- drivers/xen/events.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++- drivers/xen/evtchn.c | 2 +- include/xen/events.h | 7 +++++ 3 files changed, 81 insertions(+), 2 deletions(-) diff --git a/drivers/xen/events.c b/drivers/xen/events.c index 6e075cd..a3bcd61 100644 --- a/drivers/xen/events.c +++ b/drivers/xen/events.c @@ -87,6 +87,7 @@ enum xen_irq_type { */ struct irq_info { struct list_head list; + int refcnt; enum xen_irq_type type; /* type */ unsigned irq; unsigned short evtchn; /* event channel */ @@ -406,6 +407,7 @@ static void xen_irq_init(unsigned irq) panic("Unable to allocate metadata for IRQ%d\n", irq); info->type = IRQT_UNBOUND; + info->refcnt = -1; irq_set_handler_data(irq, info); @@ -469,6 +471,8 @@ static void xen_free_irq(unsigned irq) irq_set_handler_data(irq, NULL); + WARN_ON(info->refcnt > 0); + kfree(info); /* Legacy IRQ descriptors are managed by the arch. */ @@ -637,7 +641,7 @@ int xen_bind_pirq_gsi_to_irq(unsigned gsi, if (irq != -1) { printk(KERN_INFO "xen_map_pirq_gsi: returning irq %d for gsi %u\n", irq, gsi); - goto out; /* XXX need refcount? 
*/ + goto out; } irq = xen_allocate_irq_gsi(gsi); @@ -939,9 +943,16 @@ static void unbind_from_irq(unsigned int irq) { struct evtchn_close close; int evtchn = evtchn_from_irq(irq); + struct irq_info *info = irq_get_handler_data(irq); mutex_lock(&irq_mapping_update_lock); + if (info->refcnt > 0) { + info->refcnt--; + if (info->refcnt != 0) + goto done; + } + if (VALID_EVTCHN(evtchn)) { close.port = evtchn; if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0) @@ -970,6 +981,7 @@ static void unbind_from_irq(unsigned int irq) xen_free_irq(irq); + done: mutex_unlock(&irq_mapping_update_lock); } @@ -1065,6 +1077,66 @@ void unbind_from_irqhandler(unsigned int irq, void *dev_id) } EXPORT_SYMBOL_GPL(unbind_from_irqhandler); +int evtchn_make_refcounted(unsigned int evtchn) +{ + int irq = evtchn_to_irq[evtchn]; + struct irq_info *info; + + if (irq == -1) + return -ENOENT; + + info = irq_get_handler_data(irq); + + if (!info) + return -ENOENT; + + WARN_ON(info->refcnt != -1); + + info->refcnt = 1; + + return 0; +} +EXPORT_SYMBOL_GPL(evtchn_make_refcounted); + +int evtchn_get(unsigned int evtchn) +{ + int irq; + struct irq_info *info; + int err = -ENOENT; + + mutex_lock(&irq_mapping_update_lock); + + irq = evtchn_to_irq[evtchn]; + if (irq == -1) + goto done; + + info = irq_get_handler_data(irq); + + if (!info) + goto done; + + err = -EINVAL; + if (info->refcnt <= 0) + goto done; + + info->refcnt++; + err = 0; + done: + mutex_unlock(&irq_mapping_update_lock); + + return err; +} +EXPORT_SYMBOL_GPL(evtchn_get); + +void evtchn_put(unsigned int evtchn) +{ + int irq = evtchn_to_irq[evtchn]; + if (WARN_ON(irq == -1)) + return; + unbind_from_irq(irq); +} +EXPORT_SYMBOL_GPL(evtchn_put); + void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector) { int irq = per_cpu(ipi_to_irq, cpu)[vector]; diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c index dbc13e9..b1f60a0 100644 --- a/drivers/xen/evtchn.c +++ b/drivers/xen/evtchn.c @@ -268,7 +268,7 @@ static int 
evtchn_bind_to_user(struct per_user_data *u, int port) rc = bind_evtchn_to_irqhandler(port, evtchn_interrupt, IRQF_DISABLED, u->name, (void *)(unsigned long)port); if (rc >= 0) - rc = 0; + rc = evtchn_make_refcounted(port); return rc; } diff --git a/include/xen/events.h b/include/xen/events.h index d287997..0f77370 100644 --- a/include/xen/events.h +++ b/include/xen/events.h @@ -37,6 +37,13 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain, */ void unbind_from_irqhandler(unsigned int irq, void *dev_id); +/* + * Allow extra references to event channels exposed to userspace by evtchn + */ +int evtchn_make_refcounted(unsigned int evtchn); +int evtchn_get(unsigned int evtchn); +void evtchn_put(unsigned int evtchn); + void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector); int resend_irq_on_evtchn(unsigned int irq); void rebind_evtchn_irq(int evtchn, int irq); -- 1.7.7.3
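The reference-counting rules the patch adds to struct irq_info can be captured as a small user-space state machine: a refcnt of -1 marks a kernel-only channel on which evtchn_get() must fail, evtchn_make_refcounted() switches it to an ordinary count of 1, and the channel is only closed when the count drops to zero. The model below is a sketch (no locking, no hypercalls); error values are illustrative stand-ins for -EINVAL/-ENOENT.

```c
/* User-space model of the evtchn refcount semantics in this patch. */
#include <assert.h>

struct irq_info {
	int refcnt;   /* -1: not exposed to userspace */
	int closed;   /* models EVTCHNOP_close having been issued */
};

static void xen_irq_init(struct irq_info *info)
{
	info->refcnt = -1;        /* default: kernel-only channel */
	info->closed = 0;
}

static int evtchn_make_refcounted(struct irq_info *info)
{
	info->refcnt = 1;         /* now has a normal reference count */
	return 0;
}

static int evtchn_get(struct irq_info *info)
{
	if (info->refcnt <= 0)    /* fails on kernel-only (-1) channels */
		return -1;
	info->refcnt++;
	return 0;
}

static void evtchn_put(struct irq_info *info)
{
	/* Mirrors unbind_from_irq(): decrement if refcounted; close only
	 * when no holders remain (or the channel was never refcounted). */
	if (info->refcnt > 0 && --info->refcnt != 0)
		return;
	info->closed = 1;
}
```

This is why gntalloc/gntdev can hold a channel across an unmap notification: their evtchn_get() keeps the count above zero until the notify has been sent.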
Daniel De Graaf
2011-Nov-28 16:49 UTC
[PATCH 08/12] xen/gntalloc: Change gref_lock to a mutex
The event channel release function cannot be called under a spinlock because it can attempt to acquire a mutex due to the event channel reference acquired when setting up unmap notifications. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- drivers/xen/gntalloc.c | 41 +++++++++++++++++++++-------------------- 1 files changed, 21 insertions(+), 20 deletions(-) diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c index e1c4c6e..2a7e9f8 100644 --- a/drivers/xen/gntalloc.c +++ b/drivers/xen/gntalloc.c @@ -74,7 +74,7 @@ MODULE_PARM_DESC(limit, "Maximum number of grants that may be allocated by " "the gntalloc device"); static LIST_HEAD(gref_list); -static DEFINE_SPINLOCK(gref_lock); +static DEFINE_MUTEX(gref_mutex); static int gref_size; struct notify_info { @@ -143,15 +143,15 @@ static int add_grefs(struct ioctl_gntalloc_alloc_gref *op, } /* Add to gref lists. */ - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); list_splice_tail(&queue_gref, &gref_list); list_splice_tail(&queue_file, &priv->list); - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); return 0; undo: - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); gref_size -= (op->count - i); list_for_each_entry(gref, &queue_file, next_file) { @@ -167,7 +167,7 @@ undo: */ if (unlikely(!list_empty(&queue_gref))) list_splice_tail(&queue_gref, &gref_list); - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); return rc; } @@ -251,7 +251,7 @@ static int gntalloc_release(struct inode *inode, struct file *filp) pr_debug("%s: priv %p\n", __func__, priv); - spin_lock(&gref_lock); + mutex_lock(&gref_mutex); while (!list_empty(&priv->list)) { gref = list_entry(priv->list.next, struct gntalloc_gref, next_file); @@ -261,7 +261,7 @@ static int gntalloc_release(struct inode *inode, struct file *filp) __del_gref(gref); } kfree(priv); - spin_unlock(&gref_lock); + mutex_unlock(&gref_mutex); return 0; } @@ -286,21 +286,21 @@ static long 
gntalloc_ioctl_alloc(struct gntalloc_file_private_data *priv,
 		goto out;
 	}
 
-	spin_lock(&gref_lock);
+	mutex_lock(&gref_mutex);
 	/* Clean up pages that were at zero (local) users but were still mapped
 	 * by remote domains. Since those pages count towards the limit that we
 	 * are about to enforce, removing them here is a good idea.
 	 */
 	do_cleanup();
 	if (gref_size + op.count > limit) {
-		spin_unlock(&gref_lock);
+		mutex_unlock(&gref_mutex);
 		rc = -ENOSPC;
 		goto out_free;
 	}
 	gref_size += op.count;
 	op.index = priv->index;
 	priv->index += op.count * PAGE_SIZE;
-	spin_unlock(&gref_lock);
+	mutex_unlock(&gref_mutex);
 
 	rc = add_grefs(&op, gref_ids, priv);
 	if (rc < 0)
@@ -343,7 +343,7 @@ static long gntalloc_ioctl_dealloc(struct gntalloc_file_private_data *priv,
 		goto dealloc_grant_out;
 	}
 
-	spin_lock(&gref_lock);
+	mutex_lock(&gref_mutex);
 	gref = find_grefs(priv, op.index, op.count);
 	if (gref) {
 		/* Remove from the file list only, and decrease reference count.
@@ -363,7 +363,7 @@ static long gntalloc_ioctl_dealloc(struct gntalloc_file_private_data *priv,
 
 	do_cleanup();
 
-	spin_unlock(&gref_lock);
+	mutex_unlock(&gref_mutex);
 dealloc_grant_out:
 	return rc;
 }
@@ -383,7 +383,7 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv,
 	index = op.index & ~(PAGE_SIZE - 1);
 	pgoff = op.index & (PAGE_SIZE - 1);
 
-	spin_lock(&gref_lock);
+	mutex_lock(&gref_mutex);
 
 	gref = find_grefs(priv, index, 1);
 	if (!gref) {
@@ -400,8 +400,9 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv,
 	gref->notify.pgoff = pgoff;
 	gref->notify.event = op.event_channel_port;
 	rc = 0;
+
 unlock_out:
-	spin_unlock(&gref_lock);
+	mutex_unlock(&gref_mutex);
 	return rc;
 }
@@ -433,9 +434,9 @@ static void gntalloc_vma_open(struct vm_area_struct *vma)
 	if (!gref)
 		return;
 
-	spin_lock(&gref_lock);
+	mutex_lock(&gref_mutex);
 	gref->users++;
-	spin_unlock(&gref_lock);
+	mutex_unlock(&gref_mutex);
 }
 
 static void gntalloc_vma_close(struct vm_area_struct *vma)
@@ -444,11 +445,11 @@ static void gntalloc_vma_close(struct vm_area_struct *vma)
 	if (!gref)
 		return;
 
-	spin_lock(&gref_lock);
+	mutex_lock(&gref_mutex);
 	gref->users--;
 	if (gref->users == 0)
 		__del_gref(gref);
-	spin_unlock(&gref_lock);
+	mutex_unlock(&gref_mutex);
 }
 
 static struct vm_operations_struct gntalloc_vmops = {
@@ -471,7 +472,7 @@ static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma)
 		return -EINVAL;
 	}
 
-	spin_lock(&gref_lock);
+	mutex_lock(&gref_mutex);
 	gref = find_grefs(priv, vma->vm_pgoff << PAGE_SHIFT, count);
 	if (gref == NULL) {
 		rv = -ENOENT;
@@ -499,7 +500,7 @@ static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma)
 	rv = 0;
 
 out_unlock:
-	spin_unlock(&gref_lock);
+	mutex_unlock(&gref_mutex);
 	return rv;
 }
-- 
1.7.7.3
Daniel De Graaf
2011-Nov-28 16:49 UTC
[PATCH 09/12] xen/gnt{dev, alloc}: reserve event channels for notify
When using the unmap notify ioctl, the event channel used for notification needs to be reserved to avoid it being deallocated prior to sending the notification.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/gntalloc.c |   21 ++++++++++++++++++++-
 drivers/xen/gntdev.c   |   31 ++++++++++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
index 2a7e9f8..60eee4e 100644
--- a/drivers/xen/gntalloc.c
+++ b/drivers/xen/gntalloc.c
@@ -178,8 +178,10 @@ static void __del_gref(struct gntalloc_gref *gref)
 		tmp[gref->notify.pgoff] = 0;
 		kunmap(gref->page);
 	}
-	if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT)
+	if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) {
 		notify_remote_via_evtchn(gref->notify.event);
+		evtchn_put(gref->notify.event);
+	}
 
 	gref->notify.flags = 0;
 
@@ -396,6 +398,23 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv,
 		goto unlock_out;
 	}
 
+	/* We need to grab a reference to the event channel we are going to use
+	 * to send the notify before releasing the reference we may already have
+	 * (if someone has called this ioctl twice). This is required so that
+	 * it is possible to change the clear_byte part of the notification
+	 * without disturbing the event channel part, which may now be the last
+	 * reference to that event channel.
+	 */
+	if (op.action & UNMAP_NOTIFY_SEND_EVENT) {
+		if (evtchn_get(op.event_channel_port)) {
+			rc = -EINVAL;
+			goto unlock_out;
+		}
+	}
+
+	if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT)
+		evtchn_put(gref->notify.event);
+
 	gref->notify.flags = op.action;
 	gref->notify.pgoff = pgoff;
 	gref->notify.event = op.event_channel_port;
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 65bff07..4e189f5 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -193,8 +193,10 @@ static void gntdev_put_map(struct grant_map *map)
 	atomic_sub(map->count, &pages_mapped);
 
-	if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT)
+	if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) {
 		notify_remote_via_evtchn(map->notify.event);
+		evtchn_put(map->notify.event);
+	}
 
 	if (map->pages) {
 		if (!use_ptemod)
@@ -600,6 +602,8 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u)
 	struct ioctl_gntdev_unmap_notify op;
 	struct grant_map *map;
 	int rc;
+	int out_flags;
+	unsigned int out_event;
 
 	if (copy_from_user(&op, u, sizeof(op)))
 		return -EFAULT;
@@ -607,6 +611,21 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u)
 	if (op.action & ~(UNMAP_NOTIFY_CLEAR_BYTE|UNMAP_NOTIFY_SEND_EVENT))
 		return -EINVAL;
 
+	/* We need to grab a reference to the event channel we are going to use
+	 * to send the notify before releasing the reference we may already have
+	 * (if someone has called this ioctl twice). This is required so that
+	 * it is possible to change the clear_byte part of the notification
+	 * without disturbing the event channel part, which may now be the last
+	 * reference to that event channel.
+	 */
+	if (op.action & UNMAP_NOTIFY_SEND_EVENT) {
+		if (evtchn_get(op.event_channel_port))
+			return -EINVAL;
+	}
+
+	out_flags = op.action;
+	out_event = op.event_channel_port;
+
 	spin_lock(&priv->lock);
 
 	list_for_each_entry(map, &priv->maps, next) {
@@ -625,12 +644,22 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u)
 		goto unlock_out;
 	}
 
+	out_flags = map->notify.flags;
+	out_event = map->notify.event;
+
 	map->notify.flags = op.action;
 	map->notify.addr = op.index - (map->index << PAGE_SHIFT);
 	map->notify.event = op.event_channel_port;
+
 	rc = 0;
+
 unlock_out:
 	spin_unlock(&priv->lock);
+
+	/* Drop the reference to the event channel we did not save in the map */
+	if (out_flags & UNMAP_NOTIFY_SEND_EVENT)
+		evtchn_put(out_event);
+
 	return rc;
 }
-- 
1.7.7.3
Daniel De Graaf
2011-Nov-28 16:49 UTC
[PATCH 10/12] xen/events: prevent calling evtchn_get on invalid channels
The event channel number provided to evtchn_get can be provided by userspace, so needs to be checked against the maximum number of event channels prior to using it to index into evtchn_to_irq.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 drivers/xen/events.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index a3bcd61..e5e5812 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -1104,6 +1104,9 @@ int evtchn_get(unsigned int evtchn)
 	struct irq_info *info;
 	int err = -ENOENT;
 
+	if (evtchn >= NR_EVENT_CHANNELS)
+		return -EINVAL;
+
 	mutex_lock(&irq_mapping_update_lock);
 
 	irq = evtchn_to_irq[evtchn];
-- 
1.7.7.3
Daniel De Graaf
2011-Nov-28 16:49 UTC
[PATCH 11/12] xen/gntalloc: release grant references on page free
gnttab_end_foreign_access_ref does not return the grant reference it is passed to the free list; gnttab_free_grant_reference needs to be explicitly called. While gnttab_end_foreign_access provides a wrapper for this, it is unsuitable because it does not return errors.

Reported-by: Anil Madhavapeddy <anil@recoil.org>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 drivers/xen/gntalloc.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
index 60eee4e..7b936cc 100644
--- a/drivers/xen/gntalloc.c
+++ b/drivers/xen/gntalloc.c
@@ -191,6 +191,8 @@ static void __del_gref(struct gntalloc_gref *gref)
 
 		if (!gnttab_end_foreign_access_ref(gref->gref_id, 0))
 			return;
+
+		gnttab_free_grant_reference(gref->gref_id);
 	}
 
 	gref_size--;
-- 
1.7.7.3
Daniel De Graaf
2011-Nov-28 16:49 UTC
[PATCH 12/12] xen/gntalloc: fix reference counts on multi-page mappings
When a multi-page mapping of gntalloc is created, the reference counts of all pages in the vma are incremented. However, the vma open/close operations only adjusted the reference count of the first page in the mapping, leaking the other pages.

Store a struct in the vm_private_data to track the original page count to properly free the pages when the last reference to the vma is closed.

Reported-by: Anil Madhavapeddy <anil@recoil.org>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 drivers/xen/gntalloc.c |   56 ++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 43 insertions(+), 13 deletions(-)

diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
index 7b936cc..e2400c8 100644
--- a/drivers/xen/gntalloc.c
+++ b/drivers/xen/gntalloc.c
@@ -99,6 +99,12 @@ struct gntalloc_file_private_data {
 	uint64_t index;
 };
 
+struct gntalloc_vma_private_data {
+	struct gntalloc_gref *gref;
+	int users;
+	int count;
+};
+
 static void __del_gref(struct gntalloc_gref *gref);
 
 static void do_cleanup(void)
@@ -451,25 +457,39 @@ static long gntalloc_ioctl(struct file *filp, unsigned int cmd,
 
 static void gntalloc_vma_open(struct vm_area_struct *vma)
 {
-	struct gntalloc_gref *gref = vma->vm_private_data;
-	if (!gref)
+	struct gntalloc_vma_private_data *priv = vma->vm_private_data;
+
+	if (!priv)
 		return;
 
 	mutex_lock(&gref_mutex);
-	gref->users++;
+	priv->users++;
 	mutex_unlock(&gref_mutex);
 }
 
 static void gntalloc_vma_close(struct vm_area_struct *vma)
 {
-	struct gntalloc_gref *gref = vma->vm_private_data;
-	if (!gref)
+	struct gntalloc_vma_private_data *priv = vma->vm_private_data;
+	struct gntalloc_gref *gref, *next;
+	int i;
+
+	if (!priv)
 		return;
 
 	mutex_lock(&gref_mutex);
-	gref->users--;
-	if (gref->users == 0)
-		__del_gref(gref);
+	priv->users--;
+	if (priv->users == 0) {
+		gref = priv->gref;
+		for (i = 0; i < priv->count; i++) {
+			gref->users--;
+			next = list_entry(gref->next_gref.next,
+					  struct gntalloc_gref, next_gref);
+			if (gref->users == 0)
+				__del_gref(gref);
+			gref = next;
+		}
+		kfree(priv);
+	}
 	mutex_unlock(&gref_mutex);
 }
 
@@ -481,19 +501,25 @@ static struct vm_operations_struct gntalloc_vmops = {
 
 static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma)
 {
 	struct gntalloc_file_private_data *priv = filp->private_data;
+	struct gntalloc_vma_private_data *vm_priv;
 	struct gntalloc_gref *gref;
 	int count = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
 	int rv, i;
 
-	pr_debug("%s: priv %p, page %lu+%d\n", __func__,
-		 priv, vma->vm_pgoff, count);
-
 	if (!(vma->vm_flags & VM_SHARED)) {
 		printk(KERN_ERR "%s: Mapping must be shared.\n", __func__);
 		return -EINVAL;
 	}
 
+	vm_priv = kmalloc(sizeof(*vm_priv), GFP_KERNEL);
+	if (!vm_priv)
+		return -ENOMEM;
+
 	mutex_lock(&gref_mutex);
+
+	pr_debug("%s: priv %p,%p, page %lu+%d\n", __func__,
+		 priv, vm_priv, vma->vm_pgoff, count);
+
 	gref = find_grefs(priv, vma->vm_pgoff << PAGE_SHIFT, count);
 	if (gref == NULL) {
 		rv = -ENOENT;
@@ -502,9 +528,13 @@ static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma)
 		goto out_unlock;
 	}
 
-	vma->vm_private_data = gref;
+	vm_priv->gref = gref;
+	vm_priv->users = 1;
+	vm_priv->count = count;
+
+	vma->vm_private_data = vm_priv;
 
-	vma->vm_flags |= VM_RESERVED;
+	vma->vm_flags |= VM_RESERVED | VM_DONTEXPAND;
 
 	vma->vm_ops = &gntalloc_vmops;
-- 
1.7.7.3
Konrad Rzeszutek Wilk
2011-Dec-14 18:56 UTC
Re: [PATCH 07/12] xen/event: Add reference counting to event channels
On Mon, Nov 28, 2011 at 11:49:06AM -0500, Daniel De Graaf wrote:
> Event channels exposed to userspace by the evtchn module may be used by
> other modules in an asynchronous manner, which requires that reference
> counting be used to prevent the event channel from being closed before
> the signals are delivered.
>
> The reference count on new event channels defaults to -1 which indicates
> the event channel is not referenced outside the kernel; evtchn_get fails
> if called on such an event channel. The event channels made visible to
> userspace by evtchn have a normal reference count.

So I've 7 through 12 in my branch now. (7, 8 and 9 are in the #linux-next and the 10-12 are in the #testing).

The other 1-6 you have might need to be parceled out (especially the netback one which usually goes through David Miller). Also I have to double-check - but did somebody review those 1-6 patches?
Konrad Rzeszutek Wilk
2011-Dec-14 19:03 UTC
Re: [PATCH 01/12] xenbus: Support HVM backends
On Mon, Nov 28, 2011 at 11:49:00AM -0500, Daniel De Graaf wrote:
> Add HVM implementations of xenbus_(map,unmap)_ring_v(alloc,free) so
> that ring mappings can be done without using GNTMAP_contains_pte which
> is not supported on HVM.

So what else besides these patches should I do to load the blkback/netback drivers in a HVM domain? There are some xen toolstack patches I presume? Can you tell me what the c/s are (if any?)

Thanks!

>
> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> ---
>  drivers/xen/xenbus/xenbus_client.c |  155 +++++++++++++++++++++++++++++-------
>  1 files changed, 125 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
> index 1906125..688c4b4 100644
> --- a/drivers/xen/xenbus/xenbus_client.c
> +++ b/drivers/xen/xenbus/xenbus_client.c
> @@ -32,16 +32,27 @@
>
>  #include <linux/slab.h>
>  #include <linux/types.h>
> +#include <linux/spinlock.h>
>  #include <linux/vmalloc.h>
>  #include <linux/export.h>
>  #include <asm/xen/hypervisor.h>
>  #include <asm/xen/page.h>
>  #include <xen/interface/xen.h>
>  #include <xen/interface/event_channel.h>
> +#include <xen/balloon.h>
>  #include <xen/events.h>
>  #include <xen/grant_table.h>
>  #include <xen/xenbus.h>
>
> +struct xenbus_map_node {
> +	struct list_head next;
> +	struct page *page;
> +	grant_handle_t handle;
> +};
> +
> +static DEFINE_SPINLOCK(xenbus_valloc_lock);
> +static LIST_HEAD(xenbus_valloc_pages);
> +
>  const char *xenbus_strstate(enum xenbus_state state)
>  {
>  	static const char *const name[] = {
> @@ -420,21 +431,8 @@ int xenbus_free_evtchn(struct xenbus_device *dev, int port)
>  EXPORT_SYMBOL_GPL(xenbus_free_evtchn);
>
>
> -/**
> - * xenbus_map_ring_valloc
> - * @dev: xenbus device
> - * @gnt_ref: grant reference
> - * @vaddr: pointer to address to be filled out by mapping
> - *
> - * Based on Rusty Russell's skeleton driver's map_page.
> - * Map a page of memory into this domain from another domain's grant table.
> - * xenbus_map_ring_valloc allocates a page of virtual address space, maps the
> - * page to that address, and sets *vaddr to that address.
> - * Returns 0 on success, and GNTST_* (see xen/include/interface/grant_table.h)
> - * or -ENOMEM on error. If an error is returned, device will switch to
> - * XenbusStateClosing and the error message will be saved in XenStore.
> - */
> -int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
> +static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
> +				     int gnt_ref, void **vaddr)
>  {
>  	struct gnttab_map_grant_ref op = {
>  		.flags = GNTMAP_host_map | GNTMAP_contains_pte,
> @@ -469,6 +467,64 @@ int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
>  	*vaddr = area->addr;
>  	return 0;
>  }
> +
> +static int xenbus_map_ring_valloc_hvm(struct xenbus_device *dev,
> +				      int gnt_ref, void **vaddr)
> +{
> +	struct xenbus_map_node *node;
> +	int err;
> +	void *addr;
> +
> +	*vaddr = NULL;
> +
> +	node = kzalloc(sizeof(*node), GFP_KERNEL);
> +	if (!node)
> +		return -ENOMEM;
> +
> +	err = alloc_xenballooned_pages(1, &node->page, false);
> +	if (err)
> +		goto out_err;
> +
> +	addr = pfn_to_kaddr(page_to_pfn(node->page));
> +
> +	err = xenbus_map_ring(dev, gnt_ref, &node->handle, addr);
> +	if (err)
> +		goto out_err;
> +
> +	spin_lock(&xenbus_valloc_lock);
> +	list_add(&node->next, &xenbus_valloc_pages);
> +	spin_unlock(&xenbus_valloc_lock);
> +
> +	*vaddr = addr;
> +	return 0;
> +
> + out_err:
> +	free_xenballooned_pages(1, &node->page);
> +	kfree(node);
> +	return err;
> +}
> +
> +/**
> + * xenbus_map_ring_valloc
> + * @dev: xenbus device
> + * @gnt_ref: grant reference
> + * @vaddr: pointer to address to be filled out by mapping
> + *
> + * Based on Rusty Russell's skeleton driver's map_page.
> + * Map a page of memory into this domain from another domain's grant table.
> + * xenbus_map_ring_valloc allocates a page of virtual address space, maps the
> + * page to that address, and sets *vaddr to that address.
> + * Returns 0 on success, and GNTST_* (see xen/include/interface/grant_table.h)
> + * or -ENOMEM on error. If an error is returned, device will switch to
> + * XenbusStateClosing and the error message will be saved in XenStore.
> + */
> +int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
> +{
> +	if (xen_pv_domain())
> +		return xenbus_map_ring_valloc_pv(dev, gnt_ref, vaddr);
> +	else
> +		return xenbus_map_ring_valloc_hvm(dev, gnt_ref, vaddr);
> +}
>  EXPORT_SYMBOL_GPL(xenbus_map_ring_valloc);
>
>
> @@ -510,20 +566,7 @@ int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref,
>  }
>  EXPORT_SYMBOL_GPL(xenbus_map_ring);
>
> -
> -/**
> - * xenbus_unmap_ring_vfree
> - * @dev: xenbus device
> - * @vaddr: addr to unmap
> - *
> - * Based on Rusty Russell's skeleton driver's unmap_page.
> - * Unmap a page of memory in this domain that was imported from another domain.
> - * Use xenbus_unmap_ring_vfree if you mapped in your memory with
> - * xenbus_map_ring_valloc (it will free the virtual address space).
> - * Returns 0 on success and returns GNTST_* on error
> - * (see xen/include/interface/grant_table.h).
> - */
> -int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
> +static int xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev, void *vaddr)
>  {
>  	struct vm_struct *area;
>  	struct gnttab_unmap_grant_ref op = {
> @@ -566,8 +609,60 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
>
>  	return op.status;
>  }
> -EXPORT_SYMBOL_GPL(xenbus_unmap_ring_vfree);
>
> +static int xenbus_unmap_ring_vfree_hvm(struct xenbus_device *dev, void *vaddr)
> +{
> +	int rv;
> +	struct xenbus_map_node *node;
> +	void *addr;
> +
> +	spin_lock(&xenbus_valloc_lock);
> +	list_for_each_entry(node, &xenbus_valloc_pages, next) {
> +		addr = pfn_to_kaddr(page_to_pfn(node->page));
> +		if (addr == vaddr) {
> +			list_del(&node->next);
> +			goto found;
> +		}
> +	}
> +	node = NULL;
> + found:
> +	spin_unlock(&xenbus_valloc_lock);
> +
> +	if (!node) {
> +		xenbus_dev_error(dev, -ENOENT,
> +				 "can't find mapped virtual address %p", vaddr);
> +		return -ENOENT;
> +	}
> +
> +	rv = xenbus_unmap_ring(dev, node->handle, addr);
> +
> +	if (!rv)
> +		free_xenballooned_pages(1, &node->page);
> +
> +	kfree(node);
> +	return rv;
> +}
> +
> +/**
> + * xenbus_unmap_ring_vfree
> + * @dev: xenbus device
> + * @vaddr: addr to unmap
> + *
> + * Based on Rusty Russell's skeleton driver's unmap_page.
> + * Unmap a page of memory in this domain that was imported from another domain.
> + * Use xenbus_unmap_ring_vfree if you mapped in your memory with
> + * xenbus_map_ring_valloc (it will free the virtual address space).
> + * Returns 0 on success and returns GNTST_* on error
> + * (see xen/include/interface/grant_table.h).
> + */
> +int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
> +{
> +	if (xen_pv_domain())
> +		return xenbus_unmap_ring_vfree_pv(dev, vaddr);
> +	else
> +		return xenbus_unmap_ring_vfree_hvm(dev, vaddr);
> +}
> +EXPORT_SYMBOL_GPL(xenbus_unmap_ring_vfree);
>
>  /**
>   * xenbus_unmap_ring
> --
> 1.7.7.3
Daniel De Graaf
2011-Dec-14 19:52 UTC
Re: [PATCH 07/12] xen/event: Add reference counting to event channels
On 12/14/2011 01:56 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Nov 28, 2011 at 11:49:06AM -0500, Daniel De Graaf wrote:
>> Event channels exposed to userspace by the evtchn module may be used by
>> other modules in an asynchronous manner, which requires that reference
>> counting be used to prevent the event channel from being closed before
>> the signals are delivered.
>>
>> The reference count on new event channels defaults to -1 which indicates
>> the event channel is not referenced outside the kernel; evtchn_get fails
>> if called on such an event channel. The event channels made visible to
>> userspace by evtchn have a normal reference count.
>
> So I've 7 through 12 in my branch now. (7, 8 and 9 are in the #linux-next
> and the 10-12 are in the #testing).
>
> The other 1-6 you have might need to be parceled out (especially the netback
> one which usually goes through David Miller). Also I have to double-check -
> but did somebody review those 1-6 patches?

The last review I've seen of these patches is this thread:

http://lists.xen.org/archives/html/xen-devel/2011-10/threads.html#01484

I'll post a v3 for those in a bit incorporating the comments in that thread; the netback one was (unofficially?) acked by David Miller. I don't see which of the other patches need to be parceled out as the files they touch look like they are still being committed with only Xen developer sign-offs.
On 12/14/2011 02:03 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Nov 28, 2011 at 11:49:00AM -0500, Daniel De Graaf wrote:
>> Add HVM implementations of xenbus_(map,unmap)_ring_v(alloc,free) so
>> that ring mappings can be done without using GNTMAP_contains_pte which
>> is not supported on HVM.
>
> So what else besides these patches should I do to load the blkback/netback
> drivers in a HVM domain? There are some xen toolstack patches I presume?
> Can you tell me what the c/s are (if any?)
>
> Thanks!

No toolstack changes are required for netback; the interface simply shows up in the guest the same as it does in a PV domain. The same is true for blkback, but the toolstack does not currently support specifying backend domains (PV or HVM) for block devices. I posted a patch earlier [1] adding this support, but the conversion from block device name to major/minor number is done in the toolstack's domain, not the domain exporting the block device.

[1] http://lists.xen.org/archives/html/xen-devel/2011-10/threads.html#00818
Konrad Rzeszutek Wilk
2011-Dec-16 19:35 UTC
Re: [PATCH 07/12] xen/event: Add reference counting to event channels
On Wed, Dec 14, 2011 at 02:52:00PM -0500, Daniel De Graaf wrote:
> On 12/14/2011 01:56 PM, Konrad Rzeszutek Wilk wrote:
>> On Mon, Nov 28, 2011 at 11:49:06AM -0500, Daniel De Graaf wrote:
>>> Event channels exposed to userspace by the evtchn module may be used by
>>> other modules in an asynchronous manner, which requires that reference
>>> counting be used to prevent the event channel from being closed before
>>> the signals are delivered.
>>>
>>> The reference count on new event channels defaults to -1 which indicates
>>> the event channel is not referenced outside the kernel; evtchn_get fails
>>> if called on such an event channel. The event channels made visible to
>>> userspace by evtchn have a normal reference count.
>>
>> So I've 7 through 12 in my branch now. (7, 8 and 9 are in the #linux-next
>> and the 10-12 are in the #testing).
>>
>> The other 1-6 you have might need to be parceled out (especially the netback
>> one which usually goes through David Miller). Also I have to double-check -
>> but did somebody review those 1-6 patches?
>
> The last review I've seen of these patches is this thread:
>
> http://lists.xen.org/archives/html/xen-devel/2011-10/threads.html#01484
>
> I'll post a v3 for those in a bit incorporating the comments in that
> thread; the netback one was (unofficially?) acked by David Miller. I don't

Yup. I see it now. Thx

> see which of the other patches need to be parceled out as the files they
> touch look like they are still being committed with only Xen developer
> sign-offs.