Daniel De Graaf
2010-Dec-14 14:55 UTC
[Xen-devel] [PATCH v2] Userspace grant communication
Split functionality and interface changes into distinct patches. After
testing on PV, the phys_to_machine updates required to use pages after an
unmap are more complex and costly than the existing MMU notifier method,
so the new address-space remapping is only used when
xen_feature(XENFEAT_auto_translated_physmap) is set.

[PATCH 1/6] xen-gntdev: Fix circular locking dependency
[PATCH 2/6] xen-gntdev: Change page limit to be global instead of per-open
[PATCH 3/6] xen-gntdev: Remove unneeded structures from grant_map tracking data
[PATCH 4/6] xen-gntdev: Use find_vma rather than iterating our vma list manually
[PATCH 5/6] xen-gntalloc: Userspace grant allocation driver
[PATCH 6/6] xen-gntdev: Introduce HVM version of gntdev

-- 
Daniel De Graaf
National Security Agency
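For illustration, the feature check that routes between the two gntdev
flavours (described above and implemented in patch 6) condenses to the
sketch below. The names gntdev_pv_init and gntdev_hvm_init are descriptive
only; in the patches both functions are called gntdev_init in their
respective files.

/* drivers/xen/gntdev.c (PV flavour): relies on direct PTE rewriting,
 * so it refuses to load on auto-translated (HVM) guests. */
static int __init gntdev_pv_init(void)
{
        if (!xen_domain())
                return -ENODEV;
        if (xen_feature(XENFEAT_auto_translated_physmap))
                return -ENODEV;
        return misc_register(&gntdev_miscdev);
}

/* drivers/xen/gntdev-hvm.c (HVM flavour): only useful when the physmap
 * is auto-translated and guest PTEs cannot be modified directly. */
static int __init gntdev_hvm_init(void)
{
        if (!xen_domain())
                return -ENODEV;
        if (!xen_feature(XENFEAT_auto_translated_physmap))
                return -ENODEV;
        return misc_register(&gntdev_miscdev);
}

Exactly one of the two drivers therefore registers /dev/xen/gntdev on any
given guest.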
Daniel De Graaf
2010-Dec-14 14:55 UTC
[Xen-devel] [PATCH 1/6] xen-gntdev: Fix circular locking dependency
apply_to_page_range will acquire PTE lock while priv->lock is held, and
mn_invl_range_start tries to acquire priv->lock with PTE already held.
Fix by not holding priv->lock during the entire map operation.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 drivers/xen/gntdev.c |   19 +++++++++----------
 1 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index a33e443..c582804 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -581,23 +581,22 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 	if (!(vma->vm_flags & VM_WRITE))
 		map->flags |= GNTMAP_readonly;
 
+	spin_unlock(&priv->lock);
+
 	err = apply_to_page_range(vma->vm_mm, vma->vm_start,
 				  vma->vm_end - vma->vm_start,
 				  find_grant_ptes, map);
-	if (err) {
-		goto unlock_out;
-		if (debug)
-			printk("%s: find_grant_ptes() failure.\n", __FUNCTION__);
-	}
+	if (err)
+		return err;
 
 	err = map_grant_pages(map);
-	if (err) {
-		goto unlock_out;
-		if (debug)
-			printk("%s: map_grant_pages() failure.\n", __FUNCTION__);
-	}
+	if (err)
+		return err;
+
 	map->is_mapped = 1;
 
+	return 0;
+
 unlock_out:
 	spin_unlock(&priv->lock);
 	return err;
-- 
1.7.2.3
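The inversion being fixed is easiest to see as two call paths taking the
same pair of locks in opposite order; in outline (an illustration, not code
from the patch):

    gntdev_mmap()                          mn_invl_range_start()
    -------------                          ---------------------
    spin_lock(&priv->lock)                 (entered with a PTE lock held)
    apply_to_page_range()                  spin_lock(&priv->lock)   <- waits
        -> takes a PTE lock    <- waits

Releasing priv->lock before calling apply_to_page_range() means
gntdev_mmap() never holds both locks at once, so neither path can block
the other.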
Daniel De Graaf
2010-Dec-14 14:55 UTC
[Xen-devel] [PATCH 2/6] xen-gntdev: Change page limit to be global instead of per-open
Because there is no limitation on how many times a user can open a given device file, an per-file-description limit on the number of pages granted offers little to no benefit. Change to a global limit and remove the ioctl() as the parameter can now be changed via sysfs. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/gntdev.c | 48 ++++++++++++++---------------------------------- 1 files changed, 14 insertions(+), 34 deletions(-) diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c index c582804..62639af 100644 --- a/drivers/xen/gntdev.c +++ b/drivers/xen/gntdev.c @@ -44,13 +44,12 @@ MODULE_DESCRIPTION("User-space granted page access driver"); static int debug = 0; module_param(debug, int, 0644); -static int limit = 1024; +static int limit = 1024*1024; module_param(limit, int, 0644); +static atomic_t pages_mapped = ATOMIC_INIT(0); struct gntdev_priv { struct list_head maps; - uint32_t used; - uint32_t limit; spinlock_t lock; struct mm_struct *mm; struct mmu_notifier mn; @@ -76,8 +75,8 @@ static void gntdev_print_maps(struct gntdev_priv *priv, { struct grant_map *map; - printk("%s: maps list (priv %p, usage %d/%d)\n", - __FUNCTION__, priv, priv->used, priv->limit); + printk("%s: maps list (priv %p)\n", + __FUNCTION__, priv); list_for_each_entry(map, &priv->maps, next) printk(" index %2d, count %2d %s\n", map->index, map->count, @@ -104,9 +103,6 @@ static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count) add->count = count; add->priv = priv; - if (add->count + priv->used > priv->limit) - goto err; - return add; err: @@ -131,7 +127,6 @@ static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add) list_add_tail(&add->next, &priv->maps); done: - priv->used += add->count; if (debug) gntdev_print_maps(priv, "[new]", add->index); } @@ -178,7 +173,7 @@ static int gntdev_del_map(struct grant_map *map) if (map->unmap_ops[i].handle) return -EBUSY; - map->priv->used -= map->count; + atomic_sub(map->count, &pages_mapped); list_del(&map->next); return 0; } @@ -360,7 +355,6 @@ static int gntdev_open(struct inode *inode, struct file *flip) INIT_LIST_HEAD(&priv->maps); spin_lock_init(&priv->lock); - priv->limit = limit; priv->mm = get_task_mm(current); if (!priv->mm) { @@ -416,19 +410,26 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv, op.count); if (unlikely(op.count <= 0)) return -EINVAL; - if (unlikely(op.count > priv->limit)) - return -EINVAL; err = -ENOMEM; map = gntdev_alloc_map(priv, op.count); if (!map) return err; + if (copy_from_user(map->grants, &u->refs, sizeof(map->grants[0]) * op.count) != 0) { gntdev_free_map(map); return err; } + if (unlikely(atomic_add_return(op.count, &pages_mapped) > limit)) + { + if (debug) + printk("%s: can''t map: over limit\n", __FUNCTION__); + gntdev_free_map(map); + return err; + } + spin_lock(&priv->lock); gntdev_add_map(priv, map); op.index = map->index << PAGE_SHIFT; @@ -495,24 +496,6 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv, return 0; } -static long gntdev_ioctl_set_max_grants(struct gntdev_priv *priv, - struct ioctl_gntdev_set_max_grants __user *u) -{ - struct ioctl_gntdev_set_max_grants op; - - if (copy_from_user(&op, u, sizeof(op)) != 0) - return -EFAULT; - if (debug) - printk("%s: priv %p, limit %d\n", __FUNCTION__, priv, op.count); - if (op.count > limit) - return -EINVAL; - - spin_lock(&priv->lock); - priv->limit = op.count; - spin_unlock(&priv->lock); - return 0; -} - static long gntdev_ioctl(struct file *flip, unsigned int cmd, unsigned long 
arg) { @@ -529,9 +512,6 @@ static long gntdev_ioctl(struct file *flip, case IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR: return gntdev_ioctl_get_offset_for_vaddr(priv, ptr); - case IOCTL_GNTDEV_SET_MAX_GRANTS: - return gntdev_ioctl_set_max_grants(priv, ptr); - default: if (debug) printk("%s: priv %p, unknown cmd %x\n", -- 1.7.2.3 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
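The accounting introduced by this patch reduces to a reserve/release pair
on one module-wide counter. A minimal sketch of the pattern follows; the
helper names reserve_pages/unreserve_pages are illustrative only, since the
patch open-codes the same logic in the map ioctl and in gntdev_del_map.

static atomic_t pages_mapped = ATOMIC_INIT(0);
static int limit = 1024*1024;   /* module parameter, adjustable at run time */

/* Reserve count pages against the global limit; roll back on failure. */
static int reserve_pages(int count)
{
        if (atomic_add_return(count, &pages_mapped) > limit) {
                atomic_sub(count, &pages_mapped);
                return -ENOMEM;
        }
        return 0;
}

/* Release a previous reservation. */
static void unreserve_pages(int count)
{
        atomic_sub(count, &pages_mapped);
}

Because the limit is now a plain module parameter, an administrator can
change it through sysfs (e.g. /sys/module/xen_gntdev/parameters/limit on a
typical build) rather than through the removed ioctl.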
Daniel De Graaf
2010-Dec-14 14:55 UTC
[Xen-devel] [PATCH 3/6] xen-gntdev: Remove unneeded structures from grant_map tracking data
The entire hypercall argument list isn''t required; only selected fields from the hypercall need to be tracked between the ioctl, map, and unmap operations. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/gntdev.c | 195 +++++++++++++++++++++++++++++-------------------- 1 files changed, 115 insertions(+), 80 deletions(-) diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c index 62639af..773b76c 100644 --- a/drivers/xen/gntdev.c +++ b/drivers/xen/gntdev.c @@ -55,17 +55,23 @@ struct gntdev_priv { struct mmu_notifier mn; }; +struct granted_page { + u64 pte_maddr; + union { + struct ioctl_gntdev_grant_ref target; + grant_handle_t handle; + }; +}; + struct grant_map { struct list_head next; struct gntdev_priv *priv; struct vm_area_struct *vma; int index; int count; - int flags; - int is_mapped; - struct ioctl_gntdev_grant_ref *grants; - struct gnttab_map_grant_ref *map_ops; - struct gnttab_unmap_grant_ref *unmap_ops; + int is_mapped:1; + int is_ro:1; + struct granted_page pages[0]; }; /* ------------------------------------------------------------------ */ @@ -83,40 +89,28 @@ static void gntdev_print_maps(struct gntdev_priv *priv, map->index == text_index && text ? text : ""); } -static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count) +static struct grant_map *gntdev_alloc_map(int count, + struct ioctl_gntdev_grant_ref* grants) { struct grant_map *add; + int i; - add = kzalloc(sizeof(struct grant_map), GFP_KERNEL); - if (NULL == add) + add = kzalloc(sizeof(*add) + sizeof(add->pages[0])*count, GFP_KERNEL); + if (!add) return NULL; - add->grants = kzalloc(sizeof(add->grants[0]) * count, GFP_KERNEL); - add->map_ops = kzalloc(sizeof(add->map_ops[0]) * count, GFP_KERNEL); - add->unmap_ops = kzalloc(sizeof(add->unmap_ops[0]) * count, GFP_KERNEL); - if (NULL == add->grants || - NULL == add->map_ops || - NULL == add->unmap_ops) - goto err; - - add->index = 0; add->count = count; - add->priv = priv; + for(i = 0; i < count; i++) + add->pages[i].target = grants[i]; return add; - -err: - kfree(add->grants); - kfree(add->map_ops); - kfree(add->unmap_ops); - kfree(add); - return NULL; } static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add) { struct grant_map *map; + spin_lock(&priv->lock); list_for_each_entry(map, &priv->maps, next) { if (add->index + add->count < map->index) { list_add_tail(&add->next, &map->next); @@ -127,8 +121,10 @@ static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add) list_add_tail(&add->next, &priv->maps); done: + add->priv = priv; if (debug) gntdev_print_maps(priv, "[new]", add->index); + spin_unlock(&priv->lock); } static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int index, @@ -169,9 +165,10 @@ static int gntdev_del_map(struct grant_map *map) if (map->vma) return -EBUSY; - for (i = 0; i < map->count; i++) - if (map->unmap_ops[i].handle) - return -EBUSY; + if (map->is_mapped) + for (i = 0; i < map->count; i++) + if (map->pages[i].handle) + return -EBUSY; atomic_sub(map->count, &pages_mapped); list_del(&map->next); @@ -182,9 +179,6 @@ static void gntdev_free_map(struct grant_map *map) { if (!map) return; - kfree(map->grants); - kfree(map->map_ops); - kfree(map->unmap_ops); kfree(map); } @@ -199,51 +193,91 @@ static int find_grant_ptes(pte_t *pte, pgtable_t token, unsigned long addr, void BUG_ON(pgnr >= map->count); pte_maddr = (u64)pfn_to_mfn(page_to_pfn(token)) << PAGE_SHIFT; pte_maddr += (unsigned long)pte & ~PAGE_MASK; - gnttab_set_map_op(&map->map_ops[pgnr], 
pte_maddr, map->flags, - map->grants[pgnr].ref, - map->grants[pgnr].domid); - gnttab_set_unmap_op(&map->unmap_ops[pgnr], pte_maddr, map->flags, - 0 /* handle */); + map->pages[pgnr].pte_maddr = pte_maddr; + return 0; } static int map_grant_pages(struct grant_map *map) { - int i, err = 0; + int i, flags, err = 0; + struct gnttab_map_grant_ref* map_ops; + flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte; + if (map->is_ro) + flags |= GNTMAP_readonly; + + map_ops = kzalloc(sizeof(map_ops[0]) * map->count, GFP_TEMPORARY); + if (!map_ops) + return -ENOMEM; + + for(i=0; i < map->count; i++) + gnttab_set_map_op(&map_ops[i], map->pages[i].pte_maddr, flags, + map->pages[i].target.ref, + map->pages[i].target.domid); if (debug) printk("%s: map %d+%d\n", __FUNCTION__, map->index, map->count); err = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, - map->map_ops, map->count); + map_ops, map->count); if (WARN_ON(err)) - return err; + goto out; for (i = 0; i < map->count; i++) { - if (map->map_ops[i].status) + if (map_ops[i].status) { + map->pages[i].pte_maddr = 0; err = -EINVAL; - map->unmap_ops[i].handle = map->map_ops[i].handle; + } else { + map->pages[i].handle = map_ops[i].handle; + } } + +out: + kfree(map_ops); return err; } -static int unmap_grant_pages(struct grant_map *map, int offset, int pages) +static void unmap_grant_pages(struct grant_map *map, int offset, int pages) { - int i, err = 0; + int i, flags, err = 0; + struct gnttab_unmap_grant_ref *unmap_ops; + struct gnttab_unmap_grant_ref unmap_single; + + if (pages > 1) { + unmap_ops = kzalloc(sizeof(unmap_ops[0]) * pages, + GFP_TEMPORARY); + if (unlikely(!unmap_ops)) { + for(i=0; i < pages; i++) + unmap_grant_pages(map, offset + i, 1); + return; + } + } else { + unmap_ops = &unmap_single; + } + flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte; + if (map->is_ro) + flags |= GNTMAP_readonly; + + for(i=0; i < pages; i++) + gnttab_set_unmap_op(&unmap_ops[i], + map->pages[offset+i].pte_maddr, flags, + map->pages[offset+i].handle); if (debug) printk("%s: map %d+%d [%d+%d]\n", __FUNCTION__, map->index, map->count, offset, pages); err = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, - map->unmap_ops + offset, pages); + unmap_ops, pages); if (WARN_ON(err)) - return err; + goto out; for (i = 0; i < pages; i++) { - if (map->unmap_ops[offset+i].status) - err = -EINVAL; - map->unmap_ops[offset+i].handle = 0; + WARN_ON(unmap_ops[i].status); + map->pages[offset+i].pte_maddr = 0; + map->pages[offset+i].handle = 0; } - return err; +out: + if (unmap_ops != &unmap_single) + kfree(unmap_ops); } /* ------------------------------------------------------------------ */ @@ -282,7 +316,6 @@ static void mn_invl_range_start(struct mmu_notifier *mn, struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn); struct grant_map *map; unsigned long mstart, mend; - int err; spin_lock(&priv->lock); list_for_each_entry(map, &priv->maps, next) { @@ -301,10 +334,9 @@ static void mn_invl_range_start(struct mmu_notifier *mn, __FUNCTION__, map->index, map->count, map->vma->vm_start, map->vma->vm_end, start, end, mstart, mend); - err = unmap_grant_pages(map, - (mstart - map->vma->vm_start) >> PAGE_SHIFT, - (mend - mstart) >> PAGE_SHIFT); - WARN_ON(err); + unmap_grant_pages(map, + (mstart - map->vma->vm_start) >> PAGE_SHIFT, + (mend - mstart) >> PAGE_SHIFT); } spin_unlock(&priv->lock); } @@ -321,7 +353,6 @@ static void mn_release(struct mmu_notifier *mn, { struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn); 
struct grant_map *map; - int err; spin_lock(&priv->lock); list_for_each_entry(map, &priv->maps, next) { @@ -331,8 +362,7 @@ static void mn_release(struct mmu_notifier *mn, printk("%s: map %d+%d (%lx %lx)\n", __FUNCTION__, map->index, map->count, map->vma->vm_start, map->vma->vm_end); - err = unmap_grant_pages(map, 0, map->count); - WARN_ON(err); + unmap_grant_pages(map, 0, map->count); } spin_unlock(&priv->lock); } @@ -401,6 +431,7 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv, { struct ioctl_gntdev_map_grant_ref op; struct grant_map *map; + struct ioctl_gntdev_grant_ref* grants; int err; if (copy_from_user(&op, u, sizeof(op)) != 0) @@ -411,38 +442,45 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv, if (unlikely(op.count <= 0)) return -EINVAL; + grants = kmalloc(sizeof(grants[0]) * op.count, GFP_TEMPORARY); + if (!grants) + return -ENOMEM; + + if (copy_from_user(grants, u->refs, sizeof(grants[0]) * op.count)) { + err = -EFAULT; + goto out_free; + } + err = -ENOMEM; - map = gntdev_alloc_map(priv, op.count); + map = gntdev_alloc_map(op.count, grants); if (!map) - return err; - - if (copy_from_user(map->grants, &u->refs, - sizeof(map->grants[0]) * op.count) != 0) { - gntdev_free_map(map); - return err; - } + goto out_free; if (unlikely(atomic_add_return(op.count, &pages_mapped) > limit)) { if (debug) printk("%s: can''t map: over limit\n", __FUNCTION__); - gntdev_free_map(map); - return err; + goto out_free_map; } - spin_lock(&priv->lock); gntdev_add_map(priv, map); op.index = map->index << PAGE_SHIFT; - spin_unlock(&priv->lock); - if (copy_to_user(u, &op, sizeof(op)) != 0) { - spin_lock(&priv->lock); - gntdev_del_map(map); - spin_unlock(&priv->lock); - gntdev_free_map(map); - return err; + if (copy_to_user(u, &op, sizeof(op))) { + err = -EFAULT; + goto out_remove; } - return 0; + err = 0; +out_free: + kfree(grants); + return err; +out_remove: + spin_lock(&priv->lock); + gntdev_del_map(map); + spin_unlock(&priv->lock); +out_free_map: + gntdev_free_map(map); + goto out_free; } static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv, @@ -556,10 +594,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma) vma->vm_private_data = map; map->vma = vma; - - map->flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte; - if (!(vma->vm_flags & VM_WRITE)) - map->flags |= GNTMAP_readonly; + map->is_ro = !(vma->vm_flags & VM_WRITE); spin_unlock(&priv->lock); -- 1.7.2.3 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
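To give a rough sense of what the consolidation buys, the per-page tracking
footprint before and after is compared below. The sizes are approximate,
inferred from the structure definitions assuming x86-64 layout; they are not
figures from the patch itself.

    before, three parallel arrays per grant_map:
        struct ioctl_gntdev_grant_ref     ~ 8 bytes/page
        struct gnttab_map_grant_ref       ~32 bytes/page
        struct gnttab_unmap_grant_ref     ~24 bytes/page   -> ~64 bytes/page,
                                                              three allocations
    after, one flexible array in the tail of grant_map:
        struct granted_page                16 bytes/page   -> one allocation

The full hypercall argument arrays are now built only for the duration of
the map or unmap hypercall (GFP_TEMPORARY) instead of living for the
lifetime of the mapping.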
Daniel De Graaf
2010-Dec-14 14:55 UTC
[Xen-devel] [PATCH 4/6] xen-gntdev: Use find_vma rather than iterating our vma list manually
This should be faster if many mappings exist, and also removes the only user of vma that is not related to PTE modification. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/gntdev.c | 34 ++++++++++------------------------ 1 files changed, 10 insertions(+), 24 deletions(-) diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c index 773b76c..a73f07c 100644 --- a/drivers/xen/gntdev.c +++ b/drivers/xen/gntdev.c @@ -74,6 +74,8 @@ struct grant_map { struct granted_page pages[0]; }; +static struct vm_operations_struct gntdev_vmops; + /* ------------------------------------------------------------------ */ static void gntdev_print_maps(struct gntdev_priv *priv, @@ -142,23 +144,6 @@ static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int ind return NULL; } -static struct grant_map *gntdev_find_map_vaddr(struct gntdev_priv *priv, - unsigned long vaddr) -{ - struct grant_map *map; - - list_for_each_entry(map, &priv->maps, next) { - if (!map->vma) - continue; - if (vaddr < map->vma->vm_start) - continue; - if (vaddr >= map->vma->vm_end) - continue; - return map; - } - return NULL; -} - static int gntdev_del_map(struct grant_map *map) { int i; @@ -510,6 +495,7 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv, struct ioctl_gntdev_get_offset_for_vaddr __user *u) { struct ioctl_gntdev_get_offset_for_vaddr op; + struct vm_area_struct *vma; struct grant_map *map; if (copy_from_user(&op, u, sizeof(op)) != 0) @@ -518,16 +504,16 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv, printk("%s: priv %p, offset for vaddr %lx\n", __FUNCTION__, priv, (unsigned long)op.vaddr); - spin_lock(&priv->lock); - map = gntdev_find_map_vaddr(priv, op.vaddr); - if (map == NULL || - map->vma->vm_start != op.vaddr) { - spin_unlock(&priv->lock); + vma = find_vma(current->mm, op.vaddr); + if (!vma) return -EINVAL; - } + + map = vma->vm_private_data; + if (vma->vm_ops != &gntdev_vmops || !map) + return -EINVAL; + op.offset = map->index << PAGE_SHIFT; op.count = map->count; - spin_unlock(&priv->lock); if (copy_to_user(u, &op, sizeof(op)) != 0) return -EFAULT; -- 1.7.2.3 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
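The lookup this patch switches to can be summarized in a few lines. The
sketch below uses a hypothetical helper name, gntdev_vma_to_map; the patch
open-codes the equivalent in gntdev_ioctl_get_offset_for_vaddr, and the
sketch adds the mmap_sem read lock that find_vma() expects its caller to
hold.

static struct grant_map *gntdev_vma_to_map(unsigned long vaddr)
{
        struct vm_area_struct *vma;
        struct grant_map *map = NULL;

        down_read(&current->mm->mmap_sem);
        vma = find_vma(current->mm, vaddr);
        /* vm_ops doubles as a type tag: only VMAs set up by gntdev_mmap()
         * point at gntdev_vmops and carry a grant_map in vm_private_data. */
        if (vma && vma->vm_ops == &gntdev_vmops)
                map = vma->vm_private_data;
        up_read(&current->mm->mmap_sem);
        return map;
}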
Daniel De Graaf
2010-Dec-14 14:55 UTC
[Xen-devel] [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver
This allows a userspace application to allocate a shared page for implementing inter-domain communication or device drivers. These shared pages can be mapped using the gntdev device or by the kernel in another domain. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/Kconfig | 7 + drivers/xen/Makefile | 2 + drivers/xen/gntalloc.c | 456 ++++++++++++++++++++++++++++++++++++++++++++++++ include/xen/gntalloc.h | 68 +++++++ 4 files changed, 533 insertions(+), 0 deletions(-) create mode 100644 drivers/xen/gntalloc.c create mode 100644 include/xen/gntalloc.h diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig index fa9982e..8398cb0 100644 --- a/drivers/xen/Kconfig +++ b/drivers/xen/Kconfig @@ -180,6 +180,13 @@ config XEN_GNTDEV help Allows userspace processes use grants. +config XEN_GRANT_DEV_ALLOC + tristate "User-space grant reference allocator driver" + depends on XEN + help + Allows userspace processes to create pages with access granted + to other domains. + config XEN_S3 def_bool y depends on XEN_DOM0 && ACPI diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile index ef1ea63..9814c1d 100644 --- a/drivers/xen/Makefile +++ b/drivers/xen/Makefile @@ -10,6 +10,7 @@ obj-$(CONFIG_XEN_XENCOMM) += xencomm.o obj-$(CONFIG_XEN_BALLOON) += balloon.o obj-$(CONFIG_XEN_DEV_EVTCHN) += xen-evtchn.o obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev.o +obj-$(CONFIG_XEN_GRANT_DEV_ALLOC) += xen-gntalloc.o obj-$(CONFIG_XEN_PCIDEV_BACKEND) += pciback/ obj-$(CONFIG_XEN_BLKDEV_BACKEND) += blkback/ obj-$(CONFIG_XEN_BLKDEV_TAP) += blktap/ @@ -25,3 +26,4 @@ obj-$(CONFIG_XEN_PLATFORM_PCI) += platform-pci.o xen-evtchn-y := evtchn.o xen-gntdev-y := gntdev.o +xen-gntalloc-y := gntalloc.o diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c new file mode 100644 index 0000000..f26adfd --- /dev/null +++ b/drivers/xen/gntalloc.c @@ -0,0 +1,456 @@ +/****************************************************************************** + * gntalloc.c + * + * Device for creating grant references (in user-space) that may be shared + * with other domains. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +/* + * This driver exists to allow userspace programs in Linux to allocate kernel + * memory that will later be shared with another domain. Without this device, + * Linux userspace programs cannot create grant references. + * + * How this stuff works: + * X -> granting a page to Y + * Y -> mapping the grant from X + * + * 1. X uses the gntalloc device to allocate a page of kernel memory, P. + * 2. X creates an entry in the grant table that says domid(Y) can + * access P. + * 3. X gives the grant reference identifier, GREF, to Y. + * 4. A program in Y uses the gntdev device to map the page (owned by X + * and identified by GREF) into domain(Y) and then into the address + * space of the program. Behind the scenes, this requires a + * hypercall in which Xen modifies the host CPU page tables to + * perform the sharing -- that''s where the actual cross-domain mapping + * occurs. + * 5. A program in X mmap()s a segment of the gntalloc device that + * corresponds to the shared page. + * 6. 
The two userspace programs can now communicate over the shared page. + * + * + * NOTE TO USERSPACE LIBRARIES: + * The grant allocation and mmap()ing are, naturally, two separate + * operations. You set up the sharing by calling the create ioctl() and + * then the mmap(). You must tear down the sharing in the reverse order + * (munmap() and then the destroy ioctl()). + * + * WARNING: Since Xen does not allow a guest to forcibly end the use of a grant + * reference, this device can be used to consume kernel memory by leaving grant + * references mapped by another domain when an application exits. Therefore, + * there is a global limit on the number of pages that can be allocated. When + * all references to the page are unmapped, it will be freed during the next + * grant operation. + */ + +#include <asm/atomic.h> +#include <linux/module.h> +#include <linux/miscdevice.h> +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/slab.h> +#include <linux/fs.h> +#include <linux/device.h> +#include <linux/mm.h> +#include <asm/uaccess.h> +#include <linux/types.h> +#include <linux/list.h> + +#include <xen/xen.h> +#include <xen/page.h> +#include <xen/grant_table.h> +#include <xen/gntalloc.h> + +static int debug = 0; +module_param(debug, int, 0644); + +static int limit = 1024; +module_param(limit, int, 0644); + +static LIST_HEAD(gref_list); +static DEFINE_SPINLOCK(gref_lock); +static int gref_size = 0; + +/* Metadata on a grant reference. */ +struct gntalloc_gref { + struct list_head next_all; /* list entry gref_list */ + struct list_head next_file; /* list entry file->list, if open */ + domid_t foreign_domid; /* The ID of the domain to share with. */ + grant_ref_t gref_id; /* The grant reference number. */ + unsigned int users; /* Use count - when zero, waiting on Xen */ + struct page* page; /* The shared page. */ +}; + +struct gntalloc_file_private_data { + struct list_head list; +}; + +static void __del_gref(struct gntalloc_gref *gref); + +static void do_cleanup(void) +{ + struct gntalloc_gref *gref, *n; + list_for_each_entry_safe(gref, n, &gref_list, next_all) { + if (!gref->users) + __del_gref(gref); + } +} + + +static int add_gref(domid_t foreign_domid, uint32_t readonly, + struct gntalloc_file_private_data *priv) +{ + int rc; + struct gntalloc_gref *gref; + + rc = -ENOMEM; + spin_lock(&gref_lock); + do_cleanup(); + if (gref_size >= limit) { + spin_unlock(&gref_lock); + rc = -ENOSPC; + goto out; + } + gref_size++; + spin_unlock(&gref_lock); + + gref = kzalloc(sizeof(*gref), GFP_KERNEL); + if (!gref) + goto out; + + gref->foreign_domid = foreign_domid; + gref->users = 1; + + /* Allocate the page to share. */ + gref->page = alloc_page(GFP_KERNEL|__GFP_ZERO); + if (!gref->page) + goto out_nopage; + + /* Grant foreign access to the page. */ + gref->gref_id = gnttab_grant_foreign_access(foreign_domid, + pfn_to_mfn(page_to_pfn(gref->page)), readonly); + if (gref->gref_id < 0) { + printk(KERN_ERR "%s: failed to grant foreign access for mfn " + "%lu to domain %u\n", __func__, + pfn_to_mfn(page_to_pfn(gref->page)), foreign_domid); + rc = -EFAULT; + goto out_no_foreign_gref; + } + + /* Add to gref lists. 
*/ + spin_lock(&gref_lock); + list_add_tail(&gref->next_all, &gref_list); + list_add_tail(&gref->next_file, &priv->list); + spin_unlock(&gref_lock); + + return gref->gref_id; + +out_no_foreign_gref: + __free_page(gref->page); +out_nopage: + kfree(gref); +out: + return rc; +} + +static void __del_gref(struct gntalloc_gref *gref) +{ + if (gnttab_query_foreign_access(gref->gref_id)) + return; + + if (!gnttab_end_foreign_access_ref(gref->gref_id, 0)) + return; + + gref_size--; + list_del(&gref->next_all); + + __free_page(gref->page); + kfree(gref); +} + +static struct gntalloc_gref* find_gref(struct gntalloc_file_private_data *priv, + grant_ref_t gref_id) +{ + struct gntalloc_gref *gref; + list_for_each_entry(gref, &priv->list, next_file) { + if (gref->gref_id == gref_id) + return gref; + } + return NULL; +} + +/* + * ------------------------------------- + * File operations. + * ------------------------------------- + */ +static int gntalloc_open(struct inode *inode, struct file *filp) +{ + struct gntalloc_file_private_data *priv; + + try_module_get(THIS_MODULE); + + priv = kzalloc(sizeof(*priv), GFP_KERNEL); + if (!priv) + goto out_nomem; + INIT_LIST_HEAD(&priv->list); + + filp->private_data = priv; + + if (debug) + printk("%s: priv %p\n", __FUNCTION__, priv); + + return 0; + +out_nomem: + return -ENOMEM; +} + +static int gntalloc_release(struct inode *inode, struct file *filp) +{ + struct gntalloc_file_private_data *priv = filp->private_data; + struct gntalloc_gref *gref; + + if (debug) + printk("%s: priv %p\n", __FUNCTION__, priv); + + spin_lock(&gref_lock); + while (!list_empty(&priv->list)) { + gref = list_entry(priv->list.next, + struct gntalloc_gref, next_file); + list_del(&gref->next_file); + gref->users--; + if (gref->users == 0) + __del_gref(gref); + } + kfree(priv); + spin_unlock(&gref_lock); + + module_put(THIS_MODULE); + + return 0; +} + +static long gntalloc_ioctl_alloc(struct gntalloc_file_private_data *priv, + void __user *arg) +{ + int rc = 0; + struct ioctl_gntalloc_alloc_gref op; + + if (debug) + printk("%s: priv %p\n", __FUNCTION__, priv); + + if (copy_from_user(&op, arg, sizeof(op))) { + rc = -EFAULT; + goto alloc_grant_out; + } + rc = add_gref(op.foreign_domid, op.readonly, priv); + if (rc < 0) + goto alloc_grant_out; + + op.gref_id = rc; + op.page_idx = rc; + + rc = 0; + + if (copy_to_user((void __user *)arg, &op, sizeof(op))) { + rc = -EFAULT; + goto alloc_grant_out; + } + +alloc_grant_out: + return rc; +} + +static long gntalloc_ioctl_dealloc(struct gntalloc_file_private_data *priv, + void __user *arg) +{ + int rc = 0; + struct ioctl_gntalloc_dealloc_gref op; + struct gntalloc_gref *gref; + + if (debug) + printk("%s: priv %p\n", __FUNCTION__, priv); + + if (copy_from_user(&op, arg, sizeof(op))) { + rc = -EFAULT; + goto dealloc_grant_out; + } + + spin_lock(&gref_lock); + gref = find_gref(priv, op.gref_id); + if (gref) { + list_del(&gref->next_file); + gref->users--; + rc = 0; + } else { + rc = -EINVAL; + } + + do_cleanup(); + spin_unlock(&gref_lock); +dealloc_grant_out: + return rc; +} + +static long gntalloc_ioctl(struct file *filp, unsigned int cmd, + unsigned long arg) +{ + struct gntalloc_file_private_data *priv = filp->private_data; + + switch (cmd) { + case IOCTL_GNTALLOC_ALLOC_GREF: + return gntalloc_ioctl_alloc(priv, (void __user*)arg); + + case IOCTL_GNTALLOC_DEALLOC_GREF: + return gntalloc_ioctl_dealloc(priv, (void __user*)arg); + + default: + return -ENOIOCTLCMD; + } + + return 0; +} + +static int gntalloc_vma_fault(struct vm_area_struct *vma, struct 
vm_fault *vmf) +{ + struct gntalloc_gref *gref = vma->vm_private_data; + if (!gref) + return VM_FAULT_SIGBUS; + + vmf->page = gref->page; + get_page(vmf->page); + + return 0; +}; + +static void gntalloc_vma_close(struct vm_area_struct *vma) +{ + struct gntalloc_gref *gref = vma->vm_private_data; + if (!gref) + return; + + spin_lock(&gref_lock); + gref->users--; + if (gref->users == 0) + __del_gref(gref); + spin_unlock(&gref_lock); +} + +static struct vm_operations_struct gntalloc_vmops = { + .fault = gntalloc_vma_fault, + .close = gntalloc_vma_close, +}; + +static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma) +{ + struct gntalloc_file_private_data *priv = filp->private_data; + struct gntalloc_gref *gref; + + if (debug) + printk("%s: priv %p, page %lu\n", __func__, + priv, vma->vm_pgoff); + + /* + * There is a 1-to-1 correspondence of grant references to shared + * pages, so it only makes sense to map exactly one page per + * call to mmap(). + */ + if (((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) != 1) { + printk(KERN_ERR "%s: Only one page can be memory-mapped " + "per grant reference.\n", __func__); + return -EINVAL; + } + + if (!(vma->vm_flags & VM_SHARED)) { + printk(KERN_ERR "%s: Mapping must be shared.\n", + __func__); + return -EINVAL; + } + + spin_lock(&gref_lock); + gref = find_gref(priv, vma->vm_pgoff); + if (gref == NULL) { + spin_unlock(&gref_lock); + printk(KERN_ERR "%s: Could not find a grant reference with " + "page index %lu.\n", __func__, vma->vm_pgoff); + return -ENOENT; + } + gref->users++; + spin_unlock(&gref_lock); + + vma->vm_private_data = gref; + + /* This flag prevents Bad PTE errors when the memory is unmapped. */ + vma->vm_flags |= VM_RESERVED; + vma->vm_flags |= VM_DONTCOPY; + vma->vm_flags |= VM_IO; + + vma->vm_ops = &gntalloc_vmops; + + return 0; +} + +static const struct file_operations gntalloc_fops = { + .owner = THIS_MODULE, + .open = gntalloc_open, + .release = gntalloc_release, + .unlocked_ioctl = gntalloc_ioctl, + .mmap = gntalloc_mmap +}; + +/* + * ------------------------------------- + * Module creation/destruction. + * ------------------------------------- + */ +static struct miscdevice gntalloc_miscdev = { + .minor = MISC_DYNAMIC_MINOR, + .name = "xen/gntalloc", + .fops = &gntalloc_fops, +}; + +static int __init gntalloc_init(void) +{ + int err; + + if (!xen_domain()) { + if (debug) + printk(KERN_ERR "gntalloc: You must be running Xen\n"); + return -ENODEV; + } + + err = misc_register(&gntalloc_miscdev); + if (err != 0) { + printk(KERN_ERR "Could not register misc gntalloc device\n"); + return err; + } + + if (debug) + printk(KERN_INFO "Created grant allocation device at %d,%d\n", + MISC_MAJOR, gntalloc_miscdev.minor); + + return 0; +} + +static void __exit gntalloc_exit(void) +{ + misc_deregister(&gntalloc_miscdev); +} + +module_init(gntalloc_init); +module_exit(gntalloc_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Carter Weatherly <carter.weatherly@jhuapl.edu>, " + "Daniel De Graaf <dgdegra@tycho.nsa.gov>"); +MODULE_DESCRIPTION("User-space grant reference allocator driver"); diff --git a/include/xen/gntalloc.h b/include/xen/gntalloc.h new file mode 100644 index 0000000..76b70d7 --- /dev/null +++ b/include/xen/gntalloc.h @@ -0,0 +1,68 @@ +/****************************************************************************** + * gntalloc.h + * + * Interface to /dev/xen/gntalloc. 
+ * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License version 2 + * as published by the Free Software Foundation; or, when distributed + * separately from the Linux kernel or incorporated into other + * software packages, subject to the following license: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this source file (the "Software"), to deal in the Software without + * restriction, including without limitation the rights to use, copy, modify, + * merge, publish, distribute, sublicense, and/or sell copies of the Software, + * and to permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#ifndef __LINUX_PUBLIC_GNTALLOC_H__ +#define __LINUX_PUBLIC_GNTALLOC_H__ + +/* + * Allocates a new page and creates a new grant reference. + * + * N.B. The page_idx is really the address >> PAGE_SHIFT, meaning it''s the + * page number and not an actual address. It must be shifted again prior + * to feeding it to mmap() (i.e. page_idx << PAGE_SHIFT). + */ +#define IOCTL_GNTALLOC_ALLOC_GREF \ +_IOC(_IOC_NONE, ''G'', 1, sizeof(struct ioctl_gntalloc_alloc_gref)) +struct ioctl_gntalloc_alloc_gref { + /* IN parameters */ + /* The ID of the domain creating the grant reference. */ + domid_t owner_domid; + /* The ID of the domain to be given access to the grant. */ + domid_t foreign_domid; + /* The type of access given to domid. */ + uint32_t readonly; + /* OUT parameters */ + /* The grant reference of the newly created grant. */ + grant_ref_t gref_id; + /* The page index (page number, NOT address) for grant mmap(). */ + uint32_t page_idx; +}; + +/* + * Deallocates the grant reference, freeing the associated page. + */ +#define IOCTL_GNTALLOC_DEALLOC_GREF \ +_IOC(_IOC_NONE, ''G'', 2, sizeof(struct ioctl_gntalloc_dealloc_gref)) +struct ioctl_gntalloc_dealloc_gref { + /* IN parameter */ + /* The grant reference to deallocate. */ + grant_ref_t gref_id; +}; +#endif /* __LINUX_PUBLIC_GNTALLOC_H__ */ -- 1.7.2.3 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
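A minimal userspace sketch of the flow described in the driver's comment
block: allocate a page, grant it to another domain, mmap it, then tear down
in reverse order. It assumes the new include/xen/gntalloc.h is visible to
userspace, uses domain 1 purely as an example peer, and trims error
handling for brevity.

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <xen/gntalloc.h>

int main(void)
{
        struct ioctl_gntalloc_alloc_gref op;
        struct ioctl_gntalloc_dealloc_gref dop;
        long pagesize = sysconf(_SC_PAGESIZE);
        void *shared;
        int fd;

        fd = open("/dev/xen/gntalloc", O_RDWR);
        if (fd < 0)
                return 1;

        memset(&op, 0, sizeof(op));
        op.foreign_domid = 1;           /* example: share with domain 1 */
        op.readonly = 0;                /* grant write access */
        if (ioctl(fd, IOCTL_GNTALLOC_ALLOC_GREF, &op))
                return 1;

        /* page_idx is a page number, so scale it before handing to mmap(). */
        shared = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_SHARED,
                      fd, (off_t)op.page_idx * pagesize);
        if (shared == MAP_FAILED)
                return 1;

        printf("grant ref %u ready for domain %u\n", op.gref_id, op.foreign_domid);
        /* ... hand gref_id to the peer, communicate through *shared ... */

        /* Tear down in reverse order: munmap(), then the dealloc ioctl. */
        munmap(shared, pagesize);
        memset(&dop, 0, sizeof(dop));
        dop.gref_id = op.gref_id;
        ioctl(fd, IOCTL_GNTALLOC_DEALLOC_GREF, &dop);
        close(fd);
        return 0;
}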
Daniel De Graaf
2010-Dec-14 14:55 UTC
[Xen-devel] [PATCH 6/6] xen-gntdev: Introduce HVM version of gntdev
HVM does not allow direct PTE modification, so guest pages must be allocated and used for grant mappings. If Xen does not provide an auto-translated physmap, the existing direct PTE modification is more efficient. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/Makefile | 2 + drivers/xen/gntdev-hvm.c | 601 ++++++++++++++++++++++++++++++++++++++++++++++ drivers/xen/gntdev.c | 3 + 3 files changed, 606 insertions(+), 0 deletions(-) create mode 100644 drivers/xen/gntdev-hvm.c diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile index 9814c1d..ab0e6eb 100644 --- a/drivers/xen/Makefile +++ b/drivers/xen/Makefile @@ -10,6 +10,7 @@ obj-$(CONFIG_XEN_XENCOMM) += xencomm.o obj-$(CONFIG_XEN_BALLOON) += balloon.o obj-$(CONFIG_XEN_DEV_EVTCHN) += xen-evtchn.o obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev.o +obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev-hvm.o obj-$(CONFIG_XEN_GRANT_DEV_ALLOC) += xen-gntalloc.o obj-$(CONFIG_XEN_PCIDEV_BACKEND) += pciback/ obj-$(CONFIG_XEN_BLKDEV_BACKEND) += blkback/ @@ -26,4 +27,5 @@ obj-$(CONFIG_XEN_PLATFORM_PCI) += platform-pci.o xen-evtchn-y := evtchn.o xen-gntdev-y := gntdev.o +xen-gntdev-hvm-y := gntdev-hvm.o xen-gntalloc-y := gntalloc.o diff --git a/drivers/xen/gntdev-hvm.c b/drivers/xen/gntdev-hvm.c new file mode 100644 index 0000000..331d5af --- /dev/null +++ b/drivers/xen/gntdev-hvm.c @@ -0,0 +1,601 @@ +/****************************************************************************** + * gntdev.c + * + * Device for accessing (in user-space) pages that have been granted by other + * domains. + * + * Copyright (c) 2006-2007, D G Murray. + * (c) 2009 Gerd Hoffmann <kraxel@redhat.com> + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include <linux/module.h> +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/miscdevice.h> +#include <linux/fs.h> +#include <linux/mm.h> +#include <linux/mman.h> +#include <linux/types.h> +#include <linux/uaccess.h> +#include <linux/sched.h> +#include <linux/spinlock.h> +#include <linux/vmalloc.h> + +#include <xen/xen.h> +#include <xen/grant_table.h> +#include <xen/gntdev.h> +#include <xen/interface/memory.h> +#include <asm/xen/hypervisor.h> +#include <asm/xen/hypercall.h> +#include <asm/xen/page.h> + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Derek G. 
Murray <Derek.Murray@cl.cam.ac.uk>, " + "Gerd Hoffmann <kraxel@redhat.com>"); +MODULE_DESCRIPTION("User-space granted page access driver"); + +static int debug = 0; +module_param(debug, int, 0644); +static int limit = 1024*1024; +module_param(limit, int, 0644); +static atomic_t pages_mapped = ATOMIC_INIT(0); + +struct gntdev_priv { + struct list_head maps; + spinlock_t lock; +}; + +struct granted_page { + struct page* page; + union { + struct ioctl_gntdev_grant_ref target; + grant_handle_t handle; + }; +}; + +struct grant_map { + struct list_head next; + int index; + int count; + atomic_t users; + int is_mapped:1; + int is_ro:1; + struct granted_page pages[0]; +}; + +static struct vm_operations_struct gntdev_vmops; + +/* ------------------------------------------------------------------ */ + +static void gntdev_print_maps(struct gntdev_priv *priv, + char *text, int text_index) +{ + struct grant_map *map; + + printk("%s: maps list (priv %p)\n", __FUNCTION__, priv); + list_for_each_entry(map, &priv->maps, next) { + printk(" %p: %2d+%2d, r%c, %s %d,%d %s\n", map, + map->index, map->count, map->is_ro ? ''o'' : ''w'', + map->is_mapped ? "use,hnd" : "dom,ref", + map->is_mapped ? atomic_read(&map->users) + : map->pages[0].target.domid, + map->is_mapped ? map->pages[0].handle + : map->pages[0].target.ref, + map->index == text_index && text ? text : ""); + } +} + +static struct grant_map *gntdev_alloc_map(int count, + struct ioctl_gntdev_grant_ref* grants) +{ + struct grant_map *add; + int i; + + add = kzalloc(sizeof(*add) + sizeof(add->pages[0])*count, GFP_KERNEL); + if (!add) + return NULL; + + atomic_set(&add->users, 1); + add->count = count; + + for(i = 0; i < count; i++) + add->pages[i].target = grants[i]; + + return add; +} + +static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add) +{ + struct grant_map *map; + + spin_lock(&priv->lock); + + list_for_each_entry(map, &priv->maps, next) { + if (add->index + add->count < map->index) { + list_add_tail(&add->next, &map->next); + goto done; + } + add->index = map->index + map->count; + } + list_add_tail(&add->next, &priv->maps); + +done: + if (debug) + gntdev_print_maps(priv, "[new]", add->index); + + spin_unlock(&priv->lock); +} + +static void __gntdev_del_map(struct gntdev_priv *priv, struct grant_map *map) +{ + list_del(&map->next); +} + +static void gntdev_del_map(struct gntdev_priv *priv, struct grant_map *map) +{ + spin_lock(&priv->lock); + __gntdev_del_map(priv, map); + spin_unlock(&priv->lock); +} + +static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int index, + int count) +{ + struct grant_map *map; + + list_for_each_entry(map, &priv->maps, next) { + if (map->index != index) + continue; + if (map->count != count) + continue; + return map; + } + return NULL; +} + +static void gntdev_unmap_fast(struct grant_map *map, + struct gnttab_unmap_grant_ref *unmap_ops) +{ + int err, flags, i, unmap_size = 0; + unsigned long pfn; + phys_addr_t mfn; + + flags = GNTMAP_host_map; + if (map->is_ro) + flags |= GNTMAP_readonly; + + for (i=0; i < map->count; i++) { + if (!map->pages[i].page) + continue; + pfn = page_to_pfn(map->pages[i].page); + mfn = (phys_addr_t)pfn_to_kaddr(pfn); + gnttab_set_unmap_op(&unmap_ops[unmap_size], mfn, flags, + map->pages[i].handle); + unmap_size++; + } + + err = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, + unmap_ops, unmap_size); + WARN_ON(err); + + for (i = 0; i < unmap_size; i++) + WARN_ON(unmap_ops[i].status); +} + +// for the out-of-memory case +static void 
gntdev_unmap_slow(struct grant_map *map) +{ + int err, flags, i; + unsigned long pfn; + phys_addr_t mfn; + struct gnttab_unmap_grant_ref unmap_op; + + flags = GNTMAP_host_map; + if (map->is_ro) + flags |= GNTMAP_readonly; + + for (i=0; i < map->count; i++) { + if (!map->pages[i].page) + continue; + + pfn = page_to_pfn(map->pages[i].page); + mfn = (phys_addr_t)pfn_to_kaddr(pfn); + gnttab_set_unmap_op(&unmap_op, mfn, flags, map->pages[i].handle); + err = HYPERVISOR_grant_table_op( + GNTTABOP_unmap_grant_ref, &unmap_op, 1); + WARN_ON(err); + WARN_ON(unmap_op.status); + } +} + +static void gntdev_put_map(struct grant_map *map) +{ + struct gnttab_unmap_grant_ref *unmap_ops; + int i; + if (!map) + return; + if (!atomic_dec_and_test(&map->users)) + return; + if (debug) + printk("%s: unmap %p (%d pages)\n", __FUNCTION__, map, map->count); + if (map->is_mapped) { + unmap_ops = kzalloc(sizeof(unmap_ops[0]) * map->count, + GFP_TEMPORARY); + if (likely(unmap_ops)) { + gntdev_unmap_fast(map, unmap_ops); + kfree(unmap_ops); + } else { + gntdev_unmap_slow(map); + } + + atomic_sub(map->count, &pages_mapped); + } + for (i=0; i < map->count; i++) + if (map->pages[i].page) + __free_page(map->pages[i].page); + kfree(map); +} + +static int gntdev_open(struct inode *inode, struct file *flip) +{ + struct gntdev_priv *priv; + + priv = kzalloc(sizeof(*priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + + INIT_LIST_HEAD(&priv->maps); + spin_lock_init(&priv->lock); + + flip->private_data = priv; + if (debug) + printk("%s: priv %p\n", __FUNCTION__, priv); + + return 0; +} + +static int gntdev_release(struct inode *inode, struct file *flip) +{ + struct gntdev_priv *priv = flip->private_data; + struct grant_map *map; + + if (debug) { + printk("%s: priv %p\n", __FUNCTION__, priv); + gntdev_print_maps(priv, NULL, 0); + } + + spin_lock(&priv->lock); + while (!list_empty(&priv->maps)) { + map = list_entry(priv->maps.next, struct grant_map, next); + list_del(&map->next); + gntdev_put_map(map); + } + spin_unlock(&priv->lock); + + kfree(priv); + return 0; +} + +static int gntdev_do_map(struct grant_map *map) +{ + int err, flags, i; + struct page* page; + phys_addr_t mfn; + struct gnttab_map_grant_ref* map_ops; + + flags = GNTMAP_host_map; + if (map->is_ro) + flags |= GNTMAP_readonly; + + err = -ENOMEM; + + if (unlikely(atomic_add_return(map->count, &pages_mapped) > limit)) { + if (debug) + printk("%s: maps full\n", __FUNCTION__); + goto out; + } + + map_ops = kzalloc(sizeof(map_ops[0]) * map->count, GFP_TEMPORARY); + if (!map_ops) + goto out; + + for (i = 0; i < map->count; i++) { + page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO); + if (unlikely(!page)) + goto out_free; + map->pages[i].page = page; + mfn = (phys_addr_t)pfn_to_kaddr(page_to_pfn(page)); + gnttab_set_map_op(&map_ops[i], mfn, flags, + map->pages[i].target.ref, + map->pages[i].target.domid); + } + err = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, + map_ops, map->count); + if (WARN_ON(err)) + goto out_free; + + if (debug && map->count) + printk("%s: mapped first at gfn=%lx mfn=%lx\n", __func__, + page_to_pfn(map->pages[0].page), pfn_to_mfn(page_to_pfn(map->pages[0].page))); + + map->is_mapped = 1; + + for (i = 0; i < map->count; i++) { + if (map_ops[i].status) { + if (debug) + printk("%s: failed map at page %d: stat=%d\n", + __FUNCTION__, i, map_ops[i].status); + __free_page(map->pages[i].page); + map->pages[i].page = NULL; + err = -EINVAL; + } else { + map->pages[i].handle = map_ops[i].handle; + } + } + +out_free: + kfree(map_ops); +out: + if 
(!map->is_mapped) + atomic_sub(map->count, &pages_mapped); + return err; +} + +static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv, + struct ioctl_gntdev_map_grant_ref __user *u) +{ + struct ioctl_gntdev_map_grant_ref op; + struct grant_map *map; + struct ioctl_gntdev_grant_ref* grants; + int err; + + if (copy_from_user(&op, u, sizeof(op)) != 0) + return -EFAULT; + if (debug) + printk("%s: priv %p, add %d\n", __FUNCTION__, priv, + op.count); + if (unlikely(op.count <= 0)) + return -EINVAL; + + err = -ENOMEM; + grants = kmalloc(sizeof(grants[0]) * op.count, GFP_TEMPORARY); + if (!grants) + goto out_fail; + + err = -EFAULT; + if (copy_from_user(grants, u->refs, sizeof(grants[0]) * op.count)) + goto out_free; + + map = gntdev_alloc_map(op.count, grants); + if (!map) + goto out_free; + + gntdev_add_map(priv, map); + + op.index = map->index << PAGE_SHIFT; + + err = -EFAULT; + if (copy_to_user(u, &op, sizeof(op)) != 0) + goto out_remove; + + err = 0; + +out_free: + kfree(grants); +out_fail: + return err; + +out_remove: + gntdev_del_map(priv, map); + gntdev_put_map(map); + goto out_free; +} + +static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv, + struct ioctl_gntdev_unmap_grant_ref __user *u) +{ + struct ioctl_gntdev_unmap_grant_ref op; + struct grant_map *map; + int err = 0; + + if (copy_from_user(&op, u, sizeof(op)) != 0) + return -EFAULT; + + spin_lock(&priv->lock); + map = gntdev_find_map_index(priv, op.index >> PAGE_SHIFT, op.count); + if (map) { + __gntdev_del_map(priv, map); + } else + err = -EINVAL; + spin_unlock(&priv->lock); + + if (debug) + printk("%s: priv %p, del %d+%d = %p\n", __FUNCTION__, priv, + (int)op.index, (int)op.count, map); + + gntdev_put_map(map); + return err; +} + +static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv, + struct ioctl_gntdev_get_offset_for_vaddr __user *u) +{ + struct ioctl_gntdev_get_offset_for_vaddr op; + struct vm_area_struct *vma; + struct grant_map *map; + + if (copy_from_user(&op, u, sizeof(op)) != 0) + return -EFAULT; + if (debug) + printk("%s: priv %p, offset for vaddr %lx\n", __FUNCTION__, priv, + (unsigned long)op.vaddr); + + vma = find_vma(current->mm, op.vaddr); + if (!vma) + return -EINVAL; + + map = vma->vm_private_data; + if (vma->vm_ops != &gntdev_vmops || !map) + return -EINVAL; + + op.offset = map->index << PAGE_SHIFT; + op.count = map->count; + + if (copy_to_user(u, &op, sizeof(op)) != 0) + return -EFAULT; + return 0; +} + +static long gntdev_ioctl(struct file *flip, + unsigned int cmd, unsigned long arg) +{ + struct gntdev_priv *priv = flip->private_data; + void __user *ptr = (void __user *)arg; + + switch (cmd) { + case IOCTL_GNTDEV_MAP_GRANT_REF: + return gntdev_ioctl_map_grant_ref(priv, ptr); + + case IOCTL_GNTDEV_UNMAP_GRANT_REF: + return gntdev_ioctl_unmap_grant_ref(priv, ptr); + + case IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR: + return gntdev_ioctl_get_offset_for_vaddr(priv, ptr); + + default: + if (debug) + printk("%s: priv %p, unknown cmd %x\n", + __FUNCTION__, priv, cmd); + return -ENOIOCTLCMD; + } + + return 0; +} + +static int gntdev_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf) +{ + struct grant_map *map = vma->vm_private_data; + pgoff_t pgoff = vmf->pgoff - vma->vm_pgoff; + + if (!map || !map->is_mapped || pgoff < 0 || pgoff > map->count) { + if (debug) + printk("%s: vaddr %p, pgoff %ld (shouldn''t happen)\n", + __FUNCTION__, vmf->virtual_address, pgoff); + return VM_FAULT_SIGBUS; + } + + vmf->page = map->pages[pgoff].page; + get_page(vmf->page); + return 0; +} + +static 
void gntdev_vma_close(struct vm_area_struct *vma) +{ + struct grant_map *map = vma->vm_private_data; + gntdev_put_map(map); +} + +static struct vm_operations_struct gntdev_vmops = { + .fault = gntdev_vma_fault, + .close = gntdev_vma_close, +}; + +static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma) +{ + struct gntdev_priv *priv = flip->private_data; + int index = vma->vm_pgoff; + int count = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; + struct grant_map *map; + int err = -EINVAL; + + if (!(vma->vm_flags & VM_SHARED)) + return -EINVAL; + + spin_lock(&priv->lock); + map = gntdev_find_map_index(priv, index, count); + + if (debug) + printk("%s: map %d+%d at %lx (priv %p, map %p)\n", __func__, + index, count, vma->vm_start, priv, map); + + if (!map) + goto unlock_out; + + if (!map->is_mapped) { + map->is_ro = !(vma->vm_flags & VM_WRITE); + err = gntdev_do_map(map); + if (err) + goto unlock_out; + } + + if ((vma->vm_flags & VM_WRITE) && map->is_ro) + goto unlock_out; + + err = 0; + vma->vm_ops = &gntdev_vmops; + + vma->vm_flags |= VM_RESERVED; + vma->vm_flags |= VM_DONTEXPAND; + vma->vm_flags |= VM_IO; + + vma->vm_private_data = map; + + atomic_inc(&map->users); + +unlock_out: + spin_unlock(&priv->lock); + return err; +} + +static const struct file_operations gntdev_fops = { + .owner = THIS_MODULE, + .open = gntdev_open, + .release = gntdev_release, + .mmap = gntdev_mmap, + .unlocked_ioctl = gntdev_ioctl +}; + +static struct miscdevice gntdev_miscdev = { + .minor = MISC_DYNAMIC_MINOR, + .name = "xen/gntdev", + .fops = &gntdev_fops, +}; + +/* ------------------------------------------------------------------ */ + +static int __init gntdev_init(void) +{ + int err; + + if (!xen_domain()) + return -ENODEV; + + if (!xen_feature(XENFEAT_auto_translated_physmap)) + return -ENODEV; + + err = misc_register(&gntdev_miscdev); + if (err != 0) { + printk(KERN_ERR "Could not register gntdev device\n"); + return err; + } + return 0; +} + +static void __exit gntdev_exit(void) +{ + misc_deregister(&gntdev_miscdev); +} + +module_init(gntdev_init); +module_exit(gntdev_exit); + +/* ------------------------------------------------------------------ */ diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c index a73f07c..f6b98c0 100644 --- a/drivers/xen/gntdev.c +++ b/drivers/xen/gntdev.c @@ -626,6 +626,9 @@ static int __init gntdev_init(void) if (!xen_domain()) return -ENODEV; + if (xen_feature(XENFEAT_auto_translated_physmap)) + return -ENODEV; + err = misc_register(&gntdev_miscdev); if (err != 0) { printk(KERN_ERR "Could not register gntdev device\n"); -- 1.7.2.3 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
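Condensed from the two drivers, the core difference in how a grant is
handed to Xen looks like this (fields and error handling omitted; a
side-by-side sketch of the calls in the diffs above, not complete code):

/* PV gntdev.c: pass the machine address of the PTE that should end up
 * pointing at the granted frame, and let Xen rewrite that PTE. */
gnttab_set_map_op(&map_op, pte_maddr,
                  GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte,
                  ref, domid);

/* HVM gntdev-hvm.c: allocate an ordinary guest page first and pass its
 * kernel address as host_addr; Xen swaps the backing frame, and userspace
 * reaches it through the .fault handler rather than direct PTE edits. */
page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
gnttab_set_map_op(&map_op, (phys_addr_t)pfn_to_kaddr(page_to_pfn(page)),
                  GNTMAP_host_map, ref, domid);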
Jeremy Fitzhardinge
2010-Dec-14 21:11 UTC
Re: [Xen-devel] [PATCH 1/6] xen-gntdev: Fix circular locking dependency
On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
> apply_to_page_range will acquire PTE lock while priv->lock is held, and
> mn_invl_range_start tries to acquire priv->lock with PTE already held.
> Fix by not holding priv->lock during the entire map operation.

Is priv->lock needed to protect the contents of map?

    J

> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> ---
>  drivers/xen/gntdev.c |   19 +++++++++----------
>  1 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index a33e443..c582804 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -581,23 +581,22 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>  	if (!(vma->vm_flags & VM_WRITE))
>  		map->flags |= GNTMAP_readonly;
>
> +	spin_unlock(&priv->lock);
> +
>  	err = apply_to_page_range(vma->vm_mm, vma->vm_start,
>  				  vma->vm_end - vma->vm_start,
>  				  find_grant_ptes, map);
> -	if (err) {
> -		goto unlock_out;
> -		if (debug)
> -			printk("%s: find_grant_ptes() failure.\n", __FUNCTION__);
> -	}
> +	if (err)
> +		return err;
>
>  	err = map_grant_pages(map);
> -	if (err) {
> -		goto unlock_out;
> -		if (debug)
> -			printk("%s: map_grant_pages() failure.\n", __FUNCTION__);
> -	}
> +	if (err)
> +		return err;
> +
>  	map->is_mapped = 1;
>
> +	return 0;
> +
>  unlock_out:
>  	spin_unlock(&priv->lock);
>  	return err;
Jeremy Fitzhardinge
2010-Dec-14 21:12 UTC
Re: [Xen-devel] [PATCH 2/6] xen-gntdev: Change page limit to be global instead of per-open
On 12/14/2010 06:55 AM, Daniel De Graaf wrote:> Because there is no limitation on how many times a user can open a > given device file, an per-file-description limit on the number of > pages granted offers little to no benefit. Change to a global limit > and remove the ioctl() as the parameter can now be changed via sysfs.Does anyone use this ioctl? Wouldn''t it be safer to replace it with a no-op version? J> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> > --- > drivers/xen/gntdev.c | 48 ++++++++++++++---------------------------------- > 1 files changed, 14 insertions(+), 34 deletions(-) > > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c > index c582804..62639af 100644 > --- a/drivers/xen/gntdev.c > +++ b/drivers/xen/gntdev.c > @@ -44,13 +44,12 @@ MODULE_DESCRIPTION("User-space granted page access driver"); > > static int debug = 0; > module_param(debug, int, 0644); > -static int limit = 1024; > +static int limit = 1024*1024; > module_param(limit, int, 0644); > +static atomic_t pages_mapped = ATOMIC_INIT(0); > > struct gntdev_priv { > struct list_head maps; > - uint32_t used; > - uint32_t limit; > spinlock_t lock; > struct mm_struct *mm; > struct mmu_notifier mn; > @@ -76,8 +75,8 @@ static void gntdev_print_maps(struct gntdev_priv *priv, > { > struct grant_map *map; > > - printk("%s: maps list (priv %p, usage %d/%d)\n", > - __FUNCTION__, priv, priv->used, priv->limit); > + printk("%s: maps list (priv %p)\n", > + __FUNCTION__, priv); > list_for_each_entry(map, &priv->maps, next) > printk(" index %2d, count %2d %s\n", > map->index, map->count, > @@ -104,9 +103,6 @@ static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count) > add->count = count; > add->priv = priv; > > - if (add->count + priv->used > priv->limit) > - goto err; > - > return add; > > err: > @@ -131,7 +127,6 @@ static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add) > list_add_tail(&add->next, &priv->maps); > > done: > - priv->used += add->count; > if (debug) > gntdev_print_maps(priv, "[new]", add->index); > } > @@ -178,7 +173,7 @@ static int gntdev_del_map(struct grant_map *map) > if (map->unmap_ops[i].handle) > return -EBUSY; > > - map->priv->used -= map->count; > + atomic_sub(map->count, &pages_mapped); > list_del(&map->next); > return 0; > } > @@ -360,7 +355,6 @@ static int gntdev_open(struct inode *inode, struct file *flip) > > INIT_LIST_HEAD(&priv->maps); > spin_lock_init(&priv->lock); > - priv->limit = limit; > > priv->mm = get_task_mm(current); > if (!priv->mm) { > @@ -416,19 +410,26 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv, > op.count); > if (unlikely(op.count <= 0)) > return -EINVAL; > - if (unlikely(op.count > priv->limit)) > - return -EINVAL; > > err = -ENOMEM; > map = gntdev_alloc_map(priv, op.count); > if (!map) > return err; > + > if (copy_from_user(map->grants, &u->refs, > sizeof(map->grants[0]) * op.count) != 0) { > gntdev_free_map(map); > return err; > } > > + if (unlikely(atomic_add_return(op.count, &pages_mapped) > limit)) > + { > + if (debug) > + printk("%s: can''t map: over limit\n", __FUNCTION__); > + gntdev_free_map(map); > + return err; > + } > + > spin_lock(&priv->lock); > gntdev_add_map(priv, map); > op.index = map->index << PAGE_SHIFT; > @@ -495,24 +496,6 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv, > return 0; > } > > -static long gntdev_ioctl_set_max_grants(struct gntdev_priv *priv, > - struct ioctl_gntdev_set_max_grants __user *u) > -{ > - struct ioctl_gntdev_set_max_grants op; > - > - 
if (copy_from_user(&op, u, sizeof(op)) != 0) > - return -EFAULT; > - if (debug) > - printk("%s: priv %p, limit %d\n", __FUNCTION__, priv, op.count); > - if (op.count > limit) > - return -EINVAL; > - > - spin_lock(&priv->lock); > - priv->limit = op.count; > - spin_unlock(&priv->lock); > - return 0; > -} > - > static long gntdev_ioctl(struct file *flip, > unsigned int cmd, unsigned long arg) > { > @@ -529,9 +512,6 @@ static long gntdev_ioctl(struct file *flip, > case IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR: > return gntdev_ioctl_get_offset_for_vaddr(priv, ptr); > > - case IOCTL_GNTDEV_SET_MAX_GRANTS: > - return gntdev_ioctl_set_max_grants(priv, ptr); > - > default: > if (debug) > printk("%s: priv %p, unknown cmd %x\n",_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
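A no-op compatibility handler, as suggested above, would only need to consume its argument. A minimal sketch of that variant, reusing the structure of the code being removed (illustrative only, not part of the posted series):

    /* Hypothetical no-op replacement: accept the call, ignore the value. */
    static long gntdev_ioctl_set_max_grants(struct gntdev_priv *priv,
                    struct ioctl_gntdev_set_max_grants __user *u)
    {
            struct ioctl_gntdev_set_max_grants op;

            if (copy_from_user(&op, u, sizeof(op)) != 0)
                    return -EFAULT;
            /* The per-file limit no longer exists; silently accept the request. */
            return 0;
    }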
Jeremy Fitzhardinge
2010-Dec-14 21:15 UTC
Re: [Xen-devel] [PATCH 3/6] xen-gntdev: Remove unneeded structures from grant_map tracking data
On 12/14/2010 06:55 AM, Daniel De Graaf wrote:> The entire hypercall argument list isn''t required; only selected > fields from the hypercall need to be tracked between the ioctl, map, > and unmap operations.Is the rationale of this patch to save memory? If so, how much does it save. (This patch seems sensible in principle, but it doesn''t seem to save much complexity.) J> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> > --- > drivers/xen/gntdev.c | 195 +++++++++++++++++++++++++++++-------------------- > 1 files changed, 115 insertions(+), 80 deletions(-) > > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c > index 62639af..773b76c 100644 > --- a/drivers/xen/gntdev.c > +++ b/drivers/xen/gntdev.c > @@ -55,17 +55,23 @@ struct gntdev_priv { > struct mmu_notifier mn; > }; > > +struct granted_page { > + u64 pte_maddr; > + union { > + struct ioctl_gntdev_grant_ref target; > + grant_handle_t handle; > + }; > +}; > + > struct grant_map { > struct list_head next; > struct gntdev_priv *priv; > struct vm_area_struct *vma; > int index; > int count; > - int flags; > - int is_mapped; > - struct ioctl_gntdev_grant_ref *grants; > - struct gnttab_map_grant_ref *map_ops; > - struct gnttab_unmap_grant_ref *unmap_ops; > + int is_mapped:1; > + int is_ro:1; > + struct granted_page pages[0]; > }; > > /* ------------------------------------------------------------------ */ > @@ -83,40 +89,28 @@ static void gntdev_print_maps(struct gntdev_priv *priv, > map->index == text_index && text ? text : ""); > } > > -static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count) > +static struct grant_map *gntdev_alloc_map(int count, > + struct ioctl_gntdev_grant_ref* grants) > { > struct grant_map *add; > + int i; > > - add = kzalloc(sizeof(struct grant_map), GFP_KERNEL); > - if (NULL == add) > + add = kzalloc(sizeof(*add) + sizeof(add->pages[0])*count, GFP_KERNEL); > + if (!add) > return NULL; > > - add->grants = kzalloc(sizeof(add->grants[0]) * count, GFP_KERNEL); > - add->map_ops = kzalloc(sizeof(add->map_ops[0]) * count, GFP_KERNEL); > - add->unmap_ops = kzalloc(sizeof(add->unmap_ops[0]) * count, GFP_KERNEL); > - if (NULL == add->grants || > - NULL == add->map_ops || > - NULL == add->unmap_ops) > - goto err; > - > - add->index = 0; > add->count = count; > - add->priv = priv; > + for(i = 0; i < count; i++) > + add->pages[i].target = grants[i]; > > return add; > - > -err: > - kfree(add->grants); > - kfree(add->map_ops); > - kfree(add->unmap_ops); > - kfree(add); > - return NULL; > } > > static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add) > { > struct grant_map *map; > > + spin_lock(&priv->lock); > list_for_each_entry(map, &priv->maps, next) { > if (add->index + add->count < map->index) { > list_add_tail(&add->next, &map->next); > @@ -127,8 +121,10 @@ static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add) > list_add_tail(&add->next, &priv->maps); > > done: > + add->priv = priv; > if (debug) > gntdev_print_maps(priv, "[new]", add->index); > + spin_unlock(&priv->lock); > } > > static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int index, > @@ -169,9 +165,10 @@ static int gntdev_del_map(struct grant_map *map) > > if (map->vma) > return -EBUSY; > - for (i = 0; i < map->count; i++) > - if (map->unmap_ops[i].handle) > - return -EBUSY; > + if (map->is_mapped) > + for (i = 0; i < map->count; i++) > + if (map->pages[i].handle) > + return -EBUSY; > > atomic_sub(map->count, &pages_mapped); > list_del(&map->next); > @@ -182,9 +179,6 @@ 
static void gntdev_free_map(struct grant_map *map) > { > if (!map) > return; > - kfree(map->grants); > - kfree(map->map_ops); > - kfree(map->unmap_ops); > kfree(map); > } > > @@ -199,51 +193,91 @@ static int find_grant_ptes(pte_t *pte, pgtable_t token, unsigned long addr, void > BUG_ON(pgnr >= map->count); > pte_maddr = (u64)pfn_to_mfn(page_to_pfn(token)) << PAGE_SHIFT; > pte_maddr += (unsigned long)pte & ~PAGE_MASK; > - gnttab_set_map_op(&map->map_ops[pgnr], pte_maddr, map->flags, > - map->grants[pgnr].ref, > - map->grants[pgnr].domid); > - gnttab_set_unmap_op(&map->unmap_ops[pgnr], pte_maddr, map->flags, > - 0 /* handle */); > + map->pages[pgnr].pte_maddr = pte_maddr; > + > return 0; > } > > static int map_grant_pages(struct grant_map *map) > { > - int i, err = 0; > + int i, flags, err = 0; > + struct gnttab_map_grant_ref* map_ops; > > + flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte; > + if (map->is_ro) > + flags |= GNTMAP_readonly; > + > + map_ops = kzalloc(sizeof(map_ops[0]) * map->count, GFP_TEMPORARY); > + if (!map_ops) > + return -ENOMEM; > + > + for(i=0; i < map->count; i++) > + gnttab_set_map_op(&map_ops[i], map->pages[i].pte_maddr, flags, > + map->pages[i].target.ref, > + map->pages[i].target.domid); > if (debug) > printk("%s: map %d+%d\n", __FUNCTION__, map->index, map->count); > err = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, > - map->map_ops, map->count); > + map_ops, map->count); > if (WARN_ON(err)) > - return err; > + goto out; > > for (i = 0; i < map->count; i++) { > - if (map->map_ops[i].status) > + if (map_ops[i].status) { > + map->pages[i].pte_maddr = 0; > err = -EINVAL; > - map->unmap_ops[i].handle = map->map_ops[i].handle; > + } else { > + map->pages[i].handle = map_ops[i].handle; > + } > } > + > +out: > + kfree(map_ops); > return err; > } > > -static int unmap_grant_pages(struct grant_map *map, int offset, int pages) > +static void unmap_grant_pages(struct grant_map *map, int offset, int pages) > { > - int i, err = 0; > + int i, flags, err = 0; > + struct gnttab_unmap_grant_ref *unmap_ops; > + struct gnttab_unmap_grant_ref unmap_single; > + > + if (pages > 1) { > + unmap_ops = kzalloc(sizeof(unmap_ops[0]) * pages, > + GFP_TEMPORARY); > + if (unlikely(!unmap_ops)) { > + for(i=0; i < pages; i++) > + unmap_grant_pages(map, offset + i, 1); > + return; > + } > + } else { > + unmap_ops = &unmap_single; > + } > > + flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte; > + if (map->is_ro) > + flags |= GNTMAP_readonly; > + > + for(i=0; i < pages; i++) > + gnttab_set_unmap_op(&unmap_ops[i], > + map->pages[offset+i].pte_maddr, flags, > + map->pages[offset+i].handle); > if (debug) > printk("%s: map %d+%d [%d+%d]\n", __FUNCTION__, > map->index, map->count, offset, pages); > err = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, > - map->unmap_ops + offset, pages); > + unmap_ops, pages); > if (WARN_ON(err)) > - return err; > + goto out; > > for (i = 0; i < pages; i++) { > - if (map->unmap_ops[offset+i].status) > - err = -EINVAL; > - map->unmap_ops[offset+i].handle = 0; > + WARN_ON(unmap_ops[i].status); > + map->pages[offset+i].pte_maddr = 0; > + map->pages[offset+i].handle = 0; > } > - return err; > +out: > + if (unmap_ops != &unmap_single) > + kfree(unmap_ops); > } > > /* ------------------------------------------------------------------ */ > @@ -282,7 +316,6 @@ static void mn_invl_range_start(struct mmu_notifier *mn, > struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn); > struct grant_map *map; > unsigned long 
mstart, mend; > - int err; > > spin_lock(&priv->lock); > list_for_each_entry(map, &priv->maps, next) { > @@ -301,10 +334,9 @@ static void mn_invl_range_start(struct mmu_notifier *mn, > __FUNCTION__, map->index, map->count, > map->vma->vm_start, map->vma->vm_end, > start, end, mstart, mend); > - err = unmap_grant_pages(map, > - (mstart - map->vma->vm_start) >> PAGE_SHIFT, > - (mend - mstart) >> PAGE_SHIFT); > - WARN_ON(err); > + unmap_grant_pages(map, > + (mstart - map->vma->vm_start) >> PAGE_SHIFT, > + (mend - mstart) >> PAGE_SHIFT); > } > spin_unlock(&priv->lock); > } > @@ -321,7 +353,6 @@ static void mn_release(struct mmu_notifier *mn, > { > struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn); > struct grant_map *map; > - int err; > > spin_lock(&priv->lock); > list_for_each_entry(map, &priv->maps, next) { > @@ -331,8 +362,7 @@ static void mn_release(struct mmu_notifier *mn, > printk("%s: map %d+%d (%lx %lx)\n", > __FUNCTION__, map->index, map->count, > map->vma->vm_start, map->vma->vm_end); > - err = unmap_grant_pages(map, 0, map->count); > - WARN_ON(err); > + unmap_grant_pages(map, 0, map->count); > } > spin_unlock(&priv->lock); > } > @@ -401,6 +431,7 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv, > { > struct ioctl_gntdev_map_grant_ref op; > struct grant_map *map; > + struct ioctl_gntdev_grant_ref* grants; > int err; > > if (copy_from_user(&op, u, sizeof(op)) != 0) > @@ -411,38 +442,45 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv, > if (unlikely(op.count <= 0)) > return -EINVAL; > > + grants = kmalloc(sizeof(grants[0]) * op.count, GFP_TEMPORARY); > + if (!grants) > + return -ENOMEM; > + > + if (copy_from_user(grants, u->refs, sizeof(grants[0]) * op.count)) { > + err = -EFAULT; > + goto out_free; > + } > + > err = -ENOMEM; > - map = gntdev_alloc_map(priv, op.count); > + map = gntdev_alloc_map(op.count, grants); > if (!map) > - return err; > - > - if (copy_from_user(map->grants, &u->refs, > - sizeof(map->grants[0]) * op.count) != 0) { > - gntdev_free_map(map); > - return err; > - } > + goto out_free; > > if (unlikely(atomic_add_return(op.count, &pages_mapped) > limit)) > { > if (debug) > printk("%s: can''t map: over limit\n", __FUNCTION__); > - gntdev_free_map(map); > - return err; > + goto out_free_map; > } > > - spin_lock(&priv->lock); > gntdev_add_map(priv, map); > op.index = map->index << PAGE_SHIFT; > - spin_unlock(&priv->lock); > > - if (copy_to_user(u, &op, sizeof(op)) != 0) { > - spin_lock(&priv->lock); > - gntdev_del_map(map); > - spin_unlock(&priv->lock); > - gntdev_free_map(map); > - return err; > + if (copy_to_user(u, &op, sizeof(op))) { > + err = -EFAULT; > + goto out_remove; > } > - return 0; > + err = 0; > +out_free: > + kfree(grants); > + return err; > +out_remove: > + spin_lock(&priv->lock); > + gntdev_del_map(map); > + spin_unlock(&priv->lock); > +out_free_map: > + gntdev_free_map(map); > + goto out_free; > } > > static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv, > @@ -556,10 +594,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma) > > vma->vm_private_data = map; > map->vma = vma; > - > - map->flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte; > - if (!(vma->vm_flags & VM_WRITE)) > - map->flags |= GNTMAP_readonly; > + map->is_ro = !(vma->vm_flags & VM_WRITE); > > spin_unlock(&priv->lock); >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
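The space saving in this patch comes mostly from the union in struct granted_page: the (domid, ref) target is only needed up to the map hypercall, after which only the handle must be kept for unmapping. A rough sketch of how one pages[] slot is reused across those steps (names follow the patch; this is an illustration, not additional driver code):

    static void granted_page_lifecycle(struct granted_page *pg, int flags,
                                       struct gnttab_map_grant_ref *map_op,
                                       struct gnttab_unmap_grant_ref *unmap_op)
    {
            /* 1. ioctl: pg->target holds the (domid, ref) pair from userspace. */
            gnttab_set_map_op(map_op, pg->pte_maddr, flags,
                              pg->target.ref, pg->target.domid);

            /* 2. after GNTTABOP_map_grant_ref succeeds, the target is no longer
             *    needed, so the same union slot stores the returned handle. */
            pg->handle = map_op->handle;

            /* 3. unmap: only pte_maddr and the handle are required. */
            gnttab_set_unmap_op(unmap_op, pg->pte_maddr, flags, pg->handle);
    }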
Jeremy Fitzhardinge
2010-Dec-14 21:20 UTC
Re: [Xen-devel] [PATCH 4/6] xen-gntdev: Use find_vma rather than iterating our vma list manually
On 12/14/2010 06:55 AM, Daniel De Graaf wrote:> This should be faster if many mappings exist,Yes, seems reasonable.> and also removes > the only user of vma that is not related to PTE modification.What do you mean? Oh, you mean map->vma?> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> > --- > drivers/xen/gntdev.c | 34 ++++++++++------------------------ > 1 files changed, 10 insertions(+), 24 deletions(-) > > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c > index 773b76c..a73f07c 100644 > --- a/drivers/xen/gntdev.c > +++ b/drivers/xen/gntdev.c > @@ -74,6 +74,8 @@ struct grant_map { > struct granted_page pages[0]; > }; > > +static struct vm_operations_struct gntdev_vmops;Is it OK for this to be all NULL?> + > /* ------------------------------------------------------------------ */ > > static void gntdev_print_maps(struct gntdev_priv *priv, > @@ -142,23 +144,6 @@ static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int ind > return NULL; > } > > -static struct grant_map *gntdev_find_map_vaddr(struct gntdev_priv *priv, > - unsigned long vaddr) > -{ > - struct grant_map *map; > - > - list_for_each_entry(map, &priv->maps, next) { > - if (!map->vma) > - continue; > - if (vaddr < map->vma->vm_start) > - continue; > - if (vaddr >= map->vma->vm_end) > - continue; > - return map; > - } > - return NULL; > -} > - > static int gntdev_del_map(struct grant_map *map) > { > int i; > @@ -510,6 +495,7 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv, > struct ioctl_gntdev_get_offset_for_vaddr __user *u) > { > struct ioctl_gntdev_get_offset_for_vaddr op; > + struct vm_area_struct *vma; > struct grant_map *map; > > if (copy_from_user(&op, u, sizeof(op)) != 0) > @@ -518,16 +504,16 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv, > printk("%s: priv %p, offset for vaddr %lx\n", __FUNCTION__, priv, > (unsigned long)op.vaddr); > > - spin_lock(&priv->lock); > - map = gntdev_find_map_vaddr(priv, op.vaddr); > - if (map == NULL || > - map->vma->vm_start != op.vaddr) { > - spin_unlock(&priv->lock); > + vma = find_vma(current->mm, op.vaddr); > + if (!vma) > return -EINVAL; > - } > + > + map = vma->vm_private_data; > + if (vma->vm_ops != &gntdev_vmops || !map) > + return -EINVAL;I know this is perfectly OK, but I''d be happier if you don''t refer to vm_private_data as "map" until the vma has been confirmed to be a gntdev one.> + > op.offset = map->index << PAGE_SHIFT; > op.count = map->count; > - spin_unlock(&priv->lock); > > if (copy_to_user(u, &op, sizeof(op)) != 0) > return -EFAULT;J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
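The reordering asked for above is small: confirm the vma belongs to gntdev before treating vm_private_data as a grant_map. A sketch of the same lookup with the checks swapped (logic otherwise identical to the patch):

    vma = find_vma(current->mm, op.vaddr);
    if (!vma || vma->vm_ops != &gntdev_vmops)
            return -EINVAL;         /* not one of our mappings */

    map = vma->vm_private_data;     /* now known to be a grant_map */
    if (!map)
            return -EINVAL;

    op.offset = map->index << PAGE_SHIFT;
    op.count  = map->count;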
Daniel De Graaf
2010-Dec-14 21:40 UTC
Re: [Xen-devel] [PATCH 1/6] xen-gntdev: Fix circular locking dependency
On 12/14/2010 04:11 PM, Jeremy Fitzhardinge wrote:
> On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
>> apply_to_page_range will acquire PTE lock while priv->lock is held, and
>> mn_invl_range_start tries to acquire priv->lock with PTE already held.
>> Fix by not holding priv->lock during the entire map operation.
>
> Is priv->lock needed to protect the contents of map?
>
> J

No, since the map can only be mapped once (checked by map->vma assignment
while the lock is held). The unmap ioctl will return -EBUSY until an munmap()
is called on the area, so it also can't race, and the other users are only
active once the mmap operation completes.

>
>> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
>> ---
>> drivers/xen/gntdev.c | 19 +++++++++----------
>> 1 files changed, 9 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>> index a33e443..c582804 100644
>> --- a/drivers/xen/gntdev.c
>> +++ b/drivers/xen/gntdev.c
>> @@ -581,23 +581,22 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>> if (!(vma->vm_flags & VM_WRITE))
>> map->flags |= GNTMAP_readonly;
>>
>> + spin_unlock(&priv->lock);
>> +
>> err = apply_to_page_range(vma->vm_mm, vma->vm_start,
>> vma->vm_end - vma->vm_start,
>> find_grant_ptes, map);
>> - if (err) {
>> - goto unlock_out;
>> - if (debug)
>> - printk("%s: find_grant_ptes() failure.\n", __FUNCTION__);
>> - }
>> + if (err)
>> + return err;
>>
>> err = map_grant_pages(map);
>> - if (err) {
>> - goto unlock_out;
>> - if (debug)
>> - printk("%s: map_grant_pages() failure.\n", __FUNCTION__);
>> - }
>> + if (err)
>> + return err;
>> +
>> map->is_mapped = 1;
>>
>> + return 0;
>> +
>> unlock_out:
>> spin_unlock(&priv->lock);
>> return err;
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
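For readers following the locking change, the inversion described in the commit message can be drawn as two call chains (a simplified restatement, not new analysis):

    /*
     * mmap path (before the patch):
     *   gntdev_mmap()
     *     spin_lock(&priv->lock);
     *     apply_to_page_range()       -> acquires the PTE lock
     *
     * mmu-notifier path:
     *   mn_invl_range_start()         <- entered with the PTE lock held
     *     spin_lock(&priv->lock);
     *
     * The two paths take the same locks in opposite order, hence the
     * potential deadlock; the fix drops priv->lock before
     * apply_to_page_range() and map_grant_pages().
     */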
Daniel De Graaf
2010-Dec-14 21:42 UTC
Re: [Xen-devel] [PATCH 2/6] xen-gntdev: Change page limit to be global instead of per-open
On 12/14/2010 04:12 PM, Jeremy Fitzhardinge wrote:> On 12/14/2010 06:55 AM, Daniel De Graaf wrote: >> Because there is no limitation on how many times a user can open a >> given device file, an per-file-description limit on the number of >> pages granted offers little to no benefit. Change to a global limit >> and remove the ioctl() as the parameter can now be changed via sysfs. > > Does anyone use this ioctl? Wouldn''t it be safer to replace it with a > no-op version? > > J >I do not know of any users. If it''s preferred to replace with a noop ioctl instead of the current -ENOSYS return, that''s easy to do.>> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> >> --- >> drivers/xen/gntdev.c | 48 ++++++++++++++---------------------------------- >> 1 files changed, 14 insertions(+), 34 deletions(-) >> >> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c >> index c582804..62639af 100644 >> --- a/drivers/xen/gntdev.c >> +++ b/drivers/xen/gntdev.c >> @@ -44,13 +44,12 @@ MODULE_DESCRIPTION("User-space granted page access driver"); >> >> static int debug = 0; >> module_param(debug, int, 0644); >> -static int limit = 1024; >> +static int limit = 1024*1024; >> module_param(limit, int, 0644); >> +static atomic_t pages_mapped = ATOMIC_INIT(0); >> >> struct gntdev_priv { >> struct list_head maps; >> - uint32_t used; >> - uint32_t limit; >> spinlock_t lock; >> struct mm_struct *mm; >> struct mmu_notifier mn; >> @@ -76,8 +75,8 @@ static void gntdev_print_maps(struct gntdev_priv *priv, >> { >> struct grant_map *map; >> >> - printk("%s: maps list (priv %p, usage %d/%d)\n", >> - __FUNCTION__, priv, priv->used, priv->limit); >> + printk("%s: maps list (priv %p)\n", >> + __FUNCTION__, priv); >> list_for_each_entry(map, &priv->maps, next) >> printk(" index %2d, count %2d %s\n", >> map->index, map->count, >> @@ -104,9 +103,6 @@ static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count) >> add->count = count; >> add->priv = priv; >> >> - if (add->count + priv->used > priv->limit) >> - goto err; >> - >> return add; >> >> err: >> @@ -131,7 +127,6 @@ static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add) >> list_add_tail(&add->next, &priv->maps); >> >> done: >> - priv->used += add->count; >> if (debug) >> gntdev_print_maps(priv, "[new]", add->index); >> } >> @@ -178,7 +173,7 @@ static int gntdev_del_map(struct grant_map *map) >> if (map->unmap_ops[i].handle) >> return -EBUSY; >> >> - map->priv->used -= map->count; >> + atomic_sub(map->count, &pages_mapped); >> list_del(&map->next); >> return 0; >> } >> @@ -360,7 +355,6 @@ static int gntdev_open(struct inode *inode, struct file *flip) >> >> INIT_LIST_HEAD(&priv->maps); >> spin_lock_init(&priv->lock); >> - priv->limit = limit; >> >> priv->mm = get_task_mm(current); >> if (!priv->mm) { >> @@ -416,19 +410,26 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv, >> op.count); >> if (unlikely(op.count <= 0)) >> return -EINVAL; >> - if (unlikely(op.count > priv->limit)) >> - return -EINVAL; >> >> err = -ENOMEM; >> map = gntdev_alloc_map(priv, op.count); >> if (!map) >> return err; >> + >> if (copy_from_user(map->grants, &u->refs, >> sizeof(map->grants[0]) * op.count) != 0) { >> gntdev_free_map(map); >> return err; >> } >> >> + if (unlikely(atomic_add_return(op.count, &pages_mapped) > limit)) >> + { >> + if (debug) >> + printk("%s: can''t map: over limit\n", __FUNCTION__); >> + gntdev_free_map(map); >> + return err; >> + } >> + >> spin_lock(&priv->lock); >> gntdev_add_map(priv, map); >> op.index = 
map->index << PAGE_SHIFT; >> @@ -495,24 +496,6 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv, >> return 0; >> } >> >> -static long gntdev_ioctl_set_max_grants(struct gntdev_priv *priv, >> - struct ioctl_gntdev_set_max_grants __user *u) >> -{ >> - struct ioctl_gntdev_set_max_grants op; >> - >> - if (copy_from_user(&op, u, sizeof(op)) != 0) >> - return -EFAULT; >> - if (debug) >> - printk("%s: priv %p, limit %d\n", __FUNCTION__, priv, op.count); >> - if (op.count > limit) >> - return -EINVAL; >> - >> - spin_lock(&priv->lock); >> - priv->limit = op.count; >> - spin_unlock(&priv->lock); >> - return 0; >> -} >> - >> static long gntdev_ioctl(struct file *flip, >> unsigned int cmd, unsigned long arg) >> { >> @@ -529,9 +512,6 @@ static long gntdev_ioctl(struct file *flip, >> case IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR: >> return gntdev_ioctl_get_offset_for_vaddr(priv, ptr); >> >> - case IOCTL_GNTDEV_SET_MAX_GRANTS: >> - return gntdev_ioctl_set_max_grants(priv, ptr); >> - >> default: >> if (debug) >> printk("%s: priv %p, unknown cmd %x\n", >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
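The accounting behind the new global counter reduces to a reserve/release pair against a module-wide limit. A generic sketch of that pattern (not a copy of the driver code; the rollback on failure is shown for completeness):

    static atomic_t pages_mapped = ATOMIC_INIT(0);
    static int limit = 1024*1024;   /* module parameter, adjustable via sysfs */

    static int reserve_pages(int count)
    {
            if (atomic_add_return(count, &pages_mapped) > limit) {
                    atomic_sub(count, &pages_mapped);  /* roll back the reservation */
                    return -ENOMEM;
            }
            return 0;
    }

    static void release_pages(int count)
    {
            atomic_sub(count, &pages_mapped);
    }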
Jeremy Fitzhardinge
2010-Dec-14 21:42 UTC
Re: [Xen-devel] [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver
On 12/14/2010 06:55 AM, Daniel De Graaf wrote:> This allows a userspace application to allocate a shared page for > implementing inter-domain communication or device drivers. These > shared pages can be mapped using the gntdev device or by the kernel > in another domain. > > Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> > --- > drivers/xen/Kconfig | 7 + > drivers/xen/Makefile | 2 + > drivers/xen/gntalloc.c | 456 ++++++++++++++++++++++++++++++++++++++++++++++++ > include/xen/gntalloc.h | 68 +++++++ > 4 files changed, 533 insertions(+), 0 deletions(-) > create mode 100644 drivers/xen/gntalloc.c > create mode 100644 include/xen/gntalloc.h > > diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig > index fa9982e..8398cb0 100644 > --- a/drivers/xen/Kconfig > +++ b/drivers/xen/Kconfig > @@ -180,6 +180,13 @@ config XEN_GNTDEV > help > Allows userspace processes use grants. > > +config XEN_GRANT_DEV_ALLOC > + tristate "User-space grant reference allocator driver" > + depends on XEN > + help > + Allows userspace processes to create pages with access granted > + to other domains. > + > config XEN_S3 > def_bool y > depends on XEN_DOM0 && ACPI > diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile > index ef1ea63..9814c1d 100644 > --- a/drivers/xen/Makefile > +++ b/drivers/xen/Makefile > @@ -10,6 +10,7 @@ obj-$(CONFIG_XEN_XENCOMM) += xencomm.o > obj-$(CONFIG_XEN_BALLOON) += balloon.o > obj-$(CONFIG_XEN_DEV_EVTCHN) += xen-evtchn.o > obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev.o > +obj-$(CONFIG_XEN_GRANT_DEV_ALLOC) += xen-gntalloc.o > obj-$(CONFIG_XEN_PCIDEV_BACKEND) += pciback/ > obj-$(CONFIG_XEN_BLKDEV_BACKEND) += blkback/ > obj-$(CONFIG_XEN_BLKDEV_TAP) += blktap/ > @@ -25,3 +26,4 @@ obj-$(CONFIG_XEN_PLATFORM_PCI) += platform-pci.o > > xen-evtchn-y := evtchn.o > xen-gntdev-y := gntdev.o > +xen-gntalloc-y := gntalloc.o > diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c > new file mode 100644 > index 0000000..f26adfd > --- /dev/null > +++ b/drivers/xen/gntalloc.c > @@ -0,0 +1,456 @@ > +/****************************************************************************** > + * gntalloc.c > + * > + * Device for creating grant references (in user-space) that may be shared > + * with other domains. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program; if not, write to the Free Software > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA > + */ > + > +/* > + * This driver exists to allow userspace programs in Linux to allocate kernel > + * memory that will later be shared with another domain. Without this device, > + * Linux userspace programs cannot create grant references. > + * > + * How this stuff works: > + * X -> granting a page to Y > + * Y -> mapping the grant from X > + * > + * 1. X uses the gntalloc device to allocate a page of kernel memory, P. > + * 2. X creates an entry in the grant table that says domid(Y) can > + * access P. > + * 3. X gives the grant reference identifier, GREF, to Y. > + * 4. A program in Y uses the gntdev device to map the page (owned by X > + * and identified by GREF) into domain(Y) and then into the address > + * space of the program. 
Behind the scenes, this requires a > + * hypercall in which Xen modifies the host CPU page tables to > + * perform the sharing -- that''s where the actual cross-domain mapping > + * occurs.Presumably Y could be any grant-page user, not specifically gntdev? So you could use this to implement a frontend in userspace?> + * 5. A program in X mmap()s a segment of the gntalloc device that > + * corresponds to the shared page. > + * 6. The two userspace programs can now communicate over the shared page. > + * > + * > + * NOTE TO USERSPACE LIBRARIES: > + * The grant allocation and mmap()ing are, naturally, two separate > + * operations. You set up the sharing by calling the create ioctl() and > + * then the mmap(). You must tear down the sharing in the reverse order > + * (munmap() and then the destroy ioctl()). > + * > + * WARNING: Since Xen does not allow a guest to forcibly end the use of a grant > + * reference, this device can be used to consume kernel memory by leaving grant > + * references mapped by another domain when an application exits. Therefore, > + * there is a global limit on the number of pages that can be allocated. When > + * all references to the page are unmapped, it will be freed during the next > + * grant operation. > + */ > + > +#include <asm/atomic.h> > +#include <linux/module.h> > +#include <linux/miscdevice.h> > +#include <linux/kernel.h> > +#include <linux/init.h> > +#include <linux/slab.h> > +#include <linux/fs.h> > +#include <linux/device.h> > +#include <linux/mm.h> > +#include <asm/uaccess.h> > +#include <linux/types.h> > +#include <linux/list.h> > + > +#include <xen/xen.h> > +#include <xen/page.h> > +#include <xen/grant_table.h> > +#include <xen/gntalloc.h> > + > +static int debug = 0; > +module_param(debug, int, 0644); > + > +static int limit = 1024; > +module_param(limit, int, 0644); > + > +static LIST_HEAD(gref_list); > +static DEFINE_SPINLOCK(gref_lock); > +static int gref_size = 0; > + > +/* Metadata on a grant reference. */ > +struct gntalloc_gref { > + struct list_head next_all; /* list entry gref_list */ > + struct list_head next_file; /* list entry file->list, if open */ > + domid_t foreign_domid; /* The ID of the domain to share with. */ > + grant_ref_t gref_id; /* The grant reference number. */ > + unsigned int users; /* Use count - when zero, waiting on Xen */ > + struct page* page; /* The shared page. */ > +}; > + > +struct gntalloc_file_private_data { > + struct list_head list; > +}; > + > +static void __del_gref(struct gntalloc_gref *gref); > + > +static void do_cleanup(void) > +{ > + struct gntalloc_gref *gref, *n; > + list_for_each_entry_safe(gref, n, &gref_list, next_all) { > + if (!gref->users) > + __del_gref(gref); > + } > +} > + > + > +static int add_gref(domid_t foreign_domid, uint32_t readonly, > + struct gntalloc_file_private_data *priv) > +{ > + int rc; > + struct gntalloc_gref *gref; > + > + rc = -ENOMEM; > + spin_lock(&gref_lock); > + do_cleanup(); > + if (gref_size >= limit) { > + spin_unlock(&gref_lock); > + rc = -ENOSPC; > + goto out; > + } > + gref_size++; > + spin_unlock(&gref_lock); > + > + gref = kzalloc(sizeof(*gref), GFP_KERNEL); > + if (!gref) > + goto out; > + > + gref->foreign_domid = foreign_domid; > + gref->users = 1; > + > + /* Allocate the page to share. */ > + gref->page = alloc_page(GFP_KERNEL|__GFP_ZERO);Could this be GFP_HIGHUSER?> + if (!gref->page) > + goto out_nopage; > + > + /* Grant foreign access to the page. 
*/ > + gref->gref_id = gnttab_grant_foreign_access(foreign_domid, > + pfn_to_mfn(page_to_pfn(gref->page)), readonly); > + if (gref->gref_id < 0) { > + printk(KERN_ERR "%s: failed to grant foreign access for mfn " > + "%lu to domain %u\n", __func__, > + pfn_to_mfn(page_to_pfn(gref->page)), foreign_domid); > + rc = -EFAULT; > + goto out_no_foreign_gref; > + } > + > + /* Add to gref lists. */ > + spin_lock(&gref_lock); > + list_add_tail(&gref->next_all, &gref_list); > + list_add_tail(&gref->next_file, &priv->list); > + spin_unlock(&gref_lock); > + > + return gref->gref_id; > + > +out_no_foreign_gref: > + __free_page(gref->page); > +out_nopage: > + kfree(gref); > +out: > + return rc; > +} > + > +static void __del_gref(struct gntalloc_gref *gref) > +{ > + if (gnttab_query_foreign_access(gref->gref_id)) > + return; > + > + if (!gnttab_end_foreign_access_ref(gref->gref_id, 0)) > + return; > + > + gref_size--; > + list_del(&gref->next_all); > + > + __free_page(gref->page); > + kfree(gref); > +} > + > +static struct gntalloc_gref* find_gref(struct gntalloc_file_private_data *priv, > + grant_ref_t gref_id) > +{ > + struct gntalloc_gref *gref; > + list_for_each_entry(gref, &priv->list, next_file) { > + if (gref->gref_id == gref_id) > + return gref; > + } > + return NULL; > +} > + > +/* > + * ------------------------------------- > + * File operations. > + * ------------------------------------- > + */ > +static int gntalloc_open(struct inode *inode, struct file *filp) > +{ > + struct gntalloc_file_private_data *priv; > + > + try_module_get(THIS_MODULE); > + > + priv = kzalloc(sizeof(*priv), GFP_KERNEL); > + if (!priv) > + goto out_nomem; > + INIT_LIST_HEAD(&priv->list); > + > + filp->private_data = priv; > + > + if (debug) > + printk("%s: priv %p\n", __FUNCTION__, priv); > + > + return 0; > + > +out_nomem: > + return -ENOMEM; > +} > + > +static int gntalloc_release(struct inode *inode, struct file *filp) > +{ > + struct gntalloc_file_private_data *priv = filp->private_data; > + struct gntalloc_gref *gref; > + > + if (debug) > + printk("%s: priv %p\n", __FUNCTION__, priv); > + > + spin_lock(&gref_lock);Presumably this is unnecessary because there can be no other users if you''re tearing down the list and destroying priv.> + while (!list_empty(&priv->list)) { > + gref = list_entry(priv->list.next, > + struct gntalloc_gref, next_file); > + list_del(&gref->next_file); > + gref->users--; > + if (gref->users == 0) > + __del_gref(gref); > + } > + kfree(priv); > + spin_unlock(&gref_lock); > + > + module_put(THIS_MODULE); > + > + return 0; > +} > + > +static long gntalloc_ioctl_alloc(struct gntalloc_file_private_data *priv, > + void __user *arg) > +{ > + int rc = 0; > + struct ioctl_gntalloc_alloc_gref op; > + > + if (debug) > + printk("%s: priv %p\n", __FUNCTION__, priv); > + > + if (copy_from_user(&op, arg, sizeof(op))) { > + rc = -EFAULT; > + goto alloc_grant_out; > + } > + rc = add_gref(op.foreign_domid, op.readonly, priv); > + if (rc < 0) > + goto alloc_grant_out; > + > + op.gref_id = rc; > + op.page_idx = rc;Hm, see below.> + > + rc = 0; > + > + if (copy_to_user((void __user *)arg, &op, sizeof(op))) { > + rc = -EFAULT; > + goto alloc_grant_out; > + } > + > +alloc_grant_out: > + return rc; > +} > + > +static long gntalloc_ioctl_dealloc(struct gntalloc_file_private_data *priv, > + void __user *arg) > +{ > + int rc = 0; > + struct ioctl_gntalloc_dealloc_gref op; > + struct gntalloc_gref *gref; > + > + if (debug) > + printk("%s: priv %p\n", __FUNCTION__, priv); > + > + if (copy_from_user(&op, arg, 
sizeof(op))) { > + rc = -EFAULT; > + goto dealloc_grant_out; > + } > + > + spin_lock(&gref_lock); > + gref = find_gref(priv, op.gref_id); > + if (gref) { > + list_del(&gref->next_file); > + gref->users--; > + rc = 0; > + } else { > + rc = -EINVAL; > + } > + > + do_cleanup(); > + spin_unlock(&gref_lock); > +dealloc_grant_out: > + return rc; > +} > + > +static long gntalloc_ioctl(struct file *filp, unsigned int cmd, > + unsigned long arg) > +{ > + struct gntalloc_file_private_data *priv = filp->private_data; > + > + switch (cmd) { > + case IOCTL_GNTALLOC_ALLOC_GREF: > + return gntalloc_ioctl_alloc(priv, (void __user*)arg); > + > + case IOCTL_GNTALLOC_DEALLOC_GREF: > + return gntalloc_ioctl_dealloc(priv, (void __user*)arg); > + > + default: > + return -ENOIOCTLCMD; > + } > + > + return 0; > +} > + > +static int gntalloc_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf) > +{ > + struct gntalloc_gref *gref = vma->vm_private_data; > + if (!gref) > + return VM_FAULT_SIGBUS; > + > + vmf->page = gref->page; > + get_page(vmf->page); > + > + return 0; > +}; > + > +static void gntalloc_vma_close(struct vm_area_struct *vma) > +{ > + struct gntalloc_gref *gref = vma->vm_private_data; > + if (!gref) > + return; > + > + spin_lock(&gref_lock); > + gref->users--; > + if (gref->users == 0) > + __del_gref(gref); > + spin_unlock(&gref_lock); > +} > + > +static struct vm_operations_struct gntalloc_vmops = { > + .fault = gntalloc_vma_fault, > + .close = gntalloc_vma_close, > +}; > + > +static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma) > +{ > + struct gntalloc_file_private_data *priv = filp->private_data; > + struct gntalloc_gref *gref; > + > + if (debug) > + printk("%s: priv %p, page %lu\n", __func__, > + priv, vma->vm_pgoff); > + > + /* > + * There is a 1-to-1 correspondence of grant references to shared > + * pages, so it only makes sense to map exactly one page per > + * call to mmap(). > + */Single-page mmap makes sense if the only possible use-cases are for single-page mappings, but if you''re talking about framebuffers and the like is seems like a very awkward way to use mmap. It would be cleaner from an API perspective to have a user-mode defined flat address space indexed by pgoff which maps to an array of grefs, so you can sensibly do a multi-page mapping. It would also allow you to hide the grefs from usermode entirely. Then its just up to usermode to choose suitable file offsets for itself.> + if (((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) != 1) { > + printk(KERN_ERR "%s: Only one page can be memory-mapped " > + "per grant reference.\n", __func__); > + return -EINVAL; > + } > + > + if (!(vma->vm_flags & VM_SHARED)) { > + printk(KERN_ERR "%s: Mapping must be shared.\n", > + __func__); > + return -EINVAL; > + } > + > + spin_lock(&gref_lock); > + gref = find_gref(priv, vma->vm_pgoff); > + if (gref == NULL) { > + spin_unlock(&gref_lock); > + printk(KERN_ERR "%s: Could not find a grant reference with " > + "page index %lu.\n", __func__, vma->vm_pgoff); > + return -ENOENT; > + } > + gref->users++; > + spin_unlock(&gref_lock); > + > + vma->vm_private_data = gref; > + > + /* This flag prevents Bad PTE errors when the memory is unmapped. 
*/ > + vma->vm_flags |= VM_RESERVED; > + vma->vm_flags |= VM_DONTCOPY; > + vma->vm_flags |= VM_IO;If you set VM_PFNMAP then you don''t need to deal with faults.> + > + vma->vm_ops = &gntalloc_vmops; > + > + return 0; > +} > + > +static const struct file_operations gntalloc_fops = { > + .owner = THIS_MODULE, > + .open = gntalloc_open, > + .release = gntalloc_release, > + .unlocked_ioctl = gntalloc_ioctl, > + .mmap = gntalloc_mmap > +}; > + > +/* > + * ------------------------------------- > + * Module creation/destruction. > + * ------------------------------------- > + */ > +static struct miscdevice gntalloc_miscdev = { > + .minor = MISC_DYNAMIC_MINOR, > + .name = "xen/gntalloc", > + .fops = &gntalloc_fops, > +}; > + > +static int __init gntalloc_init(void) > +{ > + int err; > + > + if (!xen_domain()) { > + if (debug) > + printk(KERN_ERR "gntalloc: You must be running Xen\n"); > + return -ENODEV; > + } > + > + err = misc_register(&gntalloc_miscdev); > + if (err != 0) { > + printk(KERN_ERR "Could not register misc gntalloc device\n"); > + return err; > + } > + > + if (debug) > + printk(KERN_INFO "Created grant allocation device at %d,%d\n", > + MISC_MAJOR, gntalloc_miscdev.minor); > + > + return 0; > +} > + > +static void __exit gntalloc_exit(void) > +{ > + misc_deregister(&gntalloc_miscdev); > +} > + > +module_init(gntalloc_init); > +module_exit(gntalloc_exit); > + > +MODULE_LICENSE("GPL"); > +MODULE_AUTHOR("Carter Weatherly <carter.weatherly@jhuapl.edu>, " > + "Daniel De Graaf <dgdegra@tycho.nsa.gov>"); > +MODULE_DESCRIPTION("User-space grant reference allocator driver"); > diff --git a/include/xen/gntalloc.h b/include/xen/gntalloc.h > new file mode 100644 > index 0000000..76b70d7 > --- /dev/null > +++ b/include/xen/gntalloc.h > @@ -0,0 +1,68 @@ > +/****************************************************************************** > + * gntalloc.h > + * > + * Interface to /dev/xen/gntalloc. > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License version 2 > + * as published by the Free Software Foundation; or, when distributed > + * separately from the Linux kernel or incorporated into other > + * software packages, subject to the following license: > + * > + * Permission is hereby granted, free of charge, to any person obtaining a copy > + * of this source file (the "Software"), to deal in the Software without > + * restriction, including without limitation the rights to use, copy, modify, > + * merge, publish, distribute, sublicense, and/or sell copies of the Software, > + * and to permit persons to whom the Software is furnished to do so, subject to > + * the following conditions: > + * > + * The above copyright notice and this permission notice shall be included in > + * all copies or substantial portions of the Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE > + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS > + * IN THE SOFTWARE. > + */ > + > +#ifndef __LINUX_PUBLIC_GNTALLOC_H__ > +#define __LINUX_PUBLIC_GNTALLOC_H__ > + > +/* > + * Allocates a new page and creates a new grant reference. > + * > + * N.B. 
The page_idx is really the address >> PAGE_SHIFT, meaning it''s the > + * page number and not an actual address. It must be shifted again prior > + * to feeding it to mmap() (i.e. page_idx << PAGE_SHIFT). > + */ > +#define IOCTL_GNTALLOC_ALLOC_GREF \ > +_IOC(_IOC_NONE, ''G'', 1, sizeof(struct ioctl_gntalloc_alloc_gref)) > +struct ioctl_gntalloc_alloc_gref { > + /* IN parameters */ > + /* The ID of the domain creating the grant reference. */ > + domid_t owner_domid; > + /* The ID of the domain to be given access to the grant. */ > + domid_t foreign_domid; > + /* The type of access given to domid. */ > + uint32_t readonly; > + /* OUT parameters */ > + /* The grant reference of the newly created grant. */ > + grant_ref_t gref_id; > + /* The page index (page number, NOT address) for grant mmap(). */ > + uint32_t page_idx; > +}; > + > +/* > + * Deallocates the grant reference, freeing the associated page. > + */ > +#define IOCTL_GNTALLOC_DEALLOC_GREF \ > +_IOC(_IOC_NONE, ''G'', 2, sizeof(struct ioctl_gntalloc_dealloc_gref)) > +struct ioctl_gntalloc_dealloc_gref { > + /* IN parameter */ > + /* The grant reference to deallocate. */ > + grant_ref_t gref_id; > +}; > +#endif /* __LINUX_PUBLIC_GNTALLOC_H__ */J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
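To make the intended calling sequence concrete, a minimal userspace sketch of the granting ("X") side, following the alloc-ioctl-then-mmap order described in the driver comments. It assumes the ioctl definitions from the new include/xen/gntalloc.h are visible to userspace and that the miscdevice appears as /dev/xen/gntalloc; the domid and message are example values, and error handling is trimmed:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <xen/gntalloc.h>

    int main(void)
    {
            struct ioctl_gntalloc_alloc_gref op;
            struct ioctl_gntalloc_dealloc_gref dop;
            long pagesize = sysconf(_SC_PAGESIZE);
            void *shared;
            int fd = open("/dev/xen/gntalloc", O_RDWR);

            memset(&op, 0, sizeof(op));
            op.foreign_domid = 1;   /* example: share a writable page with domain 1 */
            op.readonly = 0;
            if (ioctl(fd, IOCTL_GNTALLOC_ALLOC_GREF, &op) < 0)
                    return 1;

            /* page_idx is a page number; shift it back into a byte offset for mmap() */
            shared = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_SHARED,
                          fd, (off_t)op.page_idx * pagesize);
            strcpy(shared, "hello from the granting domain");
            printf("grant ref %u is ready for the peer domain\n", op.gref_id);

            /* teardown in reverse order: munmap() first, then the dealloc ioctl */
            munmap(shared, pagesize);
            dop.gref_id = op.gref_id;
            ioctl(fd, IOCTL_GNTALLOC_DEALLOC_GREF, &dop);
            close(fd);
            return 0;
    }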
Jeremy Fitzhardinge
2010-Dec-14 21:45 UTC
Re: [Xen-devel] [PATCH 6/6] xen-gntdev: Introduce HVM version of gntdev
On 12/14/2010 06:55 AM, Daniel De Graaf wrote:> HVM does not allow direct PTE modification, so guest pages must be > allocated and used for grant mappings. If Xen does not provide an > auto-translated physmap, the existing direct PTE modification is more > efficient.I haven''t looked at this in detail, but doing a complete duplication of the driver doesn''t seem like a great idea. Also, Stefano and I have been working on a modification to gntdev to make sure that each granted page is backed with a proper struct page (so that direct aio works on them). Would this help the hvm case? Thanks, J> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> > --- > drivers/xen/Makefile | 2 + > drivers/xen/gntdev-hvm.c | 601 ++++++++++++++++++++++++++++++++++++++++++++++ > drivers/xen/gntdev.c | 3 + > 3 files changed, 606 insertions(+), 0 deletions(-) > create mode 100644 drivers/xen/gntdev-hvm.c > > diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile > index 9814c1d..ab0e6eb 100644 > --- a/drivers/xen/Makefile > +++ b/drivers/xen/Makefile > @@ -10,6 +10,7 @@ obj-$(CONFIG_XEN_XENCOMM) += xencomm.o > obj-$(CONFIG_XEN_BALLOON) += balloon.o > obj-$(CONFIG_XEN_DEV_EVTCHN) += xen-evtchn.o > obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev.o > +obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev-hvm.o > obj-$(CONFIG_XEN_GRANT_DEV_ALLOC) += xen-gntalloc.o > obj-$(CONFIG_XEN_PCIDEV_BACKEND) += pciback/ > obj-$(CONFIG_XEN_BLKDEV_BACKEND) += blkback/ > @@ -26,4 +27,5 @@ obj-$(CONFIG_XEN_PLATFORM_PCI) += platform-pci.o > > xen-evtchn-y := evtchn.o > xen-gntdev-y := gntdev.o > +xen-gntdev-hvm-y := gntdev-hvm.o > xen-gntalloc-y := gntalloc.o > diff --git a/drivers/xen/gntdev-hvm.c b/drivers/xen/gntdev-hvm.c > new file mode 100644 > index 0000000..331d5af > --- /dev/null > +++ b/drivers/xen/gntdev-hvm.c > @@ -0,0 +1,601 @@ > +/****************************************************************************** > + * gntdev.c > + * > + * Device for accessing (in user-space) pages that have been granted by other > + * domains. > + * > + * Copyright (c) 2006-2007, D G Murray. > + * (c) 2009 Gerd Hoffmann <kraxel@redhat.com> > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program; if not, write to the Free Software > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA > + */ > + > +#include <linux/module.h> > +#include <linux/kernel.h> > +#include <linux/init.h> > +#include <linux/miscdevice.h> > +#include <linux/fs.h> > +#include <linux/mm.h> > +#include <linux/mman.h> > +#include <linux/types.h> > +#include <linux/uaccess.h> > +#include <linux/sched.h> > +#include <linux/spinlock.h> > +#include <linux/vmalloc.h> > + > +#include <xen/xen.h> > +#include <xen/grant_table.h> > +#include <xen/gntdev.h> > +#include <xen/interface/memory.h> > +#include <asm/xen/hypervisor.h> > +#include <asm/xen/hypercall.h> > +#include <asm/xen/page.h> > + > +MODULE_LICENSE("GPL"); > +MODULE_AUTHOR("Derek G. 
Murray <Derek.Murray@cl.cam.ac.uk>, " > + "Gerd Hoffmann <kraxel@redhat.com>"); > +MODULE_DESCRIPTION("User-space granted page access driver"); > + > +static int debug = 0; > +module_param(debug, int, 0644); > +static int limit = 1024*1024; > +module_param(limit, int, 0644); > +static atomic_t pages_mapped = ATOMIC_INIT(0); > + > +struct gntdev_priv { > + struct list_head maps; > + spinlock_t lock; > +}; > + > +struct granted_page { > + struct page* page; > + union { > + struct ioctl_gntdev_grant_ref target; > + grant_handle_t handle; > + }; > +}; > + > +struct grant_map { > + struct list_head next; > + int index; > + int count; > + atomic_t users; > + int is_mapped:1; > + int is_ro:1; > + struct granted_page pages[0]; > +}; > + > +static struct vm_operations_struct gntdev_vmops; > + > +/* ------------------------------------------------------------------ */ > + > +static void gntdev_print_maps(struct gntdev_priv *priv, > + char *text, int text_index) > +{ > + struct grant_map *map; > + > + printk("%s: maps list (priv %p)\n", __FUNCTION__, priv); > + list_for_each_entry(map, &priv->maps, next) { > + printk(" %p: %2d+%2d, r%c, %s %d,%d %s\n", map, > + map->index, map->count, map->is_ro ? ''o'' : ''w'', > + map->is_mapped ? "use,hnd" : "dom,ref", > + map->is_mapped ? atomic_read(&map->users) > + : map->pages[0].target.domid, > + map->is_mapped ? map->pages[0].handle > + : map->pages[0].target.ref, > + map->index == text_index && text ? text : ""); > + } > +} > + > +static struct grant_map *gntdev_alloc_map(int count, > + struct ioctl_gntdev_grant_ref* grants) > +{ > + struct grant_map *add; > + int i; > + > + add = kzalloc(sizeof(*add) + sizeof(add->pages[0])*count, GFP_KERNEL); > + if (!add) > + return NULL; > + > + atomic_set(&add->users, 1); > + add->count = count; > + > + for(i = 0; i < count; i++) > + add->pages[i].target = grants[i]; > + > + return add; > +} > + > +static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add) > +{ > + struct grant_map *map; > + > + spin_lock(&priv->lock); > + > + list_for_each_entry(map, &priv->maps, next) { > + if (add->index + add->count < map->index) { > + list_add_tail(&add->next, &map->next); > + goto done; > + } > + add->index = map->index + map->count; > + } > + list_add_tail(&add->next, &priv->maps); > + > +done: > + if (debug) > + gntdev_print_maps(priv, "[new]", add->index); > + > + spin_unlock(&priv->lock); > +} > + > +static void __gntdev_del_map(struct gntdev_priv *priv, struct grant_map *map) > +{ > + list_del(&map->next); > +} > + > +static void gntdev_del_map(struct gntdev_priv *priv, struct grant_map *map) > +{ > + spin_lock(&priv->lock); > + __gntdev_del_map(priv, map); > + spin_unlock(&priv->lock); > +} > + > +static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int index, > + int count) > +{ > + struct grant_map *map; > + > + list_for_each_entry(map, &priv->maps, next) { > + if (map->index != index) > + continue; > + if (map->count != count) > + continue; > + return map; > + } > + return NULL; > +} > + > +static void gntdev_unmap_fast(struct grant_map *map, > + struct gnttab_unmap_grant_ref *unmap_ops) > +{ > + int err, flags, i, unmap_size = 0; > + unsigned long pfn; > + phys_addr_t mfn; > + > + flags = GNTMAP_host_map; > + if (map->is_ro) > + flags |= GNTMAP_readonly; > + > + for (i=0; i < map->count; i++) { > + if (!map->pages[i].page) > + continue; > + pfn = page_to_pfn(map->pages[i].page); > + mfn = (phys_addr_t)pfn_to_kaddr(pfn); > + gnttab_set_unmap_op(&unmap_ops[unmap_size], mfn, flags, > + 
map->pages[i].handle); > + unmap_size++; > + } > + > + err = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, > + unmap_ops, unmap_size); > + WARN_ON(err); > + > + for (i = 0; i < unmap_size; i++) > + WARN_ON(unmap_ops[i].status); > +} > + > +// for the out-of-memory case > +static void gntdev_unmap_slow(struct grant_map *map) > +{ > + int err, flags, i; > + unsigned long pfn; > + phys_addr_t mfn; > + struct gnttab_unmap_grant_ref unmap_op; > + > + flags = GNTMAP_host_map; > + if (map->is_ro) > + flags |= GNTMAP_readonly; > + > + for (i=0; i < map->count; i++) { > + if (!map->pages[i].page) > + continue; > + > + pfn = page_to_pfn(map->pages[i].page); > + mfn = (phys_addr_t)pfn_to_kaddr(pfn); > + gnttab_set_unmap_op(&unmap_op, mfn, flags, map->pages[i].handle); > + err = HYPERVISOR_grant_table_op( > + GNTTABOP_unmap_grant_ref, &unmap_op, 1); > + WARN_ON(err); > + WARN_ON(unmap_op.status); > + } > +} > + > +static void gntdev_put_map(struct grant_map *map) > +{ > + struct gnttab_unmap_grant_ref *unmap_ops; > + int i; > + if (!map) > + return; > + if (!atomic_dec_and_test(&map->users)) > + return; > + if (debug) > + printk("%s: unmap %p (%d pages)\n", __FUNCTION__, map, map->count); > + if (map->is_mapped) { > + unmap_ops = kzalloc(sizeof(unmap_ops[0]) * map->count, > + GFP_TEMPORARY); > + if (likely(unmap_ops)) { > + gntdev_unmap_fast(map, unmap_ops); > + kfree(unmap_ops); > + } else { > + gntdev_unmap_slow(map); > + } > + > + atomic_sub(map->count, &pages_mapped); > + } > + for (i=0; i < map->count; i++) > + if (map->pages[i].page) > + __free_page(map->pages[i].page); > + kfree(map); > +} > + > +static int gntdev_open(struct inode *inode, struct file *flip) > +{ > + struct gntdev_priv *priv; > + > + priv = kzalloc(sizeof(*priv), GFP_KERNEL); > + if (!priv) > + return -ENOMEM; > + > + INIT_LIST_HEAD(&priv->maps); > + spin_lock_init(&priv->lock); > + > + flip->private_data = priv; > + if (debug) > + printk("%s: priv %p\n", __FUNCTION__, priv); > + > + return 0; > +} > + > +static int gntdev_release(struct inode *inode, struct file *flip) > +{ > + struct gntdev_priv *priv = flip->private_data; > + struct grant_map *map; > + > + if (debug) { > + printk("%s: priv %p\n", __FUNCTION__, priv); > + gntdev_print_maps(priv, NULL, 0); > + } > + > + spin_lock(&priv->lock); > + while (!list_empty(&priv->maps)) { > + map = list_entry(priv->maps.next, struct grant_map, next); > + list_del(&map->next); > + gntdev_put_map(map); > + } > + spin_unlock(&priv->lock); > + > + kfree(priv); > + return 0; > +} > + > +static int gntdev_do_map(struct grant_map *map) > +{ > + int err, flags, i; > + struct page* page; > + phys_addr_t mfn; > + struct gnttab_map_grant_ref* map_ops; > + > + flags = GNTMAP_host_map; > + if (map->is_ro) > + flags |= GNTMAP_readonly; > + > + err = -ENOMEM; > + > + if (unlikely(atomic_add_return(map->count, &pages_mapped) > limit)) { > + if (debug) > + printk("%s: maps full\n", __FUNCTION__); > + goto out; > + } > + > + map_ops = kzalloc(sizeof(map_ops[0]) * map->count, GFP_TEMPORARY); > + if (!map_ops) > + goto out; > + > + for (i = 0; i < map->count; i++) { > + page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO); > + if (unlikely(!page)) > + goto out_free; > + map->pages[i].page = page; > + mfn = (phys_addr_t)pfn_to_kaddr(page_to_pfn(page)); > + gnttab_set_map_op(&map_ops[i], mfn, flags, > + map->pages[i].target.ref, > + map->pages[i].target.domid); > + } > + err = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, > + map_ops, map->count); > + if (WARN_ON(err)) > + goto out_free; > + > 
+ if (debug && map->count) > + printk("%s: mapped first at gfn=%lx mfn=%lx\n", __func__, > + page_to_pfn(map->pages[0].page), pfn_to_mfn(page_to_pfn(map->pages[0].page))); > + > + map->is_mapped = 1; > + > + for (i = 0; i < map->count; i++) { > + if (map_ops[i].status) { > + if (debug) > + printk("%s: failed map at page %d: stat=%d\n", > + __FUNCTION__, i, map_ops[i].status); > + __free_page(map->pages[i].page); > + map->pages[i].page = NULL; > + err = -EINVAL; > + } else { > + map->pages[i].handle = map_ops[i].handle; > + } > + } > + > +out_free: > + kfree(map_ops); > +out: > + if (!map->is_mapped) > + atomic_sub(map->count, &pages_mapped); > + return err; > +} > + > +static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv, > + struct ioctl_gntdev_map_grant_ref __user *u) > +{ > + struct ioctl_gntdev_map_grant_ref op; > + struct grant_map *map; > + struct ioctl_gntdev_grant_ref* grants; > + int err; > + > + if (copy_from_user(&op, u, sizeof(op)) != 0) > + return -EFAULT; > + if (debug) > + printk("%s: priv %p, add %d\n", __FUNCTION__, priv, > + op.count); > + if (unlikely(op.count <= 0)) > + return -EINVAL; > + > + err = -ENOMEM; > + grants = kmalloc(sizeof(grants[0]) * op.count, GFP_TEMPORARY); > + if (!grants) > + goto out_fail; > + > + err = -EFAULT; > + if (copy_from_user(grants, u->refs, sizeof(grants[0]) * op.count)) > + goto out_free; > + > + map = gntdev_alloc_map(op.count, grants); > + if (!map) > + goto out_free; > + > + gntdev_add_map(priv, map); > + > + op.index = map->index << PAGE_SHIFT; > + > + err = -EFAULT; > + if (copy_to_user(u, &op, sizeof(op)) != 0) > + goto out_remove; > + > + err = 0; > + > +out_free: > + kfree(grants); > +out_fail: > + return err; > + > +out_remove: > + gntdev_del_map(priv, map); > + gntdev_put_map(map); > + goto out_free; > +} > + > +static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv, > + struct ioctl_gntdev_unmap_grant_ref __user *u) > +{ > + struct ioctl_gntdev_unmap_grant_ref op; > + struct grant_map *map; > + int err = 0; > + > + if (copy_from_user(&op, u, sizeof(op)) != 0) > + return -EFAULT; > + > + spin_lock(&priv->lock); > + map = gntdev_find_map_index(priv, op.index >> PAGE_SHIFT, op.count); > + if (map) { > + __gntdev_del_map(priv, map); > + } else > + err = -EINVAL; > + spin_unlock(&priv->lock); > + > + if (debug) > + printk("%s: priv %p, del %d+%d = %p\n", __FUNCTION__, priv, > + (int)op.index, (int)op.count, map); > + > + gntdev_put_map(map); > + return err; > +} > + > +static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv, > + struct ioctl_gntdev_get_offset_for_vaddr __user *u) > +{ > + struct ioctl_gntdev_get_offset_for_vaddr op; > + struct vm_area_struct *vma; > + struct grant_map *map; > + > + if (copy_from_user(&op, u, sizeof(op)) != 0) > + return -EFAULT; > + if (debug) > + printk("%s: priv %p, offset for vaddr %lx\n", __FUNCTION__, priv, > + (unsigned long)op.vaddr); > + > + vma = find_vma(current->mm, op.vaddr); > + if (!vma) > + return -EINVAL; > + > + map = vma->vm_private_data; > + if (vma->vm_ops != &gntdev_vmops || !map) > + return -EINVAL; > + > + op.offset = map->index << PAGE_SHIFT; > + op.count = map->count; > + > + if (copy_to_user(u, &op, sizeof(op)) != 0) > + return -EFAULT; > + return 0; > +} > + > +static long gntdev_ioctl(struct file *flip, > + unsigned int cmd, unsigned long arg) > +{ > + struct gntdev_priv *priv = flip->private_data; > + void __user *ptr = (void __user *)arg; > + > + switch (cmd) { > + case IOCTL_GNTDEV_MAP_GRANT_REF: > + return 
gntdev_ioctl_map_grant_ref(priv, ptr); > + > + case IOCTL_GNTDEV_UNMAP_GRANT_REF: > + return gntdev_ioctl_unmap_grant_ref(priv, ptr); > + > + case IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR: > + return gntdev_ioctl_get_offset_for_vaddr(priv, ptr); > + > + default: > + if (debug) > + printk("%s: priv %p, unknown cmd %x\n", > + __FUNCTION__, priv, cmd); > + return -ENOIOCTLCMD; > + } > + > + return 0; > +} > + > +static int gntdev_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf) > +{ > + struct grant_map *map = vma->vm_private_data; > + pgoff_t pgoff = vmf->pgoff - vma->vm_pgoff; > + > + if (!map || !map->is_mapped || pgoff < 0 || pgoff > map->count) { > + if (debug) > + printk("%s: vaddr %p, pgoff %ld (shouldn''t happen)\n", > + __FUNCTION__, vmf->virtual_address, pgoff); > + return VM_FAULT_SIGBUS; > + } > + > + vmf->page = map->pages[pgoff].page; > + get_page(vmf->page); > + return 0; > +} > + > +static void gntdev_vma_close(struct vm_area_struct *vma) > +{ > + struct grant_map *map = vma->vm_private_data; > + gntdev_put_map(map); > +} > + > +static struct vm_operations_struct gntdev_vmops = { > + .fault = gntdev_vma_fault, > + .close = gntdev_vma_close, > +}; > + > +static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma) > +{ > + struct gntdev_priv *priv = flip->private_data; > + int index = vma->vm_pgoff; > + int count = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; > + struct grant_map *map; > + int err = -EINVAL; > + > + if (!(vma->vm_flags & VM_SHARED)) > + return -EINVAL; > + > + spin_lock(&priv->lock); > + map = gntdev_find_map_index(priv, index, count); > + > + if (debug) > + printk("%s: map %d+%d at %lx (priv %p, map %p)\n", __func__, > + index, count, vma->vm_start, priv, map); > + > + if (!map) > + goto unlock_out; > + > + if (!map->is_mapped) { > + map->is_ro = !(vma->vm_flags & VM_WRITE); > + err = gntdev_do_map(map); > + if (err) > + goto unlock_out; > + } > + > + if ((vma->vm_flags & VM_WRITE) && map->is_ro) > + goto unlock_out; > + > + err = 0; > + vma->vm_ops = &gntdev_vmops; > + > + vma->vm_flags |= VM_RESERVED; > + vma->vm_flags |= VM_DONTEXPAND; > + vma->vm_flags |= VM_IO; > + > + vma->vm_private_data = map; > + > + atomic_inc(&map->users); > + > +unlock_out: > + spin_unlock(&priv->lock); > + return err; > +} > + > +static const struct file_operations gntdev_fops = { > + .owner = THIS_MODULE, > + .open = gntdev_open, > + .release = gntdev_release, > + .mmap = gntdev_mmap, > + .unlocked_ioctl = gntdev_ioctl > +}; > + > +static struct miscdevice gntdev_miscdev = { > + .minor = MISC_DYNAMIC_MINOR, > + .name = "xen/gntdev", > + .fops = &gntdev_fops, > +}; > + > +/* ------------------------------------------------------------------ */ > + > +static int __init gntdev_init(void) > +{ > + int err; > + > + if (!xen_domain()) > + return -ENODEV; > + > + if (!xen_feature(XENFEAT_auto_translated_physmap)) > + return -ENODEV; > + > + err = misc_register(&gntdev_miscdev); > + if (err != 0) { > + printk(KERN_ERR "Could not register gntdev device\n"); > + return err; > + } > + return 0; > +} > + > +static void __exit gntdev_exit(void) > +{ > + misc_deregister(&gntdev_miscdev); > +} > + > +module_init(gntdev_init); > +module_exit(gntdev_exit); > + > +/* ------------------------------------------------------------------ */ > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c > index a73f07c..f6b98c0 100644 > --- a/drivers/xen/gntdev.c > +++ b/drivers/xen/gntdev.c > @@ -626,6 +626,9 @@ static int __init gntdev_init(void) > if (!xen_domain()) > return -ENODEV; > > + 
if (xen_feature(XENFEAT_auto_translated_physmap)) > + return -ENODEV; > + > err = misc_register(&gntdev_miscdev); > if (err != 0) { > printk(KERN_ERR "Could not register gntdev device\n");_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2010-Dec-14 21:52 UTC
Re: [Xen-devel] [PATCH 3/6] xen-gntdev: Remove unneeded structures from grant_map tracking data
On 12/14/2010 04:15 PM, Jeremy Fitzhardinge wrote:> On 12/14/2010 06:55 AM, Daniel De Graaf wrote: >> The entire hypercall argument list isn''t required; only selected >> fields from the hypercall need to be tracked between the ioctl, map, >> and unmap operations. > > Is the rationale of this patch to save memory? If so, how much does it > save. > > (This patch seems sensible in principle, but it doesn''t seem to save > much complexity.) > > JThis will also allow easier testing of what pages need to be unmapped (more obvious in the HVM version). I also find it less confusing to populate the hypercall arguments immediately before the hypercall, but that''s likely a matter of opinion. It only saves 46 bytes per page, so if it seems more complex it could be dropped.>> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> >> --- >> drivers/xen/gntdev.c | 195 +++++++++++++++++++++++++++++-------------------- >> 1 files changed, 115 insertions(+), 80 deletions(-) >> >> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c >> index 62639af..773b76c 100644 >> --- a/drivers/xen/gntdev.c >> +++ b/drivers/xen/gntdev.c >> @@ -55,17 +55,23 @@ struct gntdev_priv { >> struct mmu_notifier mn; >> }; >> >> +struct granted_page { >> + u64 pte_maddr; >> + union { >> + struct ioctl_gntdev_grant_ref target; >> + grant_handle_t handle; >> + }; >> +}; >> + >> struct grant_map { >> struct list_head next; >> struct gntdev_priv *priv; >> struct vm_area_struct *vma; >> int index; >> int count; >> - int flags; >> - int is_mapped; >> - struct ioctl_gntdev_grant_ref *grants; >> - struct gnttab_map_grant_ref *map_ops; >> - struct gnttab_unmap_grant_ref *unmap_ops; >> + int is_mapped:1; >> + int is_ro:1; >> + struct granted_page pages[0]; >> }; >> >> /* ------------------------------------------------------------------ */ >> @@ -83,40 +89,28 @@ static void gntdev_print_maps(struct gntdev_priv *priv, >> map->index == text_index && text ? 
text : ""); >> } >> >> -static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count) >> +static struct grant_map *gntdev_alloc_map(int count, >> + struct ioctl_gntdev_grant_ref* grants) >> { >> struct grant_map *add; >> + int i; >> >> - add = kzalloc(sizeof(struct grant_map), GFP_KERNEL); >> - if (NULL == add) >> + add = kzalloc(sizeof(*add) + sizeof(add->pages[0])*count, GFP_KERNEL); >> + if (!add) >> return NULL; >> >> - add->grants = kzalloc(sizeof(add->grants[0]) * count, GFP_KERNEL); >> - add->map_ops = kzalloc(sizeof(add->map_ops[0]) * count, GFP_KERNEL); >> - add->unmap_ops = kzalloc(sizeof(add->unmap_ops[0]) * count, GFP_KERNEL); >> - if (NULL == add->grants || >> - NULL == add->map_ops || >> - NULL == add->unmap_ops) >> - goto err; >> - >> - add->index = 0; >> add->count = count; >> - add->priv = priv; >> + for(i = 0; i < count; i++) >> + add->pages[i].target = grants[i]; >> >> return add; >> - >> -err: >> - kfree(add->grants); >> - kfree(add->map_ops); >> - kfree(add->unmap_ops); >> - kfree(add); >> - return NULL; >> } >> >> static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add) >> { >> struct grant_map *map; >> >> + spin_lock(&priv->lock); >> list_for_each_entry(map, &priv->maps, next) { >> if (add->index + add->count < map->index) { >> list_add_tail(&add->next, &map->next); >> @@ -127,8 +121,10 @@ static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add) >> list_add_tail(&add->next, &priv->maps); >> >> done: >> + add->priv = priv; >> if (debug) >> gntdev_print_maps(priv, "[new]", add->index); >> + spin_unlock(&priv->lock); >> } >> >> static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int index, >> @@ -169,9 +165,10 @@ static int gntdev_del_map(struct grant_map *map) >> >> if (map->vma) >> return -EBUSY; >> - for (i = 0; i < map->count; i++) >> - if (map->unmap_ops[i].handle) >> - return -EBUSY; >> + if (map->is_mapped) >> + for (i = 0; i < map->count; i++) >> + if (map->pages[i].handle) >> + return -EBUSY; >> >> atomic_sub(map->count, &pages_mapped); >> list_del(&map->next); >> @@ -182,9 +179,6 @@ static void gntdev_free_map(struct grant_map *map) >> { >> if (!map) >> return; >> - kfree(map->grants); >> - kfree(map->map_ops); >> - kfree(map->unmap_ops); >> kfree(map); >> } >> >> @@ -199,51 +193,91 @@ static int find_grant_ptes(pte_t *pte, pgtable_t token, unsigned long addr, void >> BUG_ON(pgnr >= map->count); >> pte_maddr = (u64)pfn_to_mfn(page_to_pfn(token)) << PAGE_SHIFT; >> pte_maddr += (unsigned long)pte & ~PAGE_MASK; >> - gnttab_set_map_op(&map->map_ops[pgnr], pte_maddr, map->flags, >> - map->grants[pgnr].ref, >> - map->grants[pgnr].domid); >> - gnttab_set_unmap_op(&map->unmap_ops[pgnr], pte_maddr, map->flags, >> - 0 /* handle */); >> + map->pages[pgnr].pte_maddr = pte_maddr; >> + >> return 0; >> } >> >> static int map_grant_pages(struct grant_map *map) >> { >> - int i, err = 0; >> + int i, flags, err = 0; >> + struct gnttab_map_grant_ref* map_ops; >> >> + flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte; >> + if (map->is_ro) >> + flags |= GNTMAP_readonly; >> + >> + map_ops = kzalloc(sizeof(map_ops[0]) * map->count, GFP_TEMPORARY); >> + if (!map_ops) >> + return -ENOMEM; >> + >> + for(i=0; i < map->count; i++) >> + gnttab_set_map_op(&map_ops[i], map->pages[i].pte_maddr, flags, >> + map->pages[i].target.ref, >> + map->pages[i].target.domid); >> if (debug) >> printk("%s: map %d+%d\n", __FUNCTION__, map->index, map->count); >> err = 
HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, >> - map->map_ops, map->count); >> + map_ops, map->count); >> if (WARN_ON(err)) >> - return err; >> + goto out; >> >> for (i = 0; i < map->count; i++) { >> - if (map->map_ops[i].status) >> + if (map_ops[i].status) { >> + map->pages[i].pte_maddr = 0; >> err = -EINVAL; >> - map->unmap_ops[i].handle = map->map_ops[i].handle; >> + } else { >> + map->pages[i].handle = map_ops[i].handle; >> + } >> } >> + >> +out: >> + kfree(map_ops); >> return err; >> } >> >> -static int unmap_grant_pages(struct grant_map *map, int offset, int pages) >> +static void unmap_grant_pages(struct grant_map *map, int offset, int pages) >> { >> - int i, err = 0; >> + int i, flags, err = 0; >> + struct gnttab_unmap_grant_ref *unmap_ops; >> + struct gnttab_unmap_grant_ref unmap_single; >> + >> + if (pages > 1) { >> + unmap_ops = kzalloc(sizeof(unmap_ops[0]) * pages, >> + GFP_TEMPORARY); >> + if (unlikely(!unmap_ops)) { >> + for(i=0; i < pages; i++) >> + unmap_grant_pages(map, offset + i, 1); >> + return; >> + } >> + } else { >> + unmap_ops = &unmap_single; >> + } >> >> + flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte; >> + if (map->is_ro) >> + flags |= GNTMAP_readonly; >> + >> + for(i=0; i < pages; i++) >> + gnttab_set_unmap_op(&unmap_ops[i], >> + map->pages[offset+i].pte_maddr, flags, >> + map->pages[offset+i].handle); >> if (debug) >> printk("%s: map %d+%d [%d+%d]\n", __FUNCTION__, >> map->index, map->count, offset, pages); >> err = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, >> - map->unmap_ops + offset, pages); >> + unmap_ops, pages); >> if (WARN_ON(err)) >> - return err; >> + goto out; >> >> for (i = 0; i < pages; i++) { >> - if (map->unmap_ops[offset+i].status) >> - err = -EINVAL; >> - map->unmap_ops[offset+i].handle = 0; >> + WARN_ON(unmap_ops[i].status); >> + map->pages[offset+i].pte_maddr = 0; >> + map->pages[offset+i].handle = 0; >> } >> - return err; >> +out: >> + if (unmap_ops != &unmap_single) >> + kfree(unmap_ops); >> } >> >> /* ------------------------------------------------------------------ */ >> @@ -282,7 +316,6 @@ static void mn_invl_range_start(struct mmu_notifier *mn, >> struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn); >> struct grant_map *map; >> unsigned long mstart, mend; >> - int err; >> >> spin_lock(&priv->lock); >> list_for_each_entry(map, &priv->maps, next) { >> @@ -301,10 +334,9 @@ static void mn_invl_range_start(struct mmu_notifier *mn, >> __FUNCTION__, map->index, map->count, >> map->vma->vm_start, map->vma->vm_end, >> start, end, mstart, mend); >> - err = unmap_grant_pages(map, >> - (mstart - map->vma->vm_start) >> PAGE_SHIFT, >> - (mend - mstart) >> PAGE_SHIFT); >> - WARN_ON(err); >> + unmap_grant_pages(map, >> + (mstart - map->vma->vm_start) >> PAGE_SHIFT, >> + (mend - mstart) >> PAGE_SHIFT); >> } >> spin_unlock(&priv->lock); >> } >> @@ -321,7 +353,6 @@ static void mn_release(struct mmu_notifier *mn, >> { >> struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn); >> struct grant_map *map; >> - int err; >> >> spin_lock(&priv->lock); >> list_for_each_entry(map, &priv->maps, next) { >> @@ -331,8 +362,7 @@ static void mn_release(struct mmu_notifier *mn, >> printk("%s: map %d+%d (%lx %lx)\n", >> __FUNCTION__, map->index, map->count, >> map->vma->vm_start, map->vma->vm_end); >> - err = unmap_grant_pages(map, 0, map->count); >> - WARN_ON(err); >> + unmap_grant_pages(map, 0, map->count); >> } >> spin_unlock(&priv->lock); >> } >> @@ -401,6 +431,7 @@ static long 
gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv, >> { >> struct ioctl_gntdev_map_grant_ref op; >> struct grant_map *map; >> + struct ioctl_gntdev_grant_ref* grants; >> int err; >> >> if (copy_from_user(&op, u, sizeof(op)) != 0) >> @@ -411,38 +442,45 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv, >> if (unlikely(op.count <= 0)) >> return -EINVAL; >> >> + grants = kmalloc(sizeof(grants[0]) * op.count, GFP_TEMPORARY); >> + if (!grants) >> + return -ENOMEM; >> + >> + if (copy_from_user(grants, u->refs, sizeof(grants[0]) * op.count)) { >> + err = -EFAULT; >> + goto out_free; >> + } >> + >> err = -ENOMEM; >> - map = gntdev_alloc_map(priv, op.count); >> + map = gntdev_alloc_map(op.count, grants); >> if (!map) >> - return err; >> - >> - if (copy_from_user(map->grants, &u->refs, >> - sizeof(map->grants[0]) * op.count) != 0) { >> - gntdev_free_map(map); >> - return err; >> - } >> + goto out_free; >> >> if (unlikely(atomic_add_return(op.count, &pages_mapped) > limit)) >> { >> if (debug) >> printk("%s: can''t map: over limit\n", __FUNCTION__); >> - gntdev_free_map(map); >> - return err; >> + goto out_free_map; >> } >> >> - spin_lock(&priv->lock); >> gntdev_add_map(priv, map); >> op.index = map->index << PAGE_SHIFT; >> - spin_unlock(&priv->lock); >> >> - if (copy_to_user(u, &op, sizeof(op)) != 0) { >> - spin_lock(&priv->lock); >> - gntdev_del_map(map); >> - spin_unlock(&priv->lock); >> - gntdev_free_map(map); >> - return err; >> + if (copy_to_user(u, &op, sizeof(op))) { >> + err = -EFAULT; >> + goto out_remove; >> } >> - return 0; >> + err = 0; >> +out_free: >> + kfree(grants); >> + return err; >> +out_remove: >> + spin_lock(&priv->lock); >> + gntdev_del_map(map); >> + spin_unlock(&priv->lock); >> +out_free_map: >> + gntdev_free_map(map); >> + goto out_free; >> } >> >> static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv, >> @@ -556,10 +594,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma) >> >> vma->vm_private_data = map; >> map->vma = vma; >> - >> - map->flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte; >> - if (!(vma->vm_flags & VM_WRITE)) >> - map->flags |= GNTMAP_readonly; >> + map->is_ro = !(vma->vm_flags & VM_WRITE); >> >> spin_unlock(&priv->lock); >> >-- Daniel De Graaf National Security Agency _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Dec-14 21:54 UTC
Re: [Xen-devel] [PATCH 3/6] xen-gntdev: Remove unneeded structures from grant_map tracking data
On 12/14/2010 06:55 AM, Daniel De Graaf wrote:> -static int unmap_grant_pages(struct grant_map *map, int offset, int pages) > +static void unmap_grant_pages(struct grant_map *map, int offset, int pages) > { > - int i, err = 0; > + int i, flags, err = 0; > + struct gnttab_unmap_grant_ref *unmap_ops; > + struct gnttab_unmap_grant_ref unmap_single; > + > + if (pages > 1) { > + unmap_ops = kzalloc(sizeof(unmap_ops[0]) * pages, > + GFP_TEMPORARY); > + if (unlikely(!unmap_ops)) { > + for(i=0; i < pages; i++) > + unmap_grant_pages(map, offset + i, 1); > + return; > + } > + } else { > + unmap_ops = &unmap_single; > + }Rather than doing this, it would be better to use the multicall batching API, in particular xen_mc_extend_args() - see xen_extend_mmu_update() for an example. (This would mean promoting arch/x86/xen/multicall.h to include/xen/multicall.h and breaking ia64 builds until there''s an ia64 implementation of that API, but that seems like a fair tradeoff at this point.) J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
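A minimal sketch of the batching Jeremy suggests, assuming the multicall API (xen_mc_extend_args() and friends from arch/x86/xen/multicalls.h) has been made visible to the driver as proposed; the helper name and its exact placement in the unmap path are illustrative only, not part of the posted series:

#include <xen/interface/xen.h>
#include <xen/grant_table.h>
#include <xen/multicall.h>	/* hypothetical new home of the multicall API */

/* Append one GNTTABOP_unmap_grant_ref argument block to the pending
 * multicall batch, following the xen_extend_mmu_update() pattern. */
static void gntdev_queue_unmap(struct grant_map *map, int i, int flags)
{
	struct multicall_space mcs;
	struct gnttab_unmap_grant_ref *op;

	/* Try to extend the argument list of an already-queued
	 * grant-table multicall entry. */
	mcs = xen_mc_extend_args(__HYPERVISOR_grant_table_op, sizeof(*op));
	if (mcs.mc != NULL) {
		mcs.mc->args[2]++;		/* one more op in this batch */
	} else {
		/* Nothing compatible pending: start a new entry. */
		mcs = __xen_mc_entry(sizeof(*op));
		mcs.mc->op      = __HYPERVISOR_grant_table_op;
		mcs.mc->args[0] = GNTTABOP_unmap_grant_ref;
		mcs.mc->args[1] = (unsigned long)mcs.args;
		mcs.mc->args[2] = 1;
	}

	op = mcs.args;
	gnttab_set_unmap_op(op, map->pages[i].pte_maddr, flags,
			    map->pages[i].handle);
}

The caller would bracket its loop over the pages with xen_mc_batch() and xen_mc_issue(0), so the whole unmap reaches Xen as one batch and the per-call kzalloc() of an unmap_ops array disappears.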
Jeremy Fitzhardinge
2010-Dec-14 21:56 UTC
Re: [Xen-devel] [PATCH 3/6] xen-gntdev: Remove unneeded structures from grant_map tracking data
On 12/14/2010 01:52 PM, Daniel De Graaf wrote:> On 12/14/2010 04:15 PM, Jeremy Fitzhardinge wrote: >> On 12/14/2010 06:55 AM, Daniel De Graaf wrote: >>> The entire hypercall argument list isn''t required; only selected >>> fields from the hypercall need to be tracked between the ioctl, map, >>> and unmap operations. >> Is the rationale of this patch to save memory? If so, how much does it >> save. >> >> (This patch seems sensible in principle, but it doesn''t seem to save >> much complexity.) >> >> J > This will also allow easier testing of what pages need to be unmapped > (more obvious in the HVM version). I also find it less confusing to > populate the hypercall arguments immediately before the hypercall, but > that''s likely a matter of opinion. It only saves 46 bytes per page, so > if it seems more complex it could be dropped.I like it in general. See the other mail I just sent - you can use the multicall API to remove all the allocations for the arguments, and that should help a lot. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2010-Dec-14 22:06 UTC
Re: [Xen-devel] [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver
On 12/14/2010 04:42 PM, Jeremy Fitzhardinge wrote:> On 12/14/2010 06:55 AM, Daniel De Graaf wrote: >> This allows a userspace application to allocate a shared page for >> implementing inter-domain communication or device drivers. These >> shared pages can be mapped using the gntdev device or by the kernel >> in another domain. >> >> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> >> --- >> drivers/xen/Kconfig | 7 + >> drivers/xen/Makefile | 2 + >> drivers/xen/gntalloc.c | 456 ++++++++++++++++++++++++++++++++++++++++++++++++ >> include/xen/gntalloc.h | 68 +++++++ >> 4 files changed, 533 insertions(+), 0 deletions(-) >> create mode 100644 drivers/xen/gntalloc.c >> create mode 100644 include/xen/gntalloc.h >> >> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig >> index fa9982e..8398cb0 100644 >> --- a/drivers/xen/Kconfig >> +++ b/drivers/xen/Kconfig >> @@ -180,6 +180,13 @@ config XEN_GNTDEV >> help >> Allows userspace processes use grants. >> >> +config XEN_GRANT_DEV_ALLOC >> + tristate "User-space grant reference allocator driver" >> + depends on XEN >> + help >> + Allows userspace processes to create pages with access granted >> + to other domains. >> + >> config XEN_S3 >> def_bool y >> depends on XEN_DOM0 && ACPI >> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile >> index ef1ea63..9814c1d 100644 >> --- a/drivers/xen/Makefile >> +++ b/drivers/xen/Makefile >> @@ -10,6 +10,7 @@ obj-$(CONFIG_XEN_XENCOMM) += xencomm.o >> obj-$(CONFIG_XEN_BALLOON) += balloon.o >> obj-$(CONFIG_XEN_DEV_EVTCHN) += xen-evtchn.o >> obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev.o >> +obj-$(CONFIG_XEN_GRANT_DEV_ALLOC) += xen-gntalloc.o >> obj-$(CONFIG_XEN_PCIDEV_BACKEND) += pciback/ >> obj-$(CONFIG_XEN_BLKDEV_BACKEND) += blkback/ >> obj-$(CONFIG_XEN_BLKDEV_TAP) += blktap/ >> @@ -25,3 +26,4 @@ obj-$(CONFIG_XEN_PLATFORM_PCI) += platform-pci.o >> >> xen-evtchn-y := evtchn.o >> xen-gntdev-y := gntdev.o >> +xen-gntalloc-y := gntalloc.o >> diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c >> new file mode 100644 >> index 0000000..f26adfd >> --- /dev/null >> +++ b/drivers/xen/gntalloc.c >> @@ -0,0 +1,456 @@ >> +/****************************************************************************** >> + * gntalloc.c >> + * >> + * Device for creating grant references (in user-space) that may be shared >> + * with other domains. >> + * >> + * This program is distributed in the hope that it will be useful, >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> + * GNU General Public License for more details. >> + * >> + * You should have received a copy of the GNU General Public License >> + * along with this program; if not, write to the Free Software >> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA >> + */ >> + >> +/* >> + * This driver exists to allow userspace programs in Linux to allocate kernel >> + * memory that will later be shared with another domain. Without this device, >> + * Linux userspace programs cannot create grant references. >> + * >> + * How this stuff works: >> + * X -> granting a page to Y >> + * Y -> mapping the grant from X >> + * >> + * 1. X uses the gntalloc device to allocate a page of kernel memory, P. >> + * 2. X creates an entry in the grant table that says domid(Y) can >> + * access P. >> + * 3. X gives the grant reference identifier, GREF, to Y. >> + * 4. 
A program in Y uses the gntdev device to map the page (owned by X >> + * and identified by GREF) into domain(Y) and then into the address >> + * space of the program. Behind the scenes, this requires a >> + * hypercall in which Xen modifies the host CPU page tables to >> + * perform the sharing -- that''s where the actual cross-domain mapping >> + * occurs. > > Presumably Y could be any grant-page user, not specifically gntdev? So > you could use this to implement a frontend in userspace?Yes, this is correct; the use of gntdev here was intended as an example, not as the only user>> + * 5. A program in X mmap()s a segment of the gntalloc device that >> + * corresponds to the shared page. >> + * 6. The two userspace programs can now communicate over the shared page. >> + * >> + * >> + * NOTE TO USERSPACE LIBRARIES: >> + * The grant allocation and mmap()ing are, naturally, two separate >> + * operations. You set up the sharing by calling the create ioctl() and >> + * then the mmap(). You must tear down the sharing in the reverse order >> + * (munmap() and then the destroy ioctl()). >> + * >> + * WARNING: Since Xen does not allow a guest to forcibly end the use of a grant >> + * reference, this device can be used to consume kernel memory by leaving grant >> + * references mapped by another domain when an application exits. Therefore, >> + * there is a global limit on the number of pages that can be allocated. When >> + * all references to the page are unmapped, it will be freed during the next >> + * grant operation. >> + */ >> + >> +#include <asm/atomic.h> >> +#include <linux/module.h> >> +#include <linux/miscdevice.h> >> +#include <linux/kernel.h> >> +#include <linux/init.h> >> +#include <linux/slab.h> >> +#include <linux/fs.h> >> +#include <linux/device.h> >> +#include <linux/mm.h> >> +#include <asm/uaccess.h> >> +#include <linux/types.h> >> +#include <linux/list.h> >> + >> +#include <xen/xen.h> >> +#include <xen/page.h> >> +#include <xen/grant_table.h> >> +#include <xen/gntalloc.h> >> + >> +static int debug = 0; >> +module_param(debug, int, 0644); >> + >> +static int limit = 1024; >> +module_param(limit, int, 0644); >> + >> +static LIST_HEAD(gref_list); >> +static DEFINE_SPINLOCK(gref_lock); >> +static int gref_size = 0; >> + >> +/* Metadata on a grant reference. */ >> +struct gntalloc_gref { >> + struct list_head next_all; /* list entry gref_list */ >> + struct list_head next_file; /* list entry file->list, if open */ >> + domid_t foreign_domid; /* The ID of the domain to share with. */ >> + grant_ref_t gref_id; /* The grant reference number. */ >> + unsigned int users; /* Use count - when zero, waiting on Xen */ >> + struct page* page; /* The shared page. 
*/ >> +}; >> + >> +struct gntalloc_file_private_data { >> + struct list_head list; >> +}; >> + >> +static void __del_gref(struct gntalloc_gref *gref); >> + >> +static void do_cleanup(void) >> +{ >> + struct gntalloc_gref *gref, *n; >> + list_for_each_entry_safe(gref, n, &gref_list, next_all) { >> + if (!gref->users) >> + __del_gref(gref); >> + } >> +} >> + >> + >> +static int add_gref(domid_t foreign_domid, uint32_t readonly, >> + struct gntalloc_file_private_data *priv) >> +{ >> + int rc; >> + struct gntalloc_gref *gref; >> + >> + rc = -ENOMEM; >> + spin_lock(&gref_lock); >> + do_cleanup(); >> + if (gref_size >= limit) { >> + spin_unlock(&gref_lock); >> + rc = -ENOSPC; >> + goto out; >> + } >> + gref_size++; >> + spin_unlock(&gref_lock); >> + >> + gref = kzalloc(sizeof(*gref), GFP_KERNEL); >> + if (!gref) >> + goto out; >> + >> + gref->foreign_domid = foreign_domid; >> + gref->users = 1; >> + >> + /* Allocate the page to share. */ >> + gref->page = alloc_page(GFP_KERNEL|__GFP_ZERO); > > Could this be GFP_HIGHUSER?Yes, that''s probably a better flag to use here.>> + if (!gref->page) >> + goto out_nopage; >> + >> + /* Grant foreign access to the page. */ >> + gref->gref_id = gnttab_grant_foreign_access(foreign_domid, >> + pfn_to_mfn(page_to_pfn(gref->page)), readonly); >> + if (gref->gref_id < 0) { >> + printk(KERN_ERR "%s: failed to grant foreign access for mfn " >> + "%lu to domain %u\n", __func__, >> + pfn_to_mfn(page_to_pfn(gref->page)), foreign_domid); >> + rc = -EFAULT; >> + goto out_no_foreign_gref; >> + } >> + >> + /* Add to gref lists. */ >> + spin_lock(&gref_lock); >> + list_add_tail(&gref->next_all, &gref_list); >> + list_add_tail(&gref->next_file, &priv->list); >> + spin_unlock(&gref_lock); >> + >> + return gref->gref_id; >> + >> +out_no_foreign_gref: >> + __free_page(gref->page); >> +out_nopage: >> + kfree(gref); >> +out: >> + return rc; >> +} >> + >> +static void __del_gref(struct gntalloc_gref *gref) >> +{ >> + if (gnttab_query_foreign_access(gref->gref_id)) >> + return; >> + >> + if (!gnttab_end_foreign_access_ref(gref->gref_id, 0)) >> + return; >> + >> + gref_size--; >> + list_del(&gref->next_all); >> + >> + __free_page(gref->page); >> + kfree(gref); >> +} >> + >> +static struct gntalloc_gref* find_gref(struct gntalloc_file_private_data *priv, >> + grant_ref_t gref_id) >> +{ >> + struct gntalloc_gref *gref; >> + list_for_each_entry(gref, &priv->list, next_file) { >> + if (gref->gref_id == gref_id) >> + return gref; >> + } >> + return NULL; >> +} >> + >> +/* >> + * ------------------------------------- >> + * File operations. >> + * ------------------------------------- >> + */ >> +static int gntalloc_open(struct inode *inode, struct file *filp) >> +{ >> + struct gntalloc_file_private_data *priv; >> + >> + try_module_get(THIS_MODULE); >> + >> + priv = kzalloc(sizeof(*priv), GFP_KERNEL); >> + if (!priv) >> + goto out_nomem; >> + INIT_LIST_HEAD(&priv->list); >> + >> + filp->private_data = priv; >> + >> + if (debug) >> + printk("%s: priv %p\n", __FUNCTION__, priv); >> + >> + return 0; >> + >> +out_nomem: >> + return -ENOMEM; >> +} >> + >> +static int gntalloc_release(struct inode *inode, struct file *filp) >> +{ >> + struct gntalloc_file_private_data *priv = filp->private_data; >> + struct gntalloc_gref *gref; >> + >> + if (debug) >> + printk("%s: priv %p\n", __FUNCTION__, priv); >> + >> + spin_lock(&gref_lock); > > Presumably this is unnecessary because there can be no other users if > you''re tearing down the list and destroying priv. 
> >> + while (!list_empty(&priv->list)) { >> + gref = list_entry(priv->list.next, >> + struct gntalloc_gref, next_file); >> + list_del(&gref->next_file); >> + gref->users--; >> + if (gref->users == 0) >> + __del_gref(gref); >> + } >> + kfree(priv); >> + spin_unlock(&gref_lock); >> + >> + module_put(THIS_MODULE); >> + >> + return 0; >> +} >> + >> +static long gntalloc_ioctl_alloc(struct gntalloc_file_private_data *priv, >> + void __user *arg) >> +{ >> + int rc = 0; >> + struct ioctl_gntalloc_alloc_gref op; >> + >> + if (debug) >> + printk("%s: priv %p\n", __FUNCTION__, priv); >> + >> + if (copy_from_user(&op, arg, sizeof(op))) { >> + rc = -EFAULT; >> + goto alloc_grant_out; >> + } >> + rc = add_gref(op.foreign_domid, op.readonly, priv); >> + if (rc < 0) >> + goto alloc_grant_out; >> + >> + op.gref_id = rc; >> + op.page_idx = rc; > > Hm, see below. > >> + >> + rc = 0; >> + >> + if (copy_to_user((void __user *)arg, &op, sizeof(op))) { >> + rc = -EFAULT; >> + goto alloc_grant_out; >> + } >> + >> +alloc_grant_out: >> + return rc; >> +} >> + >> +static long gntalloc_ioctl_dealloc(struct gntalloc_file_private_data *priv, >> + void __user *arg) >> +{ >> + int rc = 0; >> + struct ioctl_gntalloc_dealloc_gref op; >> + struct gntalloc_gref *gref; >> + >> + if (debug) >> + printk("%s: priv %p\n", __FUNCTION__, priv); >> + >> + if (copy_from_user(&op, arg, sizeof(op))) { >> + rc = -EFAULT; >> + goto dealloc_grant_out; >> + } >> + >> + spin_lock(&gref_lock); >> + gref = find_gref(priv, op.gref_id); >> + if (gref) { >> + list_del(&gref->next_file); >> + gref->users--; >> + rc = 0; >> + } else { >> + rc = -EINVAL; >> + } >> + >> + do_cleanup(); >> + spin_unlock(&gref_lock); >> +dealloc_grant_out: >> + return rc; >> +} >> + >> +static long gntalloc_ioctl(struct file *filp, unsigned int cmd, >> + unsigned long arg) >> +{ >> + struct gntalloc_file_private_data *priv = filp->private_data; >> + >> + switch (cmd) { >> + case IOCTL_GNTALLOC_ALLOC_GREF: >> + return gntalloc_ioctl_alloc(priv, (void __user*)arg); >> + >> + case IOCTL_GNTALLOC_DEALLOC_GREF: >> + return gntalloc_ioctl_dealloc(priv, (void __user*)arg); >> + >> + default: >> + return -ENOIOCTLCMD; >> + } >> + >> + return 0; >> +} >> + >> +static int gntalloc_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf) >> +{ >> + struct gntalloc_gref *gref = vma->vm_private_data; >> + if (!gref) >> + return VM_FAULT_SIGBUS; >> + >> + vmf->page = gref->page; >> + get_page(vmf->page); >> + >> + return 0; >> +}; >> + >> +static void gntalloc_vma_close(struct vm_area_struct *vma) >> +{ >> + struct gntalloc_gref *gref = vma->vm_private_data; >> + if (!gref) >> + return; >> + >> + spin_lock(&gref_lock); >> + gref->users--; >> + if (gref->users == 0) >> + __del_gref(gref); >> + spin_unlock(&gref_lock); >> +} >> + >> +static struct vm_operations_struct gntalloc_vmops = { >> + .fault = gntalloc_vma_fault, >> + .close = gntalloc_vma_close, >> +}; >> + >> +static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma) >> +{ >> + struct gntalloc_file_private_data *priv = filp->private_data; >> + struct gntalloc_gref *gref; >> + >> + if (debug) >> + printk("%s: priv %p, page %lu\n", __func__, >> + priv, vma->vm_pgoff); >> + >> + /* >> + * There is a 1-to-1 correspondence of grant references to shared >> + * pages, so it only makes sense to map exactly one page per >> + * call to mmap(). 
>> + */ > > Single-page mmap makes sense if the only possible use-cases are for > single-page mappings, but if you''re talking about framebuffers and the > like is seems like a very awkward way to use mmap. It would be cleaner > from an API perspective to have a user-mode defined flat address space > indexed by pgoff which maps to an array of grefs, so you can sensibly do > a multi-page mapping. > > It would also allow you to hide the grefs from usermode entirely. Then > its just up to usermode to choose suitable file offsets for itself.I considered this, but wanted to keep userspace compatability with the previously created interface. If there''s no reason to avoid doing so, I''ll change the ioctl interface to allocate an array of grants and calculate the offset similar to how gntdev does currently (picks a suitable open slot). Userspace does still have to know about grefs, of course, but only to pass to the domain doing the mapping, not for its own mmap().>> + if (((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) != 1) { >> + printk(KERN_ERR "%s: Only one page can be memory-mapped " >> + "per grant reference.\n", __func__); >> + return -EINVAL; >> + } >> + >> + if (!(vma->vm_flags & VM_SHARED)) { >> + printk(KERN_ERR "%s: Mapping must be shared.\n", >> + __func__); >> + return -EINVAL; >> + } >> + >> + spin_lock(&gref_lock); >> + gref = find_gref(priv, vma->vm_pgoff); >> + if (gref == NULL) { >> + spin_unlock(&gref_lock); >> + printk(KERN_ERR "%s: Could not find a grant reference with " >> + "page index %lu.\n", __func__, vma->vm_pgoff); >> + return -ENOENT; >> + } >> + gref->users++; >> + spin_unlock(&gref_lock); >> + >> + vma->vm_private_data = gref; >> + >> + /* This flag prevents Bad PTE errors when the memory is unmapped. */ >> + vma->vm_flags |= VM_RESERVED; >> + vma->vm_flags |= VM_DONTCOPY; >> + vma->vm_flags |= VM_IO; > > If you set VM_PFNMAP then you don''t need to deal with faults.Useful to know. Is that more efficient/preferred to defining a .fault handler? I used this method because that''s what is used in kernel/relay.c.>> + >> + vma->vm_ops = &gntalloc_vmops; >> + >> + return 0; >> +} >> + >> +static const struct file_operations gntalloc_fops = { >> + .owner = THIS_MODULE, >> + .open = gntalloc_open, >> + .release = gntalloc_release, >> + .unlocked_ioctl = gntalloc_ioctl, >> + .mmap = gntalloc_mmap >> +}; >> + >> +/* >> + * ------------------------------------- >> + * Module creation/destruction. 
>> + * ------------------------------------- >> + */ >> +static struct miscdevice gntalloc_miscdev = { >> + .minor = MISC_DYNAMIC_MINOR, >> + .name = "xen/gntalloc", >> + .fops = &gntalloc_fops, >> +}; >> + >> +static int __init gntalloc_init(void) >> +{ >> + int err; >> + >> + if (!xen_domain()) { >> + if (debug) >> + printk(KERN_ERR "gntalloc: You must be running Xen\n"); >> + return -ENODEV; >> + } >> + >> + err = misc_register(&gntalloc_miscdev); >> + if (err != 0) { >> + printk(KERN_ERR "Could not register misc gntalloc device\n"); >> + return err; >> + } >> + >> + if (debug) >> + printk(KERN_INFO "Created grant allocation device at %d,%d\n", >> + MISC_MAJOR, gntalloc_miscdev.minor); >> + >> + return 0; >> +} >> + >> +static void __exit gntalloc_exit(void) >> +{ >> + misc_deregister(&gntalloc_miscdev); >> +} >> + >> +module_init(gntalloc_init); >> +module_exit(gntalloc_exit); >> + >> +MODULE_LICENSE("GPL"); >> +MODULE_AUTHOR("Carter Weatherly <carter.weatherly@jhuapl.edu>, " >> + "Daniel De Graaf <dgdegra@tycho.nsa.gov>"); >> +MODULE_DESCRIPTION("User-space grant reference allocator driver"); >> diff --git a/include/xen/gntalloc.h b/include/xen/gntalloc.h >> new file mode 100644 >> index 0000000..76b70d7 >> --- /dev/null >> +++ b/include/xen/gntalloc.h >> @@ -0,0 +1,68 @@ >> +/****************************************************************************** >> + * gntalloc.h >> + * >> + * Interface to /dev/xen/gntalloc. >> + * >> + * This program is free software; you can redistribute it and/or >> + * modify it under the terms of the GNU General Public License version 2 >> + * as published by the Free Software Foundation; or, when distributed >> + * separately from the Linux kernel or incorporated into other >> + * software packages, subject to the following license: >> + * >> + * Permission is hereby granted, free of charge, to any person obtaining a copy >> + * of this source file (the "Software"), to deal in the Software without >> + * restriction, including without limitation the rights to use, copy, modify, >> + * merge, publish, distribute, sublicense, and/or sell copies of the Software, >> + * and to permit persons to whom the Software is furnished to do so, subject to >> + * the following conditions: >> + * >> + * The above copyright notice and this permission notice shall be included in >> + * all copies or substantial portions of the Software. >> + * >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE >> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER >> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING >> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS >> + * IN THE SOFTWARE. >> + */ >> + >> +#ifndef __LINUX_PUBLIC_GNTALLOC_H__ >> +#define __LINUX_PUBLIC_GNTALLOC_H__ >> + >> +/* >> + * Allocates a new page and creates a new grant reference. >> + * >> + * N.B. The page_idx is really the address >> PAGE_SHIFT, meaning it''s the >> + * page number and not an actual address. It must be shifted again prior >> + * to feeding it to mmap() (i.e. page_idx << PAGE_SHIFT). >> + */ >> +#define IOCTL_GNTALLOC_ALLOC_GREF \ >> +_IOC(_IOC_NONE, ''G'', 1, sizeof(struct ioctl_gntalloc_alloc_gref)) >> +struct ioctl_gntalloc_alloc_gref { >> + /* IN parameters */ >> + /* The ID of the domain creating the grant reference. 
*/ >> + domid_t owner_domid; >> + /* The ID of the domain to be given access to the grant. */ >> + domid_t foreign_domid; >> + /* The type of access given to domid. */ >> + uint32_t readonly; >> + /* OUT parameters */ >> + /* The grant reference of the newly created grant. */ >> + grant_ref_t gref_id; >> + /* The page index (page number, NOT address) for grant mmap(). */ >> + uint32_t page_idx; >> +}; >> + >> +/* >> + * Deallocates the grant reference, freeing the associated page. >> + */ >> +#define IOCTL_GNTALLOC_DEALLOC_GREF \ >> +_IOC(_IOC_NONE, ''G'', 2, sizeof(struct ioctl_gntalloc_dealloc_gref)) >> +struct ioctl_gntalloc_dealloc_gref { >> + /* IN parameter */ >> + /* The grant reference to deallocate. */ >> + grant_ref_t gref_id; >> +}; >> +#endif /* __LINUX_PUBLIC_GNTALLOC_H__ */ > > J >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
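For illustration, the allocate-then-mmap sequence the header comments describe looks roughly like this from userspace. The device path and the page_idx shift come from the comments above; the header include, the target domid, the 4096-byte page size and the error handling are assumptions of this sketch:

#include <stdint.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <xen/gntalloc.h>	/* assumed to be exported to userspace */

/* Share one read/write page with domain 7; returns the grant reference
 * to hand to the mapping domain, or -1 on error. */
int share_one_page(void **page_out)
{
	struct ioctl_gntalloc_alloc_gref op = {
		.foreign_domid = 7,
		.readonly      = 0,
	};
	int fd = open("/dev/xen/gntalloc", O_RDWR);
	void *page;

	if (fd < 0 || ioctl(fd, IOCTL_GNTALLOC_ALLOC_GREF, &op) < 0)
		return -1;

	/* page_idx is a page number, so shift it before passing to mmap(). */
	page = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
		    fd, (off_t)op.page_idx << 12);
	if (page == MAP_FAILED)
		return -1;

	*page_out = page;
	/* Keep fd open while the page is shared; op.gref_id is what the
	 * other domain needs (e.g. via xenstore) to map the page with
	 * gntdev.  Teardown is munmap() first, then
	 * IOCTL_GNTALLOC_DEALLOC_GREF, per the note in gntalloc.c. */
	return op.gref_id;
}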
Daniel De Graaf
2010-Dec-14 22:27 UTC
Re: [Xen-devel] [PATCH 6/6] xen-gntdev: Introduce HVM version of gntdev
On 12/14/2010 04:45 PM, Jeremy Fitzhardinge wrote:> On 12/14/2010 06:55 AM, Daniel De Graaf wrote: >> HVM does not allow direct PTE modification, so guest pages must be >> allocated and used for grant mappings. If Xen does not provide an >> auto-translated physmap, the existing direct PTE modification is more >> efficient. > > I haven''t looked at this in detail, but doing a complete duplication of > the driver doesn''t seem like a great idea. > > Also, Stefano and I have been working on a modification to gntdev to > make sure that each granted page is backed with a proper struct page (so > that direct aio works on them). Would this help the hvm case? > > Thanks, > JOnly if this also allocates a guest frame number that can be passed to Xen in order to eliminate GNTMAP_contains_pte (which cannot work in HVM). I do have a first attempt to get this version working in the PV case (I can share it if you''re interested); mapping works, but unmap currently leaves the system in a bad state, triggering a number of BUG_ONs. One problem that I noticed soon after posting is that Xen does not restore the original mapping when the unmap hypercall is run; instead, it replaces the page with one that discards writes and returns all one bits on read. In the PV case, it seems like integrating with the balloon device would be the proper way to request that Xen restore a real MFN under the GFNs that used to map granted pages. I have tried to use XENMEM_populate_physmap to request that Xen restore page mappings, but this seems to be a noop on HVM. As an alternative to completely duplicating the module, the module could be expanded to check xen_feature(XENFEAT_auto_translated_physmap) and perform the correct type of mapping based on that. Eliminating PTE modification completely will make the module more maintainable, however. -- Daniel De Graaf National Security Agency _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
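For reference, the kind of request being discussed, asking Xen to put a fresh frame back under a guest pfn the way the balloon driver's increase-reservation path does, is sketched below; the helper is hypothetical and not the code that was actually tried:

#include <linux/errno.h>
#include <xen/interface/xen.h>
#include <xen/interface/memory.h>
#include <asm/xen/hypercall.h>

/* Ask Xen to back guest frame 'gfn' with a fresh machine frame after
 * the grant mapping that used to occupy it has been torn down. */
static int repopulate_gfn(unsigned long gfn)
{
	xen_pfn_t frame = gfn;
	struct xen_memory_reservation reservation = {
		.nr_extents   = 1,
		.extent_order = 0,
		.address_bits = 0,
		.domid        = DOMID_SELF,
	};
	int rc;

	set_xen_guest_handle(reservation.extent_start, &frame);
	rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
	if (rc != 1)
		return -ENOMEM;	/* rc is the number of extents populated */

	/* On PV, 'frame' now holds the new MFN and the guest's p2m still
	 * has to be updated (e.g. set_phys_to_machine()) before the page
	 * is usable again. */
	return 0;
}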
Jeremy Fitzhardinge
2010-Dec-14 22:40 UTC
Re: [Xen-devel] [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver
On 12/14/2010 02:06 PM, Daniel De Graaf wrote:>>> +static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma) >>> +{ >>> + struct gntalloc_file_private_data *priv = filp->private_data; >>> + struct gntalloc_gref *gref; >>> + >>> + if (debug) >>> + printk("%s: priv %p, page %lu\n", __func__, >>> + priv, vma->vm_pgoff); >>> + >>> + /* >>> + * There is a 1-to-1 correspondence of grant references to shared >>> + * pages, so it only makes sense to map exactly one page per >>> + * call to mmap(). >>> + */ >> Single-page mmap makes sense if the only possible use-cases are for >> single-page mappings, but if you''re talking about framebuffers and the >> like is seems like a very awkward way to use mmap. It would be cleaner >> from an API perspective to have a user-mode defined flat address space >> indexed by pgoff which maps to an array of grefs, so you can sensibly do >> a multi-page mapping. >> >> It would also allow you to hide the grefs from usermode entirely. Then >> its just up to usermode to choose suitable file offsets for itself. > I considered this, but wanted to keep userspace compatability with the > previously created interface.Is that private to you, or something in broader use?> If there''s no reason to avoid doing so, I''ll > change the ioctl interface to allocate an array of grants and calculate the > offset similar to how gntdev does currently (picks a suitable open slot).I guess there''s three options: you could get the kernel to allocate extents, make usermode do it, or have one fd per extent and always start from offset 0. I guess the last could get very messy if you want to have lots of mappings... Making usermode define the offsets seems simplest and most flexible, because then they can stitch together the file-offset space in any way that''s convenient to them (you just need to deal with overlaps in that space).> Userspace does still have to know about grefs, of course, but only to pass > to the domain doing the mapping, not for its own mmap().Ah, yes, of course.>>> + if (((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) != 1) { >>> + printk(KERN_ERR "%s: Only one page can be memory-mapped " >>> + "per grant reference.\n", __func__); >>> + return -EINVAL; >>> + } >>> + >>> + if (!(vma->vm_flags & VM_SHARED)) { >>> + printk(KERN_ERR "%s: Mapping must be shared.\n", >>> + __func__); >>> + return -EINVAL; >>> + } >>> + >>> + spin_lock(&gref_lock); >>> + gref = find_gref(priv, vma->vm_pgoff); >>> + if (gref == NULL) { >>> + spin_unlock(&gref_lock); >>> + printk(KERN_ERR "%s: Could not find a grant reference with " >>> + "page index %lu.\n", __func__, vma->vm_pgoff); >>> + return -ENOENT; >>> + } >>> + gref->users++; >>> + spin_unlock(&gref_lock); >>> + >>> + vma->vm_private_data = gref; >>> + >>> + /* This flag prevents Bad PTE errors when the memory is unmapped. */ >>> + vma->vm_flags |= VM_RESERVED; >>> + vma->vm_flags |= VM_DONTCOPY; >>> + vma->vm_flags |= VM_IO; >> If you set VM_PFNMAP then you don''t need to deal with faults. > Useful to know. Is that more efficient/preferred to defining a > .fault handler? I used this method because that''s what is used in > kernel/relay.c.Well, as you currently have it, your mmap() function doesn''t map anything, so you''re relying on demand faulting to populate the ptes. Since you''ve already allocated the pages that''s just a soft fault, but it means you end up with a lot of per-page entries into the hypervisor. 
If you make mmap pre-populate all the ptes (a nice big fat batch operation), then you should never get faults on the vma. You can set PFNMAP to make sure of that (since you''re already setting all the "woowoo vma" flags, that makes sense). Its actual meaning is "this vma contains pages which are not really kernel memory, so paging them doesn''t make sense" - ie, device or foreign memory (we use it in gntdev). In this case, the pages are normal kernel pages but they''re being given over to a "device" and so are no longer subject to normal kernel lifetime rules. So I think PFNMAP makes sense in this case too. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
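As a sketch of the pre-populate-plus-PFNMAP idea for the single-page gntalloc case (remap_pfn_range() is one way to do the batch insert, not necessarily what the driver will end up using):

/* Called from gntalloc_mmap() once the gref has been looked up and its
 * user count bumped; maps the already-allocated page up front so no
 * .fault handler is needed. */
static int gntalloc_premap_page(struct vm_area_struct *vma,
				struct gntalloc_gref *gref)
{
	vma->vm_private_data = gref;
	vma->vm_flags |= VM_RESERVED | VM_DONTCOPY;

	/* remap_pfn_range() sets VM_IO and VM_PFNMAP itself; after this,
	 * a fault on the vma would indicate a bug rather than a normal
	 * demand-populate path. */
	if (remap_pfn_range(vma, vma->vm_start, page_to_pfn(gref->page),
			    PAGE_SIZE, vma->vm_page_prot))
		return -EAGAIN;

	vma->vm_ops = &gntalloc_vmops;	/* only .close still matters */
	return 0;
}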
Ian Campbell
2010-Dec-15 09:47 UTC
Re: [Xen-devel] [PATCH 1/6] xen-gntdev: Fix circular locking dependency
On Tue, 2010-12-14 at 21:40 +0000, Daniel De Graaf wrote:> On 12/14/2010 04:11 PM, Jeremy Fitzhardinge wrote: > > On 12/14/2010 06:55 AM, Daniel De Graaf wrote: > >> apply_to_page_range will acquire PTE lock while priv->lock is held, and > >> mn_invl_range_start tries to acquire priv->lock with PTE already held. > >> Fix by not holding priv->lock during the entire map operation. > > > > Is priv->lock needed to protect the contents of map? > > > > J > > No, since the map can only be mapped once (checked by map->vma assignment > while the lock is held). The unmap ioctl will return -EBUSY until > an munmap() is called on the area, so it also can''t race, and the other > users are only active once the mmap operation completes.Sounds reasonable enough to me. There are a few unlocked accesses to vma->map: gntdev_del_map (when called from gntdev_ioctl_map_grant_ref) gntdev_vma_close are these safe? If so then it would be worth a comment about why. Anyway this patch appears to resolve the lockdep warning I was seeing with 2.6.37 with qemu-xen backed block devices. Ian.> > > > >> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> > >> --- > >> drivers/xen/gntdev.c | 19 +++++++++---------- > >> 1 files changed, 9 insertions(+), 10 deletions(-) > >> > >> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c > >> index a33e443..c582804 100644 > >> --- a/drivers/xen/gntdev.c > >> +++ b/drivers/xen/gntdev.c > >> @@ -581,23 +581,22 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma) > >> if (!(vma->vm_flags & VM_WRITE)) > >> map->flags |= GNTMAP_readonly; > >> > >> + spin_unlock(&priv->lock); > >> + > >> err = apply_to_page_range(vma->vm_mm, vma->vm_start, > >> vma->vm_end - vma->vm_start, > >> find_grant_ptes, map); > >> - if (err) { > >> - goto unlock_out; > >> - if (debug) > >> - printk("%s: find_grant_ptes() failure.\n", __FUNCTION__); > >> - } > >> + if (err) > >> + return err; > >> > >> err = map_grant_pages(map); > >> - if (err) { > >> - goto unlock_out; > >> - if (debug) > >> - printk("%s: map_grant_pages() failure.\n", __FUNCTION__); > >> - } > >> + if (err) > >> + return err; > >> + > >> map->is_mapped = 1; > >> > >> + return 0; > >> + > >> unlock_out: > >> spin_unlock(&priv->lock); > >> return err; > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2010-Dec-15 09:50 UTC
Re: [Xen-devel] [PATCH 2/6] xen-gntdev: Change page limit to be global instead of per-open
On Tue, 2010-12-14 at 21:42 +0000, Daniel De Graaf wrote:> On 12/14/2010 04:12 PM, Jeremy Fitzhardinge wrote: > > On 12/14/2010 06:55 AM, Daniel De Graaf wrote: > >> Because there is no limitation on how many times a user can open a > >> given device file, an per-file-description limit on the number of > >> pages granted offers little to no benefit. Change to a global limit > >> and remove the ioctl() as the parameter can now be changed via sysfs. > > > > Does anyone use this ioctl? Wouldn''t it be safer to replace it with a > > no-op version? > > > > J > > > > I do not know of any users. If it''s preferred to replace with a noop ioctl > instead of the current -ENOSYS return, that''s easy to do.It''s called by xc_gnttab_set_max_grants in libxc, although I don''t see any callers of that function. We should probably also remove the library function. Ian.> >> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> > >> --- > >> drivers/xen/gntdev.c | 48 ++++++++++++++---------------------------------- > >> 1 files changed, 14 insertions(+), 34 deletions(-) > >> > >> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c > >> index c582804..62639af 100644 > >> --- a/drivers/xen/gntdev.c > >> +++ b/drivers/xen/gntdev.c > >> @@ -44,13 +44,12 @@ MODULE_DESCRIPTION("User-space granted page access driver"); > >> > >> static int debug = 0; > >> module_param(debug, int, 0644); > >> -static int limit = 1024; > >> +static int limit = 1024*1024; > >> module_param(limit, int, 0644); > >> +static atomic_t pages_mapped = ATOMIC_INIT(0); > >> > >> struct gntdev_priv { > >> struct list_head maps; > >> - uint32_t used; > >> - uint32_t limit; > >> spinlock_t lock; > >> struct mm_struct *mm; > >> struct mmu_notifier mn; > >> @@ -76,8 +75,8 @@ static void gntdev_print_maps(struct gntdev_priv *priv, > >> { > >> struct grant_map *map; > >> > >> - printk("%s: maps list (priv %p, usage %d/%d)\n", > >> - __FUNCTION__, priv, priv->used, priv->limit); > >> + printk("%s: maps list (priv %p)\n", > >> + __FUNCTION__, priv); > >> list_for_each_entry(map, &priv->maps, next) > >> printk(" index %2d, count %2d %s\n", > >> map->index, map->count, > >> @@ -104,9 +103,6 @@ static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count) > >> add->count = count; > >> add->priv = priv; > >> > >> - if (add->count + priv->used > priv->limit) > >> - goto err; > >> - > >> return add; > >> > >> err: > >> @@ -131,7 +127,6 @@ static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add) > >> list_add_tail(&add->next, &priv->maps); > >> > >> done: > >> - priv->used += add->count; > >> if (debug) > >> gntdev_print_maps(priv, "[new]", add->index); > >> } > >> @@ -178,7 +173,7 @@ static int gntdev_del_map(struct grant_map *map) > >> if (map->unmap_ops[i].handle) > >> return -EBUSY; > >> > >> - map->priv->used -= map->count; > >> + atomic_sub(map->count, &pages_mapped); > >> list_del(&map->next); > >> return 0; > >> } > >> @@ -360,7 +355,6 @@ static int gntdev_open(struct inode *inode, struct file *flip) > >> > >> INIT_LIST_HEAD(&priv->maps); > >> spin_lock_init(&priv->lock); > >> - priv->limit = limit; > >> > >> priv->mm = get_task_mm(current); > >> if (!priv->mm) { > >> @@ -416,19 +410,26 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv, > >> op.count); > >> if (unlikely(op.count <= 0)) > >> return -EINVAL; > >> - if (unlikely(op.count > priv->limit)) > >> - return -EINVAL; > >> > >> err = -ENOMEM; > >> map = gntdev_alloc_map(priv, op.count); > >> if (!map) > >> return err; > >> + > >> if 
(copy_from_user(map->grants, &u->refs, > >> sizeof(map->grants[0]) * op.count) != 0) { > >> gntdev_free_map(map); > >> return err; > >> } > >> > >> + if (unlikely(atomic_add_return(op.count, &pages_mapped) > limit)) > >> + { > >> + if (debug) > >> + printk("%s: can''t map: over limit\n", __FUNCTION__); > >> + gntdev_free_map(map); > >> + return err; > >> + } > >> + > >> spin_lock(&priv->lock); > >> gntdev_add_map(priv, map); > >> op.index = map->index << PAGE_SHIFT; > >> @@ -495,24 +496,6 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv, > >> return 0; > >> } > >> > >> -static long gntdev_ioctl_set_max_grants(struct gntdev_priv *priv, > >> - struct ioctl_gntdev_set_max_grants __user *u) > >> -{ > >> - struct ioctl_gntdev_set_max_grants op; > >> - > >> - if (copy_from_user(&op, u, sizeof(op)) != 0) > >> - return -EFAULT; > >> - if (debug) > >> - printk("%s: priv %p, limit %d\n", __FUNCTION__, priv, op.count); > >> - if (op.count > limit) > >> - return -EINVAL; > >> - > >> - spin_lock(&priv->lock); > >> - priv->limit = op.count; > >> - spin_unlock(&priv->lock); > >> - return 0; > >> -} > >> - > >> static long gntdev_ioctl(struct file *flip, > >> unsigned int cmd, unsigned long arg) > >> { > >> @@ -529,9 +512,6 @@ static long gntdev_ioctl(struct file *flip, > >> case IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR: > >> return gntdev_ioctl_get_offset_for_vaddr(priv, ptr); > >> > >> - case IOCTL_GNTDEV_SET_MAX_GRANTS: > >> - return gntdev_ioctl_set_max_grants(priv, ptr); > >> - > >> default: > >> if (debug) > >> printk("%s: priv %p, unknown cmd %x\n", > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2010-Dec-15 09:58 UTC
Re: [Xen-devel] [PATCH 4/6] xen-gntdev: Use find_vma rather than iterating our vma list manually
On Tue, 2010-12-14 at 21:20 +0000, Jeremy Fitzhardinge wrote:> On 12/14/2010 06:55 AM, Daniel De Graaf wrote: > > @@ -74,6 +74,8 @@ struct grant_map { > > struct granted_page pages[0]; > > }; > > > > +static struct vm_operations_struct gntdev_vmops; > > Is it OK for this to be all NULL? It's a forward declaration, the actual definition is later on. I think it would be clearer to move the existing declaration (plus gntdev_vma_{close,fault}) before gntdev_ioctl_get_offset_for_vaddr or vice versa. Oh wait, gntdev_ioctl_get_offset_for_vaddr is already after gntdev_vmops, isn't it? Unless it changed in patches 2 or 3 which I don't think it did? Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
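The pattern in question, for readers skimming the thread, is a tentative declaration up front so earlier functions can take the structure's address, with the initialized definition further down the same file (a generic C illustration, not a quote of the patch):

/* Early in gntdev.c: lets gntdev_ioctl_get_offset_for_vaddr() compare
 * vma->vm_ops against &gntdev_vmops before the ops are defined. */
static struct vm_operations_struct gntdev_vmops;

/* ... ioctl handlers that use &gntdev_vmops ... */

/* Later in the file: the real definition. */
static struct vm_operations_struct gntdev_vmops = {
	.fault = gntdev_vma_fault,
	.close = gntdev_vma_close,
};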
Daniel De Graaf
2010-Dec-15 14:18 UTC
Re: [Xen-devel] [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver
On 12/14/2010 05:40 PM, Jeremy Fitzhardinge wrote:> On 12/14/2010 02:06 PM, Daniel De Graaf wrote: >>>> +static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma) >>>> +{ >>>> + struct gntalloc_file_private_data *priv = filp->private_data; >>>> + struct gntalloc_gref *gref; >>>> + >>>> + if (debug) >>>> + printk("%s: priv %p, page %lu\n", __func__, >>>> + priv, vma->vm_pgoff); >>>> + >>>> + /* >>>> + * There is a 1-to-1 correspondence of grant references to shared >>>> + * pages, so it only makes sense to map exactly one page per >>>> + * call to mmap(). >>>> + */ >>> Single-page mmap makes sense if the only possible use-cases are for >>> single-page mappings, but if you''re talking about framebuffers and the >>> like is seems like a very awkward way to use mmap. It would be cleaner >>> from an API perspective to have a user-mode defined flat address space >>> indexed by pgoff which maps to an array of grefs, so you can sensibly do >>> a multi-page mapping. >>> >>> It would also allow you to hide the grefs from usermode entirely. Then >>> its just up to usermode to choose suitable file offsets for itself. >> I considered this, but wanted to keep userspace compatability with the >> previously created interface. > > Is that private to you, or something in broader use?This module was used as part of Qubes (http://www.qubes-os.org). The device path has changed (/dev/gntalloc to /dev/xen/gntalloc), and the API change adds useful functionality, so I don''t think we must keep compatibility. This will also allow cleaning up the interface to remove parameters that make no sense (owner_domid, for example).>> If there''s no reason to avoid doing so, I''ll >> change the ioctl interface to allocate an array of grants and calculate the >> offset similar to how gntdev does currently (picks a suitable open slot). > > I guess there''s three options: you could get the kernel to allocate > extents, make usermode do it, or have one fd per extent and always start > from offset 0. I guess the last could get very messy if you want to > have lots of mappings... Making usermode define the offsets seems > simplest and most flexible, because then they can stitch together the > file-offset space in any way that''s convenient to them (you just need to > deal with overlaps in that space).Would it be useful to also give userspace control over the offsets in gntdev? One argument for doing it in the kernel is to avoid needing to track what offsets are already being used (and then having the kernel re-check that). While this isn''t hard, IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR only exists in order to relieve userspace of the need to track its mappings, so this seems to have been a concern before. Another use case of gntalloc that may prove useful is to have more than one application able to map the same grant within the kernel. For gntdev, this can be done by mapping the pages multiple times (although that may not be the most efficient). Moving the gntalloc lists out of the file private data would allow any user of gntalloc to map any shared page. The primary reason not to do this is that it prevents automatic cleanup of allocated pages on close (which is important when the userspace app doing the mapping crashes).>> Userspace does still have to know about grefs, of course, but only to pass >> to the domain doing the mapping, not for its own mmap(). > > Ah, yes, of course. > >>>> + >>>> + /* This flag prevents Bad PTE errors when the memory is unmapped. 
*/ >>>> + vma->vm_flags |= VM_RESERVED; >>>> + vma->vm_flags |= VM_DONTCOPY; >>>> + vma->vm_flags |= VM_IO; >>> If you set VM_PFNMAP then you don''t need to deal with faults. >> Useful to know. Is that more efficient/preferred to defining a >> .fault handler? I used this method because that''s what is used in >> kernel/relay.c. > > Well, as you currently have it, your mmap() function doesn''t map > anything, so you''re relying on demand faulting to populate the ptes. > Since you''ve already allocated the pages that''s just a soft fault, but > it means you end up with a lot of per-page entries into the hypervisor. > > If you make mmap pre-populate all the ptes (a nice big fat batch > operation), then you should never get faults on the vma. You can set > PFNMAP to make sure of that (since you''re already setting all the > "woowoo vma" flags, that makes sense). > > Its actual meaning is "this vma contains pages which are not really > kernel memory, so paging them doesn''t make sense" - ie, device or > foreign memory (we use it in gntdev). > > In this case, the pages are normal kernel pages but they''re being given > over to a "device" and so are no longer subject to normal kernel > lifetime rules. So I think PFNMAP makes sense in this case too. > > > J >Agreed; once mapped, the frame numbers (GFN & MFN) won''t change until they are unmapped, so pre-populating them will be better. -- Daniel De Graaf National Security Agency _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
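If usermode did pick the offsets, the re-check Daniel mentions amounts to an overlap test against the existing maps list, which the driver already keeps sorted by index. A sketch, with the helper name and return convention chosen here for illustration:

/* Reject a user-chosen [index, index + count) range that collides with
 * an existing grant_map on priv->maps (kept sorted by index).
 * Called with priv->lock held; returns 1 if the range is free. */
static int gntdev_range_free(struct gntdev_priv *priv, int index, int count)
{
	struct grant_map *map;

	list_for_each_entry(map, &priv->maps, next) {
		if (map->index >= index + count)
			break;			/* already past the new range */
		if (map->index + map->count > index)
			return 0;		/* overlap */
	}
	return 1;
}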
Jeremy Fitzhardinge
2010-Dec-16 00:27 UTC
Re: [Xen-devel] [PATCH 2/6] xen-gntdev: Change page limit to be global instead of per-open
On 12/15/2010 01:50 AM, Ian Campbell wrote:> On Tue, 2010-12-14 at 21:42 +0000, Daniel De Graaf wrote: >> On 12/14/2010 04:12 PM, Jeremy Fitzhardinge wrote: >>> On 12/14/2010 06:55 AM, Daniel De Graaf wrote: >>>> Because there is no limitation on how many times a user can open a >>>> given device file, an per-file-description limit on the number of >>>> pages granted offers little to no benefit. Change to a global limit >>>> and remove the ioctl() as the parameter can now be changed via sysfs. >>> Does anyone use this ioctl? Wouldn''t it be safer to replace it with a >>> no-op version? >>> >>> J >>> >> I do not know of any users. If it''s preferred to replace with a noop ioctl >> instead of the current -ENOSYS return, that''s easy to do. > It''s called by xc_gnttab_set_max_grants in libxc, although I don''t see > any callers of that function. > > We should probably also remove the library function.Seems like the right thing to do. If its original semantics were iffy and nobody is using it, then there''s no point keeping vestigial code around. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Dec-16 00:28 UTC
Re: [Xen-devel] [PATCH 1/6] xen-gntdev: Fix circular locking dependency
On 12/15/2010 01:47 AM, Ian Campbell wrote:> On Tue, 2010-12-14 at 21:40 +0000, Daniel De Graaf wrote: >> On 12/14/2010 04:11 PM, Jeremy Fitzhardinge wrote: >>> On 12/14/2010 06:55 AM, Daniel De Graaf wrote: >>>> apply_to_page_range will acquire PTE lock while priv->lock is held, and >>>> mn_invl_range_start tries to acquire priv->lock with PTE already held. >>>> Fix by not holding priv->lock during the entire map operation. >>> Is priv->lock needed to protect the contents of map? >>> >>> J >> No, since the map can only be mapped once (checked by map->vma assignment >> while the lock is held). The unmap ioctl will return -EBUSY until >> an munmap() is called on the area, so it also can''t race, and the other >> users are only active once the mmap operation completes. > Sounds reasonable enough to me. There are a few unlocked accesses to > vma->map: > gntdev_del_map (when called from gntdev_ioctl_map_grant_ref) > gntdev_vma_close > are these safe? If so then it would be worth a comment about why. > > Anyway this patch appears to resolve the lockdep warning I was seeing > with 2.6.37 with qemu-xen backed block devices.Good. Stefano should stick this on his patch queue. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Dec-16 00:29 UTC
Re: [Xen-devel] [PATCH 4/6] xen-gntdev: Use find_vma rather than iterating our vma list manually
On 12/15/2010 01:58 AM, Ian Campbell wrote:> On Tue, 2010-12-14 at 21:20 +0000, Jeremy Fitzhardinge wrote: >> On 12/14/2010 06:55 AM, Daniel De Graaf wrote: >>> @@ -74,6 +74,8 @@ struct grant_map { >>> struct granted_page pages[0]; >>> }; >>> >>> +static struct vm_operations_struct gntdev_vmops; >> Is it OK for this to be all NULL? > It''s a forward declaration, the actual definition is later on.(Ugh, always hated that syntax.) J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
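The construct in question, for readers who share the reaction above: a tentative definition of a file-scope static acts as a forward declaration, and a later definition of the same object supplies the initializer. The sketch below is trimmed of the driver details.

static struct vm_operations_struct gntdev_vmops;	/* forward declaration */

static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
{
	/* ... */
	vma->vm_ops = &gntdev_vmops;	/* the address is usable before the initializer */
	return 0;
}

static void gntdev_vma_close(struct vm_area_struct *vma)
{
	/* ... */
}

static struct vm_operations_struct gntdev_vmops = {	/* actual definition */
	.close = gntdev_vma_close,
};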
Jeremy Fitzhardinge
2010-Dec-16 01:05 UTC
Re: [Xen-devel] [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver
On 12/15/2010 06:18 AM, Daniel De Graaf wrote:> On 12/14/2010 05:40 PM, Jeremy Fitzhardinge wrote: >> On 12/14/2010 02:06 PM, Daniel De Graaf wrote: >>>>> +static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma) >>>>> +{ >>>>> + struct gntalloc_file_private_data *priv = filp->private_data; >>>>> + struct gntalloc_gref *gref; >>>>> + >>>>> + if (debug) >>>>> + printk("%s: priv %p, page %lu\n", __func__, >>>>> + priv, vma->vm_pgoff); >>>>> + >>>>> + /* >>>>> + * There is a 1-to-1 correspondence of grant references to shared >>>>> + * pages, so it only makes sense to map exactly one page per >>>>> + * call to mmap(). >>>>> + */ >>>> Single-page mmap makes sense if the only possible use-cases are for >>>> single-page mappings, but if you''re talking about framebuffers and the >>>> like is seems like a very awkward way to use mmap. It would be cleaner >>>> from an API perspective to have a user-mode defined flat address space >>>> indexed by pgoff which maps to an array of grefs, so you can sensibly do >>>> a multi-page mapping. >>>> >>>> It would also allow you to hide the grefs from usermode entirely. Then >>>> its just up to usermode to choose suitable file offsets for itself. >>> I considered this, but wanted to keep userspace compatability with the >>> previously created interface. >> Is that private to you, or something in broader use? > This module was used as part of Qubes (http://www.qubes-os.org). The device > path has changed (/dev/gntalloc to /dev/xen/gntalloc), and the API change > adds useful functionality, so I don''t think we must keep compatibility. This > will also allow cleaning up the interface to remove parameters that make no > sense (owner_domid, for example).Ah, right. Well that means it has at least been prototyped, but I don''t think we should be constrained by the original ABI if we can make clear improvements.>>> If there''s no reason to avoid doing so, I''ll >>> change the ioctl interface to allocate an array of grants and calculate the >>> offset similar to how gntdev does currently (picks a suitable open slot). >> I guess there''s three options: you could get the kernel to allocate >> extents, make usermode do it, or have one fd per extent and always start >> from offset 0. I guess the last could get very messy if you want to >> have lots of mappings... Making usermode define the offsets seems >> simplest and most flexible, because then they can stitch together the >> file-offset space in any way that''s convenient to them (you just need to >> deal with overlaps in that space). > Would it be useful to also give userspace control over the offsets in gntdev? > > One argument for doing it in the kernel is to avoid needing to track what > offsets are already being used (and then having the kernel re-check that).Hm, yeah, that could be a bit fiddly. I guess you''d need to stick them into an rbtree or something.> While this isn''t hard, IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR only exists in > order to relieve userspace of the need to track its mappings, so this > seems to have been a concern before.It would be nice to have them symmetric. However, its easy to implement GET_OFFSET_FOR_VADDR either way - given a vaddr, you can look up the vma and return its pgoff. It looks like GET_OFFSET_FOR_VADDR is just used in xc_gnttab_munmap() so that libxc can recover the offset and the page count from the vaddr, so that it can pass them to IOCTL_GNTDEV_UNMAP_GRANT_REF. Also, it seems to fail unmaps which don''t exactly correspond to a MAP_GRANT_REF. 
I guess that''s OK, but it looks a bit strange.> Another use case of gntalloc that may prove useful is to have more than > one application able to map the same grant within the kernel.So you mean have gntalloc allocate one page and then allow multiple processes to map and use it? In that case it would probably be best implemented as a filesystem, so you can give proper globally visible names to the granted regions, and mmap them as normal files, like shm.> Agreed; once mapped, the frame numbers (GFN & MFN) won''t change until > they are unmapped, so pre-populating them will be better.Unless of course you don''t want to map the pages in dom0 at all; if you just want dom0 to be a facilitator for shared pages between two other domains. Does Xen allow a page to be granted to more than one domain at once? J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
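From user space, the pgoff-indexed interface Jeremy sketches could look roughly like the code below. The ioctl request number, struct layout and field names are assumptions for illustration, not an existing ABI: one allocation call hands back a file offset covering N consecutive grants, and a single mmap() then maps all N pages.

#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

struct ioctl_gntalloc_alloc_gref {	/* hypothetical layout */
	uint16_t domid;			/* domain allowed to map the pages */
	uint16_t flags;			/* e.g. writable */
	uint32_t count;			/* pages (= grants) to allocate */
	uint64_t index;			/* OUT: offset to hand to mmap() */
	uint32_t gref_ids[1];		/* OUT: one grant reference per page */
};

static void *alloc_and_map(int fd, unsigned long alloc_ioctl,
			   uint16_t domid, uint32_t count)
{
	long pgsz = sysconf(_SC_PAGESIZE);
	struct ioctl_gntalloc_alloc_gref *op;
	void *area;

	if (count == 0)
		return NULL;
	op = calloc(1, sizeof(*op) + (count - 1) * sizeof(uint32_t));
	if (!op)
		return NULL;
	op->domid = domid;
	op->count = count;

	if (ioctl(fd, alloc_ioctl, op)) {	/* hypothetical request number */
		free(op);
		return NULL;
	}

	/* one mmap() spanning all pages at the kernel-chosen offset */
	area = mmap(NULL, (size_t)count * pgsz, PROT_READ | PROT_WRITE,
		    MAP_SHARED, fd, (off_t)op->index);
	free(op);
	return area == MAP_FAILED ? NULL : area;
}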
Stefano Stabellini
2010-Dec-16 15:09 UTC
Re: [Xen-devel] [PATCH 1/6] xen-gntdev: Fix circular locking dependency
On Thu, 16 Dec 2010, Jeremy Fitzhardinge wrote:> On 12/15/2010 01:47 AM, Ian Campbell wrote: > > On Tue, 2010-12-14 at 21:40 +0000, Daniel De Graaf wrote: > >> On 12/14/2010 04:11 PM, Jeremy Fitzhardinge wrote: > >>> On 12/14/2010 06:55 AM, Daniel De Graaf wrote: > >>>> apply_to_page_range will acquire PTE lock while priv->lock is held, and > >>>> mn_invl_range_start tries to acquire priv->lock with PTE already held. > >>>> Fix by not holding priv->lock during the entire map operation. > >>> Is priv->lock needed to protect the contents of map? > >>> > >>> J > >> No, since the map can only be mapped once (checked by map->vma assignment > >> while the lock is held). The unmap ioctl will return -EBUSY until > >> an munmap() is called on the area, so it also can''t race, and the other > >> users are only active once the mmap operation completes. > > Sounds reasonable enough to me. There are a few unlocked accesses to > > vma->map: > > gntdev_del_map (when called from gntdev_ioctl_map_grant_ref) > > gntdev_vma_close > > are these safe? If so then it would be worth a comment about why. > > > > Anyway this patch appears to resolve the lockdep warning I was seeing > > with 2.6.37 with qemu-xen backed block devices. > > Good. Stefano should stick this on his patch queue.Agreed. I''ll add it in the next version. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel De Graaf
2010-Dec-16 15:22 UTC
Re: [Xen-devel] [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver
On 12/15/2010 08:05 PM, Jeremy Fitzhardinge wrote:> On 12/15/2010 06:18 AM, Daniel De Graaf wrote: >> On 12/14/2010 05:40 PM, Jeremy Fitzhardinge wrote: >>> On 12/14/2010 02:06 PM, Daniel De Graaf wrote: >>>>>> +static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma) >>>>>> +{ >>>>>> + struct gntalloc_file_private_data *priv = filp->private_data; >>>>>> + struct gntalloc_gref *gref; >>>>>> + >>>>>> + if (debug) >>>>>> + printk("%s: priv %p, page %lu\n", __func__, >>>>>> + priv, vma->vm_pgoff); >>>>>> + >>>>>> + /* >>>>>> + * There is a 1-to-1 correspondence of grant references to shared >>>>>> + * pages, so it only makes sense to map exactly one page per >>>>>> + * call to mmap(). >>>>>> + */ >>>>> Single-page mmap makes sense if the only possible use-cases are for >>>>> single-page mappings, but if you''re talking about framebuffers and the >>>>> like is seems like a very awkward way to use mmap. It would be cleaner >>>>> from an API perspective to have a user-mode defined flat address space >>>>> indexed by pgoff which maps to an array of grefs, so you can sensibly do >>>>> a multi-page mapping. >>>>> >>>>> It would also allow you to hide the grefs from usermode entirely. Then >>>>> its just up to usermode to choose suitable file offsets for itself. >>>> I considered this, but wanted to keep userspace compatability with the >>>> previously created interface. >>> Is that private to you, or something in broader use? >> This module was used as part of Qubes (http://www.qubes-os.org). The device >> path has changed (/dev/gntalloc to /dev/xen/gntalloc), and the API change >> adds useful functionality, so I don''t think we must keep compatibility. This >> will also allow cleaning up the interface to remove parameters that make no >> sense (owner_domid, for example). > > Ah, right. Well that means it has at least been prototyped, but I don''t > think we should be constrained by the original ABI if we can make clear > improvements. > >>>> If there''s no reason to avoid doing so, I''ll >>>> change the ioctl interface to allocate an array of grants and calculate the >>>> offset similar to how gntdev does currently (picks a suitable open slot). >>> I guess there''s three options: you could get the kernel to allocate >>> extents, make usermode do it, or have one fd per extent and always start >>> from offset 0. I guess the last could get very messy if you want to >>> have lots of mappings... Making usermode define the offsets seems >>> simplest and most flexible, because then they can stitch together the >>> file-offset space in any way that''s convenient to them (you just need to >>> deal with overlaps in that space). >> Would it be useful to also give userspace control over the offsets in gntdev? >> >> One argument for doing it in the kernel is to avoid needing to track what >> offsets are already being used (and then having the kernel re-check that). > > Hm, yeah, that could be a bit fiddly. I guess you''d need to stick them > into an rbtree or something.Another option that provides more flexibility - have a flag in the create operation, similar to MAP_FIXED in mmap, that allows userspace to mandate the offset if it wants control, but default to letting the kernel handle it. We already have a flags field for making the grant writable, this is just another bit.>> While this isn''t hard, IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR only exists in >> order to relieve userspace of the need to track its mappings, so this >> seems to have been a concern before. 
> > It would be nice to have them symmetric. However, its easy to implement > GET_OFFSET_FOR_VADDR either way - given a vaddr, you can look up the vma > and return its pgoff. > > It looks like GET_OFFSET_FOR_VADDR is just used in xc_gnttab_munmap() so > that libxc can recover the offset and the page count from the vaddr, so > that it can pass them to IOCTL_GNTDEV_UNMAP_GRANT_REF. > > Also, it seems to fail unmaps which don''t exactly correspond to a > MAP_GRANT_REF. I guess that''s OK, but it looks a bit strange.So, implementing an IOCTL_GNTALLOC_GET_OFFSET_FOR_VADDR would be useful in order to allow gntalloc munmap() to be similar to gnttab''s. If we want to allow a given offset to be mapped to multiple domains, we couldn''t just return the offset; it would have to be a list of grant references, and the destroy ioctl would need to take the grant reference.>> Another use case of gntalloc that may prove useful is to have more than >> one application able to map the same grant within the kernel. > > So you mean have gntalloc allocate one page and the allow multiple > processes to map and use it? In that case it would probably be best > implemented as a filesystem, so you can give proper globally visible > names to the granted regions, and mmap them as normal files, like shm.That seems like a better way to expose this functionality. I didn''t have a use case for multiple processes mapping a grant, just didn''t want to prevent doing it in the future if it was a trivial change. Since it''s more complex to implement a filesystem, I think someone needs to find a use for it before it''s written. I believe the current code lets you map the areas in multiple processes if you pass the file descriptor around with fork() or using unix sockets; that seems sufficient to me.>> Agreed; once mapped, the frame numbers (GFN & MFN) won''t change until >> they are unmapped, so pre-populating them will be better. > > Unless of course you don''t want to map the pages in dom0 at all; if you > just want dom0 to be a facilitator for shared pages between two other > domains. Does Xen allow a page to be granted to more than one domain at > once?I think so; you''d have to use multiple grant table entries in order to do that, and it might trigger some hypervisor warnings when the shared page has an unexpectedly high refcount when being unmapped (HVM mappings already trigger such warnings, so perhaps they need to be removed). I can''t think of an immediate use for a 3-domain shared page, but that doesn''t mean that such a use doesn''t exist. Perhaps some kind of shared page cache, exported using read-only grants by dom0? Having dom0 create a pair of grant table entries to let two domUs communicate seems like a hack to get around the lack of gntalloc in either domU. This module works perfectly fine in a domU, and the use of a shared page must be asymmetric no matter how you set it up, so you don''t save all that much code by making the setup identical. Anyway, you don''t have to call mmap() to let another domain access the shared pages; they are mappable as soon as the ioctl() returns, and remain so until you call the removal ioctl(). So if you do call mmap(), you probably expect to use the mapping. -- Daniel De Graaf National Security Agency _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
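On the multi-domain question, a sketch along the lines Daniel describes, assuming a PV guest and the existing grant-table helpers; the wrapper itself is hypothetical. The same frame simply gets one grant-table entry per peer domain.

#include <xen/grant_table.h>
#include <xen/page.h>

/* Hypothetical helper: grant one kernel page to two domains at once by
 * allocating two grant-table entries for the same machine frame. */
static int grant_page_to_two_domains(struct page *page,
				     domid_t domid_a, domid_t domid_b,
				     grant_ref_t *ref_a, grant_ref_t *ref_b)
{
	unsigned long frame = pfn_to_mfn(page_to_pfn(page));
	int a, b;

	a = gnttab_grant_foreign_access(domid_a, frame, 0 /* writable */);
	if (a < 0)
		return a;

	b = gnttab_grant_foreign_access(domid_b, frame, 0 /* writable */);
	if (b < 0) {
		gnttab_end_foreign_access_ref(a, 0);
		gnttab_free_grant_reference(a);
		return b;
	}

	*ref_a = a;
	*ref_b = b;
	return 0;
}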
Jeremy Fitzhardinge
2010-Dec-16 19:14 UTC
Re: [Xen-devel] [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver
On 12/16/2010 07:22 AM, Daniel De Graaf wrote:>> Hm, yeah, that could be a bit fiddly. I guess you''d need to stick them >> into an rbtree or something. > Another option that provides more flexibility - have a flag in the create > operation, similar to MAP_FIXED in mmap, that allows userspace to mandate > the offset if it wants control, but default to letting the kernel handle it. > We already have a flags field for making the grant writable, this is just > another bit.I''d go for just implementing one way of doing it unless there''s a clear need for each of them. The choose-your-own-offset route is looking pretty complex. If you have the kernel allocate the offsets, but perhaps with the guarantee that subsequent allocations will get consecutive offsets, then that lets usermode set up a group of pages which can be mapped with a single mmap call, which is all I was really aiming for.>>> While this isn''t hard, IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR only exists in >>> order to relieve userspace of the need to track its mappings, so this >>> seems to have been a concern before. >> It would be nice to have them symmetric. However, its easy to implement >> GET_OFFSET_FOR_VADDR either way - given a vaddr, you can look up the vma >> and return its pgoff. >> >> It looks like GET_OFFSET_FOR_VADDR is just used in xc_gnttab_munmap() so >> that libxc can recover the offset and the page count from the vaddr, so >> that it can pass them to IOCTL_GNTDEV_UNMAP_GRANT_REF. >> >> Also, it seems to fail unmaps which don''t exactly correspond to a >> MAP_GRANT_REF. I guess that''s OK, but it looks a bit strange. > So, implementing an IOCTL_GNTALLOC_GET_OFFSET_FOR_VADDR would be useful in > order to allow gntalloc munmap() to be similar to gnttab''s. If we want to > allow a given offset to be mapped to multiple domains, we couldn''t just > return the offset; it would have to be a list of grant references, and > the destroy ioctl would need to take the grant reference.See below.>>> Another use case of gntalloc that may prove useful is to have more than >>> one application able to map the same grant within the kernel. >> So you mean have gntalloc allocate one page and the allow multiple >> processes to map and use it? In that case it would probably be best >> implemented as a filesystem, so you can give proper globally visible >> names to the granted regions, and mmap them as normal files, like shm. > That seems like a better way to expose this functionality. I didn''t have > a use case for multiple processes mapping a grant, just didn''t want to > prevent doing it in the future if it was a trivial change. Since it''s > more complex to implement a filesystem, I think someone needs to find a > use for it before it''s written. I believe the current code lets you map > the areas in multiple processes if you pass the file descriptor around > with fork() or using unix sockets; that seems sufficient to me.That raises another quirk in the gntdev (which I think also applies to gntalloc) API - the relationship between munmap and IOCTL_GNTDEV_UNMAP_GRANT_REF. The current behaviour is that the ioctl will fail with EBUSY if there are still mappings of the granted pages. It would probably be better to make the map refcounted by the number of vmas+1 pointing to it so that the UNMAP_GRANT_REF would drop a reference, as would each vma as its unmapped, with the actual ungranting happening at ref==0. That would allow multiple uncoordinated processes to use the same mappings without having to work out who''s doing the cleanup. 
This would also allow auto-ungranting maps (xc_gnttab_map_grant_ref() could do the UNMAP_GRANT_REF immediately after the mmap(), so that xc_gnttab_munmap() can simply munmap() without the need for GET_OFFSET_FOR_VADDR).> Anyway, you don''t have to call mmap() to let another domain access the shared > pages; they are mappable as soon as the ioctl() returns, and remain so > until you call the removal ioctl(). So if you do call mmap(), you probably > expect to use the mapping.Yep. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
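A sketch of the refcounting scheme proposed here, with hypothetical names (neither posted driver does this yet): the grant itself holds one reference and every vma holds another; both the unmap ioctl and munmap() just drop a reference, and the ungrant happens only when the count reaches zero.

#include <linux/mm.h>

struct shared_map {			/* hypothetical, simplified */
	atomic_t users;			/* vmas mapping this, plus 1 for the grant */
	/* grant refs, pages, ... */
};

static void shared_map_get(struct shared_map *map)
{
	atomic_inc(&map->users);
}

static void shared_map_put(struct shared_map *map)
{
	if (atomic_dec_and_test(&map->users)) {
		ungrant_pages(map);	/* hypothetical: end foreign access */
		free_map(map);		/* hypothetical */
	}
}

/* The driver's mmap() takes a reference for the new vma before installing
 * these ops; splits and forks are then balanced by open/close. */
static void shared_vma_open(struct vm_area_struct *vma)
{
	shared_map_get(vma->vm_private_data);
}

static void shared_vma_close(struct vm_area_struct *vma)
{
	shared_map_put(vma->vm_private_data);
}

/* The unmap/dealloc ioctl only drops the base reference; if vmas still
 * exist, the actual ungrant waits for the last munmap(). */
static long ioctl_unmap(struct shared_map *map)
{
	shared_map_put(map);
	return 0;
}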