Daniel De Graaf
2010-Dec-03 15:36 UTC
[Xen-devel] [PATCH 0/2] Userspace grant communication
For fast communication between userspace applications in different domains, it is useful to be able to set up a shared memory page. This can be used to implement device driver frontends and backends completely in userspace, or as a faster alternative to network communication. The current gntdev is limited to PV domains, and does not allow grants to be created. The following patches change gntdev to map grants onto existing pages, allowing the same code to be used in PV and HVM, and add a gntalloc driver to allow mappings to be created by userspace. These changes also make the mappings more application-friendly: the mmap() calls can be made multiple times, persist across fork(), and allow the device to be closed without invalidating the mapped areas. This matches the behavior of mmap() on a normal file.

API changes from the existing /dev/xen/gntdev:

The unused "pad" field in ioctl_gntdev_map_grant_ref is now used for flags on the mapping (currently used to specify whether the mapping should be writable). This provides sufficient information to perform the mapping when the ioctl is called. To retain compatibility with current userspace, a new ioctl number is used for this functionality, and the legacy behavior (errors surface at the first mmap() rather than at ioctl time) is retained when the old ioctl is used.

IOCTL_GNTDEV_SET_MAX_GRANTS is not exposed in the Xen userspace libraries, and is not very useful: it cannot be used to raise the limit of grants per file descriptor, and is trivial to bypass by opening the device multiple times. This version uses a global limit specified as a module parameter (modifiable at runtime via sysfs).

--
Daniel De Graaf
National Security Agency

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
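As an illustration of the API change described above (this is not part of the posted series), a userspace client of the reworked gntdev might look roughly like the following sketch. It assumes the updated xen/gntdev.h header is visible to userspace, and that the granting domain ID and grant reference (hypothetical placeholder values below) were obtained out of band, for example via xenstore.

/*
 * Minimal sketch, not from the patch series: map a single foreign grant
 * read/write with the new ioctl, then mmap() it like a normal file.
 * The domid/ref values are hypothetical placeholders.
 */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <xen/gntdev.h>	/* IOCTL_GNTDEV_MAP_GRANT_REF_2, GNTDEV_MAP_WRITABLE */

int main(void)
{
	struct ioctl_gntdev_map_grant_ref op;
	void *shared;
	int fd = open("/dev/xen/gntdev", O_RDWR);

	if (fd < 0) {
		perror("open /dev/xen/gntdev");
		return 1;
	}

	memset(&op, 0, sizeof(op));
	op.count = 1;
	op.flags = GNTDEV_MAP_WRITABLE;	/* new use of the old "pad" field */
	op.refs[0].domid = 1;		/* hypothetical granting domain */
	op.refs[0].ref = 42;		/* hypothetical grant reference */

	/* With the new ioctl, the grant is mapped here, at ioctl time... */
	if (ioctl(fd, IOCTL_GNTDEV_MAP_GRANT_REF_2, &op) < 0) {
		perror("IOCTL_GNTDEV_MAP_GRANT_REF_2");
		return 1;
	}

	/* ...and mmap() then behaves like mmap() on a regular file. */
	shared = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
		      fd, (off_t)op.index);
	if (shared == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	strcpy(shared, "hello from the mapping domain");

	munmap(shared, 4096);
	close(fd);
	return 0;
}

For more than one page, refs[] is a variable-length trailer: the structure would need to be allocated with room for count entries, and the mmap() length scaled to match.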
Daniel De Graaf
2010-Dec-03 15:37 UTC
[Xen-devel] [PATCH 1/2] xen-gntdev: support mapping in HVM domains
This changes the /dev/xen/gntdev device to work in HVM domains, and also makes the mmap() behavior more closely match the behavior of files instead of requiring an ioctl() call for each mmap/munmap. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/Kconfig | 3 +- drivers/xen/gntdev.c | 612 +++++++++++++++++++++++--------------------------- include/xen/gntdev.h | 9 +- 3 files changed, 295 insertions(+), 329 deletions(-) diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig index fa9982e..a9f3a8f 100644 --- a/drivers/xen/Kconfig +++ b/drivers/xen/Kconfig @@ -176,9 +176,8 @@ config XEN_XENBUS_FRONTEND config XEN_GNTDEV tristate "userspace grant access device driver" depends on XEN - select MMU_NOTIFIER help - Allows userspace processes use grants. + Allows userspace processes to map grants from other domains. config XEN_S3 def_bool y diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c index a33e443..15f5c9c 100644 --- a/drivers/xen/gntdev.c +++ b/drivers/xen/gntdev.c @@ -44,31 +44,37 @@ MODULE_DESCRIPTION("User-space granted page access driver"); static int debug = 0; module_param(debug, int, 0644); -static int limit = 1024; + +static int limit = 1024*1024; module_param(limit, int, 0644); +static atomic_t pages_mapped = ATOMIC_INIT(0); + struct gntdev_priv { - struct list_head maps; - uint32_t used; - uint32_t limit; spinlock_t lock; - struct mm_struct *mm; - struct mmu_notifier mn; + struct list_head maps; +}; + +struct granted_page { + struct page* page; + union { + struct ioctl_gntdev_grant_ref target; + grant_handle_t handle; + }; }; struct grant_map { - struct list_head next; - struct gntdev_priv *priv; - struct vm_area_struct *vma; - int index; - int count; - int flags; - int is_mapped; - struct ioctl_gntdev_grant_ref *grants; - struct gnttab_map_grant_ref *map_ops; - struct gnttab_unmap_grant_ref *unmap_ops; + struct list_head next; /* next in file */ + int index; /* offset in parent */ + int count; /* size in pages */ + atomic_t users; /* reference count */ + unsigned int is_mapped:1; /* has map hypercall been run? */ + unsigned int is_ro:1; /* is the map read-only? */ + struct granted_page pages[0]; /* pages used for mapping */ }; +static struct vm_operations_struct gntdev_vmops; + /* ------------------------------------------------------------------ */ static void gntdev_print_maps(struct gntdev_priv *priv, @@ -76,51 +82,46 @@ static void gntdev_print_maps(struct gntdev_priv *priv, { struct grant_map *map; - printk("%s: maps list (priv %p, usage %d/%d)\n", - __FUNCTION__, priv, priv->used, priv->limit); - list_for_each_entry(map, &priv->maps, next) - printk(" index %2d, count %2d %s\n", - map->index, map->count, + printk("%s: maps list (priv %p)\n", __FUNCTION__, priv); + list_for_each_entry(map, &priv->maps, next) { + printk(" %p: %2d+%2d, r%c, %s %d,%d %s\n", map, + map->index, map->count, map->is_ro ? ''o'' : ''w'', + map->is_mapped ? "use,hnd" : "dom,ref", + map->is_mapped ? atomic_read(&map->users) + : map->pages[0].target.domid, + map->is_mapped ? map->pages[0].handle + : map->pages[0].target.ref, map->index == text_index && text ? 
text : ""); + } } -static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count) +static struct grant_map *gntdev_alloc_map(int count, + struct ioctl_gntdev_grant_ref* grants) { struct grant_map *add; + int i; - add = kzalloc(sizeof(struct grant_map), GFP_KERNEL); - if (NULL == add) + add = kzalloc(sizeof(struct grant_map) + + sizeof(struct granted_page) * count, GFP_KERNEL); + if (!add) return NULL; - add->grants = kzalloc(sizeof(add->grants[0]) * count, GFP_KERNEL); - add->map_ops = kzalloc(sizeof(add->map_ops[0]) * count, GFP_KERNEL); - add->unmap_ops = kzalloc(sizeof(add->unmap_ops[0]) * count, GFP_KERNEL); - if (NULL == add->grants || - NULL == add->map_ops || - NULL == add->unmap_ops) - goto err; - - add->index = 0; + atomic_set(&add->users, 1); add->count = count; - add->priv = priv; - if (add->count + priv->used > priv->limit) - goto err; + for (i = 0; i < count; i++) + add->pages[i].target = grants[i]; return add; - -err: - kfree(add->grants); - kfree(add->map_ops); - kfree(add->unmap_ops); - kfree(add); - return NULL; } static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add) { struct grant_map *map; + spin_lock(&priv->lock); + + /* Try to fit in the new mapping as early as possible */ list_for_each_entry(map, &priv->maps, next) { if (add->index + add->count < map->index) { list_add_tail(&add->next, &map->next); @@ -131,225 +132,116 @@ static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add) list_add_tail(&add->next, &priv->maps); done: - priv->used += add->count; if (debug) gntdev_print_maps(priv, "[new]", add->index); + + spin_unlock(&priv->lock); } -static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int index, - int count) +static void __gntdev_del_map(struct gntdev_priv *priv, struct grant_map *map) { - struct grant_map *map; + list_del(&map->next); +} - list_for_each_entry(map, &priv->maps, next) { - if (map->index != index) - continue; - if (map->count != count) - continue; - return map; - } - return NULL; +static void gntdev_del_map(struct gntdev_priv *priv, struct grant_map *map) +{ + spin_lock(&priv->lock); + __gntdev_del_map(priv, map); + spin_unlock(&priv->lock); } -static struct grant_map *gntdev_find_map_vaddr(struct gntdev_priv *priv, - unsigned long vaddr) +static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int index, + int count) { struct grant_map *map; list_for_each_entry(map, &priv->maps, next) { - if (!map->vma) - continue; - if (vaddr < map->vma->vm_start) + if (map->index != index) continue; - if (vaddr >= map->vma->vm_end) + if (map->count != count) continue; return map; } return NULL; } -static int gntdev_del_map(struct grant_map *map) -{ - int i; - - if (map->vma) - return -EBUSY; - for (i = 0; i < map->count; i++) - if (map->unmap_ops[i].handle) - return -EBUSY; - - map->priv->used -= map->count; - list_del(&map->next); - return 0; -} - -static void gntdev_free_map(struct grant_map *map) -{ - if (!map) - return; - kfree(map->grants); - kfree(map->map_ops); - kfree(map->unmap_ops); - kfree(map); -} - -/* ------------------------------------------------------------------ */ - -static int find_grant_ptes(pte_t *pte, pgtable_t token, unsigned long addr, void *data) -{ - struct grant_map *map = data; - unsigned int pgnr = (addr - map->vma->vm_start) >> PAGE_SHIFT; - u64 pte_maddr; - - BUG_ON(pgnr >= map->count); - pte_maddr = (u64)pfn_to_mfn(page_to_pfn(token)) << PAGE_SHIFT; - pte_maddr += (unsigned long)pte & ~PAGE_MASK; - 
gnttab_set_map_op(&map->map_ops[pgnr], pte_maddr, map->flags, - map->grants[pgnr].ref, - map->grants[pgnr].domid); - gnttab_set_unmap_op(&map->unmap_ops[pgnr], pte_maddr, map->flags, - 0 /* handle */); - return 0; -} - -static int map_grant_pages(struct grant_map *map) +static void gntdev_unmap_fast(struct grant_map *map, + struct gnttab_unmap_grant_ref *unmap_ops) { - int i, err = 0; + int err, flags, i, unmap_size = 0; + phys_addr_t mfn; - if (debug) - printk("%s: map %d+%d\n", __FUNCTION__, map->index, map->count); - err = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, - map->map_ops, map->count); - if (WARN_ON(err)) - return err; + flags = GNTMAP_host_map; + if (map->is_ro) + flags |= GNTMAP_readonly; - for (i = 0; i < map->count; i++) { - if (map->map_ops[i].status) - err = -EINVAL; - map->unmap_ops[i].handle = map->map_ops[i].handle; + for (i=0; i < map->count; i++) { + if (!map->pages[i].page) + continue; + mfn = (phys_addr_t)pfn_to_kaddr(page_to_pfn(map->pages[i].page)); + gnttab_set_unmap_op(&unmap_ops[unmap_size], mfn, flags, + map->pages[i].handle); + unmap_size++; } - return err; -} - -static int unmap_grant_pages(struct grant_map *map, int offset, int pages) -{ - int i, err = 0; - if (debug) - printk("%s: map %d+%d [%d+%d]\n", __FUNCTION__, - map->index, map->count, offset, pages); err = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, - map->unmap_ops + offset, pages); - if (WARN_ON(err)) - return err; + unmap_ops, unmap_size); + WARN_ON(err); - for (i = 0; i < pages; i++) { - if (map->unmap_ops[offset+i].status) - err = -EINVAL; - map->unmap_ops[offset+i].handle = 0; - } - return err; + for (i = 0; i < unmap_size; i++) + WARN_ON(unmap_ops[i].status); } -/* ------------------------------------------------------------------ */ - -static void gntdev_vma_close(struct vm_area_struct *vma) +// for the out-of-memory case +static void gntdev_unmap_slow(struct grant_map *map) { - struct grant_map *map = vma->vm_private_data; + int err, flags, i; + phys_addr_t mfn; + struct gnttab_unmap_grant_ref unmap_op; - if (debug) - printk("%s\n", __FUNCTION__); - map->is_mapped = 0; - map->vma = NULL; - vma->vm_private_data = NULL; -} + flags = GNTMAP_host_map; + if (map->is_ro) + flags |= GNTMAP_readonly; -static int gntdev_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf) -{ - if (debug) - printk("%s: vaddr %p, pgoff %ld (shouldn''t happen)\n", - __FUNCTION__, vmf->virtual_address, vmf->pgoff); - vmf->flags = VM_FAULT_ERROR; - return 0; -} - -static struct vm_operations_struct gntdev_vmops = { - .close = gntdev_vma_close, - .fault = gntdev_vma_fault, -}; - -/* ------------------------------------------------------------------ */ - -static void mn_invl_range_start(struct mmu_notifier *mn, - struct mm_struct *mm, - unsigned long start, unsigned long end) -{ - struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn); - struct grant_map *map; - unsigned long mstart, mend; - int err; - - spin_lock(&priv->lock); - list_for_each_entry(map, &priv->maps, next) { - if (!map->vma) - continue; - if (!map->is_mapped) + for (i=0; i < map->count; i++) { + if (!map->pages[i].page) continue; - if (map->vma->vm_start >= end) - continue; - if (map->vma->vm_end <= start) - continue; - mstart = max(start, map->vma->vm_start); - mend = min(end, map->vma->vm_end); - if (debug) - printk("%s: map %d+%d (%lx %lx), range %lx %lx, mrange %lx %lx\n", - __FUNCTION__, map->index, map->count, - map->vma->vm_start, map->vma->vm_end, - start, end, mstart, mend); - err = unmap_grant_pages(map, - 
(mstart - map->vma->vm_start) >> PAGE_SHIFT, - (mend - mstart) >> PAGE_SHIFT); + + mfn = (phys_addr_t)pfn_to_kaddr(page_to_pfn(map->pages[i].page)); + gnttab_set_unmap_op(&unmap_op, mfn, flags, map->pages[i].handle); + err = HYPERVISOR_grant_table_op( + GNTTABOP_unmap_grant_ref, &unmap_op, 1); WARN_ON(err); + WARN_ON(unmap_op.status); } - spin_unlock(&priv->lock); -} - -static void mn_invl_page(struct mmu_notifier *mn, - struct mm_struct *mm, - unsigned long address) -{ - mn_invl_range_start(mn, mm, address, address + PAGE_SIZE); } -static void mn_release(struct mmu_notifier *mn, - struct mm_struct *mm) +static void gntdev_put_map(struct grant_map *map) { - struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn); - struct grant_map *map; - int err; - - spin_lock(&priv->lock); - list_for_each_entry(map, &priv->maps, next) { - if (!map->vma) - continue; - if (debug) - printk("%s: map %d+%d (%lx %lx)\n", - __FUNCTION__, map->index, map->count, - map->vma->vm_start, map->vma->vm_end); - err = unmap_grant_pages(map, 0, map->count); - WARN_ON(err); + struct gnttab_unmap_grant_ref *unmap_ops; + int i; + if (!map) + return; + if (!atomic_dec_and_test(&map->users)) + return; + if (debug) + printk("%s: unmap %p (%d pages)\n", __FUNCTION__, map, map->count); + if (map->is_mapped) { + unmap_ops = kzalloc(sizeof(unmap_ops[0]) * map->count, + GFP_TEMPORARY); + if (likely(unmap_ops)) { + gntdev_unmap_fast(map, unmap_ops); + kfree(unmap_ops); + } else { + gntdev_unmap_slow(map); + } + atomic_sub(map->count, &pages_mapped); } - spin_unlock(&priv->lock); + for (i=0; i < map->count; i++) + __free_page(map->pages[i].page); + kfree(map); } -struct mmu_notifier_ops gntdev_mmu_ops = { - .release = mn_release, - .invalidate_page = mn_invl_page, - .invalidate_range_start = mn_invl_range_start, -}; - -/* ------------------------------------------------------------------ */ - static int gntdev_open(struct inode *inode, struct file *flip) { struct gntdev_priv *priv; @@ -360,16 +252,6 @@ static int gntdev_open(struct inode *inode, struct file *flip) INIT_LIST_HEAD(&priv->maps); spin_lock_init(&priv->lock); - priv->limit = limit; - - priv->mm = get_task_mm(current); - if (!priv->mm) { - kfree(priv); - return -ENOMEM; - } - priv->mn.ops = &gntdev_mmu_ops; - mmu_notifier_register(&priv->mn, priv->mm); - mmput(priv->mm); flip->private_data = priv; if (debug) @@ -382,31 +264,93 @@ static int gntdev_release(struct inode *inode, struct file *flip) { struct gntdev_priv *priv = flip->private_data; struct grant_map *map; - int err; - if (debug) + if (debug) { printk("%s: priv %p\n", __FUNCTION__, priv); + gntdev_print_maps(priv, NULL, 0); + } spin_lock(&priv->lock); while (!list_empty(&priv->maps)) { map = list_entry(priv->maps.next, struct grant_map, next); - err = gntdev_del_map(map); - if (WARN_ON(err)) - gntdev_free_map(map); - + list_del(&map->next); + gntdev_put_map(map); } spin_unlock(&priv->lock); - mmu_notifier_unregister(&priv->mn, priv->mm); kfree(priv); return 0; } +static int gntdev_do_map(struct grant_map *map) +{ + int err, flags, i; + struct page* page; + phys_addr_t mfn; + struct gnttab_map_grant_ref* map_ops; + + flags = GNTMAP_host_map; + if (map->is_ro) + flags |= GNTMAP_readonly; + + err = -ENOMEM; + + if (unlikely(atomic_add_return(map->count, &pages_mapped) > limit)) { + if (debug) + printk("%s: maps full\n", __FUNCTION__); + goto out; + } + + map_ops = kzalloc(sizeof(map_ops[0]) * map->count, GFP_TEMPORARY); + if (!map_ops) + goto out; + + for (i = 0; i < map->count; i++) { + page = 
alloc_page(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO); + if (unlikely(!page)) + goto out_free; + map->pages[i].page = page; + mfn = (phys_addr_t)pfn_to_kaddr(page_to_pfn(page)); + gnttab_set_map_op(&map_ops[i], mfn, flags, + map->pages[i].target.ref, + map->pages[i].target.domid); + } + + err = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, + map_ops, map->count); + if (WARN_ON(err)) + goto out_free; + + map->is_mapped = 1; + + for (i = 0; i < map->count; i++) { + if (map_ops[i].status) { + if (debug) + printk("%s: failed map at page %d: stat=%d\n", + __FUNCTION__, i, map_ops[i].status); + __free_page(map->pages[i].page); + map->pages[i].page = NULL; + err = -EINVAL; + } else { + map->pages[i].handle = map_ops[i].handle; + } + } + +out_free: + kfree(map_ops); +out: + if (!map->is_mapped) + atomic_sub(map->count, &pages_mapped); + return err; +} + static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv, - struct ioctl_gntdev_map_grant_ref __user *u) + struct ioctl_gntdev_map_grant_ref __user *u, + int delay_map) { struct ioctl_gntdev_map_grant_ref op; struct grant_map *map; + struct ioctl_gntdev_grant_ref* grants; int err; if (copy_from_user(&op, u, sizeof(op)) != 0) @@ -416,32 +360,48 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv, op.count); if (unlikely(op.count <= 0)) return -EINVAL; - if (unlikely(op.count > priv->limit)) - return -EINVAL; err = -ENOMEM; - map = gntdev_alloc_map(priv, op.count); + grants = kmalloc(sizeof(grants[0]) * op.count, GFP_TEMPORARY); + if (!grants) + goto out_fail; + + err = -EFAULT; + if (copy_from_user(grants, u->refs, sizeof(grants[0]) * op.count)) + goto out_free; + + map = gntdev_alloc_map(op.count, grants); if (!map) - return err; - if (copy_from_user(map->grants, &u->refs, - sizeof(map->grants[0]) * op.count) != 0) { - gntdev_free_map(map); - return err; + goto out_free; + + if (!delay_map) { + if (!(op.flags & GNTDEV_MAP_WRITABLE)) + map->is_ro = 1; + err = gntdev_do_map(map); + if (err) + goto out_unmap; } - spin_lock(&priv->lock); gntdev_add_map(priv, map); + op.index = map->index << PAGE_SHIFT; - spin_unlock(&priv->lock); - if (copy_to_user(u, &op, sizeof(op)) != 0) { - spin_lock(&priv->lock); - gntdev_del_map(map); - spin_unlock(&priv->lock); - gntdev_free_map(map); - return err; - } - return 0; + err = -EFAULT; + if (copy_to_user(u, &op, sizeof(op)) != 0) + goto out_remove; + + err = 0; + +out_free: + kfree(grants); +out_fail: + return err; + +out_remove: + gntdev_del_map(priv, map); +out_unmap: + gntdev_put_map(map); + goto out_free; } static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv, @@ -449,21 +409,24 @@ static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv, { struct ioctl_gntdev_unmap_grant_ref op; struct grant_map *map; - int err = -EINVAL; + int err = 0; if (copy_from_user(&op, u, sizeof(op)) != 0) return -EFAULT; - if (debug) - printk("%s: priv %p, del %d+%d\n", __FUNCTION__, priv, - (int)op.index, (int)op.count); spin_lock(&priv->lock); map = gntdev_find_map_index(priv, op.index >> PAGE_SHIFT, op.count); - if (map) - err = gntdev_del_map(map); + if (map) { + __gntdev_del_map(priv, map); + } else + err = -EINVAL; spin_unlock(&priv->lock); - if (!err) - gntdev_free_map(map); + + if (debug) + printk("%s: priv %p, del %d+%d = %p\n", __FUNCTION__, priv, + (int)op.index, (int)op.count, map); + + gntdev_put_map(map); return err; } @@ -471,6 +434,7 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv, struct ioctl_gntdev_get_offset_for_vaddr __user *u) { struct 
ioctl_gntdev_get_offset_for_vaddr op; + struct vm_area_struct *vma; struct grant_map *map; if (copy_from_user(&op, u, sizeof(op)) != 0) @@ -479,40 +443,22 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv, printk("%s: priv %p, offset for vaddr %lx\n", __FUNCTION__, priv, (unsigned long)op.vaddr); - spin_lock(&priv->lock); - map = gntdev_find_map_vaddr(priv, op.vaddr); - if (map == NULL || - map->vma->vm_start != op.vaddr) { - spin_unlock(&priv->lock); + vma = find_vma(current->mm, op.vaddr); + if (!vma) return -EINVAL; - } + + map = vma->vm_private_data; + if (vma->vm_ops != &gntdev_vmops || !map) + return -EINVAL; + op.offset = map->index << PAGE_SHIFT; op.count = map->count; - spin_unlock(&priv->lock); if (copy_to_user(u, &op, sizeof(op)) != 0) return -EFAULT; return 0; } -static long gntdev_ioctl_set_max_grants(struct gntdev_priv *priv, - struct ioctl_gntdev_set_max_grants __user *u) -{ - struct ioctl_gntdev_set_max_grants op; - - if (copy_from_user(&op, u, sizeof(op)) != 0) - return -EFAULT; - if (debug) - printk("%s: priv %p, limit %d\n", __FUNCTION__, priv, op.count); - if (op.count > limit) - return -EINVAL; - - spin_lock(&priv->lock); - priv->limit = op.count; - spin_unlock(&priv->lock); - return 0; -} - static long gntdev_ioctl(struct file *flip, unsigned int cmd, unsigned long arg) { @@ -521,7 +467,7 @@ static long gntdev_ioctl(struct file *flip, switch (cmd) { case IOCTL_GNTDEV_MAP_GRANT_REF: - return gntdev_ioctl_map_grant_ref(priv, ptr); + return gntdev_ioctl_map_grant_ref(priv, ptr, 1); case IOCTL_GNTDEV_UNMAP_GRANT_REF: return gntdev_ioctl_unmap_grant_ref(priv, ptr); @@ -529,8 +475,8 @@ static long gntdev_ioctl(struct file *flip, case IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR: return gntdev_ioctl_get_offset_for_vaddr(priv, ptr); - case IOCTL_GNTDEV_SET_MAX_GRANTS: - return gntdev_ioctl_set_max_grants(priv, ptr); + case IOCTL_GNTDEV_MAP_GRANT_REF_2: + return gntdev_ioctl_map_grant_ref(priv, ptr, 0); default: if (debug) @@ -542,6 +488,34 @@ static long gntdev_ioctl(struct file *flip, return 0; } +static int gntdev_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf) +{ + struct grant_map *map = vma->vm_private_data; + pgoff_t pgoff = vmf->pgoff - vma->vm_pgoff; + + if (!map || !map->is_mapped || pgoff < 0 || pgoff > map->count) { + if (debug) + printk("%s: vaddr %p, pgoff %ld (shouldn''t happen)\n", + __FUNCTION__, vmf->virtual_address, pgoff); + return VM_FAULT_SIGBUS; + } + + vmf->page = map->pages[pgoff].page; + get_page(vmf->page); + return 0; +} + +static void gntdev_vma_close(struct vm_area_struct *vma) +{ + struct grant_map *map = vma->vm_private_data; + gntdev_put_map(map); +} + +static struct vm_operations_struct gntdev_vmops = { + .fault = gntdev_vma_fault, + .close = gntdev_vma_close, +}; + static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma) { struct gntdev_priv *priv = flip->private_data; @@ -550,53 +524,39 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma) struct grant_map *map; int err = -EINVAL; - if ((vma->vm_flags & VM_WRITE) && !(vma->vm_flags & VM_SHARED)) + if (!(vma->vm_flags & VM_SHARED)) return -EINVAL; - if (debug) - printk("%s: map %d+%d at %lx (pgoff %lx)\n", __FUNCTION__, - index, count, vma->vm_start, vma->vm_pgoff); - spin_lock(&priv->lock); map = gntdev_find_map_index(priv, index, count); + + if (debug) + printk("%s: map %d+%d at %lx (priv %p, map %p)\n", __func__, + index, count, vma->vm_start, priv, map); + if (!map) goto unlock_out; - if (map->vma) - goto unlock_out; - if (priv->mm 
!= vma->vm_mm) { - printk("%s: Huh? Other mm?\n", __FUNCTION__); - goto unlock_out; + + if (!map->is_mapped) { + map->is_ro = !(vma->vm_flags & VM_WRITE); + err = gntdev_do_map(map); + if (err) + goto unlock_out; } + if ((vma->vm_flags & VM_WRITE) && map->is_ro) + goto unlock_out; + + err = 0; vma->vm_ops = &gntdev_vmops; vma->vm_flags |= VM_RESERVED; - vma->vm_flags |= VM_DONTCOPY; vma->vm_flags |= VM_DONTEXPAND; + vma->vm_flags |= VM_FOREIGN; vma->vm_private_data = map; - map->vma = vma; - map->flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte; - if (!(vma->vm_flags & VM_WRITE)) - map->flags |= GNTMAP_readonly; - - err = apply_to_page_range(vma->vm_mm, vma->vm_start, - vma->vm_end - vma->vm_start, - find_grant_ptes, map); - if (err) { - goto unlock_out; - if (debug) - printk("%s: find_grant_ptes() failure.\n", __FUNCTION__); - } - - err = map_grant_pages(map); - if (err) { - goto unlock_out; - if (debug) - printk("%s: map_grant_pages() failure.\n", __FUNCTION__); - } - map->is_mapped = 1; + atomic_inc(&map->users); unlock_out: spin_unlock(&priv->lock); diff --git a/include/xen/gntdev.h b/include/xen/gntdev.h index 8bd1467..9df1ae3 100644 --- a/include/xen/gntdev.h +++ b/include/xen/gntdev.h @@ -47,11 +47,17 @@ struct ioctl_gntdev_grant_ref { */ #define IOCTL_GNTDEV_MAP_GRANT_REF \ _IOC(_IOC_NONE, ''G'', 0, sizeof(struct ioctl_gntdev_map_grant_ref)) +#define IOCTL_GNTDEV_MAP_GRANT_REF_2 \ +_IOC(_IOC_NONE, ''G'', 4, sizeof(struct ioctl_gntdev_map_grant_ref)) struct ioctl_gntdev_map_grant_ref { /* IN parameters */ /* The number of grants to be mapped. */ uint32_t count; - uint32_t pad; + /* Flags for this mapping */ + union { + uint32_t flags; + uint32_t pad; + }; /* OUT parameters */ /* The offset to be used on a subsequent call to mmap(). */ uint64_t index; @@ -59,6 +65,7 @@ struct ioctl_gntdev_map_grant_ref { /* Array of grant references, of size @count. */ struct ioctl_gntdev_grant_ref refs[1]; }; +#define GNTDEV_MAP_WRITABLE 0x1 /* * Removes the grant references from the mapping table of an instance of -- 1.7.2.3 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
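Not part of the posted series: a sketch of the matching explicit-cleanup path under the new mmap() semantics, assuming the pre-existing ioctl structures from xen/gntdev.h (vaddr/offset/count and index/count fields). The device offset is recovered from the address returned by mmap(), the VMA is removed with munmap(), and the grant-map slot is then released.

/*
 * Minimal sketch, not from the patch series: release a mapping that was set
 * up with IOCTL_GNTDEV_MAP_GRANT_REF_2 and mmap() as in the earlier example.
 * With this patch the pages stay valid until the last user (the vma or the
 * grant_map entry) drops its reference, so either order is safe.
 */
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <xen/gntdev.h>

static int gntdev_cleanup(int fd, void *addr, size_t len)
{
	struct ioctl_gntdev_get_offset_for_vaddr vop = {
		.vaddr = (uint64_t)(unsigned long)addr,
	};
	struct ioctl_gntdev_unmap_grant_ref uop;

	/* Recover the device offset/count backing this address. */
	if (ioctl(fd, IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR, &vop) < 0)
		return -1;

	/* Drop the VMA first, then release the grant-map slot. */
	if (munmap(addr, len) < 0)
		return -1;

	uop.index = vop.offset;
	uop.count = vop.count;
	return ioctl(fd, IOCTL_GNTDEV_UNMAP_GRANT_REF, &uop);
}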
Daniel De Graaf
2010-Dec-03 15:38 UTC
[Xen-devel] [PATCH 2/2] xen-gntalloc: Userspace grant allocation driver
This allows a userspace application to allocate a shared page for implementing inter-domain communication or device drivers. These shared pages can be mapped using the gntdev device or by the kernel in another domain. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> --- drivers/xen/Kconfig | 7 + drivers/xen/Makefile | 2 + drivers/xen/gntalloc.c | 456 ++++++++++++++++++++++++++++++++++++++++++++++++ include/xen/gntalloc.h | 68 +++++++ 4 files changed, 533 insertions(+), 0 deletions(-) create mode 100644 drivers/xen/gntalloc.c create mode 100644 include/xen/gntalloc.h diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig index a9f3a8f..0be0edc 100644 --- a/drivers/xen/Kconfig +++ b/drivers/xen/Kconfig @@ -179,6 +179,13 @@ config XEN_GNTDEV help Allows userspace processes to map grants from other domains. +config XEN_GRANT_DEV_ALLOC + tristate "User-space grant reference allocator driver" + depends on XEN + help + Allows userspace processes to create pages with access granted + to other domains. + config XEN_S3 def_bool y depends on XEN_DOM0 && ACPI diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile index ef1ea63..9814c1d 100644 --- a/drivers/xen/Makefile +++ b/drivers/xen/Makefile @@ -10,6 +10,7 @@ obj-$(CONFIG_XEN_XENCOMM) += xencomm.o obj-$(CONFIG_XEN_BALLOON) += balloon.o obj-$(CONFIG_XEN_DEV_EVTCHN) += xen-evtchn.o obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev.o +obj-$(CONFIG_XEN_GRANT_DEV_ALLOC) += xen-gntalloc.o obj-$(CONFIG_XEN_PCIDEV_BACKEND) += pciback/ obj-$(CONFIG_XEN_BLKDEV_BACKEND) += blkback/ obj-$(CONFIG_XEN_BLKDEV_TAP) += blktap/ @@ -25,3 +26,4 @@ obj-$(CONFIG_XEN_PLATFORM_PCI) += platform-pci.o xen-evtchn-y := evtchn.o xen-gntdev-y := gntdev.o +xen-gntalloc-y := gntalloc.o diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c new file mode 100644 index 0000000..f26adfd --- /dev/null +++ b/drivers/xen/gntalloc.c @@ -0,0 +1,456 @@ +/****************************************************************************** + * gntalloc.c + * + * Device for creating grant references (in user-space) that may be shared + * with other domains. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +/* + * This driver exists to allow userspace programs in Linux to allocate kernel + * memory that will later be shared with another domain. Without this device, + * Linux userspace programs cannot create grant references. + * + * How this stuff works: + * X -> granting a page to Y + * Y -> mapping the grant from X + * + * 1. X uses the gntalloc device to allocate a page of kernel memory, P. + * 2. X creates an entry in the grant table that says domid(Y) can + * access P. + * 3. X gives the grant reference identifier, GREF, to Y. + * 4. A program in Y uses the gntdev device to map the page (owned by X + * and identified by GREF) into domain(Y) and then into the address + * space of the program. Behind the scenes, this requires a + * hypercall in which Xen modifies the host CPU page tables to + * perform the sharing -- that''s where the actual cross-domain mapping + * occurs. + * 5. A program in X mmap()s a segment of the gntalloc device that + * corresponds to the shared page. 
+ * 6. The two userspace programs can now communicate over the shared page. + * + * + * NOTE TO USERSPACE LIBRARIES: + * The grant allocation and mmap()ing are, naturally, two separate + * operations. You set up the sharing by calling the create ioctl() and + * then the mmap(). You must tear down the sharing in the reverse order + * (munmap() and then the destroy ioctl()). + * + * WARNING: Since Xen does not allow a guest to forcibly end the use of a grant + * reference, this device can be used to consume kernel memory by leaving grant + * references mapped by another domain when an application exits. Therefore, + * there is a global limit on the number of pages that can be allocated. When + * all references to the page are unmapped, it will be freed during the next + * grant operation. + */ + +#include <asm/atomic.h> +#include <linux/module.h> +#include <linux/miscdevice.h> +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/slab.h> +#include <linux/fs.h> +#include <linux/device.h> +#include <linux/mm.h> +#include <asm/uaccess.h> +#include <linux/types.h> +#include <linux/list.h> + +#include <xen/xen.h> +#include <xen/page.h> +#include <xen/grant_table.h> +#include <xen/gntalloc.h> + +static int debug = 0; +module_param(debug, int, 0644); + +static int limit = 1024; +module_param(limit, int, 0644); + +static LIST_HEAD(gref_list); +static DEFINE_SPINLOCK(gref_lock); +static int gref_size = 0; + +/* Metadata on a grant reference. */ +struct gntalloc_gref { + struct list_head next_all; /* list entry gref_list */ + struct list_head next_file; /* list entry file->list, if open */ + domid_t foreign_domid; /* The ID of the domain to share with. */ + grant_ref_t gref_id; /* The grant reference number. */ + unsigned int users; /* Use count - when zero, waiting on Xen */ + struct page* page; /* The shared page. */ +}; + +struct gntalloc_file_private_data { + struct list_head list; +}; + +static void __del_gref(struct gntalloc_gref *gref); + +static void do_cleanup(void) +{ + struct gntalloc_gref *gref, *n; + list_for_each_entry_safe(gref, n, &gref_list, next_all) { + if (!gref->users) + __del_gref(gref); + } +} + + +static int add_gref(domid_t foreign_domid, uint32_t readonly, + struct gntalloc_file_private_data *priv) +{ + int rc; + struct gntalloc_gref *gref; + + rc = -ENOMEM; + spin_lock(&gref_lock); + do_cleanup(); + if (gref_size >= limit) { + spin_unlock(&gref_lock); + rc = -ENOSPC; + goto out; + } + gref_size++; + spin_unlock(&gref_lock); + + gref = kzalloc(sizeof(*gref), GFP_KERNEL); + if (!gref) + goto out; + + gref->foreign_domid = foreign_domid; + gref->users = 1; + + /* Allocate the page to share. */ + gref->page = alloc_page(GFP_KERNEL|__GFP_ZERO); + if (!gref->page) + goto out_nopage; + + /* Grant foreign access to the page. */ + gref->gref_id = gnttab_grant_foreign_access(foreign_domid, + pfn_to_mfn(page_to_pfn(gref->page)), readonly); + if (gref->gref_id < 0) { + printk(KERN_ERR "%s: failed to grant foreign access for mfn " + "%lu to domain %u\n", __func__, + pfn_to_mfn(page_to_pfn(gref->page)), foreign_domid); + rc = -EFAULT; + goto out_no_foreign_gref; + } + + /* Add to gref lists. 
*/ + spin_lock(&gref_lock); + list_add_tail(&gref->next_all, &gref_list); + list_add_tail(&gref->next_file, &priv->list); + spin_unlock(&gref_lock); + + return gref->gref_id; + +out_no_foreign_gref: + __free_page(gref->page); +out_nopage: + kfree(gref); +out: + return rc; +} + +static void __del_gref(struct gntalloc_gref *gref) +{ + if (gnttab_query_foreign_access(gref->gref_id)) + return; + + if (!gnttab_end_foreign_access_ref(gref->gref_id, 0)) + return; + + gref_size--; + list_del(&gref->next_all); + + __free_page(gref->page); + kfree(gref); +} + +static struct gntalloc_gref* find_gref(struct gntalloc_file_private_data *priv, + grant_ref_t gref_id) +{ + struct gntalloc_gref *gref; + list_for_each_entry(gref, &priv->list, next_file) { + if (gref->gref_id == gref_id) + return gref; + } + return NULL; +} + +/* + * ------------------------------------- + * File operations. + * ------------------------------------- + */ +static int gntalloc_open(struct inode *inode, struct file *filp) +{ + struct gntalloc_file_private_data *priv; + + try_module_get(THIS_MODULE); + + priv = kzalloc(sizeof(*priv), GFP_KERNEL); + if (!priv) + goto out_nomem; + INIT_LIST_HEAD(&priv->list); + + filp->private_data = priv; + + if (debug) + printk("%s: priv %p\n", __FUNCTION__, priv); + + return 0; + +out_nomem: + return -ENOMEM; +} + +static int gntalloc_release(struct inode *inode, struct file *filp) +{ + struct gntalloc_file_private_data *priv = filp->private_data; + struct gntalloc_gref *gref; + + if (debug) + printk("%s: priv %p\n", __FUNCTION__, priv); + + spin_lock(&gref_lock); + while (!list_empty(&priv->list)) { + gref = list_entry(priv->list.next, + struct gntalloc_gref, next_file); + list_del(&gref->next_file); + gref->users--; + if (gref->users == 0) + __del_gref(gref); + } + kfree(priv); + spin_unlock(&gref_lock); + + module_put(THIS_MODULE); + + return 0; +} + +static long gntalloc_ioctl_alloc(struct gntalloc_file_private_data *priv, + void __user *arg) +{ + int rc = 0; + struct ioctl_gntalloc_alloc_gref op; + + if (debug) + printk("%s: priv %p\n", __FUNCTION__, priv); + + if (copy_from_user(&op, arg, sizeof(op))) { + rc = -EFAULT; + goto alloc_grant_out; + } + rc = add_gref(op.foreign_domid, op.readonly, priv); + if (rc < 0) + goto alloc_grant_out; + + op.gref_id = rc; + op.page_idx = rc; + + rc = 0; + + if (copy_to_user((void __user *)arg, &op, sizeof(op))) { + rc = -EFAULT; + goto alloc_grant_out; + } + +alloc_grant_out: + return rc; +} + +static long gntalloc_ioctl_dealloc(struct gntalloc_file_private_data *priv, + void __user *arg) +{ + int rc = 0; + struct ioctl_gntalloc_dealloc_gref op; + struct gntalloc_gref *gref; + + if (debug) + printk("%s: priv %p\n", __FUNCTION__, priv); + + if (copy_from_user(&op, arg, sizeof(op))) { + rc = -EFAULT; + goto dealloc_grant_out; + } + + spin_lock(&gref_lock); + gref = find_gref(priv, op.gref_id); + if (gref) { + list_del(&gref->next_file); + gref->users--; + rc = 0; + } else { + rc = -EINVAL; + } + + do_cleanup(); + spin_unlock(&gref_lock); +dealloc_grant_out: + return rc; +} + +static long gntalloc_ioctl(struct file *filp, unsigned int cmd, + unsigned long arg) +{ + struct gntalloc_file_private_data *priv = filp->private_data; + + switch (cmd) { + case IOCTL_GNTALLOC_ALLOC_GREF: + return gntalloc_ioctl_alloc(priv, (void __user*)arg); + + case IOCTL_GNTALLOC_DEALLOC_GREF: + return gntalloc_ioctl_dealloc(priv, (void __user*)arg); + + default: + return -ENOIOCTLCMD; + } + + return 0; +} + +static int gntalloc_vma_fault(struct vm_area_struct *vma, struct 
vm_fault *vmf) +{ + struct gntalloc_gref *gref = vma->vm_private_data; + if (!gref) + return VM_FAULT_SIGBUS; + + vmf->page = gref->page; + get_page(vmf->page); + + return 0; +}; + +static void gntalloc_vma_close(struct vm_area_struct *vma) +{ + struct gntalloc_gref *gref = vma->vm_private_data; + if (!gref) + return; + + spin_lock(&gref_lock); + gref->users--; + if (gref->users == 0) + __del_gref(gref); + spin_unlock(&gref_lock); +} + +static struct vm_operations_struct gntalloc_vmops = { + .fault = gntalloc_vma_fault, + .close = gntalloc_vma_close, +}; + +static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma) +{ + struct gntalloc_file_private_data *priv = filp->private_data; + struct gntalloc_gref *gref; + + if (debug) + printk("%s: priv %p, page %lu\n", __func__, + priv, vma->vm_pgoff); + + /* + * There is a 1-to-1 correspondence of grant references to shared + * pages, so it only makes sense to map exactly one page per + * call to mmap(). + */ + if (((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) != 1) { + printk(KERN_ERR "%s: Only one page can be memory-mapped " + "per grant reference.\n", __func__); + return -EINVAL; + } + + if (!(vma->vm_flags & VM_SHARED)) { + printk(KERN_ERR "%s: Mapping must be shared.\n", + __func__); + return -EINVAL; + } + + spin_lock(&gref_lock); + gref = find_gref(priv, vma->vm_pgoff); + if (gref == NULL) { + spin_unlock(&gref_lock); + printk(KERN_ERR "%s: Could not find a grant reference with " + "page index %lu.\n", __func__, vma->vm_pgoff); + return -ENOENT; + } + gref->users++; + spin_unlock(&gref_lock); + + vma->vm_private_data = gref; + + /* This flag prevents Bad PTE errors when the memory is unmapped. */ + vma->vm_flags |= VM_RESERVED; + vma->vm_flags |= VM_DONTCOPY; + vma->vm_flags |= VM_IO; + + vma->vm_ops = &gntalloc_vmops; + + return 0; +} + +static const struct file_operations gntalloc_fops = { + .owner = THIS_MODULE, + .open = gntalloc_open, + .release = gntalloc_release, + .unlocked_ioctl = gntalloc_ioctl, + .mmap = gntalloc_mmap +}; + +/* + * ------------------------------------- + * Module creation/destruction. + * ------------------------------------- + */ +static struct miscdevice gntalloc_miscdev = { + .minor = MISC_DYNAMIC_MINOR, + .name = "xen/gntalloc", + .fops = &gntalloc_fops, +}; + +static int __init gntalloc_init(void) +{ + int err; + + if (!xen_domain()) { + if (debug) + printk(KERN_ERR "gntalloc: You must be running Xen\n"); + return -ENODEV; + } + + err = misc_register(&gntalloc_miscdev); + if (err != 0) { + printk(KERN_ERR "Could not register misc gntalloc device\n"); + return err; + } + + if (debug) + printk(KERN_INFO "Created grant allocation device at %d,%d\n", + MISC_MAJOR, gntalloc_miscdev.minor); + + return 0; +} + +static void __exit gntalloc_exit(void) +{ + misc_deregister(&gntalloc_miscdev); +} + +module_init(gntalloc_init); +module_exit(gntalloc_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Carter Weatherly <carter.weatherly@jhuapl.edu>, " + "Daniel De Graaf <dgdegra@tycho.nsa.gov>"); +MODULE_DESCRIPTION("User-space grant reference allocator driver"); diff --git a/include/xen/gntalloc.h b/include/xen/gntalloc.h new file mode 100644 index 0000000..76b70d7 --- /dev/null +++ b/include/xen/gntalloc.h @@ -0,0 +1,68 @@ +/****************************************************************************** + * gntalloc.h + * + * Interface to /dev/xen/gntalloc. 
+ * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License version 2 + * as published by the Free Software Foundation; or, when distributed + * separately from the Linux kernel or incorporated into other + * software packages, subject to the following license: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this source file (the "Software"), to deal in the Software without + * restriction, including without limitation the rights to use, copy, modify, + * merge, publish, distribute, sublicense, and/or sell copies of the Software, + * and to permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#ifndef __LINUX_PUBLIC_GNTALLOC_H__ +#define __LINUX_PUBLIC_GNTALLOC_H__ + +/* + * Allocates a new page and creates a new grant reference. + * + * N.B. The page_idx is really the address >> PAGE_SHIFT, meaning it''s the + * page number and not an actual address. It must be shifted again prior + * to feeding it to mmap() (i.e. page_idx << PAGE_SHIFT). + */ +#define IOCTL_GNTALLOC_ALLOC_GREF \ +_IOC(_IOC_NONE, ''G'', 1, sizeof(struct ioctl_gntalloc_alloc_gref)) +struct ioctl_gntalloc_alloc_gref { + /* IN parameters */ + /* The ID of the domain creating the grant reference. */ + domid_t owner_domid; + /* The ID of the domain to be given access to the grant. */ + domid_t foreign_domid; + /* The type of access given to domid. */ + uint32_t readonly; + /* OUT parameters */ + /* The grant reference of the newly created grant. */ + grant_ref_t gref_id; + /* The page index (page number, NOT address) for grant mmap(). */ + uint32_t page_idx; +}; + +/* + * Deallocates the grant reference, freeing the associated page. + */ +#define IOCTL_GNTALLOC_DEALLOC_GREF \ +_IOC(_IOC_NONE, ''G'', 2, sizeof(struct ioctl_gntalloc_dealloc_gref)) +struct ioctl_gntalloc_dealloc_gref { + /* IN parameter */ + /* The grant reference to deallocate. */ + grant_ref_t gref_id; +}; +#endif /* __LINUX_PUBLIC_GNTALLOC_H__ */ -- 1.7.2.3 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
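For illustration only (not part of the posted series), the granting side might drive the new device roughly as follows, assuming the new xen/gntalloc.h header and the Xen type definitions it relies on are usable from userspace. The peer domain ID is a hypothetical placeholder; the returned grant reference would be handed to the peer out of band (for example via xenstore), where it can be mapped with gntdev as in patch 1.

/*
 * Minimal sketch, not from the patch series: allocate one shared page
 * granted to a (hypothetical) peer domain, use it, then tear down in the
 * order the driver expects: munmap() before the dealloc ioctl.
 */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <xen/gntalloc.h>	/* IOCTL_GNTALLOC_ALLOC_GREF, IOCTL_GNTALLOC_DEALLOC_GREF */

int main(void)
{
	struct ioctl_gntalloc_alloc_gref alloc;
	struct ioctl_gntalloc_dealloc_gref dealloc;
	long pagesize = sysconf(_SC_PAGESIZE);
	void *page;
	int fd = open("/dev/xen/gntalloc", O_RDWR);

	if (fd < 0) {
		perror("open /dev/xen/gntalloc");
		return 1;
	}

	memset(&alloc, 0, sizeof(alloc));
	alloc.foreign_domid = 1;	/* hypothetical peer domain */
	alloc.readonly = 0;		/* let the peer write to the page */
	if (ioctl(fd, IOCTL_GNTALLOC_ALLOC_GREF, &alloc) < 0) {
		perror("IOCTL_GNTALLOC_ALLOC_GREF");
		return 1;
	}
	printf("grant ref %u: pass this to the peer (e.g. via xenstore)\n",
	       (unsigned)alloc.gref_id);

	/* page_idx is a page number; turn it into a byte offset for mmap(). */
	page = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_SHARED,
		    fd, (off_t)alloc.page_idx * pagesize);
	if (page == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	strcpy(page, "hello from the granting domain");
	/* ... communicate with the peer over the shared page ... */

	/* Tear down in reverse order: munmap() first, then the dealloc ioctl. */
	munmap(page, pagesize);
	dealloc.gref_id = alloc.gref_id;
	ioctl(fd, IOCTL_GNTALLOC_DEALLOC_GREF, &dealloc);
	close(fd);
	return 0;
}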
Pasi Kärkkäinen
2010-Dec-03 16:30 UTC
Re: [Xen-devel] [PATCH 0/2] Userspace grant communication
On Fri, Dec 03, 2010 at 10:36:17AM -0500, Daniel De Graaf wrote:
> For fast communication between userspace applications in different domains,
> it is useful to be able to set up a shared memory page. This can be used to
> implement device driver frontends and backends completely in userspace, or
> as a faster alternative to network communication. The current gntdev is
> limited to PV domains, and does not allow grants to be created. The following
> patches change gntdev to remapping existing pages, allowing the same code
> to be used in PV and HVM, and add a gntalloc driver to allow mappings to be
> created by userspace. These changes also make the mappings more application-
> friendly: the mmap() calls can be made multiple times, persist across fork(),
> and allow the device to be closed without invalidating the mapped areas. This
> matches the behavior of mmap() on a normal file.
>

Btw are you aware of the new fast inter-domain communication method in XenClient ?

-- Pasi

> API changes from the existing /dev/xen/gntdev:
>
> The unused "pad" field in ioctl_gntdev_map_grant_ref is now used for flags
> on the mapping (currently used to specify if the mapping should be writable).
> This provides sufficient information to perform the mapping when the ioctl is
> called. To retain compatibility with current userspace, a new ioctl number is
> used for this functionality and the legacy error on first mapping is retained
> when the old ioctl is used.
>
> IOCTL_GNTDEV_SET_MAX_GRANTS is not exposed in the Xen userspace libraries,
> and is not very useful: it cannot be used to raise the limit of grants per
> file descriptor, and is trivial to bypass by opening the device multiple
> times. This version uses a global limit specified as a module parameter
> (modifiable at runtime via sysfs).
>
> --
> Daniel De Graaf
> National Security Agency
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Daniel De Graaf
2010-Dec-03 18:27 UTC
Re: [Xen-devel] [PATCH 0/2] Userspace grant communication
On 12/03/2010 11:30 AM, Pasi Kärkkäinen wrote:
> On Fri, Dec 03, 2010 at 10:36:17AM -0500, Daniel De Graaf wrote:
>> For fast communication between userspace applications in different domains,
>> it is useful to be able to set up a shared memory page. This can be used to
>> implement device driver frontends and backends completely in userspace, or
>> as a faster alternative to network communication. The current gntdev is
>> limited to PV domains, and does not allow grants to be created. The following
>> patches change gntdev to remapping existing pages, allowing the same code
>> to be used in PV and HVM, and add a gntalloc driver to allow mappings to be
>> created by userspace. These changes also make the mappings more application-
>> friendly: the mmap() calls can be made multiple times, persist across fork(),
>> and allow the device to be closed without invalidating the mapped areas. This
>> matches the behavior of mmap() on a normal file.
>>
>
> Btw are you aware of the new fast inter-domain communication method in XenClient ?
>
> -- Pasi
>

No, I have not looked at that; is there a whitepaper or some information on it?
If not, a pointer to the relevant source would be of interest.

- Daniel

>
>> API changes from the existing /dev/xen/gntdev:
>>
>> The unused "pad" field in ioctl_gntdev_map_grant_ref is now used for flags
>> on the mapping (currently used to specify if the mapping should be writable).
>> This provides sufficient information to perform the mapping when the ioctl is
>> called. To retain compatibility with current userspace, a new ioctl number is
>> used for this functionality and the legacy error on first mapping is retained
>> when the old ioctl is used.
>>
>> IOCTL_GNTDEV_SET_MAX_GRANTS is not exposed in the Xen userspace libraries,
>> and is not very useful: it cannot be used to raise the limit of grants per
>> file descriptor, and is trivial to bypass by opening the device multiple
>> times. This version uses a global limit specified as a module parameter
>> (modifiable at runtime via sysfs).
>>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Pasi Kärkkäinen
2010-Dec-03 18:39 UTC
Re: [Xen-devel] [PATCH 0/2] Userspace grant communication / XenClient (XCI) inter-domain communication
On Fri, Dec 03, 2010 at 01:27:54PM -0500, Daniel De Graaf wrote:
> On 12/03/2010 11:30 AM, Pasi Kärkkäinen wrote:
> > On Fri, Dec 03, 2010 at 10:36:17AM -0500, Daniel De Graaf wrote:
> >> For fast communication between userspace applications in different domains,
> >> it is useful to be able to set up a shared memory page. This can be used to
> >> implement device driver frontends and backends completely in userspace, or
> >> as a faster alternative to network communication. The current gntdev is
> >> limited to PV domains, and does not allow grants to be created. The following
> >> patches change gntdev to remapping existing pages, allowing the same code
> >> to be used in PV and HVM, and add a gntalloc driver to allow mappings to be
> >> created by userspace. These changes also make the mappings more application-
> >> friendly: the mmap() calls can be made multiple times, persist across fork(),
> >> and allow the device to be closed without invalidating the mapped areas. This
> >> matches the behavior of mmap() on a normal file.
> >>
> >
> > Btw are you aware of the new fast inter-domain communication method in XenClient ?
> >
> > -- Pasi
> >
>
> No, I have not looked at that; is there a whitepaper or some information on it?
> If not, a pointer to the relevant source would be of interest.
>

Unfortunately I can't remember the name of the feature.. hopefully someone
from Citrix working with XCI can help out.

-- Pasi

> - Daniel
>
> >
> >> API changes from the existing /dev/xen/gntdev:
> >>
> >> The unused "pad" field in ioctl_gntdev_map_grant_ref is now used for flags
> >> on the mapping (currently used to specify if the mapping should be writable).
> >> This provides sufficient information to perform the mapping when the ioctl is
> >> called. To retain compatibility with current userspace, a new ioctl number is
> >> used for this functionality and the legacy error on first mapping is retained
> >> when the old ioctl is used.
> >>
> >> IOCTL_GNTDEV_SET_MAX_GRANTS is not exposed in the Xen userspace libraries,
> >> and is not very useful: it cannot be used to raise the limit of grants per
> >> file descriptor, and is trivial to bypass by opening the device multiple
> >> times. This version uses a global limit specified as a module parameter
> >> (modifiable at runtime via sysfs).
> >>
> >

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Ian Pratt
2010-Dec-04 19:54 UTC
RE: [Xen-devel] [PATCH 0/2] Userspace grant communication / XenClient (XCI) inter-domain communication
> > > Btw are you aware of the new fast inter-domain communication method in
> > > XenClient ?
> >
> > No, I have not looked at that; is there a whitepaper or some information
> > on it?
> > If not, a pointer to the relevant source would be of interest.

It's called "V4V". The source is on the XenClient source iso available from citrix.com. Last I saw, the patches needed a little polish before being suitable for acceptance upstream, but were pretty close. There are xen patches, linux kernel patches, and user space libraries providing a socket-like API (for linux and windows).

Ian

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Pasi Kärkkäinen
2010-Dec-04 20:32 UTC
Re: [Xen-devel] [PATCH 0/2] Userspace grant communication / XenClient (XCI) inter-domain communication
On Sat, Dec 04, 2010 at 07:54:46PM +0000, Ian Pratt wrote:
> > > > Btw are you aware of the new fast inter-domain communication method in
> > > > XenClient ?
> > >
> > > No, I have not looked at that; is there a whitepaper or some information
> > > on it?
> > > If not, a pointer to the relevant source would be of interest.
>
> It's called "V4V". The source is on the XenClient source iso available
> from citrix.com. Last I saw, the patches needed a little polish before
> being suitable for acceptance upstream, but were pretty close. There are
> xen patches, linux kernel patches, and user space libraries providing a
> socket-like API (for linux and windows).
>

Thanks :) that was it.

-- Pasi

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Vasiliy G Tolstov
2010-Dec-04 21:50 UTC
RE: [Xen-devel] [PATCH 0/2] Userspace grant communication / XenClient (XCI) inter-domain communication
On Sat, 2010-12-04 at 19:54 +0000, Ian Pratt wrote:
> > > > Btw are you aware of the new fast inter-domain communication method in
> > > > XenClient ?
> > >
> > > No, I have not looked at that; is there a whitepaper or some information
> > > on it?
> > > If not, a pointer to the relevant source would be of interest.
>
> It's called "V4V". The source is on the XenClient source iso available
> from citrix.com. Last I saw, the patches needed a little polish before
> being suitable for acceptance upstream, but were pretty close. There
> are xen patches, linux kernel patches, and user space libraries
> providing a socket-like API (for linux and windows).
>
> Ian

Is it possible to use this to communicate from dom0 to domU with a
socket-like interface?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Ross Philipson
2010-Dec-04 22:04 UTC
RE: [Xen-devel] [PATCH 0/2] Userspace grant communication / XenClient (XCI) inter-domain communication
Yes, that is one of the ways it can be used.

Ross

________________________________________
From: xen-devel-bounces@lists.xensource.com [xen-devel-bounces@lists.xensource.com] On Behalf Of Vasiliy G Tolstov [v.tolstov@selfip.ru]
Sent: Saturday, December 04, 2010 4:50 PM
To: Ian Pratt
Cc: Daniel De Graaf; xen-devel
Subject: RE: [Xen-devel] [PATCH 0/2] Userspace grant communication / XenClient (XCI) inter-domain communication

On Sat, 2010-12-04 at 19:54 +0000, Ian Pratt wrote:
> > > > Btw are you aware of the new fast inter-domain communication method in
> > > > XenClient ?
> > >
> > > No, I have not looked at that; is there a whitepaper or some information
> > > on it?
> > > If not, a pointer to the relevant source would be of interest.
>
> It's called "V4V". The source is on the XenClient source iso available
> from citrix.com. Last I saw, the patches needed a little polish before
> being suitable for acceptance upstream, but were pretty close. There
> are xen patches, linux kernel patches, and user space libraries
> providing a socket-like API (for linux and windows).
>
> Ian

Does it possible to use this to communicati from dom0 to domU in
socket-like interface?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Ian Pratt
2010-Dec-07 00:30 UTC
RE: [Xen-devel] [PATCH 0/2] Userspace grant communication / XenClient (XCI) inter-domain communication
> > It's called "V4V". The source is on the XenClient source iso available
> > from citrix.com. Last I saw, the patches needed a little polish before
> > being suitable for acceptance upstream, but were pretty close. There
> > are xen patches, linux kernel patches, and user space libraries
> > providing a socket-like API (for linux and windows).
>
> Does it possible to use this to communicati from dom0 to domU in
> socket-like interface?

Yes, or guest to guest etc. Bandwidth is pretty good, better than bouncing packets via a dom0 bridge.

Ian

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Ian Campbell
2010-Dec-08 16:40 UTC
Re: [Xen-devel] [PATCH 1/2] xen-gntdev: support mapping in HVM domains
Hi Daniel, On Fri, 2010-12-03 at 15:37 +0000, Daniel De Graaf wrote:> This changes the /dev/xen/gntdev device to work in HVM domains, and > also makes the mmap() behavior more closely match the behavior of > files instead of requiring an ioctl() call for each mmap/munmap.This patch seems to contain a lot more than the above very brief description would suggest. As well as those two changes it looks to include various refactoring and code-motion, some clean up and other changes. Please could you split it up into a series of separate logical single steps to get from the current driver to your final destination, this would make it much easier to review, as it stands it''s almost impossible to make any sensible comments on it or its correctness. The change to support HVM and the change in mmap/ioctl behaviour should certainly be separate patches but there seems to also be a some data structure refactoring going on, a change from using the mmu notifiers to something else, some sort of lazy mapping functionality etc these should all get their own individual patches with a changelog describing what they do and why. Please can you also go into more detail in the relevant patch on the new ioctl/mmap semantics rather than just mentioning that you''ve changed them. Do you retain compatibility with the old behaviour? Is there a corresponding toolstack change which makes use of this functionality? Thanks, Ian.> > Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> > --- > drivers/xen/Kconfig | 3 +- > drivers/xen/gntdev.c | 612 +++++++++++++++++++++++--------------------------- > include/xen/gntdev.h | 9 +- > 3 files changed, 295 insertions(+), 329 deletions(-) > > diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig > index fa9982e..a9f3a8f 100644 > --- a/drivers/xen/Kconfig > +++ b/drivers/xen/Kconfig > @@ -176,9 +176,8 @@ config XEN_XENBUS_FRONTEND > config XEN_GNTDEV > tristate "userspace grant access device driver" > depends on XEN > - select MMU_NOTIFIER > help > - Allows userspace processes use grants. > + Allows userspace processes to map grants from other domains. > > config XEN_S3 > def_bool y > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c > index a33e443..15f5c9c 100644 > --- a/drivers/xen/gntdev.c > +++ b/drivers/xen/gntdev.c > @@ -44,31 +44,37 @@ MODULE_DESCRIPTION("User-space granted page access driver"); > > static int debug = 0; > module_param(debug, int, 0644); > -static int limit = 1024; > + > +static int limit = 1024*1024; > module_param(limit, int, 0644); > > +static atomic_t pages_mapped = ATOMIC_INIT(0); > + > struct gntdev_priv { > - struct list_head maps; > - uint32_t used; > - uint32_t limit; > spinlock_t lock; > - struct mm_struct *mm; > - struct mmu_notifier mn; > + struct list_head maps; > +}; > + > +struct granted_page { > + struct page* page; > + union { > + struct ioctl_gntdev_grant_ref target; > + grant_handle_t handle; > + }; > }; > > struct grant_map { > - struct list_head next; > - struct gntdev_priv *priv; > - struct vm_area_struct *vma; > - int index; > - int count; > - int flags; > - int is_mapped; > - struct ioctl_gntdev_grant_ref *grants; > - struct gnttab_map_grant_ref *map_ops; > - struct gnttab_unmap_grant_ref *unmap_ops; > + struct list_head next; /* next in file */ > + int index; /* offset in parent */ > + int count; /* size in pages */ > + atomic_t users; /* reference count */ > + unsigned int is_mapped:1; /* has map hypercall been run? */ > + unsigned int is_ro:1; /* is the map read-only? 
*/ > + struct granted_page pages[0]; /* pages used for mapping */ > }; > > +static struct vm_operations_struct gntdev_vmops; > + > /* ------------------------------------------------------------------ */ > > static void gntdev_print_maps(struct gntdev_priv *priv, > @@ -76,51 +82,46 @@ static void gntdev_print_maps(struct gntdev_priv *priv, > { > struct grant_map *map; > > - printk("%s: maps list (priv %p, usage %d/%d)\n", > - __FUNCTION__, priv, priv->used, priv->limit); > - list_for_each_entry(map, &priv->maps, next) > - printk(" index %2d, count %2d %s\n", > - map->index, map->count, > + printk("%s: maps list (priv %p)\n", __FUNCTION__, priv); > + list_for_each_entry(map, &priv->maps, next) { > + printk(" %p: %2d+%2d, r%c, %s %d,%d %s\n", map, > + map->index, map->count, map->is_ro ? ''o'' : ''w'', > + map->is_mapped ? "use,hnd" : "dom,ref", > + map->is_mapped ? atomic_read(&map->users) > + : map->pages[0].target.domid, > + map->is_mapped ? map->pages[0].handle > + : map->pages[0].target.ref, > map->index == text_index && text ? text : ""); > + } > } > > -static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count) > +static struct grant_map *gntdev_alloc_map(int count, > + struct ioctl_gntdev_grant_ref* grants) > { > struct grant_map *add; > + int i; > > - add = kzalloc(sizeof(struct grant_map), GFP_KERNEL); > - if (NULL == add) > + add = kzalloc(sizeof(struct grant_map) + > + sizeof(struct granted_page) * count, GFP_KERNEL); > + if (!add) > return NULL; > > - add->grants = kzalloc(sizeof(add->grants[0]) * count, GFP_KERNEL); > - add->map_ops = kzalloc(sizeof(add->map_ops[0]) * count, GFP_KERNEL); > - add->unmap_ops = kzalloc(sizeof(add->unmap_ops[0]) * count, GFP_KERNEL); > - if (NULL == add->grants || > - NULL == add->map_ops || > - NULL == add->unmap_ops) > - goto err; > - > - add->index = 0; > + atomic_set(&add->users, 1); > add->count = count; > - add->priv = priv; > > - if (add->count + priv->used > priv->limit) > - goto err; > + for (i = 0; i < count; i++) > + add->pages[i].target = grants[i]; > > return add; > - > -err: > - kfree(add->grants); > - kfree(add->map_ops); > - kfree(add->unmap_ops); > - kfree(add); > - return NULL; > } > > static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add) > { > struct grant_map *map; > > + spin_lock(&priv->lock); > + > + /* Try to fit in the new mapping as early as possible */ > list_for_each_entry(map, &priv->maps, next) { > if (add->index + add->count < map->index) { > list_add_tail(&add->next, &map->next); > @@ -131,225 +132,116 @@ static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add) > list_add_tail(&add->next, &priv->maps); > > done: > - priv->used += add->count; > if (debug) > gntdev_print_maps(priv, "[new]", add->index); > + > + spin_unlock(&priv->lock); > } > > -static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int index, > - int count) > +static void __gntdev_del_map(struct gntdev_priv *priv, struct grant_map *map) > { > - struct grant_map *map; > + list_del(&map->next); > +} > > - list_for_each_entry(map, &priv->maps, next) { > - if (map->index != index) > - continue; > - if (map->count != count) > - continue; > - return map; > - } > - return NULL; > +static void gntdev_del_map(struct gntdev_priv *priv, struct grant_map *map) > +{ > + spin_lock(&priv->lock); > + __gntdev_del_map(priv, map); > + spin_unlock(&priv->lock); > } > > -static struct grant_map *gntdev_find_map_vaddr(struct gntdev_priv *priv, > - unsigned long vaddr) > +static struct grant_map 
*gntdev_find_map_index(struct gntdev_priv *priv, int index, > + int count) > { > struct grant_map *map; > > list_for_each_entry(map, &priv->maps, next) { > - if (!map->vma) > - continue; > - if (vaddr < map->vma->vm_start) > + if (map->index != index) > continue; > - if (vaddr >= map->vma->vm_end) > + if (map->count != count) > continue; > return map; > } > return NULL; > } > > -static int gntdev_del_map(struct grant_map *map) > -{ > - int i; > - > - if (map->vma) > - return -EBUSY; > - for (i = 0; i < map->count; i++) > - if (map->unmap_ops[i].handle) > - return -EBUSY; > - > - map->priv->used -= map->count; > - list_del(&map->next); > - return 0; > -} > - > -static void gntdev_free_map(struct grant_map *map) > -{ > - if (!map) > - return; > - kfree(map->grants); > - kfree(map->map_ops); > - kfree(map->unmap_ops); > - kfree(map); > -} > - > -/* ------------------------------------------------------------------ */ > - > -static int find_grant_ptes(pte_t *pte, pgtable_t token, unsigned long addr, void *data) > -{ > - struct grant_map *map = data; > - unsigned int pgnr = (addr - map->vma->vm_start) >> PAGE_SHIFT; > - u64 pte_maddr; > - > - BUG_ON(pgnr >= map->count); > - pte_maddr = (u64)pfn_to_mfn(page_to_pfn(token)) << PAGE_SHIFT; > - pte_maddr += (unsigned long)pte & ~PAGE_MASK; > - gnttab_set_map_op(&map->map_ops[pgnr], pte_maddr, map->flags, > - map->grants[pgnr].ref, > - map->grants[pgnr].domid); > - gnttab_set_unmap_op(&map->unmap_ops[pgnr], pte_maddr, map->flags, > - 0 /* handle */); > - return 0; > -} > - > -static int map_grant_pages(struct grant_map *map) > +static void gntdev_unmap_fast(struct grant_map *map, > + struct gnttab_unmap_grant_ref *unmap_ops) > { > - int i, err = 0; > + int err, flags, i, unmap_size = 0; > + phys_addr_t mfn; > > - if (debug) > - printk("%s: map %d+%d\n", __FUNCTION__, map->index, map->count); > - err = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, > - map->map_ops, map->count); > - if (WARN_ON(err)) > - return err; > + flags = GNTMAP_host_map; > + if (map->is_ro) > + flags |= GNTMAP_readonly; > > - for (i = 0; i < map->count; i++) { > - if (map->map_ops[i].status) > - err = -EINVAL; > - map->unmap_ops[i].handle = map->map_ops[i].handle; > + for (i=0; i < map->count; i++) { > + if (!map->pages[i].page) > + continue; > + mfn = (phys_addr_t)pfn_to_kaddr(page_to_pfn(map->pages[i].page)); > + gnttab_set_unmap_op(&unmap_ops[unmap_size], mfn, flags, > + map->pages[i].handle); > + unmap_size++; > } > - return err; > -} > - > -static int unmap_grant_pages(struct grant_map *map, int offset, int pages) > -{ > - int i, err = 0; > > - if (debug) > - printk("%s: map %d+%d [%d+%d]\n", __FUNCTION__, > - map->index, map->count, offset, pages); > err = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, > - map->unmap_ops + offset, pages); > - if (WARN_ON(err)) > - return err; > + unmap_ops, unmap_size); > + WARN_ON(err); > > - for (i = 0; i < pages; i++) { > - if (map->unmap_ops[offset+i].status) > - err = -EINVAL; > - map->unmap_ops[offset+i].handle = 0; > - } > - return err; > + for (i = 0; i < unmap_size; i++) > + WARN_ON(unmap_ops[i].status); > } > > -/* ------------------------------------------------------------------ */ > - > -static void gntdev_vma_close(struct vm_area_struct *vma) > +// for the out-of-memory case > +static void gntdev_unmap_slow(struct grant_map *map) > { > - struct grant_map *map = vma->vm_private_data; > + int err, flags, i; > + phys_addr_t mfn; > + struct gnttab_unmap_grant_ref unmap_op; > > - if (debug) > - printk("%s\n", 
__FUNCTION__); > - map->is_mapped = 0; > - map->vma = NULL; > - vma->vm_private_data = NULL; > -} > + flags = GNTMAP_host_map; > + if (map->is_ro) > + flags |= GNTMAP_readonly; > > -static int gntdev_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf) > -{ > - if (debug) > - printk("%s: vaddr %p, pgoff %ld (shouldn''t happen)\n", > - __FUNCTION__, vmf->virtual_address, vmf->pgoff); > - vmf->flags = VM_FAULT_ERROR; > - return 0; > -} > - > -static struct vm_operations_struct gntdev_vmops = { > - .close = gntdev_vma_close, > - .fault = gntdev_vma_fault, > -}; > - > -/* ------------------------------------------------------------------ */ > - > -static void mn_invl_range_start(struct mmu_notifier *mn, > - struct mm_struct *mm, > - unsigned long start, unsigned long end) > -{ > - struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn); > - struct grant_map *map; > - unsigned long mstart, mend; > - int err; > - > - spin_lock(&priv->lock); > - list_for_each_entry(map, &priv->maps, next) { > - if (!map->vma) > - continue; > - if (!map->is_mapped) > + for (i=0; i < map->count; i++) { > + if (!map->pages[i].page) > continue; > - if (map->vma->vm_start >= end) > - continue; > - if (map->vma->vm_end <= start) > - continue; > - mstart = max(start, map->vma->vm_start); > - mend = min(end, map->vma->vm_end); > - if (debug) > - printk("%s: map %d+%d (%lx %lx), range %lx %lx, mrange %lx %lx\n", > - __FUNCTION__, map->index, map->count, > - map->vma->vm_start, map->vma->vm_end, > - start, end, mstart, mend); > - err = unmap_grant_pages(map, > - (mstart - map->vma->vm_start) >> PAGE_SHIFT, > - (mend - mstart) >> PAGE_SHIFT); > + > + mfn = (phys_addr_t)pfn_to_kaddr(page_to_pfn(map->pages[i].page)); > + gnttab_set_unmap_op(&unmap_op, mfn, flags, map->pages[i].handle); > + err = HYPERVISOR_grant_table_op( > + GNTTABOP_unmap_grant_ref, &unmap_op, 1); > WARN_ON(err); > + WARN_ON(unmap_op.status); > } > - spin_unlock(&priv->lock); > -} > - > -static void mn_invl_page(struct mmu_notifier *mn, > - struct mm_struct *mm, > - unsigned long address) > -{ > - mn_invl_range_start(mn, mm, address, address + PAGE_SIZE); > } > > -static void mn_release(struct mmu_notifier *mn, > - struct mm_struct *mm) > +static void gntdev_put_map(struct grant_map *map) > { > - struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn); > - struct grant_map *map; > - int err; > - > - spin_lock(&priv->lock); > - list_for_each_entry(map, &priv->maps, next) { > - if (!map->vma) > - continue; > - if (debug) > - printk("%s: map %d+%d (%lx %lx)\n", > - __FUNCTION__, map->index, map->count, > - map->vma->vm_start, map->vma->vm_end); > - err = unmap_grant_pages(map, 0, map->count); > - WARN_ON(err); > + struct gnttab_unmap_grant_ref *unmap_ops; > + int i; > + if (!map) > + return; > + if (!atomic_dec_and_test(&map->users)) > + return; > + if (debug) > + printk("%s: unmap %p (%d pages)\n", __FUNCTION__, map, map->count); > + if (map->is_mapped) { > + unmap_ops = kzalloc(sizeof(unmap_ops[0]) * map->count, > + GFP_TEMPORARY); > + if (likely(unmap_ops)) { > + gntdev_unmap_fast(map, unmap_ops); > + kfree(unmap_ops); > + } else { > + gntdev_unmap_slow(map); > + } > + atomic_sub(map->count, &pages_mapped); > } > - spin_unlock(&priv->lock); > + for (i=0; i < map->count; i++) > + __free_page(map->pages[i].page); > + kfree(map); > } > > -struct mmu_notifier_ops gntdev_mmu_ops = { > - .release = mn_release, > - .invalidate_page = mn_invl_page, > - .invalidate_range_start = mn_invl_range_start, > -}; > - > -/* 
------------------------------------------------------------------ */ > - > static int gntdev_open(struct inode *inode, struct file *flip) > { > struct gntdev_priv *priv; > @@ -360,16 +252,6 @@ static int gntdev_open(struct inode *inode, struct file *flip) > > INIT_LIST_HEAD(&priv->maps); > spin_lock_init(&priv->lock); > - priv->limit = limit; > - > - priv->mm = get_task_mm(current); > - if (!priv->mm) { > - kfree(priv); > - return -ENOMEM; > - } > - priv->mn.ops = &gntdev_mmu_ops; > - mmu_notifier_register(&priv->mn, priv->mm); > - mmput(priv->mm); > > flip->private_data = priv; > if (debug) > @@ -382,31 +264,93 @@ static int gntdev_release(struct inode *inode, struct file *flip) > { > struct gntdev_priv *priv = flip->private_data; > struct grant_map *map; > - int err; > > - if (debug) > + if (debug) { > printk("%s: priv %p\n", __FUNCTION__, priv); > + gntdev_print_maps(priv, NULL, 0); > + } > > spin_lock(&priv->lock); > while (!list_empty(&priv->maps)) { > map = list_entry(priv->maps.next, struct grant_map, next); > - err = gntdev_del_map(map); > - if (WARN_ON(err)) > - gntdev_free_map(map); > - > + list_del(&map->next); > + gntdev_put_map(map); > } > spin_unlock(&priv->lock); > > - mmu_notifier_unregister(&priv->mn, priv->mm); > kfree(priv); > return 0; > } > > +static int gntdev_do_map(struct grant_map *map) > +{ > + int err, flags, i; > + struct page* page; > + phys_addr_t mfn; > + struct gnttab_map_grant_ref* map_ops; > + > + flags = GNTMAP_host_map; > + if (map->is_ro) > + flags |= GNTMAP_readonly; > + > + err = -ENOMEM; > + > + if (unlikely(atomic_add_return(map->count, &pages_mapped) > limit)) { > + if (debug) > + printk("%s: maps full\n", __FUNCTION__); > + goto out; > + } > + > + map_ops = kzalloc(sizeof(map_ops[0]) * map->count, GFP_TEMPORARY); > + if (!map_ops) > + goto out; > + > + for (i = 0; i < map->count; i++) { > + page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO); > + if (unlikely(!page)) > + goto out_free; > + map->pages[i].page = page; > + mfn = (phys_addr_t)pfn_to_kaddr(page_to_pfn(page)); > + gnttab_set_map_op(&map_ops[i], mfn, flags, > + map->pages[i].target.ref, > + map->pages[i].target.domid); > + } > + > + err = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, > + map_ops, map->count); > + if (WARN_ON(err)) > + goto out_free; > + > + map->is_mapped = 1; > + > + for (i = 0; i < map->count; i++) { > + if (map_ops[i].status) { > + if (debug) > + printk("%s: failed map at page %d: stat=%d\n", > + __FUNCTION__, i, map_ops[i].status); > + __free_page(map->pages[i].page); > + map->pages[i].page = NULL; > + err = -EINVAL; > + } else { > + map->pages[i].handle = map_ops[i].handle; > + } > + } > + > +out_free: > + kfree(map_ops); > +out: > + if (!map->is_mapped) > + atomic_sub(map->count, &pages_mapped); > + return err; > +} > + > static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv, > - struct ioctl_gntdev_map_grant_ref __user *u) > + struct ioctl_gntdev_map_grant_ref __user *u, > + int delay_map) > { > struct ioctl_gntdev_map_grant_ref op; > struct grant_map *map; > + struct ioctl_gntdev_grant_ref* grants; > int err; > > if (copy_from_user(&op, u, sizeof(op)) != 0) > @@ -416,32 +360,48 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv, > op.count); > if (unlikely(op.count <= 0)) > return -EINVAL; > - if (unlikely(op.count > priv->limit)) > - return -EINVAL; > > err = -ENOMEM; > - map = gntdev_alloc_map(priv, op.count); > + grants = kmalloc(sizeof(grants[0]) * op.count, GFP_TEMPORARY); > + if (!grants) > + goto out_fail; > + > + err = 
-EFAULT; > + if (copy_from_user(grants, u->refs, sizeof(grants[0]) * op.count)) > + goto out_free; > + > + map = gntdev_alloc_map(op.count, grants); > if (!map) > - return err; > - if (copy_from_user(map->grants, &u->refs, > - sizeof(map->grants[0]) * op.count) != 0) { > - gntdev_free_map(map); > - return err; > + goto out_free; > + > + if (!delay_map) { > + if (!(op.flags & GNTDEV_MAP_WRITABLE)) > + map->is_ro = 1; > + err = gntdev_do_map(map); > + if (err) > + goto out_unmap; > } > > - spin_lock(&priv->lock); > gntdev_add_map(priv, map); > + > op.index = map->index << PAGE_SHIFT; > - spin_unlock(&priv->lock); > > - if (copy_to_user(u, &op, sizeof(op)) != 0) { > - spin_lock(&priv->lock); > - gntdev_del_map(map); > - spin_unlock(&priv->lock); > - gntdev_free_map(map); > - return err; > - } > - return 0; > + err = -EFAULT; > + if (copy_to_user(u, &op, sizeof(op)) != 0) > + goto out_remove; > + > + err = 0; > + > +out_free: > + kfree(grants); > +out_fail: > + return err; > + > +out_remove: > + gntdev_del_map(priv, map); > +out_unmap: > + gntdev_put_map(map); > + goto out_free; > } > > static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv, > @@ -449,21 +409,24 @@ static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv, > { > struct ioctl_gntdev_unmap_grant_ref op; > struct grant_map *map; > - int err = -EINVAL; > + int err = 0; > > if (copy_from_user(&op, u, sizeof(op)) != 0) > return -EFAULT; > - if (debug) > - printk("%s: priv %p, del %d+%d\n", __FUNCTION__, priv, > - (int)op.index, (int)op.count); > > spin_lock(&priv->lock); > map = gntdev_find_map_index(priv, op.index >> PAGE_SHIFT, op.count); > - if (map) > - err = gntdev_del_map(map); > + if (map) { > + __gntdev_del_map(priv, map); > + } else > + err = -EINVAL; > spin_unlock(&priv->lock); > - if (!err) > - gntdev_free_map(map); > + > + if (debug) > + printk("%s: priv %p, del %d+%d = %p\n", __FUNCTION__, priv, > + (int)op.index, (int)op.count, map); > + > + gntdev_put_map(map); > return err; > } > > @@ -471,6 +434,7 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv, > struct ioctl_gntdev_get_offset_for_vaddr __user *u) > { > struct ioctl_gntdev_get_offset_for_vaddr op; > + struct vm_area_struct *vma; > struct grant_map *map; > > if (copy_from_user(&op, u, sizeof(op)) != 0) > @@ -479,40 +443,22 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv, > printk("%s: priv %p, offset for vaddr %lx\n", __FUNCTION__, priv, > (unsigned long)op.vaddr); > > - spin_lock(&priv->lock); > - map = gntdev_find_map_vaddr(priv, op.vaddr); > - if (map == NULL || > - map->vma->vm_start != op.vaddr) { > - spin_unlock(&priv->lock); > + vma = find_vma(current->mm, op.vaddr); > + if (!vma) > return -EINVAL; > - } > + > + map = vma->vm_private_data; > + if (vma->vm_ops != &gntdev_vmops || !map) > + return -EINVAL; > + > op.offset = map->index << PAGE_SHIFT; > op.count = map->count; > - spin_unlock(&priv->lock); > > if (copy_to_user(u, &op, sizeof(op)) != 0) > return -EFAULT; > return 0; > } > > -static long gntdev_ioctl_set_max_grants(struct gntdev_priv *priv, > - struct ioctl_gntdev_set_max_grants __user *u) > -{ > - struct ioctl_gntdev_set_max_grants op; > - > - if (copy_from_user(&op, u, sizeof(op)) != 0) > - return -EFAULT; > - if (debug) > - printk("%s: priv %p, limit %d\n", __FUNCTION__, priv, op.count); > - if (op.count > limit) > - return -EINVAL; > - > - spin_lock(&priv->lock); > - priv->limit = op.count; > - spin_unlock(&priv->lock); > - return 0; > -} > - > static long gntdev_ioctl(struct 
file *flip, > unsigned int cmd, unsigned long arg) > { > @@ -521,7 +467,7 @@ static long gntdev_ioctl(struct file *flip, > > switch (cmd) { > case IOCTL_GNTDEV_MAP_GRANT_REF: > - return gntdev_ioctl_map_grant_ref(priv, ptr); > + return gntdev_ioctl_map_grant_ref(priv, ptr, 1); > > case IOCTL_GNTDEV_UNMAP_GRANT_REF: > return gntdev_ioctl_unmap_grant_ref(priv, ptr); > @@ -529,8 +475,8 @@ static long gntdev_ioctl(struct file *flip, > case IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR: > return gntdev_ioctl_get_offset_for_vaddr(priv, ptr); > > - case IOCTL_GNTDEV_SET_MAX_GRANTS: > - return gntdev_ioctl_set_max_grants(priv, ptr); > + case IOCTL_GNTDEV_MAP_GRANT_REF_2: > + return gntdev_ioctl_map_grant_ref(priv, ptr, 0); > > default: > if (debug) > @@ -542,6 +488,34 @@ static long gntdev_ioctl(struct file *flip, > return 0; > } > > +static int gntdev_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf) > +{ > + struct grant_map *map = vma->vm_private_data; > + pgoff_t pgoff = vmf->pgoff - vma->vm_pgoff; > + > + if (!map || !map->is_mapped || pgoff < 0 || pgoff > map->count) { > + if (debug) > + printk("%s: vaddr %p, pgoff %ld (shouldn''t happen)\n", > + __FUNCTION__, vmf->virtual_address, pgoff); > + return VM_FAULT_SIGBUS; > + } > + > + vmf->page = map->pages[pgoff].page; > + get_page(vmf->page); > + return 0; > +} > + > +static void gntdev_vma_close(struct vm_area_struct *vma) > +{ > + struct grant_map *map = vma->vm_private_data; > + gntdev_put_map(map); > +} > + > +static struct vm_operations_struct gntdev_vmops = { > + .fault = gntdev_vma_fault, > + .close = gntdev_vma_close, > +}; > + > static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma) > { > struct gntdev_priv *priv = flip->private_data; > @@ -550,53 +524,39 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma) > struct grant_map *map; > int err = -EINVAL; > > - if ((vma->vm_flags & VM_WRITE) && !(vma->vm_flags & VM_SHARED)) > + if (!(vma->vm_flags & VM_SHARED)) > return -EINVAL; > > - if (debug) > - printk("%s: map %d+%d at %lx (pgoff %lx)\n", __FUNCTION__, > - index, count, vma->vm_start, vma->vm_pgoff); > - > spin_lock(&priv->lock); > map = gntdev_find_map_index(priv, index, count); > + > + if (debug) > + printk("%s: map %d+%d at %lx (priv %p, map %p)\n", __func__, > + index, count, vma->vm_start, priv, map); > + > if (!map) > goto unlock_out; > - if (map->vma) > - goto unlock_out; > - if (priv->mm != vma->vm_mm) { > - printk("%s: Huh? 
Other mm?\n", __FUNCTION__); > - goto unlock_out; > + > + if (!map->is_mapped) { > + map->is_ro = !(vma->vm_flags & VM_WRITE); > + err = gntdev_do_map(map); > + if (err) > + goto unlock_out; > } > > + if ((vma->vm_flags & VM_WRITE) && map->is_ro) > + goto unlock_out; > + > + err = 0; > vma->vm_ops = &gntdev_vmops; > > vma->vm_flags |= VM_RESERVED; > - vma->vm_flags |= VM_DONTCOPY; > vma->vm_flags |= VM_DONTEXPAND; > + vma->vm_flags |= VM_FOREIGN; > > vma->vm_private_data = map; > - map->vma = vma; > > - map->flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte; > - if (!(vma->vm_flags & VM_WRITE)) > - map->flags |= GNTMAP_readonly; > - > - err = apply_to_page_range(vma->vm_mm, vma->vm_start, > - vma->vm_end - vma->vm_start, > - find_grant_ptes, map); > - if (err) { > - goto unlock_out; > - if (debug) > - printk("%s: find_grant_ptes() failure.\n", __FUNCTION__); > - } > - > - err = map_grant_pages(map); > - if (err) { > - goto unlock_out; > - if (debug) > - printk("%s: map_grant_pages() failure.\n", __FUNCTION__); > - } > - map->is_mapped = 1; > + atomic_inc(&map->users); > > unlock_out: > spin_unlock(&priv->lock); > diff --git a/include/xen/gntdev.h b/include/xen/gntdev.h > index 8bd1467..9df1ae3 100644 > --- a/include/xen/gntdev.h > +++ b/include/xen/gntdev.h > @@ -47,11 +47,17 @@ struct ioctl_gntdev_grant_ref { > */ > #define IOCTL_GNTDEV_MAP_GRANT_REF \ > _IOC(_IOC_NONE, ''G'', 0, sizeof(struct ioctl_gntdev_map_grant_ref)) > +#define IOCTL_GNTDEV_MAP_GRANT_REF_2 \ > +_IOC(_IOC_NONE, ''G'', 4, sizeof(struct ioctl_gntdev_map_grant_ref)) > struct ioctl_gntdev_map_grant_ref { > /* IN parameters */ > /* The number of grants to be mapped. */ > uint32_t count; > - uint32_t pad; > + /* Flags for this mapping */ > + union { > + uint32_t flags; > + uint32_t pad; > + }; > /* OUT parameters */ > /* The offset to be used on a subsequent call to mmap(). */ > uint64_t index; > @@ -59,6 +65,7 @@ struct ioctl_gntdev_map_grant_ref { > /* Array of grant references, of size @count. */ > struct ioctl_gntdev_grant_ref refs[1]; > }; > +#define GNTDEV_MAP_WRITABLE 0x1 > > /* > * Removes the grant references from the mapping table of an instance of > -- > 1.7.2.3 > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
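For readers who want to see the new interface end to end, a minimal userspace sketch of the map-then-mmap flow follows, based on the ioctl structures quoted above. The <xen/gntdev.h> include path, the fixed 4096-byte page size, and the minimal error handling are assumptions for brevity, not part of the patch.

/* Sketch only: map one writable grant from a foreign domain using the new
 * IOCTL_GNTDEV_MAP_GRANT_REF_2 interface quoted above. Header location and
 * the 4096-byte page size are assumptions. */
#include <stdint.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <xen/gntdev.h>

static void *map_one_grant(int fd, uint32_t domid, uint32_t ref)
{
	struct ioctl_gntdev_map_grant_ref op;
	void *addr;

	memset(&op, 0, sizeof(op));
	op.count = 1;			/* for count > 1, allocate extra room for refs[] */
	op.flags = GNTDEV_MAP_WRITABLE;	/* new flags field, formerly "pad" */
	op.refs[0].domid = domid;
	op.refs[0].ref = ref;

	/* With the _2 ioctl the grant mapping is performed here, so a failure
	 * is reported now rather than on the first access to the VMA. */
	if (ioctl(fd, IOCTL_GNTDEV_MAP_GRANT_REF_2, &op) < 0)
		return NULL;

	/* op.index is the offset to pass to mmap(). */
	addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
		    fd, (off_t)op.index);
	return addr == MAP_FAILED ? NULL : addr;
}

Here fd would be an open handle on /dev/xen/gntdev; teardown would be munmap() followed by IOCTL_GNTDEV_UNMAP_GRANT_REF with the same index and count.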
Ian Campbell
2010-Dec-08 16:47 UTC
Re: [Xen-devel] [PATCH 2/2] xen-gntalloc: Userspace grant allocation driver
On Fri, 2010-12-03 at 15:38 +0000, Daniel De Graaf wrote:> This allows a userspace application to allocate a shared page for > implementing inter-domain communication or device drivers. These > shared pages can be mapped using the gntdev device or by the kernel > in another domain.This seems like useful functionality but is it really necessary for it to be a separate driver to the existing gntdev driver? The broad high level semantics of ioctl+mmap seem pretty similar. It also has some similarities with the sort of device we will need in order to properly allocate memory which is safe to use as an argument to a hypercall. Do you have an example of a user of the driver? Thanks, Ian.> > Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> > --- > drivers/xen/Kconfig | 7 + > drivers/xen/Makefile | 2 + > drivers/xen/gntalloc.c | 456 ++++++++++++++++++++++++++++++++++++++++++++++++ > include/xen/gntalloc.h | 68 +++++++ > 4 files changed, 533 insertions(+), 0 deletions(-) > create mode 100644 drivers/xen/gntalloc.c > create mode 100644 include/xen/gntalloc.h > > diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig > index a9f3a8f..0be0edc 100644 > --- a/drivers/xen/Kconfig > +++ b/drivers/xen/Kconfig > @@ -179,6 +179,13 @@ config XEN_GNTDEV > help > Allows userspace processes to map grants from other domains. > > +config XEN_GRANT_DEV_ALLOC > + tristate "User-space grant reference allocator driver" > + depends on XEN > + help > + Allows userspace processes to create pages with access granted > + to other domains. > + > config XEN_S3 > def_bool y > depends on XEN_DOM0 && ACPI > diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile > index ef1ea63..9814c1d 100644 > --- a/drivers/xen/Makefile > +++ b/drivers/xen/Makefile > @@ -10,6 +10,7 @@ obj-$(CONFIG_XEN_XENCOMM) += xencomm.o > obj-$(CONFIG_XEN_BALLOON) += balloon.o > obj-$(CONFIG_XEN_DEV_EVTCHN) += xen-evtchn.o > obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev.o > +obj-$(CONFIG_XEN_GRANT_DEV_ALLOC) += xen-gntalloc.o > obj-$(CONFIG_XEN_PCIDEV_BACKEND) += pciback/ > obj-$(CONFIG_XEN_BLKDEV_BACKEND) += blkback/ > obj-$(CONFIG_XEN_BLKDEV_TAP) += blktap/ > @@ -25,3 +26,4 @@ obj-$(CONFIG_XEN_PLATFORM_PCI) += platform-pci.o > > xen-evtchn-y := evtchn.o > xen-gntdev-y := gntdev.o > +xen-gntalloc-y := gntalloc.o > diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c > new file mode 100644 > index 0000000..f26adfd > --- /dev/null > +++ b/drivers/xen/gntalloc.c > @@ -0,0 +1,456 @@ > +/****************************************************************************** > + * gntalloc.c > + * > + * Device for creating grant references (in user-space) that may be shared > + * with other domains. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program; if not, write to the Free Software > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA > + */ > + > +/* > + * This driver exists to allow userspace programs in Linux to allocate kernel > + * memory that will later be shared with another domain. Without this device, > + * Linux userspace programs cannot create grant references. > + * > + * How this stuff works: > + * X -> granting a page to Y > + * Y -> mapping the grant from X > + * > + * 1. 
X uses the gntalloc device to allocate a page of kernel memory, P. > + * 2. X creates an entry in the grant table that says domid(Y) can > + * access P. > + * 3. X gives the grant reference identifier, GREF, to Y. > + * 4. A program in Y uses the gntdev device to map the page (owned by X > + * and identified by GREF) into domain(Y) and then into the address > + * space of the program. Behind the scenes, this requires a > + * hypercall in which Xen modifies the host CPU page tables to > + * perform the sharing -- that''s where the actual cross-domain mapping > + * occurs. > + * 5. A program in X mmap()s a segment of the gntalloc device that > + * corresponds to the shared page. > + * 6. The two userspace programs can now communicate over the shared page. > + * > + * > + * NOTE TO USERSPACE LIBRARIES: > + * The grant allocation and mmap()ing are, naturally, two separate > + * operations. You set up the sharing by calling the create ioctl() and > + * then the mmap(). You must tear down the sharing in the reverse order > + * (munmap() and then the destroy ioctl()). > + * > + * WARNING: Since Xen does not allow a guest to forcibly end the use of a grant > + * reference, this device can be used to consume kernel memory by leaving grant > + * references mapped by another domain when an application exits. Therefore, > + * there is a global limit on the number of pages that can be allocated. When > + * all references to the page are unmapped, it will be freed during the next > + * grant operation. > + */ > + > +#include <asm/atomic.h> > +#include <linux/module.h> > +#include <linux/miscdevice.h> > +#include <linux/kernel.h> > +#include <linux/init.h> > +#include <linux/slab.h> > +#include <linux/fs.h> > +#include <linux/device.h> > +#include <linux/mm.h> > +#include <asm/uaccess.h> > +#include <linux/types.h> > +#include <linux/list.h> > + > +#include <xen/xen.h> > +#include <xen/page.h> > +#include <xen/grant_table.h> > +#include <xen/gntalloc.h> > + > +static int debug = 0; > +module_param(debug, int, 0644); > + > +static int limit = 1024; > +module_param(limit, int, 0644); > + > +static LIST_HEAD(gref_list); > +static DEFINE_SPINLOCK(gref_lock); > +static int gref_size = 0; > + > +/* Metadata on a grant reference. */ > +struct gntalloc_gref { > + struct list_head next_all; /* list entry gref_list */ > + struct list_head next_file; /* list entry file->list, if open */ > + domid_t foreign_domid; /* The ID of the domain to share with. */ > + grant_ref_t gref_id; /* The grant reference number. */ > + unsigned int users; /* Use count - when zero, waiting on Xen */ > + struct page* page; /* The shared page. */ > +}; > + > +struct gntalloc_file_private_data { > + struct list_head list; > +}; > + > +static void __del_gref(struct gntalloc_gref *gref); > + > +static void do_cleanup(void) > +{ > + struct gntalloc_gref *gref, *n; > + list_for_each_entry_safe(gref, n, &gref_list, next_all) { > + if (!gref->users) > + __del_gref(gref); > + } > +} > + > + > +static int add_gref(domid_t foreign_domid, uint32_t readonly, > + struct gntalloc_file_private_data *priv) > +{ > + int rc; > + struct gntalloc_gref *gref; > + > + rc = -ENOMEM; > + spin_lock(&gref_lock); > + do_cleanup(); > + if (gref_size >= limit) { > + spin_unlock(&gref_lock); > + rc = -ENOSPC; > + goto out; > + } > + gref_size++; > + spin_unlock(&gref_lock); > + > + gref = kzalloc(sizeof(*gref), GFP_KERNEL); > + if (!gref) > + goto out; > + > + gref->foreign_domid = foreign_domid; > + gref->users = 1; > + > + /* Allocate the page to share. 
*/ > + gref->page = alloc_page(GFP_KERNEL|__GFP_ZERO); > + if (!gref->page) > + goto out_nopage; > + > + /* Grant foreign access to the page. */ > + gref->gref_id = gnttab_grant_foreign_access(foreign_domid, > + pfn_to_mfn(page_to_pfn(gref->page)), readonly); > + if (gref->gref_id < 0) { > + printk(KERN_ERR "%s: failed to grant foreign access for mfn " > + "%lu to domain %u\n", __func__, > + pfn_to_mfn(page_to_pfn(gref->page)), foreign_domid); > + rc = -EFAULT; > + goto out_no_foreign_gref; > + } > + > + /* Add to gref lists. */ > + spin_lock(&gref_lock); > + list_add_tail(&gref->next_all, &gref_list); > + list_add_tail(&gref->next_file, &priv->list); > + spin_unlock(&gref_lock); > + > + return gref->gref_id; > + > +out_no_foreign_gref: > + __free_page(gref->page); > +out_nopage: > + kfree(gref); > +out: > + return rc; > +} > + > +static void __del_gref(struct gntalloc_gref *gref) > +{ > + if (gnttab_query_foreign_access(gref->gref_id)) > + return; > + > + if (!gnttab_end_foreign_access_ref(gref->gref_id, 0)) > + return; > + > + gref_size--; > + list_del(&gref->next_all); > + > + __free_page(gref->page); > + kfree(gref); > +} > + > +static struct gntalloc_gref* find_gref(struct gntalloc_file_private_data *priv, > + grant_ref_t gref_id) > +{ > + struct gntalloc_gref *gref; > + list_for_each_entry(gref, &priv->list, next_file) { > + if (gref->gref_id == gref_id) > + return gref; > + } > + return NULL; > +} > + > +/* > + * ------------------------------------- > + * File operations. > + * ------------------------------------- > + */ > +static int gntalloc_open(struct inode *inode, struct file *filp) > +{ > + struct gntalloc_file_private_data *priv; > + > + try_module_get(THIS_MODULE); > + > + priv = kzalloc(sizeof(*priv), GFP_KERNEL); > + if (!priv) > + goto out_nomem; > + INIT_LIST_HEAD(&priv->list); > + > + filp->private_data = priv; > + > + if (debug) > + printk("%s: priv %p\n", __FUNCTION__, priv); > + > + return 0; > + > +out_nomem: > + return -ENOMEM; > +} > + > +static int gntalloc_release(struct inode *inode, struct file *filp) > +{ > + struct gntalloc_file_private_data *priv = filp->private_data; > + struct gntalloc_gref *gref; > + > + if (debug) > + printk("%s: priv %p\n", __FUNCTION__, priv); > + > + spin_lock(&gref_lock); > + while (!list_empty(&priv->list)) { > + gref = list_entry(priv->list.next, > + struct gntalloc_gref, next_file); > + list_del(&gref->next_file); > + gref->users--; > + if (gref->users == 0) > + __del_gref(gref); > + } > + kfree(priv); > + spin_unlock(&gref_lock); > + > + module_put(THIS_MODULE); > + > + return 0; > +} > + > +static long gntalloc_ioctl_alloc(struct gntalloc_file_private_data *priv, > + void __user *arg) > +{ > + int rc = 0; > + struct ioctl_gntalloc_alloc_gref op; > + > + if (debug) > + printk("%s: priv %p\n", __FUNCTION__, priv); > + > + if (copy_from_user(&op, arg, sizeof(op))) { > + rc = -EFAULT; > + goto alloc_grant_out; > + } > + rc = add_gref(op.foreign_domid, op.readonly, priv); > + if (rc < 0) > + goto alloc_grant_out; > + > + op.gref_id = rc; > + op.page_idx = rc; > + > + rc = 0; > + > + if (copy_to_user((void __user *)arg, &op, sizeof(op))) { > + rc = -EFAULT; > + goto alloc_grant_out; > + } > + > +alloc_grant_out: > + return rc; > +} > + > +static long gntalloc_ioctl_dealloc(struct gntalloc_file_private_data *priv, > + void __user *arg) > +{ > + int rc = 0; > + struct ioctl_gntalloc_dealloc_gref op; > + struct gntalloc_gref *gref; > + > + if (debug) > + printk("%s: priv %p\n", __FUNCTION__, priv); > + > + if (copy_from_user(&op, arg, 
sizeof(op))) { > + rc = -EFAULT; > + goto dealloc_grant_out; > + } > + > + spin_lock(&gref_lock); > + gref = find_gref(priv, op.gref_id); > + if (gref) { > + list_del(&gref->next_file); > + gref->users--; > + rc = 0; > + } else { > + rc = -EINVAL; > + } > + > + do_cleanup(); > + spin_unlock(&gref_lock); > +dealloc_grant_out: > + return rc; > +} > + > +static long gntalloc_ioctl(struct file *filp, unsigned int cmd, > + unsigned long arg) > +{ > + struct gntalloc_file_private_data *priv = filp->private_data; > + > + switch (cmd) { > + case IOCTL_GNTALLOC_ALLOC_GREF: > + return gntalloc_ioctl_alloc(priv, (void __user*)arg); > + > + case IOCTL_GNTALLOC_DEALLOC_GREF: > + return gntalloc_ioctl_dealloc(priv, (void __user*)arg); > + > + default: > + return -ENOIOCTLCMD; > + } > + > + return 0; > +} > + > +static int gntalloc_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf) > +{ > + struct gntalloc_gref *gref = vma->vm_private_data; > + if (!gref) > + return VM_FAULT_SIGBUS; > + > + vmf->page = gref->page; > + get_page(vmf->page); > + > + return 0; > +}; > + > +static void gntalloc_vma_close(struct vm_area_struct *vma) > +{ > + struct gntalloc_gref *gref = vma->vm_private_data; > + if (!gref) > + return; > + > + spin_lock(&gref_lock); > + gref->users--; > + if (gref->users == 0) > + __del_gref(gref); > + spin_unlock(&gref_lock); > +} > + > +static struct vm_operations_struct gntalloc_vmops = { > + .fault = gntalloc_vma_fault, > + .close = gntalloc_vma_close, > +}; > + > +static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma) > +{ > + struct gntalloc_file_private_data *priv = filp->private_data; > + struct gntalloc_gref *gref; > + > + if (debug) > + printk("%s: priv %p, page %lu\n", __func__, > + priv, vma->vm_pgoff); > + > + /* > + * There is a 1-to-1 correspondence of grant references to shared > + * pages, so it only makes sense to map exactly one page per > + * call to mmap(). > + */ > + if (((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) != 1) { > + printk(KERN_ERR "%s: Only one page can be memory-mapped " > + "per grant reference.\n", __func__); > + return -EINVAL; > + } > + > + if (!(vma->vm_flags & VM_SHARED)) { > + printk(KERN_ERR "%s: Mapping must be shared.\n", > + __func__); > + return -EINVAL; > + } > + > + spin_lock(&gref_lock); > + gref = find_gref(priv, vma->vm_pgoff); > + if (gref == NULL) { > + spin_unlock(&gref_lock); > + printk(KERN_ERR "%s: Could not find a grant reference with " > + "page index %lu.\n", __func__, vma->vm_pgoff); > + return -ENOENT; > + } > + gref->users++; > + spin_unlock(&gref_lock); > + > + vma->vm_private_data = gref; > + > + /* This flag prevents Bad PTE errors when the memory is unmapped. */ > + vma->vm_flags |= VM_RESERVED; > + vma->vm_flags |= VM_DONTCOPY; > + vma->vm_flags |= VM_IO; > + > + vma->vm_ops = &gntalloc_vmops; > + > + return 0; > +} > + > +static const struct file_operations gntalloc_fops = { > + .owner = THIS_MODULE, > + .open = gntalloc_open, > + .release = gntalloc_release, > + .unlocked_ioctl = gntalloc_ioctl, > + .mmap = gntalloc_mmap > +}; > + > +/* > + * ------------------------------------- > + * Module creation/destruction. 
> + * ------------------------------------- > + */ > +static struct miscdevice gntalloc_miscdev = { > + .minor = MISC_DYNAMIC_MINOR, > + .name = "xen/gntalloc", > + .fops = &gntalloc_fops, > +}; > + > +static int __init gntalloc_init(void) > +{ > + int err; > + > + if (!xen_domain()) { > + if (debug) > + printk(KERN_ERR "gntalloc: You must be running Xen\n"); > + return -ENODEV; > + } > + > + err = misc_register(&gntalloc_miscdev); > + if (err != 0) { > + printk(KERN_ERR "Could not register misc gntalloc device\n"); > + return err; > + } > + > + if (debug) > + printk(KERN_INFO "Created grant allocation device at %d,%d\n", > + MISC_MAJOR, gntalloc_miscdev.minor); > + > + return 0; > +} > + > +static void __exit gntalloc_exit(void) > +{ > + misc_deregister(&gntalloc_miscdev); > +} > + > +module_init(gntalloc_init); > +module_exit(gntalloc_exit); > + > +MODULE_LICENSE("GPL"); > +MODULE_AUTHOR("Carter Weatherly <carter.weatherly@jhuapl.edu>, " > + "Daniel De Graaf <dgdegra@tycho.nsa.gov>"); > +MODULE_DESCRIPTION("User-space grant reference allocator driver"); > diff --git a/include/xen/gntalloc.h b/include/xen/gntalloc.h > new file mode 100644 > index 0000000..76b70d7 > --- /dev/null > +++ b/include/xen/gntalloc.h > @@ -0,0 +1,68 @@ > +/****************************************************************************** > + * gntalloc.h > + * > + * Interface to /dev/xen/gntalloc. > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License version 2 > + * as published by the Free Software Foundation; or, when distributed > + * separately from the Linux kernel or incorporated into other > + * software packages, subject to the following license: > + * > + * Permission is hereby granted, free of charge, to any person obtaining a copy > + * of this source file (the "Software"), to deal in the Software without > + * restriction, including without limitation the rights to use, copy, modify, > + * merge, publish, distribute, sublicense, and/or sell copies of the Software, > + * and to permit persons to whom the Software is furnished to do so, subject to > + * the following conditions: > + * > + * The above copyright notice and this permission notice shall be included in > + * all copies or substantial portions of the Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE > + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS > + * IN THE SOFTWARE. > + */ > + > +#ifndef __LINUX_PUBLIC_GNTALLOC_H__ > +#define __LINUX_PUBLIC_GNTALLOC_H__ > + > +/* > + * Allocates a new page and creates a new grant reference. > + * > + * N.B. The page_idx is really the address >> PAGE_SHIFT, meaning it''s the > + * page number and not an actual address. It must be shifted again prior > + * to feeding it to mmap() (i.e. page_idx << PAGE_SHIFT). > + */ > +#define IOCTL_GNTALLOC_ALLOC_GREF \ > +_IOC(_IOC_NONE, ''G'', 1, sizeof(struct ioctl_gntalloc_alloc_gref)) > +struct ioctl_gntalloc_alloc_gref { > + /* IN parameters */ > + /* The ID of the domain creating the grant reference. */ > + domid_t owner_domid; > + /* The ID of the domain to be given access to the grant. 
*/ > + domid_t foreign_domid; > + /* The type of access given to domid. */ > + uint32_t readonly; > + /* OUT parameters */ > + /* The grant reference of the newly created grant. */ > + grant_ref_t gref_id; > + /* The page index (page number, NOT address) for grant mmap(). */ > + uint32_t page_idx; > +}; > + > +/* > + * Deallocates the grant reference, freeing the associated page. > + */ > +#define IOCTL_GNTALLOC_DEALLOC_GREF \ > +_IOC(_IOC_NONE, ''G'', 2, sizeof(struct ioctl_gntalloc_dealloc_gref)) > +struct ioctl_gntalloc_dealloc_gref { > + /* IN parameter */ > + /* The grant reference to deallocate. */ > + grant_ref_t gref_id; > +}; > +#endif /* __LINUX_PUBLIC_GNTALLOC_H__ */ > -- > 1.7.2.3 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
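To make the usage described in the driver's comment block concrete, a rough userspace sketch of the allocate-then-mmap flow on /dev/xen/gntalloc follows. The include path, the 4 KiB page-size assumption, and the error handling are illustrative only; the resulting gref_id would be passed to the peer out of band (for example via xenstore) and mapped there with gntdev.

/* Sketch only: allocate one page shared with remote_domid via
 * /dev/xen/gntalloc, then mmap() it locally. Header path, page size and
 * error handling are assumptions, not part of the patch. */
#include <stdint.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <xen/gntalloc.h>

static void *share_one_page(int fd, uint16_t remote_domid, uint32_t *gref_out)
{
	struct ioctl_gntalloc_alloc_gref op;
	void *page;

	memset(&op, 0, sizeof(op));
	op.foreign_domid = remote_domid;	/* domain allowed to map the page */
	op.readonly = 0;			/* grant read/write access */

	if (ioctl(fd, IOCTL_GNTALLOC_ALLOC_GREF, &op) < 0)
		return NULL;

	/* page_idx is a page number; shift by PAGE_SHIFT (12 for 4 KiB pages)
	 * before handing it to mmap(). */
	page = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
		    fd, (off_t)op.page_idx << 12);
	if (page == MAP_FAILED) {
		struct ioctl_gntalloc_dealloc_gref dop = { .gref_id = op.gref_id };
		ioctl(fd, IOCTL_GNTALLOC_DEALLOC_GREF, &dop);
		return NULL;
	}

	/* Hand op.gref_id to the peer out of band; teardown is munmap()
	 * followed by IOCTL_GNTALLOC_DEALLOC_GREF, in that order. */
	*gref_out = op.gref_id;
	return page;
}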
Daniel De Graaf
2010-Dec-08 18:15 UTC
Re: [Xen-devel] [PATCH 1/2] xen-gntdev: support mapping in HVM domains
On 12/08/2010 11:40 AM, Ian Campbell wrote:
> Hi Daniel,
> 
> On Fri, 2010-12-03 at 15:37 +0000, Daniel De Graaf wrote:
>> This changes the /dev/xen/gntdev device to work in HVM domains, and
>> also makes the mmap() behavior more closely match the behavior of
>> files instead of requiring an ioctl() call for each mmap/munmap.
> 
> This patch seems to contain a lot more than the above very brief
> description would suggest. As well as those two changes it looks to
> include various refactoring and code motion, some cleanup and other
> changes.

Sorry about that, I should have made a better effort to explain the
reason for the refactoring in the patch comment.

In the old gntdev device, a hypercall was made to adjust the actual
page tables of the userspace process according to the mapping set up in
the previous ioctl(). This direct page table manipulation does not work
in HVM, because the page tables refer to guest-physical addresses. The
solution is to not use the GNTMAP_contains_pte flag when making the
hypercall, and instead use guest-physical address space to contain the
mapping. Due to this change, the mmap() call is no longer the best
place to do the mapping; the eventual userspace address does not
matter, and the cleanup is handled differently. This accounts for most
of the refactoring and for the removal of the MMU notifier dependency,
and the patch for this change is probably going to be large no matter
what I do in order to keep the driver working before and after.

> Please can you also go into more detail in the relevant patch on the
> new ioctl/mmap semantics rather than just mentioning that you've
> changed them. Do you retain compatibility with the old behaviour? Is
> there a corresponding toolstack change which makes use of this
> functionality?

The change is fully backwards compatible, so the current Xen toolchain
will work with the changed API. I have a test program that uses the new
ioctl, which I could include if it would be useful; the modifications
to the vchan library that use the new ioctl() are not yet finished. The
other mentioned changes in mmap() are not used in any code I have; they
just lift limitations caused by the direct PTE manipulation, which no
longer apply.

> The change to support HVM and the change in mmap/ioctl behaviour
> should certainly be separate patches, but there also seems to be some
> data structure refactoring going on, a change from using the mmu
> notifiers to something else, some sort of lazy mapping functionality,
> etc.; these should all get their own individual patches with a
> changelog describing what they do and why.

I will try to strip down the patch to only support the old ioctl() API,
and then introduce the new call in another patch; the same goes for the
semantic change in limit tracking. That will make it easier to evaluate
the changes that are visible from userspace. The data structure
changes, the MMU notifier elimination, and a large part of the
refactoring cannot be split up, because they all follow from the change
from PTE-based to guest-physical remapping.

> Thanks,
> Ian.
> 

-- 
Daniel De Graaf
National Security Agency

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
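To make the contrast above concrete, the two call patterns reduce to roughly the following, distilled from the old find_grant_ptes() and the new gntdev_do_map() in the patch; the helper names and argument lists here are invented for illustration and are not part of the submitted code.

/* Illustrative only: the two gnttab_set_map_op() styles discussed above. */
#include <linux/mm.h>
#include <xen/grant_table.h>
#include <xen/page.h>

/* Old, PV-only style: hand Xen the machine address of the user PTE
 * (GNTMAP_contains_pte) so the hypervisor rewrites the page table entry
 * of the process directly. */
static void set_map_op_pv(struct gnttab_map_grant_ref *op, pte_t *pte,
			  struct page *token, grant_ref_t ref, domid_t domid)
{
	u64 pte_maddr = (u64)pfn_to_mfn(page_to_pfn(token)) << PAGE_SHIFT;

	pte_maddr += (unsigned long)pte & ~PAGE_MASK;
	gnttab_set_map_op(op, pte_maddr,
			  GNTMAP_host_map | GNTMAP_application_map |
			  GNTMAP_contains_pte, ref, domid);
}

/* New style (works for PV and HVM): the driver allocates an ordinary page
 * and asks Xen to map the granted frame at that page's address, with plain
 * GNTMAP_host_map; userspace then receives the page via the fault handler. */
static void set_map_op_hvm(struct gnttab_map_grant_ref *op, struct page *page,
			   grant_ref_t ref, domid_t domid)
{
	phys_addr_t addr = (phys_addr_t)pfn_to_kaddr(page_to_pfn(page));

	gnttab_set_map_op(op, addr, GNTMAP_host_map, ref, domid);
}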
Daniel De Graaf
2010-Dec-08 18:15 UTC
Re: [Xen-devel] [PATCH 2/2] xen-gntalloc: Userspace grant allocation driver
On 12/08/2010 11:47 AM, Ian Campbell wrote:
> On Fri, 2010-12-03 at 15:38 +0000, Daniel De Graaf wrote:
>> This allows a userspace application to allocate a shared page for
>> implementing inter-domain communication or device drivers. These
>> shared pages can be mapped using the gntdev device or by the kernel
>> in another domain.
> 
> This seems like useful functionality but is it really necessary for it
> to be a separate driver to the existing gntdev driver? The broad high
> level semantics of ioctl+mmap seem pretty similar. It also has some
> similarities with the sort of device we will need in order to properly
> allocate memory which is safe to use as an argument to a hypercall.

The functionality is similar enough that I considered making this an
additional set of ioctl()s on the gntdev device, but decided to leave
them split because the semantics of creating a shared page are slightly
more dangerous than simply mapping pages from other domains.

As noted in the driver, due to a limitation of Xen's grant table API,
there is no way for a guest to force other guests to unmap shared pages
once they have been mapped. This means that if a userspace application
using gntalloc crashes, the other end may not notice and would keep the
page mapped until another event (the application restarts and requests
its peers to clean up the old session, or the peer itself terminates
and unmaps the pages). This will use up both guest memory and space in
the grant table (normally limited to 32 pages, so the limit on gntalloc
will not allow exhaustion).

If the devices are distinct, it is possible to allow applications
access to one without allowing access to both; I am not aware of any
easy way to do this if they are both implemented by similar
ioctl()/mmap() calls on a single device node.

A hypercall-safe memory allocation device will likely share code with
this device, which in turn shares some of its mapping code with gntdev.
Are there existing patches for the hypercall buffer allocation? It may
be useful to try to factor out some common code for dealing with pages
used to communicate between Xen and userspace.

> Do you have an example of a user of the driver?

I do have a communication library (vchan, based on code from Qubes); I
am currently modifying it to allow the use of more than one page for
the ring, to reduce context switches when passing large amounts of data
(this cost is increased by both ends being in userspace rather than
kernel space). If that isn't ready soon, I will just post the version
of vchan that uses this device and the modified gntdev API.

> Thanks,
> Ian.
> 

-- 
Daniel De Graaf
National Security Agency

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Daniel De Graaf
2010-Dec-08 20:17 UTC
Re: [Xen-devel] [PATCH 2/2] xen-gntalloc: Userspace grant allocation driver
On 12/08/2010 01:15 PM, Daniel De Graaf wrote:
> On 12/08/2010 11:47 AM, Ian Campbell wrote:
>> Do you have an example of a user of the driver?
> 
> I do have a communication library (vchan, based on code from Qubes); I
> am currently modifying it to allow the use of more than one page for
> the ring, to reduce context switches when passing large amounts of
> data (this cost is increased by both ends being in userspace rather
> than kernel space). If that isn't ready soon, I will just post the
> version of vchan that uses this device and the modified gntdev API.
> 

Actually, I'll just post the vchan code that I have now. It's a simple
wrapper around Xen's idea of a ring buffer, implemented in userspace.
The gntdev use is currently abstracted through xc library calls, which
does not match how gntalloc is used; changing to direct use of gntdev
will be better, especially since calling xc_gnttab_open() with a NULL
argument is not a documented use of the API.

The gntdev and gntalloc devices are useful for more than just stream
communication; they can be used as building blocks for implementing the
majority of Xen devices as userspace drivers.

-- 
Daniel De Graaf
National Security Agency

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
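For readers unfamiliar with the pattern, the kind of single-page ring such a library wraps can be sketched as below. This is not the vchan code: the layout, names, and sizes are invented for illustration, and the event-channel signalling and memory barriers a real implementation needs are omitted. The page itself would be created with gntalloc on one side and mapped with gntdev on the other.

/* Sketch of a ring layout over one shared 4 KiB page; illustrative only. */
#include <stdint.h>
#include <stddef.h>

#define RING_SIZE 2048	/* power of two, fits in one page with the indexes */

struct shared_ring {
	volatile uint32_t prod;		/* advanced by the sender */
	volatile uint32_t cons;		/* advanced by the receiver */
	uint8_t data[RING_SIZE];
};

/* Copy as many bytes as currently fit; signalling and blocking omitted. */
static size_t ring_write(struct shared_ring *r, const void *buf, size_t len)
{
	size_t space = RING_SIZE - (r->prod - r->cons);
	size_t i;

	if (len > space)
		len = space;
	for (i = 0; i < len; i++)
		r->data[(r->prod + i) & (RING_SIZE - 1)] = ((const uint8_t *)buf)[i];
	r->prod += len;		/* the peer would be notified via an event channel */
	return len;
}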
Eamon Walsh
2010-Dec-09 01:14 UTC
Re: [Xen-devel] [PATCH 2/2] xen-gntalloc: Userspace grant allocation driver
On 12/08/2010 11:47 AM, Ian Campbell wrote:
> On Fri, 2010-12-03 at 15:38 +0000, Daniel De Graaf wrote:
>> This allows a userspace application to allocate a shared page for
>> implementing inter-domain communication or device drivers. These
>> shared pages can be mapped using the gntdev device or by the kernel
>> in another domain.
> 
> This seems like useful functionality but is it really necessary for it
> to be a separate driver to the existing gntdev driver? The broad high
> level semantics of ioctl+mmap seem pretty similar. It also has some
> similarities with the sort of device we will need in order to properly
> allocate memory which is safe to use as an argument to a hypercall.
> 
> Do you have an example of a user of the driver?
> 
> Thanks,
> Ian.
> 

We are using gntalloc with the vchan library that Daniel posted for
doing userspace ring buffers.

Another use of it that we are pursuing is graphics. Gntalloc would
allow the X server to share one or more virtual framebuffers directly,
instead of using the xen-fbfront kernel driver. This would allow
multiple X sessions or screens to be forwarded from the same guest. We
are also investigating the possibility of applications rendering
individual windows into their own buffers and then sharing those
buffers with a display server in another domain.

-- 
Eamon Walsh
National Security Agency

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel