Konrad Rzeszutek Wilk
2012-Dec-21 20:18 UTC
Re: oopsable race in xen-gntdev (unsafe vma access)
On Sat, Dec 15, 2012 at 06:12:11PM +0000, Al Viro wrote:
> 1) find_vma() is *not* safe without ->mmap_sem and its result may
> very well be freed just as it's returned to caller. IOW,
> gntdev_ioctl_get_offset_for_vaddr() is racy and may end up with
> dereferencing freed memory.
>
> 2) gntdev_vma_close() is putting NULL into map->vma with only
> ->mmap_sem held by caller. Things like
>         if (!map->vma)
>                 continue;
>         if (map->vma->vm_start >= end)
>                 continue;
>         if (map->vma->vm_end <= start)
> done with just priv->lock held are racy.
>
> I'm not familiar with the code, but it looks like we need to
> protect gntdev_vma_close() guts with the same spinlock and probably
> hold ->mmap_sem shared around the "find_vma()+get to map->{index,count}"
> in the ioctl. Or replace the logics in ioctl with search through the
> list of grant_map under the same spinlock...
>
> Comments?

Hey Al,

Thank you for your analysis.

CC-ing Daniel, David and Stefano. I recall we had some priv->lock movement
in the past and there is also interaction with another piece of code -
the balloon code - so we had better be careful not to blow anything up.

Al, it is around the holidays and folks are mostly gone, so this will take
a bit of time to get sorted out.
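For illustration, a minimal sketch of the locking pattern Al describes:
take mmap_sem shared across find_vma() and copy out the needed fields
before dropping the lock. (lookup_vaddr is a hypothetical helper, not the
gntdev code; the real fix is in patch 1 below.)

    /* Hypothetical helper, for illustration only. find_vma() may only be
     * called under mmap_sem, and the returned vma may be freed as soon as
     * the lock is dropped, so copy out what we need while holding it. */
    static int lookup_vaddr(struct mm_struct *mm, unsigned long vaddr,
                            unsigned long *start, unsigned long *end)
    {
            struct vm_area_struct *vma;
            int rv = -EINVAL;

            down_read(&mm->mmap_sem);
            vma = find_vma(mm, vaddr);
            /* find_vma() only guarantees vm_end > vaddr; check containment */
            if (vma && vaddr >= vma->vm_start) {
                    *start = vma->vm_start;
                    *end = vma->vm_end;
                    rv = 0;
            }
            up_read(&mm->mmap_sem);  /* vma must not be touched after this */
            return rv;
    }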
Daniel De Graaf
2013-Jan-02 22:57 UTC
Re: oopsable race in xen-gntdev (unsafe vma access)
On 12/21/2012 03:18 PM, Konrad Rzeszutek Wilk wrote:
> On Sat, Dec 15, 2012 at 06:12:11PM +0000, Al Viro wrote:
>> 1) find_vma() is *not* safe without ->mmap_sem and its result may
>> very well be freed just as it's returned to caller. IOW,
>> gntdev_ioctl_get_offset_for_vaddr() is racy and may end up with
>> dereferencing freed memory.

I agree, this one should be fixed by taking mmap_sem in
gntdev_ioctl_get_offset_for_vaddr. Iterating the grant_map list here
will not work under HVM, where map->vma is not filled in.

>> 2) gntdev_vma_close() is putting NULL into map->vma with only
>> ->mmap_sem held by caller. Things like
>>         if (!map->vma)
>>                 continue;
>>         if (map->vma->vm_start >= end)
>>                 continue;
>>         if (map->vma->vm_end <= start)
>> done with just priv->lock held are racy.
>>
>> I'm not familiar with the code, but it looks like we need to
>> protect gntdev_vma_close() guts with the same spinlock and probably
>> hold ->mmap_sem shared around the "find_vma()+get to map->{index,count}"
>> in the ioctl. Or replace the logics in ioctl with search through the
>> list of grant_map under the same spinlock...
>>
>> Comments?

Although I don't think the mmu notifier is ever called without mmap_sem
on this particular device file (we map only with VM_DONTCOPY and other
paths like truncate generally aren't triggered), it's probably best not
to rely on that behavior, so adding the spinlock in gntdev_vma_close
seems to be the best solution.

> Hey Al,
>
> Thank you for your analysis.
>
> CC-ing Daniel, David and Stefano. I recall we had some priv->lock movement
> in the past and there is also interaction with another piece of code -
> the balloon code - so we had better be careful not to blow anything up.
>
> Al, it is around the holidays and folks are mostly gone, so this will take
> a bit of time to get sorted out.

While I was digging in this code, I found a related bug in
mn_invl_range_start: if gntdev_ioctl_unmap_grant_ref is called on
a range before unmapping it, the entry is removed from priv->maps and
the later call to mn_invl_range_start won't find it to do the unmapping
(see the sketch after the patch list). This could be fixed by using
find_vma, but I don't think there's a safe way to do that from inside
the mmu notifier, so instead I created a list of these unlinked but
still mapped pages.

The third patch is a fix to an unrelated bug that I found while testing
the fixes in the other two patches.

[PATCH 1/3] xen/gntdev: fix unsafe vma access
[PATCH 2/3] xen/gntdev: correctly unmap unlinked maps in mmu notifier
[PATCH 3/3] xen/gntdev: remove erroneous use of copy_to_user
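As referenced above, a minimal user-space sketch of the sequence that
triggers the mn_invl_range_start bug (a hypothetical reproducer: fd is
assumed to be an open /dev/xen/gntdev, and domid/gref a valid grant from
another domain; the ioctl names and structs are the gntdev ABI from
xen/gntdev.h):

    /* Hypothetical reproducer; error handling omitted. */
    struct ioctl_gntdev_map_grant_ref map = { .count = 1 };
    map.refs[0].domid = domid;        /* assumed: granting domain */
    map.refs[0].ref = gref;           /* assumed: valid grant reference */
    ioctl(fd, IOCTL_GNTDEV_MAP_GRANT_REF, &map);

    void *ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
                     fd, map.index);

    /* Unlink the grant ref while the VMA still exists: this removes
     * the map from priv->maps ... */
    struct ioctl_gntdev_unmap_grant_ref unmap = {
            .index = map.index, .count = 1,
    };
    ioctl(fd, IOCTL_GNTDEV_UNMAP_GRANT_REF, &unmap);

    /* ... so the mmu notifier fired by munmap() no longer finds it in
     * priv->maps, and the grant pages are never unmapped. */
    munmap(ptr, 4096);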
Daniel De Graaf
2013-Jan-02 22:57 UTC
[PATCH 1/3] xen/gntdev: fix unsafe vma access
In gntdev_ioctl_get_offset_for_vaddr, we need to hold mmap_sem while
calling find_vma() to avoid potentially having the result freed out from
under us. Similarly, the MMU notifier functions need to synchronize with
gntdev_vma_close to avoid map->vma being freed during their iteration.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
---
 drivers/xen/gntdev.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 2e22df2..cca62cc 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -378,7 +378,20 @@ static void gntdev_vma_close(struct vm_area_struct *vma)
 	struct grant_map *map = vma->vm_private_data;
 
 	pr_debug("gntdev_vma_close %p\n", vma);
-	map->vma = NULL;
+	if (use_ptemod) {
+		struct file *file = vma->vm_file;
+		struct gntdev_priv *priv = file->private_data;
+		/* It is possible that an mmu notifier could be running
+		 * concurrently, so take priv->lock to ensure that the vma won't
+		 * vanish during the unmap_grant_pages call, since we will
+		 * spin here until that completes. Such a concurrent call will
+		 * not do any unmapping, since that has been done prior to
+		 * closing the vma, but it may still iterate the unmap_ops list.
+		 */
+		spin_lock(&priv->lock);
+		map->vma = NULL;
+		spin_unlock(&priv->lock);
+	}
 	vma->vm_private_data = NULL;
 	gntdev_put_map(map);
 }
@@ -579,25 +592,31 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
 	struct ioctl_gntdev_get_offset_for_vaddr op;
 	struct vm_area_struct *vma;
 	struct grant_map *map;
+	int rv = -EINVAL;
 
 	if (copy_from_user(&op, u, sizeof(op)) != 0)
 		return -EFAULT;
 	pr_debug("priv %p, offset for vaddr %lx\n", priv, (unsigned long)op.vaddr);
 
+	down_read(&current->mm->mmap_sem);
 	vma = find_vma(current->mm, op.vaddr);
 	if (!vma || vma->vm_ops != &gntdev_vmops)
-		return -EINVAL;
+		goto out_unlock;
 
 	map = vma->vm_private_data;
 	if (!map)
-		return -EINVAL;
+		goto out_unlock;
 
 	op.offset = map->index << PAGE_SHIFT;
 	op.count = map->count;
+	rv = 0;
 
-	if (copy_to_user(u, &op, sizeof(op)) != 0)
+ out_unlock:
+	up_read(&current->mm->mmap_sem);
+
+	if (rv == 0 && copy_to_user(u, &op, sizeof(op)) != 0)
 		return -EFAULT;
-	return 0;
+	return rv;
 }
 
 static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u)
-- 
1.7.11.7
Daniel De Graaf
2013-Jan-02 22:57 UTC
[PATCH 2/3] xen/gntdev: correctly unmap unlinked maps in mmu notifier
If gntdev_ioctl_unmap_grant_ref is called on a range before unmapping
it, the entry is removed from priv->maps and the later call to
mn_invl_range_start won't find it to do the unmapping. Fix this by
creating another list of freeable maps that the mmu notifier can search
and use to unmap grants.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 drivers/xen/gntdev.c | 92 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 63 insertions(+), 29 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index cca62cc..9be3e5e 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -56,10 +56,15 @@ MODULE_PARM_DESC(limit, "Maximum number of grants that may be mapped by "
 static atomic_t pages_mapped = ATOMIC_INIT(0);
 
 static int use_ptemod;
+#define populate_freeable_maps use_ptemod
 
 struct gntdev_priv {
+	/* maps with visible offsets in the file descriptor */
 	struct list_head maps;
-	/* lock protects maps from concurrent changes */
+	/* maps that are not visible; will be freed on munmap.
+	 * Only populated if populate_freeable_maps == 1 */
+	struct list_head freeable_maps;
+	/* lock protects maps and freeable_maps */
 	spinlock_t lock;
 	struct mm_struct *mm;
 	struct mmu_notifier mn;
@@ -193,7 +198,7 @@ static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv,
 	return NULL;
 }
 
-static void gntdev_put_map(struct grant_map *map)
+static void gntdev_put_map(struct gntdev_priv *priv, struct grant_map *map)
 {
 	if (!map)
 		return;
@@ -208,6 +213,12 @@ static void gntdev_put_map(struct grant_map *map)
 		evtchn_put(map->notify.event);
 	}
 
+	if (populate_freeable_maps && priv) {
+		spin_lock(&priv->lock);
+		list_del(&map->next);
+		spin_unlock(&priv->lock);
+	}
+
 	if (map->pages && !use_ptemod)
 		unmap_grant_pages(map, 0, map->count);
 	gntdev_free_map(map);
@@ -376,11 +387,11 @@ static void gntdev_vma_open(struct vm_area_struct *vma)
 static void gntdev_vma_close(struct vm_area_struct *vma)
 {
 	struct grant_map *map = vma->vm_private_data;
+	struct file *file = vma->vm_file;
+	struct gntdev_priv *priv = file->private_data;
 
 	pr_debug("gntdev_vma_close %p\n", vma);
 	if (use_ptemod) {
-		struct file *file = vma->vm_file;
-		struct gntdev_priv *priv = file->private_data;
 		/* It is possible that an mmu notifier could be running
 		 * concurrently, so take priv->lock to ensure that the vma won't
 		 * vanish during the unmap_grant_pages call, since we will
@@ -393,7 +404,7 @@ static void gntdev_vma_close(struct vm_area_struct *vma)
 		spin_unlock(&priv->lock);
 	}
 	vma->vm_private_data = NULL;
-	gntdev_put_map(map);
+	gntdev_put_map(priv, map);
 }
 
 static struct vm_operations_struct gntdev_vmops = {
@@ -403,33 +414,43 @@ static struct vm_operations_struct gntdev_vmops = {
 
 /* ------------------------------------------------------------------ */
 
+static void unmap_if_in_range(struct grant_map *map,
+			      unsigned long start, unsigned long end)
+{
+	unsigned long mstart, mend;
+	int err;
+
+	if (!map->vma)
+		return;
+	if (map->vma->vm_start >= end)
+		return;
+	if (map->vma->vm_end <= start)
+		return;
+	mstart = max(start, map->vma->vm_start);
+	mend = min(end, map->vma->vm_end);
+	pr_debug("map %d+%d (%lx %lx), range %lx %lx, mrange %lx %lx\n",
+			map->index, map->count,
+			map->vma->vm_start, map->vma->vm_end,
+			start, end, mstart, mend);
+	err = unmap_grant_pages(map,
+				(mstart - map->vma->vm_start) >> PAGE_SHIFT,
+				(mend - mstart) >> PAGE_SHIFT);
+	WARN_ON(err);
+}
+
 static void mn_invl_range_start(struct mmu_notifier *mn,
 				struct mm_struct *mm,
 				unsigned long start, unsigned long end)
 {
 	struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
 	struct grant_map *map;
-	unsigned long mstart, mend;
-	int err;
 
 	spin_lock(&priv->lock);
 	list_for_each_entry(map, &priv->maps, next) {
-		if (!map->vma)
-			continue;
-		if (map->vma->vm_start >= end)
-			continue;
-		if (map->vma->vm_end <= start)
-			continue;
-		mstart = max(start, map->vma->vm_start);
-		mend = min(end, map->vma->vm_end);
-		pr_debug("map %d+%d (%lx %lx), range %lx %lx, mrange %lx %lx\n",
-				map->index, map->count,
-				map->vma->vm_start, map->vma->vm_end,
-				start, end, mstart, mend);
-		err = unmap_grant_pages(map,
-					(mstart - map->vma->vm_start) >> PAGE_SHIFT,
-					(mend - mstart) >> PAGE_SHIFT);
-		WARN_ON(err);
+		unmap_if_in_range(map, start, end);
+	}
+	list_for_each_entry(map, &priv->freeable_maps, next) {
+		unmap_if_in_range(map, start, end);
 	}
 	spin_unlock(&priv->lock);
 }
@@ -458,6 +479,15 @@ static void mn_release(struct mmu_notifier *mn,
 		err = unmap_grant_pages(map, /* offset */ 0, map->count);
 		WARN_ON(err);
 	}
+	list_for_each_entry(map, &priv->freeable_maps, next) {
+		if (!map->vma)
+			continue;
+		pr_debug("map %d+%d (%lx %lx)\n",
+				map->index, map->count,
+				map->vma->vm_start, map->vma->vm_end);
+		err = unmap_grant_pages(map, /* offset */ 0, map->count);
+		WARN_ON(err);
+	}
 	spin_unlock(&priv->lock);
 }
 
@@ -479,6 +509,7 @@ static int gntdev_open(struct inode *inode, struct file *flip)
 		return -ENOMEM;
 
 	INIT_LIST_HEAD(&priv->maps);
+	INIT_LIST_HEAD(&priv->freeable_maps);
 	spin_lock_init(&priv->lock);
 
 	if (use_ptemod) {
@@ -513,8 +544,9 @@ static int gntdev_release(struct inode *inode, struct file *flip)
 	while (!list_empty(&priv->maps)) {
 		map = list_entry(priv->maps.next, struct grant_map, next);
 		list_del(&map->next);
-		gntdev_put_map(map);
+		gntdev_put_map(NULL /* already removed */, map);
 	}
+	WARN_ON(!list_empty(&priv->freeable_maps));
 
 	if (use_ptemod)
 		mmu_notifier_unregister(&priv->mn, priv->mm);
@@ -542,14 +574,14 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
 
 	if (unlikely(atomic_add_return(op.count, &pages_mapped) > limit)) {
 		pr_debug("can't map: over limit\n");
-		gntdev_put_map(map);
+		gntdev_put_map(NULL, map);
 		return err;
 	}
 
 	if (copy_from_user(map->grants, &u->refs,
 			   sizeof(map->grants[0]) * op.count) != 0) {
-		gntdev_put_map(map);
-		return err;
+		gntdev_put_map(NULL, map);
+		return -EFAULT;
 	}
 
 	spin_lock(&priv->lock);
@@ -578,11 +610,13 @@ static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
 	map = gntdev_find_map_index(priv, op.index >> PAGE_SHIFT, op.count);
 	if (map) {
 		list_del(&map->next);
+		if (populate_freeable_maps)
+			list_add_tail(&map->next, &priv->freeable_maps);
 		err = 0;
 	}
 	spin_unlock(&priv->lock);
 	if (map)
-		gntdev_put_map(map);
+		gntdev_put_map(priv, map);
 	return err;
 }
 
@@ -797,7 +831,7 @@ out_unlock_put:
 out_put_map:
 	if (use_ptemod)
 		map->vma = NULL;
-	gntdev_put_map(map);
+	gntdev_put_map(priv, map);
 	return err;
 }
-- 
1.7.11.7
Daniel De Graaf
2013-Jan-02 22:57 UTC
[PATCH 3/3] xen/gntdev: remove erroneous use of copy_to_user
Since there is now a mapping of granted pages in kernel address space in
both PV and HVM, use it for UNMAP_NOTIFY_CLEAR_BYTE instead of accessing
memory via copy_to_user and triggering sleep-in-atomic warnings.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 drivers/xen/gntdev.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 9be3e5e..3c8803f 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -312,17 +312,10 @@ static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
 
 	if (map->notify.flags & UNMAP_NOTIFY_CLEAR_BYTE) {
 		int pgno = (map->notify.addr >> PAGE_SHIFT);
-		if (pgno >= offset && pgno < offset + pages && use_ptemod) {
-			void __user *tmp = (void __user *)
-				map->vma->vm_start + map->notify.addr;
-			err = copy_to_user(tmp, &err, 1);
-			if (err)
-				return -EFAULT;
-			map->notify.flags &= ~UNMAP_NOTIFY_CLEAR_BYTE;
-		} else if (pgno >= offset && pgno < offset + pages) {
-			uint8_t *tmp = kmap(map->pages[pgno]);
+		if (pgno >= offset && pgno < offset + pages) {
+			/* No need for kmap, pages are in lowmem */
+			uint8_t *tmp = pfn_to_kaddr(page_to_pfn(map->pages[pgno]));
 			tmp[map->notify.addr & (PAGE_SIZE-1)] = 0;
-			kunmap(map->pages[pgno]);
 			map->notify.flags &= ~UNMAP_NOTIFY_CLEAR_BYTE;
 		}
 	}
-- 
1.7.11.7
On Wed, Jan 02, 2013 at 05:57:10PM -0500, Daniel De Graaf wrote:
> On 12/21/2012 03:18 PM, Konrad Rzeszutek Wilk wrote:
> > On Sat, Dec 15, 2012 at 06:12:11PM +0000, Al Viro wrote:
> >> 1) find_vma() is *not* safe without ->mmap_sem and its result may
> >> very well be freed just as it's returned to caller. IOW,
> >> gntdev_ioctl_get_offset_for_vaddr() is racy and may end up with
> >> dereferencing freed memory.
>
> I agree, this one should be fixed by taking mmap_sem in
> gntdev_ioctl_get_offset_for_vaddr. Iterating the grant_map list here
> will not work under HVM, where map->vma is not filled in.
>
> >> 2) gntdev_vma_close() is putting NULL into map->vma with only
> >> ->mmap_sem held by caller. Things like
> >>         if (!map->vma)
> >>                 continue;
> >>         if (map->vma->vm_start >= end)
> >>                 continue;
> >>         if (map->vma->vm_end <= start)
> >> done with just priv->lock held are racy.
> >>
> >> I'm not familiar with the code, but it looks like we need to
> >> protect gntdev_vma_close() guts with the same spinlock and probably
> >> hold ->mmap_sem shared around the "find_vma()+get to map->{index,count}"
> >> in the ioctl. Or replace the logics in ioctl with search through the
> >> list of grant_map under the same spinlock...
> >>
> >> Comments?
>
> Although I don't think the mmu notifier is ever called without mmap_sem
> on this particular device file (we map only with VM_DONTCOPY and other
> paths like truncate generally aren't triggered), it's probably best not
> to rely on that behavior, so adding the spinlock in gntdev_vma_close
> seems to be the best solution.
>
> > Hey Al,
> >
> > Thank you for your analysis.

Are you OK with the patches or have comments about them?

Thanks.

> >
> > CC-ing Daniel, David and Stefano. I recall we had some priv->lock movement
> > in the past and there is also interaction with another piece of code -
> > the balloon code - so we had better be careful not to blow anything up.
> >
> > Al, it is around the holidays and folks are mostly gone, so this will take
> > a bit of time to get sorted out.
>
> While I was digging in this code, I found a related bug in
> mn_invl_range_start: if gntdev_ioctl_unmap_grant_ref is called on
> a range before unmapping it, the entry is removed from priv->maps and
> the later call to mn_invl_range_start won't find it to do the unmapping
> (see the sketch after the patch list). This could be fixed by using
> find_vma, but I don't think there's a safe way to do that from inside
> the mmu notifier, so instead I created a list of these unlinked but
> still mapped pages.
>
> The third patch is a fix to an unrelated bug that I found while testing
> the fixes in the other two patches.
>
> [PATCH 1/3] xen/gntdev: fix unsafe vma access
> [PATCH 2/3] xen/gntdev: correctly unmap unlinked maps in mmu notifier
> [PATCH 3/3] xen/gntdev: remove erroneous use of copy_to_user