Alistair Popple
2021-Mar-02 08:57 UTC
[Nouveau] [PATCH v3 5/8] mm: Device exclusive memory access
On Tuesday, 2 March 2021 11:05:59 AM AEDT Jason Gunthorpe wrote:
> On Fri, Feb 26, 2021 at 06:18:29PM +1100, Alistair Popple wrote:
> 
> > +/**
> > + * make_device_exclusive_range() - Mark a range for exclusive use by a device
> > + * @mm: mm_struct of associated target process
> > + * @start: start of the region to mark for exclusive device access
> > + * @end: end address of region
> > + * @pages: returns the pages which were successfully marked for exclusive access
> > + *
> > + * Returns: number of pages successfully marked for exclusive access
> > + *
> > + * This function finds the ptes mapping page(s) to the given address range and
> > + * replaces them with special swap entries preventing userspace CPU access. On
> > + * fault these entries are replaced with the original mapping after calling MMU
> > + * notifiers.
> > + */
> > +int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
> > +				unsigned long end, struct page **pages)
> > +{
> > +	long npages = (end - start) >> PAGE_SHIFT;
> > +	long i;
> > +
> > +	npages = get_user_pages_remote(mm, start, npages,
> > +				       FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
> > +				       pages, NULL, NULL);
> > +	for (i = 0; i < npages; i++) {
> > +		if (!trylock_page(pages[i])) {
> > +			put_page(pages[i]);
> > +			pages[i] = NULL;
> > +			continue;
> > +		}
> > +
> > +		if (!try_to_protect(pages[i])) {
> 
> Isn't this racy? get_user_pages returns the ptes at an instant in
> time, they could have already been changed to something else?

Right. On its own this does not guarantee that the page is mapped at the
given location, only that a mapping won't get established without an mmu
notifier callback to clear the swap entry.

The intent was that a driver could use HMM or some other mechanism to keep
PTEs synchronised if required. However I just looked at patch 8 in the series
again and it appears I got this wrong when converting from the old migration
approach:

+	mutex_unlock(&svmm->mutex);
+	ret = nouveau_atomic_range_fault(svmm, drm, args,
+					 size, hmm_flags, mm);

The mutex needs to be unlocked after the range fault to ensure the PTE hasn't
changed. But this ends up being a problem because try_to_protect() calls
notifiers which need to take that mutex and hence deadlocks.

> I would think you'd want to switch to the swap entry atomically under
> the PTLs?

That is one approach, but the reuse of get_user_pages() to walk the page
tables and fault/gather the pages is a nice simplification, and adding a new
FOLL flag/mode to atomically swap entries doesn't seem right.

However try_to_protect() scans the PTEs again under the PTL, so checking that
the mapping of interest actually gets replaced during the rmap walk seems
like a reasonable solution. Thanks for the comments.

 - Alistair

> Jason
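For reference, the notifier arrangement described above would look roughly
like the following mmu_interval_notifier sketch. The svmm structure layout
and the svmm_invalidate name are assumptions for illustration, not code from
the series:

	/*
	 * Hypothetical sketch: any PTE change in the range (including the
	 * CPU fault that restores the original mapping over the exclusive
	 * swap entry) invokes this callback, which bumps the notifier
	 * sequence and tears down the device mapping under svmm->mutex.
	 * It also shows why holding that same mutex across
	 * try_to_protect() deadlocks: the rmap walk fires this notifier.
	 */
	static bool svmm_invalidate(struct mmu_interval_notifier *mni,
				    const struct mmu_notifier_range *range,
				    unsigned long cur_seq)
	{
		struct svmm *svmm = container_of(mni, struct svmm, notifier);

		if (!mmu_notifier_range_blockable(range))
			return false;

		mutex_lock(&svmm->mutex);
		mmu_interval_set_seq(mni, cur_seq);
		/* unmap range->start .. range->end on the device here */
		mutex_unlock(&svmm->mutex);
		return true;
	}

	static const struct mmu_interval_notifier_ops svmm_mni_ops = {
		.invalidate = svmm_invalidate,
	};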
Jason Gunthorpe
2021-Mar-02 12:41 UTC
[Nouveau] [PATCH v3 5/8] mm: Device exclusive memory access
On Tue, Mar 02, 2021 at 07:57:58PM +1100, Alistair Popple wrote:
> The intent was that a driver could use HMM or some other mechanism to keep
> PTEs synchronised if required. However I just looked at patch 8 in the
> series again and it appears I got this wrong when converting from the old
> migration approach:
> 
> +	mutex_unlock(&svmm->mutex);
> +	ret = nouveau_atomic_range_fault(svmm, drm, args,
> +					 size, hmm_flags, mm);
> 
> The mutex needs to be unlocked after the range fault to ensure the PTE
> hasn't changed. But this ends up being a problem because try_to_protect()
> calls notifiers which need to take that mutex and hence deadlocks.

You have to check the notifier sequence under the mutex and loop again. The
mutex should only cover programming the HW to use the pages, nothing else.

> However try_to_protect() scans the PTEs again under the PTL, so checking
> that the mapping of interest actually gets replaced during the rmap walk
> seems like a reasonable solution. Thanks for the comments.

It does seem cleaner if you can manage it; the notifier will still be needed
to program the HW though.

Jason
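The retry pattern being suggested, sketched against this series (the svmm
field names are assumptions; make_device_exclusive_range() is the function
from this patch):

	/*
	 * Hypothetical sketch of the usual mmu_interval_notifier retry
	 * loop: sample the sequence before faulting, then validate it
	 * under the driver mutex and loop if anything changed. Only the
	 * HW programming sits inside the lock.
	 */
	unsigned long seq;
	long npages;

again:
	seq = mmu_interval_read_begin(&svmm->notifier);

	npages = make_device_exclusive_range(svmm->mm, start, end, pages);
	if (npages <= 0)
		return -EFAULT;

	mutex_lock(&svmm->mutex);
	if (mmu_interval_read_retry(&svmm->notifier, seq)) {
		mutex_unlock(&svmm->mutex);
		/* drop the page locks/refs taken above before retrying */
		goto again;
	}
	/* ... program the device page tables from pages[] ... */
	mutex_unlock(&svmm->mutex);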