David Hildenbrand
2025-Jan-29 11:57 UTC
[PATCH v1 0/4] mm: cleanups for device-exclusive entries (hmm)
This is a follow-up to [1], performing some related cleanups. There are
more cleanups to be had, but I'll have to focus on some other stuff next.
I might come back to that once I'm stuck on (annoyed by :) ) other things.

Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: "Jérôme Glisse" <jglisse at redhat.com>
Cc: Jonathan Corbet <corbet at lwn.net>
Cc: Alex Shi <alexs at kernel.org>
Cc: Yanteng Si <si.yanteng at linux.dev>
Cc: Karol Herbst <kherbst at redhat.com>
Cc: Lyude Paul <lyude at redhat.com>
Cc: Danilo Krummrich <dakr at kernel.org>
Cc: David Airlie <airlied at gmail.com>
Cc: Simona Vetter <simona at ffwll.ch>
Cc: "Liam R. Howlett" <Liam.Howlett at oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes at oracle.com>
Cc: Vlastimil Babka <vbabka at suse.cz>
Cc: Jann Horn <jannh at google.com>
Cc: Pasha Tatashin <pasha.tatashin at soleen.com>
Cc: Peter Xu <peterx at redhat.com>
Cc: Alistair Popple <apopple at nvidia.com>
Cc: Jason Gunthorpe <jgg at nvidia.com>

[1] https://lkml.kernel.org/r/20250129115411.2077152-1-david at redhat.com

David Hildenbrand (4):
  lib/test_hmm: make dmirror_atomic_map() consume a single page
  mm/mmu_notifier: drop owner from MMU_NOTIFY_EXCLUSIVE
  mm/memory: pass folio and pte to restore_exclusive_pte()
  mm/memory: document restore_exclusive_pte()

 drivers/gpu/drm/nouveau/nouveau_svm.c |  6 +--
 include/linux/mmu_notifier.h          |  4 +-
 include/linux/rmap.h                  |  2 +-
 lib/test_hmm.c                        | 35 ++++++-----------
 mm/memory.c                           | 54 +++++++++++++++++++--------
 mm/rmap.c                             |  3 +-
 6 files changed, 54 insertions(+), 50 deletions(-)

--
2.48.1
David Hildenbrand
2025-Jan-29 11:57 UTC
[PATCH v1 1/4] lib/test_hmm: make dmirror_atomic_map() consume a single page
The caller now always passes a single page; let's simplify, and return
"0" on success.

Signed-off-by: David Hildenbrand <david at redhat.com>
---
 lib/test_hmm.c | 33 ++++++++++-----------------------
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 9e1b07a227a3..1c0a58279db9 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -706,34 +706,23 @@ static int dmirror_check_atomic(struct dmirror *dmirror, unsigned long start,
 	return 0;
 }
 
-static int dmirror_atomic_map(unsigned long start, unsigned long end,
-			      struct page **pages, struct dmirror *dmirror)
+static int dmirror_atomic_map(unsigned long addr, struct page *page,
+			      struct dmirror *dmirror)
 {
-	unsigned long pfn, mapped = 0;
-	int i;
+	void *entry;
 
 	/* Map the migrated pages into the device's page tables. */
 	mutex_lock(&dmirror->mutex);
 
-	for (i = 0, pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); pfn++, i++) {
-		void *entry;
-
-		if (!pages[i])
-			continue;
-
-		entry = pages[i];
-		entry = xa_tag_pointer(entry, DPT_XA_TAG_ATOMIC);
-		entry = xa_store(&dmirror->pt, pfn, entry, GFP_ATOMIC);
-		if (xa_is_err(entry)) {
-			mutex_unlock(&dmirror->mutex);
-			return xa_err(entry);
-		}
-
-		mapped++;
+	entry = xa_tag_pointer(page, DPT_XA_TAG_ATOMIC);
+	entry = xa_store(&dmirror->pt, addr >> PAGE_SHIFT, entry, GFP_ATOMIC);
+	if (xa_is_err(entry)) {
+		mutex_unlock(&dmirror->mutex);
+		return xa_err(entry);
 	}
 
 	mutex_unlock(&dmirror->mutex);
-	return mapped;
+	return 0;
 }
 
 static int dmirror_migrate_finalize_and_map(struct migrate_vma *args,
@@ -803,9 +792,7 @@ static int dmirror_exclusive(struct dmirror *dmirror,
 			break;
 		}
 
-		ret = dmirror_atomic_map(addr, addr + PAGE_SIZE, &page, dmirror);
-		if (!ret)
-			ret = -EBUSY;
+		ret = dmirror_atomic_map(addr, page, dmirror);
 		folio_unlock(folio);
 		folio_put(folio);
 
--
2.48.1
David Hildenbrand
2025-Jan-29 11:58 UTC
[PATCH v1 2/4] mm/mmu_notifier: drop owner from MMU_NOTIFY_EXCLUSIVE
We no longer get a MMU_NOTIFY_EXCLUSIVE on conversion with the owner set
that one has to filter out: if there already *is* a device-exclusive
entry (e.g., from another device; we don't have that information), GUP
will convert it back to an ordinary PTE and notify via
remove_device_exclusive_entry().

Signed-off-by: David Hildenbrand <david at redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_svm.c | 6 +-----
 include/linux/mmu_notifier.h          | 4 +---
 include/linux/rmap.h                  | 2 +-
 lib/test_hmm.c                        | 2 +-
 mm/rmap.c                             | 3 +--
 5 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 39e3740980bb..4758fee182b4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -510,10 +510,6 @@ static bool nouveau_svm_range_invalidate(struct mmu_interval_notifier *mni,
 	struct svm_notifier *sn =
 		container_of(mni, struct svm_notifier, notifier);
 
-	if (range->event == MMU_NOTIFY_EXCLUSIVE &&
-	    range->owner == sn->svmm->vmm->cli->drm->dev)
-		return true;
-
 	/*
 	 * serializes the update to mni->invalidate_seq done by caller and
 	 * prevents invalidation of the PTE from progressing while HW is being
@@ -609,7 +605,7 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm,
 
 		notifier_seq = mmu_interval_read_begin(&notifier->notifier);
 		mmap_read_lock(mm);
-		page = make_device_exclusive(mm, start, drm->dev, &folio);
+		page = make_device_exclusive(mm, start, &folio);
 		mmap_read_unlock(mm);
 		if (IS_ERR(page)) {
 			ret = -EINVAL;
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index d4e714661826..bac2385099dd 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -44,9 +44,7 @@ struct mmu_interval_notifier;
  * owner field matches the driver's device private pgmap owner.
  *
  * @MMU_NOTIFY_EXCLUSIVE: to signal a device driver that the device will no
- * longer have exclusive access to the page. When sent during creation of an
- * exclusive range the owner will be initialised to the value provided by the
- * caller of make_device_exclusive(), otherwise the owner will be NULL.
+ * longer have exclusive access to the page.
  */
 enum mmu_notifier_event {
 	MMU_NOTIFY_UNMAP = 0,
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 86425d42c1a9..3b216b91d2e5 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -664,7 +664,7 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags);
 void try_to_unmap(struct folio *, enum ttu_flags flags);
 
 struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
-		void *owner, struct folio **foliop);
+		struct folio **foliop);
 
 /* Avoid racy checks */
 #define PVMW_SYNC		(1 << 0)
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 1c0a58279db9..8520c1d1b21b 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -786,7 +786,7 @@ static int dmirror_exclusive(struct dmirror *dmirror,
 		struct folio *folio;
 		struct page *page;
 
-		page = make_device_exclusive(mm, addr, NULL, &folio);
+		page = make_device_exclusive(mm, addr, &folio);
 		if (IS_ERR(page)) {
 			ret = PTR_ERR(page);
 			break;
diff --git a/mm/rmap.c b/mm/rmap.c
index 4acc9f6d743a..d99dbf59adc6 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2397,7 +2397,6 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
  * make_device_exclusive() - Mark an address for exclusive use by a device
  * @mm: mm_struct of associated target process
  * @addr: the virtual address to mark for exclusive device access
- * @owner: passed to MMU_NOTIFY_EXCLUSIVE range notifier to allow filtering
  * @foliop: folio pointer will be stored here on success.
  *
  * This function looks up the page mapped at the given address, grabs a
@@ -2421,7 +2420,7 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
  * Returns: pointer to mapped page on success, otherwise a negative error.
  */
 struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
-		void *owner, struct folio **foliop)
+		struct folio **foliop)
 {
 	struct folio *folio, *fw_folio;
 	struct vm_area_struct *vma;
--
2.48.1
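For reference, the caller-side pattern after dropping the owner argument looks
roughly like the following. This is a minimal sketch modeled on
nouveau_atomic_range_fault() and dmirror_exclusive() above, not part of the
series; dev_map_page() and dev_lock stand in for driver-specific device-mapping
code, and the driver's mmu_interval_notifier ("notifier") is assumed to already
be registered over the relevant range:

static int dev_make_addr_exclusive(struct mmu_interval_notifier *notifier,
				   struct mm_struct *mm, unsigned long addr)
{
	struct folio *folio;
	struct page *page;
	unsigned long seq;
	int ret;

	do {
		seq = mmu_interval_read_begin(notifier);

		mmap_read_lock(mm);
		page = make_device_exclusive(mm, addr, &folio);
		mmap_read_unlock(mm);
		if (IS_ERR(page))
			return PTR_ERR(page);

		/*
		 * Program the device mapping while the folio is still locked;
		 * serialize against the invalidate callback and retry if the
		 * range was invalidated in the meantime.
		 */
		mutex_lock(&dev_lock);
		if (mmu_interval_read_retry(notifier, seq))
			ret = -EAGAIN;
		else
			ret = dev_map_page(page);	/* hypothetical driver hook */
		mutex_unlock(&dev_lock);

		folio_unlock(folio);
		folio_put(folio);
	} while (ret == -EAGAIN);

	return ret;
}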
David Hildenbrand
2025-Jan-29 11:58 UTC
[PATCH v1 3/4] mm/memory: pass folio and pte to restore_exclusive_pte()
Let's pass the folio and the pte to restore_exclusive_pte(), so we can
avoid repeated page_folio() and ptep_get().

To do that, pass the pte to try_restore_exclusive_pte() and use a folio
in there already.

While at it, just avoid the "swp_entry_t entry" variable in
try_restore_exclusive_pte() and add a folio-locked check to
restore_exclusive_pte().

Signed-off-by: David Hildenbrand <david at redhat.com>
---
 mm/memory.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index cd689cd8a7c8..46956994aaff 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -719,14 +719,13 @@ struct folio *vm_normal_folio_pmd(struct vm_area_struct *vma,
 #endif
 
 static void restore_exclusive_pte(struct vm_area_struct *vma,
-		struct page *page, unsigned long address,
-		pte_t *ptep)
+		struct folio *folio, struct page *page, unsigned long address,
+		pte_t *ptep, pte_t orig_pte)
 {
-	struct folio *folio = page_folio(page);
-	pte_t orig_pte;
 	pte_t pte;
 
-	orig_pte = ptep_get(ptep);
+	VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
+
 	pte = pte_mkold(mk_pte(page, READ_ONCE(vma->vm_page_prot)));
 	if (pte_swp_soft_dirty(orig_pte))
 		pte = pte_mksoft_dirty(pte);
@@ -756,16 +755,15 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
  * Tries to restore an exclusive pte if the page lock can be acquired without
  * sleeping.
  */
-static int
-try_restore_exclusive_pte(pte_t *src_pte, struct vm_area_struct *vma,
-			  unsigned long addr)
+static int try_restore_exclusive_pte(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, pte_t orig_pte)
 {
-	swp_entry_t entry = pte_to_swp_entry(ptep_get(src_pte));
-	struct page *page = pfn_swap_entry_to_page(entry);
+	struct page *page = pfn_swap_entry_to_page(pte_to_swp_entry(orig_pte));
+	struct folio *folio = page_folio(page);
 
-	if (trylock_page(page)) {
-		restore_exclusive_pte(vma, page, addr, src_pte);
-		unlock_page(page);
+	if (folio_trylock(folio)) {
+		restore_exclusive_pte(vma, folio, page, addr, ptep, orig_pte);
+		folio_unlock(folio);
 		return 0;
 	}
 
@@ -871,7 +869,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		 * (ie. COW) mappings.
 		 */
 		VM_BUG_ON(!is_cow_mapping(src_vma->vm_flags));
-		if (try_restore_exclusive_pte(src_pte, src_vma, addr))
+		if (try_restore_exclusive_pte(src_vma, addr, src_pte, orig_pte))
 			return -EBUSY;
 		return -ENOENT;
 	} else if (is_pte_marker_entry(entry)) {
@@ -3979,7 +3977,8 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 				       vmf->address, &vmf->ptl);
 	if (likely(vmf->pte && pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
-		restore_exclusive_pte(vma, vmf->page, vmf->address, vmf->pte);
+		restore_exclusive_pte(vma, folio, vmf->page, vmf->address,
+				      vmf->pte, vmf->orig_pte);
 
 	if (vmf->pte)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
--
2.48.1
David Hildenbrand
2025-Jan-29 11:58 UTC
[PATCH v1 4/4] mm/memory: document restore_exclusive_pte()
Let's document how this function is to be used, and why the requirement
for the folio lock might be dropped in the future.

Signed-off-by: David Hildenbrand <david at redhat.com>
---
 mm/memory.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 46956994aaff..caaae8df11a9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -718,6 +718,31 @@ struct folio *vm_normal_folio_pmd(struct vm_area_struct *vma,
 }
 #endif
 
+/**
+ * restore_exclusive_pte - Restore a device-exclusive entry
+ * @vma: VMA covering @address
+ * @folio: the mapped folio
+ * @page: the mapped folio page
+ * @address: the virtual address
+ * @ptep: PTE pointer into the locked page table mapping the folio page
+ * @orig_pte: PTE value at @ptep
+ *
+ * Restore a device-exclusive non-swap entry to an ordinary present PTE.
+ *
+ * The folio and the page table must be locked, and MMU notifiers must have
+ * been called to invalidate any (exclusive) device mappings. In case of
+ * fork(), MMU_NOTIFY_PROTECTION_PAGE is triggered, and in case of a page
+ * fault MMU_NOTIFY_EXCLUSIVE is triggered.
+ *
+ * Locking the folio makes sure that anybody who just converted the PTE to
+ * a device-exclusive entry can map it into the device before unlocking it; so
+ * the folio lock prevents concurrent conversion to device-exclusive.
+ *
+ * TODO: the folio lock does not protect against all cases of concurrent
+ * page table modifications (e.g., MADV_DONTNEED, mprotect), so device drivers
+ * must already use MMU notifiers to sync against any concurrent changes.
+ * Maybe the requirement for the folio lock can be dropped in the future.
+ */
 static void restore_exclusive_pte(struct vm_area_struct *vma,
 		struct folio *folio, struct page *page, unsigned long address,
 		pte_t *ptep, pte_t orig_pte)
--
2.48.1
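To illustrate the locking rules the new kernel-doc spells out, the fault-side
caller (remove_device_exclusive_entry(), partially visible in the previous
patch) roughly follows the sequence below. This is a simplified sketch, not
the actual implementation: the folio refcount handling and error paths are
trimmed, and handle_device_exclusive_fault() is a made-up name.

static vm_fault_t handle_device_exclusive_fault(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;
	struct page *page = vmf->page;
	struct folio *folio = page_folio(page);
	struct mmu_notifier_range range;

	/* The folio lock blocks concurrent conversion to device-exclusive. */
	if (!folio_trylock(folio))
		return VM_FAULT_RETRY;	/* simplified; see folio_lock_or_retry() */

	/* Notify device drivers that they are about to lose exclusive access. */
	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
			vma->vm_mm, vmf->address & PAGE_MASK,
			(vmf->address & PAGE_MASK) + PAGE_SIZE, NULL);
	mmu_notifier_invalidate_range_start(&range);

	/* Restore an ordinary, present PTE under the page table lock. */
	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
				       &vmf->ptl);
	if (likely(vmf->pte && pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
		restore_exclusive_pte(vma, folio, page, vmf->address,
				      vmf->pte, vmf->orig_pte);
	if (vmf->pte)
		pte_unmap_unlock(vmf->pte, vmf->ptl);

	folio_unlock(folio);
	mmu_notifier_invalidate_range_end(&range);
	return 0;
}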