David Hildenbrand
2025-Jan-24 18:15 UTC
[PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault()
One fix and a minor cleanup. Only compile-tested due to lack of HW, so I'd be happy if someone with access to HW could test. But not sure how easy this is to trigger. Likely some concurrent MADV_DONTNEED on the PTE we just converted might be able to trigger it. Cc: Karol Herbst <kherbst at redhat.com> Cc: Lyude Paul <lyude at redhat.com> Cc: Danilo Krummrich <dakr at kernel.org> Cc: David Airlie <airlied at gmail.com> Cc: Simona Vetter <simona at ffwll.ch> Cc: Alistair Popple <apopple at nvidia.com> David Hildenbrand (2): nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range() nouveau/svm: don't initialize ret in nouveau_atomic_range_fault() drivers/gpu/drm/nouveau/nouveau_svm.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) -- 2.47.1
David Hildenbrand
2025-Jan-24 18:15 UTC
[PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range()
In case we have to retry the loop, we are missing to unlock+put the folio. In that case, we will keep failing make_device_exclusive_range() because we cannot grab the folio lock, and even return from the function with the folio locked and referenced, effectively never succeeding the make_device_exclusive_range(). While at it, convert the other unlock+put to use a folio as well. This was found by code inspection. Fixes: 8f187163eb89 ("nouveau/svm: implement atomic SVM access") Signed-off-by: David Hildenbrand <david at redhat.com> --- drivers/gpu/drm/nouveau/nouveau_svm.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c index b4da82ddbb6b2..8ea98f06d39af 100644 --- a/drivers/gpu/drm/nouveau/nouveau_svm.c +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c @@ -590,6 +590,7 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, unsigned long timeout jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT); struct mm_struct *mm = svmm->notifier.mm; + struct folio *folio; struct page *page; unsigned long start = args->p.addr; unsigned long notifier_seq; @@ -616,12 +617,16 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, ret = -EINVAL; goto out; } + folio = page_folio(page); mutex_lock(&svmm->mutex); if (!mmu_interval_read_retry(¬ifier->notifier, notifier_seq)) break; mutex_unlock(&svmm->mutex); + + folio_unlock(folio); + folio_put(folio); } /* Map the page on the GPU. */ @@ -637,8 +642,8 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, ret = nvif_object_ioctl(&svmm->vmm->vmm.object, args, size, NULL); mutex_unlock(&svmm->mutex); - unlock_page(page); - put_page(page); + folio_unlock(folio); + folio_put(folio); out: mmu_interval_notifier_remove(¬ifier->notifier); -- 2.47.1
David Hildenbrand
2025-Jan-24 18:15 UTC
[PATCH v1 2/2] nouveau/svm: don't initialize ret in nouveau_atomic_range_fault()
ret will be modified immediately afterwards. Signed-off-by: David Hildenbrand <david at redhat.com> --- drivers/gpu/drm/nouveau/nouveau_svm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c index 8ea98f06d39af..2f8b8b978fc08 100644 --- a/drivers/gpu/drm/nouveau/nouveau_svm.c +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c @@ -594,7 +594,7 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, struct page *page; unsigned long start = args->p.addr; unsigned long notifier_seq; - int ret = 0; + int ret; ret = mmu_interval_notifier_insert(¬ifier->notifier, mm, args->p.addr, args->p.size, -- 2.47.1
Alistair Popple
2025-Jan-28 00:13 UTC
[PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault()
On Fri, Jan 24, 2025 at 07:15:22PM +0100, David Hildenbrand wrote:> One fix and a minor cleanup. > > Only compile-tested due to lack of HW, so I'd be happy if someone with > access to HW could test. But not sure how easy this is to trigger. Likely > some concurrent MADV_DONTNEED on the PTE we just converted might be able > to trigger it.I have this setup so will certainly test it and see if I can trigger the bug as well. Thanks for the fixes.> Cc: Karol Herbst <kherbst at redhat.com> > Cc: Lyude Paul <lyude at redhat.com> > Cc: Danilo Krummrich <dakr at kernel.org> > Cc: David Airlie <airlied at gmail.com> > Cc: Simona Vetter <simona at ffwll.ch> > Cc: Alistair Popple <apopple at nvidia.com> > > David Hildenbrand (2): > nouveau/svm: fix missing folio unlock + put after > make_device_exclusive_range() > nouveau/svm: don't initialize ret in nouveau_atomic_range_fault() > > drivers/gpu/drm/nouveau/nouveau_svm.c | 11 ++++++++--- > 1 file changed, 8 insertions(+), 3 deletions(-) > > -- > 2.47.1 >
Reasonably Related Threads
- [PATCH hmm 0/5] Adjust hmm_range_fault() API
- [PATCH v3 0/4] nouveau/hmm: map pages after migration
- [PATCH hmm v2 0/5] Adjust hmm_range_fault() API
- [PATCH v6 0/6] mm/hmm/test: add self tests for HMM
- [PATCH hmm v3 00/14] Consolidate the mmu notifier interval_tree and locking