David Hildenbrand
2025-Jan-30 09:28 UTC
[PATCH v1 2/4] mm/mmu_notifier: drop owner from MMU_NOTIFY_EXCLUSIVE
On 30.01.25 06:34, Alistair Popple wrote:> On Wed, Jan 29, 2025 at 12:58:00PM +0100, David Hildenbrand wrote: >> We no longer get a MMU_NOTIFY_EXCLUSIVE on conversion with the owner set >> that one has to filter out: if there already *is* a device-exclusive >> entry (e.g., other device, we don't have that information), GUP will >> convert it back to an ordinary PTE and notify via >> remove_device_exclusive_entry(). > > What tree is this against? I tried applying to v6.13 and Linus current master > but neither applied cleanly.See the cover letter. This is on top of the fixes series, which is based on mm-unstable from yesterday.> >> Signed-off-by: David Hildenbrand <david at redhat.com> >> --- >> drivers/gpu/drm/nouveau/nouveau_svm.c | 6 +----- >> include/linux/mmu_notifier.h | 4 +--- >> include/linux/rmap.h | 2 +- >> lib/test_hmm.c | 2 +- >> mm/rmap.c | 3 +-- >> 5 files changed, 5 insertions(+), 12 deletions(-) >> >> diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c >> index 39e3740980bb..4758fee182b4 100644 >> --- a/drivers/gpu/drm/nouveau/nouveau_svm.c >> +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c >> @@ -510,10 +510,6 @@ static bool nouveau_svm_range_invalidate(struct mmu_interval_notifier *mni, >> struct svm_notifier *sn >> container_of(mni, struct svm_notifier, notifier); >> >> - if (range->event == MMU_NOTIFY_EXCLUSIVE && >> - range->owner == sn->svmm->vmm->cli->drm->dev) >> - return true; > > I think this will cause a live-lock because make_device_exclusive_range() > will call the notifier which without the filtering will increment the sequence > count and cause endless retries of the loop in nouveau_atomic_range_fault(). > The notifier needs to be able to figure out if it was called in response to > something this thread did (ie. make_device_exclusive_range) and can therefore > ignore the invalidation, or from some other thread.Yes, as discussed in the other patch, this must stay to inform secondary MMUs about the conversion *to* device exclusive.> > Looking at hmm_test I see that doesn't use the sequence counter to ensure > the PTE remains valid whilst it is mapped. I think that is probably wrong, so > apologies if that lead you astray.Yes, the hmm_test does not completely follow the same model the nouveau implementation does; so it might not be completely correct. -- Cheers, David / dhildenb
Simona Vetter
2025-Jan-30 13:29 UTC
[PATCH v1 2/4] mm/mmu_notifier: drop owner from MMU_NOTIFY_EXCLUSIVE
On Thu, Jan 30, 2025 at 10:28:00AM +0100, David Hildenbrand wrote:> On 30.01.25 06:34, Alistair Popple wrote: > > Looking at hmm_test I see that doesn't use the sequence counter to ensure > > the PTE remains valid whilst it is mapped. I think that is probably wrong, so > > apologies if that lead you astray. > > Yes, the hmm_test does not completely follow the same model the nouveau > implementation does; so it might not be completely correct.But unrelated but just crossed my mind: I guess another crucial difference is that the hw (probably, not sure) will restart the fault if we don't repair it to its liking. So the hmm-test does need some kind of retry loop too somewhere to match that. But might be good to also still land some of the other improvements discussed in these threads to make make_device_exclusive a bit more reliable instead of relying on busy-looping throug the hw fault handler for everything. -Sima -- Simona Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
David Hildenbrand
2025-Jan-30 15:26 UTC
[PATCH v1 2/4] mm/mmu_notifier: drop owner from MMU_NOTIFY_EXCLUSIVE
On 30.01.25 14:29, Simona Vetter wrote:> On Thu, Jan 30, 2025 at 10:28:00AM +0100, David Hildenbrand wrote: >> On 30.01.25 06:34, Alistair Popple wrote: >>> Looking at hmm_test I see that doesn't use the sequence counter to ensure >>> the PTE remains valid whilst it is mapped. I think that is probably wrong, so >>> apologies if that lead you astray. >> >> Yes, the hmm_test does not completely follow the same model the nouveau >> implementation does; so it might not be completely correct. > > But unrelated but just crossed my mind: > > I guess another crucial difference is that the hw (probably, not sure) > will restart the fault if we don't repair it to its liking. So the > hmm-test does need some kind of retry loop too somewhere to match that.Yes. Especially for the folio lock spinning is a rather suboptimal approach. So we likely would want the option to just lock it instead of try-locking it. (or getting rid of it entirely :) )> But might be good to also still land some of the other improvements > discussed in these threads to make make_device_exclusive a bit more > reliable instead of relying on busy-looping throug the hw fault handler > for everything.Right. -- Cheers, David / dhildenb