Ralph Campbell
2019-Jul-23 21:05 UTC
[Nouveau] [PATCH] mm/hmm: replace hmm_update with mmu_notifier_range
The hmm_mirror_ops callback function sync_cpu_device_pagetables() passes a struct hmm_update which is a simplified version of struct mmu_notifier_range. This is unnecessary so replace hmm_update with mmu_notifier_range directly. Signed-off-by: Ralph Campbell <rcampbell at nvidia.com> Cc: "Jérôme Glisse" <jglisse at redhat.com> Cc: Jason Gunthorpe <jgg at mellanox.com> Cc: Christoph Hellwig <hch at lst.de> Cc: Ben Skeggs <bskeggs at redhat.com> --- This is based on 5.3.0-rc1 plus Christoph Hellwig's 6 patches ("hmm_range_fault related fixes and legacy API removal v2"). Jason, I believe this is the patch you were requesting. drivers/gpu/drm/nouveau/nouveau_svm.c | 4 ++-- include/linux/hmm.h | 31 ++++----------------------- mm/hmm.c | 13 ++++------- 3 files changed, 10 insertions(+), 38 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c index a9c5c58d425b..6298d2dadb55 100644 --- a/drivers/gpu/drm/nouveau/nouveau_svm.c +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c @@ -252,13 +252,13 @@ nouveau_svmm_invalidate(struct nouveau_svmm *svmm, u64 start, u64 limit) static int nouveau_svmm_sync_cpu_device_pagetables(struct hmm_mirror *mirror, - const struct hmm_update *update) + const struct mmu_notifier_range *update) { struct nouveau_svmm *svmm = container_of(mirror, typeof(*svmm), mirror); unsigned long start = update->start; unsigned long limit = update->end; - if (!update->blockable) + if (!mmu_notifier_range_blockable(update)) return -EAGAIN; SVMM_DBG(svmm, "invalidate %016lx-%016lx", start, limit); diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 9f32586684c9..659e25a15700 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -340,29 +340,6 @@ static inline uint64_t hmm_device_entry_from_pfn(const struct hmm_range *range, struct hmm_mirror; -/* - * enum hmm_update_event - type of update - * @HMM_UPDATE_INVALIDATE: invalidate range (no indication as to why) - */ -enum hmm_update_event { - HMM_UPDATE_INVALIDATE, -}; - -/* - * struct hmm_update - HMM update information for callback - * - * @start: virtual start address of the range to update - * @end: virtual end address of the range to update - * @event: event triggering the update (what is happening) - * @blockable: can the callback block/sleep ? - */ -struct hmm_update { - unsigned long start; - unsigned long end; - enum hmm_update_event event; - bool blockable; -}; - /* * struct hmm_mirror_ops - HMM mirror device operations callback * @@ -383,9 +360,9 @@ struct hmm_mirror_ops { /* sync_cpu_device_pagetables() - synchronize page tables * * @mirror: pointer to struct hmm_mirror - * @update: update information (see struct hmm_update) - * Return: -EAGAIN if update.blockable false and callback need to - * block, 0 otherwise. + * @update: update information (see struct mmu_notifier_range) + * Return: -EAGAIN if mmu_notifier_range_blockable(update) is false + * and callback needs to block, 0 otherwise. * * This callback ultimately originates from mmu_notifiers when the CPU * page table is updated. The device driver must update its page table @@ -397,7 +374,7 @@ struct hmm_mirror_ops { * synchronous call. */ int (*sync_cpu_device_pagetables)(struct hmm_mirror *mirror, - const struct hmm_update *update); + const struct mmu_notifier_range *update); }; /* diff --git a/mm/hmm.c b/mm/hmm.c index 16b6731a34db..b810a4fa3de9 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -165,7 +165,6 @@ static int hmm_invalidate_range_start(struct mmu_notifier *mn, { struct hmm *hmm = container_of(mn, struct hmm, mmu_notifier); struct hmm_mirror *mirror; - struct hmm_update update; struct hmm_range *range; unsigned long flags; int ret = 0; @@ -173,15 +172,10 @@ static int hmm_invalidate_range_start(struct mmu_notifier *mn, if (!kref_get_unless_zero(&hmm->kref)) return 0; - update.start = nrange->start; - update.end = nrange->end; - update.event = HMM_UPDATE_INVALIDATE; - update.blockable = mmu_notifier_range_blockable(nrange); - spin_lock_irqsave(&hmm->ranges_lock, flags); hmm->notifiers++; list_for_each_entry(range, &hmm->ranges, list) { - if (update.end < range->start || update.start >= range->end) + if (nrange->end < range->start || nrange->start >= range->end) continue; range->valid = false; @@ -198,9 +192,10 @@ static int hmm_invalidate_range_start(struct mmu_notifier *mn, list_for_each_entry(mirror, &hmm->mirrors, list) { int rc; - rc = mirror->ops->sync_cpu_device_pagetables(mirror, &update); + rc = mirror->ops->sync_cpu_device_pagetables(mirror, nrange); if (rc) { - if (WARN_ON(update.blockable || rc != -EAGAIN)) + if (WARN_ON(mmu_notifier_range_blockable(nrange) || + rc != -EAGAIN)) continue; ret = -EAGAIN; break; -- 2.20.1
Christoph Hellwig
2019-Jul-24 07:05 UTC
[Nouveau] [PATCH] mm/hmm: replace hmm_update with mmu_notifier_range
Looks good: Reviewed-by: Christoph Hellwig <hch at lst.de> One comment on a related cleanup:> list_for_each_entry(mirror, &hmm->mirrors, list) { > int rc; > > - rc = mirror->ops->sync_cpu_device_pagetables(mirror, &update); > + rc = mirror->ops->sync_cpu_device_pagetables(mirror, nrange); > if (rc) { > - if (WARN_ON(update.blockable || rc != -EAGAIN)) > + if (WARN_ON(mmu_notifier_range_blockable(nrange) || > + rc != -EAGAIN)) > continue; > ret = -EAGAIN; > break;This magic handling of error seems odd. I think we should merge rc and ret into one variable and just break out if any error happens instead or claiming in the comments -EAGAIN is the only valid error and then ignoring all others here.
Jason Gunthorpe
2019-Jul-24 15:28 UTC
[Nouveau] [PATCH] mm/hmm: replace hmm_update with mmu_notifier_range
On Wed, Jul 24, 2019 at 09:05:53AM +0200, Christoph Hellwig wrote:> Looks good: > > Reviewed-by: Christoph Hellwig <hch at lst.de> > > One comment on a related cleanup: > > > list_for_each_entry(mirror, &hmm->mirrors, list) { > > int rc; > > > > - rc = mirror->ops->sync_cpu_device_pagetables(mirror, &update); > > + rc = mirror->ops->sync_cpu_device_pagetables(mirror, nrange); > > if (rc) { > > - if (WARN_ON(update.blockable || rc != -EAGAIN)) > > + if (WARN_ON(mmu_notifier_range_blockable(nrange) || > > + rc != -EAGAIN)) > > continue; > > ret = -EAGAIN; > > break; > > This magic handling of error seems odd. I think we should merge rc and > ret into one variable and just break out if any error happens instead > or claiming in the comments -EAGAIN is the only valid error and then > ignoring all others here.The WARN_ON is enforcing the rules already commented near mmuu_notifier_ops.invalidate_start - we could break or continue, it doesn't much matter how to recover from a broken driver, but since we did the WARN_ON this should sanitize the ret to EAGAIN or 0 Humm. Actually having looked this some more, I wonder if this is a problem: I see in __oom_reap_task_mm(): if (mmu_notifier_invalidate_range_start_nonblock(&range)) { tlb_finish_mmu(&tlb, range.start, range.end); ret = false; continue; } unmap_page_range(&tlb, vma, range.start, range.end, NULL); mmu_notifier_invalidate_range_end(&range); Which looks like it creates an unbalanced start/end pairing if any start returns EAGAIN? This does not seem OK.. Many users require start/end to be paired to keep track of their internal locking. Ie for instance hmm breaks because the hmm->notifiers counter becomes unable to get to 0. Below is the best idea I've had so far.. Michal, what do you think?>From 53638cd1cb02e65e670c5d4edfd36d067bb48912 Mon Sep 17 00:00:00 2001From: Jason Gunthorpe <jgg at mellanox.com> Date: Wed, 24 Jul 2019 12:15:40 -0300 Subject: [PATCH] mm/mmu_notifiers: ensure invalidate_start and invalidate_end occur in pairs Many callers of mmu_notifiers invalidate_range callbacks maintain locking/counters/etc on a paired basis and have long expected that invalidate_range start/end are always paired. The recent change to add non-blocking notifiers breaks this assumption as an EAGAIN return from any notifier causes all notifiers to get their invalidate_range_end() skipped. If there is only a single mmu notifier in the list, this may work OK as the single subscriber may assume that the end is not called when EAGAIN is returned, however if there are multiple subcribers then there is no way for a notifier that succeeds to recover if another in the list triggers EAGAIN and causes the expected end to be skipped. Due to the RCU locking we can't reliably generate a subset of the linked list representing the notifiers already called, so the best option is to call all notifiers in the start path (even if EAGAIN is detected), and again in the error path to ensure there is proper pairing. Users that care about start/end pairing must be (re)written so that an EAGAIN return from their start method expects the end method to be called. Since incorect return codes will now cause a functional problem, add a WARN_ON to detect buggy users. RFC: Need to audit/fix callers to ensure they order their EAGAIN returns properly. hmm is OK, ODP is not. Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers") Signed-off-by: Jason Gunthorpe <jgg at mellanox.com> --- mm/mmu_notifier.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index b5670620aea0fc..7d8eca35f1627a 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -176,6 +176,7 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range) if (mn->ops->invalidate_range_start) { int _ret = mn->ops->invalidate_range_start(mn, range); if (_ret) { + WARN_ON(mmu_notifier_range_blockable(range) || rc != -EAGAIN); pr_info("%pS callback failed with %d in %sblockable context.\n", mn->ops->invalidate_range_start, _ret, !mmu_notifier_range_blockable(range) ? "non-" : ""); @@ -183,6 +184,19 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range) } } } + + if (unlikely(ret)) { + /* + * If we hit an -EAGAIN then we have to create a paired + * range_end for the above range_start. Callers must be + * arranged so that they can handle the range_end even if the + * range_start returns EAGAIN. + */ + hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) + if (mn->ops->invalidate_range_end) + mn->ops->invalidate_range_end(mn, range); + } + srcu_read_unlock(&srcu, id); return ret; -- 2.22.0
Jason Gunthorpe
2019-Jul-25 01:14 UTC
[Nouveau] [PATCH] mm/hmm: replace hmm_update with mmu_notifier_range
On Tue, Jul 23, 2019 at 02:05:06PM -0700, Ralph Campbell wrote:> The hmm_mirror_ops callback function sync_cpu_device_pagetables() passes > a struct hmm_update which is a simplified version of struct > mmu_notifier_range. This is unnecessary so replace hmm_update with > mmu_notifier_range directly. > > Signed-off-by: Ralph Campbell <rcampbell at nvidia.com> > Cc: "Jérôme Glisse" <jglisse at redhat.com> > Cc: Jason Gunthorpe <jgg at mellanox.com> > Cc: Christoph Hellwig <hch at lst.de> > Cc: Ben Skeggs <bskeggs at redhat.com> > > This is based on 5.3.0-rc1 plus Christoph Hellwig's 6 patches > ("hmm_range_fault related fixes and legacy API removal v2"). > Jason, I believe this is the patch you were requesting.Doesn't this need revision to include amgpu? drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c: .sync_cpu_device_pagetables = amdgpu_mn_sync_pagetables_gfx, drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c: .sync_cpu_device_pagetables = amdgpu_mn_sync_pagetables_hsa, Thanks, Jason
Ralph Campbell
2019-Jul-25 17:53 UTC
[Nouveau] [PATCH] mm/hmm: replace hmm_update with mmu_notifier_range
On 7/24/19 6:14 PM, Jason Gunthorpe wrote:> On Tue, Jul 23, 2019 at 02:05:06PM -0700, Ralph Campbell wrote: >> The hmm_mirror_ops callback function sync_cpu_device_pagetables() passes >> a struct hmm_update which is a simplified version of struct >> mmu_notifier_range. This is unnecessary so replace hmm_update with >> mmu_notifier_range directly. >> >> Signed-off-by: Ralph Campbell <rcampbell at nvidia.com> >> Cc: "Jérôme Glisse" <jglisse at redhat.com> >> Cc: Jason Gunthorpe <jgg at mellanox.com> >> Cc: Christoph Hellwig <hch at lst.de> >> Cc: Ben Skeggs <bskeggs at redhat.com> >> >> This is based on 5.3.0-rc1 plus Christoph Hellwig's 6 patches >> ("hmm_range_fault related fixes and legacy API removal v2"). >> Jason, I believe this is the patch you were requesting. > > Doesn't this need revision to include amgpu? > > drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c: .sync_cpu_device_pagetables = amdgpu_mn_sync_pagetables_gfx, > drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c: .sync_cpu_device_pagetables = amdgpu_mn_sync_pagetables_hsa, > > Thanks, > Jason >Yes. I have added this to v2 which I'll send out with Christoph's 2 patches and the hmm_range.vma removal patch you suggested.
Reasonably Related Threads
- [PATCH] mm/hmm: replace hmm_update with mmu_notifier_range
- [PATCH] mm/hmm: replace hmm_update with mmu_notifier_range
- [PATCH] mm/hmm: replace hmm_update with mmu_notifier_range
- [PATCH] mm/hmm: replace hmm_update with mmu_notifier_range
- [PATCH] mm/hmm: replace hmm_update with mmu_notifier_range