Michal Hocko
2019-Jul-24 18:59 UTC
[Nouveau] [PATCH] mm/hmm: replace hmm_update with mmu_notifier_range
On Wed 24-07-19 20:56:17, Michal Hocko wrote:
> On Wed 24-07-19 15:08:37, Jason Gunthorpe wrote:
> > On Wed, Jul 24, 2019 at 07:58:58PM +0200, Michal Hocko wrote:
> [...]
> > > Maybe new users have started relying on a new semantic in the meantime;
> > > back then, none of the notifiers had even started any action in blocking
> > > mode on an EAGAIN bailout. Most of them simply did a trylock early in the
> > > process and bailed out, so there was nothing to do for the range_end
> > > callback.
> >
> > Single notifiers are not the problem. I tried to make this clear in
> > the commit message, but let's be more explicit.
> >
> > We have *two* notifiers registered to the mm, A and B:
> >
> > A invalidate_range_start: (has no blocking)
> >    spin_lock()
> >    counter++
> >    spin_unlock()
> >
> > A invalidate_range_end:
> >    spin_lock()
> >    counter--
> >    spin_unlock()
> >
> > And this one:
> >
> > B invalidate_range_start: (has blocking)
> >    if (!try_mutex_lock())
> >        return -EAGAIN;
> >    counter++
> >    mutex_unlock()
> >
> > B invalidate_range_end:
> >    spin_lock()
> >    counter--
> >    spin_unlock()
> >
> > So now the oom path does:
> >
> > invalidate_range_start_non_blocking:
> >    for each mn:
> >      a->invalidate_range_start
> >      b->invalidate_range_start
> >        rc = EAGAIN
> >
> > Now we SKIP A's invalidate_range_end even though A, which had no idea
> > this would happen, has state that needs to be unwound. A is broken.
> >
> > B survived just fine.
> >
> > A and B *alone* work fine; combined they fail.
>
> But that requires that they share some state, right?
>
> > When the commit landed you could use KVM as an example of A and RDMA
> > ODP as an example of B.
>
> Could you point me to where those two share the state, please? KVM seems
> to be using kvm->mmu_notifier_count, but I do not know where to look for
> the RDMA side...

Scratch that. ELONGDAY... I can see your point. It is the all-or-nothing
semantic that doesn't really work here. Looking back at your patch it
seems reasonable, but I am not sure what the behavior is supposed to be
for notifiers that failed.
--
Michal Hocko
SUSE Labs
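A minimal C sketch of the failure mode described above; the types and the
non-blocking walk are simplified stand-ins for the kernel's mmu_notifier
machinery, not the real API:

    /* Simplified stand-ins: each notifier has a start/end pair and a
     * link to the next one, mimicking the per-mm notifier list. */
    struct mn {
            int (*start)(struct mn *mn);    /* invalidate_range_start */
            void (*end)(struct mn *mn);     /* invalidate_range_end */
            struct mn *next;
    };

    /* The oom path's non-blocking walk: bail out on the first EAGAIN. */
    static int range_start_nonblock(struct mn *head)
    {
            struct mn *mn;

            for (mn = head; mn; mn = mn->next) {
                    int rc = mn->start(mn);   /* A succeeds, B fails */

                    if (rc)
                            return rc;
            }
            return 0;
    }

    /* On failure the caller never runs any invalidate_range_end, so the
     * counter++ that A did in its start callback is never undone, even
     * though A itself never returned an error. */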
Jason Gunthorpe
2019-Jul-24 19:21 UTC
[Nouveau] [PATCH] mm/hmm: replace hmm_update with mmu_notifier_range
On Wed, Jul 24, 2019 at 08:59:10PM +0200, Michal Hocko wrote:
> [...]
> Scratch that. ELONGDAY... I can see your point. It is the all-or-nothing
> semantic that doesn't really work here. Looking back at your patch it
> seems reasonable, but I am not sure what the behavior is supposed to be
> for notifiers that failed.

Okay, good to know I'm not missing something. The idea was that the
failed notifier would have to handle the mandatory _end callback itself.

I've reflected on it some more, and I have a scheme for an 'undo' that
is safe against a concurrent hlist_del_rcu.

If we change the registration to keep the hlist sorted by address, then
we can do a targeted 'undo' of past starts, terminated by an address
less-than comparison against the first failing struct mmu_notifier.

It relies on the facts that rcu is only used to remove items, that list
adds are all protected by mm locks, and that the number of mmu notifiers
is very small.

This seems workable and does not need more driver review/update...

However, hmm's implementation still needs more fixing.

Thanks,
Jason
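A sketch of how that targeted undo walk could look; the function name and
the exact locking context are hypothetical reconstructions of the scheme
described above, not code from the patch:

    #include <linux/mmu_notifier.h>
    #include <linux/rculist.h>

    /* Hypothetical undo walk, assuming registration keeps the per-mm
     * hlist sorted by the notifiers' struct addresses. The caller holds
     * rcu_read_lock(), as in the normal notifier walk. */
    static void mn_undo_range_starts(struct hlist_head *head,
                                     struct mmu_notifier *failed,
                                     const struct mmu_notifier_range *range)
    {
            struct mmu_notifier *mn;

            /* Every notifier that sorts below the failing one already ran
             * its start callback, so deliver the paired end callback it
             * is expecting. Because hlist_del_rcu only ever removes
             * entries (adds are serialized by mm locks), a concurrent
             * unregister cannot slip a never-started notifier ahead of
             * us. */
            hlist_for_each_entry_rcu(mn, head, hlist) {
                    if (mn >= failed)       /* sorted-by-address cutoff */
                            break;
                    if (mn->ops->invalidate_range_end)
                            mn->ops->invalidate_range_end(mn, range);
            }
    }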
Christoph Hellwig
2019-Jul-24 19:48 UTC
[Nouveau] [PATCH] mm/hmm: replace hmm_update with mmu_notifier_range
On Wed, Jul 24, 2019 at 04:21:55PM -0300, Jason Gunthorpe wrote:
> If we change the registration to keep the hlist sorted by address, then
> we can do a targeted 'undo' of past starts, terminated by an address
> less-than comparison against the first failing struct mmu_notifier.
>
> It relies on the facts that rcu is only used to remove items, that list
> adds are all protected by mm locks, and that the number of mmu notifiers
> is very small.
>
> This seems workable and does not need more driver review/update...
>
> However, hmm's implementation still needs more fixing.

Can we take one step back, please? The only reason why drivers implement
both ->invalidate_range_start and ->invalidate_range_end and expect them
to be called paired is to keep some form of counter of active
invalidation "sections". So instead of doctoring around undo schemes,
the only sane answer is to move such a counter into the core VM code
instead of having each driver struggle with it.
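A sketch of what such a core-maintained counter could look like; the
struct and field names here are hypothetical illustrations of the idea,
not what the kernel eventually merged:

    #include <linux/list.h>
    #include <linux/spinlock.h>

    /* Hypothetical per-mm invalidation state kept by core VM code, so
     * individual drivers no longer have to pair start/end themselves. */
    struct mn_invalidate_state {
            struct hlist_head list;         /* registered notifiers */
            spinlock_t lock;
            unsigned int active_ranges;     /* proposed core counter */
    };

    static void mn_core_range_start(struct mn_invalidate_state *state)
    {
            spin_lock(&state->lock);
            state->active_ranges++;
            spin_unlock(&state->lock);
            /* ...then call each notifier's invalidate_range_start... */
    }

    static void mn_core_range_end(struct mn_invalidate_state *state)
    {
            /* ...call each notifier's invalidate_range_end first... */
            spin_lock(&state->lock);
            state->active_ranges--;
            spin_unlock(&state->lock);
    }

    /* A driver then just reads the counter (under the same lock) instead
     * of maintaining its own, so a skipped or failed _end callback in one
     * notifier cannot leave another notifier's bookkeeping desynced. */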