Dave Chinner
2023-Jul-26 23:09 UTC
[PATCH v2 44/47] mm: shrinker: make global slab shrink lockless
On Wed, Jul 26, 2023 at 05:14:09PM +0800, Qi Zheng wrote:
> On 2023/7/26 16:08, Dave Chinner wrote:
> > On Mon, Jul 24, 2023 at 05:43:51PM +0800, Qi Zheng wrote:
> > > @@ -122,6 +126,13 @@ void shrinker_free_non_registered(struct shrinker *shrinker);
> > >   void shrinker_register(struct shrinker *shrinker);
> > >   void shrinker_unregister(struct shrinker *shrinker);
> > > +static inline bool shrinker_try_get(struct shrinker *shrinker)
> > > +{
> > > +        return READ_ONCE(shrinker->registered) &&
> > > +               refcount_inc_not_zero(&shrinker->refcount);
> > > +}
> >
> > Why do we care about shrinker->registered here? If we don't set
> > the refcount to 1 until we have fully initialised everything, then
> > the shrinker code can key entirely off the reference count and
> > none of the lookup code needs to care about whether the shrinker is
> > registered or not.
>
> The purpose of checking shrinker->registered here is to stop running
> the shrinker after calling shrinker_free(), which can prevent the
> following situations from happening:
>
> CPU 0                       CPU 1
>
> shrinker_try_get()
>
>                             shrinker_try_get()
>
> shrinker_put()
> shrinker_try_get()
>                             shrinker_put()

I don't see any race here? What is wrong with having multiple
active users at once?

> > This should use a completion, then it is always safe under
> > rcu_read_lock(). This also gets rid of the shrinker_lock spin lock,
> > which only exists because we can't take a blocking lock under
> > rcu_read_lock(). i.e:
> >
> > void shrinker_put(struct shrinker *shrinker)
> > {
> >         if (refcount_dec_and_test(&shrinker->refcount))
> >                 complete(&shrinker->done);
> > }
> >
> > void shrinker_free()
> > {
> >         .....
> >         refcount_dec(&shrinker->refcount);
>
> I guess what you mean is shrinker_put(), because this may be the last
> refcount.

Yes, I did.

> >         wait_for_completion(&shrinker->done);
> >         /*
> >          * lookups on the shrinker will now all fail as refcount has
> >          * fallen to zero. We can now remove it from the lists and
> >          * free it.
> >          */
> >         down_write(&shrinker_rwsem);
> >         list_del_rcu(&shrinker->list);
> >         up_write(&shrinker_rwsem);
> >         call_rcu(shrinker->rcu_head, shrinker_free_rcu_cb);
> > }
> >
> > ....
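To make the lookup side of that scheme concrete, here is a minimal,
illustrative sketch (not code from the patch or from the reply above) of a
shrinker_try_get() that keys purely off the reference count, together with
the list-walk pattern that would use it, with shrinker_put() as sketched
above:

static inline bool shrinker_try_get(struct shrinker *shrinker)
{
        /*
         * A zero refcount means the shrinker is either not yet fully
         * registered or is already being torn down by shrinker_free();
         * in both cases the lookup simply fails and the caller skips
         * this shrinker.
         */
        return refcount_inc_not_zero(&shrinker->refcount);
}

/* Lookup-side usage when walking shrinker_list: */
rcu_read_lock();
list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
        if (!shrinker_try_get(shrinker))
                continue;
        rcu_read_unlock();

        /* ... use the shrinker; it cannot be freed while we hold a ref ... */

        rcu_read_lock();
        shrinker_put(shrinker);
}
rcu_read_unlock();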
> > > @@ -686,11 +711,14 @@ EXPORT_SYMBOL(shrinker_free_non_registered);
> > >   void shrinker_register(struct shrinker *shrinker)
> > >   {
> > > -        down_write(&shrinker_rwsem);
> > > -        list_add_tail(&shrinker->list, &shrinker_list);
> > > -        shrinker->flags |= SHRINKER_REGISTERED;
> > > +        refcount_set(&shrinker->refcount, 1);
> > > +
> > > +        spin_lock(&shrinker_lock);
> > > +        list_add_tail_rcu(&shrinker->list, &shrinker_list);
> > > +        spin_unlock(&shrinker_lock);
> > > +
> > >           shrinker_debugfs_add(shrinker);
> > > -        up_write(&shrinker_rwsem);
> > > +        WRITE_ONCE(shrinker->registered, true);
> > >   }
> > >   EXPORT_SYMBOL(shrinker_register);
> >
> > This just looks wrong - you are trying to use WRITE_ONCE() as a
> > release barrier to indicate that the shrinker is now set up fully.
> > That's not necessary - the refcount is an atomic and along with the
> > rcu locks they should provide all the barriers we need. i.e.
>
> The reason I used WRITE_ONCE() here is because shrinker->registered
> will be read and written concurrently (read in shrinker_try_get() and
> written in shrinker_free()), which is why I added the
> shrinker::registered field instead of using the SHRINKER_REGISTERED
> flag (this reduces the number of WRITE_ONCE()/READ_ONCE() annotations
> needed).

Using WRITE_ONCE/READ_ONCE doesn't provide the memory barriers needed
to use the field like this. You need release/acquire memory ordering
here, i.e. smp_store_release()/smp_load_acquire().

As it is, refcount_inc_not_zero() provides a control dependency and,
as documented in include/linux/refcount.h, refcount_dec_and_test()
provides release memory ordering. The only thing I think we may need
is a write barrier before refcount_set(), such that if
refcount_inc_not_zero() sees a non-zero value, it is guaranteed to see
an initialised structure...

i.e. refcounts provide all the existence and initialisation
guarantees. Hence I don't see the need to use shrinker->registered
like this and it can remain a bit flag protected by the
shrinker_rwsem().

> > void shrinker_register(struct shrinker *shrinker)
> > {
> >         down_write(&shrinker_rwsem);
> >         list_add_tail_rcu(&shrinker->list, &shrinker_list);
> >         shrinker->flags |= SHRINKER_REGISTERED;
> >         shrinker_debugfs_add(shrinker);
> >         up_write(&shrinker_rwsem);
> >
> >         /*
> >          * now the shrinker is fully set up, take the first
> >          * reference to it to indicate that lookup operations are
> >          * now allowed to use it via shrinker_try_get().
> >          */
> >         refcount_set(&shrinker->refcount, 1);
> > }
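Purely as an illustration of the "write barrier before refcount_set()"
hedge mentioned above, and not a claim that the barrier is actually
required, the publication step at the end of that shrinker_register()
sketch would look something like:

        /*
         * Make the fully initialised shrinker visible before the first
         * reference can be observed by refcount_inc_not_zero() in
         * shrinker_try_get(). Illustrative only.
         */
        smp_wmb();
        refcount_set(&shrinker->refcount, 1);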
> > > diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
> > > index f1becfd45853..c5573066adbf 100644
> > > --- a/mm/shrinker_debug.c
> > > +++ b/mm/shrinker_debug.c
> > > @@ -5,6 +5,7 @@
> > >   #include <linux/seq_file.h>
> > >   #include <linux/shrinker.h>
> > >   #include <linux/memcontrol.h>
> > > +#include <linux/rculist.h>
> > >   /* defined in vmscan.c */
> > >   extern struct rw_semaphore shrinker_rwsem;
> > > @@ -161,17 +162,21 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
> > >   {
> > >           struct dentry *entry;
> > >           char buf[128];
> > > -        int id;
> > > -
> > > -        lockdep_assert_held(&shrinker_rwsem);
> > > +        int id, ret = 0;
> > >           /* debugfs isn't initialized yet, add debugfs entries later. */
> > >           if (!shrinker_debugfs_root)
> > >                   return 0;
> > > +        down_write(&shrinker_rwsem);
> > > +        if (shrinker->debugfs_entry)
> > > +                goto fail;
> > > +
> > >           id = ida_alloc(&shrinker_debugfs_ida, GFP_KERNEL);
> > > -        if (id < 0)
> > > -                return id;
> > > +        if (id < 0) {
> > > +                ret = id;
> > > +                goto fail;
> > > +        }
> > >           shrinker->debugfs_id = id;
> > >           snprintf(buf, sizeof(buf), "%s-%d", shrinker->name, id);
> > > @@ -180,7 +185,8 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
> > >           entry = debugfs_create_dir(buf, shrinker_debugfs_root);
> > >           if (IS_ERR(entry)) {
> > >                   ida_free(&shrinker_debugfs_ida, id);
> > > -                return PTR_ERR(entry);
> > > +                ret = PTR_ERR(entry);
> > > +                goto fail;
> > >           }
> > >           shrinker->debugfs_entry = entry;
> > > @@ -188,7 +194,10 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
> > >                               &shrinker_debugfs_count_fops);
> > >           debugfs_create_file("scan", 0220, entry, shrinker,
> > >                               &shrinker_debugfs_scan_fops);
> > > -        return 0;
> > > +
> > > +fail:
> > > +        up_write(&shrinker_rwsem);
> > > +        return ret;
> > >   }
> > >   int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, ...)
> > > @@ -243,6 +252,11 @@ struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker,
> > >           shrinker->name = NULL;
> > >           *debugfs_id = entry ? shrinker->debugfs_id : -1;
> > > +        /*
> > > +         * Ensure that shrinker->registered has been set to false before
> > > +         * shrinker->debugfs_entry is set to NULL.
> > > +         */
> > > +        smp_wmb();
> > >           shrinker->debugfs_entry = NULL;
> > >           return entry;
> > > @@ -266,14 +280,26 @@ static int __init shrinker_debugfs_init(void)
> > >           shrinker_debugfs_root = dentry;
> > >           /* Create debugfs entries for shrinkers registered at boot */
> > > -        down_write(&shrinker_rwsem);
> > > -        list_for_each_entry(shrinker, &shrinker_list, list)
> > > +        rcu_read_lock();
> > > +        list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
> > > +                if (!shrinker_try_get(shrinker))
> > > +                        continue;
> > > +                rcu_read_unlock();
> > > +
> > >                   if (!shrinker->debugfs_entry) {
> > > -                        ret = shrinker_debugfs_add(shrinker);
> > > -                        if (ret)
> > > -                                break;
> > > +                        /* Paired with smp_wmb() in shrinker_debugfs_detach() */
> > > +                        smp_rmb();
> > > +                        if (READ_ONCE(shrinker->registered))
> > > +                                ret = shrinker_debugfs_add(shrinker);
> > >                   }
> > > -        up_write(&shrinker_rwsem);
> > > +
> > > +                rcu_read_lock();
> > > +                shrinker_put(shrinker);
> > > +
> > > +                if (ret)
> > > +                        break;
> > > +        }
> > > +        rcu_read_unlock();
> > >           return ret;
> > >   }
> >
> > And all this churn and complexity can go away because the
> > shrinker_rwsem is still used to protect shrinker_register()
> > entirely....
>
> My consideration is that during this process, there may be a
> driver probe failure and then shrinker_free() is called (the
> shrinker_debugfs_init() is called in the late_initcall stage). In
> this case, we need to use RCU+refcount to ensure that the shrinker
> is not freed.

Yeah, you're trying to work around the lack of a wait_for_completion()
call in shrinker_free(). With that, this doesn't need RCU at all, the
iteration can be done safely entirely under shrinker_rwsem, and so
none of this code needs to change.

Cheers,

Dave.
--
Dave Chinner
david at fromorbit.com
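For reference, this is the pre-patch boot-time loop that the reply above
argues can stay unchanged once shrinker_free() waits on the completion.
It is reconstructed here from the removed lines of the quoted
shrinker_debugfs_init() hunk, purely to illustrate the point:

        /* Create debugfs entries for shrinkers registered at boot */
        down_write(&shrinker_rwsem);
        list_for_each_entry(shrinker, &shrinker_list, list)
                if (!shrinker->debugfs_entry) {
                        ret = shrinker_debugfs_add(shrinker);
                        if (ret)
                                break;
                }
        up_write(&shrinker_rwsem);

        return ret;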