Rob Clark
2022-Jun-20 15:18 UTC
[PATCH v6 17/22] drm/shmem-helper: Add generic memory shrinker
On Mon, Jun 20, 2022 at 7:09 AM Dmitry Osipenko
<dmitry.osipenko at collabora.com> wrote:
>
> On 6/19/22 20:53, Rob Clark wrote:
> ...
> >> +static unsigned long
> >> +drm_gem_shmem_shrinker_count_objects(struct shrinker *shrinker,
> >> +				      struct shrink_control *sc)
> >> +{
> >> +	struct drm_gem_shmem_shrinker *gem_shrinker = to_drm_shrinker(shrinker);
> >> +	struct drm_gem_shmem_object *shmem;
> >> +	unsigned long count = 0;
> >> +
> >> +	if (!mutex_trylock(&gem_shrinker->lock))
> >> +		return 0;
> >> +
> >> +	list_for_each_entry(shmem, &gem_shrinker->lru_evictable, madv_list) {
> >> +		count += shmem->base.size;
> >> +
> >> +		if (count >= SHRINK_EMPTY)
> >> +			break;
> >> +	}
> >> +
> >> +	mutex_unlock(&gem_shrinker->lock);
> >
> > As I mentioned on the other thread, count_objects being approximate but
> > lockless and fast is the important thing. Otherwise when you start
> > hitting the shrinker on many threads, you end up serializing them all,
> > even if you have no pages to return to the system at that point.
>
> Daniel's point for dropping the lockless variant was that we're already
> in trouble if we're hitting the shrinker too often, and extra
> optimizations won't bring much benefit to us.

At least with zram swap (which I highly recommend using even if you
are not using a physical swap file/partition), swapin/out is actually
quite fast. And if you are leaning on zram swap to fit 8GB of chrome
browser on a 4GB device, the shrinker gets hit quite a lot. Lower
spec (4GB RAM) chromebooks can be under constant memory pressure and
can quite easily get into a situation where you are hitting the
shrinker on many threads simultaneously. So it is pretty important
for all shrinkers in the system (not just the drm driver) to be as
concurrent as possible. As long as you avoid serializing reclaim on
all the threads, performance can still be quite good, but if you
don't, performance will fall off a cliff.

jfwiw, we are seeing pretty good results (iirc a 40-70% increase in
open tab counts) with the combination of eviction + multigen LRU[1] +
sizing zram swap to be 2x physical RAM.

[1] https://lwn.net/Articles/856931/

> Alright, I'll add back the lockless variant (or will use your
> drm_gem_lru) in the next revision. The code difference is very small
> after all.
>
> ...
> >> +	/* prevent racing with the dma-buf importing/exporting */
> >> +	if (!mutex_trylock(&gem_shrinker->dev->object_name_lock)) {
> >> +		*lock_contention |= true;
> >> +		goto resv_unlock;
> >> +	}
> >
> > I'm not sure it's a good idea to serialize on object_name_lock.
> > Purgeable buffers should never be shared (imported or exported). So
> > at best you are avoiding evicting and immediately swapping back in, in
> > a rare case, at the cost of serializing multiple threads trying to
> > reclaim pages in parallel.
>
> The object_name_lock shouldn't cause contention in practice. But objects
> are also pinned on attachment, hence maybe this lock is indeed
> unnecessary... I'll re-check it.

I'm not worried about contention with export/import/etc, but
contention between multiple threads hitting the shrinker in parallel.
I guess since you are using trylock, it won't *block* the other
threads hitting the shrinker, but they'll just end up looping in
do_shrink_slab() because they are hitting contention.

I'd have to do some experiments to see how it works out in practice,
but my gut feel is that it isn't a good idea.

BR,
-R

> --
> Best regards,
> Dmitry
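For context, a minimal sketch of the approximate, lockless count_objects()
Rob is arguing for might look like the following. This is not code from the
series; the evictable_count field is an assumption, standing in for a
running total that would be maintained under gem_shrinker->lock wherever
objects move on or off the evictable LRU, so the count callback itself
never takes a lock.

static unsigned long
drm_gem_shmem_shrinker_count_objects(struct shrinker *shrinker,
				     struct shrink_control *sc)
{
	struct drm_gem_shmem_shrinker *gem_shrinker = to_drm_shrinker(shrinker);
	unsigned long count;

	/*
	 * evictable_count is a hypothetical running total of evictable
	 * pages, updated under gem_shrinker->lock at the points where
	 * objects are added to or removed from the evictable LRU.
	 * Reading it here without the lock is racy, but count_objects()
	 * only needs an estimate.
	 */
	count = READ_ONCE(gem_shrinker->evictable_count);

	return count ?: SHRINK_EMPTY;
}

The trade-off is that a concurrent add/remove can make the value slightly
stale, which is acceptable because do_shrink_slab() only uses the count to
size the subsequent scan.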
Daniel Vetter
2022-Jun-24 20:23 UTC
[PATCH v6 17/22] drm/shmem-helper: Add generic memory shrinker
On Mon, Jun 20, 2022 at 08:18:04AM -0700, Rob Clark wrote:
> On Mon, Jun 20, 2022 at 7:09 AM Dmitry Osipenko
> <dmitry.osipenko at collabora.com> wrote:
> ...
> > The object_name_lock shouldn't cause contention in practice. But objects
> > are also pinned on attachment, hence maybe this lock is indeed
> > unnecessary... I'll re-check it.
>
> I'm not worried about contention with export/import/etc, but
> contention between multiple threads hitting the shrinker in parallel.
> I guess since you are using trylock, it won't *block* the other
> threads hitting the shrinker, but they'll just end up looping in
> do_shrink_slab() because they are hitting contention.
>
> I'd have to do some experiments to see how it works out in practice,
> but my gut feel is that it isn't a good idea.

Yeah, trylock on anything other than the object lock is No Good in the
shrinker. And it really shouldn't be needed, since import/export should
pin stuff as needed, which should be protected by the dma_resv object
lock. If not, we need to fix that. Picking a random drm-internal lock
like this is definitely not a good design.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
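A minimal sketch of the locking scheme Daniel describes could look like the
following. It is not taken from the series: pages_pin_count,
drm_gem_shmem_is_evictable() and drm_gem_shmem_evict() are assumed names
standing in for whatever the real code uses; the point is that the scan
path trylocks only the object's dma_resv and skips pinned objects (which is
what an imported or exported buffer would be), so no drm-internal lock such
as object_name_lock is involved.

static bool drm_gem_shmem_shrinker_try_evict(struct drm_gem_shmem_object *shmem)
{
	struct drm_gem_object *obj = &shmem->base;
	bool evicted = false;

	/* Per-object lock only: on contention, just skip this object. */
	if (!dma_resv_trylock(obj->resv))
		return false;

	/*
	 * Imported/exported buffers are expected to be pinned by the
	 * dma-buf attach/export paths, so the pin-count check (an assumed
	 * field) filters them out here without touching object_name_lock.
	 */
	if (!shmem->pages_pin_count && drm_gem_shmem_is_evictable(shmem)) {
		drm_gem_shmem_evict(shmem);	/* assumed helper */
		evicted = true;
	}

	dma_resv_unlock(obj->resv);

	return evicted;
}

On trylock failure the object is simply skipped rather than stalling the
calling thread, so parallel shrinker invocations only contend per object
instead of serializing on a global lock.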