thr3ads.net - Nouveau - [Nouveau] [PATCH 03/13] mm: shmem: provide oom badness for shmem files [Jun 2022]

Christian König

2022-Jun-13 12:55 UTC

[Nouveau] [PATCH 03/13] mm: shmem: provide oom badness for shmem files

On 13.06.22 at 14:11, Michal Hocko wrote:
> [SNIP]
>>>> Alternatively, I could try to track the "owner" of a buffer (e.g. a shmem
>>>> file), but then it can happen that one process creates the object and
>>>> another one is writing to it and actually allocating the memory.
>>> If you can enforce that the owner is really responsible for the
>>> allocation then all should be fine. That would require MAP_POPULATE-like
>>> semantics and I suspect this is not really feasible with the existing
>>> userspace. It would certainly be hard to enforce for bad players.
>> I've tried this today and the result was: "BUG: Bad rss-counter state
>> mm:000000008751d9ff type:MM_FILEPAGES val:-571286".
>>
>> The problem is once more that files are not informed when the process
>> clones. So what happened is that somebody called fork() with an mm_struct
>> I've accounted my pages to. The result is just that we messed up the
>> rss_stats and got the "BUG..." above.
>>
>> The key difference between normally allocated pages and the resources here
>> is just that we are not bound to an mm_struct in any way.
> It is not really clear to me what exactly you have tried.
I've tried to track the "owner" of a driver connection by keeping a
reference to the mm_struct that created this connection inside our file
private and then using add_mm_counter() to account all the allocations of
the driver to this mm_struct.

This works to the extent that now the right process is killed in an OOM
situation. The problem with this approach is that the driver is not
informed about operations like fork() or clone(), so what happens is
that after a fork()/clone() we have an unbalanced rss-counter.
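
[Editor's sketch] The "Bad rss-counter state" message comes from a check that
every per-mm counter is back at zero when the mm is torn down. The toy
userspace model below illustrates that invariant only. All toy_* names are
hypothetical and are not kernel APIs, and the thread does not say which code
path actually produced the imbalance: the split between charge and uncharge
shown here is an assumption chosen to make the effect visible.

```c
#include <stdio.h>

/* Toy userspace model; none of these names exist in the kernel. */
enum { TOY_FILEPAGES, TOY_NR_COUNTERS };

struct toy_mm {
    long rss[TOY_NR_COUNTERS];   /* stand-in for the per-mm rss counters */
};

/* Stand-in for add_mm_counter(): adjust one counter by a signed delta. */
static void toy_add_counter(struct toy_mm *mm, int member, long pages)
{
    mm->rss[member] += pages;
}

/*
 * Stand-in for the teardown check: report every counter that did not
 * return to zero and return how many such counters were found.
 */
static int toy_check_mm(const struct toy_mm *mm)
{
    int bad = 0;

    for (int i = 0; i < TOY_NR_COUNTERS; i++) {
        if (mm->rss[i] != 0) {
            fprintf(stderr, "toy: bad rss-counter state type:%d val:%ld\n",
                    i, mm->rss[i]);
            bad++;
        }
    }
    return bad;
}

/*
 * ASSUMPTION (not stated in the thread): after fork() the charge and the
 * uncharge for the same buffer land on different mm instances.
 * Returns the number of unbalanced counters across both mms.
 */
static int toy_fork_scenario(long pages)
{
    struct toy_mm parent = { {0} };
    struct toy_mm child  = { {0} };   /* modelled as starting from zero */

    toy_add_counter(&parent, TOY_FILEPAGES, pages);   /* driver charges the owner */
    toy_add_counter(&child, TOY_FILEPAGES, -pages);   /* uncharge hits the clone */

    return toy_check_mm(&parent) + toy_check_mm(&child);
}
```

Calling toy_fork_scenario() with any non-zero page count leaves one mm with a
positive residue and the other with a negative one, which is the shape of the
val:-571286 report quoted above.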

Let me maybe get back to the initial question: We have resources which 
are not related to the virtual address space of a process, how should we 
tell the OOM killer about them?

Thanks for all the input so far,
Christian.

Michal Hocko

2022-Jun-13 14:11 UTC

[Nouveau] [PATCH 03/13] mm: shmem: provide oom badness for shmem files

On Mon 13-06-22 14:55:54, Christian König wrote:
> On 13.06.22 at 14:11, Michal Hocko wrote:
> > [SNIP]
> > > > > Alternatively, I could try to track the "owner" of a buffer (e.g. a
> > > > > shmem file), but then it can happen that one process creates the
> > > > > object and another one is writing to it and actually allocating
> > > > > the memory.
> > > > If you can enforce that the owner is really responsible for the
> > > > allocation then all should be fine. That would require
> > > > MAP_POPULATE-like semantics and I suspect this is not really feasible
> > > > with the existing userspace. It would certainly be hard to enforce
> > > > for bad players.
> > > I've tried this today and the result was: "BUG: Bad rss-counter state
> > > mm:000000008751d9ff type:MM_FILEPAGES val:-571286".
> > >
> > > The problem is once more that files are not informed when the process
> > > clones. So what happened is that somebody called fork() with an
> > > mm_struct I've accounted my pages to. The result is just that we messed
> > > up the rss_stats and got the "BUG..." above.
> > >
> > > The key difference between normally allocated pages and the resources
> > > here is just that we are not bound to an mm_struct in any way.
> > It is not really clear to me what exactly you have tried.
>
> I've tried to track the "owner" of a driver connection by keeping a
> reference to the mm_struct that created this connection inside our file
> private and then using add_mm_counter() to account all the allocations of
> the driver to this mm_struct.
>
> This works to the extent that now the right process is killed in an OOM
> situation. The problem with this approach is that the driver is not informed
> about operations like fork() or clone(), so what happens is that after a
> fork()/clone() we have an unbalanced rss-counter.
Yes, I do not think you can make per-process accounting without a
concept of the per-process ownership.
> Let me maybe get back to the initial question: We have resources which are
> not related to the virtual address space of a process, how should we tell
> the OOM killer about them?
I would say memcg, but we have discussed this already...

I do not think that exposing a resource (in the form of a counter
or something like that) is sufficient. The existing oom killer
implementation is heavily process centric (with a memcg extension for
grouping, but that does not change the overall design in principle). If you
want to make it aware of resources which are not directly accounted to
processes then a new implementation is necessary IMHO. You would need
to evaluate those resources and kill all the tasks that can hold on to that
resource.
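
[Editor's sketch] A resource-centric selection of the kind described above
could look roughly like the following userspace model: pick the resource that
pins the most memory, then collect every task that holds it. This is only an
illustration under assumed data structures. The toy_* names, the fixed-size
holder array, and the "largest resource wins" policy are all hypothetical and
do not come from the thread or from the kernel.

```c
#include <stddef.h>

#define TOY_MAX_HOLDERS 8

/* Hypothetical description of a resource that is not bound to one process. */
struct toy_resource {
    unsigned long pages;              /* memory pinned by this resource */
    int holders[TOY_MAX_HOLDERS];     /* pids of tasks holding a reference */
    size_t nr_holders;
};

/* Return the index of the resource pinning the most pages, or -1 if n == 0. */
static int toy_pick_resource(const struct toy_resource *res, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (best < 0 || res[i].pages > res[best].pages)
            best = (int)i;
    }
    return best;
}

/*
 * Copy the pid of every task holding the chosen resource into victims[],
 * up to cap entries, and return how many were copied. Freeing the resource
 * would require all of them to go away, not just one.
 */
static size_t toy_collect_victims(const struct toy_resource *r,
                                  int *victims, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < r->nr_holders && n < cap; i++)
        victims[n++] = r->holders[i];
    return n;
}
```

The point of the model is the second function: once badness is attached to a
resource rather than to a process, a single victim is no longer enough, since
the memory is only released when the last holder is gone.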

This is also the reason why I am not really a fan of the per-file
badness, because it adds a notion of a resource that is not process bound
in general, so it will add all sorts of weird runtime corner cases which
are impossible to anticipate [*]. Maybe that will work in some scenarios,
but it is definitely not something to be done by default without users
opting into it and being aware of the consequences.

There have been discussions that the existing oom implementation cannot
fit all potential use cases, so maybe we need to finally decide to use a
pluggable, BPF-able, etc. architecture to allow implementations that fit
specific needs.

[*] I know it is not directly related but kinda similar. In the past
we used to have heuristics to consider work done as a resource. That is,
kill younger processes preferably to reduce the damage. This turned
out to have very unpredictable behavior and drew many complaints from
users. The situation improved when the selection was based solely on
rss. This has its own cons of course, but at least they are predictable.
-- 
Michal Hocko
SUSE Labs

Christian König

2022-Jun-15 12:35 UTC

[Nouveau] [PATCH 03/13] mm: shmem: provide oom badness for shmem files

On 13.06.22 at 16:11, Michal Hocko wrote:
> [SNIP]
>> Let me maybe get back to the initial question: We have resources which are
>> not related to the virtual address space of a process, how should we tell
>> the OOM killer about them?
> I would say memcg, but we have discussed this already...
Well memcg is at least closer to the requirements than the classic 
mm_struct accounting.

It won't work for really shared buffers, but if that's the requirement 
to find some doable solution for the remaining 99% then I can live with 
that.
> I do not think that exposing a resource (in the form of a counter
> or something like that) is sufficient. The existing oom killer
> implementation is heavily process centric (with a memcg extension for
> grouping, but that does not change the overall design in principle). If you
> want to make it aware of resources which are not directly accounted to
> processes then a new implementation is necessary IMHO. You would need
> to evaluate those resources and kill all the tasks that can hold on to that
> resource.
Well the OOM killer is process centric because processes are what you 
can kill.

Even the classic mm_struct based accounting includes MM_SHMEMPAGES in
the badness. So accounting shared resources as badness to make a
decision is nothing new here.
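
[Editor's sketch] As a rough illustration of the point about MM_SHMEMPAGES,
here is a simplified userspace model of an rss-based badness score. It follows
the general shape of the heuristic in mm/oom_kill.c as the editor understands
it for kernels of this period (rss including shmem pages, plus swap entries
and page-table pages, shifted by oom_score_adj), but the struct and function
names are hypothetical and details such as unkillable tasks are left out.

```c
/* Toy per-process counters, in pages; hypothetical names, not kernel types. */
struct toy_mm_counters {
    long file_pages;     /* models MM_FILEPAGES  */
    long anon_pages;     /* models MM_ANONPAGES  */
    long shmem_pages;    /* models MM_SHMEMPAGES */
    long swap_ents;      /* models MM_SWAPENTS   */
    long pgtable_pages;  /* page-table memory, already converted to pages */
};

/*
 * Simplified badness: resident pages (file + anon + shmem) plus swap entries
 * and page tables, then shifted by oom_score_adj in units of 1/1000 of the
 * total number of pages. A higher score means a more likely victim.
 */
static long toy_badness(const struct toy_mm_counters *c,
                        long oom_score_adj, long total_pages)
{
    long points = c->file_pages + c->anon_pages + c->shmem_pages
                + c->swap_ents + c->pgtable_pages;

    points += oom_score_adj * (total_pages / 1000);
    return points;
}
```

Two otherwise identical tasks that differ only in shmem_pages get different
scores in this model, which is the sense in which shared memory already
contributes to the decision.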

The difference is that this time the badness doesn't come from the 
memory management subsystem, but rather from the I/O subsystem.
> This is also the reason why I am not really a fan of the per-file
> badness, because it adds a notion of a resource that is not process bound
> in general, so it will add all sorts of weird runtime corner cases which
> are impossible to anticipate [*]. Maybe that will work in some scenarios,
> but it is definitely not something to be done by default without users
> opting into it and being aware of the consequences.
Would a kernel command line option to control the behavior be helpful here?
> There have been discussions that the existing oom implementation cannot
> fit all potential use cases, so maybe we need to finally decide to use a
> pluggable, BPF-able, etc. architecture to allow implementations that fit
> specific needs.
Yeah, BPF came to my mind as well. But I need to talk with our experts on
that topic first.

When the OOM killer runs, allocating more memory is pretty much a no-go,
and I'm not sure what the requirements for running a BPF program to find
the badness are.
> [*] I know it is not directly related but kinda similar. In the past
> we used to have heuristics to consider work done as a resource. That is,
> kill younger processes preferably to reduce the damage. This turned
> out to have very unpredictable behavior and drew many complaints from
> users. The situation improved when the selection was based solely on
> rss. This has its own cons of course, but at least they are predictable.
Good to know, thanks.

Regards,
Christian.
