Christian König
2022-Jun-10 12:17 UTC
[Nouveau] [PATCH 03/13] mm: shmem: provide oom badness for shmem files
Am 10.06.22 um 13:44 schrieb Michal Hocko:
> On Fri 10-06-22 12:58:53, Christian König wrote:
> [SNIP]
>>> I do realize this is a long term problem and there is a demand for some
>>> solution at least. I am not sure how to deal with shared resources
>>> myself. The best approximation I can come up with is to limit the scope
>>> of the damage into a memcg context. One idea I was playing with (but
>>> never convinced myself it is really worth it) is to allow a new mode of
>>> the oom victim selection for the global oom event.
> And just for clarity: I have mentioned the global oom event here, but the
> concept could be extended to the per-memcg oom killer as well.

Then what exactly do you mean with "limiting the scope of the damage"? That doesn't make sense without memcg.

>>> It would be an opt-in
>>> and the victim would be selected from the biggest leaf memcg (or kill
>>> the whole memcg if it has group_oom configured).
>>>
>>> That would address at least some of the accounting issues because charges
>>> are better tracked than per-process memory consumption. It is a crude
>>> and ugly hack and it doesn't solve the underlying problem, as shared
>>> resources are not guaranteed to be freed when processes die, but maybe it
>>> would be just slightly better than the existing scheme, which is clearly
>>> lagging behind existing userspace.
>> Well, what is so bad about the approach of giving each process holding a
>> reference to some shared memory its equal share of badness, even when the
>> processes belong to different memory control groups?
> I am not claiming this is wrong per se. It is just an approximation and
> it can surely be wrong in some cases (e.g. in those workloads where the
> shared memory is mostly owned by one process while the shared content is
> consumed by many).

Yeah, completely agree. Basically we can only make an educated guess.

The key point is that we should make the most educated guess we can, and not just randomly kill something until we hit the right target. That's essentially what's happening today.

> The primary question is whether it actually helps much or what kind of
> scenarios it can help with and whether we can actually do better for
> those.

Well, it does help massively with a standard Linux desktop and GPU workloads (e.g. games).

See, what currently happens is that when games allocate, for example, textures, the memory for that is not accounted against the game. Instead it's usually the display server (X or Wayland) that most of the shared resources are accounted to, because it needs to compose a desktop from them and usually also mmaps them for fallback CPU operations.

So what happens when a game over-allocates texture resources is that your whole desktop restarts because the compositor is killed. This obviously also kills the game, but it would be much nicer if we were more selective here.

For hardware rendering DMA-buf and GPU drivers are used, but for the software fallback shmem files are what is used under the hood, as far as I know. And the underlying problem is the same for both.

> Also do not forget that shared file memory is not the only thing
> to care about. What about the kernel memory used on behalf of processes?

Yeah, I'm aware of that as well. But at least inside the GPU drivers we try to keep that at a reasonable ratio.

> Just consider the above mentioned memcg driven model. It doesn't really
> require to chase specific files and do some arbitrary math to share the
> responsibility. It has a clear accounting and responsibility model.

Ok, how does that work then?

> It shares the same underlying problem that the oom killing is not
> resource aware and therefore there is no guarantee that memory really
> gets freed. But it allows sane configurations where shared resources do
> not cross memcg boundaries at least. With that in mind and oom_cgroup
> semantics you can get at least some semi-sane guarantees. Is it
> perfect? No, by any means. But I would expect it to be more predictable.
>
> Maybe we can come up with a saner model, but just going with per-file
> stats sounds like a hard to predict and debug approach to me. OOM
> killing is a very disruptive operation and having random tasks killed
> just because they have mapped a few pages from a shared resource sounds
> like a terrible thing to debug and explain to users.

Well, to be honest, I think it's much saner than what we do today. As I said, you can currently take any Linux system down within seconds, and that's basically a perfect denial-of-service attack.

>> If you really think that this would be a hard problem for upstreaming, we
>> could as well keep the behavior for memcg as it is for now. We would just
>> need to adjust the parameters to oom_badness() a bit.
> Say we ignore the memcg side of things for now. How does it help long
> term? Special casing the global oom is not all that hard, but any future
> change would very likely be disruptive with some semantic implications
> AFAICS.

What else can we do? I mean, the desktop instability we are facing is really massive.

Regards,
Christian.
Michal Hocko
2022-Jun-10 14:16 UTC
[Nouveau] [PATCH 03/13] mm: shmem: provide oom badness for shmem files
On Fri 10-06-22 14:17:27, Christian König wrote:
> Am 10.06.22 um 13:44 schrieb Michal Hocko:
> > On Fri 10-06-22 12:58:53, Christian König wrote:
> > [SNIP]
> > > > I do realize this is a long term problem and there is a demand for some
> > > > solution at least. I am not sure how to deal with shared resources
> > > > myself. The best approximation I can come up with is to limit the scope
> > > > of the damage into a memcg context. One idea I was playing with (but
> > > > never convinced myself it is really worth it) is to allow a new mode of
> > > > the oom victim selection for the global oom event.
> > And just for clarity: I have mentioned the global oom event here, but the
> > concept could be extended to the per-memcg oom killer as well.
>
> Then what exactly do you mean with "limiting the scope of the damage"?
> That doesn't make sense without memcg.

What I meant to say is to apply the damage-control scheme not only to the global oom situation (on a global shortage of memory) but also to the memcg oom situation (when the hard limit on a hierarchy is reached).

[...]

> > The primary question is whether it actually helps much or what kind of
> > scenarios it can help with and whether we can actually do better for
> > those.
>
> Well, it does help massively with a standard Linux desktop and GPU workloads
> (e.g. games).
>
> See, what currently happens is that when games allocate, for example,
> textures, the memory for that is not accounted against the game. Instead
> it's usually the display server (X or Wayland) that most of the shared
> resources are accounted to, because it needs to compose a desktop from
> them and usually also mmaps them for fallback CPU operations.

Let me try to understand some more. So the game (or the entity responsible for the resource) doesn't really allocate the memory but relies on somebody else (from the memcg perspective living in a different resource domain, i.e. a different memcg) to do that on its behalf. Correct?

If that is the case then that is certainly not fitting into the memcg model. I am not really sure there is any reasonable model where you cannot tell who is responsible for the resource.

> So what happens when a game over-allocates texture resources is that your
> whole desktop restarts because the compositor is killed. This obviously
> also kills the game, but it would be much nicer if we were more selective
> here.
>
> For hardware rendering DMA-buf and GPU drivers are used, but for the
> software fallback shmem files are what is used under the hood, as far as I
> know. And the underlying problem is the same for both.

For shmem files the end user of the buffer can preallocate, and so own the buffer and be accounted for it.

> > Also do not forget that shared file memory is not the only thing
> > to care about. What about the kernel memory used on behalf of processes?
>
> Yeah, I'm aware of that as well. But at least inside the GPU drivers we
> try to keep that at a reasonable ratio.
>
> > Just consider the above mentioned memcg driven model. It doesn't really
> > require to chase specific files and do some arbitrary math to share the
> > responsibility. It has a clear accounting and responsibility model.
>
> Ok, how does that work then?

The memory is accounted to whoever faults it in, or to the allocating context if it is kernel memory (in most situations).

--
Michal Hocko
SUSE Labs
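The memcg-driven model Michal describes, with group-wide oom handling so shared resources stay inside one boundary, can be set up like this on a cgroup v2 system. This is a configuration sketch; the cgroup name and limit are illustrative:

```shell
# Create a dedicated memcg for the game under cgroup v2
# (path assumes a cgroup2 mount at /sys/fs/cgroup; requires root).
mkdir /sys/fs/cgroup/game

# Cap the hierarchy so an over-allocating game hits its own hard
# limit instead of driving the whole system into a global OOM.
echo 4G > /sys/fs/cgroup/game/memory.max

# On an OOM inside this memcg, kill the whole group rather than a
# single task, so resources owned by the group actually get freed.
echo 1 > /sys/fs/cgroup/game/memory.oom.group

# Move the current shell (and thus the game it launches) into the memcg.
echo $$ > /sys/fs/cgroup/game/cgroup.procs
```

The catch, as the thread notes, is that this only gives sane guarantees when shared buffers do not cross the memcg boundary, which is exactly what the compositor case violates.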
Christian König
2022-Jun-11 08:06 UTC
[Nouveau] [PATCH 03/13] mm: shmem: provide oom badness for shmem files
Am 10.06.22 um 16:16 schrieb Michal Hocko:
> [...]
>>> The primary question is whether it actually helps much or what kind of
>>> scenarios it can help with and whether we can actually do better for
>>> those.
>> Well, it does help massively with a standard Linux desktop and GPU workloads
>> (e.g. games).
>>
>> See, what currently happens is that when games allocate, for example,
>> textures, the memory for that is not accounted against the game. Instead
>> it's usually the display server (X or Wayland) that most of the shared
>> resources are accounted to, because it needs to compose a desktop from
>> them and usually also mmaps them for fallback CPU operations.
> Let me try to understand some more. So the game (or the entity responsible
> for the resource) doesn't really allocate the memory but relies on somebody
> else (from the memcg perspective living in a different resource domain,
> i.e. a different memcg) to do that on its behalf. Correct? If that is the
> case then that is certainly not fitting into the memcg model.

More or less: yes, that is one possible use case. But we could leave that one out, since it is not the primary use case. What happens more often is that 99% of the resources are allocated per process, while around 1% are shared with somebody else.

But see the two comments below for a better description of the problem I'm facing.

> I am not really sure there is any reasonable model where you cannot
> really tell who is responsible for the resource.

Well, it would be fine with me to leave out the 1% of resources shared between different memcgs. What breaks my neck are the 99% which are allocated by a game and could potentially be shared, but most of the time are not.

>> So what happens when a game over-allocates texture resources is that your
>> whole desktop restarts because the compositor is killed. This obviously
>> also kills the game, but it would be much nicer if we were more selective
>> here.
>>
>> For hardware rendering DMA-buf and GPU drivers are used, but for the
>> software fallback shmem files are what is used under the hood, as far as I
>> know. And the underlying problem is the same for both.
> For shmem files the end user of the buffer can preallocate and so own
> the buffer and be accounted for it.

The problem is that it can easily happen that one process allocates the resource and a different one frees it.

Just imagine the following example: a process opens an X window, gets a reference to the handle of the buffer backing this window for drawing, tells X to close the window again, and then a bit later closes the buffer handle.

In this example the X server would be charged for allocating the buffer and the client (which is most likely in a different memcg) would be credited for freeing it.

I could of course add something to struct page to track which memcg (or process) it was charged against, but extending struct page is most likely a no-go.

Alternatively I could try to track the "owner" of a buffer (e.g. a shmem file), but then it can happen that one process creates the object and another one writes to it and actually allocates the memory.

>>> Also do not forget that shared file memory is not the only thing
>>> to care about. What about the kernel memory used on behalf of processes?
>> Yeah, I'm aware of that as well. But at least inside the GPU drivers we
>> try to keep that at a reasonable ratio.
>>
>>> Just consider the above mentioned memcg driven model. It doesn't really
>>> require to chase specific files and do some arbitrary math to share the
>>> responsibility. It has a clear accounting and responsibility model.
>> Ok, how does that work then?
> The memory is accounted to whoever faults that memory in or to the
> allocating context if that is kernel memory (in most situations).

That's what I had in mind as well. The problem with this approach is that file descriptors are currently not informed that they are shared between processes.

So to make this work we would need something like attach/detach-to-process callbacks in struct file_operations.

And as I noted, this happens rather often. For example, a game which renders 120 frames per second needs to transfer 120 buffers per second between client and X. So this is not something which can take a lot of time, and the file descriptor tracking structures in the Linux kernel are not made for this either.

I think for now I will try something like this specifically for DRM drivers. That doesn't solve the shmem file problem, but it at least gives me something in hand for the accelerated Linux desktop case.

Regards,
Christian.