Christian König
2022-Jun-09 14:29 UTC
[Nouveau] [PATCH 03/13] mm: shmem: provide oom badness for shmem files
Am 09.06.22 um 16:21 schrieb Michal Hocko:
> On Thu 09-06-22 16:10:33, Christian König wrote:
>> Am 09.06.22 um 14:57 schrieb Michal Hocko:
>>> On Thu 09-06-22 14:16:56, Christian König wrote:
>>>> Am 09.06.22 um 11:18 schrieb Michal Hocko:
>>>>> On Tue 31-05-22 11:59:57, Christian König wrote:
>>>>>> This gives the OOM killer an additional hint which processes are
>>>>>> referencing shmem files with potentially no other accounting for them.
>>>>>>
>>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>> ---
>>>>>> mm/shmem.c | 6 ++++++
>>>>>> 1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>>>>> index 4b2fea33158e..a4ad92a16968 100644
>>>>>> --- a/mm/shmem.c
>>>>>> +++ b/mm/shmem.c
>>>>>> @@ -2179,6 +2179,11 @@ unsigned long shmem_get_unmapped_area(struct file *file,
>>>>>> return inflated_addr;
>>>>>> }
>>>>>> +static long shmem_oom_badness(struct file *file)
>>>>>> +{
>>>>>> + return i_size_read(file_inode(file)) >> PAGE_SHIFT;
>>>>>> +}
>>>>> This doesn't really represent the in memory size of the file, does it?
>>>> Well, the file could be partially or fully swapped out as anonymous memory, or
>>>> the address space only sparsely populated, but even then just using the file
>>>> size as OOM badness sounded like the most straightforward approach to me.
>>> It covers holes as well, right?
>> Yes, exactly.
> So let's say I have a huge sparse shmem file. I will get killed because
> the oom_badness of such a file would be large as well...

Yes, correct. But offhand I don't see how we could improve that accounting.

>>>> What could happen is that the file is also mmapped and we double account.
>>>>
>>>>> Also the memcg oom handling could be considerably skewed if the file was
>>>>> shared between more memcgs.
>>>> Yes, and that's one of the reasons why I didn't touch the memcg with this
>>>> and only affected the classic OOM killer.
>>> oom_badness is for all oom handlers, including memcg. Maybe I have
>>> misread an earlier patch but I do not see anything specific to global
>>> oom handling.
>> As far as I can see the oom_badness() function is only used in
>> oom_kill.c and in procfs to return the oom score. Did I miss
>> something?
> oom_kill.c implements most of the oom killer functionality. Memcg oom
> killing is a part of that. Have a look at select_bad_process.

Ah! So mem_cgroup_scan_tasks() calls oom_evaluate_task for each task in the control group.

Thanks for pointing that out, that was absolutely not obvious to me.

Is that a show stopper? How should we address this?

Christian.
Michal Hocko
2022-Jun-09 15:07 UTC
[Nouveau] [PATCH 03/13] mm: shmem: provide oom badness for shmem files
On Thu 09-06-22 16:29:46, Christian König wrote:
[...]
> Is that a show stopper? How should we address this?

This is a hard problem to deal with and I am not sure this simple
solution is really a good fit. Not only because of the memcg side of
things. I have my doubts that sparse files handling is ok as well.

I do realize this is a long term problem and there is a demand for some
solution at least. I am not sure how to deal with shared resources
myself. The best approximation I can come up with is to limit the scope
of the damage into a memcg context. One idea I was playing with (but
never convinced myself it is really worth it) is to allow a new mode of
the oom victim selection for the global oom event. It would be an opt in
and the victim would be selected from the biggest leaf memcg (or kill
the whole memcg if it has group_oom configured).

That would address at least some of the accounting issue because charges
are better tracked than per process memory consumption. It is a crude
and ugly hack and it doesn't solve the underlying problem as shared
resources are not guaranteed to be freed when processes die but maybe it
would be just slightly better than the existing scheme which is clearly
lagging behind existing userspace.
--
Michal Hocko
SUSE Labs
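The opt-in selection mode described above can be sketched as a toy userspace model, assuming victim selection reduces to two steps: pick the leaf memcg with the largest charge, then either kill the whole group (when group_oom is set) or the highest-scoring task inside it. All types and functions here are hypothetical; the group_oom flag only stands in for the setting mentioned in the message, and nothing here reflects actual kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task record: pid plus a per-task badness score. */
struct toy_task {
	int pid;
	long badness;
};

/* Hypothetical leaf memory cgroup: its total charge and its tasks. */
struct toy_leaf_memcg {
	const char *name;
	long charged_pages;           /* pages charged to this memcg */
	bool group_oom;               /* stands in for the "group_oom" setting */
	const struct toy_task *tasks;
	size_t ntasks;
};

/* Outcome: the chosen memcg, and either the whole group or one pid. */
struct toy_victim {
	const struct toy_leaf_memcg *memcg; /* NULL if there are no leaves */
	bool kill_group;                    /* true: kill every task in memcg */
	int pid;                            /* -1 if no single task was chosen */
};

static struct toy_victim toy_select_victim(const struct toy_leaf_memcg *leaves, size_t n)
{
	struct toy_victim v = { NULL, false, -1 };
	const struct toy_leaf_memcg *biggest = NULL;

	/* Step 1: the leaf memcg with the largest charge is the target. */
	for (size_t i = 0; i < n; i++)
		if (biggest == NULL || leaves[i].charged_pages > biggest->charged_pages)
			biggest = &leaves[i];
	if (biggest == NULL)
		return v;
	v.memcg = biggest;

	/* Step 2: with group_oom configured, the whole memcg is the victim. */
	if (biggest->group_oom) {
		v.kill_group = true;
		return v;
	}

	/* Step 3: otherwise pick the task with the highest badness inside it. */
	long best = -1;
	for (size_t i = 0; i < biggest->ntasks; i++) {
		if (biggest->tasks[i].badness > best) {
			best = biggest->tasks[i].badness;
			v.pid = biggest->tasks[i].pid;
		}
	}
	return v;
}
```

In this model a task can carry the highest per-task score system-wide and still survive, because the memcg charge, not per-process accounting, decides where the kill happens. That is the property the idea relies on, since charges are tracked more reliably than per-process consumption.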
Christian König
2022-Jun-10 10:58 UTC
[Nouveau] [PATCH 03/13] mm: shmem: provide oom badness for shmem files
Am 09.06.22 um 17:07 schrieb Michal Hocko:
> On Thu 09-06-22 16:29:46, Christian König wrote:
> [...]
>> Is that a show stopper? How should we address this?
> This is a hard problem to deal with and I am not sure this simple
> solution is really a good fit. Not only because of the memcg side of
> things. I have my doubts that sparse files handling is ok as well.

Well, I didn't claim that this would be easy, we just need to start somewhere.

Regarding the sparse file handling, how about using file->f_mapping->nrpages as badness for shmem files?

That should give us the real number of pages allocated through this shmem file and handle sparse files gracefully.

> I do realize this is a long term problem and there is a demand for some
> solution at least. I am not sure how to deal with shared resources
> myself. The best approximation I can come up with is to limit the scope
> of the damage into a memcg context. One idea I was playing with (but
> never convinced myself it is really worth it) is to allow a new mode of
> the oom victim selection for the global oom event. It would be an opt in
> and the victim would be selected from the biggest leaf memcg (or kill
> the whole memcg if it has group_oom configured).
>
> That would address at least some of the accounting issue because charges
> are better tracked than per process memory consumption. It is a crude
> and ugly hack and it doesn't solve the underlying problem as shared
> resources are not guaranteed to be freed when processes die but maybe it
> would be just slightly better than the existing scheme which is clearly
> lagging behind existing userspace.

Well, what is so bad about the approach of giving each process holding a reference to some shared memory its equal amount of badness, even when the processes belong to different memory control groups?

If you really think that this would be a hard problem for upstreaming, we could as well keep the behavior for memcg as it is for now. We would just need to adjust the parameters to oom_badness() a bit.

Regards,
Christian.
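The "equal amount of badness" proposal can be modelled in a few lines, assuming a process's score is its private pages plus the full populated-page count of every shmem file it references. The types and function are hypothetical; toy_file.nrpages merely stands in for the file->f_mapping->nrpages value suggested above, and how that value would be passed through the adjusted oom_badness() parameters is not specified in the thread, so it is not shown.

```c
#include <stddef.h>

/* Hypothetical shmem file reduced to its populated-page count
 * (a stand-in for file->f_mapping->nrpages). */
struct toy_file {
	long nrpages;
};

/* Hypothetical process: private pages plus references to shmem files. */
struct toy_process {
	long rss_pages;                /* memory already accounted per process */
	const struct toy_file **files; /* shmem files this process references */
	size_t nfiles;
};

/*
 * Every referencing process is charged the file's full nrpages;
 * the shared pages are not split between the sharers.
 */
static long toy_process_badness(const struct toy_process *p)
{
	long points = p->rss_pages;

	for (size_t i = 0; i < p->nfiles; i++)
		points += p->files[i]->nrpages;
	return points;
}
```

With two processes sharing a 1000-page file, each one's score includes all 1000 pages regardless of which memcg it belongs to. If those pages were also mapped and already counted in rss_pages, they would be counted twice, which is the double-accounting case raised earlier in the thread.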