thr3ads.net - Linux Virtualization - [PATCH v35 1/5] mm: support to get hints of free page blocks [Jul 2018]

If this information is useful, please help other people find it:
Share via:

Linus Torvalds

2018-Jul-11 01:44 UTC

[PATCH v35 1/5] mm: support to get hints of free page blocks

On Tue, Jul 10, 2018 at 6:24 PM Wei Wang <wei.w.wang at intel.com>
wrote:>
> We only get addresses of the "MAX_ORDER-1" blocks into the array.
The
> max size of the array that could be allocated by kmalloc is
> KMALLOC_MAX_SIZE (i.e. 4MB on x86). With that max array, we could load
> "4MB / sizeof(u64)" addresses of "MAX_ORDER-1" blocks,
that is, 2TB free
> memory at most. We thought about removing that 2TB limitation by passing
> in multiple such max arrays (a list of them).
No.

Stop this already./

You're doing everthing wrong.

If the array has to describe *all* memory you will ever free, then you
have already lost.

Just do it in chunks.

I don't want the VM code to even fill in that big of an array anyway -
this all happens under the zone lock, and you're walking a list that
is bad for caching anyway.

So plan on an interface that allows _incremental_ freeing, because any
plan that starts with "I worry that maybe two TERABYTES of memory
isn't big enough" is so broken that it's laughable.

That was what I tried to encourage with actually removing the pages
form the page list. That would be an _incremental_ interface. You can
remove MAX_ORDER-1 pages one by one (or a hundred at a time), and mark
them free for ballooning that way. And if you still feel you have tons
of free memory, just continue removing more pages from the free list.

Notice? Incremental. Not "I want to have a crazy array that is enough
to hold 2TB at one time".

So here's the rule:

 - make it a simple array interface

 - make the array *small*. Not megabytes. Kilobytes. Because if you're
filling in megabytes worth of free pointers while holding the zone
lock, you're doing something wrong.

 - design the interface so that you do not *need* to have this crazy
"all or nothing" approach.

See what I'm trying to push for. Think "low latency". Think
"small
arrays". Think "simple and straightforward interfaces".

At no point should you ever worry about "2TB". Never.

           Linus

Michal Hocko

2018-Jul-11 09:21 UTC

head link

[PATCH v35 1/5] mm: support to get hints of free page blocks

On Tue 10-07-18 18:44:34, Linus Torvalds wrote:
[...]> That was what I tried to encourage with actually removing the pages
> form the page list. That would be an _incremental_ interface. You can
> remove MAX_ORDER-1 pages one by one (or a hundred at a time), and mark
> them free for ballooning that way. And if you still feel you have tons
> of free memory, just continue removing more pages from the free list.
We already have an interface for that. alloc_pages(GFP_NOWAIT, MAX_ORDER -1).
So why do we need any array based interface?
-- 
Michal Hocko
SUSE Labs

Wei Wang

2018-Jul-11 10:52 UTC

head link

[PATCH v35 1/5] mm: support to get hints of free page blocks

On 07/11/2018 05:21 PM, Michal Hocko wrote:> On Tue 10-07-18 18:44:34, Linus Torvalds wrote:
> [...]
>> That was what I tried to encourage with actually removing the pages
>> form the page list. That would be an _incremental_ interface. You can
>> remove MAX_ORDER-1 pages one by one (or a hundred at a time), and mark
>> them free for ballooning that way. And if you still feel you have tons
>> of free memory, just continue removing more pages from the free list.
> We already have an interface for that. alloc_pages(GFP_NOWAIT, MAX_ORDER
-1).
> So why do we need any array based interface?
Yes, I'm trying to get free pages directly via alloc_pages, so there 
will be no new mm APIs.
I plan to let free page allocation stop when the remaining system free 
memory becomes close to min_free_kbytes (prevent swapping).


Best,
Wei

Linus Torvalds

2018-Jul-11 16:23 UTC

head link

[PATCH v35 1/5] mm: support to get hints of free page blocks

On Wed, Jul 11, 2018 at 2:21 AM Michal Hocko <mhocko at kernel.org>
wrote:>
> We already have an interface for that. alloc_pages(GFP_NOWAIT, MAX_ORDER
-1).
> So why do we need any array based interface?
That was actually my original argument in the original thread - that
the only new interface people might want is one that just tells how
many of those MAX_ORDER-1 pages there are.

See the thread in v33 with the subject

  "[PATCH v33 1/4] mm: add a function to get free page blocks"

and look for me suggesting just using

    #define GFP_MINFLAGS (__GFP_NORETRY | __GFP_NOWARN |
__GFP_THISNODE | __GFP_NOMEMALLOC)

    struct page *page =  alloc_pages(GFP_MINFLAGS, MAX_ORDER-1);

for this all.

But I could also see an argument for "allocate N pages of size
MAX_ORDER-1", with some small N, simply because I can see the
advantage of not taking and releasing the locking and looking up the
zone individually N times.

If you want to get gigabytes of memory (or terabytes), doing it in
bigger chunks than one single maximum-sized page sounds fairly
reasonable.

I just don't think that "thousands of pages" is reasonable. But
"tens
of max-sized pages" sounds fair enough to me, and it would certainly
not be a pain for the VM.

So I'm open to new interfaces. I just want those new interfaces to
make sense, and be low latency and simple for the VM to do. I'm
objecting to the incredibly baroque and heavy-weight one that can
return near-infinite amounts of memory.

The real advantage of jjuist the existing "alloc_pages()" model is
that I think the ballooning people can use that to *test* things out.
If it turns out that taking and releasing the VM locks is a big cost,
we can see if a batch interface that allows you to get tens of pages
at the same time is worth it.

So yes, I'd suggest starting with just the existing alloc_pages. Maybe
it's not enough, but it should be good enough for testing.

                    Linus

Maybe Matching Threads

Search for more possibly parallel threads

Linux Virtualization - Jul 2018 - [PATCH v35 1/5] mm: support to get hints of free page blocks

[PATCH v35 1/5] mm: support to get hints of free page blocks

[PATCH v35 1/5] mm: support to get hints of free page blocks

[PATCH v35 1/5] mm: support to get hints of free page blocks

[PATCH v35 1/5] mm: support to get hints of free page blocks

Maybe Matching Threads