Linus Torvalds
2018-Jul-10 17:33 UTC
[PATCH v35 1/5] mm: support to get hints of free page blocks
NAK.

On Tue, Jul 10, 2018 at 2:56 AM Wei Wang <wei.w.wang at intel.com> wrote:
> +
> +	buf_page = list_first_entry_or_null(pages, struct page, lru);
> +	if (!buf_page)
> +		return -EINVAL;
> +	buf = (__le64 *)page_address(buf_page);

Stop this garbage.

Why the hell would you pass in some crazy "list of pages" that uses that lru list?

That's just insane shit.

Just pass in an array to fill in. No idiotic games like this with odd list entries (what's the locking?) and crazy casting to

So if you want an array of page addresses, pass that in as such. If you want to do it in a page, do it with

	u64 *array = page_address(page);
	int nr = PAGE_SIZE / sizeof(u64);

and now you pass that array in to the thing. None of this completely insane crazy crap interfaces.

Plus, I still haven't heard an explanation for why you want so many pages in the first place, and why you want anything but MAX_ORDER-1.

So no. This kind of unnecessarily complex code with completely insane calling interfaces does not make it into the VM layer.

Maybe that crazy "let's pass a chain of pages that uses the lru list" makes sense to the virtio-balloon code. But you need to understand that it makes ZERO conceptual sense to anybody else. And the core VM code is about a million times more important than the balloon code in this case, so you had better make the interface make sense to *it*.

Linus
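For readers following along, the "pass in an array" shape asked for above might look roughly like the userspace sketch below. It is not code from the patch series: `fill_free_block_hints()`, `hints_into_page()` and `SKETCH_PAGE_SIZE` are hypothetical names, the 4 KiB page size is an assumption, and a plain buffer stands in for what `page_address(page)` would return in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/*
 * Hypothetical filler: writes at most nr block addresses into array and
 * returns how many were written. The callee only sees (array, nr); it
 * does not care where the memory came from.
 */
size_t fill_free_block_hints(uint64_t *array, size_t nr,
			     const uint64_t *free_blocks, size_t nr_free)
{
	size_t n = nr_free < nr ? nr_free : nr;

	for (size_t i = 0; i < n; i++)
		array[i] = free_blocks[i];
	return n;
}

/*
 * Caller side: treat one page-sized buffer as the array, mirroring
 * "u64 *array = page_address(page); int nr = PAGE_SIZE / sizeof(u64);".
 */
size_t hints_into_page(void *page_buf,
		       const uint64_t *free_blocks, size_t nr_free)
{
	uint64_t *array = page_buf;
	size_t nr = SKETCH_PAGE_SIZE / sizeof(uint64_t);

	assert(page_buf != NULL);
	return fill_free_block_hints(array, nr, free_blocks, nr_free);
}
```

The point of the shape is that the buffer's origin (a page, a kmalloc'ed array, a stack array) stays the caller's business, so the filler needs no list walking, no casting through `struct page`, and no locking rules for the buffer itself.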
Wei Wang
2018-Jul-11 01:28 UTC
[PATCH v35 1/5] mm: support to get hints of free page blocks
On 07/11/2018 01:33 AM, Linus Torvalds wrote:
> NAK.
>
> On Tue, Jul 10, 2018 at 2:56 AM Wei Wang <wei.w.wang at intel.com> wrote:
>> +
>> +	buf_page = list_first_entry_or_null(pages, struct page, lru);
>> +	if (!buf_page)
>> +		return -EINVAL;
>> +	buf = (__le64 *)page_address(buf_page);
>
> Stop this garbage.
>
> Why the hell would you pass in some crazy "list of pages" that uses
> that lru list?
>
> That's just insane shit.
>
> Just pass in an array to fill in. No idiotic games like this with
> odd list entries (what's the locking?) and crazy casting to
>
> So if you want an array of page addresses, pass that in as such. If
> you want to do it in a page, do it with
>
>	u64 *array = page_address(page);
>	int nr = PAGE_SIZE / sizeof(u64);
>
> and now you pass that array in to the thing. None of this completely
> insane crazy crap interfaces.
>
> Plus, I still haven't heard an explanation for why you want so many
> pages in the first place, and why you want anything but MAX_ORDER-1.

Sorry for missing that explanation.

We only get addresses of the "MAX_ORDER-1" blocks into the array. The max size of the array that could be allocated by kmalloc is KMALLOC_MAX_SIZE (i.e. 4MB on x86). With that max array, we could load "4MB / sizeof(u64)" addresses of "MAX_ORDER-1" blocks, that is, 2TB of free memory at most.

We thought about removing that 2TB limitation by passing in multiple such max arrays (a list of them). But 2TB has been enough for our use cases so far, and I agree it would be better to have a simpler API in the first place.

So I plan to go back to the previous version of just passing in one simple array (https://lkml.org/lkml/2018/6/15/21) if there are no objections.

Best,
Wei
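The 2TB figure above can be checked with a few lines of arithmetic. The sketch below assumes 4 KiB pages and MAX_ORDER = 11 (so a "MAX_ORDER-1" block is 4 MiB), which matches x86 at the time but is hard-coded here rather than taken from kernel headers; all `SKETCH_*` names and the helper functions are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, not read from kernel headers. */
#define SKETCH_PAGE_SIZE	4096ULL		/* 4 KiB base page */
#define SKETCH_MAX_ORDER	11		/* so MAX_ORDER-1 == 10 */
#define SKETCH_KMALLOC_MAX_SIZE	(4ULL << 20)	/* 4 MiB, as stated above */

/* Bytes covered by one MAX_ORDER-1 block: 4 KiB << 10 = 4 MiB. */
uint64_t block_bytes(void)
{
	return SKETCH_PAGE_SIZE << (SKETCH_MAX_ORDER - 1);
}

/* Number of u64 block addresses that fit in array_bytes bytes. */
uint64_t array_entries(uint64_t array_bytes)
{
	return array_bytes / sizeof(uint64_t);
}

/* Free memory that one array of array_bytes bytes can describe. */
uint64_t describable_bytes(uint64_t array_bytes)
{
	uint64_t entries = array_entries(array_bytes);

	/* Guard against overflow in the multiplication below. */
	assert(entries <= UINT64_MAX / block_bytes());
	return entries * block_bytes();
}
```

With these assumptions, a 4 MiB array holds 524288 addresses, and 524288 blocks of 4 MiB each come to 2 TiB. For comparison, a single 4 KiB array holds 512 addresses and so describes 2 GiB per call.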
Linus Torvalds
2018-Jul-11 01:44 UTC
[PATCH v35 1/5] mm: support to get hints of free page blocks
On Tue, Jul 10, 2018 at 6:24 PM Wei Wang <wei.w.wang at intel.com> wrote:
>
> We only get addresses of the "MAX_ORDER-1" blocks into the array. The
> max size of the array that could be allocated by kmalloc is
> KMALLOC_MAX_SIZE (i.e. 4MB on x86). With that max array, we could load
> "4MB / sizeof(u64)" addresses of "MAX_ORDER-1" blocks, that is, 2TB free
> memory at most. We thought about removing that 2TB limitation by passing
> in multiple such max arrays (a list of them).

No. Stop this already.

You're doing everything wrong.

If the array has to describe *all* memory you will ever free, then you have already lost.

Just do it in chunks.

I don't want the VM code to even fill in that big of an array anyway - this all happens under the zone lock, and you're walking a list that is bad for caching anyway.

So plan on an interface that allows _incremental_ freeing, because any plan that starts with "I worry that maybe two TERABYTES of memory isn't big enough" is so broken that it's laughable.

That was what I tried to encourage with actually removing the pages from the page list. That would be an _incremental_ interface. You can remove MAX_ORDER-1 pages one by one (or a hundred at a time), and mark them free for ballooning that way. And if you still feel you have tons of free memory, just continue removing more pages from the free list.

Notice? Incremental. Not "I want to have a crazy array that is enough to hold 2TB at one time".

So here's the rule:

 - make it a simple array interface

 - make the array *small*. Not megabytes. Kilobytes. Because if you're filling in megabytes worth of free pointers while holding the zone lock, you're doing something wrong.

 - design the interface so that you do not *need* to have this crazy "all or nothing" approach.

See what I'm trying to push for. Think "low latency". Think "small arrays". Think "simple and straightforward interfaces".

At no point should you ever worry about "2TB". Never.

Linus
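One possible reading of the three rules above, as a userspace sketch: a small fixed-size batch array, a call that takes at most that many blocks off a list, and a caller loop that stops whenever it has had enough. Everything here is hypothetical (`struct toy_free_list`, `take_free_blocks()`, `report_incrementally()`, `HINT_BATCH`); in the kernel, the body of `take_free_blocks()` would correspond to the short section run under the zone lock, which this toy does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HINT_BATCH 64	/* 64 * 8 bytes = 512 bytes of array */

/* Toy stand-in for a zone's MAX_ORDER-1 free list. */
struct toy_free_list {
	const uint64_t *blocks;	/* addresses of free blocks */
	size_t nr;		/* total number of blocks */
	size_t next;		/* first block not yet taken */
};

/*
 * Take up to nr blocks off the list into array and return the number
 * taken. The work is bounded by nr, so it stays short however much
 * memory is free.
 */
size_t take_free_blocks(struct toy_free_list *fl, uint64_t *array, size_t nr)
{
	size_t n = 0;

	assert(fl != NULL && array != NULL);
	while (n < nr && fl->next < fl->nr)
		array[n++] = fl->blocks[fl->next++];
	return n;
}

/*
 * Caller loop: keep taking small batches until max_blocks have been
 * reported or the list runs dry. Returns the total number reported.
 */
size_t report_incrementally(struct toy_free_list *fl, size_t max_blocks)
{
	uint64_t batch[HINT_BATCH];
	size_t total = 0;

	while (total < max_blocks) {
		size_t want = max_blocks - total;
		size_t got;

		if (want > HINT_BATCH)
			want = HINT_BATCH;
		got = take_free_blocks(fl, batch, want);
		if (got == 0)
			break;
		/* A balloon driver would hand batch[0..got) to the host here. */
		total += got;
	}
	return total;
}
```

Because each call is bounded by the batch size, the total amount of free memory never enters the interface: a caller that wants more simply calls again.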
Michael S. Tsirkin
2018-Jul-11 04:00 UTC
[PATCH v35 1/5] mm: support to get hints of free page blocks
On Tue, Jul 10, 2018 at 10:33:08AM -0700, Linus Torvalds wrote:
> NAK.
>
> On Tue, Jul 10, 2018 at 2:56 AM Wei Wang <wei.w.wang at intel.com> wrote:
> >
> > +
> > +	buf_page = list_first_entry_or_null(pages, struct page, lru);
> > +	if (!buf_page)
> > +		return -EINVAL;
> > +	buf = (__le64 *)page_address(buf_page);
>
> Stop this garbage.
>
> Why the hell would you pass in some crazy "list of pages" that uses
> that lru list?
>
> That's just insane shit.
>
> Just pass in an array to fill in.
> No idiotic games like this with
> odd list entries (what's the locking?) and crazy casting to
>
> So if you want an array of page addresses, pass that in as such. If
> you want to do it in a page, do it with
>
>	u64 *array = page_address(page);
>	int nr = PAGE_SIZE / sizeof(u64);
>
> and now you pass that array in to the thing. None of this completely
> insane crazy crap interfaces.

The question was raised of what to do if there are so many free MAX_ORDER pages that their addresses don't fit in a single MAX_ORDER page.

Yes, only a huge guest would trigger that, but it seems theoretically possible.

I guess an array of arrays then?

An alternative suggestion was not to pass an array at all, and instead peel enough pages off the list to contain all free entries. Maybe that's too hacky.

> Plus, I still haven't heard an explanation for why you want so many
> pages in the first place, and why you want anything but MAX_ORDER-1.
>
> So no. This kind of unnecessarily complex code with completely insane
> calling interfaces does not make it into the VM layer.
>
> Maybe that crazy "let's pass a chain of pages that uses the lru list"
> makes sense to the virtio-balloon code. But you need to understand
> that it makes ZERO conceptual sense to anybody else. And the core VM
> code is about a million times more important than the balloon code in
> this case, so you had better make the interface make sense to *it*.
>
> Linus
Michael S. Tsirkin
2018-Jul-11 04:04 UTC
[PATCH v35 1/5] mm: support to get hints of free page blocks
On Wed, Jul 11, 2018 at 07:00:37AM +0300, Michael S. Tsirkin wrote:
> On Tue, Jul 10, 2018 at 10:33:08AM -0700, Linus Torvalds wrote:
> > NAK.
> >
> > On Tue, Jul 10, 2018 at 2:56 AM Wei Wang <wei.w.wang at intel.com> wrote:
> > >
> > > +
> > > +	buf_page = list_first_entry_or_null(pages, struct page, lru);
> > > +	if (!buf_page)
> > > +		return -EINVAL;
> > > +	buf = (__le64 *)page_address(buf_page);
> >
> > Stop this garbage.
> >
> > Why the hell would you pass in some crazy "list of pages" that uses
> > that lru list?
> >
> > That's just insane shit.
> >
> > Just pass in an array to fill in.
> > No idiotic games like this with
> > odd list entries (what's the locking?) and crazy casting to
> >
> > So if you want an array of page addresses, pass that in as such. If
> > you want to do it in a page, do it with
> >
> >	u64 *array = page_address(page);
> >	int nr = PAGE_SIZE / sizeof(u64);
> >
> > and now you pass that array in to the thing. None of this completely
> > insane crazy crap interfaces.
>
> Question was raised what to do if there are so many free
> MAX_ORDER pages that their addresses don't fit in a single MAX_ORDER
> page.

Oh, you answered already; I spoke too soon. Never mind, please ignore me.