David Hildenbrand
2017-Jun-20 16:49 UTC
[PATCH v11 4/6] mm: function to offer a page block on the free list
On 20.06.2017 18:44, Rik van Riel wrote:
> On Mon, 2017-06-12 at 07:10 -0700, Dave Hansen wrote:
>
>> The hypervisor is going to throw away the contents of these pages,
>> right? As soon as the spinlock is released, someone can allocate a
>> page, and put good data in it. What keeps the hypervisor from
>> throwing away good data?
>
> That looks like it may be the wrong API, then?
>
> We already have hooks called arch_free_page and
> arch_alloc_page in the VM, which are called when
> pages are freed, and allocated, respectively.
>
> Nitesh Lal (on the CC list) is working on a way
> to efficiently batch recently freed pages for
> free page hinting to the hypervisor.
>
> If that is done efficiently enough (eg. with
> MADV_FREE on the hypervisor side for lazy freeing,
> and lazy later re-use of the pages), do we still
> need the harder to use batch interface from this
> patch?
>

David's opinion incoming:

No, I think proper free page hinting would be the optimum solution, if
done right. This would avoid the batch interface and even turn
virtio-balloon in some sense useless.

--

Thanks,

David
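Rik's point about lazy freeing rests on the MADV_FREE semantics documented in madvise(2): the kernel may reclaim such pages only later (for example under memory pressure), and a write to a page that has not been reclaimed yet cancels the pending free, so data written after the hint is always seen. The sketch below illustrates that property under stated assumptions: Python 3.8+ on Linux 4.5+, using the standard `mmap` module, with one private anonymous page in an ordinary process standing in for a guest page in the host; the function name is hypothetical. It does not close the window Dave describes: a guest write that lands after the "still free?" check but before the host's madvise() call is exactly the case arch_alloc_page would have to prevent.

```python
import mmap

PAGE = mmap.PAGESIZE


def write_after_madv_free_survives() -> bool:
    """Hint one anonymous page with MADV_FREE, then write new data to it.

    Returns True if the data written after the hint is intact.
    Raises OSError if MADV_FREE is not supported here.
    """
    if not hasattr(mmap, "MADV_FREE"):
        raise OSError("MADV_FREE is not available on this platform")
    # fileno=-1 maps anonymous memory; MADV_FREE requires a *private* mapping
    with mmap.mmap(-1, PAGE, flags=mmap.MAP_PRIVATE) as page:
        page[:] = b"A" * PAGE           # stale contents nobody needs any more
        page.madvise(mmap.MADV_FREE)    # "may be discarded lazily"
        page[:] = b"B" * PAGE           # page reused: good data written after the hint
        # A write cancels a pending free; if the page was already reclaimed,
        # the write lands on a fresh zero-filled page. Either way, "B" remains.
        return page[:] == b"B" * PAGE
```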
Rik van Riel
2017-Jun-20 17:29 UTC
[PATCH v11 4/6] mm: function to offer a page block on the free list
On Tue, 2017-06-20 at 18:49 +0200, David Hildenbrand wrote:
> On 20.06.2017 18:44, Rik van Riel wrote:
> > Nitesh Lal (on the CC list) is working on a way
> > to efficiently batch recently freed pages for
> > free page hinting to the hypervisor.
> >
> > If that is done efficiently enough (eg. with
> > MADV_FREE on the hypervisor side for lazy freeing,
> > and lazy later re-use of the pages), do we still
> > need the harder to use batch interface from this
> > patch?
> >
>
> David's opinion incoming:
>
> No, I think proper free page hinting would be the optimum solution, if
> done right. This would avoid the batch interface and even turn
> virtio-balloon in some sense useless.

I agree with that. Let me go into some more detail of
what Nitesh is implementing:

1) In arch_free_page, the being-freed page is added
   to a per-cpu set of freed pages.
2) Once that set is full, arch_free_page goes into a
   slow path, which:
   2a) Iterates over the set of freed pages, and
   2b) Checks whether they are still free, and
   2c) Adds the still free pages to a list that is
       to be passed to the hypervisor, to be MADV_FREEd.
   2d) Makes that hypercall.

Meanwhile all arch_alloc_page has to do is make sure it
does not allocate a page while it is currently being
MADV_FREEd on the hypervisor side.

The code Wei is working on looks like it could be
suitable for steps (2c) and (2d) above. Nitesh already
has code for steps 1 through 2b.

--
All rights reversed
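The steps Rik lists can be modelled in a few lines. This is a single-threaded toy, not Nitesh's code, and it rests on several assumptions: `BATCH_SIZE`, `GuestModel` and its methods are hypothetical names; one list stands in for the per-CPU set; allocation and the arch_alloc_page check are folded into a single method; and the hypercall is modelled as asynchronous (it completes when `host_completed()` is called) so that the window arch_alloc_page has to guard becomes visible.

```python
BATCH_SIZE = 4  # hypothetical capacity of one CPU's set of freed pages


class GuestModel:
    """Single-threaded toy model of batched free page hinting (no locking, one CPU)."""

    def __init__(self):
        self.free = set()        # PFNs the guest allocator considers free
        self.batch = []          # stand-in for the per-CPU set of freed pages
        self.in_flight = set()   # PFNs the host may currently be MADV_FREEing
        self.hinted = []         # each list handed to the simulated hypercall

    def free_page(self, pfn):
        """Plays the role of the arch_free_page hook."""
        self.free.add(pfn)
        self.batch.append(pfn)               # step 1: remember the freed page
        if len(self.batch) >= BATCH_SIZE:    # step 2: set is full, take the slow path
            self._slow_path()

    def _slow_path(self):
        batch, self.batch = self.batch, []
        # 2a/2b: walk the set (deduplicated, order kept), keep pages still free
        still_free = [pfn for pfn in dict.fromkeys(batch) if pfn in self.free]
        if not still_free:
            return
        # 2c: these pages are now handed over to the host
        self.in_flight.update(still_free)
        # 2d: hypothetical hypercall, modelled as completing later
        self.hinted.append(still_free)

    def host_completed(self):
        """The host has finished MADV_FREEing everything it was given."""
        self.in_flight.clear()

    def alloc_page(self, pfn):
        """Allocation plus the arch_alloc_page check, folded together."""
        if pfn in self.in_flight:
            raise RuntimeError(f"pfn {pfn} is being MADV_FREEd; real code would wait")
        self.free.discard(pfn)
```

For example, freeing PFNs 10, 11 and 12, reallocating 11, and then freeing 13 fills the batch, and only [10, 12, 13] is hinted, because step 2b drops the reused page. Allocating 12 before `host_completed()` raises, which marks the spot where a real implementation would have to wait or pick another page.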
Michael S. Tsirkin
2017-Jun-20 18:17 UTC
[PATCH v11 4/6] mm: function to offer a page block on the free list
On Tue, Jun 20, 2017 at 06:49:33PM +0200, David Hildenbrand wrote:
> On 20.06.2017 18:44, Rik van Riel wrote:
> > On Mon, 2017-06-12 at 07:10 -0700, Dave Hansen wrote:
> >
> >> The hypervisor is going to throw away the contents of these pages,
> >> right? As soon as the spinlock is released, someone can allocate a
> >> page, and put good data in it. What keeps the hypervisor from
> >> throwing away good data?
> >
> > That looks like it may be the wrong API, then?
> >
> > We already have hooks called arch_free_page and
> > arch_alloc_page in the VM, which are called when
> > pages are freed, and allocated, respectively.
> >
> > Nitesh Lal (on the CC list) is working on a way
> > to efficiently batch recently freed pages for
> > free page hinting to the hypervisor.
> >
> > If that is done efficiently enough (eg. with
> > MADV_FREE on the hypervisor side for lazy freeing,
> > and lazy later re-use of the pages), do we still
> > need the harder to use batch interface from this
> > patch?
> >
> David's opinion incoming:
>
> No, I think proper free page hinting would be the optimum solution, if
> done right. This would avoid the batch interface and even turn
> virtio-balloon in some sense useless.

I agree generally. But we have to balance that against the fact that
this has been discussed since at least 2011 and no one has built this
solution yet.

> --
>
> Thanks,
>
> David
Michael S. Tsirkin
2017-Jun-20 18:26 UTC
[PATCH v11 4/6] mm: function to offer a page block on the free list
On Tue, Jun 20, 2017 at 01:29:00PM -0400, Rik van Riel wrote:
> On Tue, 2017-06-20 at 18:49 +0200, David Hildenbrand wrote:
> > On 20.06.2017 18:44, Rik van Riel wrote:
> >
> > > Nitesh Lal (on the CC list) is working on a way
> > > to efficiently batch recently freed pages for
> > > free page hinting to the hypervisor.
> > >
> > > If that is done efficiently enough (eg. with
> > > MADV_FREE on the hypervisor side for lazy freeing,
> > > and lazy later re-use of the pages), do we still
> > > need the harder to use batch interface from this
> > > patch?
> > >
> >
> > David's opinion incoming:
> >
> > No, I think proper free page hinting would be the optimum solution, if
> > done right. This would avoid the batch interface and even turn
> > virtio-balloon in some sense useless.
>
> I agree with that. Let me go into some more detail of
> what Nitesh is implementing:
>
> 1) In arch_free_page, the being-freed page is added
>    to a per-cpu set of freed pages.
> 2) Once that set is full, arch_free_page goes into a
>    slow path, which:
>    2a) Iterates over the set of freed pages, and
>    2b) Checks whether they are still free, and
>    2c) Adds the still free pages to a list that is
>        to be passed to the hypervisor, to be MADV_FREEd.
>    2d) Makes that hypercall.
>
> Meanwhile all arch_alloc_page has to do is make sure it
> does not allocate a page while it is currently being
> MADV_FREEd on the hypervisor side.
>
> The code Wei is working on looks like it could be
> suitable for steps (2c) and (2d) above. Nitesh already
> has code for steps 1 through 2b.
>
> --
> All rights reversed

So my question is this: Wei posted these numbers for balloon
inflation times when inflating 7GB of an 8GB idle guest:

1) allocating pages (6.5%)
2) sending PFNs to host (68.3%)
3) address translation (6.1%)
4) madvise (19%)

It takes about 4126ms for the inflating process to complete.
It seems that this is an excessive amount of time to stay
under a lock.

What are your estimates for Nitesh's work?

--
MST
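To put Wei's percentages in absolute terms, the short calculation below multiplies each share by the 4126 ms total. The percentages are copied from the message; the per-phase milliseconds are derived from them, not measured, and because the quoted shares are rounded they add up to 99.9% rather than 100%.

```python
TOTAL_MS = 4126  # reported time to inflate 7GB of an 8GB idle guest

# Per-phase share of the total, in percent, as quoted in the message
PHASES = {
    "allocating pages": 6.5,
    "sending PFNs to host": 68.3,
    "address translation": 6.1,
    "madvise": 19.0,
}

# Derived absolute time per phase, in milliseconds
breakdown = {name: TOTAL_MS * pct / 100 for name, pct in PHASES.items()}

for name, ms in breakdown.items():
    print(f"{name:<22} {ms:7.1f} ms")
print(f"{'shares add up to':<22} {sum(PHASES.values()):.1f} %")
```

By this arithmetic, sending PFNs to the host accounts for roughly 2.8 s of the 4.1 s, and madvise for roughly 0.8 s.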
David Hildenbrand
2017-Jun-20 18:54 UTC
[PATCH v11 4/6] mm: function to offer a page block on the free list
On 20.06.2017 20:17, Michael S. Tsirkin wrote:
> On Tue, Jun 20, 2017 at 06:49:33PM +0200, David Hildenbrand wrote:
>> On 20.06.2017 18:44, Rik van Riel wrote:
>>> On Mon, 2017-06-12 at 07:10 -0700, Dave Hansen wrote:
>>>
>>>> The hypervisor is going to throw away the contents of these pages,
>>>> right? As soon as the spinlock is released, someone can allocate a
>>>> page, and put good data in it. What keeps the hypervisor from
>>>> throwing away good data?
>>>
>>> That looks like it may be the wrong API, then?
>>>
>>> We already have hooks called arch_free_page and
>>> arch_alloc_page in the VM, which are called when
>>> pages are freed, and allocated, respectively.
>>>
>>> Nitesh Lal (on the CC list) is working on a way
>>> to efficiently batch recently freed pages for
>>> free page hinting to the hypervisor.
>>>
>>> If that is done efficiently enough (eg. with
>>> MADV_FREE on the hypervisor side for lazy freeing,
>>> and lazy later re-use of the pages), do we still
>>> need the harder to use batch interface from this
>>> patch?
>>>
>> David's opinion incoming:
>>
>> No, I think proper free page hinting would be the optimum solution, if
>> done right. This would avoid the batch interface and even turn
>> virtio-balloon in some sense useless.
>
> I agree generally. But we have to balance that against the fact that
> this has been discussed since at least 2011 and no one has built this
> solution yet.

I totally agree, and I still think it will be hard to get decent
performance for free page hinting (let's call it challenging). But I
have heard of some interesting ideas. Surprise me.

Still, I would favor such an interface over an mm interface where
people start asking the same question over and over again ("how can
this even work"). Not only because it wasn't explained sufficiently,
but also because this interface is so specific to one use case and one
scenario (concurrent dirty tracking in the host during migration).
IMHO, even simply writing all-zeros to all free pages before starting
migration (or even when freeing a page) would be a cleaner interface
than this, because it atomically works with the entity the host cares
about for migration. But yes, performance is horrible; that's why I am
not even suggesting it. I'm just saying that this mm interface is very,
very special, and if we could find something better, I'd favor it.

--

Thanks,

David
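David's alternative can be sketched as two halves: the guest overwrites every free page with zeros, and the host's migration pass treats all-zero pages as cheap (QEMU's RAM migration, for instance, sends such pages as a short marker rather than a full page). The toy below is an illustration under assumptions, not anyone's proposed code: guest memory is a Python list of `bytes` pages, and `PAGE_SIZE`, `zero_free_pages` and `pages_to_send` are hypothetical names. Because the "this page is free" information lives in the page contents themselves, a page that is reallocated and written afterwards simply has non-zero contents again and is sent like any other page (in a real migration it would also be dirtied and re-sent), which is one way to read "atomically works with the entity the host cares about".

```python
PAGE_SIZE = 4096             # assumed page size for the model
ZERO_PAGE = bytes(PAGE_SIZE)


def zero_free_pages(memory, free_pfns):
    """Guest side: overwrite every free page with zeros before migration."""
    for pfn in free_pfns:
        memory[pfn] = ZERO_PAGE


def pages_to_send(memory):
    """Host side: PFNs whose full contents must be transferred.

    All-zero pages are skipped here; a real host would still send a marker.
    """
    return [pfn for pfn, data in enumerate(memory) if data != ZERO_PAGE]
```

The catch is the one David names: the guest has to touch every free page, which is exactly the performance cost that rules this approach out.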