This patch is a revert of 'commit bb01b64cfab7 ("mm/balloon_compaction.c:
enqueue zero page to balloon device")'

Ballooned pages will be marked as MADV_DONTNEED by the hypervisor and
shouldn't be given to the host ksmd to scan. Therefore, it is not
necessary to zero ballooned pages, which is very time consuming when
the page amount is large. The ongoing fast balloon tests show that the
time to balloon 7G pages is increased from ~491ms to 2.8 seconds with
__GFP_ZERO added. So, this patch removes the flag.

Signed-off-by: Wei Wang <wei.w.wang at intel.com>
Cc: Michal Hocko <mhocko at kernel.org>
Cc: Michael S. Tsirkin <mst at redhat.com>
---
 mm/balloon_compaction.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index 9075aa5..b06d9fe 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -24,7 +24,7 @@ struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info)
 {
 	unsigned long flags;
 	struct page *page = alloc_page(balloon_mapping_gfp_mask() |
-				       __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_ZERO);
+				       __GFP_NOMEMALLOC | __GFP_NORETRY);
 	if (!page)
 		return NULL;

--
2.7.4
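
To put the quoted timings in per-page terms, here is a minimal sketch of the arithmetic. It assumes "7G" means 7 GiB and that the guest uses 4 KiB pages; neither is stated in the patch, and the helper name ns_per_page() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Average time per page, in nanoseconds, for a balloon operation that
 * took total_ms milliseconds to cover `bytes` of memory.
 * Assumption: the memory is handled in units of page_size bytes.
 */
static double ns_per_page(double total_ms, uint64_t bytes, uint64_t page_size)
{
	assert(page_size != 0 && bytes >= page_size);
	return total_ms * 1e6 / (double)(bytes / page_size);
}
```

Under those assumptions, 7 GiB is 1835008 pages, so ns_per_page(2800.0, 7ULL << 30, 4096) is roughly 1526 ns with __GFP_ZERO and ns_per_page(491.0, 7ULL << 30, 4096) is roughly 268 ns without it. The difference, about 1.26 us per page, is an estimate of the zeroing cost alone.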
On Thu, Aug 03, 2017 at 07:59:17PM +0800, Wei Wang wrote:
> This patch is a revert of 'commit bb01b64cfab7 ("mm/balloon_compaction.c:
> enqueue zero page to balloon device")'
>
> Ballooned pages will be marked as MADV_DONTNEED by the hypervisor and
> shouldn't be given to the host ksmd to scan. Therefore, it is not
> necessary to zero ballooned pages, which is very time consuming when
> the page amount is large. The ongoing fast balloon tests show that the
> time to balloon 7G pages is increased from ~491ms to 2.8 seconds with
> __GFP_ZERO added. So, this patch removes the flag.
>
> Signed-off-by: Wei Wang <wei.w.wang at intel.com>
> Cc: Michal Hocko <mhocko at kernel.org>
> Cc: Michael S. Tsirkin <mst at redhat.com>

Fixes: bb01b64cfab7 ("mm/balloon_compaction.c: enqueue zero page to balloon device")

Looks like hypervisor is better placed to zero these if it wants to.
If it can't for some reason, this change would need a feature bit
to avoid adding extra work for all guests.

Acked-by: Michael S. Tsirkin <mst at redhat.com>

> ---
>  mm/balloon_compaction.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> index 9075aa5..b06d9fe 100644
> --- a/mm/balloon_compaction.c
> +++ b/mm/balloon_compaction.c
> @@ -24,7 +24,7 @@ struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info)
>  {
>  	unsigned long flags;
>  	struct page *page = alloc_page(balloon_mapping_gfp_mask() |
> -				       __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_ZERO);
> +				       __GFP_NOMEMALLOC | __GFP_NORETRY);
>  	if (!page)
>  		return NULL;
>
> --
> 2.7.4
On Thu 03-08-17 19:59:17, Wei Wang wrote:
> This patch is a revert of 'commit bb01b64cfab7 ("mm/balloon_compaction.c:
> enqueue zero page to balloon device")'
>
> Ballooned pages will be marked as MADV_DONTNEED by the hypervisor and
> shouldn't be given to the host ksmd to scan.

I find MADV_DONTNEED reference still quite confusing. What do you think
about the following wording instead:
"
Zeroing balloon pages is rather time consuming, especially when a lot of
pages are in flight. E.g. 7GB worth of ballooned memory takes 2.8s with
__GFP_ZERO while it takes ~491ms without it. The original commit argued
that zeroing will help ksmd to merge these pages on the host but this
argument is assuming that the host actually marks balloon pages for ksm
which is not universally true. So we pay performance penalty for
something that even might not be used in the end which is wrong. The
host can zero out pages on its own when there is a need.
"

> Therefore, it is not
> necessary to zero ballooned pages, which is very time consuming when
> the page amount is large. The ongoing fast balloon tests show that the
> time to balloon 7G pages is increased from ~491ms to 2.8 seconds with
> __GFP_ZERO added. So, this patch removes the flag.

The only reason why unconditional zeroing makes some sense is the
data leak protection (guest doesn't want to leak potentially sensitive
data to a malicious guest). I am not sure such a threat applies here
though.

> Signed-off-by: Wei Wang <wei.w.wang at intel.com>
> Cc: Michal Hocko <mhocko at kernel.org>
> Cc: Michael S. Tsirkin <mst at redhat.com>

other than that
Acked-by: Michal Hocko <mhocko at suse.com>

> ---
>  mm/balloon_compaction.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> index 9075aa5..b06d9fe 100644
> --- a/mm/balloon_compaction.c
> +++ b/mm/balloon_compaction.c
> @@ -24,7 +24,7 @@ struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info)
>  {
>  	unsigned long flags;
>  	struct page *page = alloc_page(balloon_mapping_gfp_mask() |
> -				       __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_ZERO);
> +				       __GFP_NOMEMALLOC | __GFP_NORETRY);
>  	if (!page)
>  		return NULL;
>
> --
> 2.7.4

--
Michal Hocko
SUSE Labs
On 08/03/2017 08:54 PM, Michal Hocko wrote:
> On Thu 03-08-17 19:59:17, Wei Wang wrote:
>> This patch is a revert of 'commit bb01b64cfab7 ("mm/balloon_compaction.c:
>> enqueue zero page to balloon device")'
>>
>> Ballooned pages will be marked as MADV_DONTNEED by the hypervisor and
>> shouldn't be given to the host ksmd to scan.
> I find MADV_DONTNEED reference still quite confusing. What do you think
> about the following wording instead:
> "
> Zeroing balloon pages is rather time consuming, especially when a lot of
> pages are in flight. E.g. 7GB worth of ballooned memory takes 2.8s with
> __GFP_ZERO while it takes ~491ms without it. The original commit argued
> that zeroing will help ksmd to merge these pages on the host but this
> argument is assuming that the host actually marks balloon pages for ksm
> which is not universally true. So we pay performance penalty for
> something that even might not be used in the end which is wrong. The
> host can zero out pages on its own when there is a need.
> "

I think it looks good. Thanks.

>> Therefore, it is not
>> necessary to zero ballooned pages, which is very time consuming when
>> the page amount is large. The ongoing fast balloon tests show that the
>> time to balloon 7G pages is increased from ~491ms to 2.8 seconds with
>> __GFP_ZERO added. So, this patch removes the flag.
> The only reason why unconditional zeroing makes some sense is the
> data leak protection (guest doesn't want to leak potentially sensitive
> data to a malicious guest). I am not sure such a threat applies here
> though.

I think the unwashed contents left in the balloon pages (also free
pages) should be treated as non-confidential - if the guest application
has confidential content in its memory, the application itself should
zero that before giving back that memory to the guest kernel.

Best,
Wei
On 03.08.2017 13:59, Wei Wang wrote:
> This patch is a revert of 'commit bb01b64cfab7 ("mm/balloon_compaction.c:
> enqueue zero page to balloon device")'
>
> Ballooned pages will be marked as MADV_DONTNEED by the hypervisor and
> shouldn't be given to the host ksmd to scan. Therefore, it is not
> necessary to zero ballooned pages, which is very time consuming when
> the page amount is large. The ongoing fast balloon tests show that the
> time to balloon 7G pages is increased from ~491ms to 2.8 seconds with
> __GFP_ZERO added. So, this patch removes the flag.
>
> Signed-off-by: Wei Wang <wei.w.wang at intel.com>
> Cc: Michal Hocko <mhocko at kernel.org>
> Cc: Michael S. Tsirkin <mst at redhat.com>
> ---
>  mm/balloon_compaction.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> index 9075aa5..b06d9fe 100644
> --- a/mm/balloon_compaction.c
> +++ b/mm/balloon_compaction.c
> @@ -24,7 +24,7 @@ struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info)
>  {
>  	unsigned long flags;
>  	struct page *page = alloc_page(balloon_mapping_gfp_mask() |
> -				       __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_ZERO);
> +				       __GFP_NOMEMALLOC | __GFP_NORETRY);
>  	if (!page)
>  		return NULL;
>
>

Your assumption here is that the hypervisor will always supply a zero
page. Unfortunately, this assumption is wrong (and it stems from the
lack of different page size support in virtio-balloon).

Think about these examples:

1. Guest is backed by huge pages (hugetlbfs). Ballooning kicks in.

MADV_DONTNEED is simply ignored in the hypervisor (hugetlbfs requires
fallocate punch hole). Also, trying to zap 4k on e.g. 1MB pages will
simply be ignored.

2. Guest on PPC uses 4k pages. Hypervisor uses 64k pages. Trying to
MADV_DONTNEED 4K on 64k pages will simply be ignored.

So unfortunately, zeroing the page is the right thing to do to cover all
cases.

--

Thanks,

David
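
For reference, the baseline behaviour the revert relies on can be observed from userspace. The following is a minimal sketch, assuming Linux and a private anonymous mapping of exactly one host page; the function name dontneed_reads_zero() is hypothetical. It does not reproduce the hugetlbfs or mismatched-page-size cases described above, which are precisely where this behaviour is not guaranteed.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Dirty one private anonymous page, drop it with MADV_DONTNEED and read
 * it back. Returns 1 if every byte reads as zero afterwards, 0 if stale
 * data is still visible, -1 if a system call failed.
 */
static int dontneed_reads_zero(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	int all_zero = 1;
	long i;

	if (psz <= 0)
		return -1;

	p = mmap(NULL, (size_t)psz, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xAA, (size_t)psz);	/* make the page dirty */

	if (madvise(p, (size_t)psz, MADV_DONTNEED) != 0) {
		munmap(p, (size_t)psz);
		return -1;
	}

	/* The next access faults in a fresh zero-filled page. */
	for (i = 0; i < psz; i++) {
		if (p[i] != 0) {
			all_zero = 0;
			break;
		}
	}

	munmap(p, (size_t)psz);
	return all_zero;
}
```

On a regular anonymous mapping this is expected to return 1. The examples above concern ranges smaller than the host page size and hugetlbfs-backed memory, where the host cannot discard the range this way, so the old guest data would stay in place unless the guest zeroes it.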
On Mon 07-08-17 10:44:50, David Hildenbrand wrote:
> On 03.08.2017 13:59, Wei Wang wrote:
> > This patch is a revert of 'commit bb01b64cfab7 ("mm/balloon_compaction.c:
> > enqueue zero page to balloon device")'
> >
> > Ballooned pages will be marked as MADV_DONTNEED by the hypervisor and
> > shouldn't be given to the host ksmd to scan. Therefore, it is not
> > necessary to zero ballooned pages, which is very time consuming when
> > the page amount is large. The ongoing fast balloon tests show that the
> > time to balloon 7G pages is increased from ~491ms to 2.8 seconds with
> > __GFP_ZERO added. So, this patch removes the flag.
> >
> > Signed-off-by: Wei Wang <wei.w.wang at intel.com>
> > Cc: Michal Hocko <mhocko at kernel.org>
> > Cc: Michael S. Tsirkin <mst at redhat.com>
> > ---
> >  mm/balloon_compaction.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> > index 9075aa5..b06d9fe 100644
> > --- a/mm/balloon_compaction.c
> > +++ b/mm/balloon_compaction.c
> > @@ -24,7 +24,7 @@ struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info)
> >  {
> >  	unsigned long flags;
> >  	struct page *page = alloc_page(balloon_mapping_gfp_mask() |
> > -				       __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_ZERO);
> > +				       __GFP_NOMEMALLOC | __GFP_NORETRY);
> >  	if (!page)
> >  		return NULL;
> >
> >
>
> Your assumption here is that the hypervisor will always supply a zero
> page. Unfortunately, this assumption is wrong (and it stems from the
> lack of different page size support in virtio-balloon).
>
> Think about these examples:
>
> 1. Guest is backed by huge pages (hugetlbfs). Ballooning kicks in.
>
> MADV_DONTNEED is simply ignored in the hypervisor (hugetlbfs requires
> fallocate punch hole). Also, trying to zap 4k on e.g. 1MB pages will
> simply be ignored.
>
> 2. Guest on PPC uses 4k pages. Hypervisor uses 64k pages. Trying to
> MADV_DONTNEED 4K on 64k pages will simply be ignored.
>
> So unfortunately, zeroing the page is the right thing to do to cover all
> cases.

Maybe it is my absolute lack of familiarity with what the host actually
does with balloon pages but I fail to see why the above matters at all.
ksm will not try to merge sub page units (4k for hugetlb or a large base
page). And if you need to hide the guest contents then the host can
clear the respective subpage just fine. So could you be more explicit
why MADV_DONTNEED matters at all? Also does any host actually share sub
pages between different guests? This sounds like a bad idea to me in
general.

--
Michal Hocko
SUSE Labs
On 08/07/2017 04:44 PM, David Hildenbrand wrote:
> On 03.08.2017 13:59, Wei Wang wrote:
>> This patch is a revert of 'commit bb01b64cfab7 ("mm/balloon_compaction.c:
>> enqueue zero page to balloon device")'
>>
>> Ballooned pages will be marked as MADV_DONTNEED by the hypervisor and
>> shouldn't be given to the host ksmd to scan. Therefore, it is not
>> necessary to zero ballooned pages, which is very time consuming when
>> the page amount is large. The ongoing fast balloon tests show that the
>> time to balloon 7G pages is increased from ~491ms to 2.8 seconds with
>> __GFP_ZERO added. So, this patch removes the flag.
>>
>> Signed-off-by: Wei Wang <wei.w.wang at intel.com>
>> Cc: Michal Hocko <mhocko at kernel.org>
>> Cc: Michael S. Tsirkin <mst at redhat.com>
>> ---
>>  mm/balloon_compaction.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
>> index 9075aa5..b06d9fe 100644
>> --- a/mm/balloon_compaction.c
>> +++ b/mm/balloon_compaction.c
>> @@ -24,7 +24,7 @@ struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info)
>>  {
>>  	unsigned long flags;
>>  	struct page *page = alloc_page(balloon_mapping_gfp_mask() |
>> -				       __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_ZERO);
>> +				       __GFP_NOMEMALLOC | __GFP_NORETRY);
>>  	if (!page)
>>  		return NULL;
>>
>>
> Your assumption here is that the hypervisor will always supply a zero
> page. Unfortunately, this assumption is wrong (and it stems from the
> lack of different page size support in virtio-balloon).

I think this is something we can improve in the balloon. For example,
the balloon request from the device should be aligned to the host page
size before it is sent to the guest driver: on PPC with 64K host pages,
if the command requests 140K of memory to inflate, it can be aligned
down to 128K.

>
> Think about these examples:
>
> 1. Guest is backed by huge pages (hugetlbfs). Ballooning kicks in.
>
> MADV_DONTNEED is simply ignored in the hypervisor (hugetlbfs requires
> fallocate punch hole). Also, trying to zap 4k on e.g. 1MB pages will
> simply be ignored.

For the hugetlbfs case, I think the balloon size can be aligned to the
huge page size (i.e. 2M or 1GB).

Best,
Wei
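
The alignment suggested above could look roughly like the following sketch. The helper name balloon_align_down() and its interface are hypothetical and are not part of virtio-balloon or any existing driver; the sketch only shows the rounding arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a balloon inflate request down to a whole number of host pages,
 * so that every chunk handed to the host covers complete host pages.
 * host_page_bytes is the host's backing page size (e.g. 64K on the PPC
 * example, or 2M / 1G for hugetlbfs); it must not be zero.
 */
static uint64_t balloon_align_down(uint64_t request_bytes,
				   uint64_t host_page_bytes)
{
	assert(host_page_bytes != 0);
	return request_bytes - (request_bytes % host_page_bytes);
}
```

With 64K host pages, balloon_align_down(140 * 1024, 64 * 1024) gives 128K, matching the example above. A request smaller than one host page rounds down to zero, which a real implementation would have to handle as a separate case.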