Michael S. Tsirkin

2018-Jun-18 02:28 UTC

[PATCH v33 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

On Sat, Jun 16, 2018 at 01:09:44AM +0000, Wang, Wei W wrote:
> On Friday, June 15, 2018 10:29 PM, Michael S. Tsirkin wrote:
> > On Fri, Jun 15, 2018 at 02:11:23PM +0000, Wang, Wei W wrote:
> > > On Friday, June 15, 2018 7:42 PM, Michael S. Tsirkin wrote:
> > > > On Fri, Jun 15, 2018 at 12:43:11PM +0800, Wei Wang wrote:
> > > > > Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature
> > > > > indicates the support of reporting hints of guest free pages to host via
> > > > > virtio-balloon.
> > > > >
> > > > > Host requests the guest to report free page hints by sending a
> > > > > command to the guest via setting the
> > > > > VIRTIO_BALLOON_HOST_CMD_FREE_PAGE_HINT
> > > > > bit of the host_cmd config register.
> > > > >
> > > > > As the first step here, virtio-balloon only reports free page
> > > > > hints from the max order (10) free page list to host. This has
> > > > > generated similar good results as reporting all free page hints during
> > > > > our tests.
> > > > >
> > > > > TODO:
> > > > > - support reporting free page hints from smaller order free page lists
> > > > >   when there is a need/request from users.
> > > > >
> > > > > Signed-off-by: Wei Wang <wei.w.wang at intel.com>
> > > > > Signed-off-by: Liang Li <liang.z.li at intel.com>
> > > > > Cc: Michael S. Tsirkin <mst at redhat.com>
> > > > > Cc: Michal Hocko <mhocko at kernel.org>
> > > > > Cc: Andrew Morton <akpm at linux-foundation.org>
> > > > > ---
> > > > >  drivers/virtio/virtio_balloon.c     | 187 +++++++++++++++++++++++++++++-------
> > > > >  include/uapi/linux/virtio_balloon.h |  13 +++
> > > > >  2 files changed, 163 insertions(+), 37 deletions(-)
> > > > >
> > > > > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> > > > > index 6b237e3..582a03b 100644
> > > > > --- a/drivers/virtio/virtio_balloon.c
> > > > > +++ b/drivers/virtio/virtio_balloon.c
> > > > > @@ -43,6 +43,9 @@
> > > > >  #define OOM_VBALLOON_DEFAULT_PAGES 256
> > > > >  #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
> > > > >
> > > > > +/* The size of memory in bytes allocated for reporting free page hints */
> > > > > +#define FREE_PAGE_HINT_MEM_SIZE (PAGE_SIZE * 16)
> > > > > +
> > > > >  static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
> > > > >  module_param(oom_pages, int, S_IRUSR | S_IWUSR);
> > > > >  MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> > > >
> > > > Doesn't this limit memory size of the guest we can report?
> > > > Apparently to several gigabytes ...
> > > > OTOH huge guests with lots of free memory is exactly where we would
> > > > gain the most ...
> > >
> > > Yes, the 16-page array can report up to 32GB (each page can hold 512
> > > addresses of 4MB free page blocks, i.e. 2GB free memory per page) free
> > > memory to host. It is not flexible.
> > >
> > > How about allocating the buffer according to the guest memory size
> > > (proportional)? That is,
> > >
> > > /* Calculates the maximum number of 4MB (equals to 1024 pages) free
> > > page blocks that the system can have */
> > > 4m_page_blocks = totalram_pages / 1024;
> > >
> > > /* Allocating one page can hold 512 free page blocks, so calculates
> > > the number of pages that can hold those 4MB blocks. And this
> > > allocation should not exceed 1024 pages */
> > > pages_to_allocate = min(4m_page_blocks / 512, 1024);
> > >
> > > For a 2TB guest, which has 2^19 page blocks (4MB each), we will allocate
> > > 1024 pages as the buffer.
> > >
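
For reference, the buffer sizing discussed above can be sketched in plain userspace C. This is an illustration only and not code from the patch: it assumes 4KB pages and 8-byte entries (hence 512 entries per page), the names hint_capacity_bytes and hint_buffer_pages are hypothetical, and it rounds the page count up where the proposal above uses a truncating division.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration: 4KB pages and 8-byte entries. */
#define PAGE_BYTES        4096ULL
#define ENTRY_BYTES       8ULL
#define ENTRIES_PER_PAGE  (PAGE_BYTES / ENTRY_BYTES)  /* 512 */
#define PAGES_PER_BLOCK   1024ULL                     /* one 4MB block */
#define MAX_BUFFER_PAGES  1024ULL                     /* cap from the proposal */

/* Free memory, in bytes, that a buffer of buffer_pages pages can describe. */
static uint64_t hint_capacity_bytes(uint64_t buffer_pages)
{
    /* The cap keeps the product well inside 64 bits. */
    assert(buffer_pages <= MAX_BUFFER_PAGES);
    return buffer_pages * ENTRIES_PER_PAGE * PAGES_PER_BLOCK * PAGE_BYTES;
}

/* Buffer pages for a guest with totalram_pages pages of RAM. */
static uint64_t hint_buffer_pages(uint64_t totalram_pages)
{
    uint64_t blocks = totalram_pages / PAGES_PER_BLOCK;
    /* Round up so a partially filled page of entries still gets a page
     * (an assumption; the proposal truncates). */
    uint64_t pages = (blocks + ENTRIES_PER_PAGE - 1) / ENTRIES_PER_PAGE;

    return pages < MAX_BUFFER_PAGES ? pages : MAX_BUFFER_PAGES;
}
```

Under these assumptions, hint_capacity_bytes(16) gives 32GB for the fixed 16-page buffer, and hint_buffer_pages(1ULL << 29), i.e. a 2TB guest, gives 1024 pages.
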
> > > When the guest has large memory, it should be easier to succeed in
> > > allocation of large buffer. If that allocation fails, that implies that
> > > nothing would be got from the 4MB free page list.
> > >
> > > I think the proportional allocation is simpler compared to other
> > > approaches like
> > > - scattered buffer, which will complicate the get_from_free_page_list
> > >   implementation;
> > > - one buffer to call get_from_free_page_list multiple times, which needs
> > >   get_from_free_page_list to maintain states.. also too complicated.
> > >
> > > Best,
> > > Wei
> > >
> > 
> > That's more reasonable, but question remains what to do if that value
> > exceeds MAX_ORDER. I'd say maybe tell host we can't report it.
>
> Not necessarily, I think. We have min(4m_page_blocks / 512, 1024) above, so
> the maximum memory that can be reported is 2TB. For larger guests, e.g. 4TB, the
> optimization can still offer 2TB free memory (better than no optimization).

Maybe it's better, maybe it isn't. It certainly muddies the waters even
more.  I'd rather we had a better plan. From that POV I like what
Matthew Wilcox suggested for this which is to steal the necessary #
of entries off the list.

If that doesn't fly, we can allocate out of the loop and just retry with
more pages.
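
One possible reading of "allocate out of the loop and retry" is sketched below in userspace C. It is not kernel code and not what the patch does: collect_free_blocks is a hypothetical stand-in for the walk over the free page list, malloc stands in for the kernel allocator, and the free list is simulated by a fixed array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the free list walk: copies up to cap block
 * addresses into buf and returns how many blocks the list holds in total. */
static size_t collect_free_blocks(const uint64_t *free_list, size_t nr_free,
                                  uint64_t *buf, size_t cap)
{
    size_t n = nr_free < cap ? nr_free : cap;

    for (size_t i = 0; i < n; i++)
        buf[i] = free_list[i];
    return nr_free;
}

/* Allocate before the walk; if the buffer turns out too small, free it,
 * grow, and walk again. Returns a malloc'ed array that the caller frees,
 * with the entry count in *out_n, or NULL if allocation fails. */
static uint64_t *report_free_blocks(const uint64_t *free_list, size_t nr_free,
                                    size_t initial_cap, size_t *out_n)
{
    size_t cap = initial_cap ? initial_cap : 1;

    assert(out_n != NULL);
    for (;;) {
        uint64_t *buf = malloc(cap * sizeof(*buf));
        size_t total;

        if (!buf)
            return NULL;
        total = collect_free_blocks(free_list, nr_free, buf, cap);
        if (total <= cap) {
            *out_n = total;
            return buf;
        }
        free(buf);
        cap = total; /* retry with room for everything just seen */
    }
}
```

In this simulation the list never changes, so the loop runs at most twice. A real free list can grow between attempts, which is why the sketch loops rather than retrying once.
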
> On the other hand, large guests being large mostly because the guests need
> to use large memory. In that case, they usually won't have that much free
> memory to report.

And following this logic small guests don't have a lot of memory to report
at all.
Could you remind me why are we considering this optimization then?

> >
> > Also allocating it with GFP_KERNEL is out. You only want to take it off the
> > free list. So I guess __GFP_NOMEMALLOC and __GFP_ATOMIC.
>
> Sounds good, thanks.
>
> > Also you can't allocate this on device start. First totalram_pages can
> > change. Second that's too much memory to tie up forever.
> 
> Yes, makes sense.
> 
> Best,
> Wei

Wang, Wei W

2018-Jun-19 01:06 UTC

[virtio-dev] Re: [PATCH v33 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

On Monday, June 18, 2018 10:29 AM, Michael S. Tsirkin wrote:
> On Sat, Jun 16, 2018 at 01:09:44AM +0000, Wang, Wei W wrote:
> > Not necessarily, I think. We have min(4m_page_blocks / 512, 1024) above,
> > so the maximum memory that can be reported is 2TB. For larger guests, e.g.
> > 4TB, the optimization can still offer 2TB free memory (better than no
> > optimization).
>
> Maybe it's better, maybe it isn't. It certainly muddies the waters even more.
> I'd rather we had a better plan. From that POV I like what Matthew Wilcox
> suggested for this which is to steal the necessary # of entries off the list.

Actually what Matthew suggested doesn't make a difference here. That method
always steals the first free page blocks, and sure can be changed to take more.
But all these can be achieved via kmalloc by the caller which is more prudent
and makes the code more straightforward. I think we don't need to take that
risk unless the MM folks strongly endorse that approach.

The max size of the kmalloc-ed memory is 4MB, which gives us the limitation that
the max free memory to report is 2TB. Back to the motivation of this work, the
cloud guys want to use this optimization to accelerate their guest live
migration. 2TB guests are not common in today's clouds. When huge guests
become common in the future, we can easily tweak this API to fill hints into
scattered buffer (e.g. several 4MB arrays passed to this API) instead of one as
in this version.

This limitation doesn't cause any issue from a functionality perspective. For
the extreme case like a 100TB guest live migration which is theoretically
possible today, this optimization helps skip 2TB of its free memory. The result
is that it may reduce live migration time by only 2%, but still better than not
skipping the 2TB (if not using the feature).
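
The two figures above (the 2TB ceiling and the 2% for a 100TB guest) can be checked with a short userspace C sketch. It is an illustration under stated assumptions and not part of the patch: each hint entry is taken to be an 8-byte address of a 4MB block, the function names are hypothetical, and the percentage is the share of guest memory covered, which bounds the time saved only if migration time scales with the memory transferred.

```c
#include <assert.h>
#include <stdint.h>

#define HINT_ENTRY_BYTES 8ULL          /* assumed size of one block address */
#define HINT_BLOCK_BYTES (4ULL << 20)  /* one 4MB free page block */

/* Most free memory, in bytes, describable by a buffer of buffer_bytes bytes. */
static uint64_t max_reportable_bytes(uint64_t buffer_bytes)
{
    return buffer_bytes / HINT_ENTRY_BYTES * HINT_BLOCK_BYTES;
}

/* Upper bound, in percent, on the share of guest memory the hints can cover. */
static double max_skipped_percent(uint64_t guest_bytes, uint64_t buffer_bytes)
{
    uint64_t cap = max_reportable_bytes(buffer_bytes);
    uint64_t skipped = cap < guest_bytes ? cap : guest_bytes;

    assert(guest_bytes > 0);
    return 100.0 * (double)skipped / (double)guest_bytes;
}
```

With a 4MB buffer, max_reportable_bytes(4ULL << 20) is 2TB, and max_skipped_percent(100ULL << 40, 4ULL << 20) is 2.0.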

So, for the first release of this feature, I think it is better to have the
simpler and more straightforward solution as we have now, and clearly document
why it can report up to 2TB free memory.

> If that doesn't fly, we can allocate out of the loop and just retry with more
> pages.
>
> > On the other hand, large guests being large mostly because the guests need
> > to use large memory. In that case, they usually won't have that much free
> > memory to report.
>
> And following this logic small guests don't have a lot of memory to report at
> all.
> Could you remind me why are we considering this optimization then?

If there is a 3TB guest, it is 3TB not 2TB mostly because it would need to use
e.g. 2.5TB memory from time to time. In the worst case, it only has 0.5TB free
memory to report, but reporting 0.5TB with this optimization is better than no
optimization. (and the current 2TB limitation isn't a limitation for the 3TB
guest in this case)

Best,
Wei

Michael S. Tsirkin

2018-Jun-19 03:05 UTC

[virtio-dev] Re: [PATCH v33 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

On Tue, Jun 19, 2018 at 01:06:48AM +0000, Wang, Wei W wrote:
> On Monday, June 18, 2018 10:29 AM, Michael S. Tsirkin wrote:
> > On Sat, Jun 16, 2018 at 01:09:44AM +0000, Wang, Wei W wrote:
> > > Not necessarily, I think. We have min(4m_page_blocks / 512, 1024) above,
> > > so the maximum memory that can be reported is 2TB. For larger guests, e.g.
> > > 4TB, the optimization can still offer 2TB free memory (better than no
> > > optimization).
> >
> > Maybe it's better, maybe it isn't. It certainly muddies the waters even more.
> > I'd rather we had a better plan. From that POV I like what Matthew Wilcox
> > suggested for this which is to steal the necessary # of entries off the list.
>
> Actually what Matthew suggested doesn't make a difference here. That method
> always steals the first free page blocks, and sure can be changed to take
> more. But all these can be achieved via kmalloc

I'd do get_user_pages really. You don't want pages split, etc.

> by the caller which is more prudent and makes the code more
> straightforward. I think we don't need to take that risk unless the MM folks
> strongly endorse that approach.
> 
> The max size of the kmalloc-ed memory is 4MB, which gives us the limitation
> that the max free memory to report is 2TB. Back to the motivation of this work,
> the cloud guys want to use this optimization to accelerate their guest live
> migration. 2TB guests are not common in today's clouds. When huge guests
> become common in the future, we can easily tweak this API to fill hints into
> scattered buffer (e.g. several 4MB arrays passed to this API) instead of one as
> in this version.
>
> This limitation doesn't cause any issue from a functionality perspective.
> For the extreme case like a 100TB guest live migration which is theoretically
> possible today, this optimization helps skip 2TB of its free memory. The result
> is that it may reduce live migration time by only 2%, but still better than not
> skipping the 2TB (if not using the feature).

Not clearly better, no, since you are slowing the guest.

> So, for the first release of this feature, I think it is better to have the
> simpler and more straightforward solution as we have now, and clearly document
> why it can report up to 2TB free memory.

No one has the time to read documentation about how an internal flag
within a device works. Come on, getting two pages isn't much harder
than a single one.
> 
>  
> > If that doesn't fly, we can allocate out of the loop and just retry with more
> > pages.
> >
> > > On the other hand, large guests being large mostly because the guests need
> > > to use large memory. In that case, they usually won't have that much free
> > > memory to report.
> >
> > And following this logic small guests don't have a lot of memory to report at
> > all.
> > Could you remind me why are we considering this optimization then?
>
> If there is a 3TB guest, it is 3TB not 2TB mostly because it would need to
> use e.g. 2.5TB memory from time to time. In the worst case, it only has 0.5TB
> free memory to report, but reporting 0.5TB with this optimization is better than
> no optimization. (and the current 2TB limitation isn't a limitation for the
> 3TB guest in this case)

I'd rather not spend time writing up random limitations.

> Best,
> Wei
