thr3ads.net - Linux Virtualization - [Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration [Dec 2016]

If this information is useful, please help other people find it:
Share via:

Li, Liang Z

2016-Dec-09 05:35 UTC

[Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

> On 12/08/2016 08:45 PM, Li, Liang Z wrote:
> > What's the conclusion of your discussion? It seems you want some
> > statistic before deciding whether to  ripping the bitmap from the ABI,
> > am I right?
> 
> I think Andrea and David feel pretty strongly that we should remove the
> bitmap, unless we have some data to support keeping it.  I don't feel
as
> strongly about it, but I think their critique of it is pretty valid.  I
think the
> consensus is that the bitmap needs to go.
> 
Thanks for you clarification.
> The only real question IMNHO is whether we should do a power-of-2 or a
> length.  But, if we have 12 bits, then the argument for doing length is
pretty
> strong.  We don't need anywhere near 12 bits if doing power-of-2.
> So each item can max represent 16MB Bytes, seems not big enough,
but enough for most case.
Things became much more simple without the bitmap, and I like simple solution
too. :)

I will prepare the v6 and remove all the bitmap related stuffs. Thank you all!

Liang

Andrea Arcangeli

2016-Dec-09 16:42 UTC

head link

[Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

Hello,

On Fri, Dec 09, 2016 at 05:35:45AM +0000, Li, Liang Z
wrote:> > On 12/08/2016 08:45 PM, Li, Liang Z wrote:
> > > What's the conclusion of your discussion? It seems you want
some
> > > statistic before deciding whether to  ripping the bitmap from the
ABI,
> > > am I right?
> > 
> > I think Andrea and David feel pretty strongly that we should remove
the
> > bitmap, unless we have some data to support keeping it.  I don't
feel as
> > strongly about it, but I think their critique of it is pretty valid. 
I think the
> > consensus is that the bitmap needs to go.
> > 
> 
> Thanks for you clarification.
> 
> > The only real question IMNHO is whether we should do a power-of-2 or a
> > length.  But, if we have 12 bits, then the argument for doing length
is pretty
> > strong.  We don't need anywhere near 12 bits if doing power-of-2.
> > 
> So each item can max represent 16MB Bytes, seems not big enough,
> but enough for most case.
> Things became much more simple without the bitmap, and I like simple
solution too. :)
> 
> I will prepare the v6 and remove all the bitmap related stuffs. Thank you
all!
Sounds great!

I suggested to check the statistics, because collecting those stats
looked simpler and quicker than removing all bitmap related stuff from
the patchset. However if you prefer to prepare a v6 without the bitmap
another perhaps more interesting way to evaluate the usefulness of the
bitmap is to just run the same benchmark and verify that there is no
regression compared to the bitmap enabled code.

The other issue with the bitmap is, the best case for the bitmap is
ever less likely to materialize the more RAM is added to the guest. It
won't regress linearly because after all there can be some locality
bias in the buddy splits, but if sync compaction is used in the large
order allocations tried before reaching order 0, the bitmap payoff
will regress close to linearly with the increase of RAM.

So it'd be good to check the stats or the benchmark on large guests,
at least one hundred gigabytes or so.

Changing topic but still about the ABI features needed, so it may be
relevant for this discussion:

1) vNUMA locality: i.e. allowing host to specify which vNODEs to take
   memory from, using alloc_pages_node in guest. So you can ask to
   take X pages from vnode A, Y pages from vnode B, in one vmenter.

2) allowing qemu to tell the guest to stop inflating the balloon and
   report a fragmentation limit being hit, when sync compaction
   powered allocations fails at certain power-of-two order granularity
   passed by qemu to the guest. This order constraint will be passed
   by default for hugetlbfs guests with 2MB hpage size, while it can
   be used optionally on THP backed guests. This option with THP
   guests would allow a highlevel management software to provide a
   "don't reduce guest performance" while shrinking the memory
size of
   the guest from the GUI. If you deselect the option, you can shrink
   down to the last freeable 4k guest page, but doing so may have to
   split THP in the host (you don't know for sure if they were really
   THP but they could have been), and it may regress
   performance. Inflating the balloon while passing a minimum
   granularity "order" of the pages being zapped, will guarantee
   inflating the balloon cannot decrease guest performance
   instead. Plus it's needed for hugetlbfs anyway as far as I can
   tell. hugetlbfs would not be host enforceable even if the idea is
   not to free memory but only reduce the available memory of the
   guest (not without major changes that maps a hugetlb page with 4k
   ptes at least). While for a more cooperative usage of hugetlbfs
   guests, it's simply not useful to inflate the balloon at anything
   less than the "HPAGE_SIZE" hugetlbfs granularity.

We also plan to use userfaultfd to make the balloon driver host
enforced (will work fine on hugetlbfs 2M and tmpfs too) but that's
going to be invisible to the ABI so it's not strictly relevant for
this discussion.

On a side note, registering userfaultfd on the ballooned range, will
keep khugepaged at bay so it won't risk to re-inflating the
MADV_DONTNEED zapped sub-THP fragments no matter the sysfs tunings.

Thanks!
Andrea

Li, Liang Z

2016-Dec-14 08:20 UTC

head link

[Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

> fast (de)inflating & fast live migration
> 
> Hello,
> 
> On Fri, Dec 09, 2016 at 05:35:45AM +0000, Li, Liang Z wrote:
> > > On 12/08/2016 08:45 PM, Li, Liang Z wrote:
> > > > What's the conclusion of your discussion? It seems you
want some
> > > > statistic before deciding whether to  ripping the bitmap
from the
> > > > ABI, am I right?
> > >
> > > I think Andrea and David feel pretty strongly that we should
remove
> > > the bitmap, unless we have some data to support keeping it.  I
don't
> > > feel as strongly about it, but I think their critique of it is
> > > pretty valid.  I think the consensus is that the bitmap needs to
go.
> > >
> >
> > Thanks for you clarification.
> >
> > > The only real question IMNHO is whether we should do a power-of-2
or
> > > a length.  But, if we have 12 bits, then the argument for doing
> > > length is pretty strong.  We don't need anywhere near 12 bits
if doing
> power-of-2.
> > >
> > So each item can max represent 16MB Bytes, seems not big enough, but
> > enough for most case.
> > Things became much more simple without the bitmap, and I like simple
> > solution too. :)
> >
> > I will prepare the v6 and remove all the bitmap related stuffs. Thank
you all!
> 
> Sounds great!
> 
> I suggested to check the statistics, because collecting those stats looked
> simpler and quicker than removing all bitmap related stuff from the
patchset.
> However if you prefer to prepare a v6 without the bitmap another perhaps
> more interesting way to evaluate the usefulness of the bitmap is to just
run
> the same benchmark and verify that there is no regression compared to the
> bitmap enabled code.
> 
> The other issue with the bitmap is, the best case for the bitmap is ever
less
> likely to materialize the more RAM is added to the guest. It won't
regress
> linearly because after all there can be some locality bias in the buddy
splits,
> but if sync compaction is used in the large order allocations tried before
> reaching order 0, the bitmap payoff will regress close to linearly with the
> increase of RAM.
> 
> So it'd be good to check the stats or the benchmark on large guests, at
least
> one hundred gigabytes or so.
> 
> Changing topic but still about the ABI features needed, so it may be
relevant
> for this discussion:
> 
> 1) vNUMA locality: i.e. allowing host to specify which vNODEs to take
>    memory from, using alloc_pages_node in guest. So you can ask to
>    take X pages from vnode A, Y pages from vnode B, in one vmenter.
> 
> 2) allowing qemu to tell the guest to stop inflating the balloon and
>    report a fragmentation limit being hit, when sync compaction
>    powered allocations fails at certain power-of-two order granularity
>    passed by qemu to the guest. This order constraint will be passed
>    by default for hugetlbfs guests with 2MB hpage size, while it can
>    be used optionally on THP backed guests. This option with THP
>    guests would allow a highlevel management software to provide a
>    "don't reduce guest performance" while shrinking the
memory size of
>    the guest from the GUI. If you deselect the option, you can shrink
>    down to the last freeable 4k guest page, but doing so may have to
>    split THP in the host (you don't know for sure if they were really
>    THP but they could have been), and it may regress
>    performance. Inflating the balloon while passing a minimum
>    granularity "order" of the pages being zapped, will guarantee
>    inflating the balloon cannot decrease guest performance
>    instead. Plus it's needed for hugetlbfs anyway as far as I can
>    tell. hugetlbfs would not be host enforceable even if the idea is
>    not to free memory but only reduce the available memory of the
>    guest (not without major changes that maps a hugetlb page with 4k
>    ptes at least). While for a more cooperative usage of hugetlbfs
>    guests, it's simply not useful to inflate the balloon at anything
>    less than the "HPAGE_SIZE" hugetlbfs granularity.
> 
> We also plan to use userfaultfd to make the balloon driver host enforced
(will
> work fine on hugetlbfs 2M and tmpfs too) but that's going to be
invisible to
> the ABI so it's not strictly relevant for this discussion.
> 
> On a side note, registering userfaultfd on the ballooned range, will keep
> khugepaged at bay so it won't risk to re-inflating the MADV_DONTNEED
> zapped sub-THP fragments no matter the sysfs tunings.
> 
Thanks for your elaboration!
> Thanks!
> Andrea

Maybe Matching Threads

Search for more reasonably related threads

Linux Virtualization - Dec 2016 - [Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

[Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

[Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

[Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

Maybe Matching Threads