Roman Kagan
2016-Mar-04 08:35 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
On Fri, Mar 04, 2016 at 08:23:09AM +0000, Li, Liang Z wrote:
> > On Thu, Mar 03, 2016 at 05:46:15PM +0000, Dr. David Alan Gilbert wrote:
> > > * Liang Li (liang.z.li at intel.com) wrote:
> > > > The current QEMU live migration implementation marks all the
> > > > guest's RAM pages as dirtied in the ram bulk stage; all these pages
> > > > will be processed, and that takes quite a lot of CPU cycles.
> > > >
> > > > From the guest's point of view, it doesn't care about the content of
> > > > free pages. We can make use of this fact and skip processing the
> > > > free pages in the ram bulk stage; this can save a lot of CPU cycles,
> > > > reduce the network traffic significantly, and speed up the live
> > > > migration process noticeably.
> > > >
> > > > This patch set is the QEMU side implementation.
> > > >
> > > > The virtio-balloon is extended so that QEMU can get the free pages
> > > > information from the guest through virtio.
> > > >
> > > > After getting the free pages information (a bitmap), QEMU can use it
> > > > to filter out the guest's free pages in the ram bulk stage. This
> > > > makes the live migration process much more efficient.
> > >
> > > Hi,
> > >   An interesting solution; I know a few different people have been
> > > looking at how to speed up ballooned VM migration.
> > >
> > > I wonder if it would be possible to avoid the kernel changes by
> > > parsing /proc/self/pagemap - if that can be used to detect
> > > unmapped/zero-mapped pages in the guest RAM, would it achieve the
> > > same result?
> >
> > Yes, I was about to suggest the same thing: it's simple and makes use of
> > the existing infrastructure. And you wouldn't need to care whether the
> > pages were unmapped by ballooning or anything else (alternative balloon
> > implementations, pages not yet touched by the guest, etc.). Besides, you
> > wouldn't need to synchronize with the guest.
> >
> > Roman.
>
> The unmapped/zero-mapped pages can be detected by parsing
> /proc/self/pagemap, but the free pages can't be detected by this.
> Imagine an application allocates a large amount of memory; after using
> it, it frees the memory, then live migration happens. All these free
> pages will be processed and sent to the destination, which is not
> optimal.

First, the likelihood of such a situation is marginal; there's no point
optimizing for it specifically.

And second, even if that happens, you inflate the balloon right before
the migration and the free memory will get unmapped very quickly, so this
case is covered nicely by the same technique that works for more
realistic cases, too.

Roman.
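[For reference, Dave's /proc/self/pagemap suggestion can be sketched as
follows. This is a minimal illustration of decoding a pagemap entry, not
the patch set's actual mechanism; the helper names are hypothetical, and
the bit layout is the one documented in the kernel's pagemap documentation:]

```python
import struct

PAGEMAP_ENTRY_SIZE = 8  # each /proc/<pid>/pagemap entry is a 64-bit word

def decode_pagemap_entry(raw):
    """Decode one pagemap entry (8 little-endian bytes).

    Bit 63: page present in RAM
    Bit 62: page swapped out
    Bits 0-54: page frame number (zeroed for unprivileged readers on
               newer kernels, but the present bit remains valid)
    """
    (entry,) = struct.unpack("<Q", raw)
    return {
        "present": bool(entry & (1 << 63)),
        "swapped": bool(entry & (1 << 62)),
        "pfn": entry & ((1 << 55) - 1),
    }

def page_is_unmapped(raw):
    """A page that is neither present nor swapped has never been touched,
    or was discarded (e.g. by balloon inflation), so migration of its
    content could be skipped."""
    entry = decode_pagemap_entry(raw)
    return not entry["present"] and not entry["swapped"]
```

[QEMU would seek to `(guest_ram_vaddr // page_size) * PAGEMAP_ENTRY_SIZE`
in /proc/self/pagemap and scan the entries covering the guest RAM block.]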
Dr. David Alan Gilbert
2016-Mar-04 09:08 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
* Roman Kagan (rkagan at virtuozzo.com) wrote:
> On Fri, Mar 04, 2016 at 08:23:09AM +0000, Li, Liang Z wrote:
> > > On Thu, Mar 03, 2016 at 05:46:15PM +0000, Dr. David Alan Gilbert wrote:
> > > > * Liang Li (liang.z.li at intel.com) wrote:
> > > > > [...]
> > > > > After getting the free pages information (a bitmap), QEMU can use it
> > > > > to filter out the guest's free pages in the ram bulk stage. This
> > > > > makes the live migration process much more efficient.
> > > >
> > > > I wonder if it would be possible to avoid the kernel changes by
> > > > parsing /proc/self/pagemap - if that can be used to detect
> > > > unmapped/zero-mapped pages in the guest RAM, would it achieve the
> > > > same result?
> > >
> > > Yes, I was about to suggest the same thing: it's simple and makes use of
> > > the existing infrastructure. And you wouldn't need to care whether the
> > > pages were unmapped by ballooning or anything else. Besides, you
> > > wouldn't need to synchronize with the guest.
> >
> > The unmapped/zero-mapped pages can be detected by parsing
> > /proc/self/pagemap, but the free pages can't be detected by this.
> > Imagine an application allocates a large amount of memory; after using
> > it, it frees the memory, then live migration happens. All these free
> > pages will be processed and sent to the destination, which is not
> > optimal.
>
> First, the likelihood of such a situation is marginal; there's no point
> optimizing for it specifically.
>
> And second, even if that happens, you inflate the balloon right before
> the migration and the free memory will get unmapped very quickly, so this
> case is covered nicely by the same technique that works for more
> realistic cases, too.
>
> Roman.

Although I wonder which is cheaper; that would be fairly expensive for
the guest, wouldn't it? And you'd somehow have to kick the guest before
migration to do the ballooning - and how long would you wait for it to
finish?

Dave
--
Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK
Li, Liang Z
2016-Mar-04 09:12 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
> * Roman Kagan (rkagan at virtuozzo.com) wrote:
> > On Fri, Mar 04, 2016 at 08:23:09AM +0000, Li, Liang Z wrote:
> > > [...]
> > > The unmapped/zero-mapped pages can be detected by parsing
> > > /proc/self/pagemap, but the free pages can't be detected by this.
> > > Imagine an application allocates a large amount of memory; after
> > > using it, it frees the memory, then live migration happens. All these
> > > free pages will be processed and sent to the destination, which is
> > > not optimal.
> >
> > First, the likelihood of such a situation is marginal; there's no
> > point optimizing for it specifically.
> >
> > And second, even if that happens, you inflate the balloon right before
> > the migration and the free memory will get unmapped very quickly, so
> > this case is covered nicely by the same technique that works for more
> > realistic cases, too.
>
> Although I wonder which is cheaper; that would be fairly expensive for
> the guest, wouldn't it? And you'd somehow have to kick the guest before
> migration to do the ballooning - and how long would you wait for it to
> finish?

About 5 seconds for an 8G guest, ballooning down to 1G. Getting the free
pages bitmap takes about 20ms for an 8G idle guest.

Liang

> Dave
>
> > Roman.
> --
> Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK
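[For reference, the "kick the guest" step Dave asks about would typically
be driven from the management side over QMP. A rough sketch of the command
sequence, assuming a policy of starting migration once the balloon is
"close enough" rather than waiting for the exact target: `balloon`,
`query-balloon`, and `migrate` are real QMP commands, but the helper
names and the slack policy below are hypothetical:]

```python
import json

def qmp_balloon_cmd(target_bytes):
    """QMP 'balloon' command: ask the guest's balloon driver to shrink
    guest memory down to target_bytes."""
    return {"execute": "balloon", "arguments": {"value": target_bytes}}

def qmp_query_balloon_cmd():
    """QMP 'query-balloon' command: returns the current actual size."""
    return {"execute": "query-balloon"}

def balloon_close_enough(actual_bytes, target_bytes, slack_bytes=64 << 20):
    """Decide when ballooning is 'good enough' to start migration,
    rather than waiting for the exact target (per Roman's later point
    about not waiting for the balloon to finish)."""
    return actual_bytes - target_bytes <= slack_bytes

def migration_plan(target_bytes, uri):
    """Hypothetical helper: the JSON commands a management tool might
    issue before and during migration."""
    return [
        json.dumps(qmp_balloon_cmd(target_bytes)),
        json.dumps(qmp_query_balloon_cmd()),  # poll until balloon_close_enough()
        json.dumps({"execute": "migrate", "arguments": {"uri": uri}}),
    ]
```

[The polling loop and its timeout are exactly the fine-tuning knobs the
thread is debating; this sketch only fixes the command sequence.]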
Roman Kagan
2016-Mar-04 09:35 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
On Fri, Mar 04, 2016 at 09:08:20AM +0000, Dr. David Alan Gilbert wrote:
> * Roman Kagan (rkagan at virtuozzo.com) wrote:
> > On Fri, Mar 04, 2016 at 08:23:09AM +0000, Li, Liang Z wrote:
> > > [...]
> > > All these free pages will be processed and sent to the destination,
> > > which is not optimal.
> >
> > First, the likelihood of such a situation is marginal; there's no point
> > optimizing for it specifically.
> >
> > And second, even if that happens, you inflate the balloon right before
> > the migration and the free memory will get unmapped very quickly, so this
> > case is covered nicely by the same technique that works for more
> > realistic cases, too.
>
> Although I wonder which is cheaper; that would be fairly expensive for
> the guest, wouldn't it?

For the guest -- generally it wouldn't, if you have a good estimate of
available memory (i.e. the amount you can balloon out without forcing the
guest to swap).

And yes, you need certain cost estimates for choosing the best migration
strategy: e.g. if your network bandwidth is unlimited, you may be better
off transferring the zeros to the destination rather than optimizing them
away.

> And you'd somehow have to kick the guest before migration to do the
> ballooning - and how long would you wait for it to finish?

It's a matter for fine-tuning with all the inputs at hand, like network
bandwidth, costs of delaying the migration, etc. And you don't need to
wait for it to finish, i.e. reach the balloon size target: you can start
the migration as soon as it's good enough (for whatever definition of
"enough" is found appropriate by that fine-tuning).

Roman.
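[For reference, the bulk-stage filtering the patch set proposes amounts to
clearing the guest-reported free pages out of the migration dirty bitmap:
dirty &= ~free. A minimal sketch of that operation, with hypothetical
helper names; in QEMU this would run once at the start of the ram bulk
stage, and pages the guest touches afterwards are re-dirtied by normal
dirty logging:]

```python
def filter_free_pages(dirty_bitmap, free_bitmap):
    """Clear the dirty bit for every page the guest reported free
    (dirty &= ~free), byte by byte. One bit per guest page."""
    assert len(dirty_bitmap) == len(free_bitmap)
    return bytearray(d & ~f & 0xFF for d, f in zip(dirty_bitmap, free_bitmap))

def count_pages(bitmap):
    """Number of set bits, i.e. pages still to be sent."""
    return sum(bin(b).count("1") for b in bitmap)

# Example: 16 pages, all dirty at the start of the bulk stage;
# the guest reports the low 4 pages as free.
dirty = bytearray([0xFF, 0xFF])
free = bytearray([0x0F, 0x00])
remaining = filter_free_pages(dirty, free)  # 12 pages left to send
```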