Li, Liang Z
2016-Mar-04 14:26 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
> Subject: Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration
> optimization
>
> On Fri, Mar 04, 2016 at 09:08:44AM +0000, Li, Liang Z wrote:
> > > On Fri, Mar 04, 2016 at 01:52:53AM +0000, Li, Liang Z wrote:
> > > > > I wonder if it would be possible to avoid the kernel changes
> > > > > by parsing /proc/self/pagemap - if that can be used to detect
> > > > > unmapped/zero mapped pages in the guest ram, would it achieve
> > > > > the same result?
> > > >
> > > > Only detecting the unmapped/zero mapped pages is not enough. Consider
> > > > the situation like case 2, it can't achieve the same result.
> > >
> > > Your case 2 doesn't exist in the real world.  If people could stop
> > > their main memory consumer in the guest prior to migration they
> > > wouldn't need live migration at all.
> >
> > The case 2 is just a simplified scenario, not a real case.
> > As long as the guest's memory usage does not keep increasing, or not
> > always run out, it can be covered by the case 2.
>
> The memory usage will keep increasing due to ever growing caches, etc, so
> you'll be left with very little free memory fairly soon.
>

I don't think so.

> > > I tend to think you can safely assume there's no free memory in the
> > > guest, so there's little point optimizing for it.
> >
> > If this is true, we should not inflate the balloon either.
>
> We certainly should if there's "available" memory, i.e. not free but cheap to
> reclaim.
>

What do you mean by "available" memory? If they are not free, I don't think it's cheap.

> > > OTOH it makes perfect sense optimizing for the unmapped memory
> > > that's made up, in particular, by the balloon, and consider inflating
> > > the balloon right before migration unless you already maintain it at
> > > the optimal size for other reasons (like e.g. a global resource manager
> > > optimizing the VM density).
> > >
> >
> > Yes, I believe the current balloon works and it's simple. Do you take the
> > performance impact into consideration?
> > For an 8G guest, it takes about 5s to inflate the balloon, but it
> > only takes 20ms to traverse the free_list and construct the free pages
> > bitmap.
>
> I don't have any feeling of how important the difference is.  And if the
> limiting factor for balloon inflation speed is the granularity of communication
> it may be worth optimizing that, because quick balloon reaction may be
> important in certain resource management scenarios.
>
> > By inflating the balloon, all the guest's pages are still processed (zero
> > page checking).
>
> Not sure what you mean.  If you describe the current state of affairs that's
> exactly the suggested optimization point: skip unmapped pages.
>

You'd better check the live migration code.

> > The only advantage of 'inflating the balloon before live migration' is simplicity,
> > nothing more.
>
> That's a big advantage.  Another one is that it does something useful in
> real-world scenarios.
>

I don't think the heavy performance impact is something useful in real world scenarios.

Liang

> Roman.
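[Editorial note, not part of the thread: for context on the /proc/self/pagemap idea quoted above, each 64-bit pagemap entry carries a "page present" flag in bit 63, so a userspace process (QEMU included) can tell which of its virtual pages are currently backed by memory. Below is a minimal, hypothetical sketch; the region size and helper names are made up for illustration. Detecting pages mapped to the shared zero page would likely need extra information such as /proc/kpageflags, which pagemap alone does not provide.]

    /* Sketch: count pages of a buffer that are not present, via pagemap.
     * Bit 63 of each 64-bit entry is "page present". */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static int page_is_present(int pagemap_fd, uintptr_t vaddr, long page_size)
    {
        uint64_t entry;
        off_t offset = (off_t)(vaddr / page_size) * sizeof(entry);

        if (pread(pagemap_fd, &entry, sizeof(entry), offset) != sizeof(entry)) {
            return -1;               /* read failed, treat as unknown */
        }
        return (entry >> 63) & 1;    /* 1 = present, 0 = not mapped in */
    }

    int main(void)
    {
        long page_size = sysconf(_SC_PAGESIZE);
        int fd = open("/proc/self/pagemap", O_RDONLY);
        size_t len = 16 * 1024 * 1024;     /* hypothetical region size */
        unsigned char *buf = malloc(len);  /* untouched pages typically stay unmapped */
        size_t unmapped = 0;

        for (uintptr_t va = (uintptr_t)buf; va < (uintptr_t)buf + len; va += page_size) {
            if (page_is_present(fd, va, page_size) == 0) {
                unmapped++;
            }
        }
        printf("%zu of %zu pages not present\n", unmapped, len / page_size);
        close(fd);
        free(buf);
        return 0;
    }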
Michael S. Tsirkin
2016-Mar-04 14:45 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
On Fri, Mar 04, 2016 at 02:26:49PM +0000, Li, Liang Z wrote:
> > Subject: Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration
> > optimization
> >
> > On Fri, Mar 04, 2016 at 09:08:44AM +0000, Li, Liang Z wrote:
> > > > On Fri, Mar 04, 2016 at 01:52:53AM +0000, Li, Liang Z wrote:
> > > > > > I wonder if it would be possible to avoid the kernel changes
> > > > > > by parsing /proc/self/pagemap - if that can be used to detect
> > > > > > unmapped/zero mapped pages in the guest ram, would it achieve
> > > > > > the same result?
> > > > >
> > > > > Only detecting the unmapped/zero mapped pages is not enough. Consider
> > > > > the situation like case 2, it can't achieve the same result.
> > > >
> > > > Your case 2 doesn't exist in the real world.  If people could stop
> > > > their main memory consumer in the guest prior to migration they
> > > > wouldn't need live migration at all.
> > >
> > > The case 2 is just a simplified scenario, not a real case.
> > > As long as the guest's memory usage does not keep increasing, or not
> > > always run out, it can be covered by the case 2.
> >
> > The memory usage will keep increasing due to ever growing caches, etc, so
> > you'll be left with very little free memory fairly soon.
> >
>
> I don't think so.

Here's my laptop:
KiB Mem : 16048560 total,  8574956 free,  3360532 used,  4113072 buff/cache

But here's a server:
KiB Mem:  32892768 total, 20092812 used, 12799956 free,   368704 buffers

What is the difference? A ton of tiny daemons not doing anything, staying
resident in memory.

> > > > I tend to think you can safely assume there's no free memory in the
> > > > guest, so there's little point optimizing for it.
> > >
> > > If this is true, we should not inflate the balloon either.
> >
> > We certainly should if there's "available" memory, i.e. not free but cheap to
> > reclaim.
> >
>
> What do you mean by "available" memory? If they are not free, I don't think it's cheap.

Clean pages are cheap to drop as they don't have to be written.
Whether they will ever be used is another matter.

> > > > OTOH it makes perfect sense optimizing for the unmapped memory
> > > > that's made up, in particular, by the balloon, and consider inflating
> > > > the balloon right before migration unless you already maintain it at
> > > > the optimal size for other reasons (like e.g. a global resource manager
> > > > optimizing the VM density).
> > > >
> > >
> > > Yes, I believe the current balloon works and it's simple. Do you take the
> > > performance impact into consideration?
> > > For an 8G guest, it takes about 5s to inflate the balloon, but it
> > > only takes 20ms to traverse the free_list and construct the free pages
> > > bitmap.
> >
> > I don't have any feeling of how important the difference is.  And if the
> > limiting factor for balloon inflation speed is the granularity of communication
> > it may be worth optimizing that, because quick balloon reaction may be
> > important in certain resource management scenarios.
> >
> > > By inflating the balloon, all the guest's pages are still processed (zero
> > > page checking).
> >
> > Not sure what you mean.  If you describe the current state of affairs that's
> > exactly the suggested optimization point: skip unmapped pages.
> >
>
> You'd better check the live migration code.

What's there to check in migration code?

Here's the extent of what balloon does on output:

    while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) == 4) {
        ram_addr_t pa;
        ram_addr_t addr;
        int p = virtio_ldl_p(vdev, &pfn);

        pa = (ram_addr_t) p << VIRTIO_BALLOON_PFN_SHIFT;
        offset += 4;

        /* FIXME: remove get_system_memory(), but how? */
        section = memory_region_find(get_system_memory(), pa, 1);
        if (!int128_nz(section.size) || !memory_region_is_ram(section.mr))
            continue;

        trace_virtio_balloon_handle_output(memory_region_name(section.mr),
                                           pa);
        /* Using memory_region_get_ram_ptr is bending the rules a bit, but
           should be OK because we only want a single page.  */
        addr = section.offset_within_region;
        balloon_page(memory_region_get_ram_ptr(section.mr) + addr,
                     !!(vq == s->dvq));
        memory_region_unref(section.mr);
    }

so all that happens when we get a page is balloon_page. and

    static void balloon_page(void *addr, int deflate)
    {
    #if defined(__linux__)
        if (!qemu_balloon_is_inhibited() && (!kvm_enabled() ||
                                             kvm_has_sync_mmu())) {
            qemu_madvise(addr, TARGET_PAGE_SIZE,
                         deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
        }
    #endif
    }

Do you see anything that tracks pages to help migration skip the ballooned
memory?  I don't.

> > > The only advantage of 'inflating the balloon before live migration' is simplicity,
> > > nothing more.
> >
> > That's a big advantage.  Another one is that it does something useful in
> > real-world scenarios.
> >
>
> I don't think the heavy performance impact is something useful in real world scenarios.
>
> Liang

> > Roman.

So fix the performance then. You will have to try harder if you want to
convince people that the poor performance is due to a bad host/guest interface,
and so we have to change *that*.

-- 
MST
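[Editorial note, not part of the thread: to make the performance argument concrete, here is a rough, simplified sketch, not the actual migration/ram.c code, of the per-page work the migration loop performs. Every page, including one the balloon has already discarded with madvise(DONTNEED), is still read and scanned for zeroes before it can be skipped on the wire; reading a discarded anonymous page simply faults in zero-filled memory, so the scan "succeeds" but the memory traffic has already been paid. Names and the cost accounting are illustrative only.]

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define TARGET_PAGE_SIZE 4096

    /* Scan a whole page for non-zero bytes - this touches all 4KB. */
    static bool page_is_zero(const void *page)
    {
        const uint64_t *p = page;
        for (size_t i = 0; i < TARGET_PAGE_SIZE / sizeof(*p); i++) {
            if (p[i]) {
                return false;
            }
        }
        return true;
    }

    /* Hypothetical per-page step of the dirty-page scan; returns bytes put
     * on the wire, but the 4KB read above happens either way. */
    static size_t save_one_page(void *host_page,
                                void (*send)(const void *, size_t))
    {
        if (page_is_zero(host_page)) {
            return 8;                       /* a small zero-page marker, illustrative */
        }
        send(host_page, TARGET_PAGE_SIZE);  /* non-zero pages are sent in full */
        return TARGET_PAGE_SIZE;
    }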
Li, Liang Z
2016-Mar-04 15:49 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
> > > > > > Only detecting the unmapped/zero mapped pages is not enough.
> > > > > > Consider the situation like case 2, it can't achieve the same result.
> > > > >
> > > > > Your case 2 doesn't exist in the real world.  If people could
> > > > > stop their main memory consumer in the guest prior to migration
> > > > > they wouldn't need live migration at all.
> > > >
> > > > The case 2 is just a simplified scenario, not a real case.
> > > > As long as the guest's memory usage does not keep increasing, or
> > > > not always run out, it can be covered by the case 2.
> > >
> > > The memory usage will keep increasing due to ever growing caches,
> > > etc, so you'll be left with very little free memory fairly soon.
> > >
> >
> > I don't think so.
>
> Here's my laptop:
> KiB Mem : 16048560 total,  8574956 free,  3360532 used,  4113072 buff/cache
>
> But here's a server:
> KiB Mem:  32892768 total, 20092812 used, 12799956 free,   368704 buffers
>
> What is the difference? A ton of tiny daemons not doing anything, staying
> resident in memory.
>
> > > > > I tend to think you can safely assume there's no free memory in
> > > > > the guest, so there's little point optimizing for it.
> > > >
> > > > If this is true, we should not inflate the balloon either.
> > >
> > > We certainly should if there's "available" memory, i.e. not free but
> > > cheap to reclaim.
> > >
> >
> > What do you mean by "available" memory? If they are not free, I don't think
> > it's cheap.
>
> Clean pages are cheap to drop as they don't have to be written.
> Whether they will ever be used is another matter.
>
> > > > > OTOH it makes perfect sense optimizing for the unmapped memory
> > > > > that's made up, in particular, by the balloon, and consider
> > > > > inflating the balloon right before migration unless you already
> > > > > maintain it at the optimal size for other reasons (like e.g. a
> > > > > global resource manager optimizing the VM density).
> > > > >
> > > >
> > > > Yes, I believe the current balloon works and it's simple. Do you
> > > > take the performance impact into consideration?
> > > > For an 8G guest, it takes about 5s to inflate the balloon, but
> > > > it only takes 20ms to traverse the free_list and construct the
> > > > free pages bitmap.
> > >
> > > I don't have any feeling of how important the difference is.  And if
> > > the limiting factor for balloon inflation speed is the granularity
> > > of communication it may be worth optimizing that, because quick
> > > balloon reaction may be important in certain resource management
> > > scenarios.
> > >
> > > > By inflating the balloon, all the guest's pages are still
> > > > processed (zero page checking).
> > >
> > > Not sure what you mean.  If you describe the current state of
> > > affairs that's exactly the suggested optimization point: skip
> > > unmapped pages.
> > >
> >
> > You'd better check the live migration code.
>
> What's there to check in migration code?
>
> Here's the extent of what balloon does on output:
>
>     while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) == 4) {
>         ram_addr_t pa;
>         ram_addr_t addr;
>         int p = virtio_ldl_p(vdev, &pfn);
>
>         pa = (ram_addr_t) p << VIRTIO_BALLOON_PFN_SHIFT;
>         offset += 4;
>
>         /* FIXME: remove get_system_memory(), but how? */
>         section = memory_region_find(get_system_memory(), pa, 1);
>         if (!int128_nz(section.size) || !memory_region_is_ram(section.mr))
>             continue;
>
>         trace_virtio_balloon_handle_output(memory_region_name(section.mr),
>                                            pa);
>         /* Using memory_region_get_ram_ptr is bending the rules a bit, but
>            should be OK because we only want a single page.  */
>         addr = section.offset_within_region;
>         balloon_page(memory_region_get_ram_ptr(section.mr) + addr,
>                      !!(vq == s->dvq));
>         memory_region_unref(section.mr);
>     }
>
> so all that happens when we get a page is balloon_page. and
>
>     static void balloon_page(void *addr, int deflate)
>     {
>     #if defined(__linux__)
>         if (!qemu_balloon_is_inhibited() && (!kvm_enabled() ||
>                                              kvm_has_sync_mmu())) {
>             qemu_madvise(addr, TARGET_PAGE_SIZE,
>                          deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
>         }
>     #endif
>     }
>
> Do you see anything that tracks pages to help migration skip the ballooned
> memory?  I don't.
>

No. And it's exactly what I mean. The ballooned memory is still processed during
live migration without skipping. The live migration code is in migration/ram.c.

> > > > The only advantage of 'inflating the balloon before live
> > > > migration' is simplicity, nothing more.
> > >
> > > That's a big advantage.  Another one is that it does something
> > > useful in real-world scenarios.
> > >
> >
> > I don't think the heavy performance impact is something useful in real
> > world scenarios.
> >
> > Liang
>
> > > Roman.
>
> So fix the performance then. You will have to try harder if you want to
> convince people that the poor performance is due to a bad host/guest interface,
> and so we have to change *that*.
>

Actually, the PV solution is independent of the balloon mechanism; I just use it
to transfer information between host and guest.
I am not sure if I should implement a new virtio device, and I want to get the
answer from the community.
In this RFC patch, to make things simple, I chose to extend virtio-balloon and
use the extended interface to transfer the request and the free_page_bitmap
content. I do not intend to change the current virtio-balloon implementation.

Liang

> --
> MST
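[Editorial note, not part of the thread: a rough illustration of what the guest-provided free-page bitmap is meant to buy on the host side. The names (free_page_bitmap, filter_free_pages) are illustrative, not the RFC's actual symbols: if the guest reports which pages are free, the host can clear those bits from its migration bitmap up front, so those pages are never read, never zero-checked, and never sent.]

    #include <limits.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

    static inline bool bitmap_test(const unsigned long *map, size_t nr)
    {
        return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
    }

    /* Filter the dirty bitmap once per migration round: pages the guest
     * reported as free need neither a zero-page scan nor a copy on the wire. */
    static size_t filter_free_pages(unsigned long *dirty_bitmap,
                                    const unsigned long *free_page_bitmap,
                                    size_t nr_pages)
    {
        size_t skipped = 0;

        for (size_t pfn = 0; pfn < nr_pages; pfn++) {
            if (bitmap_test(dirty_bitmap, pfn) &&
                bitmap_test(free_page_bitmap, pfn)) {
                dirty_bitmap[pfn / BITS_PER_LONG] &=
                    ~(1UL << (pfn % BITS_PER_LONG));
                skipped++;   /* this page will not be read, scanned, or sent */
            }
        }
        return skipped;
    }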
Paolo Bonzini
2016-Mar-04 16:24 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
On 04/03/2016 15:26, Li, Liang Z wrote:
>> >
>> > The memory usage will keep increasing due to ever growing caches, etc, so
>> > you'll be left with very little free memory fairly soon.
>> >
> I don't think so.

Roman is right.  For example, here I am looking at a 64 GB (physical)
machine which was booted about 30 minutes ago, and which is running
disk-heavy workloads (installing VMs).

Since I have started writing this email (2 minutes?), the amount of free
memory has already gone down from 37 GB to 33 GB.  I expect that by the
time I have finished running the workload, in two hours, it will not
have any free memory.

Paolo
Dr. David Alan Gilbert
2016-Mar-04 18:51 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
* Paolo Bonzini (pbonzini at redhat.com) wrote:
>
> On 04/03/2016 15:26, Li, Liang Z wrote:
> >> >
> >> > The memory usage will keep increasing due to ever growing caches, etc, so
> >> > you'll be left with very little free memory fairly soon.
> >> >
> > I don't think so.
>
> Roman is right.  For example, here I am looking at a 64 GB (physical)
> machine which was booted about 30 minutes ago, and which is running
> disk-heavy workloads (installing VMs).
>
> Since I have started writing this email (2 minutes?), the amount of free
> memory has already gone down from 37 GB to 33 GB.  I expect that by the
> time I have finished running the workload, in two hours, it will not
> have any free memory.

But what about a VM sitting idle, or one that just has more RAM assigned
to it than it is currently using?
I've got a host here that's been up for 46 days and was doing some heavy
VM debugging a few days ago, but today:

# free -m
              total        used        free      shared  buff/cache   available
Mem:          96536        1146       44834         184       50555       94735

I very rarely use all its RAM, so it's got a big chunk of free RAM, and yes,
it's got a big chunk of cache as well.

Dave

>
> Paolo

-- 
Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK
Li, Liang Z
2016-Mar-09 06:18 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
> On 04/03/2016 15:26, Li, Liang Z wrote:
> >> >
> >> > The memory usage will keep increasing due to ever growing caches,
> >> > etc, so you'll be left with very little free memory fairly soon.
> >> >
> > I don't think so.
>
> Roman is right.  For example, here I am looking at a 64 GB (physical) machine
> which was booted about 30 minutes ago, and which is running disk-heavy
> workloads (installing VMs).
>
> Since I have started writing this email (2 minutes?), the amount of free
> memory has already gone down from 37 GB to 33 GB.  I expect that by the
> time I have finished running the workload, in two hours, it will not have any
> free memory.
>
> Paolo

I have a VM with 2GB of RAM. When the guest booted, there were about 1.4GB
of free pages. Then I downloaded a large file from the internet with the
browser; after the download finished, there were only 72MB of free pages
left, and, as Roman pointed out, there was quite a lot of cached memory.

Then I compiled QEMU; after the compile finished, there were about 1.3GB of
free pages again. So even if the cache grows to a large amount, it will be
freed by other workloads that need the memory.

The cache memory is a big issue that should be taken into consideration.
How about reclaiming some cache before getting the free pages information?

Liang
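[Editorial note, not part of the thread: a hedged sketch of the "reclaim some cache first" idea, run inside the guest. Writing "3" to /proc/sys/vm/drop_caches requires root and only discards clean, reclaimable memory (page cache and slab), so syncing dirty pages first makes more of the cache droppable.]

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static int drop_page_cache(void)
    {
        int fd;

        sync();   /* flush dirty pages so they become clean and reclaimable */
        fd = open("/proc/sys/vm/drop_caches", O_WRONLY);
        if (fd < 0) {
            perror("open drop_caches");
            return -1;
        }
        if (write(fd, "3", 1) != 1) {   /* 1 = page cache, 2 = slab, 3 = both */
            perror("write drop_caches");
            close(fd);
            return -1;
        }
        close(fd);
        return 0;
    }

    int main(void)
    {
        /* Call this just before the free-page bitmap is constructed. */
        return drop_page_cache() ? 1 : 0;
    }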