Li, Liang Z
2016-Mar-04 14:26 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
> Subject: Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration
> optimization
>
> On Fri, Mar 04, 2016 at 09:08:44AM +0000, Li, Liang Z wrote:
> > > On Fri, Mar 04, 2016 at 01:52:53AM +0000, Li, Liang Z wrote:
> > > > > I wonder if it would be possible to avoid the kernel changes
> > > > > by parsing /proc/self/pagemap - if that can be used to detect
> > > > > unmapped/zero mapped pages in the guest ram, would it achieve
> > > > > the same result?
> > > >
> > > > Only detecting the unmapped/zero mapped pages is not enough. Consider
> > > > the situation like case 2, it can't achieve the same result.
> > >
> > > Your case 2 doesn't exist in the real world.  If people could stop
> > > their main memory consumer in the guest prior to migration they
> > > wouldn't need live migration at all.
> >
> > The case 2 is just a simplified scenario, not a real case.
> > As long as the guest's memory usage does not keep increasing, or not
> > always run out, it can be covered by the case 2.
>
> The memory usage will keep increasing due to ever growing caches, etc, so
> you'll be left with very little free memory fairly soon.
>

I don't think so.

> > > I tend to think you can safely assume there's no free memory in the
> > > guest, so there's little point optimizing for it.
> >
> > If this is true, we should not inflate the balloon either.
>
> We certainly should if there's "available" memory, i.e. not free but cheap to
> reclaim.
>

What do you mean by "available" memory? If they are not free, I don't think it's cheap.

> > > OTOH it makes perfect sense optimizing for the unmapped memory
> > > that's made up, in particular, by the balloon, and consider inflating
> > > the balloon right before migration unless you already maintain it at
> > > the optimal size for other reasons (like e.g. a global resource manager
> > > optimizing the VM density).
> > >
> >
> > Yes, I believe the current balloon works and it's simple. Do you take the
> > performance impact into consideration?
> > For an 8G guest, it takes about 5s to inflate the balloon, but it
> > only takes 20ms to traverse the free_list and construct the free pages
> > bitmap.
>
> I don't have any feeling of how important the difference is.  And if the
> limiting factor for balloon inflation speed is the granularity of communication
> it may be worth optimizing that, because quick balloon reaction may be
> important in certain resource management scenarios.
>
> > By inflating the balloon, all the guest's pages are still processed (zero
> > page checking).
>
> Not sure what you mean.  If you describe the current state of affairs that's
> exactly the suggested optimization point: skip unmapped pages.
>

You'd better check the live migration code.

> > The only advantage of 'inflating the balloon before live migration' is simplicity,
> > nothing more.
>
> That's a big advantage.  Another one is that it does something useful in
> real-world scenarios.
>

I don't think the heavy performance impact is something useful in real world scenarios.

Liang

> Roman.
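[Editorial note, not part of the thread: for context on the /proc/self/pagemap idea quoted above, each 64-bit pagemap entry carries a "page present" flag in bit 63, so a userspace process (QEMU included) can tell which of its virtual pages are currently backed by memory. Below is a minimal, hypothetical sketch; the region size and helper names are made up for illustration. Detecting pages mapped to the shared zero page would likely need extra information such as /proc/kpageflags, which pagemap alone does not provide.]

    /* Sketch: count pages of a buffer that are not present, via pagemap.
     * Bit 63 of each 64-bit entry is "page present". */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static int page_is_present(int pagemap_fd, uintptr_t vaddr, long page_size)
    {
        uint64_t entry;
        off_t offset = (off_t)(vaddr / page_size) * sizeof(entry);

        if (pread(pagemap_fd, &entry, sizeof(entry), offset) != sizeof(entry)) {
            return -1;               /* read failed, treat as unknown */
        }
        return (entry >> 63) & 1;    /* 1 = present, 0 = not mapped in */
    }

    int main(void)
    {
        long page_size = sysconf(_SC_PAGESIZE);
        int fd = open("/proc/self/pagemap", O_RDONLY);
        size_t len = 16 * 1024 * 1024;     /* hypothetical region size */
        unsigned char *buf = malloc(len);  /* untouched pages typically stay unmapped */
        size_t unmapped = 0;

        for (uintptr_t va = (uintptr_t)buf; va < (uintptr_t)buf + len; va += page_size) {
            if (page_is_present(fd, va, page_size) == 0) {
                unmapped++;
            }
        }
        printf("%zu of %zu pages not present\n", unmapped, len / page_size);
        close(fd);
        free(buf);
        return 0;
    }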
Michael S. Tsirkin
2016-Mar-04 14:45 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
On Fri, Mar 04, 2016 at 02:26:49PM +0000, Li, Liang Z wrote:
> > Subject: Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration
> > optimization
> >
> > On Fri, Mar 04, 2016 at 09:08:44AM +0000, Li, Liang Z wrote:
> > > > On Fri, Mar 04, 2016 at 01:52:53AM +0000, Li, Liang Z wrote:
> > > > > > I wonder if it would be possible to avoid the kernel changes
> > > > > > by parsing /proc/self/pagemap - if that can be used to detect
> > > > > > unmapped/zero mapped pages in the guest ram, would it achieve
> > > > > > the same result?
> > > > >
> > > > > Only detecting the unmapped/zero mapped pages is not enough. Consider
> > > > > the situation like case 2, it can't achieve the same result.
> > > >
> > > > Your case 2 doesn't exist in the real world.  If people could stop
> > > > their main memory consumer in the guest prior to migration they
> > > > wouldn't need live migration at all.
> > >
> > > The case 2 is just a simplified scenario, not a real case.
> > > As long as the guest's memory usage does not keep increasing, or not
> > > always run out, it can be covered by the case 2.
> >
> > The memory usage will keep increasing due to ever growing caches, etc, so
> > you'll be left with very little free memory fairly soon.
> >
>
> I don't think so.

Here's my laptop:
KiB Mem : 16048560 total,  8574956 free,  3360532 used,  4113072 buff/cache

But here's a server:
KiB Mem:  32892768 total, 20092812 used, 12799956 free,   368704 buffers

What is the difference? A ton of tiny daemons not doing anything, staying
resident in memory.

> > > > I tend to think you can safely assume there's no free memory in the
> > > > guest, so there's little point optimizing for it.
> > >
> > > If this is true, we should not inflate the balloon either.
> >
> > We certainly should if there's "available" memory, i.e. not free but cheap to
> > reclaim.
> >
>
> What do you mean by "available" memory? If they are not free, I don't think it's cheap.

Clean pages are cheap to drop as they don't have to be written.
Whether they will ever be used is another matter.

> > > > OTOH it makes perfect sense optimizing for the unmapped memory
> > > > that's made up, in particular, by the balloon, and consider inflating
> > > > the balloon right before migration unless you already maintain it at
> > > > the optimal size for other reasons (like e.g. a global resource manager
> > > > optimizing the VM density).
> > > >
> > >
> > > Yes, I believe the current balloon works and it's simple. Do you take the
> > > performance impact into consideration?
> > > For an 8G guest, it takes about 5s to inflate the balloon, but it
> > > only takes 20ms to traverse the free_list and construct the free pages
> > > bitmap.
> >
> > I don't have any feeling of how important the difference is.  And if the
> > limiting factor for balloon inflation speed is the granularity of communication
> > it may be worth optimizing that, because quick balloon reaction may be
> > important in certain resource management scenarios.
> >
> > > By inflating the balloon, all the guest's pages are still processed (zero
> > > page checking).
> >
> > Not sure what you mean.  If you describe the current state of affairs that's
> > exactly the suggested optimization point: skip unmapped pages.
> >
>
> You'd better check the live migration code.

What's there to check in migration code?

Here's the extent of what balloon does on output:

    while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) == 4) {
        ram_addr_t pa;
        ram_addr_t addr;
        int p = virtio_ldl_p(vdev, &pfn);

        pa = (ram_addr_t) p << VIRTIO_BALLOON_PFN_SHIFT;
        offset += 4;

        /* FIXME: remove get_system_memory(), but how? */
        section = memory_region_find(get_system_memory(), pa, 1);
        if (!int128_nz(section.size) || !memory_region_is_ram(section.mr))
            continue;

        trace_virtio_balloon_handle_output(memory_region_name(section.mr),
                                           pa);
        /* Using memory_region_get_ram_ptr is bending the rules a bit, but
           should be OK because we only want a single page.  */
        addr = section.offset_within_region;
        balloon_page(memory_region_get_ram_ptr(section.mr) + addr,
                     !!(vq == s->dvq));
        memory_region_unref(section.mr);
    }

so all that happens when we get a page is balloon_page. and

    static void balloon_page(void *addr, int deflate)
    {
    #if defined(__linux__)
        if (!qemu_balloon_is_inhibited() && (!kvm_enabled() ||
                                             kvm_has_sync_mmu())) {
            qemu_madvise(addr, TARGET_PAGE_SIZE,
                         deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
        }
    #endif
    }

Do you see anything that tracks pages to help migration skip the ballooned
memory?  I don't.

> > > The only advantage of 'inflating the balloon before live migration' is simplicity,
> > > nothing more.
> >
> > That's a big advantage.  Another one is that it does something useful in
> > real-world scenarios.
> >
>
> I don't think the heavy performance impact is something useful in real world scenarios.
>
> Liang

> > Roman.

So fix the performance then. You will have to try harder if you want to
convince people that the poor performance is due to a bad host/guest interface,
and so we have to change *that*.

-- 
MST
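[Editorial note, not part of the thread: to make the performance argument concrete, here is a rough, simplified sketch, not the actual migration/ram.c code, of the per-page work the migration loop performs. Every page, including one the balloon has already discarded with madvise(DONTNEED), is still read and scanned for zeroes before it can be skipped on the wire; reading a discarded anonymous page simply faults in zero-filled memory, so the scan "succeeds" but the memory traffic has already been paid. Names and the cost accounting are illustrative only.]

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define TARGET_PAGE_SIZE 4096

    /* Scan a whole page for non-zero bytes - this touches all 4KB. */
    static bool page_is_zero(const void *page)
    {
        const uint64_t *p = page;
        for (size_t i = 0; i < TARGET_PAGE_SIZE / sizeof(*p); i++) {
            if (p[i]) {
                return false;
            }
        }
        return true;
    }

    /* Hypothetical per-page step of the dirty-page scan; returns bytes put
     * on the wire, but the 4KB read above happens either way. */
    static size_t save_one_page(void *host_page,
                                void (*send)(const void *, size_t))
    {
        if (page_is_zero(host_page)) {
            return 8;                       /* a small zero-page marker, illustrative */
        }
        send(host_page, TARGET_PAGE_SIZE);  /* non-zero pages are sent in full */
        return TARGET_PAGE_SIZE;
    }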
Li, Liang Z
2016-Mar-04 15:49 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
> > > > > > Only detecting the unmapped/zero mapped pages is not enough.
> > > > > > Consider the situation like case 2, it can't achieve the same result.
> > > > >
> > > > > Your case 2 doesn't exist in the real world.  If people could
> > > > > stop their main memory consumer in the guest prior to migration
> > > > > they wouldn't need live migration at all.
> > > >
> > > > The case 2 is just a simplified scenario, not a real case.
> > > > As long as the guest's memory usage does not keep increasing, or
> > > > not always run out, it can be covered by the case 2.
> > >
> > > The memory usage will keep increasing due to ever growing caches,
> > > etc, so you'll be left with very little free memory fairly soon.
> > >
> >
> > I don't think so.
>
> Here's my laptop:
> KiB Mem : 16048560 total,  8574956 free,  3360532 used,  4113072 buff/cache
>
> But here's a server:
> KiB Mem:  32892768 total, 20092812 used, 12799956 free,   368704 buffers
>
> What is the difference? A ton of tiny daemons not doing anything, staying
> resident in memory.
>
> > > > > I tend to think you can safely assume there's no free memory in
> > > > > the guest, so there's little point optimizing for it.
> > > >
> > > > If this is true, we should not inflate the balloon either.
> > >
> > > We certainly should if there's "available" memory, i.e. not free but
> > > cheap to reclaim.
> > >
> >
> > What do you mean by "available" memory? If they are not free, I don't think
> > it's cheap.
>
> Clean pages are cheap to drop as they don't have to be written.
> Whether they will ever be used is another matter.
>
> > > > > OTOH it makes perfect sense optimizing for the unmapped memory
> > > > > that's made up, in particular, by the balloon, and consider
> > > > > inflating the balloon right before migration unless you already
> > > > > maintain it at the optimal size for other reasons (like e.g. a
> > > > > global resource manager optimizing the VM density).
> > > > >
> > > >
> > > > Yes, I believe the current balloon works and it's simple. Do you
> > > > take the performance impact into consideration?
> > > > For an 8G guest, it takes about 5s to inflate the balloon, but
> > > > it only takes 20ms to traverse the free_list and construct the
> > > > free pages bitmap.
> > >
> > > I don't have any feeling of how important the difference is.  And if
> > > the limiting factor for balloon inflation speed is the granularity
> > > of communication it may be worth optimizing that, because quick
> > > balloon reaction may be important in certain resource management
> > > scenarios.
> > >
> > > > By inflating the balloon, all the guest's pages are still
> > > > processed (zero page checking).
> > >
> > > Not sure what you mean.  If you describe the current state of
> > > affairs that's exactly the suggested optimization point: skip
> > > unmapped pages.
> > >
> >
> > You'd better check the live migration code.
>
> What's there to check in migration code?
>
> Here's the extent of what balloon does on output:
>
>     while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) == 4) {
>         ram_addr_t pa;
>         ram_addr_t addr;
>         int p = virtio_ldl_p(vdev, &pfn);
>
>         pa = (ram_addr_t) p << VIRTIO_BALLOON_PFN_SHIFT;
>         offset += 4;
>
>         /* FIXME: remove get_system_memory(), but how? */
>         section = memory_region_find(get_system_memory(), pa, 1);
>         if (!int128_nz(section.size) || !memory_region_is_ram(section.mr))
>             continue;
>
>         trace_virtio_balloon_handle_output(memory_region_name(section.mr),
>                                            pa);
>         /* Using memory_region_get_ram_ptr is bending the rules a bit, but
>            should be OK because we only want a single page.  */
>         addr = section.offset_within_region;
>         balloon_page(memory_region_get_ram_ptr(section.mr) + addr,
>                      !!(vq == s->dvq));
>         memory_region_unref(section.mr);
>     }
>
> so all that happens when we get a page is balloon_page. and
>
>     static void balloon_page(void *addr, int deflate)
>     {
>     #if defined(__linux__)
>         if (!qemu_balloon_is_inhibited() && (!kvm_enabled() ||
>                                              kvm_has_sync_mmu())) {
>             qemu_madvise(addr, TARGET_PAGE_SIZE,
>                          deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
>         }
>     #endif
>     }
>
> Do you see anything that tracks pages to help migration skip the ballooned
> memory?  I don't.
>

No. And it's exactly what I mean. The ballooned memory is still processed during
live migration without skipping. The live migration code is in migration/ram.c.

> > > > The only advantage of 'inflating the balloon before live
> > > > migration' is simplicity, nothing more.
> > >
> > > That's a big advantage.  Another one is that it does something
> > > useful in real-world scenarios.
> > >
> >
> > I don't think the heavy performance impact is something useful in real
> > world scenarios.
> >
> > Liang
>
> > > Roman.
>
> So fix the performance then. You will have to try harder if you want to
> convince people that the poor performance is due to a bad host/guest interface,
> and so we have to change *that*.
>

Actually, the PV solution is independent of the balloon mechanism; I just use it
to transfer information between host and guest.
I am not sure if I should implement a new virtio device, and I want to get the
answer from the community.
In this RFC patch, to make things simple, I chose to extend virtio-balloon and
use the extended interface to transfer the request and the free_page_bitmap
content. I do not intend to change the current virtio-balloon implementation.

Liang

> --
> MST
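[Editorial note, not part of the thread: a rough illustration of what the guest-provided free-page bitmap is meant to buy on the host side. The names (free_page_bitmap, filter_free_pages) are illustrative, not the RFC's actual symbols: if the guest reports which pages are free, the host can clear those bits from its migration bitmap up front, so those pages are never read, never zero-checked, and never sent.]

    #include <limits.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

    static inline bool bitmap_test(const unsigned long *map, size_t nr)
    {
        return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
    }

    /* Filter the dirty bitmap once per migration round: pages the guest
     * reported as free need neither a zero-page scan nor a copy on the wire. */
    static size_t filter_free_pages(unsigned long *dirty_bitmap,
                                    const unsigned long *free_page_bitmap,
                                    size_t nr_pages)
    {
        size_t skipped = 0;

        for (size_t pfn = 0; pfn < nr_pages; pfn++) {
            if (bitmap_test(dirty_bitmap, pfn) &&
                bitmap_test(free_page_bitmap, pfn)) {
                dirty_bitmap[pfn / BITS_PER_LONG] &=
                    ~(1UL << (pfn % BITS_PER_LONG));
                skipped++;   /* this page will not be read, scanned, or sent */
            }
        }
        return skipped;
    }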
Paolo Bonzini
2016-Mar-04 16:24 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
On 04/03/2016 15:26, Li, Liang Z wrote:
>> >
>> > The memory usage will keep increasing due to ever growing caches, etc, so
>> > you'll be left with very little free memory fairly soon.
>> >
> I don't think so.

Roman is right.  For example, here I am looking at a 64 GB (physical)
machine which was booted about 30 minutes ago, and which is running
disk-heavy workloads (installing VMs).

Since I have started writing this email (2 minutes?), the amount of free
memory has already gone down from 37 GB to 33 GB.  I expect that by the
time I have finished running the workload, in two hours, it will not
have any free memory.

Paolo
Dr. David Alan Gilbert
2016-Mar-04 18:51 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
* Paolo Bonzini (pbonzini at redhat.com) wrote:
>
> On 04/03/2016 15:26, Li, Liang Z wrote:
> >> >
> >> > The memory usage will keep increasing due to ever growing caches, etc, so
> >> > you'll be left with very little free memory fairly soon.
> >> >
> > I don't think so.
>
> Roman is right.  For example, here I am looking at a 64 GB (physical)
> machine which was booted about 30 minutes ago, and which is running
> disk-heavy workloads (installing VMs).
>
> Since I have started writing this email (2 minutes?), the amount of free
> memory has already gone down from 37 GB to 33 GB.  I expect that by the
> time I have finished running the workload, in two hours, it will not
> have any free memory.

But what about a VM sitting idle, or one that just has more RAM assigned
to it than it is currently using?
I've got a host here that's been up for 46 days and was doing some heavy
VM debugging a few days ago, but today:

# free -m
              total        used        free      shared  buff/cache   available
Mem:          96536        1146       44834         184       50555       94735

I very rarely use all its RAM, so it's got a big chunk of free RAM, and yes,
it's got a big chunk of cache as well.

Dave

>
> Paolo

-- 
Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK
Li, Liang Z
2016-Mar-09 06:18 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
> On 04/03/2016 15:26, Li, Liang Z wrote:
> >> >
> >> > The memory usage will keep increasing due to ever growing caches,
> >> > etc, so you'll be left with very little free memory fairly soon.
> >> >
> > I don't think so.
>
> Roman is right.  For example, here I am looking at a 64 GB (physical) machine
> which was booted about 30 minutes ago, and which is running disk-heavy
> workloads (installing VMs).
>
> Since I have started writing this email (2 minutes?), the amount of free
> memory has already gone down from 37 GB to 33 GB.  I expect that by the
> time I have finished running the workload, in two hours, it will not have any
> free memory.
>
> Paolo

I have a VM with 2GB of RAM. When the guest booted, there were about 1.4GB
of free pages. Then I downloaded a large file from the internet with the
browser; after the download finished, there were only 72MB of free pages
left, and, as Roman pointed out, there was quite a lot of cached memory.

Then I compiled QEMU; after the compile finished, there were about 1.3GB of
free pages again. So even if the cache grows to a large amount, it will be
freed by other workloads that need the memory.

The cache memory is a big issue that should be taken into consideration.
How about reclaiming some cache before getting the free pages information?

Liang
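[Editorial note, not part of the thread: a hedged sketch of the "reclaim some cache first" idea, run inside the guest. Writing "3" to /proc/sys/vm/drop_caches requires root and only discards clean, reclaimable memory (page cache and slab), so syncing dirty pages first makes more of the cache droppable.]

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static int drop_page_cache(void)
    {
        int fd;

        sync();   /* flush dirty pages so they become clean and reclaimable */
        fd = open("/proc/sys/vm/drop_caches", O_WRONLY);
        if (fd < 0) {
            perror("open drop_caches");
            return -1;
        }
        if (write(fd, "3", 1) != 1) {   /* 1 = page cache, 2 = slab, 3 = both */
            perror("write drop_caches");
            close(fd);
            return -1;
        }
        close(fd);
        return 0;
    }

    int main(void)
    {
        /* Call this just before the free-page bitmap is constructed. */
        return drop_page_cache() ? 1 : 0;
    }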