Roman Kagan
2016-Mar-04 07:55 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
On Thu, Mar 03, 2016 at 05:46:15PM +0000, Dr. David Alan Gilbert wrote:
> * Liang Li (liang.z.li at intel.com) wrote:
> > The current QEMU live migration implementation marks all the
> > guest's RAM pages as dirtied in the ram bulk stage, all these pages
> > will be processed and that takes quite a lot of CPU cycles.
> >
> > From guest's point of view, it doesn't care about the content in free
> > pages. We can make use of this fact and skip processing the free
> > pages in the ram bulk stage, it can save a lot of CPU cycles and reduce
> > the network traffic significantly while speeding up the live migration
> > process obviously.
> >
> > This patch set is the QEMU side implementation.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This makes
> > the live migration process much more efficient.
>
> Hi,
>   An interesting solution; I know a few different people have been looking
> at how to speed up ballooned VM migration.
>
> I wonder if it would be possible to avoid the kernel changes by
> parsing /proc/self/pagemap - if that can be used to detect unmapped/zero
> mapped pages in the guest ram, would it achieve the same result?

Yes, I was about to suggest the same thing: it's simple and makes use of the
existing infrastructure. And you wouldn't need to care whether the pages were
unmapped by ballooning or by anything else (alternative balloon
implementations, pages not yet touched by the guest, etc.). Besides, you
wouldn't need to synchronize with the guest.

Roman.
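The pagemap idea above can be sketched roughly as follows. This is a minimal illustration, not QEMU code: it assumes the documented Linux pagemap layout (one 64-bit entry per virtual page, bit 63 = present, bit 62 = swapped), and the helper names `decode_entries`, `is_backed`, `unbacked_pages` and `read_pagemap` are made up for this example. It only finds pages with no backing at all; spotting pages mapped to the shared zero page would need PFN information, which this sketch does not attempt.

```python
import os
import struct

PAGEMAP_ENTRY_SIZE = 8      # one 64-bit entry per virtual page
PM_PRESENT = 1 << 63        # page is present in RAM
PM_SWAPPED = 1 << 62        # page is swapped out


def decode_entries(raw):
    """Split a raw pagemap read into a tuple of 64-bit entries (native order)."""
    count = len(raw) // PAGEMAP_ENTRY_SIZE
    return struct.unpack("=%dQ" % count, raw[:count * PAGEMAP_ENTRY_SIZE])


def is_backed(entry):
    """A page has content worth sending if it is in RAM or in swap."""
    return bool(entry & (PM_PRESENT | PM_SWAPPED))


def unbacked_pages(entries):
    """Indexes (relative to the first entry) of pages with no backing."""
    return [i for i, entry in enumerate(entries) if not is_backed(entry)]


def read_pagemap(start_addr, num_pages, path="/proc/self/pagemap"):
    """Read the entries covering num_pages pages starting at start_addr.

    start_addr is a virtual address in the calling process; for QEMU this
    would presumably be the host address of a guest RAM block (assumption).
    """
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(path, "rb") as pagemap:
        pagemap.seek((start_addr // page_size) * PAGEMAP_ENTRY_SIZE)
        return decode_entries(pagemap.read(num_pages * PAGEMAP_ENTRY_SIZE))
```

Pages reported by `unbacked_pages` were either never touched or have been unmapped (for example by ballooning), so under this approach they could be skipped in the bulk stage without asking the guest.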
Li, Liang Z
2016-Mar-04 08:23 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
> On Thu, Mar 03, 2016 at 05:46:15PM +0000, Dr. David Alan Gilbert wrote:
> > * Liang Li (liang.z.li at intel.com) wrote:
> > > The current QEMU live migration implementation marks all the
> > > guest's RAM pages as dirtied in the ram bulk stage, all these pages
> > > will be processed and that takes quite a lot of CPU cycles.
> > >
> > > From guest's point of view, it doesn't care about the content in
> > > free pages. We can make use of this fact and skip processing the
> > > free pages in the ram bulk stage, it can save a lot of CPU cycles and
> > > reduce the network traffic significantly while speeding up the live
> > > migration process obviously.
> > >
> > > This patch set is the QEMU side implementation.
> > >
> > > The virtio-balloon is extended so that QEMU can get the free pages
> > > information from the guest through virtio.
> > >
> > > After getting the free pages information (a bitmap), QEMU can use it
> > > to filter out the guest's free pages in the ram bulk stage. This
> > > makes the live migration process much more efficient.
> >
> > Hi,
> >   An interesting solution; I know a few different people have been
> > looking at how to speed up ballooned VM migration.
> >
> > I wonder if it would be possible to avoid the kernel changes by
> > parsing /proc/self/pagemap - if that can be used to detect
> > unmapped/zero mapped pages in the guest ram, would it achieve the
> > same result?
>
> Yes I was about to suggest the same thing: it's simple and makes use of the
> existing infrastructure. And you wouldn't need to care if the pages were
> unmapped by ballooning or anything else (alternative balloon
> implementations, not yet touched by the guest, etc.). Besides, you wouldn't
> need to synchronize with the guest.
>
> Roman.

The unmapped/zero mapped pages can be detected by parsing
/proc/self/pagemap, but the free pages can't be detected this way.

Imagine an application allocates a large amount of memory, uses it, and
then frees it, and then live migration happens. All these free pages will
be processed and sent to the destination, which is not optimal.

Liang
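To make the distinction concrete: pages the guest has freed normally stay mapped on the host, so they look "present" from the host side, while a guest-provided free-page bitmap can still exclude them. The sketch below is only an illustration of that filtering step, not the patch set's code. It uses Python integers as bitmaps (bit N set means page N), and the function name `pages_to_send` is hypothetical. It also assumes that any page reused by the guest after the bitmap was taken is caught by ordinary dirty logging in later rounds.

```python
def pages_to_send(dirty_bitmap, free_bitmap, num_pages):
    """Return the page indexes that still need sending.

    dirty_bitmap: bit N set means page N is marked dirty.
    free_bitmap:  bit N set means the guest reported page N as free.
    """
    mask = (1 << num_pages) - 1
    # Clear every dirty bit whose page the guest reported as free.
    remaining = dirty_bitmap & ~free_bitmap & mask
    return [page for page in range(num_pages) if (remaining >> page) & 1]


def bulk_stage_pages(free_bitmap, num_pages):
    """In the bulk stage every page starts out dirty, so only the
    free-page bitmap decides what gets skipped."""
    all_dirty = (1 << num_pages) - 1
    return pages_to_send(all_dirty, free_bitmap, num_pages)
```

For example, with 8 pages of which pages 2, 4 and 5 are reported free, `bulk_stage_pages(0b00110100, 8)` leaves pages 0, 1, 3, 6 and 7 to be processed.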
Roman Kagan
2016-Mar-04 08:35 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
On Fri, Mar 04, 2016 at 08:23:09AM +0000, Li, Liang Z wrote:
> > On Thu, Mar 03, 2016 at 05:46:15PM +0000, Dr. David Alan Gilbert wrote:
> > > * Liang Li (liang.z.li at intel.com) wrote:
> > > > The current QEMU live migration implementation marks all the
> > > > guest's RAM pages as dirtied in the ram bulk stage, all these pages
> > > > will be processed and that takes quite a lot of CPU cycles.
> > > >
> > > > From guest's point of view, it doesn't care about the content in
> > > > free pages. We can make use of this fact and skip processing the
> > > > free pages in the ram bulk stage, it can save a lot of CPU cycles and
> > > > reduce the network traffic significantly while speeding up the live
> > > > migration process obviously.
> > > >
> > > > This patch set is the QEMU side implementation.
> > > >
> > > > The virtio-balloon is extended so that QEMU can get the free pages
> > > > information from the guest through virtio.
> > > >
> > > > After getting the free pages information (a bitmap), QEMU can use it
> > > > to filter out the guest's free pages in the ram bulk stage. This
> > > > makes the live migration process much more efficient.
> > >
> > > Hi,
> > >   An interesting solution; I know a few different people have been
> > > looking at how to speed up ballooned VM migration.
> > >
> > > I wonder if it would be possible to avoid the kernel changes by
> > > parsing /proc/self/pagemap - if that can be used to detect
> > > unmapped/zero mapped pages in the guest ram, would it achieve the
> > > same result?
> >
> > Yes I was about to suggest the same thing: it's simple and makes use of the
> > existing infrastructure. And you wouldn't need to care if the pages were
> > unmapped by ballooning or anything else (alternative balloon
> > implementations, not yet touched by the guest, etc.). Besides, you wouldn't
> > need to synchronize with the guest.
> >
> > Roman.
>
> The unmapped/zero mapped pages can be detected by parsing /proc/self/pagemap,
> but the free pages can't be detected this way. Imagine an application
> allocates a large amount of memory, uses it, and then frees it, and then
> live migration happens. All these free pages will be processed and sent to
> the destination, which is not optimal.

First, the likelihood of such a situation is marginal; there's no point
optimizing for it specifically.

Second, even if that happens, you can inflate the balloon right before the
migration and the free memory will get unmapped very quickly, so this case is
covered nicely by the same technique that works for more realistic cases, too.

Roman.