Hey folks,

I am running libvirt 1.2.4 and qemu 2.1 on a 3.14.27 kernel. I've found that live migrating a relatively large VM (16 cores and 64G ram) is taking forever - close to 15 hours now, and still not done...

With "lsof -i", I can see a connection is established from my source hypervisor to a target hypervisor, likely for the purpose of copying data. nettop shows that this connection is constantly sending 50-60MBps traffic. The VM being migrated has a disk on ceph by using librbd.

I wonder if anyone has seen similar issues, and how I could troubleshoot further. (I tried but failed to get qemu monitor to work on the VM...)

Thanks.
-Simon
On Thu, Jan 22, 2015 at 9:11 PM, Xu (Simon) Chen <xchenum@gmail.com> wrote:
> Hey folks,
>
> I am running libvirt 1.2.4 and qemu 2.1 on a 3.14.27 kernel. I've found that
> live migrating a relatively large VM (16 cores and 64G ram) is taking
> forever - close to 15 hours now, and still not done...
>
> With "lsof -i", I can see a connection is established from my source
> hypervisor to a target hypervisor, likely for the purpose of copying data.
> nettop shows that this connection is constantly sending 50-60MBps traffic.
> The VM being migrated has a disk on ceph by using librbd.
>
> I wonder if anyone has seen similar issues, and how I could troubleshoot
> further. (I tried but failed to get qemu monitor to work on the VM...)
>
> Thanks.
> -Simon

Hi,

under certain conditions, such as memory-intensive workloads inside the guest, live migration may effectively never finish, and this is expected behavior. You may want to use the --timeout parameter, for example, which sets an interval after which the migration falls back to non-live. If your VM is sitting idle, the observed behavior is most probably a bug.
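The convergence argument above can be illustrated with a rough model of pre-copy migration. This is only a sketch under stated assumptions, not how QEMU actually schedules its passes: it assumes a constant link speed, a constant guest dirty rate, that "50-60MBps" means megabytes per second, and a hypothetical 0.3 s downtime budget. The function name and parameters are made up for illustration.

```python
def simulate_precopy(ram_mb, bandwidth_mb_s, dirty_mb_s,
                     max_downtime_s=0.3, max_rounds=50):
    """Toy model of iterative pre-copy migration.

    Each round sends whatever is still outstanding; while it is being
    sent, the guest dirties more memory that has to be sent again.
    Returns (rounds, elapsed_seconds) once the remainder is small enough
    to send within the downtime budget, or None if that never happens
    within max_rounds.
    """
    remaining = float(ram_mb)
    elapsed = 0.0
    for rounds_done in range(max_rounds + 1):
        # Small enough to pause the guest and send the rest in one go?
        if remaining / bandwidth_mb_s <= max_downtime_s:
            return rounds_done, elapsed
        transfer_time = remaining / bandwidth_mb_s
        elapsed += transfer_time
        # Memory dirtied during this round, capped at total guest RAM.
        remaining = min(float(ram_mb), dirty_mb_s * transfer_time)
    return None


if __name__ == "__main__":
    ram_mb = 64 * 1024   # 64 GB guest, as in the original report
    link = 55.0          # assumed MB/s, middle of the reported 50-60 range
    for dirty in (0.0, 10.0, 60.0):   # hypothetical dirty rates in MB/s
        print(dirty, simulate_precopy(ram_mb, link, dirty))
```

In this model a single full pass over 64 GB at about 55 MB/s takes roughly 20 minutes, so a migration running for 15 hours suggests many repeated passes. Whenever the dirty rate is at or above the link speed, the outstanding amount never shrinks and the function returns None.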
On Thu, Jan 22, 2015 at 22:40:59 +0400, Andrey Korolyov wrote:
> Hi, under certain conditions, like memory-intensive procedures inside
> guest, live migration effectively will have no end, and this is
> expected behavior. You may want to use --timeout parameter for
> fallback interval to non-live migration for example. If your vm is
> sitting idle, the observed behavior is most probably a bug.

Or you can try the --auto-converge and --compressed options of virsh migrate. The --auto-converge option in particular was designed to help in your situation. If used, QEMU will automatically slow down the guest CPUs so that the guest cannot change too much memory during the migration. This may be better than non-live migration if you need the guest to remain at least partially responsive.

Jirka
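As a sketch of how the options mentioned in this thread fit together, the helper below assembles a virsh migrate command line without running it. The domain name "bigvm", the destination URI, and the 600 second timeout are hypothetical placeholders, and whether each option is available depends on the installed libvirt and QEMU versions.

```python
import shlex


def build_migrate_command(domain, dest_uri, timeout_s=None,
                          auto_converge=False, compressed=False):
    """Build an argument list for `virsh migrate --live`.

    Only the options discussed in the thread are handled:
    --auto-converge, --compressed and --timeout.
    """
    cmd = ["virsh", "migrate", "--live"]
    if auto_converge:
        cmd.append("--auto-converge")
    if compressed:
        cmd.append("--compressed")
    if timeout_s is not None:
        # --timeout takes a number of seconds.
        cmd += ["--timeout", str(timeout_s)]
    cmd += [domain, dest_uri]
    return cmd


if __name__ == "__main__":
    # Hypothetical domain name and destination URI; adjust to your setup.
    cmd = build_migrate_command("bigvm", "qemu+ssh://target-host/system",
                                timeout_s=600, auto_converge=True)
    print(shlex.join(cmd))  # shlex.join requires Python 3.8+
```

The list form can be passed to subprocess.run as-is once the placeholders are replaced; printing it first makes it easy to review before starting a long migration.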