Xinglong Wu
2012-Nov-25 10:57 UTC
[libvirt-users] Live migration with non-shared storage leads to corrupted file system
Hi,

We have the following environment for live migration with non-shared storage between two nodes:

    Host OS:  RHEL 6.3
    Kernel:   2.6.32-279.el6.x86_64
    Qemu-kvm: 1.2.0
    libvirt:  0.10.1

and use "virsh" to do the job as

    virsh -c 'qemu:///system' migrate --live --persistent --copy-storage-all <guest-name> qemu+ssh://<target-node>/system

The above command itself returns no error, and the migrated domain on the destination node starts fine. But when I log into the migrated domain, some commands fail immediately. And if I shut down the domain, it won't boot up any more, complaining about a corrupted file system. Furthermore, I can confirm after thorough testing that the domain works flawlessly before migration.

The log file in /var/log/libvirt/qemu looks fine, without any warnings or errors. The only error messages I can observe are in /var/log/libvirt/libvirtd.log:

    2012-11-25 10:00:55.001+0000: 15398: warning : qemuDomainObjBeginJobInternal:838 : Cannot start job (query, none) for domain testVM; current job is (async nested, migration out) owned by (15397, 15397)
    2012-11-25 10:00:55.001+0000: 15398: error : qemuDomainObjBeginJobInternal:842 : Timed out during operation: cannot acquire state change lock
    2012-11-25 10:00:57.009+0000: 15393: error : virNetSocketReadWire:1184 : End of file while reading data: Input/output error

I also noticed that the raw image file used by the migrated domain has different sizes (as reported by "du") before and after the migration.

Is there anybody having a similar experience with live migration on non-shared storage? It apparently leads to failed migrations in libvirt but no critical errors are ever reported.

Brett
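A note on the "du" observation: "du" reports allocated blocks rather than apparent file size, so a sparse raw image can show a different "du" figure after being copied without its contents having changed. A minimal sketch, assuming the guest is shut down and both image files are readable from one place, of comparing the two images by content instead. The function names and any paths passed in are hypothetical and not from the original report.

```python
import hashlib
import os


def image_summary(path, chunk_size=1024 * 1024):
    """Return apparent size, allocated bytes, and SHA-256 digest of a file."""
    st = os.stat(path)
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large disk images do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "apparent_bytes": st.st_size,
        # st_blocks counts 512-byte units on Linux; this is roughly what "du" shows.
        "allocated_bytes": st.st_blocks * 512,
        "sha256": digest.hexdigest(),
    }


def same_content(a, b):
    """True if two summaries describe byte-identical files."""
    return a["apparent_bytes"] == b["apparent_bytes"] and a["sha256"] == b["sha256"]
```

If same_content() is true while allocated_bytes differs, the difference is only in sparseness. If the digests differ for an image that was not written to after the copy, the data itself changed during migration.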
Henrik Ahlgren
2012-Nov-25 11:12 UTC
[libvirt-users] Live migration with non-shared storage leads to corrupted file system
On Sun, Nov 25, 2012 at 06:57:19PM +0800, Xinglong Wu wrote:
> Is there anybody having a similar experience with live migration on
> non-shared storage? It apparently leads to failed migrations in
> libvirt but no critical errors are ever reported.

Make sure you have your driver cache set to "none". I've seen similar things happen with other cache settings, but with "none" it works just fine (but of course that might impose an I/O performance penalty in some cases?). I think the default is "unsafe" in 0.9.7 and later? It would be nice if the cache settings were documented more clearly.
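In libvirt domain XML the cache mode is the "cache" attribute on a disk's <driver> element. A minimal sketch, assuming the XML comes from something like "virsh dumpxml <guest-name>", of listing the disks that are not explicitly set to cache='none'. The function name, the sample XML, and the image paths in it are made up for illustration.

```python
import xml.etree.ElementTree as ET


def disks_without_cache_none(domain_xml):
    """Return (target_dev, cache) pairs for disks whose cache mode is not 'none'."""
    root = ET.fromstring(domain_xml)
    flagged = []
    for disk in root.findall("./devices/disk"):
        # Only look at hard disks; skip cdrom and floppy devices.
        if disk.get("device", "disk") != "disk":
            continue
        driver = disk.find("driver")
        cache = driver.get("cache") if driver is not None else None
        target = disk.find("target")
        name = target.get("dev") if target is not None else "?"
        # A missing attribute (None) means the hypervisor default is in effect.
        if cache != "none":
            flagged.append((name, cache))
    return flagged


# Hypothetical domain definition used only to exercise the function.
EXAMPLE_XML = """
<domain type='kvm'>
  <name>testVM</name>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/lib/libvirt/images/a.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/b.img'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdc' bus='ide'/>
    </disk>
  </devices>
</domain>
"""

if __name__ == "__main__":
    for dev, cache in disks_without_cache_none(EXAMPLE_XML):
        print(dev, "cache =", cache)
```

Any disk this reports would be a candidate for adding cache='none' to its <driver> element before retrying the migration.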