Xinglong Wu
2012-Nov-25 10:57 UTC
[libvirt-users] Live migration with non-shared storage leads to corrupted file system
Hi,
We have the following environment for live migration with
non-shared storage between two nodes:
Host OS: RHEL 6.3
Kernel: 2.6.32-279.el6.x86_64
Qemu-kvm: 1.2.0
libvirt: 0.10.1
and we run the migration with "virsh" as follows:
virsh -c 'qemu:///system' migrate --live --persistent \
    --copy-storage-all <guest-name> qemu+ssh://<target-node>/system
The above command itself returns no error, and the migrated domain
starts fine on the destination node. But when I log into the migrated
domain, some commands fail immediately. And if I shut down the
domain, it won't boot any more, complaining about a corrupted
file system. I can also confirm that, after thorough testing, the
domain works flawlessly before migration.
The log file in /var/log/libvirt/qemu looks fine, without any warnings
or errors. The only error messages I can find are in
/var/log/libvirt/libvirtd.log:
2012-11-25 10:00:55.001+0000: 15398: warning : qemuDomainObjBeginJobInternal:838 : Cannot start job (query, none) for domain testVM; current job is (async nested, migration out) owned by (15397, 15397)
2012-11-25 10:00:55.001+0000: 15398: error : qemuDomainObjBeginJobInternal:842 : Timed out during operation: cannot acquire state change lock
2012-11-25 10:00:57.009+0000: 15393: error : virNetSocketReadWire:1184 : End of file while reading data: Input/output error
I also noticed that the raw image file used by the migrated domain has
different sizes (as reported by "du") before and after the migration.
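A side note on the size difference: "du" reports allocated blocks rather than the file's length, so a sparse raw image can show a different "du" figure after being copied even when the contents are identical. Below is a minimal sketch of a more telling comparison. It uses small scratch files in place of the real images, and the file names are made up for illustration. It assumes GNU coreutils.

```shell
#!/bin/sh
# Illustration only: scratch files stand in for the source and destination images.
workdir=$(mktemp -d)
src="$workdir/src.img"
dst="$workdir/dst.img"

# Create a 4 MiB sparse file, then make a copy with the holes filled in.
truncate -s 4M "$src"
cp --sparse=never "$src" "$dst"

# Allocated size (what plain "du" shows) may differ between the two files...
du -k "$src" "$dst"
# ...while the apparent size (the file length) is the same.
du -k --apparent-size "$src" "$dst"

# Comparing the contents is what actually tells whether the copy is intact.
if cmp -s "$src" "$dst"; then
    result="identical"
else
    result="different"
fi
echo "contents: $result"
```

On the real hosts the images sit on different nodes, so one option would be to run a checksum such as "sha256sum" on each side while the guest is shut down and compare the two values.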
Has anybody had a similar experience with live migration on
non-shared storage? It apparently leads to failed migrations in
libvirt, but no critical errors are ever reported.
Brett
Henrik Ahlgren
2012-Nov-25 11:12 UTC
[libvirt-users] Live migration with non-shared storage leads to corrupted file system
On Sun, Nov 25, 2012 at 06:57:19PM +0800, Xinglong Wu wrote:
> Has anybody had a similar experience with live migration on
> non-shared storage? It apparently leads to failed migrations in
> libvirt, but no critical errors are ever reported.
Make sure you have your driver cache set to "none". I've seen similar things happen with other cache settings, but with "none" it works just fine (though of course that might impose an I/O performance penalty in some cases?). I think the default is "unsafe" in 0.9.7 and later? It would be nice if the cache settings were documented more clearly.
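For reference, the cache mode is the "cache" attribute on the disk's <driver> element in the domain XML. Below is a rough sketch of how one might list it for each disk. The helper name list_disk_cache and the sample XML (including the image path) are made up for illustration. The grep/sed approach assumes each <driver> element sits on a single line, as it does in typical "virsh dumpxml" output.

```shell
#!/bin/sh
# Hypothetical helper: print the cache= attribute of every <driver> element
# in domain XML read from stdin. Prints "cache=(not set)" when a <driver>
# line carries no cache attribute.
list_disk_cache() {
    grep '<driver ' | while read -r line; do
        case "$line" in
            *cache=*) echo "$line" | sed -e "s/.*cache=[\"']\([a-z]*\)[\"'].*/cache=\1/" ;;
            *) echo "cache=(not set)" ;;
        esac
    done
}

# Sample input standing in for "virsh dumpxml <guest-name>" output;
# the image path is invented for this example.
sample_xml="<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source file='/var/lib/libvirt/images/testVM.img'/>
  <target dev='vda' bus='virtio'/>
</disk>"

cache_modes=$(printf '%s\n' "$sample_xml" | list_disk_cache)
echo "$cache_modes"
```

Against a real guest this would be run as "virsh dumpxml <guest-name> | list_disk_cache". If the attribute is missing or set to something else, it can be changed with "virsh edit <guest-name>" and takes effect the next time the guest is started.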