vincent@cojot.name
2019-Jul-30 20:40 UTC
[libvirt-users] Researching why different cache modes result in 'some' guest filesystem corruption..
Hi All,

I've been chasing down an issue in recent weeks (my own lab, so no prod here) and I'm reaching out in case someone might have some guidance to share.

I'm running fairly large VMs (RHOSP underclouds - 8 vCPU, 32 GB RAM, about 200 GB single disk as a growable qcow2) on some RHEL 7.6 hypervisors (kernel 3.10.0-927.2x.y, libvirt 4.5.0, qemu-kvm-1.5.3) on top of SSD/NVMe drives with various filesystems (vxfs, zfs, etc.) and using ECC RAM.

The issue can be described as follows:
- The guest VMs work fine for a while (days, weeks), but after a kernel update (z-stream) comes in, I am often greeted by the following message immediately after rebooting (or attempting to reboot into the new kernel):
  "error: not a correct xfs inode"
- Booting the previous kernel works fine, and re-generating the initramfs for the new kernel (from the n-1 kernel) does not solve anything.
- If booted from an ISO, xfs_repair does not find errors.
- On ext4, there seems to be some kind of corruption there too.

I'm building the initial guest qcow2 image for those guest VMs this way:

1) Start with a RHEL guest image (currently rhel-server-7.6-update-5-x86_64-kvm.qcow2).

2) Convert it to LVM by doing this:

   qemu-img create -f qcow2 \
     -o preallocation=metadata,cluster_size=1048576,lazy_refcounts=off \
     final_guest.qcow2 512G
   virt-format -a final_guest.qcow2 --partition=mbr \
     --lvm=/dev/rootdg/lv_root --filesystem=xfs
   guestfish --ro -a rhel_guest.qcow2 -m /dev/sda1 -- tar-out / - | \
     guestfish --rw -a final_guest.qcow2 -m /dev/rootdg/lv_root -- tar-in - /

3) Use "final_guest.qcow2" as the basis for my guests with LVM.

After chasing this issue down some more and attempting various things (building the image on Fedora 29, building a real XFS filesystem inside a VM and using the generated qcow2 as a basis instead of virt-format), I noticed that the SATA disk of each of those guests was using 'directsync' (instead of 'Hypervisor Default'). As soon as I switched to 'none', the XFS issues disappeared, and I've now applied several consecutive kernel updates without issues (see the P.S. below for what the disk definition looks like now).

Also, 'directsync' and 'writethrough', while providing decent performance, both exhibited the XFS 'corruption' behaviour; only 'none' seems to have solved it. I've read the docs, but I thought it was OK to use those modes (UPS, battery-backed RAID, etc.).

Does anyone have any idea what's going on or what I may be doing wrong?

Thanks for reading,

Vincent
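P.S. For reference, here is roughly what the disk stanza in the domain XML looks like after the change. This is only an illustration of where the cache mode lives - the image path and target device below are placeholders, not my exact config:

   <disk type='file' device='disk'>
     <!-- the cache mode in question: 'directsync' and 'writethrough'
          exhibited the corruption here, 'none' did not -->
     <driver name='qemu' type='qcow2' cache='none'/>
     <source file='/var/lib/libvirt/images/final_guest.qcow2'/>
     <target dev='sda' bus='sata'/>
   </disk>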
Michal Privoznik
2019-Aug-02 07:20 UTC
Re: [libvirt-users] Researching why different cache modes result in 'some' guest filesystem corruption..
On 7/30/19 10:40 PM, vincent@cojot.name wrote:
>
> Does anyone have any idea what's going on or what I may be doing wrong?

I don't think it is you doing something wrong.

The way shutdown works is that libvirt tells qemu to shut down (or, in the case of a guest-initiated shutdown, it is the guest OS that says so), and then qemu starts flushing its own internal caches (note that these sit on top of the host kernel FS caches). Once that is done, qemu sends libvirt an event, to which libvirt reacts by killing the process. However, if there is a bug in qemu such that the event is sent before all caches were flushed, it may lead to disk corruption.

This also corresponds to your experience, where cache='none' makes the bug go away: in that case qemu doesn't add any of its caches into the picture, and thus no disk corruption is possible.

Unfortunately, I don't know enough about qemu to suggest where the bug might be, sorry.

Michal
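P.S. If you want to double-check which cache mode a given guest actually ended up with, something like the following should do (the domain name is just an example - substitute your own):

   # show the <driver .../> line for each disk of the guest; if no
   # cache= attribute is present, the hypervisor default is in use
   virsh dumpxml undercloud | grep '<driver'

   # change the mode by editing the cache= attribute on the disk's driver element
   virsh edit undercloud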