Andrew Martin
2014-Aug-11 19:06 UTC
[libvirt-users] Behavior of disk caching with qcow2 disks
Hello,

I am running several virtualization servers with QEMU 1.4.x and libvirt 1.0.2 on Ubuntu 12.04 and am working on optimizing the cache= and aio= options for the virtual machines. These VM images are mostly qcow2, and are served both from a local ext4 filesystem (with data=ordered,barrier) and from an NFS mountpoint (with sync). The local filesystem sits on top of an md software RAID of SATA HDDs.

I have read some conflicting information about which cache option is used by default. This documentation states that cache=writethrough is the default:
http://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaat/liaatbpkvmguestcache.htm?lang=en

However, this SuSE documentation claims that QEMU 1.2.x and newer allows the driver to select the cache mode, and that it often defaults to cache=writeback:
https://www.suse.com/documentation/sles11/book_kvm/data/sect1_1_chapter_book_kvm.html

Which is correct? How is the cache mode set by default (if cache= is not specified)?

My second question: can cache=none be used safely on a local ext4 filesystem with no BBU? Since ext4 uses barriers, would writing to these qcow2 image files be safe? The kernel documentation about barriers states that "Write barriers enforce proper on-disk ordering of journal commits, making volatile disk write caches safe to use, at some performance penalty". Does this apply to qcow2 VM images?

Thanks,

Andrew Martin
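P.S. For anyone else tuning the same knobs: both options live on each disk's <driver> element in the guest XML. A minimal sketch (the image path and the modes shown are placeholders for illustration, not my actual configuration):

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='native'/>
      <source file='/var/lib/libvirt/images/vm1.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>

libvirt maps the io='native'/io='threads' attribute to QEMU's aio= option.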
Kashyap Chamarthy
2014-Aug-12 18:15 UTC
Re: [libvirt-users] Behavior of disk caching with qcow2 disks
On Mon, Aug 11, 2014 at 02:06:54PM -0500, Andrew Martin wrote:

> Hello,
>
> I am running several virtualization servers with QEMU 1.4.x and
> libvirt 1.0.2 on Ubuntu 12.04 and am working on optimizing the cache=
> and aio= options for the virtual machines. These VM images are mostly
> qcow2, and are served both from a local ext4 filesystem (with
> data=ordered,barrier) and from an NFS mountpoint (with sync). The
> local filesystem sits on top of an md software RAID of SATA HDDs.
>
> I have read some conflicting information about which cache option is
> used by default. This documentation states that cache=writethrough is
> the default:
> http://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaat/liaatbpkvmguestcache.htm?lang=en

The above sounds incorrect (refer below).

> However, this SuSE documentation claims that QEMU 1.2.x and newer
> allows the driver to select the cache mode, and that it often defaults
> to cache=writeback:
> https://www.suse.com/documentation/sles11/book_kvm/data/sect1_1_chapter_book_kvm.html

The above seems correct. Looking at the `qemu-img` source[1], 'cache=writeback' seems to be the default. That's also corroborated by Rich's blog[2] (Rich is the libguestfs/virt-tools lead developer).

[1] http://git.qemu.org/?p=qemu.git;a=blob;f=qemu-img.c;h=d4518e724f848a6ff8ffaf61656d080de5a08f03;hb=HEAD#l55
[2] http://rwmj.wordpress.com/2013/09/02/new-in-libguestfs-allow-cache-mode-to-be-selected/

> Which is correct?

"cache=writeback"

> How is the cache mode set by default (if cache= is not specified)?

It's compiled into the binary.

> My second question: can cache=none be used safely on a local ext4
> filesystem with no BBU? Since ext4 uses barriers, would writing to
> these qcow2 image files be safe? The kernel documentation about
> barriers states that "Write barriers enforce proper on-disk ordering
> of journal commits, making volatile disk write caches safe to use, at
> some performance penalty". Does this apply to qcow2 VM images?

FWIW, in my test environments (where, I should admit, there's not a whole lot of I/O activity), I use:

    $ qemu-img create -f qcow2 -o preallocation=metadata test1.qcow2 8G

followed by an `fallocate`:

    $ fallocate -l 8589934592 test1.qcow2

Then, I used to invoke QEMU with "cache=none" (setting it in libvirt's guest XML), but lately I have started using the default, "cache=writeback", after I learnt about the bug from Rich's blog above.

-- 
/kashyap
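P.S. One way to see whether a cache mode was explicitly passed to a running guest is to look at the -drive arguments of its QEMU process. A rough sketch (the grep pattern is just an illustration):

    $ ps -ef | grep qemu | tr ',' '\n' | grep 'cache='

If nothing is printed, no cache= was specified on the command line, and the compiled-in default (writeback, per [1]) applies.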
Andrew Martin
2014-Aug-14 14:45 UTC
Re: [libvirt-users] Behavior of disk caching with qcow2 disks
----- Original Message -----
> From: "Kashyap Chamarthy" <kchamart@redhat.com>
>
> Looking at the `qemu-img` source[1], 'cache=writeback' seems to be
> the default. That's also corroborated by Rich's blog[2] (Rich is the
> libguestfs/virt-tools lead developer).
>
> [1] http://git.qemu.org/?p=qemu.git;a=blob;f=qemu-img.c;h=d4518e724f848a6ff8ffaf61656d080de5a08f03;hb=HEAD#l55
> [2] http://rwmj.wordpress.com/2013/09/02/new-in-libguestfs-allow-cache-mode-to-be-selected/
>
> > Which is correct?
>
> "cache=writeback"
>
> > How is the cache mode set by default (if cache= is not specified)?
>
> It's compiled into the binary.
>
> > My second question: can cache=none be used safely on a local ext4
> > filesystem with no BBU? Since ext4 uses barriers, would writing to
> > these qcow2 image files be safe? The kernel documentation about
> > barriers states that "Write barriers enforce proper on-disk ordering
> > of journal commits, making volatile disk write caches safe to use, at
> > some performance penalty". Does this apply to qcow2 VM images?
>
> FWIW, in my test environments (where, I should admit, there's not a
> whole lot of I/O activity), I use:
>
>     $ qemu-img create -f qcow2 -o preallocation=metadata test1.qcow2 8G
>
> followed by an `fallocate`:
>
>     $ fallocate -l 8589934592 test1.qcow2
>
> Then, I used to invoke QEMU with "cache=none" (setting it in libvirt's
> guest XML), but lately I have started using the default,
> "cache=writeback", after I learnt about the bug from Rich's blog above.
>
> --
> /kashyap

Kashyap,

Thanks for the clarification. Rich's article seems to indicate that cache=writeback is safe:

> writeback is the new, safe default. Flush commands are obeyed so as long as you’re
> using a journalled filesystem or issue guestfs_sync calls your data will be safe.

However, I have several VMs running on a server with qemu-kvm 1.4.0 and libguestfs 1.14.8 (older because this is Ubuntu 12.04) using the default cache mode, cache=writeback, and recently this server's UPS experienced a fault, so all of the VMs and the host lost power. After booting back up, I discovered that the filesystems on 3 of the guests were corrupted, requiring an fsck with a lot of fixes. After fsck finished, data that had been written to the disks within the last 24-48 hours appeared to be corrupted. This makes me think that the data was never synced back to disk, which would indicate that I can't trust the guest's journalled filesystem. This data was written several hours before the crash, so I would think that should have been enough time for an fsync to be called.

How can I guarantee the safety of written data on guests whose images are stored on the following types of filesystems?

* local ext4 filesystem on an md RAID (no BBU)
* NFS mountpoint with the "sync" option

Thanks,

Andrew
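P.S. The change I am considering in the meantime is forcing an explicit, stricter mode on each disk's <driver> element, along these lines (a sketch; whether writethrough or directsync is actually sufficient on these storage setups is exactly what I am asking):

    <driver name='qemu' type='qcow2' cache='writethrough'/>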