Dion Kant
2012-Sep-20 08:57 UTC
GPLPV Disk performance block device vs. file based backend
Dear all,

I have two Windows 2008 VMs with the latest GPLPV drivers installed. One is
backed by a file container:

  name="wsrv-file"
  disk=[ 'file:/var/lib/xen/images/wsrv-file/disk0.raw,xvda,w', ]

and the other one is backed by a block device:

  name="wsrv-bd"
  disk=[ 'phy:/dev/vg0/wsrv-bd,xvda,w', ]

The block device is a logical volume spanning 4 disks, configured with 4
stripes. The file lives on another logical volume in the same volume group,
formatted with XFS.

Now I measure more than a factor of 3 better I/O performance on the
file-based VM compared to the block-device-based VM. I don't think it is a
cache issue which is tricking me. Details of my test and setup are given
below.

Hardware:
  Supermicro X8DTU
  2x CPU E5620
  RAM: 12 GB (6x 2GB DIMM)
  Disks: 4x 2TB
  Controller (Adaptec): SMC AOC-USAS-S4i (driver: aacraid)
  Disk setup: JBOD

Software:
  openSUSE 12.2 x86_64 distribution
  kernel: 3.4.6-2.10-xen
  Xen: 4.1.3_01-5.6.2

dom0 lives on an 80 GB RAID10 on the outer part of the disks; the remaining
partitions are added directly (no RAID) to a volume group. The VM disks live
on logical volumes in that volume group.

Both Windows VMs are similar: Windows Server 2008 R2 Enterprise with all
updates installed. They are assigned 4 CPUs (no pinning) and 4 GB RAM.

Test setup:

When running a test, only one VM is running and nothing else is happening
(just the test). I installed Cygwin and use dd for writing to disk:

  sync; time (dd if=/dev/zero of=test.bin bs=4096 count=5000000; sync)

The measured time is used to calculate the streaming rate, but this could
hide caching in dom0. However, while the test runs, vmstat -n 1 is running
in dom0, so the actual write rate to the block devices can be observed
there. Note we are not looking for some marginal effect!

I noticed that for the file-based VM, the "bo" numbers from vmstat are
doubled, i.e. the bytes written to the disk file living on the LV are
counted twice. This does not happen for a block-device-based disk.

Results:
  block device based: 150 MB/s
  file based: >450 MB/s

I have reproduced this behaviour with previous versions of openSUSE (older
Xen) and with other Windows versions (XP, 2003). I have also observed it on
other hardware, mostly Supermicro motherboards.

I can provide more details if required and I can do more testing as well.

Regards,

Dion
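For reference, a minimal sketch of how a 4-stripe LV and a raw file
container of this kind are typically set up; the partition names, sizes and
stripe size below are assumptions for illustration, not taken from the post
above:

  # hypothetical data partitions, one per disk
  pvcreate /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
  vgcreate vg0 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3

  # striped LV for the block-device-backed VM
  lvcreate -i 4 -I 64 -L 40G -n wsrv-bd vg0    # -i: number of stripes, -I: stripe size (KiB)

  # XFS-formatted LV in the same VG, holding the raw file container
  lvcreate -L 60G -n images vg0
  mkfs.xfs /dev/vg0/images
  mount /dev/vg0/images /var/lib/xen/images
  mkdir -p /var/lib/xen/images/wsrv-file
  dd if=/dev/zero of=/var/lib/xen/images/wsrv-file/disk0.raw bs=1M count=40960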
Fajar A. Nugraha
2012-Sep-20 10:16 UTC
Re: GPLPV Disk performance block device vs. file based backend
On Thu, Sep 20, 2012 at 3:57 PM, Dion Kant <dion@concero.nl> wrote:

> name="wsrv-file"
> disk=[ 'file:/var/lib/xen/images/wsrv-file/disk0.raw,xvda,w', ]
>
> name="wsrv-bd"
> disk=[ 'phy:/dev/vg0/wsrv-bd,xvda,w', ]
>
> Now I measure more than a factor 3 better I/O performance on the file
> based VM as compared to the block device based VM. I don't think it is a
> cache issue which is tricking me.

I'm 99.9% sure it tricks you :)

> sync; time (dd if=/dev/zero of=test.bin bs=4096 count=5000000; sync)

dd is terrible for benchmark purposes. I'd suggest fio, random rw, data
size at least twice RAM.

> I noticed that for the file based VM, the "bo" results from vmstat are
> doubled, i.e. the bytes written to the disk file living on the LV are
> counted twice.

Probably because file:/ uses a loopback device, which is counted as another
block device.

> I can provide more details if required and I can do more testing as well.

There are many factors involved: file vs phy, file-backed vs LV-backed,
Windows, GPLPV, etc. What I suggest is:

- use a Linux PV domU, one backed with an LV, the other with a file
- use tap:aio for the file-backed one (NOT file:/)
- use fio for testing

That SHOULD eliminate most other factors, and allow you to focus on
file-tap vs LV-phy.

--
Fajar
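A sketch of what that suggested test could look like; the fio job
parameters and the file paths are illustrative assumptions, not something
specified in the thread:

  # random read/write, direct I/O, data set larger than the 4 GB domU RAM
  fio --name=randrw --filename=/root/fio.test --rw=randrw --bs=4k \
      --size=8g --direct=1 --ioengine=libaio --iodepth=16 \
      --runtime=300 --time_based --group_reporting

  # file-backed disk via blktap instead of the loopback-based file:/ backend
  disk=[ 'tap:aio:/var/lib/xen/images/wsrv-file/disk0.raw,xvda,w', ]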
Dion Kant
2012-Sep-20 14:16 UTC
Re: GPLPV Disk performance block device vs. file based backend
On 09/20/2012 12:16 PM, Fajar A. Nugraha wrote:

> On Thu, Sep 20, 2012 at 3:57 PM, Dion Kant <dion@concero.nl> wrote:
>
>> Now I measure more than a factor 3 better I/O performance on the file
>> based VM as compared to the block device based VM. I don't think it is a
>> cache issue which is tricking me.
>
> I'm 99.9% sure it tricks you :)

Hi Fajar,

Thank you for leaving me 0.1% of uncertainty ;)

>> sync; time (dd if=/dev/zero of=test.bin bs=4096 count=5000000; sync)
>
> dd is terrible for benchmark purposes. I'd suggest fio, random rw,
> data size at least twice RAM.

I don't care about random rw; I am looking at the speed at which nicely
ordered data is streamed to a set of disks, observed with vmstat in dom0.
Note that I write 20 GB, so there is plenty of time for all caches in dom0
to fill up and for writing to the disks to start. There is 8 GB left for
dom0, and the sync in Cygwin really does its job. I'll have a look at fio
anyway...

>> I noticed that for the file based VM, the "bo" results from vmstat are
>> doubled, i.e. the bytes written to the disk file living on the LV are
>> counted twice.
>
> probably because file:/ uses loopback, which is counted as another block device.

OK, that will be the reason.

>> I can provide more details if required and I can do more testing as well.
>
> There are many factors involved: file vs phy, file-backed vs
> LV-backed, windows, gplpv, etc. What I suggest is:
>
> - use linux pv domU, one backed with LV, the other with file
> - use tap:aio for the file-backed one (NOT file:/)
> - use fio for testing
>
> That SHOULD eliminate most other factors, and allow you to focus on
> file-tap vs LV-phy.

I don't have this issue with Linux PV domUs; I think it is something
related to GPLPV or HVM. I'll do this test again anyway and report the
results. If I recall correctly from my past tests with PV Linux, using
phy:, tap:aio or file: only differs a little (<10%). Here we are talking
about a factor >3.

Thanks,

Dion
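One way to back up the "not a cache issue" claim is to watch dom0's own
write-back while the test runs and to reset the page cache between runs; a
generic sketch for the dom0 side, not commands taken from the thread:

  # in dom0, before each run: flush dirty pages and drop the page cache
  sync
  echo 3 > /proc/sys/vm/drop_caches

  # in dom0, during the run: watch the actual block-device traffic
  vmstat -n 1       # "bo" column = blocks written out per second
  iostat -xm 1      # per-device throughput; look at the sd*/dm-* lines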
Pasi Kärkkäinen
2012-Nov-30 11:16 UTC
Re: GPLPV Disk performance block device vs. file based backend
On Thu, Sep 20, 2012 at 05:16:53PM +0700, Fajar A. Nugraha wrote:

> On Thu, Sep 20, 2012 at 3:57 PM, Dion Kant <dion@concero.nl> wrote:
>
> > name="wsrv-file"
> > disk=[ 'file:/var/lib/xen/images/wsrv-file/disk0.raw,xvda,w', ]
> >
> > name="wsrv-bd"
> > disk=[ 'phy:/dev/vg0/wsrv-bd,xvda,w', ]
>
> > Now I measure more than a factor 3 better I/O performance on the file
> > based VM as compared to the block device based VM. I don't think it is a
> > cache issue which is tricking me.
>
> I'm 99.9% sure it tricks you :)
>
> > sync; time (dd if=/dev/zero of=test.bin bs=4096 count=5000000; sync)
>
> dd is terrible for benchmark purposes. I'd suggest fio, random rw,
> data size at least twice RAM.

You can tell dd to do direct I/O, bypassing the caches, so you get proper
results: iflag=direct or oflag=direct.

The biggest limitation is that dd is always single-threaded (unless you
launch multiple copies).

-- Pasi
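A sketch of both points for a Linux guest or dom0 shell (file names and
counts are arbitrary examples; Cygwin's dd may not honour oflag=direct the
same way):

  # single writer, bypassing the page cache
  dd if=/dev/zero of=test.bin bs=1M count=20480 oflag=direct

  # crude multi-threaded variant: several dd writers in parallel
  for i in 1 2 3 4; do
      dd if=/dev/zero of=test$i.bin bs=1M count=5120 oflag=direct &
  done
  wait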