[originally reported for ZFS on FreeBSD but Pawel Jakub Dawidek
says this problem also exists on Solaris hence this email.]

Summary: on ZFS, the overhead for reading a hole seems far worse
than actually reading from a disk.  Small buffers are used to
make this overhead more visible.

I ran the following script on both ZFS and UFS2 filesystems.

[Note that on FreeBSD cat uses a 4k buffer and md5 uses a 1k
buffer.  On Solaris you can replace them with dd with the
respective buffer sizes for this test and you should see
similar results.]

$ dd </dev/zero bs=1m count=10240 >SPACY  # 10G zero bytes allocated
$ truncate -s 10G HOLEY                   # no space allocated

$ time dd <SPACY >/dev/null bs=1m         # A1
$ time dd <HOLEY >/dev/null bs=1m         # A2
$ time cat SPACY >/dev/null               # B1
$ time cat HOLEY >/dev/null               # B2
$ time md5 SPACY                          # C1
$ time md5 HOLEY                          # C2

I have summarized the results below.

                      ZFS                  UFS2
                Elapsed  System      Elapsed  System    Test
dd SPACY bs=1m   110.26   22.52       340.38   19.11     A1
dd HOLEY bs=1m    22.44   22.41        24.24   24.13     A2

cat SPACY        119.64   33.04       342.77   17.30     B1
cat HOLEY        222.85  222.08        22.91   22.41     B2

md5 SPACY        210.01   77.46       337.51   25.54     C1
md5 HOLEY        856.39  801.21        82.11   28.31     C2

A1, A2: Numbers are more or less as expected.  When doing large
reads, reading from "holes" takes far less time than reading from
a real disk.  We also see that the UFS2 disk is about 3 times
slower for sequential reads.

B1, B2: UFS2 numbers are as expected but the ZFS numbers for the
HOLEY file are much too high.  Why should *not* going to a real
disk cost more?  We also see that UFS2 handles holey files 10
times more efficiently than ZFS!

C1, C2: Again the UFS2 numbers and the C1 numbers for ZFS are as
expected, but the C2 numbers for ZFS are very high.  md5 uses
BLKSIZ (=1k) size reads and does hardly any other system calls.
For ZFS each syscall takes 76.4 microseconds while UFS2 syscalls
are 2.7 us each!  zpool iostat shows there is no IO to the real
disk, so this implies that for the HOLEY case ZFS read calls have
a significantly higher overhead or there is a bug.

Basically the C tests just confirm what we found in the B tests.
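For reference, the per-syscall figures quoted above follow from the file
size and md5's 1k reads: 10G read 1 KiB at a time is roughly 10.5 million
read() calls.  A quick back-of-the-envelope check (a sketch; it assumes
essentially all of the reported system time is spent in those read() calls):

$ echo '10 * 1024 * 1024' | bc                # 10G in 1k reads = 10485760 read() calls
$ echo 'scale=2; 801210000 / 10485760' | bc   # ZFS HOLEY system time in us  -> ~76.4 us per read()
$ echo 'scale=2; 28310000 / 10485760' | bc    # UFS2 HOLEY system time in us -> ~2.7 us per read()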
Pawel Jakub Dawidek
2007-May-03 22:21 UTC
[zfs-discuss] ZFS vs UFS2 overhead and may be a bug?
On Thu, May 03, 2007 at 02:15:45PM -0700, Bakul Shah wrote:
> [originally reported for ZFS on FreeBSD but Pawel Jakub Dawidek
> says this problem also exists on Solaris hence this email.]

Thanks!

> Summary: on ZFS, the overhead for reading a hole seems far worse
> than actually reading from a disk.  Small buffers are used to
> make this overhead more visible.
> 
> I ran the following script on both ZFS and UFS2 filesystems.
> 
> [Note that on FreeBSD cat uses a 4k buffer and md5 uses a 1k
> buffer.  On Solaris you can replace them with dd with the
> respective buffer sizes for this test and you should see
> similar results.]
> 
> $ dd </dev/zero bs=1m count=10240 >SPACY  # 10G zero bytes allocated
> $ truncate -s 10G HOLEY                   # no space allocated
> 
> $ time dd <SPACY >/dev/null bs=1m         # A1
> $ time dd <HOLEY >/dev/null bs=1m         # A2
> $ time cat SPACY >/dev/null               # B1
> $ time cat HOLEY >/dev/null               # B2
> $ time md5 SPACY                          # C1
> $ time md5 HOLEY                          # C2
> 
> I have summarized the results below.
> 
>                       ZFS                  UFS2
>                 Elapsed  System      Elapsed  System    Test
> dd SPACY bs=1m   110.26   22.52       340.38   19.11     A1
> dd HOLEY bs=1m    22.44   22.41        24.24   24.13     A2
> 
> cat SPACY        119.64   33.04       342.77   17.30     B1
> cat HOLEY        222.85  222.08        22.91   22.41     B2
> 
> md5 SPACY        210.01   77.46       337.51   25.54     C1
> md5 HOLEY        856.39  801.21        82.11   28.31     C2

This is what I see on Solaris (hole is 4GB):

# /usr/bin/time dd if=/ufs/hole of=/dev/null bs=128k
real       23.7
# /usr/bin/time dd if=/zfs/hole of=/dev/null bs=128k
real       21.2

# /usr/bin/time dd if=/ufs/hole of=/dev/null bs=4k
real       31.4
# /usr/bin/time dd if=/zfs/hole of=/dev/null bs=4k
real     7:32.2

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                         http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
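One way to break these runs down further (a sketch; truss -c summarizes
system-call counts and the time spent in each, so the per-read() cost on
the two filesystems can be compared directly, although truss itself adds
some overhead):

# truss -c dd if=/ufs/hole of=/dev/null bs=4k
# truss -c dd if=/zfs/hole of=/dev/null bs=4k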
Pawel Jakub Dawidek wrote:
> This is what I see on Solaris (hole is 4GB):
> 
> # /usr/bin/time dd if=/ufs/hole of=/dev/null bs=128k
> real       23.7
> # /usr/bin/time dd if=/zfs/hole of=/dev/null bs=128k
> real       21.2
> 
> # /usr/bin/time dd if=/ufs/hole of=/dev/null bs=4k
> real       31.4
> # /usr/bin/time dd if=/zfs/hole of=/dev/null bs=4k
> real     7:32.2

This is probably because the time to execute this on ZFS is dominated by
per-systemcall costs, rather than per-byte costs.  You are doing 32x more
system calls with the 4k blocksize, and it is taking 20x longer.

That said, I could be wrong, and yowtch, that's much slower than I'd like!

--matt
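A simple way to test the per-syscall hypothesis (a sketch, reusing the
/zfs/hole file from the quoted message): if per-call cost dominates, the
elapsed time should scale roughly with the number of read() calls, i.e.
inversely with the block size.

# for bs in 128k 32k 8k 4k; do echo bs=$bs; /usr/bin/time dd if=/zfs/hole of=/dev/null bs=$bs; done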
> Pawel Jakub Dawidek wrote:
> > This is what I see on Solaris (hole is 4GB):
> > 
> > # /usr/bin/time dd if=/ufs/hole of=/dev/null bs=128k
> > real       23.7
> > # /usr/bin/time dd if=/zfs/hole of=/dev/null bs=128k
> > real       21.2
> > 
> > # /usr/bin/time dd if=/ufs/hole of=/dev/null bs=4k
> > real       31.4
> > # /usr/bin/time dd if=/zfs/hole of=/dev/null bs=4k
> > real     7:32.2
> 
> This is probably because the time to execute this on ZFS is dominated by
> per-systemcall costs, rather than per-byte costs.  You are doing 32x more
> system calls with the 4k blocksize, and it is taking 20x longer.
> 
> That said, I could be wrong, and yowtch, that's much slower than I'd like!

You missed my earlier post where I showed that accessing a holey
file takes much longer than accessing a regular data file for
blocksizes of 4k and below.  I will repeat the most dramatic
difference:

                  ZFS                  UFS2
            Elapsed  System      Elapsed  System
md5 SPACY    210.01   77.46       337.51   25.54
md5 HOLEY    856.39  801.21        82.11   28.31

I used md5 because all but a couple of syscalls are for reading
the file (with a buffer of 1K).  dd would make an equal number of
calls for writing.  For both file systems and both cases the file
size is the same, but SPACY has 10GB allocated while HOLEY was
created with truncate -s 10G HOLEY.

Look at the system times.  On UFS2 the system time is a little
bit more for the HOLEY case because it has to clear a block.  On
ZFS it is over 10 times more!  Something is very wrong.
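If DTrace is available on the Solaris side, a kernel profile taken while
re-reading the HOLEY file would show where that system time goes (a
sketch; dd with bs=1k stands in for md5's 1k reads):

# dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); }' -c 'dd if=HOLEY of=/dev/null bs=1k'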
Robert Milkowski
2007-May-08 08:50 UTC
[zfs-discuss] ZFS vs UFS2 overhead and may be a bug?
Hello Matthew,

Tuesday, May 8, 2007, 1:04:56 AM, you wrote:

MA> Pawel Jakub Dawidek wrote:
>> This is what I see on Solaris (hole is 4GB):
>> 
>> # /usr/bin/time dd if=/ufs/hole of=/dev/null bs=128k
>> real       23.7
>> # /usr/bin/time dd if=/zfs/hole of=/dev/null bs=128k
>> real       21.2
>> 
>> # /usr/bin/time dd if=/ufs/hole of=/dev/null bs=4k
>> real       31.4
>> # /usr/bin/time dd if=/zfs/hole of=/dev/null bs=4k
>> real     7:32.2

MA> This is probably because the time to execute this on ZFS is dominated by
MA> per-systemcall costs, rather than per-byte costs.  You are doing 32x more
MA> system calls with the 4k blocksize, and it is taking 20x longer.

MA> That said, I could be wrong, and yowtch, that's much slower than I'd like!

But 4k for UFS took him 31s while 4k for ZFS took him 14x more time!
In both cases the same number of syscalls were executed.

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                     http://milek.blogspot.com
Robert Milkowski wrote:
> Hello Matthew,
> 
> Tuesday, May 8, 2007, 1:04:56 AM, you wrote:
> 
> MA> Pawel Jakub Dawidek wrote:
> >> This is what I see on Solaris (hole is 4GB):
> >> 
> >> # /usr/bin/time dd if=/ufs/hole of=/dev/null bs=128k
> >> real       23.7
> >> # /usr/bin/time dd if=/zfs/hole of=/dev/null bs=128k
> >> real       21.2
> >> 
> >> # /usr/bin/time dd if=/ufs/hole of=/dev/null bs=4k
> >> real       31.4
> >> # /usr/bin/time dd if=/zfs/hole of=/dev/null bs=4k
> >> real     7:32.2
> 
> MA> This is probably because the time to execute this on ZFS is dominated by
> MA> per-systemcall costs, rather than per-byte costs.  You are doing 32x more
> MA> system calls with the 4k blocksize, and it is taking 20x longer.
> 
> MA> That said, I could be wrong, and yowtch, that's much slower than I'd like!
> 
> But 4k for UFS took him 31s while 4k for ZFS took him 14x more time!
> In both cases the same number of syscalls were executed.

So, I'm hearing two claims here:  ZFS is much slower than UFS when reading
a sparse file, and ZFS is much slower at reading a sparse file than a
filled-in file.  However, I am not able to reproduce these results.  On a
2-CPU, 2.2GHz Opteron, with most recent Nevada bits, nondebug:

on ZFS, 4k recordsize, ptime dd if=file of=/dev/null bs=4k
    sparse:          1.3sec real
    filled, cached:  0.9sec real

on ZFS, 1k recordsize, ptime dd if=file of=/dev/null bs=1k
    sparse:          5.4sec real
    filled, cached:  3.5sec real

on UFS (4k blocksize), ptime dd if=file of=/dev/null bs=4k
    sparse, cached:  0.8sec real
    filled, cached:  0.8sec real

In summary:  ZFS is ~40% slower than UFS when reading sparse files (with 4k
block/recordsize).  ZFS is ~40% slower reading a sparse file than a cached
filled-in file (with 4k or 1k recordsize).  This is because we don't cache
the sparse buffers, so we spend more time instantiating and zeroing them.

So, I'm not sure how you are getting your 20x number.  Are you sure that
you aren't using debug bits or something?

--matt
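For anyone trying to repeat this comparison, a minimal sketch (tank/t is a
placeholder dataset name, and mkfile -n stands in for FreeBSD's truncate
to create the sparse file; the filled file is read once beforehand so the
timed read is cached):

# zfs create tank/t
# zfs set recordsize=4k tank/t
# mkfile -n 4g /tank/t/sparse                            # sparse: no blocks allocated
# dd if=/dev/zero of=/tank/t/filled bs=4k count=1048576  # filled: 4GB of real blocks
# dd if=/tank/t/filled of=/dev/null bs=4k                # prime the cache
# ptime dd if=/tank/t/sparse of=/dev/null bs=4k
# ptime dd if=/tank/t/filled of=/dev/null bs=4k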