Marko Milisavljevic
2007-May-14 07:53 UTC
[zfs-discuss] Lots of overhead with ZFS - what am I doing wrong?
I was trying to simply test the bandwidth that Solaris/ZFS (Nevada b63) can deliver from a drive, and doing this: dd if=(raw disk) of=/dev/null gives me around 80MB/s, while dd if=(file on ZFS) of=/dev/null gives me only 35MB/s!? I am getting basically the same result whether it is a single zfs drive, a mirror or a stripe (I am testing with two Seagate 7200.10 320G drives hanging off the same interface card).

On the test machine I also have an old disk with UFS on a PATA interface (Seagate 7200.7 120G). dd from the raw disk gives 58MB/s and dd from a file on UFS gives 45MB/s - far less relative slowdown compared to raw disk.

This is just an AthlonXP 2500+ with a 32-bit PCI SATA sil3114 card, but nonetheless the hardware has the bandwidth to fully saturate the hard drive, as seen by dd from the raw disk device. What is going on? Am I doing something wrong, or is ZFS just not designed to be used on humble hardware?

My goal is to have it go fast enough to saturate gigabit ethernet - around 75MB/s. I don't plan on replacing hardware - after all, Linux with RAID10 gives me this already. I was hoping to switch to Solaris/ZFS to get checksums (which wouldn't seem to account for the slowness, because CPU stays under 25% during all this).

I can temporarily scrape together an x64 machine with an ICH7 SATA interface - I'll try the same test with the same drives on that to eliminate 32-bitness and PCI slowness from the equation. And while someone will say dd has little to do with real-life file server performance - it actually has a lot to do with it, because most of the use of this server is to copy multi-gigabyte files to and fro a few times per day. Hardly any random access involved (fragmentation aside).
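For reference, a minimal sketch of the two commands being compared; the device and file names below are placeholders, not the actual paths on this box:

dd if=/dev/dsk/c0d1 of=/dev/null bs=128k count=10000        (raw disk, no filesystem involved)
dd if=/tank/test/bigfile of=/dev/null bs=128k count=10000   (the same amount of data read through a file on ZFS)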
Al Hopper
2007-May-14 13:11 UTC
[zfs-discuss] Lots of overhead with ZFS - what am I doing wrong?
On Mon, 14 May 2007, Marko Milisavljevic wrote:

[ ... reformatted .... ]

> I was trying to simply test bandwidth that Solaris/ZFS (Nevada b63) can
> deliver from a drive, and doing this: dd if=(raw disk) of=/dev/null
> gives me around 80MB/s, while dd if=(file on ZFS) of=/dev/null gives me
> only 35MB/s!?. I am getting basically the same result whether it is
> single zfs drive, mirror or a stripe (I am testing with two Seagate
> 7200.10 320G drives hanging off the same interface card).

Which interface card?

... snip ....

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Richard Elling
2007-May-14 15:57 UTC
[zfs-discuss] Lots of overhead with ZFS - what am I doing wrong?
Marko Milisavljevic wrote:
> I was trying to simply test bandwidth that Solaris/ZFS (Nevada b63) can deliver from a drive, and doing this:
> dd if=(raw disk) of=/dev/null gives me around 80MB/s, while dd if=(file on ZFS) of=/dev/null gives me only 35MB/s!? I am getting basically the same result whether it is single zfs drive, mirror or a stripe (I am testing with two Seagate 7200.10 320G drives hanging off the same interface card).

Checksum is a contributor. AthlonXPs are long in the tooth.
Disable checksum and experiment.
 -- richard

> On the test machine I also have an old disk with UFS on PATA interface (Seagate 7200.7 120G). dd from raw disk gives 58MB/s and dd from file on UFS gives 45MB/s - far less relative slowdown compared to raw disk.
>
> This is just an AthlonXP 2500+ with 32bit PCI SATA sil3114 card, but nonetheless, the hardware has the bandwidth to fully saturate the hard drive, as seen by dd from the raw disk device. What is going on? Am I doing something wrong or is ZFS just not designed to be used on humble hardware?
>
> My goal is to have it go fast enough to saturate gigabit ethernet - around 75MB/s. I don't plan on replacing hardware - after all, Linux with RAID10 gives me this already. I was hoping to switch to Solaris/ZFS to get checksums (which wouldn't seem to account for slowness, because CPU stays under 25% during all this).
>
> I can temporarily scrape together an x64 machine with ICH7 SATA interface - I'll try the same test with same drives on that to eliminate 32-bitness and PCI slowness from the equation. And while someone will say dd has little to do with real-life file server performance - it actually has a lot to do with it, because most of the use of this server is to copy multi-gigabyte files to and fro a few times per day. Hardly any random access involved (fragmentation aside).
Marko Milisavljevic
2007-May-14 20:02 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
To reply to my own message.... this article offers lots of insight into why dd access directly through the raw disk is fast, while accessing a file through the file system may be slow.

http://www.informit.com/articles/printerfriendly.asp?p=606585&rl=1

So, I guess what I'm wondering now is, does it happen to everyone that ZFS is under half the speed of raw disk access? What speeds are other people getting trying to dd a file through the zfs file system? Something like

dd if=/pool/mount/file of=/dev/null bs=128k (assuming you are using the default ZFS block size)

how does that compare to:

dd if=/dev/dsk/diskinzpool of=/dev/null bs=128k count=10000

If you could please post your MB/s and show the output of zpool status so we can see your disk configuration I would appreciate it. Please use a file that is 100MB or more - the result is too random with small files. Also make sure zfs is not caching the file already!

What I am seeing is that ZFS performance for sequential access is about 45% of raw disk access, while UFS (as well as ext3 on Linux) is around 70%. For a workload consisting mostly of reading large files sequentially, it would seem then that ZFS is the wrong tool performance-wise. But, it could be just my setup, so I would appreciate more data points.
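One way to make sure ZFS is not serving the file from cache (a sketch, assuming the pool can be briefly taken offline; pool and path names are placeholders):

# zpool export mypool        (exporting drops the pool's cached data)
# zpool import mypool
# dd if=/mypool/fs/bigfile of=/dev/null bs=128k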
johansen-osdev at sun.com
2007-May-14 20:43 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
This certainly isn't the case on my machine.

$ /usr/bin/time dd if=/test/filebench/largefile2 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real        1.3
user        0.0
sys         1.2

# /usr/bin/time dd if=/dev/dsk/c0t0d0 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real       22.3
user        0.0
sys         2.2

This looks like 56 MB/s on the /dev/dsk and 961 MB/s on the pool.

My pool is configured into a 46 disk RAID-0 stripe. I'm going to omit the zpool status output for the sake of brevity.

> What I am seeing is that ZFS performance for sequential access is
> about 45% of raw disk access, while UFS (as well as ext3 on Linux) is
> around 70%. For workload consisting mostly of reading large files
> sequentially, it would seem then that ZFS is the wrong tool
> performance-wise. But, it could be just my setup, so I would
> appreciate more data points.

This isn't what we've observed in much of our performance testing. It may be a problem with your config, although I'm not an expert on storage configurations. Would you mind providing more details about your controller, disks, and machine setup?

-j
Marko Milisavljevic
2007-May-14 21:41 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Thank you for those numbers. I should have mentioned that I was mostly interested in single disk or small array performance, as it is not possible for dd to meaningfully access multiple-disk configurations without going through the file system. I find it curious that there is such a large slowdown by going through the file system (with single drive configuration), especially compared to UFS or ext3.

I simply have a small SOHO server and I am trying to evaluate which OS to use to keep a redundant disk array. With unreliable consumer-level hardware, ZFS and the checksum feature are very interesting and the primary selling point compared to a Linux setup, for as long as ZFS can generate enough bandwidth from the drive array to saturate single gigabit ethernet.

My hardware at the moment is the "wrong" choice for Solaris/ZFS - PCI 3114 SATA controller on a 32-bit AthlonXP, according to many posts I found. However, since dd over raw disk is capable of extracting 75+MB/s from this setup, I keep feeling that surely I must be able to get at least that much from reading a pair of striped or mirrored ZFS drives. But I can't - single drive or 2-drive stripes or mirrors, I only get around 34MB/s going through ZFS. (I made sure mirror was rebuilt and I resilvered the stripes.)

Everything is stock Nevada b63 installation, so I haven't messed it up with misguided tuning attempts. Don't know if it matters, but the test file was created originally from /dev/random. Compression is off, and everything is default. CPU utilization remains low at all times (haven't seen it go over 25%).

On 5/14/07, johansen-osdev at sun.com <johansen-osdev at sun.com> wrote:
>
> This certainly isn't the case on my machine.
>
> $ /usr/bin/time dd if=/test/filebench/largefile2 of=/dev/null bs=128k count=10000
> 10000+0 records in
> 10000+0 records out
>
> real        1.3
> user        0.0
> sys         1.2
>
> # /usr/bin/time dd if=/dev/dsk/c0t0d0 of=/dev/null bs=128k count=10000
> 10000+0 records in
> 10000+0 records out
>
> real       22.3
> user        0.0
> sys         2.2
>
> This looks like 56 MB/s on the /dev/dsk and 961 MB/s on the pool.
>
> My pool is configured into a 46 disk RAID-0 stripe. I'm going to omit
> the zpool status output for the sake of brevity.
>
> > What I am seeing is that ZFS performance for sequential access is
> > about 45% of raw disk access, while UFS (as well as ext3 on Linux) is
> > around 70%. For workload consisting mostly of reading large files
> > sequentially, it would seem then that ZFS is the wrong tool
> > performance-wise. But, it could be just my setup, so I would
> > appreciate more data points.
>
> This isn't what we've observed in much of our performance testing.
> It may be a problem with your config, although I'm not an expert on
> storage configurations. Would you mind providing more details about
> your controller, disks, and machine setup?
>
> -j
Marko Milisavljevic
2007-May-14 22:16 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
I missed an important conclusion from j's data, and that is that single disk raw access gives him 56MB/s, and RAID 0 array gives him 961/46=21MB/s per disk, which comes in at 38% of potential performance. That is in the ballpark of getting 45% of potential performance, as I am seeing with my puny setup of single or dual drives. Of course, I don't expect a complex file system to match raw disk dd performance, but it doesn't compare favourably to common file systems like UFS or ext3, so the question remains, is ZFS overhead normally this big? That would mean that one needs to have at least a 4-5 way stripe to generate enough data to saturate gigabit ethernet, compared to a 2-3 way stripe on a "lesser" filesystem, a possibly important consideration in a SOHO situation.

On 5/14/07, johansen-osdev at sun.com <johansen-osdev at sun.com> wrote:
>
> This certainly isn't the case on my machine.
>
> $ /usr/bin/time dd if=/test/filebench/largefile2 of=/dev/null bs=128k count=10000
> 10000+0 records in
> 10000+0 records out
>
> real        1.3
> user        0.0
> sys         1.2
>
> # /usr/bin/time dd if=/dev/dsk/c0t0d0 of=/dev/null bs=128k count=10000
> 10000+0 records in
> 10000+0 records out
>
> real       22.3
> user        0.0
> sys         2.2
>
> This looks like 56 MB/s on the /dev/dsk and 961 MB/s on the pool.
>
> My pool is configured into a 46 disk RAID-0 stripe. I'm going to omit
> the zpool status output for the sake of brevity.
>
> > What I am seeing is that ZFS performance for sequential access is
> > about 45% of raw disk access, while UFS (as well as ext3 on Linux) is
> > around 70%. For workload consisting mostly of reading large files
> > sequentially, it would seem then that ZFS is the wrong tool
> > performance-wise. But, it could be just my setup, so I would
> > appreciate more data points.
>
> This isn't what we've observed in much of our performance testing.
> It may be a problem with your config, although I'm not an expert on
> storage configurations. Would you mind providing more details about
> your controller, disks, and machine setup?
>
> -j
Al Hopper
2007-May-14 22:44 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
On Mon, 14 May 2007, Marko Milisavljevic wrote:

> To reply to my own message.... this article offers lots of insight into why dd access directly through raw disk is fast, while accessing a file through the file system may be slow.
>
> http://www.informit.com/articles/printerfriendly.asp?p=606585&rl=1
>
> So, I guess what I'm wondering now is, does it happen to everyone that ZFS is under half the speed of raw disk access? What speeds are other people getting trying to dd a file through zfs file system? Something like
>
> dd if=/pool/mount/file of=/dev/null bs=128k (assuming you are using default ZFS block size)
>
> how does that compare to:
>
> dd if=/dev/dsk/diskinzpool of=/dev/null bs=128k count=10000
>
> If you could please post your MB/s and show output of zpool status so we
> can see your disk configuration I would appreciate it. Please use a file
> that is 100MB or more - the result is too random with small files. Also
> make sure zfs is not caching the file already!

# ptime dd if=./allhomeal20061209_01.tar of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real        6.407
user        0.008
sys         1.624

  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0

3-way mirror:

10000+0 records in
10000+0 records out

real       12.500
user        0.007
sys         1.216

2-way mirror:

10000+0 records in
10000+0 records out

real       18.356
user        0.006
sys         0.935

# psrinfo -v
Status of virtual processor 0 as of: 05/14/2007 17:31:18
  on-line since 05/03/2007 08:01:21.
  The i386 processor operates at 2009 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 05/14/2007 17:31:18
  on-line since 05/03/2007 08:01:24.
  The i386 processor operates at 2009 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 2 as of: 05/14/2007 17:31:18
  on-line since 05/03/2007 08:01:26.
  The i386 processor operates at 2009 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 3 as of: 05/14/2007 17:31:18
  on-line since 05/03/2007 08:01:28.
  The i386 processor operates at 2009 MHz,
        and has an i387 compatible floating point processor.

> What I am seeing is that ZFS performance for sequential access is about 45% of raw disk access, while UFS (as well as ext3 on Linux) is around 70%. For workload consisting mostly of reading large files sequentially, it would seem then that ZFS is the wrong tool performance-wise. But, it could be just my setup, so I would appreciate more data points.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
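For anyone converting the ptime output above into a throughput figure, a quick sketch (10000 records of 128k is 1,280,000 KB, i.e. 1250 MB):

$ echo "scale=1; 10000*128/1024/6.407" | bc       (about 195 MB/s for the 5-disk raidz1 run)
$ echo "scale=1; 10000*128/1024/18.356" | bc      (about 68 MB/s for the 2-way mirror)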
Richard Elling
2007-May-14 22:52 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Marko Milisavljevic wrote:
> I missed an important conclusion from j's data, and that is that single
> disk raw access gives him 56MB/s, and RAID 0 array gives him
> 961/46=21MB/s per disk, which comes in at 38% of potential performance.
> That is in the ballpark of getting 45% of potential performance, as I am
> seeing with my puny setup of single or dual drives. Of course, I don't
> expect a complex file system to match raw disk dd performance, but it
> doesn't compare favourably to common file systems like UFS or ext3, so
> the question remains, is ZFS overhead normally this big? That would mean
> that one needs to have at least 4-5 way stripe to generate enough data
> to saturate gigabit ethernet, compared to 2-3 way stripe on a "lesser"
> filesystem, a possibly important consideration in SOHO situation.

Could you post iostat data for these runs?

Also, as I suggested previously, try with checksum off. AthlonXP doesn't have a reputation as a speed demon.

BTW, for 7,200 rpm drives, which are typical in desktops, 56 MBytes/s isn't bad. The media speed will range from perhaps [30-40]-[60-75] MBytes/s judging from a quick scan of disk vendor datasheets. In other words, it would not surprise me to see a 4-5 way stripe being required to keep a GbE saturated.
 -- richard
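For reference, a minimal sketch of the checksum experiment being suggested (the dataset name is a placeholder; the property only affects newly written blocks, so the test file has to be rewritten, and checksums should be turned back on afterwards since they are the point of using ZFS here):

# zfs set checksum=off tank/test
  ... recreate the test file and rerun the dd read ...
# zfs set checksum=on tank/test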
Marko Milisavljevic
2007-May-14 22:53 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Thank you, Al.

Would you mind also doing:

ptime dd if=/dev/dsk/c2t1d0 of=/dev/null bs=128k count=10000

to see the raw performance of underlying hardware.

On 5/14/07, Al Hopper <al at logical-approach.com> wrote:
>
> # ptime dd if=./allhomeal20061209_01.tar of=/dev/null bs=128k count=10000
> 10000+0 records in
> 10000+0 records out
>
> real        6.407
> user        0.008
> sys         1.624
>
>   pool: tank
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             c2t0d0  ONLINE       0     0     0
>             c2t1d0  ONLINE       0     0     0
>             c2t2d0  ONLINE       0     0     0
>             c2t3d0  ONLINE       0     0     0
>             c2t4d0  ONLINE       0     0     0
>
> 3-way mirror:
>
> 10000+0 records in
> 10000+0 records out
>
> real       12.500
> user        0.007
> sys         1.216
>
> 2-way mirror:
>
> 10000+0 records in
> 10000+0 records out
>
> real       18.356
> user        0.006
> sys         0.935
Bart Smaalders
2007-May-14 22:59 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Marko Milisavljevic wrote:
> I missed an important conclusion from j's data, and that is that single
> disk raw access gives him 56MB/s, and RAID 0 array gives him
> 961/46=21MB/s per disk, which comes in at 38% of potential performance.
> That is in the ballpark of getting 45% of potential performance, as I am
> seeing with my puny setup of single or dual drives. Of course, I don't
> expect a complex file system to match raw disk dd performance, but it
> doesn't compare favourably to common file systems like UFS or ext3, so
> the question remains, is ZFS overhead normally this big? That would mean
> that one needs to have at least 4-5 way stripe to generate enough data
> to saturate gigabit ethernet, compared to 2-3 way stripe on a "lesser"
> filesystem, a possibly important consideration in SOHO situation.

I don't see this on my system, but it has more CPU (dual core 2.6 GHz). It saturates a GB net w/ 4 drives & samba, not working hard at all. A thumper does 2 GB/sec w 2 dual core CPUs.

Do you have compression enabled? This can be a choke point for weak CPUs.

- Bart

Bart Smaalders                    Solaris Kernel Performance
barts at cyber.eng.sun.com        http://blogs.sun.com/barts
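A quick way to confirm whether compression is in play on the dataset being tested (a sketch; the dataset name is a placeholder):

# zfs get compression tank/test      (should report "off" for these tests)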
Ian Collins
2007-May-14 23:15 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Marko Milisavljevic wrote:
> To reply to my own message.... this article offers lots of insight into why dd access directly through raw disk is fast, while accessing a file through the file system may be slow.
>
> http://www.informit.com/articles/printerfriendly.asp?p=606585&rl=1
>
> So, I guess what I'm wondering now is, does it happen to everyone that ZFS is under half the speed of raw disk access? What speeds are other people getting trying to dd a file through zfs file system? Something like
>
> dd if=/pool/mount/file of=/dev/null bs=128k (assuming you are using default ZFS block size)
>
> how does that compare to:
>
> dd if=/dev/dsk/diskinzpool of=/dev/null bs=128k count=10000
>
Testing on an old Athlon MP box, two U160 10K SCSI drives.

bash-3.00# time dd if=/dev/dsk/c2t0d0 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real    0m44.470s
user    0m0.018s
sys     0m8.290s

time dd if=/test/play/sol-nv-b62-x86-dvd.iso of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real    0m22.714s
user    0m0.020s
sys     0m3.228s

zpool status
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0

Ian
Marko Milisavljevic
2007-May-14 23:27 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Right now, the AthlonXP machine is booted into Linux, and I'm getting the same raw speed as when it is in Solaris, from the PCI Sil3114 with a Seagate 320G (7200.10):

dd if=/dev/sdb of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes (1.3 GB) copied, 16.7756 seconds, 78.1 MB/s

sudo dd if=./test.mov of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes (1.3 GB) copied, 24.2731 seconds, 54.0 MB/s   <-- some overhead compared to raw speed of same disk above

same machine, onboard ATA, Seagate 120G:

dd if=/dev/hda of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes (1.3 GB) copied, 22.5892 seconds, 58.0 MB/s

On another machine with a Pentium D 3.0GHz and ICH7 onboard SATA in AHCI mode, running Darwin OS, from a Seagate 500G (7200.10):

dd if=/dev/rdisk0 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes transferred in 17.697512 secs (74062388 bytes/sec)

same disk, access through the file system (HFS+):

dd if=./Summer\ 2006\ with\ Cohen\ 4 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes transferred in 20.381901 secs (64308035 bytes/sec)   <- very small overhead compared to raw access above!

same Intel machine, Seagate 200G (7200.8, I think):

dd if=/dev/rdisk1 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes transferred in 20.850229 secs (62863578 bytes/sec)

Modern disk drives are definitely fast and pushing close to 80MB/s raw performance. And some file systems can get over 85% of that with simple sequential access. So far, on these particular hardware and software combinations, I have the following filesystem performance as a percentage of raw disk performance for sequential uncached reads:

HFS+: 86%
ext3 and UFS: 70%
ZFS: 45%

On 5/14/07, Richard Elling <Richard.Elling at sun.com> wrote:
>
> Marko Milisavljevic wrote:
> > I missed an important conclusion from j's data, and that is that single
> > disk raw access gives him 56MB/s, and RAID 0 array gives him
> > 961/46=21MB/s per disk, which comes in at 38% of potential performance.
> > That is in the ballpark of getting 45% of potential performance, as I am
> > seeing with my puny setup of single or dual drives. Of course, I don't
> > expect a complex file system to match raw disk dd performance, but it
> > doesn't compare favourably to common file systems like UFS or ext3, so
> > the question remains, is ZFS overhead normally this big? That would mean
> > that one needs to have at least 4-5 way stripe to generate enough data
> > to saturate gigabit ethernet, compared to 2-3 way stripe on a "lesser"
> > filesystem, a possibly important consideration in SOHO situation.
>
> Could you post iostat data for these runs?
>
> Also, as I suggested previously, try with checksum off. AthlonXP doesn't
> have a reputation as a speed demon.
>
> BTW, for 7,200 rpm drives, which are typical in desktops, 56 MBytes/s
> isn't bad. The media speed will range from perhaps [30-40]-[60-75] MBytes/s
> judging from a quick scan of disk vendor datasheets. In other words, it
> would not surprise me to see a 4-5 way stripe being required to keep a
> GbE saturated.
> -- richard
Marko Milisavljevic
2007-May-14 23:39 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Thank you, Ian,

You are getting ZFS over a 2-disk RAID-0 to be twice as fast as a dd raw disk read on one disk, which sounds more encouraging. But there is something odd with dd from the raw drive - it is only 28MB/s or so, if I divided that right? I would expect it to be around 100MB/s on 10K drives, or at least that should be roughly the potential throughput rate, compared to the throughput from the ZFS 2-disk RAID-0, which is showing 57MB/s. Any idea why the raw dd read is so slow?

Also, I wonder if everyone is using a different dd command than I am - I get a summary line that shows elapsed time and MB/s.

On 5/14/07, Ian Collins <ian at ianshome.com> wrote:
>
> Marko Milisavljevic wrote:
> > To reply to my own message.... this article offers lots of insight into why dd access directly through raw disk is fast, while accessing a file through the file system may be slow.
> >
> > http://www.informit.com/articles/printerfriendly.asp?p=606585&rl=1
> >
> > So, I guess what I'm wondering now is, does it happen to everyone that ZFS is under half the speed of raw disk access? What speeds are other people getting trying to dd a file through zfs file system? Something like
> >
> > dd if=/pool/mount/file of=/dev/null bs=128k (assuming you are using default ZFS block size)
> >
> > how does that compare to:
> >
> > dd if=/dev/dsk/diskinzpool of=/dev/null bs=128k count=10000
> >
> Testing on an old Athlon MP box, two U160 10K SCSI drives.
>
> bash-3.00# time dd if=/dev/dsk/c2t0d0 of=/dev/null bs=128k count=10000
> 10000+0 records in
> 10000+0 records out
>
> real    0m44.470s
> user    0m0.018s
> sys     0m8.290s
>
> time dd if=/test/play/sol-nv-b62-x86-dvd.iso of=/dev/null bs=128k count=10000
> 10000+0 records in
> 10000+0 records out
>
> real    0m22.714s
> user    0m0.020s
> sys     0m3.228s
>
> zpool status
>   pool: test
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         test        ONLINE       0     0     0
>           mirror    ONLINE       0     0     0
>             c2t0d0  ONLINE       0     0     0
>             c2t1d0  ONLINE       0     0     0
>
> Ian
Nick G
2007-May-15 01:19 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Don't know how much this will help, but my results:

Ultra 20 we just got at work:

# uname -a
SunOS unknown 5.10 Generic_118855-15 i86pc i386 i86pc

raw disk:
dd if=/dev/dsk/c1d0s6 of=/dev/null bs=128k count=10000
0.00s user 2.16s system 14% cpu 15.131 total
1,280,000k in 15.131 seconds: 84768k/s

through filesystem:
dd if=testfile of=/dev/null bs=128k count=10000
0.01s user 0.88s system 4% cpu 19.666 total
1,280,000k in 19.666 seconds: 65087k/s

AMD64 Freebsd 7 on a Lenovo something or other, Athlon X2 3800+:

uname -a
FreeBSD 7.0-CURRENT-200705 FreeBSD 7.0-CURRENT-200705 #0: Fri May 11 14:41:37 UTC 2007 root@:/usr/src/sys/amd64/compile/ZFS amd64

raw disk:
dd if=/dev/ad6p1 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes transferred in 17.126926 secs (76529787 bytes/sec) (74735k/s)

filesystem:
# dd of=/dev/null if=testfile bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes transferred in 17.174395 secs (76318263 bytes/sec) (74529k/s)

Odd to say the least, since "du" for instance is faster on Solaris ZFS...

FWIW Freebsd is running version 6 of ZFS and the unpatched but _new_ Ultra 20 is running version 2 of ZFS according to zdb.

Make sure you're all patched up?
johansen-osdev at sun.com
2007-May-15 01:42 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Marko,

I tried this experiment again using 1 disk and got nearly identical times:

# /usr/bin/time dd if=/dev/dsk/c0t0d0 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real       21.4
user        0.0
sys         2.4

$ /usr/bin/time dd if=/test/filebench/testfile of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real       21.0
user        0.0
sys         0.7

> [I]t is not possible for dd to meaningfully access multiple-disk
> configurations without going through the file system. I find it
> curious that there is such a large slowdown by going through file
> system (with single drive configuration), especially compared to UFS
> or ext3.

Comparing a filesystem to raw dd access isn't a completely fair comparison either. Few filesystems actually lay out all of their data and metadata so that every read is a completely sequential read.

> I simply have a small SOHO server and I am trying to evaluate which OS to
> use to keep a redundant disk array. With unreliable consumer-level hardware,
> ZFS and the checksum feature are very interesting and the primary selling
> point compared to a Linux setup, for as long as ZFS can generate enough
> bandwidth from the drive array to saturate single gigabit ethernet.

I would take Bart's recommendation and go with Solaris on something like a dual-core box with 4 disks.

> My hardware at the moment is the "wrong" choice for Solaris/ZFS - PCI 3114
> SATA controller on a 32-bit AthlonXP, according to many posts I found.

Bill Moore lists some controller recommendations here:

http://mail.opensolaris.org/pipermail/zfs-discuss/2006-March/016874.html

> However, since dd over raw disk is capable of extracting 75+MB/s from this
> setup, I keep feeling that surely I must be able to get at least that much
> from reading a pair of striped or mirrored ZFS drives. But I can't - single
> drive or 2-drive stripes or mirrors, I only get around 34MB/s going through
> ZFS. (I made sure mirror was rebuilt and I resilvered the stripes.)

Maybe this is a problem with your controller? What happens when you have two simultaneous dd's to different disks running? This would simulate the case where you're reading from the two disks at the same time.

-j
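A minimal sketch of the simultaneous-read experiment suggested above (the device names are placeholders for the two disks behind the same controller):

# dd if=/dev/dsk/c0d0 of=/dev/null bs=128k count=10000 &
# dd if=/dev/dsk/c0d1 of=/dev/null bs=128k count=10000 &
# wait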
Al Hopper
2007-May-15 03:16 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
On Mon, 14 May 2007, Marko Milisavljevic wrote:

> Thank you, Al.
>
> Would you mind also doing:
>
> ptime dd if=/dev/dsk/c2t1d0 of=/dev/null bs=128k count=10000
>
> to see the raw performance of underlying hardware.

# ptime dd if=/dev/dsk/c2t1d0 of=/dev/null bs=128k count=10000

real       20.046
user        0.013
sys         3.568

Regards,

Al Hopper
Marko Milisavljevic
2007-May-15 05:48 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
I am very grateful to everyone who took the time to run a few tests to help me figure out what is going on. As per j's suggestions, I tried some simultaneous reads, and a few other things, and I am getting interesting and confusing results.

All tests are done using two Seagate 320G drives on the sil3114. In each test I am using dd if=.... of=/dev/null bs=128k count=10000. Each drive is freshly formatted with one 2G file copied to it. That way dd from the raw disk and from the file are using roughly the same area of the disk. I tried using raw, zfs and ufs, single drives and two simultaneously (just executing the dd commands in separate terminal windows). These are snapshots of iostat -xnczpm 3 captured somewhere in the middle of the operation. I am not bothering to report CPU% as it never rose over 50%, and was uniformly proportional to reported throughput.

single drive, raw:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1378.4    0.0 77190.7    0.0  0.0  1.7    0.0    1.2   0  98 c0d1

single drive, ufs file:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1255.1    0.0 69949.6    0.0  0.0  1.8    0.0    1.4   0 100 c0d0

A small slowdown, but pretty good.

single drive, zfs file:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  258.3    0.0 33066.6    0.0 33.0  2.0  127.7    7.7 100 100 c0d1

Now that is odd. Why so much waiting? Also, unlike with raw or UFS, kr/s / r/s gives 128K, as I would imagine it should.

simultaneous raw:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  797.0    0.0 44632.0    0.0  0.0  1.8    0.0    2.3   0 100 c0d0
  795.7    0.0 44557.4    0.0  0.0  1.8    0.0    2.3   0 100 c0d1

This PCI interface seems to be saturated at 90MB/s. Adequate if the goal is to serve files on a gigabit SOHO network.

simultaneous raw on c0d1 and ufs on c0d0:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  722.4    0.0 40246.8    0.0  0.0  1.8    0.0    2.5   0 100 c0d0
  717.1    0.0 40156.2    0.0  0.0  1.8    0.0    2.5   0  99 c0d1

Hmm, can no longer get the 90MB/sec.

simultaneous zfs on c0d1 and raw on c0d0:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.7    0.0    1.8  0.0  0.0    0.0    0.1   0   0 c1d0
  334.9    0.0 18756.0    0.0  0.0  1.9    0.0    5.5   0  97 c0d0
  172.5    0.0 22074.6    0.0 33.0  2.0  191.3   11.6 100 100 c0d1

Everything is slow.

What happens if we throw the onboard IDE interface into the mix?

simultaneous raw SATA and raw PATA:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1036.3    0.3 58033.9    0.3  0.0  1.6    0.0    1.6   0  99 c1d0
 1422.6    0.0 79668.3    0.0  0.0  1.6    0.0    1.1   1  98 c0d0

Both at maximum throughput.

Read ZFS on the SATA drive and raw disk on the PATA interface:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1018.9    0.3 57056.1    4.0  0.0  1.7    0.0    1.7   0  99 c1d0
  268.4    0.0 34353.1    0.0 33.0  2.0  122.9    7.5 100 100 c0d0

SATA is slower with ZFS, as expected by now, but ATA remains at full speed. So they are operating quite independently. Except...

What if we read a UFS file from the PATA disk and ZFS from SATA:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  792.8    0.0 44092.9    0.0  0.0  1.8    0.0    2.2   1  98 c1d0
  224.0    0.0 28675.2    0.0 33.0  2.0  147.3    8.9 100 100 c0d0

Now that is confusing! Why did SATA/ZFS slow down too? I've retried this a number of times, not a fluke.

Finally, after reviewing all this, I've noticed another interesting bit... whenever I read from raw disks or UFS files, SATA or PATA, kr/s over r/s is 56k, suggesting that the underlying IO system is using that as some kind of a native block size? (even though dd is requesting 128k). But when reading ZFS files, this always comes to 128k, which is expected, since that is the ZFS default (and the same thing happens regardless of bs= in dd).

On the theory that my system just doesn't like 128k reads (I'm desperate!), and that this would explain the whole slowdown and the wait/wsvc_t column, I tried changing recsize to 32k and rewriting the test file. However, accessing ZFS files continues to show 128k reads, and it is just as slow. Is there a way to either confirm that the ZFS file in question is indeed written with 32k records or, even better, to force ZFS to use 56k when accessing the disk? Or perhaps I just misunderstand the implications of the iostat output.

I've repeated each of these tests a few times and double-checked, and the numbers, although snapshots of a point in time, fairly represent averages. I have no idea what to make of all this, except that ZFS has a problem with this hardware/drivers that UFS and other traditional file systems don't. Is it a bug in the driver that ZFS is inadvertently exposing? A specific feature that ZFS assumes the hardware to have, but it doesn't? Who knows! I will have to give up on Solaris/ZFS on this hardware for now, but I hope to try it again sometime in the future. I'll give FreeBSD/ZFS a spin to see if it fares better (although at this point in its development it is probably more risky than just sticking with Linux and missing out on ZFS).

(Another contributor suggested turning checksumming off - it made no difference. Same for atime. Compression was always off.)

On 5/14/07, johansen-osdev at sun.com <johansen-osdev at sun.com> wrote:
>
> Marko,
>
> I tried this experiment again using 1 disk and got nearly identical times:
>
> # /usr/bin/time dd if=/dev/dsk/c0t0d0 of=/dev/null bs=128k count=10000
> 10000+0 records in
> 10000+0 records out
>
> real       21.4
> user        0.0
> sys         2.4
>
> $ /usr/bin/time dd if=/test/filebench/testfile of=/dev/null bs=128k count=10000
> 10000+0 records in
> 10000+0 records out
>
> real       21.0
> user        0.0
> sys         0.7
>
> > [I]t is not possible for dd to meaningfully access multiple-disk
> > configurations without going through the file system. I find it
> > curious that there is such a large slowdown by going through file
> > system (with single drive configuration), especially compared to UFS
> > or ext3.
>
> Comparing a filesystem to raw dd access isn't a completely fair
> comparison either. Few filesystems actually lay out all of their data
> and metadata so that every read is a completely sequential read.
>
> > I simply have a small SOHO server and I am trying to evaluate which OS to
> > use to keep a redundant disk array. With unreliable consumer-level hardware,
> > ZFS and the checksum feature are very interesting and the primary selling
> > point compared to a Linux setup, for as long as ZFS can generate enough
> > bandwidth from the drive array to saturate single gigabit ethernet.
>
> I would take Bart's recommendation and go with Solaris on something like a
> dual-core box with 4 disks.
>
> > My hardware at the moment is the "wrong" choice for Solaris/ZFS - PCI 3114
> > SATA controller on a 32-bit AthlonXP, according to many posts I found.
>
> Bill Moore lists some controller recommendations here:
>
> http://mail.opensolaris.org/pipermail/zfs-discuss/2006-March/016874.html
>
> > However, since dd over raw disk is capable of extracting 75+MB/s from this
> > setup, I keep feeling that surely I must be able to get at least that much
> > from reading a pair of striped or mirrored ZFS drives. But I can't - single
> > drive or 2-drive stripes or mirrors, I only get around 34MB/s going through
> > ZFS. (I made sure mirror was rebuilt and I resilvered the stripes.)
>
> Maybe this is a problem with your controller? What happens when you
> have two simultaneous dd's to different disks running? This would
> simulate the case where you're reading from the two disks at the same
> time.
>
> -j
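Regarding the recsize experiment described above, a sketch of how it is usually done and verified (the dataset name is a placeholder): the recordsize property only applies to blocks written after it is changed, so the test file has to be recreated rather than overwritten in place, and the current setting can be checked with zfs get.

# zfs set recordsize=32k tank/test
# zfs get recordsize tank/test
# cp ~/bigfile /tank/test/bigfile.32k      (the newly written copy picks up the 32k recordsize)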
Nick G
2007-May-15 11:31 UTC
[zfs-discuss] Re: Re: Lots of overhead with ZFS - what am I doing wrong?
> I have no idea what to make of all this, except that ZFS has a problem
> with this hardware/drivers that UFS and other traditional file systems
> don't. Is it a bug in the driver that ZFS is inadvertently exposing?
> A specific feature that ZFS assumes the hardware to have, but it
> doesn't? Who knows! I will have to give up on Solaris/ZFS on this
> hardware for now, but I hope to try it again sometime in the future.
> I'll give FreeBSD/ZFS a spin to see if it fares better (although at
> this point in its development it is probably more risky than just
> sticking with Linux and missing out on ZFS).

If you do give FreeBSD a try, if just for the sake of seeing if ZFS continues to perform badly on your hardware, use the 200705 snapshot or newer, and make sure you turn off the debugging support that is built into -CURRENT by default; ZFS seems to like _fast_ memory.

Make malloc behave like a release:

# cd /etc
# ln -s aj malloc.conf

Rebuild your kernel to disable sanity checks in -CURRENT. You could probably just comment out WITNESS* and INVARIANT*, but I wanted to test the equivalent of a production release system here, so I commented all of it out and recompiled:

#makeoptions    DEBUG=-g                # Build kernel with gdb(1) debug symbols
#options        KDB                     # Enable kernel debugger support.
#options        DDB                     # Support DDB.
#options        GDB                     # Support remote GDB.
#options        INVARIANTS              # Enable calls of extra sanity checking
#options        INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
#options        WITNESS                 # Enable checks to detect deadlocks and cycles
#options        WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed

Your filesystem/data should be safe on FreeBSD right now since pretty much all of the core ZFS code is the same. That doesn't mean something else won't cause a panic/reboot, since it is a devel branch! You are right to be hesitant to put it into production for a client. If it's just for home use, I say go for it; I've been beating on it for a few days and have been pleasantly surprised. Obviously if you can trigger a panic, you'd want to re-enable debugging if you care to fix it.
Jürgen Keil
2007-May-15 17:13 UTC
[zfs-discuss] Re: Re: Lots of overhead with ZFS - what am I doing wrong?
> Would you mind also doing:
>
> ptime dd if=/dev/dsk/c2t1d0 of=/dev/null bs=128k count=10000
>
> to see the raw performance of underlying hardware.

This dd command is reading from the block device, which might cache data and probably splits requests into "maxphys" pieces (which happens to be 56K on an x86 box).

I'd read from the raw device, /dev/rdsk/c2t1d0 ...
Jonathan Edwards
2007-May-15 17:35 UTC
[zfs-discuss] Re: Re: Lots of overhead with ZFS - what am I doing wrong?
On May 15, 2007, at 13:13, Jürgen Keil wrote:

>> Would you mind also doing:
>>
>> ptime dd if=/dev/dsk/c2t1d0 of=/dev/null bs=128k count=10000
>>
>> to see the raw performance of underlying hardware.
>
> This dd command is reading from the block device,
> which might cache data and probably splits requests
> into "maxphys" pieces (which happens to be 56K on an
> x86 box).

to increase this to say 8MB, add the following to /etc/system:

set maxphys=0x800000

and you'll probably want to increase sd_max_xfer_size as well (should be 256K on x86/x64) .. add the following to /kernel/drv/sd.conf:

sd_max_xfer_size=0x800000;

then reboot to get the kernel and sd tunings to take.

---
.je

btw - the defaults on sparc:
maxphys = 128K
ssd_max_xfer_size = maxphys
sd_max_xfer_size = maxphys
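A quick way to confirm the new maxphys value took effect after the reboot (a sketch using mdb against the running kernel, which prints the variable as a decimal):

# echo "maxphys/D" | mdb -k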
johansen-osdev at sun.com
2007-May-15 21:03 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
> Each drive is freshly formatted with one 2G file copied to it.

How are you creating each of these files?

Also, would you please include the output from the isalist(1) command?

> These are snapshots of iostat -xnczpm 3 captured somewhere in the
> middle of the operation.

Have you double-checked that this isn't a measurement problem by measuring zfs with zpool iostat (see zpool(1M)) and verifying that outputs from both iostats match?

> single drive, zfs file:
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>   258.3    0.0 33066.6    0.0 33.0  2.0  127.7    7.7 100 100 c0d1
>
> Now that is odd. Why so much waiting? Also, unlike with raw or UFS, kr/s /
> r/s gives 128K, as I would imagine it should.

Not sure. If we can figure out why ZFS is slower than raw disk access in your case, it may explain why you're seeing these results.

> What if we read a UFS file from the PATA disk and ZFS from SATA:
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>   792.8    0.0 44092.9    0.0  0.0  1.8    0.0    2.2   1  98 c1d0
>   224.0    0.0 28675.2    0.0 33.0  2.0  147.3    8.9 100 100 c0d0
>
> Now that is confusing! Why did SATA/ZFS slow down too? I've retried this a
> number of times, not a fluke.

This could be cache interference. ZFS and UFS use different caches. How much memory is in this box?

> I have no idea what to make of all this, except that ZFS has a problem
> with this hardware/drivers that UFS and other traditional file systems
> don't. Is it a bug in the driver that ZFS is inadvertently exposing? A
> specific feature that ZFS assumes the hardware to have, but it doesn't?
> Who knows!

This may be a more complicated interaction than just ZFS and your hardware. There are a number of layers of drivers underneath ZFS that may also be interacting with your hardware in an unfavorable way.

If you'd like to do a little poking with MDB, we can see the features that your SATA disks claim they support. As root, type mdb -k, and then at the ">" prompt that appears, enter the following command (this is one very long line):

*sata_hba_list::list sata_hba_inst_t satahba_next | ::print sata_hba_inst_t satahba_dev_port | ::array void* 32 | ::print void* | ::grep ".!=0" | ::print sata_cport_info_t cport_devp.cport_sata_drive | ::print -a sata_drive_info_t satadrv_features_support satadrv_settings satadrv_features_enabled

This should show satadrv_features_support, satadrv_settings, and satadrv_features_enabled for each SATA disk on the system.

The values for these variables are defined in:

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/sata/impl/sata.h

This is the relevant snippet for interpreting these values:

/*
 * Device feature_support (satadrv_features_support)
 */
#define SATA_DEV_F_DMA                  0x01
#define SATA_DEV_F_LBA28                0x02
#define SATA_DEV_F_LBA48                0x04
#define SATA_DEV_F_NCQ                  0x08
#define SATA_DEV_F_SATA1                0x10
#define SATA_DEV_F_SATA2                0x20
#define SATA_DEV_F_TCQ                  0x40    /* Non NCQ tagged queuing */

/*
 * Device features enabled (satadrv_features_enabled)
 */
#define SATA_DEV_F_E_TAGGED_QING        0x01    /* Tagged queuing enabled */
#define SATA_DEV_F_E_UNTAGGED_QING      0x02    /* Untagged queuing enabled */

/*
 * Drive settings flags (satdrv_settings)
 */
#define SATA_DEV_READ_AHEAD             0x0001  /* Read Ahead enabled */
#define SATA_DEV_WRITE_CACHE            0x0002  /* Write cache ON */
#define SATA_DEV_SERIAL_FEATURES        0x8000  /* Serial ATA feat. enabled */
#define SATA_DEV_ASYNCH_NOTIFY          0x2000  /* Asynch-event enabled */

This may give us more information if this is indeed a problem with hardware/drivers supporting the right features.

-j
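As an illustration of decoding those flag words (the value below is hypothetical, not output from any particular machine): a satadrv_features_support of 0x3f would decode, using the masks above, as DMA | LBA28 | LBA48 | NCQ | SATA1 | SATA2, and individual bits can be checked with shell arithmetic:

$ echo $(( 0x3f & 0x08 ))      (a non-zero result means the NCQ bit is set)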
Matthew Ahrens
2007-May-16 02:10 UTC
[zfs-discuss] Lots of overhead with ZFS - what am I doing wrong?
Marko Milisavljevic wrote:
> I was trying to simply test bandwidth that Solaris/ZFS (Nevada b63) can
> deliver from a drive, and doing this: dd if=(raw disk) of=/dev/null gives
> me around 80MB/s, while dd if=(file on ZFS) of=/dev/null gives me only
> 35MB/s!?

Our experience is that ZFS gets very close to raw performance for streaming reads (assuming that there is adequate CPU and memory available).

When doing reads, prefetching (and thus caching) is a critical component of performance. It may be that ZFS's prefetching or caching is misbehaving somehow.

Your machine is 32-bit, right? This could be causing some caching pain... How much memory do you have? While you're running the test on ZFS, can you send the output of:

echo ::memstat | mdb -k
echo ::arc | mdb -k

Next, try running your test with prefetch disabled, by putting

set zfs:zfs_prefetch_disable=1

in /etc/system and rebooting before running your test. Send the 'iostat -xnpcz' output while this test is running.

Finally, on modern drives the streaming performance can vary by up to 2x when reading the outside vs. the inside of the disk. If your pool had been used before you created your test file, it could be laid out on the inside part of the disk. Then you would be comparing raw reads of the outside of the disk vs. zfs reads of the inside of the disk. When the pool is empty, ZFS will start allocating from the outside, so you can try destroying and recreating your pool and creating the file on the fresh pool. Alternatively, create a small partition (say, 10% of the disk size) and do your tests on that to ensure that the file is not far from where your raw reads are going.

Let us know how that goes.

--matt
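A minimal sketch of the fresh-pool test described above (pool, device, and file names are placeholders; zpool destroy erases the pool's contents, so only do this on a scratch pool):

# zpool destroy tank
# zpool create tank c0d1
# zfs create tank/test
# dd if=/dev/urandom of=/tank/test/bigfile bs=128k count=16000      (roughly a 2 GB file near the outer edge of the fresh pool)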
Marko Milisavljevic
2007-May-16 05:09 UTC
[zfs-discuss] Lots of overhead with ZFS - what am I doing wrong?
Hello Matthew,

Yes, my machine is 32-bit, with 1.5G of RAM.

-bash-3.00# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     123249               481   32%
Anon                        33704               131    9%
Exec and libs                7637                29    2%
Page cache                   1116                 4    0%
Free (cachelist)           222661               869   57%
Free (freelist)              2685                10    1%

Total                      391052              1527
Physical                   391051              1527

-bash-3.00# echo ::arc | mdb -k
{
    anon = -759566176
    mru = -759566136
    mru_ghost = -759566096
    mfu = -759566056
    mfu_ghost = -759566016
    size = 0x17f20c00
    p = 0x160ef900
    c = 0x17f16ae0
    c_min = 0x4000000
    c_max = 0x1da00000
    hits = 0x353b
    misses = 0x264b
    deleted = 0x13bc
    recycle_miss = 0x31
    mutex_miss = 0
    evict_skip = 0
    hash_elements = 0x127b
    hash_elements_max = 0x1a19
    hash_collisions = 0x61
    hash_chains = 0x4c
    hash_chain_max = 0x1
    no_grow = 1
}

Now let's try:

set zfs:zfs_prefetch_disable=1

Bingo!

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  609.0    0.0 77910.0    0.0  0.0  0.8    0.0    1.4   0  83 c0d0

Only 1-2% slower than dd from /dev/dsk. Do you think this is a general 32-bit problem, or specific to this combination of hardware? I am using a PCI/SATA Sil3114 card, and other than ZFS, performance of this interface has some limitations in Solaris. That is, a single drive gives 80MB/s, but doing dd /dev/dsk/xyz simultaneously on 2 drives attached to the card gives only 46MB/s each. On Linux, however, that gives 60MB/s each, close to saturating the theoretical throughput of the PCI bus. Having both drives in a zpool stripe gives, with prefetch disabled, close to 45MB/s each through dd from a zfs file. I think that under Solaris, this card is accessed through the ATA driver.

There shouldn't be any issues on inside vs outside. All the reading is done on the first gig or two of the drive, as there is nothing else on them, except one 2 gig file. (Well, I'm assuming a simple copy onto a newly formatted zfs drive puts it at the start of the drive.) Drives are completely owned by ZFS, using zpool create c0d0 c0d1.

Finally, should I file a bug somewhere regarding prefetch, or is this a known issue?

Many thanks.

On 5/15/07, Matthew Ahrens <Matthew.Ahrens at sun.com> wrote:
> Marko Milisavljevic wrote:
> > I was trying to simply test bandwidth that Solaris/ZFS (Nevada b63) can
> > deliver from a drive, and doing this: dd if=(raw disk) of=/dev/null gives
> > me around 80MB/s, while dd if=(file on ZFS) of=/dev/null gives me only
> > 35MB/s!?
>
> Our experience is that ZFS gets very close to raw performance for streaming
> reads (assuming that there is adequate CPU and memory available).
>
> When doing reads, prefetching (and thus caching) is a critical component of
> performance. It may be that ZFS's prefetching or caching is misbehaving somehow.
>
> Your machine is 32-bit, right? This could be causing some caching pain...
> How much memory do you have? While you're running the test on ZFS, can you
> send the output of:
>
> echo ::memstat | mdb -k
> echo ::arc | mdb -k
>
> Next, try running your test with prefetch disabled, by putting
>         set zfs:zfs_prefetch_disable=1
> in /etc/system and rebooting before running your test. Send the 'iostat
> -xnpcz' output while this test is running.
>
> Finally, on modern drives the streaming performance can vary by up to 2x when
> reading the outside vs. the inside of the disk. If your pool had been used
> before you created your test file, it could be laid out on the inside part of
> the disk. Then you would be comparing raw reads of the outside of the disk
> vs. zfs reads of the inside of the disk. When the pool is empty, ZFS will
> start allocating from the outside, so you can try destroying and recreating
> your pool and creating the file on the fresh pool. Alternatively, create a
> small partition (say, 10% of the disk size) and do your tests on that to
> ensure that the file is not far from where your raw reads are going.
>
> Let us know how that goes.
>
> --matt
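As a side note, the same tunable can usually be flipped on a live system without a reboot, assuming the running zfs module exposes zfs_prefetch_disable as a 32-bit variable as it does in these Nevada builds (a sketch; mdb -kw writes live kernel memory, so use with care):

# echo "zfs_prefetch_disable/W0t1" | mdb -kw      (disable prefetch)
# echo "zfs_prefetch_disable/W0t0" | mdb -kw      (re-enable it)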
Marko Milisavljevic
2007-May-16 05:14 UTC
[zfs-discuss] Re: Re: Lots of overhead with ZFS - what am I doing wrong?
I tried as you suggested, but I notice that output from iostat while doing dd if=/dev/dsk/... still shows that reading is done in 56k chunks. I haven't seen any change in performance. Perhaps iostat doesn't say what I think it does. Using dd if=/dev/rdsk/.. gives 256k, and dd if=zfsfile gives 128k read sizes.

On 5/15/07, Jonathan Edwards <Jonathan.Edwards at sun.com> wrote:
>
> On May 15, 2007, at 13:13, Jürgen Keil wrote:
>
> >> Would you mind also doing:
> >>
> >> ptime dd if=/dev/dsk/c2t1d0 of=/dev/null bs=128k count=10000
> >>
> >> to see the raw performance of underlying hardware.
> >
> > This dd command is reading from the block device,
> > which might cache data and probably splits requests
> > into "maxphys" pieces (which happens to be 56K on an
> > x86 box).
>
> to increase this to say 8MB, add the following to /etc/system:
>
> set maxphys=0x800000
>
> and you'll probably want to increase sd_max_xfer_size as
> well (should be 256K on x86/x64) .. add the following to
> /kernel/drv/sd.conf:
>
> sd_max_xfer_size=0x800000;
>
> then reboot to get the kernel and sd tunings to take.
>
> ---
> .je
>
> btw - the defaults on sparc:
> maxphys = 128K
> ssd_max_xfer_size = maxphys
> sd_max_xfer_size = maxphys
Marko Milisavljevic
2007-May-16 05:41 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
On 5/15/07, johansen-osdev at sun.com <johansen-osdev at sun.com> wrote:
> > Each drive is freshly formatted with one 2G file copied to it.
>
> How are you creating each of these files?

zpool create tank c0d0 c0d1; zfs create tank/test; cp ~/bigfile /tank/test/

Actual content of the file is random junk from /dev/random.

> Also, would you please include the output from the isalist(1) command?

pentium_pro+mmx pentium_pro pentium+mmx pentium i486 i386 i86

> Have you double-checked that this isn't a measurement problem by
> measuring zfs with zpool iostat (see zpool(1M)) and verifying that
> outputs from both iostats match?

Both give the same kb/s.

> How much memory is in this box?

1.5g, I can see in /var/adm/messages that it is recognized.

> As root, type mdb -k, and then at the ">" prompt that appears, enter the
> following command (this is one very long line):
>
> *sata_hba_list::list sata_hba_inst_t satahba_next | ::print sata_hba_inst_t satahba_dev_port | ::array void* 32 | ::print void* | ::grep ".!=0" | ::print sata_cport_info_t cport_devp.cport_sata_drive | ::print -a sata_drive_info_t satadrv_features_support satadrv_settings satadrv_features_enabled

This gives me "mdb: failed to dereference symbol: unknown symbol name". I don't know enough about the syntax here to try to isolate which token it is complaining about. But, I don't know if my PCI/SATA card is going through the sd driver, if that is what the commands above assume... my understanding is that the sil3114 goes through the ata driver, as per this blog:

http://blogs.sun.com/mlf/entry/ata_on_solaris_x86_at

If there is any other testing I can do, I would be happy to.
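One way to check which driver the controller is actually bound to (a sketch; the grep pattern is a guess at how the Silicon Image device shows up in the device tree):

# prtconf -D | grep -i -e sil -e ide -e sata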
Marko Milisavljevic
2007-May-16 09:47 UTC
[zfs-discuss] Lots of overhead with ZFS - what am I doing wrong?
Got excited too quickly on one thing... reading a single zfs file does give me almost the same speed as dd /dev/dsk... around 78MB/s... however, creating a 2-drive stripe still doesn't perform as well as it ought to:

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  294.3    0.0 37675.6    0.0  0.0  0.4    0.0    1.4   0  40 c3d0
  293.0    0.0 37504.9    0.0  0.0  0.4    0.0    1.4   0  40 c3d1

Simultaneous dd on those 2 drives from /dev/dsk runs at 46MB/s per drive.

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  800.4    0.0 44824.6    0.0  0.0  1.8    0.0    2.2   0  99 c3d0
  792.1    0.0 44357.9    0.0  0.0  1.8    0.0    2.2   0  98 c3d1

(and in Linux it saturates the PCI bus at 60MB/s per drive)

On 5/15/07, Marko Milisavljevic <marko at cognistudio.com> wrote:
>
> set zfs:zfs_prefetch_disable=1
>
> Bingo!
>
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>   609.0    0.0 77910.0    0.0  0.0  0.8    0.0    1.4   0  83 c0d0
>
> Only 1-2% slower than dd from /dev/dsk. Do you think this is a general
> 32-bit problem, or specific to this combination of hardware? I am
> using a PCI/SATA Sil3114 card, and other than ZFS, performance of this
> interface has some limitations in Solaris. That is, a single drive gives
> 80MB/s, but doing dd /dev/dsk/xyz simultaneously on 2 drives attached
> to the card gives only 46MB/s each. On Linux, however, that gives
> 60MB/s each, close to saturating the theoretical throughput of the PCI bus.
> Having both drives in a zpool stripe gives, with prefetch disabled,
> close to 45MB/s each through dd from a zfs file.
Matthew Ahrens
2007-May-16 16:29 UTC
[zfs-discuss] Lots of overhead with ZFS - what am I doing wrong?
Marko Milisavljevic wrote:> Got excited too quickly on one thing... reading single zfs file does > give me almost same speed as dd /dev/dsk... around 78MB/s... however, > creating a 2-drive stripe, still doesn''t perform as well as it ought to:Yes, that makes sense. Because prefetch is disabled, ZFS will only issue one read i/o at a time (for that stream). This is one of the reasons prefetch is important :-) Eg, in your output below you can see that each disk is only busy 40% of the time when using ZFS with no prefetch:> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 294.3 0.0 37675.6 0.0 0.0 0.4 0.0 1.4 0 40 c3d0 > 293.0 0.0 37504.9 0.0 0.0 0.4 0.0 1.4 0 40 c3d1 > > Simultaneous dd on those 2 drives from /dev/dsk runs at 46MB/s per drive. > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 800.4 0.0 44824.6 0.0 0.0 1.8 0.0 2.2 0 99 c3d0 > 792.1 0.0 44357.9 0.0 0.0 1.8 0.0 2.2 0 98 c3d1--matt
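The 40% figure can be sanity-checked from the iostat columns themselves: with prefetch off there is at most one 128k read in flight per disk, and 294 reads/s x 1.4 ms of service time works out to about 0.4 I/Os outstanding on average, which matches the actv column and the roughly 40 %b above (294 reads/s x 128k also reproduces the ~37.6 MB/s in kr/s). In the raw dd case about 1.8 I/Os stay queued per disk, so both disks run at essentially 100 %b.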
Matthew Ahrens
2007-May-16 16:32 UTC
[zfs-discuss] Lots of overhead with ZFS - what am I doing wrong?
Marko Milisavljevic wrote:> now lets try: > set zfs:zfs_prefetch_disable=1 > > bingo! > > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 609.0 0.0 77910.0 0.0 0.0 0.8 0.0 1.4 0 83 c0d0 > > only 1-2 % slower then dd from /dev/dsk. Do you think this is general > 32-bit problem, or specific to this combination of hardware?I suspect that it''s fairly generic, but more analysis will be necessary.> Finally, should I file a bug somewhere regarding prefetch, or is this > a known issue?It may be related to 6469558, but yes please do file another bug report. I''ll have someone on the ZFS team take a look at it. --matt
Marko Milisavljevic
2007-May-16 17:06 UTC
[zfs-discuss] Lots of overhead with ZFS - what am I doing wrong?
I will do that, but I''ll do a couple of things first, to try to isolate the problem more precisely: - Use ZFS on a plain PATA drive on onboard IDE connector to see if it works with prefetch on this 32-bit machine. - Use this PCI-SATA card in a 64-bit, 2g RAM machine and see how it performs there, and also compare it to that machine''s onboard ICH7 SATA interface (I assume I can force it to use AHCI drivers or not by changing the mode of operation for ICH7 in BIOS). Marko On 5/16/07, Matthew Ahrens <Matthew.Ahrens at sun.com> wrote:> > > > Finally, should I file a bug somewhere regarding prefetch, or is this > > a known issue? > > It may be related to 6469558, but yes please do file another bug report. > I''ll have someone on the ZFS team take a look at it. > > --matt >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20070516/023d80f5/attachment.html>
johansen-osdev at sun.com
2007-May-16 17:26 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
> >*sata_hba_list::list sata_hba_inst_t satahba_next | ::print > >sata_hba_inst_t satahba_dev_port | ::array void* 32 | ::print void* | > >::grep ".!=0" | ::print sata_cport_info_t cport_devp.cport_sata_drive | > >::print -a sata_drive_info_t satadrv_features_support satadrv_settings > >satadrv_features_enabled> This gives me "mdb: failed to dereference symbol: unknown symbol > name".You may not have the SATA module installed. If you type: ::modinfo ! grep sata and don''t get any output, your sata driver is attached some other way. My apologies for the confusion. -K
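The same check can be run non-interactively from a root shell, and prtconf can show which driver each controller is actually bound to; a sketch using only stock commands:

    # list loaded kernel modules and look for the sata framework
    echo "::modinfo" | mdb -k | grep sata
    # show driver bindings in the device tree; a card attached through
    # the legacy ata driver should appear with that driver name
    prtconf -D | grep ata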
johansen-osdev at sun.com
2007-May-16 18:38 UTC
[zfs-discuss] Lots of overhead with ZFS - what am I doing wrong?
At Matt''s request, I did some further experiments and have found that this appears to be particular to your hardware. This is not a general 32-bit problem. I re-ran this experiment on a 1-disk pool using a 32 and 64-bit kernel. I got identical results: 64-bit ===== $ /usr/bin/time dd if=/testpool1/filebench/testfile of=/dev/null bs=128k count=10000 10000+0 records in 10000+0 records out real 20.1 user 0.0 sys 1.2 62 Mb/s # /usr/bin/time dd if=/dev/dsk/c1t3d0 of=/dev/null bs=128k count=10000 10000+0 records in 10000+0 records out real 19.0 user 0.0 sys 2.6 65 Mb/s 32-bit ===== /usr/bin/time dd if=/testpool1/filebench/testfile of=/dev/null bs=128k count=10000 10000+0 records in 10000+0 records out real 20.1 user 0.0 sys 1.7 62 Mb/s # /usr/bin/time dd if=/dev/dsk/c1t3d0 of=/dev/null bs=128k count=10000 10000+0 records in 10000+0 records out real 19.1 user 0.0 sys 4.3 65 Mb/s -j On Wed, May 16, 2007 at 09:32:35AM -0700, Matthew Ahrens wrote:> Marko Milisavljevic wrote: > >now lets try: > >set zfs:zfs_prefetch_disable=1 > > > >bingo! > > > > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > > 609.0 0.0 77910.0 0.0 0.0 0.8 0.0 1.4 0 83 c0d0 > > > >only 1-2 % slower then dd from /dev/dsk. Do you think this is general > >32-bit problem, or specific to this combination of hardware? > > I suspect that it''s fairly generic, but more analysis will be necessary. > > >Finally, should I file a bug somewhere regarding prefetch, or is this > >a known issue? > > It may be related to 6469558, but yes please do file another bug report. > I''ll have someone on the ZFS team take a look at it. > > --matt > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
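For context on the throughput figures: 10000 records x 128 KiB is 1250 MiB, so about 20 seconds of elapsed time corresponds to the quoted ~62 MB/s and about 19 seconds to ~65 MB/s. The arithmetic, if anyone wants to repeat it for other runs:

    # 10000 * 128 KiB = 1250 MiB; divide by the elapsed (real) seconds
    echo "scale=1; 10000*128/1024/20.1" | bc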
johansen-osdev at sun.com
2007-May-16 20:18 UTC
[zfs-discuss] Lots of overhead with ZFS - what am I doing wrong?
Marko, Matt and I discussed this offline some more and he had a couple of ideas about double-checking your hardware. It looks like your controller (or disks, maybe?) is having trouble with multiple simultaneous I/Os to the same disk. It looks like prefetch aggravates this problem. When I asked Matt what we could do to verify that it''s the number of concurrent I/Os that is causing performance to be poor, he had the following suggestions: set zfs_vdev_{min,max}_pending=1 and run with prefetch on, then iostat should show 1 outstanding io and perf should be good. or turn prefetch off, and have multiple threads reading concurrently, then iostat should show multiple outstanding ios and perf should be bad. Let me know if you have any additional questions. -j On Wed, May 16, 2007 at 11:38:24AM -0700, johansen-osdev at sun.com wrote:> At Matt''s request, I did some further experiments and have found that > this appears to be particular to your hardware. This is not a general > 32-bit problem. I re-ran this experiment on a 1-disk pool using a 32 > and 64-bit kernel. I got identical results: > > 64-bit > =====> > $ /usr/bin/time dd if=/testpool1/filebench/testfile of=/dev/null bs=128k > count=10000 > 10000+0 records in > 10000+0 records out > > real 20.1 > user 0.0 > sys 1.2 > > 62 Mb/s > > # /usr/bin/time dd if=/dev/dsk/c1t3d0 of=/dev/null bs=128k count=10000 > 10000+0 records in > 10000+0 records out > > real 19.0 > user 0.0 > sys 2.6 > > 65 Mb/s > > 32-bit > =====> > /usr/bin/time dd if=/testpool1/filebench/testfile of=/dev/null bs=128k > count=10000 > 10000+0 records in > 10000+0 records out > > real 20.1 > user 0.0 > sys 1.7 > > 62 Mb/s > > # /usr/bin/time dd if=/dev/dsk/c1t3d0 of=/dev/null bs=128k count=10000 > 10000+0 records in > 10000+0 records out > > real 19.1 > user 0.0 > sys 4.3 > > 65 Mb/s > > -j > > On Wed, May 16, 2007 at 09:32:35AM -0700, Matthew Ahrens wrote: > > Marko Milisavljevic wrote: > > >now lets try: > > >set zfs:zfs_prefetch_disable=1 > > > > > >bingo! > > > > > > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > > > 609.0 0.0 77910.0 0.0 0.0 0.8 0.0 1.4 0 83 c0d0 > > > > > >only 1-2 % slower then dd from /dev/dsk. Do you think this is general > > >32-bit problem, or specific to this combination of hardware? > > > > I suspect that it''s fairly generic, but more analysis will be necessary. > > > > >Finally, should I file a bug somewhere regarding prefetch, or is this > > >a known issue? > > > > It may be related to 6469558, but yes please do file another bug report. > > I''ll have someone on the ZFS team take a look at it. > > > > --matt > > _______________________________________________ > > zfs-discuss mailing list > > zfs-discuss at opensolaris.org > > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
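A sketch of the second test, with illustrative file names: prefetch disabled, two readers running against the same pool while iostat is watched from another terminal. If the controller really struggles with concurrent I/O, actv should rise above 1 per disk and throughput should drop:

    dd if=/tank/test/bigfile1 of=/dev/null bs=128k &
    dd if=/tank/test/bigfile2 of=/dev/null bs=128k &
    iostat -xnz 3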
Marko Milisavljevic
2007-May-17 06:58 UTC
[zfs-discuss] Lots of overhead with ZFS - what am I doing wrong?
Thank you, following your suggestion improves things - reading a ZFS file from a RAID-0 pair now gives me 95MB/sec - about the same as from /dev/dsk. What I find surprising is that reading from RAID-1 2-drive zpool gives me only 56MB/s - I imagined it would be roughly like reading from RAID-0. I can see that it can''t be identical - when reading mirrored drives simultaneously, some data will need to be skipped if the file is laid out sequentially, but it doesn''t seem intuitively obvious how my broken drivers/card would affect it to that degree, especially since reading from a file from one-disk zpool gives me 70MB/s. My plan was to make 4-disk RAID-Z - we''ll see how it works out when all drives arrive.

Given how common Sil3114 chipset is in my-old-computer-became-home-server segment, I am sure this workaround will be appreciated by many who google their way here. And just in case it is not clear, what j means below is to add these two lines in /etc/system:

set zfs:zfs_vdev_min_pending=1
set zfs:zfs_vdev_max_pending=1

I''ve been doing a lot of reading, and it seems unlikely that any effort will be made to address the driver performance with either ATA or Sil311x chipset specifically - by the time more pressing enhancements are made with various SATA drivers, this will be too obsolete to matter. With your workaround things are working well enough for the purpose that I am able to choose Solaris over Linux - thanks again.

Marko

On 5/16/07, johansen-osdev at sun.com <johansen-osdev at sun.com> wrote:
> Marko,
> Matt and I discussed this offline some more and he had a couple of ideas
> about double-checking your hardware.
>
> It looks like your controller (or disks, maybe?) is having trouble with
> multiple simultaneous I/Os to the same disk.  It looks like prefetch
> aggravates this problem.
>
> When I asked Matt what we could do to verify that it''s the number of
> concurrent I/Os that is causing performance to be poor, he had the
> following suggestions:
>
>         set zfs_vdev_{min,max}_pending=1 and run with prefetch on, then
>         iostat should show 1 outstanding io and perf should be good.
>
>         or turn prefetch off, and have multiple threads reading
>         concurrently, then iostat should show multiple outstanding ios
>         and perf should be bad.
>
> Let me know if you have any additional questions.
>
> -j
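One way to confirm that the two settings actually took effect after the reboot is to read them back from the live kernel (this only works on builds where the variables exist; see the b60 report later in the thread):

    echo "zfs_vdev_min_pending/D" | mdb -k
    echo "zfs_vdev_max_pending/D" | mdb -k
    # both should print 1; while a zfs file is read sequentially,
    # iostat -xnz should then show actv staying at about 1 per disk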
Richard Elling
2007-May-17 14:50 UTC
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
queuing theory should explain this rather nicely. iostat measures %busy by counting if there is an entry in the queue for the clock ticks. There are two queues, one in the controller and one on the disk. As you can clearly see the way ZFS pushes the load is very different than dd or UFS. -- richard Marko Milisavljevic wrote:> I am very grateful to everyone who took the time to run a few tests to > help me figure what is going on. As per j''s suggestions, I tried some > simultaneous reads, and a few other things, and I am getting interesting > and confusing results. > > All tests are done using two Seagate 320G drives on sil3114. In each > test I am using dd if=.... of=/dev/null bs=128k count=10000. Each drive > is freshly formatted with one 2G file copied to it. That way dd from raw > disk and from file are using roughly same area of disk. I tried using > raw, zfs and ufs, single drives and two simultaneously (just executing > dd commands in separate terminal windows). These are snapshots of iostat > -xnczpm 3 captured somewhere in the middle of the operation. I am not > bothering to report CPU% as it never rose over 50%, and was uniformly > proportional to reported throughput. > > single drive raw: > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 1378.4 0.0 77190.7 0.0 0.0 1.7 0.0 1.2 0 98 c0d1 > > single drive, ufs file > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 1255.1 0.0 69949.6 0.0 0.0 1.8 0.0 1.4 0 100 c0d0 > > Small slowdown, but pretty good. > > single drive, zfs file > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 258.3 0.0 33066.6 0.0 33.0 2.0 127.7 7.7 100 100 c0d1 > > Now that is odd. Why so much waiting? Also, unlike with raw or UFS, kr/s > / r/s gives 256K, as I would imagine it should. > > simultaneous raw: > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 797.0 0.0 44632.0 0.0 0.0 1.8 0.0 2.3 0 100 c0d0 > 795.7 0.0 44557.4 0.0 0.0 1.8 0.0 2.3 0 100 c0d1 > > This PCI interface seems to be saturated at 90MB/s. Adequate if the goal > is to serve files on gigabit SOHO network. > > sumultaneous raw on c0d1 and ufs on c0d0: > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 722.4 0.0 40246.8 0.0 0.0 1.8 0.0 2.5 0 100 c0d0 > 717.1 0.0 40156.2 0.0 0.0 1.8 0.0 2.5 0 99 c0d1 > > hmm, can no longer get the 90MB/sec. > > simultaneous zfs on c0d1 and raw on c0d0: > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 0.0 0.7 0.0 1.8 0.0 0.0 0.0 0.1 0 0 c1d0 > 334.9 0.0 18756.0 0.0 0.0 1.9 0.0 5.5 0 97 c0d0 > 172.5 0.0 22074.6 0.0 33.0 2.0 191.3 11.6 100 100 c0d1 > > Everything is slow. > > What happens if we throw onboard IDE interface into the mix? > simultaneous raw SATA and raw PATA: > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 1036.3 0.3 58033.9 0.3 0.0 1.6 0.0 1.6 0 99 c1d0 > 1422.6 0.0 79668.3 0.0 0.0 1.6 0.0 1.1 1 98 c0d0 > > Both at maximum throughput. > > Read ZFS on SATA drive and raw disk on PATA interface: > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 1018.9 0.3 57056.1 4.0 0.0 1.7 0.0 1.7 0 99 c1d0 > 268.4 0.0 34353.1 0.0 33.0 2.0 122.9 7.5 100 100 c0d0 > > SATA is slower with ZFS as expected by now, but ATA remains at full > speed. So they are operating quite independantly. Except... > > What if we read a UFS file from the PATA disk and ZFS from SATA: > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 792.8 0.0 44092.9 0.0 0.0 1.8 0.0 2.2 1 98 c1d0 > 224.0 0.0 28675.2 0.0 33.0 2.0 147.3 8.9 100 100 c0d0 > > Now that is confusing! 
Why did SATA/ZFS slow down too? I''ve retried this > a number of times, not a fluke. > > Finally, after reviewing all this, I''ve noticed another interesting > bit... whenever I read from raw disks or UFS files, SATA or PATA, kr/s > over r/s is 56k, suggesting that underlying IO system is using that as > some kind of a native block size? (even though dd is requesting 128k). > But when reading ZFS files, this always comes to 128k, which is > expected, since that is ZFS default (and same thing happens regardless > of bs= in dd). On the theory that my system just doesn''t like 128k reads > (I''m desperate!), and that this would explain the whole slowdown and > wait/wsvc_t column, I tried changing recsize to 32k and rewriting the > test file. However, accessing ZFS files continues to show 128k reads, > and it is just as slow. Is there a way to either confirm that the ZFS > file in question is indeed written with 32k records or, even better, to > force ZFS to use 56k when accessing the disk. Or perhaps I just > misunderstand implications of iostat output. > > I''ve repeated each of these tests a few times and doublechecked, and the > numbers, although snapshots of a point in time, fairly represent averages. > > I have no idea what to make of all this, except that it ZFS has a > problem with this hardware/drivers that UFS and other traditional file > systems, don''t. Is it a bug in the driver that ZFS is inadvertently > exposing? A specific feature that ZFS assumes the hardware to have, but > it doesn''t? Who knows! I will have to give up on Solaris/ZFS on this > hardware for now, but I hope to try it again sometime in the future. > I''ll give FreeBSD/ZFS a spin to see if it fares better (although at this > point in its development it is probably more risky then just sticking > with Linux and missing out on ZFS). > > (Another contributor suggested turning checksumming off - it made no > difference. Same for atime. Compression was always off.) > > On 5/14/07, * johansen-osdev at sun.com <mailto:johansen-osdev at sun.com>* > <johansen-osdev at sun.com <mailto:johansen-osdev at sun.com>> wrote: > > Marko, > > I tried this experiment again using 1 disk and got nearly identical > times: > > # /usr/bin/time dd if=/dev/dsk/c0t0d0 of=/dev/null bs=128k count=10000 > 10000+0 records in > 10000+0 records out > > real 21.4 > user 0.0 > sys 2.4 > > $ /usr/bin/time dd if=/test/filebench/testfile of=/dev/null bs=128k > count=10000 > 10000+0 records in > 10000+0 records out > > real 21.0 > user 0.0 > sys 0.7 > > > > [I]t is not possible for dd to meaningfully access multiple-disk > > configurations without going through the file system. I find it > > curious that there is such a large slowdown by going through file > > system (with single drive configuration), especially compared to UFS > > or ext3. > > Comparing a filesystem to raw dd access isn''t a completely fair > comparison either. Few filesystems actually layout all of their data > and metadata so that every read is a completely sequential read. > > > I simply have a small SOHO server and I am trying to evaluate > which OS to > > use to keep a redundant disk array. With unreliable > consumer-level hardware, > > ZFS and the checksum feature are very interesting and the primary > selling > > point compared to a Linux setup, for as long as ZFS can generate > enough > > bandwidth from the drive array to saturate single gigabit ethernet. > > I would take Bart''s reccomendation and go with Solaris on something > like a > dual-core box with 4 disks. 
> > > My hardware at the moment is the "wrong" choice for Solaris/ZFS - > PCI 3114 > > SATA controller on a 32-bit AthlonXP, according to many posts I > found. > > Bill Moore lists some controller reccomendations here: > > http://mail.opensolaris.org/pipermail/zfs-discuss/2006-March/016874.html > <http://mail.opensolaris.org/pipermail/zfs-discuss/2006-March/016874.html> > > > However, since dd over raw disk is capable of extracting 75+MB/s > from this > > setup, I keep feeling that surely I must be able to get at least > that much > > from reading a pair of striped or mirrored ZFS drives. But I > can''t - single > > drive or 2-drive stripes or mirrors, I only get around 34MB/s > going through > > ZFS. (I made sure mirror was rebuilt and I resilvered the stripes.) > > Maybe this is a problem with your controller? What happens when you > have two simultaneous dd''s to different disks running? This would > simulate the case where you''re reading from the two disks at the same > time. > > -j > > > > ------------------------------------------------------------------------ > > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
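The queueing view can be checked directly against the iostat samples quoted above using Little''s Law (outstanding requests = request rate x time in the system): for the single-drive ZFS read, 258.3 reads/s x 7.7 ms of active service time is about 2.0 requests on the device (the actv column), and 258.3 reads/s x 127.7 ms of wait time is about 33 (the wait column). The raw and UFS runs keep only about 1.7-1.8 requests active with an empty wait queue, which is the difference in how ZFS pushes the load that Richard is pointing at.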
Torrey McMahon
2007-May-19 20:49 UTC
[zfs-discuss] Re: Re: Lots of overhead with ZFS - what am I doing wrong?
Jonathan Edwards wrote:
>
> On May 15, 2007, at 13:13, Jürgen Keil wrote:
>
>>> Would you mind also doing:
>>>
>>> ptime dd if=/dev/dsk/c2t1d0 of=/dev/null bs=128k count=10000
>>>
>>> to see the raw performance of underlying hardware.
>>
>> This dd command is reading from the block device,
>> which might cache data and probably splits requests
>> into "maxphys" pieces (which happens to be 56K on an
>> x86 box).
>
> to increase this to say 8MB, add the following to /etc/system:
>
> set maxphys=0x800000
>
> and you''ll probably want to increase sd_max_xfer_size as
> well (should be 256K on x86/x64) .. add the following to
> /kernel/drv/sd.conf:
>
> sd_max_xfer_size=0x800000;
>
> then reboot to get the kernel and sd tunings to take.
>
> ---
> .je
>
> btw - the defaults on sparc:
> maxphys = 128K
> ssd_max_xfer_size = maxphys
> sd_max_xfer_size = maxphys

Maybe we should file a bug to increase the max transfer request sizes?
Trygve Laugstøl
2007-May-20 10:48 UTC
[zfs-discuss] Lots of overhead with ZFS - what am I doing wrong?
Marko Milisavljevic wrote:
> Thank you, following your suggestion improves things - reading a ZFS
> file from a RAID-0 pair now gives me 95MB/sec - about the same as from
> /dev/dsk. What I find surprising is that reading from RAID-1 2-drive
> zpool gives me only 56MB/s - I imagined it would be roughly like
> reading from RAID-0. I can see that it can''t be identical - when
> reading mirrored drives simultaneously, some data will need to be
> skipped if the file is laid out sequentially, but it doesn''t seem
> intuitively obvious how my broken drivers/card would affect it to that
> degree, especially since reading from a file from one-disk zpool gives
> me 70MB/s. My plan was to make 4-disk RAID-Z - we''ll see how it works
> out when all drives arrive.
>
> Given how common Sil3114 chipset is in
> my-old-computer-became-home-server segment, I am sure this workaround
> will be appreciated by many who google their way here. And just in
> case it is not clear, what j means below is to add these two lines in
> /etc/system:
>
> set zfs:zfs_vdev_min_pending=1
> set zfs:zfs_vdev_max_pending=1

I just tried the same myself but got these warnings when booting:

May 20 01:22:29 deservio genunix: [ID 492708 kern.notice] sorry,
variable ''zfs_vdev_min_pending'' is not defined in the ''zfs''
May 20 01:22:29 deservio genunix: [ID 966847 kern.notice] module
May 20 01:22:29 deservio genunix: [ID 100000 kern.notice]
May 20 01:22:29 deservio genunix: [ID 492708 kern.notice] sorry,
variable ''zfs_vdev_max_pending'' is not defined in the ''zfs''
May 20 01:22:29 deservio genunix: [ID 966847 kern.notice] module
May 20 01:22:29 deservio genunix: [ID 100000 kern.notice]

I''m running b60.
Marko Milisavljevic
2007-May-21 05:42 UTC
[zfs-discuss] Lots of overhead with ZFS - what am I doing wrong?
It is definitely defined in b63... not sure when it got introduced.

http://src.opensolaris.org/source/xref/onnv/aside/usr/src/cmd/mdb/common/modules/zfs/zfs.c

shows tunable parameters for ZFS, under "zfs_params(...)"

On 5/20/07, Trygve Laugstøl <trygvis at codehaus.org> wrote:
> Marko Milisavljevic wrote:
> > Given how common Sil3114 chipset is in
> > my-old-computer-became-home-server segment, I am sure this workaround
> > will be appreciated by many who google their way here. And just in
> > case it is not clear, what j means below is to add these two lines in
> > /etc/system:
> >
> > set zfs:zfs_vdev_min_pending=1
> > set zfs:zfs_vdev_max_pending=1
>
> I just tried the same myself but got these warnings when booting:
>
> May 20 01:22:29 deservio genunix: [ID 492708 kern.notice] sorry,
> variable ''zfs_vdev_min_pending'' is not defined in the ''zfs''
> May 20 01:22:29 deservio genunix: [ID 966847 kern.notice] module
> May 20 01:22:29 deservio genunix: [ID 100000 kern.notice]
> May 20 01:22:29 deservio genunix: [ID 492708 kern.notice] sorry,
> variable ''zfs_vdev_max_pending'' is not defined in the ''zfs''
> May 20 01:22:29 deservio genunix: [ID 966847 kern.notice] module
> May 20 01:22:29 deservio genunix: [ID 100000 kern.notice]
>
> I''m running b60.
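The linked zfs.c is the mdb zfs module, so on builds where its ::zfs_params dcmd is present, the set of ZFS tunables and their current values can be dumped from the running kernel before deciding what to put in /etc/system; a sketch, since availability of the dcmd depends on the build:

    echo "::zfs_params" | mdb -k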