I had a user report extreme slowness on a ZFS filesystem mounted over NFS over the weekend. After some extensive testing, the extreme slowness appears to occur only when a ZFS filesystem is mounted over NFS.

One example is doing a 'gtar xzvf php-5.2.0.tar.gz' over NFS onto a ZFS filesystem. This takes:

real    5m12.423s
user    0m0.936s
sys     0m4.760s

Locally on the server (to the same ZFS filesystem) it takes:

real    0m4.415s
user    0m1.884s
sys     0m3.395s

The same job over NFS to a UFS filesystem takes:

real    1m22.725s
user    0m0.901s
sys     0m4.479s

Same job locally on the server to the same UFS filesystem:

real    0m10.150s
user    0m2.121s
sys     0m4.953s

This is easily reproducible even with single large files, but the multiple small files seem to illustrate some awful sync latency between each file.

Any idea why ZFS over NFS is so bad? I saw the threads that talk about an fsync penalty, but they don't seem relevant, since the local ZFS performance is quite good.

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
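[Editor's note: the per-file sync latency described above can be approximated locally. The sketch below is not the poster's test; it is a hypothetical microbenchmark that contrasts buffered small-file creation with fsync-per-file creation, which is roughly the extra work an NFS server must do when it commits each file to stable storage.]

```python
import os
import tempfile
import time

def extract_like(dirpath, nfiles=200, size=4096, sync_each=False):
    """Create many small files, optionally fsync'ing each one.

    sync_each=True mimics an NFS server that must push every file
    to stable storage before acknowledging it; sync_each=False
    mimics a local extract that only dirties the page cache.
    Returns elapsed wall-clock seconds.
    """
    data = b"x" * size
    start = time.time()
    for i in range(nfiles):
        path = os.path.join(dirpath, f"f{i}")
        with open(path, "wb") as f:
            f.write(data)
            if sync_each:
                f.flush()
                os.fsync(f.fileno())  # wait for stable storage
    return time.time() - start

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        buffered = extract_like(d, sync_each=False)
    with tempfile.TemporaryDirectory() as d:
        synced = extract_like(d, sync_each=True)
    print(f"buffered: {buffered:.3f}s   fsync-per-file: {synced:.3f}s")
```

On a spinning disk the fsync-per-file run is typically much slower; on tmpfs or a battery-backed cache the gap largely disappears, which is the same variable the thread below is discussing.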
> Another thing to keep an eye out for is disk caching. With ZFS,
> whenever the NFS server tells us to make sure something is on disk, we
> actually make sure it's on disk by asking the drive to flush dirty data
> in its write cache out to the media. Needless to say, this takes a
> while.
>
> With UFS, it isn't aware of the extra level of caching, and happily
> pretends it's in a world where once the drive ACKs a write, it's on
> stable storage.
>
> If you use format(1M) and take a look at whether or not the drive's
> write cache is enabled, that should shed some light on this. If it's
> on, try turning it off and re-run your NFS tests on ZFS vs. UFS.
>
> Either way, let us know what you find out.

Slightly OT, but you just reminded me of why I like disks that have Sun firmware on them. They never have the write cache on; at least I have never seen it. Read cache, yes, but write cache, never. At least in the Seagate and Fujitsu Ultra320 SCSI/FCAL disks that have a Sun logo on them. I have no idea what else that Sun firmware does on a SCSI disk, but I'd love to know :-)

Dennis
Another thing to keep an eye out for is disk caching. With ZFS, whenever the NFS server tells us to make sure something is on disk, we actually make sure it's on disk by asking the drive to flush dirty data in its write cache out to the media. Needless to say, this takes a while.

With UFS, it isn't aware of the extra level of caching, and happily pretends it's in a world where once the drive ACKs a write, it's on stable storage.

If you use format(1M) and take a look at whether or not the drive's write cache is enabled, that should shed some light on this. If it's on, try turning it off and re-run your NFS tests on ZFS vs. UFS.

Either way, let us know what you find out.

--Bill

On Tue, Jan 02, 2007 at 12:40:26PM -0800, Brad Plecs wrote:
> I had a user report extreme slowness on a ZFS filesystem mounted over NFS over the weekend.
> After some extensive testing, the extreme slowness appears to only occur when a ZFS filesystem is mounted over NFS.
> [original post and timings snipped]
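[Editor's note: the write-cache check Bill describes lives in format(1M)'s expert mode. The transcript below is a sketch from memory of Solaris-era menus; the disk selection prompt and exact wording may differ on your build, and the device shown is a placeholder.]

```
# format -e
Searching for disks...done
(select the disk from the menu, then:)
format> cache
cache> write_cache
write_cache> display
Write Cache is enabled
write_cache> disable
write_cache> display
Write Cache is disabled
```

Note that the setting may not persist across power cycles on all drives, so re-check it after a reboot before trusting a benchmark run.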
Dennis Clarke wrote:
>> Another thing to keep an eye out for is disk caching. [...]
>
> Slightly OT but you just reminded me of why I like disks that have Sun
> firmware on them. They never have write cache on. At least I have never
> seen it. Read cache yes but write cache never. At least in the Seagates and
> Fujitsus Ultra320 SCSI/FCAL disks that have a Sun logo on them.

We turned it off when we could, but it was possible to re-enable it... which I believe ZFS will do.
Brad Plecs wrote:
> I had a user report extreme slowness on a ZFS filesystem mounted over NFS over the weekend.
> After some extensive testing, the extreme slowness appears to only occur when a ZFS filesystem is mounted over NFS.
> [timings snipped]
> Any idea why ZFS over NFS is so bad? I saw the threads that talk about an fsync penalty,
> but they don't seem relevant since the local ZFS performance is quite good.

Known issue, discussed here:

http://www.opensolaris.org/jive/thread.jspa?threadID=14696&tstart=15

benr.
Hi Brad,

I believe benr experienced the same/similar issue here:

http://www.opensolaris.org/jive/message.jspa?messageID=77347

If it is the same, I believe it's a known ZFS/NFS interaction bug, and has to do with small file creation.

Best Regards,
Jason

On 1/2/07, Brad Plecs <bplecs@cs.umd.edu> wrote:
> I had a user report extreme slowness on a ZFS filesystem mounted over NFS over the weekend.
> After some extensive testing, the extreme slowness appears to only occur when a ZFS filesystem is mounted over NFS.
> [timings snipped]
> Any idea why ZFS over NFS is so bad? I saw the threads that talk about an fsync penalty,
> but they don't seem relevant since the local ZFS performance is quite good.
Ah, thanks -- reading that thread did a good job of explaining what I was seeing. I was going nuts trying to isolate the problem.

Is work being done to improve this performance? 100% of my users are coming in over NFS, and that's a huge hit. Even on single large files, writes are slower by a factor of 2 to 10 compared to copying via scp or onto a non-ZFS filesystem.

Thanks!
I've just generated some data for an upcoming blog entry on the subject. This is about a small-file tar extract. All times are elapsed (single 72GB SAS disk).

Local and memory-based filesystems:

tmpfs : 0.077 sec
ufs   : 0.25  sec
zfs   : 0.12  sec

NFS service that can end up corrupting the client's view of data:

nfs/ufs : 7   sec (write cache enabled)
nfs/zfs : 4.2 sec (write cache enabled,  zil_disable=1)
nfs/zfs : 4.7 sec (write cache disabled, zil_disable=1)

NFS service that will not corrupt the client's view:

nfs/ufs : 17 sec (write cache disabled)
nfs/zfs : 12 sec (write cache disabled, zil_disable=0)
nfs/zfs : 7  sec (write cache enabled,  zil_disable=0)

ZFS numbers tend to have more variability from run to run than UFS. I still need to plow through the data to figure a few things out. Watch this space for more info...

-r
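[Editor's note: for readers wondering what `zil_disable=1` refers to, on OpenSolaris builds of this era the ZFS intent log could be switched off with a kernel tunable, for example via an /etc/system fragment like the sketch below. This is shown only to explain the table's labels; the tunable was later superseded, and as the thread notes, disabling the ZIL under NFS trades crash consistency for speed.]

```
* /etc/system fragment -- disables the ZFS intent log at next boot.
* WARNING: with NFS clients this risks a corrupted client-side view
* of the filesystem after a server crash, as described above.
set zfs:zil_disable = 1
```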
Roch - PAE wrote:
> I've just generated some data for an upcoming blog entry on
> the subject. This is about a small file tar extract :
> [timing tables snipped]

That is very interesting data, since it actually has ZFS being faster than UFS in all cases, which isn't what I've heard people claim. If you haven't already done so, it would be interesting to add UFS/SVM in there as well, just for "completeness".

It would also be interesting to see how each RAID style compares here, and what the numbers are when "rewriting" the files (for example, unpack the tar file on top of itself rather than into a "fresh" filesystem).

--
Darren J Moffat
Write cache was enabled on all the ZFS drives, but disabling it gave a negligible speed improvement (FWIW, the pool has 50 drives):

(write cache on)
/bin/time tar xf /tmp/vbulletin_3-6-4.tar

real       51.6
user        0.0
sys         1.0

(write cache off)
/bin/time tar xf /tmp/vbulletin_3-6-4.tar

real       49.2
user        0.0
sys         1.0

...this is a production system, so I attribute the 2-second (4%) difference more to variable system activity than to the write cache. I suppose I could test with larger samples, but since this is still ten times slower than I want, I think this effectively discounts the disk write cache as anything significant.