Matthew B Sweeney - Sun Microsystems Inc.
2006-Nov-21 15:59 UTC
[zfs-discuss] poor NFS/ZFS performance
Hi,

I have an application that uses NFS between a Thumper and a 4600. The Thumper exports two ZFS filesystems that the 4600 uses as an inqueue and an outqueue.

The machines are connected via a point-to-point 10GE link, and all NFS traffic goes over that link. The NFS performance doing a simple cp from one partition to the other is well below what I'd expect: 58 MB/s. I've tried some NFS tweaks, tweaks to the Neterion cards (soft rings etc.), and tweaks to the TCP stack on both sides, to no avail. Jumbo frames are enabled and working, which improves performance, but doesn't make it fly.

I've tested the link with iperf and have been able to sustain 5-6 Gb/s. The local ZFS filesystems (12-disk stripe, 34-disk stripe) perform very well (450-500 MB/s sustained).

My research points to disabling the ZIL. So far the only way I've found to disable the ZIL is through mdb: echo 'zil_disable/W 1' | mdb -kw. My question is: can I achieve this setting via a /kernel/drv/zfs.conf or /etc/system parameter?

Thanks
Matt

--
Matt Sweeney
Engagement Architect
Sun Microsystems
585-368-5930/x29097 desk
585-727-0573 cell
Matthew B Sweeney - Sun Microsystems Inc. writes:
> My research points to disabling the ZIL. So far the only way I've found
> to disable the ZIL is through mdb: echo 'zil_disable/W 1' | mdb -kw. My
> question is: can I achieve this setting via a /kernel/drv/zfs.conf or
> /etc/system parameter?

You may set it in /etc/system. We're thinking of renaming the variable to

    set zfs_please_corrupt_my_client's_data = 1

Just kidding (about the name), but it will corrupt your data.

-r
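For reference, a minimal sketch of the two ways this thread discusses flipping the tunable; the /etc/system form is the one a later post confirms working, and either one gives up the commit guarantee that NFS clients depend on:

    # Runtime toggle via mdb -- takes effect immediately, not persistent:
    echo 'zil_disable/W 1' | mdb -kw

    # Persistent form, in /etc/system -- takes effect after a reboot:
    set zfs:zil_disable = 1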
On 11/21/06, Roch - PAE <Roch.Bourbonnais at sun.com> wrote:
> You may set it in /etc/system. We're thinking of renaming
> the variable to
>
>     set zfs_please_corrupt_my_client's_data = 1
>
> Just kidding (about the name), but it will corrupt your data.
>
> -r

Yes, we've entered this thread multiple times before, where NFS basically sucks compared to the relative performance locally. I'm waiting, ever so eagerly, for the per-pool (or was it per-FS?) properties that give finer-grained control of the ZIL, named "sync_deferred". Where is that, by the way?

(From Neil:)
> NP> We once had plans to add a mount option to allow the admin
> NP> to control the ZIL. Here's a brief section of the RFE (6280630):
>
> NP> sync={deferred,standard,forced}
>
> NP> Controls synchronous semantics for the dataset.
>
> NP> When set to 'standard' (the default), synchronous operations
> NP> such as fsync(3C) behave precisely as defined in
> NP> fcntl.h(3HEAD).
>
> NP> When set to 'deferred', requests for synchronous semantics
> NP> are ignored. However, ZFS still guarantees that ordering
> NP> is preserved -- that is, consecutive operations reach stable
> NP> storage in order. (If a thread performs operation A followed
> NP> by operation B, then the moment that B reaches stable storage,
> NP> A is guaranteed to be on stable storage as well.) ZFS also
> NP> guarantees that all operations will be scheduled for write to
> NP> stable storage within a few seconds, so that an unexpected
> NP> power loss only takes the last few seconds of change with it.
>
> NP> When set to 'forced', all operations become synchronous.
> NP> No operation will return until all previous operations
> NP> have been committed to stable storage. This option can be
> NP> useful if an application is found to depend on synchronous
> NP> semantics without actually requesting them; otherwise, it
> NP> will just make everything slow, and is not recommended.
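If that RFE ever ships, usage would presumably follow the normal dataset-property pattern. The lines below are a hypothetical sketch of the interface the RFE proposes, not something available in current builds:

    # Hypothetical syntax, per RFE 6280630 -- not implemented yet:
    zfs set sync=deferred tank/export/inqueue
    zfs get sync tank/export/inqueue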
Matthew B Sweeney - Sun Microsystems Inc.
2006-Nov-21 17:54 UTC
[zfs-discuss] poor NFS/ZFS performance
Roch,

Am I barking up the wrong tree? Or is ZFS over NFS not the right solution?

As I understand the ZIL's functionality I may lose updates, but the filesystem would remain intact. From http://www.opensolaris.org/jive/thread.jspa?messageID=20935:

> The ZIL is not required for fsckless operation. If you turned off
> the ZIL, all it would mean is that in the event of a crash, it would
> appear that some of the most recent (last few seconds) synchronous
> system calls never happened. In other words, we wouldn't have met
> the O_DSYNC specification, but the filesystem would nevertheless
> still be perfectly consistent on disk.
>
> Jeff

This application isn't anything transactional: a file is read, processed, and a new (modified) file is written to another store. So if all I'm risking is the currently open file, I can have the application rewrite it.

I haven't had a chance to test this yet; the machines are physically somewhere else and not networked to the outside world. Should I be using UFS over NFS?

Thanks
Matt

--
Matt Sweeney
Engagement Architect
Sun Microsystems
585-368-5930/x29097 desk
585-727-0573 cell
On Nov 21, 2006, at 12:19, Joe Little wrote:
> Yes, we've entered this thread multiple times before, where NFS
> basically sucks compared to the relative performance locally. I'm
> waiting, ever so eagerly, for the per-pool (or was it per-FS?)
> properties that give finer-grained control of the ZIL, named
> "sync_deferred". Where is that, by the way?

The problem with this is that it's essentially cheating, particularly when you get a commit operation from NFS pushing an fsync().

I know that most Linux kernels do this sort of thing with NFS async by lying to the client, but with non-battery-backed memory this pretty much seems like a bad idea to me. What you really want is some sort of fast write-through, but that would require a major rewrite in several places and a rethink on some of the design philosophy.

.je
Matthew B Sweeney - Sun Microsystems Inc. writes:
> Roch,
>
> Am I barking up the wrong tree? Or is ZFS over NFS not the right solution?
>
> As I understand the ZIL's functionality I may lose updates, but the
> filesystem would remain intact

The server filesystem will remain intact. The problem is the view provided to the client in the face of a server crash/reboot. The issue is not specific to ZFS and affects any client talking to any server-side FS that ignores the client's commit request (such as Linux over a WCE device).

The issue would show up on the client as:

    # cd /mynfsmount
    # sum fileA
    31998 1 fileA
    # cp fileA fileB
    (in the middle of the cp, the server crashes/reboots; this small cp takes 5 minutes to run)
    # sum fileB
    60712 1 fileB

So the cp is successful, but the output fileB is corrupted. Nothing ZFS related -- just a server-side FS ignoring commits.

> This application isn't anything transactional: a file is read, processed,
> and a new (modified) file is written to another store. So if all I'm
> risking is the currently open file, I can have the application rewrite it.
>
> I haven't had a chance to test this yet; the machines are physically
> somewhere else and not networked to the outside world. Should I be
> using UFS over NFS?

If UFS is much faster than ZFS/latest bits (and it may be, I don't know), then we need to work on whatever is causing this, but disabling the ZIL is not the proper way (IMO).

-r
This works for me in /etc/system:

    set zfs:zil_disable=1

I had to run bootadm update-archive then reboot on x86 -- not sure if that's needed on SPARC.
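To confirm the variable actually took the new value after the reboot, it can be read back with mdb; a quick sketch, assuming the symbol name is unchanged in your build:

    # Print the current value of zil_disable; expect 1 if the setting took:
    echo 'zil_disable/D' | mdb -k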
On 11/21/06, Matthew B Sweeney - Sun Microsystems Inc. <Matthew.Sweeney at sun.com> wrote:
>
> Roch,
>
> Am I barking up the wrong tree? Or is ZFS over NFS not the right solution?
>

I strongly believe it is. We just are at odds as to some philosophy. Either we need NVRAM-backed storage between NFS and ZFS, battery-backed memory that can survive other subsystem failures, or a change in the code path to allow some discretion here. Currently, the third option is 6280630, ZIL synchronicity, or as I reference it, sync_deferred functionality.

A combination is best, but the sooner this arrives, the better for anyone who needs a general-purpose file server / NAS that compares anywhere near to the competition.
On Nov 21, 2006, at 1:36 PM, Joe Little wrote:
> I strongly believe it is. We just are at odds as to some philosophy.
> Either we need NVRAM-backed storage between NFS and ZFS, battery-backed
> memory that can survive other subsystem failures, or a change in the
> code path to allow some discretion here. Currently, the third option is
> 6280630, ZIL synchronicity, or as I reference it, sync_deferred
> functionality.
>
> A combination is best, but the sooner this arrives, the better for
> anyone who needs a general-purpose file server / NAS that compares
> anywhere near to the competition.

I had heard that some stuff in the latest OS, and coming in Sol 10 U3, should greatly help NFS/ZFS performance -- something to do with ZFS not syncing the entire pool on every sync but just the stuff needed, or something like that. I heard it kind of second- or third-hand, so I cannot be too detailed in my description. Can someone here "in the know" confirm that this is so (or not)?

Thanks
Chad

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net
To accelerate NFS (in particular single-threaded loads) you need (somewhat badly) some *RAM between the server FS and its storage; that *RAM is where NFS-committed data may be stored. If the *RAM does not survive a server reboot, the client is at risk of seeing corruption.

For example, UFS over WCE storage will be fast and corruption prone (from the client-side point of view). ZFS over WCE storage behaves differently because it manages the write cache in a way that makes serving NFS slow but safe. zil_disable can be used to make ZFS serve NFS fast and corruption prone (from the client-side point of view).

-r

Joe Little writes:
> I strongly believe it is. We just are at odds as to some philosophy.
> Either we need NVRAM-backed storage between NFS and ZFS, battery-backed
> memory that can survive other subsystem failures, or a change in the
> code path to allow some discretion here. Currently, the third option is
> 6280630, ZIL synchronicity, or as I reference it, sync_deferred
> functionality.
On Tue, 21 Nov 2006, Joe Little wrote:
> Yes, we've entered this thread multiple times before, where NFS
> basically sucks compared to the relative performance locally. I'm
> waiting, ever so eagerly, for the per-pool (or was it per-FS?)
> properties that give finer-grained control of the ZIL, named
> "sync_deferred". Where is that, by the way?

Agreed -- it sucks, especially for small-file use. Here's a 5,000 ft view of the performance while unzipping and extracting a tar archive. First the test is run on a SPARC 280R running Build 51a with dual 900MHz USIII CPUs and 4Gb of RAM:

    $ cp emacs-21.4a.tar.gz /tmp
    $ ptime gunzip -c /tmp/emacs-21.4a.tar.gz |tar xf -

    real       13.092
    user        2.083
    sys         0.183

Next, the test is run on the same box in /tmp:

    $ ptime gunzip -c /tmp/emacs-21.4a.tar.gz |tar xf -

    real        2.983
    user        2.038
    sys         0.201

Next the test is run on an NFS mount of a ZFS filesystem on a 5-disk raidz device over a gigabit ethernet interface with only two hosts on the VLAN (the ZFS server is a dual-socket AMD whitebox with two dual-core 2.2GHz CPUs):

    $ ptime gunzip -c /tmp/emacs-21.4a.tar.gz |tar xf -

    real     2:32.667
    user        2.410
    sys         0.233

Houston, we have a problem. What OS is the ZFS-based NFS server running? I can't say, but let's say that it's close to Update 3.

Next we move emacs-21.4a.tar.gz to the NFS server and run it in the same filesystem that is NFS-mounted to the 280R:

    $ ptime gunzip -c /tmp/emacs-21.4a.tar.gz |tar xf -

    real        3.365
    user        0.880
    sys         0.154

No problem there! ZFS rocks. NFS/ZFS is a bad combination.

Happy Thanksgiving (to those stateside).

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
On Nov 22, 2006, at 4:11 PM, Al Hopper wrote:
> No problem there! ZFS rocks. NFS/ZFS is a bad combination.

Has anyone tried sharing a ZFS fs using samba or afs or something else besides nfs? Do we have the same issues?

Chad

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net
On 11/22/06, Chad Leigh -- Shire.Net LLC <chad at shire.net> wrote:
>
> On Nov 22, 2006, at 4:11 PM, Al Hopper wrote:
>
>> No problem there! ZFS rocks. NFS/ZFS is a bad combination.
>
> Has anyone tried sharing a ZFS fs using samba or afs or something
> else besides nfs? Do we have the same issues?
>

I've done some CIFS tests in the past, and off the top of my head, it was about 3-5x faster than NFS.
Yes, I've tried NFS and CIFS. I wouldn't call this a problem, though. This is the way it was designed to work, to prevent loss of client data.

If you want faster performance, put a battery-backed RAID card in your system and turn on write-back caching on the card, so that the RAM in the RAID controller effectively acts as your NVRAM. I've tested this and obviously get much better performance for small files.

If you want to compare with other filesystems that don't guarantee your data, then as others have pointed out you can disable the ZIL and take your chances. Here you're no worse off than other OS/FS implementations that lie to you when they tell you that they've committed your data to persistent storage, keep it in RAM, and risk the failure modes associated with that.

If you avoid remote filesystems like NFS/CIFS and run locally, then this is obviously not an issue.

Cameron

--
On 11/22/06, Chad Leigh -- Shire.Net LLC <chad at shire.net> wrote:
> Has anyone tried sharing a ZFS fs using samba or afs or something
> else besides nfs? Do we have the same issues?
Have a gander below :> Agreed - it sucks - especially for small file use. Here''s a 5,000 ft view > of the performance while unzipping and extracting a tar archive. First > the test is run on a SPARC 280R running Build 51a with dual 900MHz USIII > CPUs and 4Gb of RAM: > > $ cp emacs-21.4a.tar.gz /tmp > $ ptime gunzip -c /tmp/emacs-21.4a.tar.gz |tar xf - > > real 13.092 > user 2.083 > sys 0.183here is my machine here ( Solaris 8 Ultra 2 200MHz ) # cd /tmp # ptime /export/home/dclarke/star -x -time -z file=/tmp/emacs-21.4a.tar.gz /export/home/dclarke/star: 7457 blocks + 0 bytes (total of 76359680 bytes 74570.00k). /export/home/dclarke/star: Total time 11.057sec (6744 kBytes/sec) real 11.146 user 0.300 sys 1.762 and the same test on the same machine with a local UFS filesystem : # cd /mnt/test # ptime /export/home/dclarke/star -x -time -z file=/tmp/emacs-21.4a.tar.gz /export/home/dclarke/star: 7457 blocks + 0 bytes (total of 76359680 bytes 74570.00k). /export/home/dclarke/star: Total time 92.378sec (807 kBytes/sec) real 1:32.463 user 0.351 sys 3.658 Pretty much what I expect for an old old Solaris 8 box. Then I try using a mounted NFS filesystem shared from ZFS on snv_46 # cat /etc/release Solaris Nevada snv_46 SPARC Copyright 2006 Sun Microsystems, Inc. All Rights Reserved. Use is subject to license terms. Assembled 14 August 2006 # zfs set sharenfs=nosub,nosuid,rw=pluto,root=pluto zfs0/backup # zfs get sharenfs zfs0/backup NAME PROPERTY VALUE SOURCE zfs0/backup sharenfs nosub,nosuid,rw=pluto,root=pluto local # # tip hardwire connected pluto console login: root Password: Nov 22 18:41:50 pluto login: ROOT LOGIN /dev/console Last login: Tue Nov 21 02:07:39 on console Sun Microsystems Inc. SunOS 5.8 Generic Patch February 2004 # cat /etc/release Solaris 8 2/04 s28s_hw4wos_05a SPARC Copyright 2004 Sun Microsystems, Inc. All Rights Reserved. Assembled 08 January 2004 # dfshares mars RESOURCE SERVER ACCESS TRANSPORT mars:/export/zfs/backup mars - - mars:/export/zfs/qemu mars - - # # mkdir /export/nfs # mount -F nfs -o bg,intr,nosuid mars:/export/zfs/backup /export/nfs # # cd /export/nfs/titan # ls -lap total 142780 drwxr-xr-x 3 dclarke other 8 Nov 22 19:08 ./ drwxr-xr-x 9 root sys 12 Nov 15 20:14 ../ -rw-r--r-- 1 phil csw 13102 Jul 12 12:32 README.csw -rw-r--r-- 1 dclarke csw 189389 Sep 14 19:33 ae-2.2.0.tar.gz -rw-r--r-- 1 dclarke csw 91965440 Jul 25 12:56 dclarke.tar -rw-r--r-- 1 dclarke csw 20403483 Nov 22 19:07 emacs-21.4a.tar.gz -rw-r--r-- 1 dclarke csw 5468160 Jul 25 12:57 root.tar drwxr-xr-x 5 dclarke csw 5 May 24 2006 schily/ # Now that my Solaris 8 box has a mounted ZFS/NFS filesystem I test again # ptime /export/home/dclarke/star -x -time -z file=/tmp/emacs-21.4a.tar.gz /export/home/dclarke/star: 7457 blocks + 0 bytes (total of 76359680 bytes 74570.00k). /export/home/dclarke/star: Total time 215.958sec (345 kBytes/sec) real 3:36.048 user 0.397 sys 5.961 # That is based on the ZFS/NFS mounted filesystem. What if I run the same test on my server locally? On ZFS ? # ptime /root/bin/star -x -time -z file=/tmp/emacs-21.4a.tar.gz /root/bin/star: 7457 blocks + 0 bytes (total of 76359680 bytes = 74570.00k). /root/bin/star: Total time 32.238sec (2313 kBytes/sec) real 32.680 user 6.973 sys 9.945 # So gee ... thats all pretty slow but really really slow with ZFS shared out via NFS. wow .. good to know. I *never* would have seen that coming. Dennis
Hi Al,

You conclude:

    No problem there! ZFS rocks. NFS/ZFS is a bad combination.

But my reading of your data leads to:

    Single-threaded small-file creation is much slower
    over NFS than locally, regardless of the server FS.

It's been posted on this alias before: change ZFS to anything else and it won't change the conclusion. NFS/AnyFS is a bad combination for single-threaded tar x.

-r
"Dennis Clarke" <dclarke at blastwave.org> wrote:> here is my machine here ( Solaris 8 Ultra 2 200MHz ) > > # cd /tmp > # ptime /export/home/dclarke/star -x -time -z file=/tmp/emacs-21.4a.tar.gz > /export/home/dclarke/star: 7457 blocks + 0 bytes (total of 76359680 bytes > 74570.00k). > /export/home/dclarke/star: Total time 11.057sec (6744 kBytes/sec) > > real 11.146 > user 0.300 > sys 1.762 > > and the same test on the same machine with a local UFS filesystem : > > # cd /mnt/test > # ptime /export/home/dclarke/star -x -time -z file=/tmp/emacs-21.4a.tar.gz > /export/home/dclarke/star: 7457 blocks + 0 bytes (total of 76359680 bytes > 74570.00k). > /export/home/dclarke/star: Total time 92.378sec (807 kBytes/sec) > > real 1:32.463 > user 0.351 > sys 3.658 > > Pretty much what I expect for an old old Solaris 8 box.If you do this kind of tests, it makes sense, to repeat the test with star -no-fsync J?rg -- EMail:joerg at schily.isdn.cs.tu-berlin.de (home) J?rg Schilling D-13353 Berlin js at cs.tu-berlin.de (uni) schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Hi, I haven''t followed all the details in this discussion, but it seems to me that it all breaks down to: - NFS on ZFS is slow due to NFS being very conservative when sending ACK to clients only after writes have definitely committed to disk. - Therefore, the problem is not that much ZFS specific, it''s just a conscious focus on data correctness vs. speed on ZFS/NFS'' part. - Currently known workarounds include: - Sacrifice correctness for speed by disabling ZIL or using a less conservative network file system. - Optimize NFS/ZFS to get as much speed as possible within the constraints of the NFS protocol. But one aspect I haven''t seen so far is: How can we optimize ZFS on a more hardware oriented level to both achieve good NFS speeds and still preserve the NFS level of correctness? One possibility might be to give the ZFS pool enough spindles so it can comfortably handle many small IOs fast enough for them not to become NFS commit bottlenecks. This may require some tweaking on the ZFS side so it doesn''t queue up write IOs for too long as to not delay commits more than necessary. Has anyone investigated this branch or am I too simplistic in my view of the underlying root of the problem? Best regards, Constantin -- Constantin Gonzalez Sun Microsystems GmbH, Germany Platform Technology Group, Client Solutions http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/
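As a sketch of the "enough spindles" idea, a pool made of several small mirrors gives ZFS more independent vdevs over which to spread small synchronous writes than a single wide raidz does; the device names below are placeholders, not a recommendation:

    # Hypothetical layout: four 2-way mirrors rather than one wide raidz
    zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 \
                      mirror c1t4d0 c1t5d0 mirror c1t6d0 c1t7d0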
Nope, wrong conclusion again. This large performance degradation has nothing whatsoever to do with ZFS. I have not seen data that would show a possible slowness on the part of ZFS vs. AnyFS on the backend; there may well be some, and that would be an entirely different discussion from the large slowdown that NFS induces compared to a local FS for single-threaded loads.

More inline.

Constantin Gonzalez writes:
> I haven't followed all the details in this discussion, but it seems to me
> that it all breaks down to:
>
> - NFS on ZFS is slow due to NFS being very conservative when sending
>   ACK to clients only after writes have definitely committed to disk.

Nope. NFS is slow for single-threaded tar extract. The conservative approach of NFS is needed with the NFS protocol in order to ensure client-side data integrity. Nothing ZFS related.

> - Therefore, the problem is not that much ZFS specific, it's just a
>   conscious focus on data correctness vs. speed on ZFS/NFS' part.

So, nope. Purely NFS related. Nothing ZFS related.

> - Currently known workarounds include:
>
>   - Sacrifice correctness for speed by disabling ZIL or using a less
>     conservative network file system.

Disabling the ZIL means the backing FS fails to commit properly. The NFS protocol is cheated, leading to client-side integrity issues. With UFS, which has similar slowness to ZFS, we can fix the slowness by enabling the WCE.

>   - Optimize NFS/ZFS to get as much speed as possible within the constraints
>     of the NFS protocol.

Not possible. Nothing related to ZFS here, and if NFS had ways to make this better I think it would have been done in v4. If we extended the protocol to allow for exclusive mounts (single-client access), then I would think that the extra knowledge could be used to gain speed... I don't know if this was considered by the NFS forum.

> But one aspect I haven't seen so far is: How can we optimize ZFS on a more
> hardware-oriented level to both achieve good NFS speeds and still preserve
> the NFS level of correctness?
>
> One possibility might be to give the ZFS pool enough spindles so it can
> comfortably handle many small IOs fast enough for them not to become
> NFS commit bottlenecks. This may require some tweaking on the ZFS side so
> it doesn't queue up write IOs for too long, so as not to delay commits more
> than necessary.

NFS is plenty fast in a throughput context (not that it does not need work). The complaints we have here are about single-threaded code.

> Has anyone investigated this branch or am I too simplistic in my view of the
> underlying root of the problem?

I'll let you be the judge of that ;-)

-r
Hi Roch,

thanks, now I better understand the issue :).

> Nope. NFS is slow for single-threaded tar extract. The
> conservative approach of NFS is needed with the NFS protocol
> in order to ensure client-side data integrity. Nothing ZFS
> related.
...
> NFS is plenty fast in a throughput context (not that it does
> not need work). The complaints we have here are about
> single-threaded code.

OK, then it's "just" a single-thread client request-latency issue, which (as increasingly often) software vendors need to realize. The proper way to deal with this, then, is to multi-thread on the application layer.

Reminds me of many UltraSPARC T1 issues, which don't sit in the hardware or the OS, but in the way applications have been developed for years :).

Best regards,
Constantin

--
Constantin Gonzalez                        Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions           http://www.sun.de/
Tel.: +49 89/4 60 08-25 91             http://blogs.sun.com/constantin/
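A crude client-side illustration of the single-stream-latency versus aggregate-throughput point: run several extractions in parallel and compare the wall-clock time per archive against the single-threaded run. This is only a sketch, reusing the archive from the earlier tests; the target directories are arbitrary:

    # Four concurrent extractions into separate directories:
    for i in 1 2 3 4; do
        ( mkdir -p run$i && cd run$i && gunzip -c /tmp/emacs-21.4a.tar.gz | tar xf - ) &
    done
    wait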
On Thu, 23 Nov 2006, Roch - PAE wrote:
>
> Hi Al, You conclude:
>
>     No problem there! ZFS rocks. NFS/ZFS is a bad combination.
>
> But my reading of your data leads to:
>
>     Single-threaded small-file creation is much slower
>     over NFS than locally, regardless of the server FS.
>
> It's been posted on this alias before: change ZFS to anything else and
> it won't change the conclusion. NFS/AnyFS is a bad combination for
> single-threaded tar x.

Hi Roch -- you are correct in that the data presented was incomplete. I didn't present data for the same test with an NFS mount, from the same server, of a UFS-based filesystem. So here is that data point:

    $ ptime gunzip -c /tmp/emacs-21.4a.tar.gz |tar xf -

    real       12.671
    user        2.356
    sys         0.228

This test is not totally fair, in that the UFS filesystem being shared is on a single 400Gb SATA drive being used as the boot device -- versus the 5-way raidz config which consists of 5 of those same 400Gb SATA drives. But the data clearly shows that NFS/ZFS is a bad combination: 2 minutes 33 seconds for NFS/ZFS versus 13 seconds (rounding up) for NFS/UFS.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
Al Hopper writes:
> Hi Roch -- you are correct in that the data presented was incomplete. I
> didn't present data for the same test with an NFS mount, from the same
> server, of a UFS-based filesystem. So here is that data point:
>
>     $ ptime gunzip -c /tmp/emacs-21.4a.tar.gz |tar xf -
>
>     real       12.671
>     user        2.356
>     sys         0.228
>
> This test is not totally fair, in that the UFS filesystem being shared is
> on a single 400Gb SATA drive being used as the boot device -- versus the
> 5-way raidz config which consists of 5 of those same 400Gb SATA drives.
> But the data clearly shows that NFS/ZFS is a bad combination: 2 minutes 33
> seconds for NFS/ZFS versus 13 seconds (rounding up) for NFS/UFS.

Thanks Al. I'd put $100 on the table that the WCE is enabled on the SATA drive backing UFS. Even if format says it's not, are there not some drives which just ignore the WC disable commands?

-r
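For what it's worth, one way to see what the drive claims about its write cache is format's expert mode; whether a given SATA drive actually honours the disable command is, as noted above, another matter:

    # format -e        (select the SATA disk, then navigate the menus)
    #   cache -> write_cache -> display    show the current WCE state
    #   cache -> write_cache -> disable    attempt to turn the write cache off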
Roch - PAE wrote:> > Not possible. Nothing related to ZFS here and if NFS had > ways to make this better i think it would have been done in v4. > > If we extended the protocol to allow for exclusive mounts > (single client access) then, I would think that the extra > knowledge could be used to gain speed... I don''t know if > this was considered by the NFS forum.This sounds very similar to delegations, quoting from RFC 3530 1.4.6. Client Caching and Delegation ... The major addition to NFS version 4 in the area of caching is the ability of the server to delegate certain responsibilities to the client. When the server grants a delegation for a file to a client, the client is guaranteed certain semantics with respect to the sharing of that file with other clients. At OPEN, the server may provide the client either a read or write delegation for the file. If the client is granted a read delegation, it is assured that no other client has the ability to write to the file for the duration of the delegation. If the client is granted a write delegation, the client is assured that no other client has read or write access to the file. However IIRC the current state of the NFSv4 client in OpenSolaris is that this functionality is not supported. My knowledge of what we have done in NFSv4 in OpenSolaris is a little rusty though so I highly recommend follow up with the NFS community. As for NFS performance it isn''t all bad it is just as you have pointed out that there is a pathological case that is being used in these tests. For some idea of what NFS can do see: http://blogs.sun.com/shepler/entry/spec_sfs_over_the_years -- Darren J Moffat
We have had file delegation on by default in NFSv4 since Solaris 10 FCS, putback in July 2004.

We're currently working on also providing directory delegations -- client caching of directory contents -- as part of the upcoming NFSv4.1.

cheers,
calum.

Darren J Moffat wrote:
> However IIRC the current state of the NFSv4 client in OpenSolaris is
> that this functionality is not supported. My knowledge of what we have
> done in NFSv4 in OpenSolaris is a little rusty though so I highly
> recommend follow up with the NFS community.
On Thu, Nov 23, 2006 at 03:37:33PM +0100, Roch - PAE wrote:
> Thanks Al. I'd put $100 on the table that the WCE is enabled on the
> SATA drive backing UFS. Even if format says it's not, are there not
> some drives which just ignore the WC disable commands?

I agree with Roch here. With UFS, if WCE is enabled on the drives (which I'm sure it is on Al's SATA drives), UFS is fooled into thinking that when it writes a block to disk, it's safe. The drive returns from the write amazingly fast (since the data only landed in cache, not on the media), so you get quick turnarounds (low latency) on NFS, which is the only thing that matters for single-threaded performance.

With ZFS, on the other hand, not only do we write data to the drive when NFS tells us to, but we issue a DKIOCFLUSHWRITECACHE ioctl to the underlying device (FLUSH_CACHE on ATA, SYNCHRONIZE_CACHE on SCSI) to ensure that the data that's supposed to be on the disk is really, truly on the disk. This typically takes around 4-6ms, which is quite a while. Again, this dictates the single-threaded NFS performance.

If you want an apples-to-apples comparison, either try the UFS/ZFS tests on a drive that has WCE disabled, or turn off the ZIL on a drive that has WCE enabled. I'll bet the difference will be rather slight, perhaps in favor of ZFS.

--Bill
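Concretely, the apples-to-apples runs Bill suggests could be set up with the commands already shown earlier in the thread (server side; the client then repeats the same ptime/tar extraction for each run):

    # Run 1: export from UFS with the drive's write cache disabled
    #        (format -e -> cache -> write_cache -> disable)
    #
    # Run 2: export from ZFS with the ZIL disabled
    echo 'zil_disable/W 1' | mdb -kw
    #
    # Client, for both runs:
    # ptime gunzip -c /tmp/emacs-21.4a.tar.gz | tar xf -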
Bill,

I did the same test on the Thumper I'm working on, with the NFS vols converted from ZFS stripes to SVM stripes -- in both cases the same number/type of disks in the stripe. In my very simple test (time for file in frame*; do cp /inq/$file /outq/$file; done), UFS did approximately 64 MB/s; the best run for ZFS was approx 58 MB/s. Not a huge difference for sure, but enough to make you think about switching. This was a single stream over a 10GE link (x4600 mounting vols from an x4500).

Matt

Bill Moore wrote:
> If you want an apples-to-apples comparison, either try the UFS/ZFS tests
> on a drive that has WCE disabled, or turn off the ZIL on a drive that
> has WCE enabled. I'll bet the difference will be rather slight, perhaps
> in favor of ZFS.
>
> --Bill
Calum Mackay wrote:
> We have had file delegation on by default in NFSv4 since Solaris 10 FCS,
> putback in July 2004.

The delegation of a file gives the client certain guarantees about how
that file may be accessed by other clients (regardless of NFS version)
or by processes local to the NFS server. If the client receives a read
delegation, it knows that no other client (or server process) may write
to the file. If the client receives a write delegation, then no other
client (or server process) may read or write the file.

Once a client has a delegation for a file, it may then deal locally -
i.e. with no need to contact the server - with reads (assuming it has
the required data cached), writes and locking. The file may even be
closed, locally, and the client will still retain the delegation, such
that other applications on this client may open the file, still without
contacting the server.

Note that delegation only provides a benefit when there is no
conflicting access. If conflicting accesses occur, the delegation is
recalled, the client flushes its data to the server, and the conflicting
accesses may proceed. In addition, note that a server is not *required*
to return a delegation to the client.

I would expect a write delegation to provide some performance increase
in the case noted, although I don't have any numbers in front of me.

cheers,
c.
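For anyone who wants to check whether delegations are actually being
handed out during these runs, something along the following lines should
show it. This is a sketch from memory (the /etc/default/nfs parameter
name and the exact nfsstat sections may differ on your build), not a
verified recipe.

   # On the x4500 (server): confirm delegations have not been disabled
   $ grep NFS_SERVER_DELEGATION /etc/default/nfs

   # On the server: look at the NFSv4 op counts (delegreturn, delegpurge)
   # before and after a run to see whether delegations were granted/recalled
   $ nfsstat -s -n

   # On the x4600 (client): confirm the mount actually negotiated vers=4,
   # then watch getattr/commit counts during the copy
   $ nfsstat -m /inq
   $ nfsstat -c -n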
This thread started with the notion that ZFS provided a very bad NFS
service, to the point of > 10X degradation over, say, UFS.

What I hope we have agreement on is that this scale of performance
difference does not come from ZFS, but from an NFS service that would
sacrifice integrity. Enabling the write cache (UFS) or disabling the
ZIL (ZFS) are two ways to get such a speedup.

Here, you have a fair comparison of UFS and ZFS serving NFS, and that
shows a 10% effect. It would be nice to analyze where that difference
comes from.

-r

Matt Sweeney writes:
 > Bill,
 >
 > I did the same test on the Thumper I'm working on with the NFS vols
 > converted from ZFS stripes to SVM stripes. In both cases the same
 > number/type of disks in the stripe. In my very simple test, time for
 > file in frame*; do cp /inq/$file /outq/$file; done, UFS did
 > approximately 64 MB/s; the best run for ZFS was approx 58 MB/s. Not a
 > huge difference for sure, but enough to make you think about switching.
 > This was a single stream over a 10GE link. (x4600 mounting vols from an x4500)
 >
 > Matt
 >
 > Bill Moore wrote:
 > > On Thu, Nov 23, 2006 at 03:37:33PM +0100, Roch - PAE wrote:
 > >
 > >> Al Hopper writes:
 > >>  > Hi Roch - you are correct in that the data presented was incomplete. I
 > >>  > didn't present data for the same test with an NFS mount from the same
 > >>  > server, for a UFS based filesystem. So here is that data point:
 > >>  >
 > >>  > $ ptime gunzip -c /tmp/emacs-21.4a.tar.gz |tar xf -
 > >>  >
 > >>  > real     12.671
 > >>  > user      2.356
 > >>  > sys       0.228
 > >>  >
 > >>  > This test is not totally fair, in that the UFS filesystem being shared is
 > >>  > on a single 400Gb SATA drive being used as the boot device - versus the
 > >>  > 5-way raidz config which consists of 5 of those same 400Gb SATA drives.
 > >>  > But the data clearly shows that NFS/ZFS is a bad combination: 2 minutes 33
 > >>  > seconds for NFS/ZFS versus 13 seconds (rounding up) for NFS/UFS.
 > >>
 > >> I'd put $100 on the table that the WCE is enabled on the
 > >> SATA drive backing UFS. Even if format says it's not, are
 > >> there not some drives which just ignore the WC disable
 > >> commands?
 > >>
 > >
 > > I agree with Roch here. With UFS, if WCE is enabled on the drives
 > > (which I'm sure it is on Al's SATA drives), UFS is fooled into thinking
 > > that when it writes a block to disk, it's safe. The drive returns from
 > > the write amazingly fast (since the data only landed in cache - not the
 > > media), so you get quick turnarounds (low latency) on NFS, which is the
 > > only thing that matters for single-threaded performance.
 > >
 > > With ZFS, on the other hand, not only do we write data to the drive when
 > > NFS tells us to, but we issue a DKIOCFLUSHWRITECACHE ioctl to the
 > > underlying device (FLUSH_CACHE on ATA, SYNCHRONIZE_CACHE on SCSI) to
 > > ensure that the data that's supposed to be on the disk is really, truly
 > > on the disk. This typically takes around 4-6 ms, which is quite a while.
 > > Again, this dictates the single-threaded NFS performance.
 > >
 > > If you want an apples-to-apples comparison, either try the UFS/ZFS tests
 > > on a drive that has WCE disabled, or turn off the ZIL on a drive that
 > > has WCE enabled. I'll bet the difference will be rather slight, perhaps
 > > in favor of ZFS.
 > >
 > >
 > > --Bill
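Following up on the suggestion to analyze where the remaining ~10%
comes from: one rough way is to watch the server's synchronous log
commit activity while the NFS copy runs. The fbt probe names below
assume zil_commit is visible to the fbt provider on that build (it
usually is, unless the function was inlined), so treat this as a sketch
rather than a recipe; run both as root on the x4500.

   # How many ZIL commits per second the server is doing during the copy
   dtrace -n 'fbt::zil_commit:entry { @c = count(); }
              tick-1s { printa("zil_commit/s: %@d\n", @c); trunc(@c); }'

   # Rough latency distribution of each commit (includes the cache flush)
   dtrace -n 'fbt::zil_commit:entry  { self->ts = timestamp; }
              fbt::zil_commit:return /self->ts/ {
                      @["ns per zil_commit"] = quantize(timestamp - self->ts);
                      self->ts = 0;
              }'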
Calum Mackay
2006-Nov-24 12:24 UTC
[nfs-discuss] Re: [zfs-discuss] poor NFS/ZFS performance
I should perhaps note that my last email on delegation describes the
optimisations possible under the NFSv4 protocol, as per the RFC; not all
of them are necessarily implemented in our own Solaris client.

In particular, I think that fsync and committed writes do still go
through to the server, so we may not see those performance enhancements.
We should at least do away with the regular GETATTRs that would
otherwise make up a lot of the conflicting-access detection in NFSv3.

cheers,
c.
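One way to see that effect on the wire, without a full snoop trace, is
to compare per-version call counts for the same copy. Everything below
is a sketch: thumper:/inq stands in for the real export, frame0001 for
one of the test files, and it assumes the export is reachable over both
NFS versions; run it as root on the x4600.

   $ mkdir -p /mnt/v3 /mnt/v4
   $ mount -o vers=3 thumper:/inq /mnt/v3
   $ mount -o vers=4 thumper:/inq /mnt/v4

   $ nfsstat -z                  # zero the client counters
   $ cp /tmp/frame0001 /mnt/v3/  # close() flushes and commits over NFS
   $ nfsstat -c -n               # note the v3 getattr and commit counts

   $ nfsstat -z
   $ cp /tmp/frame0001 /mnt/v4/
   $ nfsstat -c -n               # compare v4 getattr/commit and delegation ops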