I was really hoping for some option other than ZIL_DISABLE, but I finally gave up the fight. Some people suggested that NFSv4 would help over NFSv3, but it didn't... at least not enough to matter.

ZIL_DISABLE was the solution, sadly. I'm running B43/x86 and hoping to get up to B48 or so soonish (I BFU'd it straight to B48 last night and bricked it).

Here are the times. This is an untar (gtar xfj) of SIDEkick (http://www.cuddletech.com/blog/pivot/entry.php?id=491) over NFSv4 onto a 20 TB RAIDZ2 ZFS pool:

ZIL Enabled:
real 1m26.941s

ZIL Disabled:
real 0m5.789s

I'll update this post again when I finally get B48 or newer on the system and try it. Thanks to everyone for their suggestions.

benr.
Ben Rockwood wrote:
> I was really hoping for some option other than ZIL_DISABLE, but I finally gave up the fight. Some people suggested that NFSv4 would help over NFSv3, but it didn't... at least not enough to matter.
>
> ZIL_DISABLE was the solution, sadly. I'm running B43/x86 and hoping to get up to B48 or so soonish (I BFU'd it straight to B48 last night and bricked it).
>
> Here are the times. This is an untar (gtar xfj) of SIDEkick (http://www.cuddletech.com/blog/pivot/entry.php?id=491) over NFSv4 onto a 20 TB RAIDZ2 ZFS pool:
>
> ZIL Enabled:
> real 1m26.941s
>
> ZIL Disabled:
> real 0m5.789s
>
> I'll update this post again when I finally get B48 or newer on the system and try it. Thanks to everyone for their suggestions.

I imagine what's happening is that tar is a single-threaded application and it's basically doing: open, asynchronous write, close. This will go really fast locally. But NFS, because of the way it does cache consistency, must make sure on CLOSE that the writes are on stable storage, so it does a COMMIT, which basically turns your asynchronous write into a synchronous write. That means you basically have a single-threaded app doing synchronous writes, at roughly half a disk rotational latency per write.

Check out mount_nfs(1M) and the 'nocto' option. It might be OK for you to relax the cache consistency for the client's mount while you untar the file(s), then remount without the 'nocto' option once you're done.

Another option is to run multiple untars together. I'm guessing that you've got I/O to spare from ZFS's point of view.

eric
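For readers following along, here is a minimal sketch of the access pattern eric describes; the helper name and arguments are illustrative only, not taken from gtar's source:

    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* One file per archive member: open, buffered write, close.
     * On a local filesystem the writes stay asynchronous; over NFS the
     * close() forces the client to flush the dirty pages and issue a
     * COMMIT, so a single-threaded loop pays a synchronous stable-storage
     * round trip for every file it extracts. */
    static int extract_one(const char *path, const char *buf, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        ssize_t n = write(fd, buf, len);   /* asynchronous, cached */
        int rc = close(fd);                /* NFS: writeback + COMMIT happen here */
        return (n == (ssize_t)len && rc == 0) ? 0 : -1;
    }

Mounting the client with 'nocto' relaxes that close-time check, but as the next message points out, the file and directory creates themselves remain synchronous at the server.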
On Tue, eric kustarz wrote:
> Ben Rockwood wrote:
> > [original report and ZIL enabled/disabled timings snipped]
>
> I imagine what's happening is that tar is a single-threaded application and it's basically doing: open, asynchronous write, close. This will go really fast locally. But NFS, because of the way it does cache consistency, must make sure on CLOSE that the writes are on stable storage, so it does a COMMIT, which basically turns your asynchronous write into a synchronous write. That means you basically have a single-threaded app doing synchronous writes, at roughly half a disk rotational latency per write.
>
> Check out mount_nfs(1M) and the 'nocto' option. It might be OK for you to relax the cache consistency for the client's mount while you untar the file(s), then remount without the 'nocto' option once you're done.

This will not correct the problem, because tar is extracting and therefore creating files and directories; those creates will be synchronous at the NFS server, and there is no way to change that behavior at the client.

Spencer

> Another option is to run multiple untars together. I'm guessing that you've got I/O to spare from ZFS's point of view.
>
> eric
eric kustarz <eric.kustarz at sun.com> wrote:
> I imagine what's happening is that tar is a single-threaded application and it's basically doing: open, asynchronous write, close. This will go really fast locally. But NFS, because of the way it does cache consistency, must make sure on CLOSE that the writes are on stable storage, so it does a COMMIT, which basically turns your asynchronous write into a synchronous write. That means you basically have a single-threaded app doing synchronous writes, at roughly half a disk rotational latency per write.

star writes files in a way that guarantees consistency; GNU tar and Sun tar do not. If you would like to see the difference, compare 'star x' with 'star x -no-fsync'.

Jörg
Spencer Shepler writes:
> On Tue, eric kustarz wrote:
> > [eric's explanation of the per-file COMMIT and the 'nocto' suggestion snipped]
>
> This will not correct the problem, because tar is extracting and therefore creating files and directories; those creates will be synchronous at the NFS server, and there is no way to change that behavior at the client.
>
> Spencer

Thanks for that (I also thought nocto would help; now I see it won't).

I would add that this is not a bug or a deficiency in the implementation. Any NFS implementation tweak that makes 'tar x' go as fast as direct attached storage will lead to silent data corruption (tar x succeeds, but the files don't checksum OK).

Interestingly, the quality of service of 'tar x' is higher over NFS than over direct attached storage, since with direct attach there is no guarantee that the files have reached stable storage, whereas with NFS there is.

Net net, for single-threaded 'tar x', data integrity considerations force NFS to provide a high-quality, slow service. For direct attach we don't have those data integrity issues, and the community has managed to get by with the lower-quality, higher-speed service.

> > Another option is to run multiple untars together. I'm guessing that you've got I/O to spare from ZFS's point of view.
> >
> > eric

Or maybe a threaded tar? To re-emphasise Eric's point, this type of slowness affects single-threaded loads. There is lots of headroom for higher performance by using concurrency.

-r
Roch <Roch.Bourbonnais at Sun.COM> wrote:
> I would add that this is not a bug or a deficiency in the implementation. Any NFS implementation tweak that makes 'tar x' go as fast as direct attached storage will lead to silent data corruption (tar x succeeds, but the files don't checksum OK).
>
> Interestingly, the quality of service of 'tar x' is higher over NFS than over direct attached storage, since with direct attach there is no guarantee that the files have reached stable storage, whereas with NFS there is.

Why do you believe this?

Neither Sun tar nor GNU tar calls fsync, which is the only way to enforce data integrity over NFS.

If you would like to test this, use star. Star by default calls fsync before it closes a written file in x mode. To switch this off, use star -no-fsync.

> Net net, for single-threaded 'tar x', data integrity considerations force NFS to provide a high-quality, slow service. For direct attach we don't have those data integrity issues, and the community has managed to get by with the lower-quality, higher-speed service.

What do you have in mind?

A tar that calls fsync in detached threads?

Jörg
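To illustrate the pattern Jörg is describing, here is a hedged sketch (my own helper, not star's actual code): call fsync() before close() and treat a failure of either call as a failed extract:

    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Write, fsync, then close; only if all three succeed is the file
     * known to have reached the server's stable storage over NFS (or the
     * platters on a local filesystem that honors the flush). */
    static int store_file(int fd, const char *buf, size_t len)
    {
        if (write(fd, buf, len) != (ssize_t)len)
            return -1;
        if (fsync(fd) != 0)   /* surfaces ENOSPC, EIO, ... while recovery is still possible */
            return -1;
        return close(fd);     /* still check close(); it can report errors too */
    }

Skipping the fsync() (the -no-fsync case) leaves error detection to whatever close() happens to report.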
Joerg Schilling writes:
> Roch <Roch.Bourbonnais at Sun.COM> wrote:
> > [quality-of-service comparison snipped]
>
> Why do you believe this?
>
> Neither Sun tar nor GNU tar calls fsync, which is the only way to enforce data integrity over NFS.

I tend to agree with this, although I'd say that in practice, from a performance perspective, calling fsync should be more relevant for direct attach. For NFS, don't close-to-open and other aspects of the protocol already enforce much more synchronous operation? For tar x over NFS I'd bet the fsync will be an over-the-wire op (say 0.5 ms) but will not add an additional I/O latency (5 ms) to each file extract.

My target for a single-threaded 'tar x' of small files is to be able to run over NFS at one file per I/O latency, no matter what the backend FS is. I guess that 'star -yes-fsync' over direct attach should behave the same? Or do you have concurrency in there... see below.

> If you would like to test this, use star. Star by default calls fsync before it closes a written file in x mode. To switch this off, use star -no-fsync.
>
> > Net net, for single-threaded 'tar x', data integrity considerations force NFS to provide a high-quality, slow service. For direct attach we don't have those data integrity issues, and the community has managed to get by with the lower-quality, higher-speed service.
>
> What do you have in mind?
>
> A tar that calls fsync in detached threads?

You tell me? We have two issues:

Can we make 'tar x' over direct attach safe (fsync) and POSIX compliant while staying close to current performance characteristics? In other words, do we have the POSIX leeway to extract files in parallel?

For NFS, can we make 'tar x' fast and reliable while keeping a principle of least surprise for users on this non-POSIX FS?

-r
Spencer Shepler
2006-Oct-09 23:58 UTC
[nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
On Tue, Roch wrote:
> Joerg Schilling writes:
> > Why do you believe this?
> >
> > Neither Sun tar nor GNU tar calls fsync, which is the only way to enforce data integrity over NFS.
>
> I tend to agree with this, although I'd say that in practice, from a performance perspective, calling fsync should be more relevant for direct attach. For NFS, don't close-to-open and other aspects of the protocol already enforce much more synchronous operation? For tar x over NFS I'd bet the fsync will be an over-the-wire op (say 0.5 ms) but will not add an additional I/O latency (5 ms) to each file extract.

The close-to-open behavior of NFS clients is what ensures that the file data is on stable storage when close() returns. The metadata requirements of NFS are what ensure that file creations, removals, renames, etc. are on stable storage when the server sends a response.

So, unless the NFS server is behaving badly, the NFS client has synchronous behavior, and for some that means more "safe", but usually it also means slower than local access.

> My target for a single-threaded 'tar x' of small files is to be able to run over NFS at one file per I/O latency, no matter what the backend FS is. I guess that 'star -yes-fsync' over direct attach should behave the same? Or do you have concurrency in there... see below.
>
> You tell me? We have two issues:
>
> Can we make 'tar x' over direct attach safe (fsync) and POSIX compliant while staying close to current performance characteristics? In other words, do we have the POSIX leeway to extract files in parallel?
>
> For NFS, can we make 'tar x' fast and reliable while keeping a principle of least surprise for users on this non-POSIX FS?

Having tar create/write/close files concurrently would be a big win over NFS mounts on almost any system.

Spencer
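A rough sketch of the concurrency Spencer is suggesting, using POSIX threads. All names here are hypothetical, and a real tar or star would also have to feed the workers from the serial archive stream and collect their errors, as discussed later in the thread:

    #include <sys/types.h>
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdlib.h>
    #include <unistd.h>

    struct extract_job {
        char   *path;
        char   *data;
        size_t  len;
    };

    /* Each worker runs the whole create/write/fsync/close sequence for one
     * file, so several synchronous NFS round trips (CREATE, WRITE, COMMIT)
     * are outstanding at once instead of being strictly serialized. */
    static void *extract_worker(void *arg)
    {
        struct extract_job *j = arg;
        long err = 0;
        int fd = open(j->path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (fd < 0) {
            err = 1;
        } else {
            if (write(fd, j->data, j->len) != (ssize_t)j->len || fsync(fd) != 0)
                err = 1;
            if (close(fd) != 0)
                err = 1;
        }
        free(j->data);
        free(j->path);
        free(j);
        return (void *)err;   /* the dispatcher collects this via pthread_join() */
    }

The dispatcher would pthread_create() a bounded number of these workers and pthread_join() them all before declaring the extract successful.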
This is correct, based on our experience with ZFS. When using NFSv3, you have COMMITs that come down the wire to the ZFS/NFS server, which forces a sync or flush to disk. To get around this issue without compromising data integrity, you can effectively get some NVRAM by adding a battery-backed hardware RAID controller to the NFS server and turning on write-back caching for the controller. This speeds things up for tar and small files.

There's an ER open to add NVRAM support to ZFS, but I don't know the state of that.
Frank Batschulat (Home)
2006-Oct-10 04:46 UTC
[nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
On Tue, 10 Oct 2006 01:25:36 +0200, Roch <Roch.Bourbonnais at Sun.COM> wrote:
> You tell me? We have two issues:
>
> Can we make 'tar x' over direct attach safe (fsync) and POSIX compliant while staying close to current performance characteristics? In other words, do we have the POSIX leeway to extract files in parallel?

Why fsync(3C)? It is usually more heavyweight than opening the file with O_SYNC, and both provide POSIX synchronized file integrity completion.

---
frankB
Sanjeev Bagewadi
2006-Oct-10 05:01 UTC
[nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
I think the original point about NFS being better with respect to data making it to the disk was this: NFS follows sync-on-close semantics. You will not see an explicit fsync() being called by tar...

-- Sanjeev.

Frank Batschulat (Home) wrote:
> Why fsync(3C)? It is usually more heavyweight than opening the file with O_SYNC, and both provide POSIX synchronized file integrity completion.
Roch <Roch.Bourbonnais at Sun.COM> wrote:
> > Neither Sun tar nor GNU tar calls fsync, which is the only way to enforce data integrity over NFS.
>
> I tend to agree with this, although I'd say that in practice, from a performance perspective, calling fsync should be more relevant for direct attach. For NFS, don't close-to-open and other aspects of the protocol already enforce much more synchronous operation? For tar x over NFS I'd bet the fsync will be an over-the-wire op (say 0.5 ms) but will not add an additional I/O latency (5 ms) to each file extract.

I have never tested the performance aspects over NFS; in my experience, calling fsync is simply the only way to guarantee detection of file write problems.

> My target for a single-threaded 'tar x' of small files is to be able to run over NFS at one file per I/O latency, no matter what the backend FS is. I guess that 'star -yes-fsync' over direct attach should behave the same? Or do you have concurrency in there... see below.

No, star has not changed in the last 15 years:

- One process (the second) reads/writes the archive. In copy mode, this process extracts the internal stream.
- One process (the first) does the file I/O and the archive generation.

> > If you would like to test this, use star. Star by default calls fsync before it closes a written file in x mode. To switch this off, use star -no-fsync.
> >
> > What do you have in mind?
> >
> > A tar that calls fsync in detached threads?
>
> You tell me? We have two issues:
>
> Can we make 'tar x' over direct attach safe (fsync) and POSIX compliant while staying close to current performance characteristics? In other words, do we have the POSIX leeway to extract files in parallel?

What does fsync have to do with POSIX? When I introduced the fsync calls seven years ago, I ran some performance tests, and on UFS calling fsync reduced extract performance by 10-20%, which looks OK to me.

> For NFS, can we make 'tar x' fast and reliable while keeping a principle of least surprise for users on this non-POSIX FS?

Someone should start with 'star -x' vs. 'star -x -no-fsync' tests and report timings...

Jörg
Joerg Schilling
2006-Oct-12 10:23 UTC
[nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
Spencer Shepler <spencer.shepler at sun.com> wrote:
> The close-to-open behavior of NFS clients is what ensures that the file data is on stable storage when close() returns.

In the 1980s this was definitely not the case. When did this change?

> The metadata requirements of NFS are what ensure that file creations, removals, renames, etc. are on stable storage when the server sends a response.
>
> So, unless the NFS server is behaving badly, the NFS client has synchronous behavior, and for some that means more "safe", but usually it also means slower than local access.

In any case, calling fsync before close does not seem to be a problem.

> Having tar create/write/close files concurrently would be a big win over NFS mounts on almost any system.

Do you have an idea of how to do this?

Jörg
Joerg Schilling
2006-Oct-12 10:27 UTC
[nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
"Frank Batschulat (Home)" <Frank.Batschulat at Sun.COM> wrote:> On Tue, 10 Oct 2006 01:25:36 +0200, Roch <Roch.Bourbonnais at Sun.COM> wrote: > > > You tell me ? We have 2 issues > > > > can we make ''tar x'' over direct attach, safe (fsync) > > and posix compliant while staying close to current > > performance characteristics ? In other words do we > > have the posix leeway to extract files in parallel ? > > why fsync(3C) ? it is usually more heavy weight then > opening the file with O_SYNC - and both provide > POSIX synchronized file integrity completion.I believe that I did run tests that show that fsync is better. J?rg -- EMail:joerg at schily.isdn.cs.tu-berlin.de (home) J?rg Schilling D-13353 Berlin js at cs.tu-berlin.de (uni) schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Spencer Shepler
2006-Oct-12 14:18 UTC
[nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
On Thu, Joerg Schilling wrote:
> Spencer Shepler <spencer.shepler at sun.com> wrote:
> > The close-to-open behavior of NFS clients is what ensures that the file data is on stable storage when close() returns.
>
> In the 1980s this was definitely not the case. When did this change?

It has not changed. NFS clients have always flushed (written) modified file data to the server before returning from the application's close(). The NFS client also asks that the data be committed to disk in this case.

> In any case, calling fsync before close does not seem to be a problem.

Not for the NFS client, because the default behavior has the same effect as fsync()/close().

> > Having tar create/write/close files concurrently would be a big win over NFS mounts on almost any system.
>
> Do you have an idea of how to do this?

My naive thought would be to have multiple threads that create and write file data upon extraction. This multithreaded behavior would provide better overall throughput for an extraction, given NFS's response-time characteristics: more outstanding requests result in better throughput. It isn't only the writing of file data to disk that is the overhead of the extraction; the creation of the directories and files must also be committed to disk in the case of NFS. That is the other part that makes things slower than local access.

Spencer
Joerg Schilling
2006-Oct-12 14:35 UTC
[nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
Spencer Shepler <spencer.shepler at sun.com> wrote:
> On Thu, Joerg Schilling wrote:
> > In the 1980s this was definitely not the case. When did this change?
>
> It has not changed. NFS clients have always flushed (written) modified file data to the server before returning from the application's close(). The NFS client also asks that the data be committed to disk in this case.

This is definitely wrong.

Our developers lost many files in the 1980s when the NFS file server filled up the exported filesystem while several NFS clients were trying to write back edited files at the same time.

vi at that time did not call fsync, and for this reason it did not notice that a file could not be written back properly.

What happened: all clients called statfs() and assumed that there was still space on the server, so they all allowed blocks to be put into the local client's buffer cache. vi called close, but the client only noticed the out-of-space problem after close had returned, so vi did not notice that the file was damaged and allowed the user to quit.

Some time later, Sun enhanced vi to first call fsync() and then call close(); only if both return 0 is the file guaranteed to be on the server. Sun also told us to write applications this way in order to prevent lost file content.

> > > Having tar create/write/close files concurrently would be a big win over NFS mounts on almost any system.
> >
> > Do you have an idea of how to do this?
>
> My naive thought would be to have multiple threads that create and write file data upon extraction. This multithreaded behavior would provide better overall throughput for an extraction, given NFS's response-time characteristics: more outstanding requests result in better throughput. It isn't only the writing of file data to disk that is the overhead of the extraction; the creation of the directories and files must also be committed to disk in the case of NFS. That is the other part that makes things slower than local access.

Doing this with tar (which fetches the data from a serial data stream) would only make sense if there were threads whose only task is to wait for the final fsync()/close().

It would also make error handling harder, since a problem may be detected late, while another large file is being extracted. star could not just quit with an error message; it would need to delay the error-caused exit.

Jörg
Spencer Shepler
2006-Oct-12 15:24 UTC
[nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
On Thu, Joerg Schilling wrote:
> Spencer Shepler <spencer.shepler at sun.com> wrote:
> > It has not changed. NFS clients have always flushed (written) modified file data to the server before returning from the application's close(). The NFS client also asks that the data be committed to disk in this case.
>
> This is definitely wrong.
>
> [the vi/statfs/out-of-space history snipped]
>
> Some time later, Sun enhanced vi to first call fsync() and then call close(); only if both return 0 is the file guaranteed to be on the server. Sun also told us to write applications this way in order to prevent lost file content.

I didn't comment on the error conditions that can occur while the data is being written at close(). What you describe is the preferred method of obtaining any errors that occur during the writing of data. This is because the NFS client writes asynchronously, and the only way the application can retrieve the error information is from the fsync() or close() call. At close() it is too late to recover, so fsync() can be used to obtain any asynchronous error state.

This doesn't change the fact that upon close() the NFS client will write data back to the server. This is done to meet the close-to-open semantics of NFS.

> Doing this with tar (which fetches the data from a serial data stream) would only make sense if there were threads whose only task is to wait for the final fsync()/close().
>
> It would also make error handling harder, since a problem may be detected late, while another large file is being extracted. star could not just quit with an error message; it would need to delay the error-caused exit.

Sure, I can see that it would be difficult.
My point is that tar is not only waiting on the fsync()/close() but also on file and directory creation. There is a longer delay not only because of the network latency but also because of the latency of writing the filesystem data to stable storage. Parallel requests will tend to overcome the delay/bandwidth issues. Not easy, but it can be an advantage with respect to performance.

Spencer
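A back-of-envelope illustration of why the parallelism matters; the latencies and file count below are assumed round numbers for a spinning-disk server, not measurements from this thread:

    roughly 2 synchronous server ops per file (create + commit) x ~5 ms each  = ~10 ms per file
    ~20,000 small files x ~10 ms                                              = ~200 s single-threaded
    the same work spread across 8 concurrent extract streams                  = ~25 s, if the server keeps up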
Anton B. Rang
2006-Oct-12 21:42 UTC
[zfs-discuss] Re: [nfs-discuss] Re: Re: NFS Performance and Tar
fsync() should theoretically be better, because O_SYNC requires that each write() write not only the data but also the inode and all indirect blocks back to disk.
Neil Perrin
2006-Oct-13 04:56 UTC
[zfs-discuss] Re: [nfs-discuss] Re: Re: NFS Performance and Tar
As far as ZFS performance is concerned, O_DSYNC and O_SYNC are equivalent. This is because ZFS saves all POSIX-layer transactions (e.g. WRITE, SETATTR, RENAME...) in the log, so both metadata and data are always re-created if a replay is needed.

Anton B. Rang wrote on 10/12/06 15:42:
> fsync() should theoretically be better, because O_SYNC requires that each write() write not only the data but also the inode and all indirect blocks back to disk.
Joerg Schilling
2006-Oct-13 09:46 UTC
[nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
Spencer Shepler <spencer.shepler at sun.com> wrote:
> I didn't comment on the error conditions that can occur while the data is being written at close(). What you describe is the preferred method of obtaining any errors that occur during the writing of data. This is because the NFS client writes asynchronously, and the only way the application can retrieve the error information is from the fsync() or close() call. At close() it is too late to recover, so fsync() can be used to obtain any asynchronous error state.
>
> This doesn't change the fact that upon close() the NFS client will write data back to the server. This is done to meet the close-to-open semantics of NFS.

Your wording did not match reality; this is why I wrote what I did. You wrote that upon close() the client will first do something similar to an fsync on that file. The problem is that this is done asynchronously, and the close() return value does not contain an indication of whether the fsync succeeded.

> Sure, I can see that it would be difficult. My point is that tar is not only waiting on the fsync()/close() but also on file and directory creation. There is a longer delay not only because of the network latency but also because of the latency of writing the filesystem data to stable storage. Parallel requests will tend to overcome the delay/bandwidth issues. Not easy, but it can be an advantage with respect to performance.

I see no simple way to let tar implement concurrency with respect to these problems.

In star, it would be possible to create detached threads that work independently on small files whose total size is smaller than the size of the FIFO. That would, however, make the code much more complex.

Jörg
Spencer Shepler
2006-Oct-13 14:01 UTC
[nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
On Fri, Joerg Schilling wrote:
> Spencer Shepler <spencer.shepler at sun.com> wrote:
> > This doesn't change the fact that upon close() the NFS client will write data back to the server. This is done to meet the close-to-open semantics of NFS.
>
> Your wording did not match reality; this is why I wrote what I did. You wrote that upon close() the client will first do something similar to an fsync on that file. The problem is that this is done asynchronously, and the close() return value does not contain an indication of whether the fsync succeeded.

Sorry, but the code in Solaris behaves as I described. Upon the application closing the file, modified data is written to the server. The client waits for completion of those writes. If there is an error, it is returned to the caller of close().

Spencer
Jeff Victor
2006-Oct-13 14:08 UTC
[nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
Spencer Shepler wrote:
> On Fri, Joerg Schilling wrote:
> > Your wording did not match reality; this is why I wrote what I did. You wrote that upon close() the client will first do something similar to an fsync on that file. The problem is that this is done asynchronously, and the close() return value does not contain an indication of whether the fsync succeeded.
>
> Sorry, but the code in Solaris behaves as I described. Upon the application closing the file, modified data is written to the server. The client waits for completion of those writes. If there is an error, it is returned to the caller of close().

Are you talking about the client end of NFS as implemented in Solaris, or about "application clients" like vi?

It seems to me that you are talking about Solaris, and Joerg is talking about vi (and other applications).

-- Jeff Victor
Joerg Schilling
2006-Oct-13 14:11 UTC
[nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
Spencer Shepler <spencer.shepler at sun.com> wrote:
> Sorry, but the code in Solaris behaves as I described. Upon the application closing the file, modified data is written to the server. The client waits for completion of those writes. If there is an error, it is returned to the caller of close().

So is this Solaris specific, or why are people warned against depending on the close() return code alone?

Jörg
Joerg Schilling
2006-Oct-13 14:12 UTC
[nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
Jeff Victor <Jeff.Victor at Sun.COM> wrote:
> Are you talking about the client end of NFS as implemented in Solaris, or about "application clients" like vi?
>
> It seems to me that you are talking about Solaris, and Joerg is talking about vi (and other applications).

I am talking about the syscall interface to applications.

Jörg
Spencer Shepler
2006-Oct-13 14:12 UTC
[nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
On Fri, Jeff Victor wrote:
> Are you talking about the client end of NFS as implemented in Solaris, or about "application clients" like vi?
>
> It seems to me that you are talking about Solaris, and Joerg is talking about vi (and other applications).

The NFS client.

Spencer
Spencer Shepler
2006-Oct-13 14:22 UTC
[nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
On Fri, Joerg Schilling wrote:
> So is this Solaris specific, or why are people warned against depending on the close() return code alone?

All Unix NFS clients that I know of behave the way I described. I believe the point of the warning about relying on close() is that by the time the application receives the error it is too late to recover. If the application uses fsync() and receives an error, it can warn the user, who may be able to do something about it (your example of ENOSPC is a very good one). Space can be freed, the fsync() can be done again, and the client will again push the writes to the server, this time successfully.

If an application doesn't care about recovery but wants the error reported back to the user, then close() is sufficient.

Spencer
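A sketch of the recovery Spencer describes; the helper and its free_space callback are hypothetical. Because the client still holds the dirty pages after a failed writeback, a later fsync() can push them again once the condition (for example ENOSPC) has been cleared:

    #include <errno.h>
    #include <unistd.h>

    /* Try to get the cached writes onto the server's stable storage.
     * On ENOSPC, give the caller a chance to free space and retry the
     * fsync(); any other error, or a failed cleanup, is reported. */
    static int flush_with_retry(int fd, int (*free_space)(void))
    {
        for (;;) {
            if (fsync(fd) == 0)
                return close(fd);           /* success path */
            if (errno != ENOSPC || free_space() != 0)
                break;                      /* give up: unrecoverable */
        }
        (void)close(fd);                    /* best effort; error already noted */
        return -1;
    }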
Anton B. Rang
2006-Oct-13 19:45 UTC
[zfs-discuss] Re: [nfs-discuss] Re: Re: NFS Performance and Tar
For what it's worth, close-to-open consistency was added to Linux NFS in the 2.4.20 kernel (late 2002 timeframe). This might be the source of some of the confusion.
Roch
2006-Oct-14 01:41 UTC
[zfs-discuss] Re: [nfs-discuss] Re: Re: NFS Performance and Tar
The high-order bit here is that write(); write(); fsync(); can be executed with a single I/O latency (during the fsync), whereas using O_*DSYNC requires two I/O latencies (one for each write).

-r

Neil Perrin writes:
> As far as ZFS performance is concerned, O_DSYNC and O_SYNC are equivalent. This is because ZFS saves all POSIX-layer transactions (e.g. WRITE, SETATTR, RENAME...) in the log, so both metadata and data are always re-created if a replay is needed.
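Two hypothetical fragments that make Roch's point concrete: with O_DSYNC each write() must reach stable storage before it returns, while buffered writes followed by a single fsync() pay the synchronous latency once (error checks on the writes are trimmed for brevity):

    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Variant A: O_DSYNC -- every write() waits for stable storage,
     * so two writes cost two synchronous I/O latencies. */
    static int write_dsync(const char *path,
                           const char *a, size_t alen,
                           const char *b, size_t blen)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_DSYNC, 0644);
        if (fd < 0)
            return -1;
        (void)write(fd, a, alen);          /* waits */
        (void)write(fd, b, blen);          /* waits again */
        return close(fd);
    }

    /* Variant B: buffered writes plus one fsync() -- the synchronous
     * latency is paid once, when the fsync() flushes both writes. */
    static int write_then_fsync(const char *path,
                                const char *a, size_t alen,
                                const char *b, size_t blen)
    {
        int fd = open(path, O_WRONLY | O_CREAT, 0644);
        if (fd < 0)
            return -1;
        (void)write(fd, a, alen);          /* cached */
        (void)write(fd, b, blen);          /* cached */
        (void)fsync(fd);                   /* single synchronous flush */
        return close(fd);
    }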
Erblichs
2006-Oct-14 01:45 UTC
[zfs-discuss] zfs_vfsops.c : zfs_vfsinit() : line 1179: Src inspection
Group,

If there is a bad vfs ops template, why wouldn't you just return(error) instead of trying to create the vnode ops template? My suggestion: after the cmn_err(), return(error).

Mitchell Erblich
Richard L. Hamilton
2007-Jan-10 16:25 UTC
[nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
Assuming multiple threads extracting multiple files simultaneously from a single archive, won't the results be indeterminate if a quota or out-of-space condition comes into play? That is, normally the sequentially earlier file might get the space while the later one wouldn't; but if multithreaded, either (or both) could fail to get all they needed. That doesn't even consider I/O errors on either the archive or the files being extracted.

What about programs that postprocess the messages from tar? (There must be some, meant to give it a "friendlier" interface or for other purposes.)
This is an old topic, discussed many times at length. However, I still wonder whether there are any workarounds to this issue other than disabling the ZIL, since it makes ZFS over NFS almost unusable (a whole order of magnitude slower). My understanding is that the ball is in NFS's hands because of ZFS's design. The testing results are below.

Solaris 10u3 AMD64 server with a Mac client over gigabit Ethernet. The filesystem is on a 6-disk raidz1 pool; the test is the performance of untarring (with bzip2) the Linux 2.6.21 source code. The archive is stored locally and extracted remotely.

Locally
-------
tar xfvj linux-2.6.21.tar.bz2
real 4m4.094s, user 0m44.732s, sys 0m26.047s

star xfv linux-2.6.21.tar.bz2
real 1m47.502s, user 0m38.573s, sys 0m22.671s

Over NFS
--------
tar xfvj linux-2.6.21.tar.bz2
real 48m22.685s, user 0m45.703s, sys 0m59.264s

star xfv linux-2.6.21.tar.bz2
real 49m13.574s, user 0m38.996s, sys 0m35.215s

star -no-fsync -x -v -f linux-2.6.21.tar.bz2
real 49m32.127s, user 0m38.454s, sys 0m36.197s

The performance seems pretty bad; let's see how other protocols fare.

Over Samba
----------
tar xfvj linux-2.6.21.tar.bz2
real 4m34.952s, user 0m44.325s, sys 0m27.404s

star xfv linux-2.6.21.tar.bz2
real 4m2.998s, user 0m44.121s, sys 0m29.214s

star -no-fsync -x -v -f linux-2.6.21.tar.bz2
real 4m13.352s, user 0m44.239s, sys 0m29.547s

Over AFP
--------
tar xfvj linux-2.6.21.tar.bz2
real 3m58.405s, user 0m43.132s, sys 0m40.847s

star xfv linux-2.6.21.tar.bz2
real 19m44.212s, user 0m38.535s, sys 0m38.866s

star -no-fsync -x -v -f linux-2.6.21.tar.bz2
real 3m21.976s, user 0m42.529s, sys 0m39.529s

Samba and AFP are much faster, except for the fsync'ed star over AFP. Is this a ZFS or an NFS issue?

Over NFS to non-ZFS drive
-------------------------
tar xfvj linux-2.6.21.tar.bz2
real 5m0.211s, user 0m45.330s, sys 0m50.118s

star xfv linux-2.6.21.tar.bz2
real 3m26.053s, user 0m43.069s, sys 0m33.726s

star -no-fsync -x -v -f linux-2.6.21.tar.bz2
real 3m55.522s, user 0m42.749s, sys 0m35.294s

It looks like ZFS is the culprit here. The untarring is much faster to a single 80 GB UFS drive than to the 6-disk raidz array over NFS.

Cheers,
Siegfried

PS. Getting netatalk to compile on amd64 Solaris required some changes, since i386 wasn't being defined anymore, and somehow it thought the architecture was sparc64 for some linking steps.
Hi Siegfried, just making sure you had seen this:

http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine

You have very fast NFS-to-non-ZFS runs. That seems possible only if the hosting OS did not sync the data when NFS required it, or if the drive in question had some fast write cache. If the drive did have some FWC and ZFS was still slow using it, that would be the cache-flushing issue mentioned in the blog entry. But maybe there is also something to be learned from the Samba and AFP results...

Takeaways:

  ZFS and NFS just work together.

  ZFS has an open issue with some storage arrays (the issue is *not* related to NFS); it is being worked on and will need collaboration from storage vendors.

  NFS is slower than direct attached storage. It can be very, very much slower on single-threaded loads. There are many ways to work around the slowness, but most are just not safe for your data.

-r

Siegfried Nikolaivich writes:
> This is an old topic, discussed many times at length. However, I still wonder whether there are any workarounds to this issue other than disabling the ZIL, since it makes ZFS over NFS almost unusable (a whole order of magnitude slower).
>
> [benchmark results snipped; see the previous message]
>
> It looks like ZFS is the culprit here. The untarring is much faster to a single 80 GB UFS drive than to the 6-disk raidz array over NFS.
> Over NFS to non-ZFS drive
> -------------------------
> tar xfvj linux-2.6.21.tar.bz2
> real 5m0.211s, user 0m45.330s, sys 0m50.118s
>
> star xfv linux-2.6.21.tar.bz2
> real 3m26.053s, user 0m43.069s, sys 0m33.726s
>
> star -no-fsync -x -v -f linux-2.6.21.tar.bz2
> real 3m55.522s, user 0m42.749s, sys 0m35.294s
>
> It looks like ZFS is the culprit here. The untarring is much faster to a single 80 GB UFS drive than to the 6-disk raidz array over NFS.

Comparing a ZFS pool made out of a single disk to a single UFS filesystem would be a fair comparison.

What does your storage look like?

eric
On Jun 12, 2007, at 12:57 AM, Roch - PAE wrote:
> Hi Siegfried, just making sure you had seen this:
>
> http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine
>
> [...]
>
> Takeaways:
>
>   ZFS and NFS just work together.
>
>   ZFS has an open issue with some storage arrays (the issue is *not* related to NFS); it is being worked on and will need collaboration from storage vendors.
>
>   NFS is slower than direct attached storage. It can be very, very much slower on single-threaded loads.

Roch knows this, but just to point it out for others following the discussion... In this case (single-threaded file creates) NFS is slower. However, NFS can run at 1 GbE wire speed, which can be faster than your disks (depending on how many spindles you have and whether you've striped them for performance).

> There are many ways to work around the slowness, but most are just not safe for your data.

Yeah, the Samba numbers were interesting... so I guess it's OK in CIFS for the client to be out of sync with the server? That is, I wonder how they handle the case where the client creates a file, the server replies OK without the data/metadata going to stable storage, the server crashes, comes back up, and the created file is not on stable storage, but the client (and its app) thinks it exists... I would really like to know the details of CIFS behavior compared to NFS.

eric
eric kustarz wrote:
>> Over NFS to non-ZFS drive
>> -------------------------
>> [Siegfried's non-ZFS timings snipped]
>>
>> It looks like ZFS is the culprit here. The untarring is much faster to a single 80 GB UFS drive than to the 6-disk raidz array over NFS.
>
> Comparing a ZFS pool made out of a single disk to a single UFS filesystem would be a fair comparison.

Right, and to be fairer you need to ensure that the disk write cache is disabled (format -e) when testing UFS, as UFS does no flushing of the cache.
On 12-Jun-07, at 9:02 AM, eric kustarz wrote:
> Comparing a ZFS pool made out of a single disk to a single UFS filesystem would be a fair comparison.
>
> What does your storage look like?

The storage looks like:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0

All disks are local SATA/300 drives on the marvell SATA framework card. The SATA drives are consumer drives with 16 MB of cache.

I agree it's not a fair comparison, especially with raidz over 6 drives. However, a performance difference of 10x is fairly large.

I do not have a single drive available to test ZFS with and compare against UFS, but I have done similar tests in the past with one ZFS drive (without write cache, etc.) vs. a UFS drive of the same brand and size. The difference was still on the order of 10x slower for the ZFS drive over NFS. What could cause such a large difference? Is there a way to measure NFS COMMIT latency?

Cheers,
Siegfried
On Jun 13, 2007, at 9:22 PM, Siegfried Nikolaivich wrote:
> On 12-Jun-07, at 9:02 AM, eric kustarz wrote:
>> Comparing a ZFS pool made out of a single disk to a single UFS filesystem would be a fair comparison.
>>
>> What does your storage look like?
>
> [pool layout snipped]
>
> All disks are local SATA/300 drives on the marvell SATA framework card. The SATA drives are consumer drives with 16 MB of cache.
>
> I agree it's not a fair comparison, especially with raidz over 6 drives. However, a performance difference of 10x is fairly large.
>
> I do not have a single drive available to test ZFS with and compare against UFS, but I have done similar tests in the past with one ZFS drive (without write cache, etc.) vs. a UFS drive of the same brand and size. The difference was still on the order of 10x slower for the ZFS drive over NFS. What could cause such a large difference? Is there a way to measure NFS COMMIT latency?

You should do the comparison on a single drive. For ZFS, enable the write cache, as it's safe to do so. For UFS, disable the write cache. Make sure you're on non-debug bits.

eric