Howdy,

I have been doing a good deal of NFS debugging in the past month, and
created an NFSv3 client operation tracer to simplify my life:

$ nfstrace
Executable  Operation     Type      Time     Size  Path
mkdir       nfs3_lookup   physical  359953   N/A   /opt/nfs/htdocs/test
mkdir       nfs3_getattr  logical   7577     N/A   /opt/nfs/htdocs/test
mkdir       nfs3_mkdir    physical  843500   N/A   /opt/nfs/htdocs/test/test
rmdir       nfs3_access   logical   19772    N/A   /opt/nfs/htdocs/test
rmdir       nfs3_lookup   logical   69222    N/A   /opt/nfs/htdocs/test/test
rmdir       nfs3_access   logical   7744     N/A   /opt/nfs/htdocs/test
cat         nfs3_read     logical   54848    8192  /opt/nfs/htdocs/test/1
cat         nfs3_read     logical   10082    8192  /opt/nfs/htdocs/test/1
bash        nfs3_access   logical   19707    N/A   /opt/nfs/htdocs/test
bash        nfs3_lookup   logical   66602    N/A   /opt/nfs/htdocs/test/.
bash        nfs3_access   logical   7740     N/A   /opt/nfs/htdocs/test
cp          nfs3_write    physical  38821    3935  /opt/nfs/htdocs/test/99

I created a pretty comprehensive test suite to validate that the script
works as advertised, but there are edge cases (e.g., writes and
readahead) where the physical vs. logical output may be slightly off.
That said, you can snag the script from my website if you think it's
useful:

http://daemons.net/~matty

Hope everyone is having a nice day,
- Ryan
--
UNIX Administrator
http://daemons.net/~matty
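For illustration only, here is a minimal D sketch of how a tracer along
these lines can be put together (this is not the nfstrace script above;
get that from the URL). It assumes the NFSv3 client vnode routines such
as nfs3_lookup() and nfs3_read() are probeable in the nfs module via the
fbt provider, which may vary by release:

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Record when a traced NFSv3 client operation starts. */
fbt:nfs:nfs3_lookup:entry,
fbt:nfs:nfs3_read:entry
{
        self->start = timestamp;
}

/* On return, aggregate average elapsed time by executable and operation. */
fbt:nfs:nfs3_lookup:return,
fbt:nfs:nfs3_read:return
/self->start/
{
        @elapsed[execname, probefunc] = avg((timestamp - self->start) / 1000);
        self->start = 0;
}

END
{
        printa("%-12s %-14s %@d usec\n", @elapsed);
}

The real script's physical vs. logical distinction presumably comes from
correlating these calls with actual disk/wire I/O (e.g., the io
provider), which this sketch does not attempt.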
Joe Little
2006-May-04 02:16 UTC
[dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
Are there any scripts to track NFS server performance? More
specifically, I'm seeing terrible performance from Linux 2.6 clients
against Nevada (B38) with a ZFS shared pool, but only with directories
containing many small files. The performance for large files seems
reasonable (20MB/sec), but small files drop down to low K's/sec when
copying over NFS.

On 5/3/06, Matty <matty91 at gmail.com> wrote:
> Howdy,
>
> I have been doing a good deal of NFS debugging in the past month, and
> created an NFSv3 client operation tracer to simplify my life:
> [...]
Bill Sommerfeld
2006-May-04 12:22 UTC
[dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
On Wed, 2006-05-03 at 22:16, Joe Little wrote:
> Are there any scripts to track NFS server performance? More
> specifically, I'm seeing terrible performance from Linux 2.6 clients
> against Nevada (B38) with a ZFS shared pool, but only with directories
> containing many small files. The performance for large files seems
> reasonable (20MB/sec), but small files drop down to low K's/sec when
> copying over NFS.

Perhaps not your problem, but yesterday Dave Miner and I traced some
astoundingly poor NFS-over-ZFS performance on the Netra X1 to an
apparent bug in the sparc ATA driver code.

The interactive response on the server while a tar extract (of the
firefox source tarball, so lots of small files) was running on a
(Solaris) client reminded me of what it was like to use an overloaded
timeshared TOPS-20 system the night before problem sets were due...

tar was writing about one file per second.

mpstat or vmstat (I forget which one I used) showed 100% system time.

It took a few tries (one failed because dtrace aborted due to systemic
unresponsiveness; the second started right before tar finished and gave
us a nice profile of the idle loop..), but eventually we got useful data
from:

# lockstat -s 10 -I sleep 10

The system was so overloaded that "sleep 10" took 22.7 seconds.

Essentially all cpu time was spent in stack traces resembling:

Count indv cuml rcnt     nsec CPU+PIL                Caller
 1004  42%  42% 0.00     3322 cpu[0]                 drv_usecwait+0x78

  nsec ------ Time Distribution ------ count     Stack
  4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  975       ata_wait+0x90
  8192 |                               22        acersb_get_intr_status+0x34
 16384 |                               7         ata_set_feature+0x124
                                                 ata_disk_start+0x15c
                                                 ata_hba_start+0xbc

The driver is busy-waiting in drv_usecwait() for long periods of time,
and my suspicion is that this is most likely an artifact of ZFS's
regular disk write cache flushes.

We filed:

6421427 netra x1 slagged by NFS over ZFS leading to long spins in the
ATA driver code

(If you see unreasonably slow performance, essentially no idle time, and
high system time when you think you shouldn't, the above use of
lockstat's profiling mechanism, or dtrace equivalents thereof
aggregating on stack(), will point out where in the code the kernel is
spinning its wheels.)

					- Bill
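As a rough idea of the dtrace equivalent mentioned above, a profile
probe aggregating on the kernel stack would look something like the
following one-liner (the 997 Hz rate and 10-second window are arbitrary
choices for illustration):

# dtrace -n '
profile-997
/arg0/
{
        /* arg0 is the kernel PC; aggregate the kernel stack when on-CPU in kernel. */
        @hot[stack()] = count();
}
tick-10s
{
        /* Keep only the ten hottest stacks, then exit and print them. */
        trunc(@hot, 10);
        exit(0);
}'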
Joe Little
2006-May-04 15:16 UTC
[dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
In my case, I'm using an iscsi backend. However, doing any operations
locally via ZFS is fast, even with small files. If I set up an rsyncd
service, clients can also write small files relatively fast
(3-5MB/sec), but over NFS it's 5-64K/sec. So, there is still a major
problem with the NFS/ZFS interaction.

On 5/4/06, Bill Sommerfeld <sommerfeld at sun.com> wrote:
> Perhaps not your problem, but yesterday Dave Miner and I traced some
> astoundingly poor NFS-over-ZFS performance on the Netra X1 to an
> apparent bug in the sparc ATA driver code.
> [...]
Roch Bourbonnais - Performance Engineering
2006-May-04 15:52 UTC
[dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
Joe Little writes:
> In my case, I'm using an iscsi backend. However, doing any operations
> locally via ZFS is fast, even with small files. If I set up an rsyncd
> service, clients can also write small files relatively fast
> (3-5MB/sec), but over NFS it's 5-64K/sec. So, there is still a major
> problem with the NFS/ZFS interaction.

Does rsync follow an open/write/close pattern? Because of NFS
close-to-open consistency, that pattern delivers at best a throughput of

	(avg file size) / (server IO latency)

Is your avg file size < 640B with a latency of 0.01 sec? This is true no
matter what the backend FS. Or have you observed something different
with, say, UFS?

Mounting with the -nocto option (at your own risk) may help if this is
the issue.

-r
Joe Little
2006-May-04 17:08 UTC
[dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
On 5/4/06, Roch Bourbonnais - Performance Engineering
<Roch.Bourbonnais at sun.com> wrote:
> Does rsync follow an open/write/close pattern? Because of NFS
> close-to-open consistency, that pattern delivers at best a throughput of
>
> 	(avg file size) / (server IO latency)
>
> Is your avg file size < 640B with a latency of 0.01 sec? This is true no
> matter what the backend FS. Or have you observed something different
> with, say, UFS?

I'll try the nocto option. Our average file size (I believe) is in the
low K's -- perhaps between 1K-8K. An example where this gets really
slow is a prof with untarred versions of every Linux kernel release
for source code analysis. It processes these directories at about
2K/sec. I will confess to not knowing how to measure latency. I haven't
observed the issue with UFS, but I haven't been trying much since UFS
doesn't interest me as much. That said, our UFS-only Solaris 9/10
machines haven't had this issue.

> Mounting with the -nocto option (at your own risk) may help
> if this is the issue.
> [...]
Matty
2006-May-04 17:52 UTC
[dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
On Thu, 4 May 2006, Joe Little wrote:
> I'll try the nocto option. Our average file size (I believe) is in the
> low K's -- perhaps between 1K-8K. An example where this gets really
> slow is a prof with untarred versions of every Linux kernel release
> for source code analysis. It processes these directories at about
> 2K/sec. I will confess to not knowing how to measure latency. I haven't
> observed the issue with UFS, but I haven't been trying much since UFS
> doesn't interest me as much. That said, our UFS-only Solaris 9/10
> machines haven't had this issue.

I have found tcptrace, snoop -vv, and the DTrace profile and io providers
invaluable for putting together data flow diagrams with latency
measurements. There is also some useful information in nfsstat, which
might help you better understand activity on the client and server.

Hope this helps,
- Ryan
--
UNIX Administrator
http://daemons.net/~matty
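For what it's worth, a minimal sketch of the io provider approach --
quantizing block I/O latency on the server so you can see how long the
backend (local disk or iscsi) is taking -- might look roughly like this;
keying the start time on the buf pointer (arg0) and the aggregation
names are just illustrative choices:

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Record the start time of each block I/O, keyed by the buf address. */
io:::start
{
        start[arg0] = timestamp;
}

/* On completion, quantize the latency per device. */
io:::done
/start[arg0]/
{
        @latency[args[1]->dev_statname] = quantize(timestamp - start[arg0]);
        start[arg0] = 0;
}

Run it on the server while reproducing the workload; the per-device
latency distributions print when the script is interrupted.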
Joe Little
2006-May-04 18:16 UTC
[dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
Well, nfsstat I can do, and this is the pertinent section. Is this
high for getattrs and readdirplus?

Version 3: (323847 calls)
null        getattr     setattr     lookup      access      readlink
7 0%        177796 54%  1570 0%     2049 0%     37509 11%   5 0%
read        write       create      mkdir       symlink     mknod
32004 9%    47471 14%   351 0%      221 0%      5 0%        0 0%
remove      rmdir       rename      link        readdir     readdirplus
892 0%      49 0%       0 0%        0 0%        0 0%        22931 7%
fsstat      fsinfo      pathconf    commit
619 0%      16 0%       0 0%        352 0%

On 5/4/06, Matty <matty91 at gmail.com> wrote:
> I have found tcptrace, snoop -vv, and the DTrace profile and io providers
> invaluable for putting together data flow diagrams with latency
> measurements. There is also some useful information in nfsstat, which
> might help you better understand activity on the client and server.
> [...]
Spencer Shepler
2006-May-04 19:34 UTC
[dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
On Thu, Joe Little wrote:
> Well, nfsstat I can do, and this is the pertinent section. Is this
> high for getattrs and readdirplus?

Not in my opinion, based on the workload you have described. The lookup
counts are very low, but that is replaced by the larger percentage of
readdirplus (it is populating the name cache at the client). The getattr
counts are large because of the number of files (the client is doing a
getattr at least once per file for NFS's close-to-open behavior).

Spencer
Jeff A. Smith
2006-May-05 00:51 UTC
[dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
Joe Little wrote:
>
> I will confess to not knowing how to measure latency. I haven't
> observed the issue with UFS, but I haven't been trying much since UFS
> doesn't interest me as much. That said, our UFS-only Solaris 9/10
> machines haven't had this issue.
>

Hi Joe,

I've attached a very basic dtrace script to measure latency for all
NFS3 server ops. Notice that this script measures the latency incurred
both from NFS3 server processing and from the underlying file system.

Just run it as root on your server and crank up an NFS3 client. It
will produce the following metrics:

1. op count
2. avg response time per op
3. avg system time consumed per op
4. quantized avg resp time
5. quantized avg system time

I recommend running the same workload against an NFS-mounted tmpfs (to
set the utopian baseline), UFS, and ZFS. Then look for the large
differences. That will give us an idea of which NFS3 procs are having
perf issues, and we can dig deeper.

I'm out of town Friday and Monday, but Spencer will cover for me if you
run into any problems.

Jeff

[attachment: srv3snoop.d]
Joe Little
2006-May-05 04:50 UTC
[dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
Perhaps I'm finding bugs in the iscsi initiator.

I can give the full dtrace reports (should I attach them?) but for now
I'll give the three top segments. I did a mixed directory of only 1000
items of around 3GB as a first test. The NFS->UFS test and the
NFS->ZFS (local) test were relatively fast, but NFS->ZFS (over iscsi)
was dog slow compared with what that ZFS/iscsi link normally does.

UFS:

NFS3 op counts
==============
RFS3_SYMLINK                    5
RFS3_MKDIR                     73
RFS3_COMMIT                   885
RFS3_CREATE                   901
RFS3_RENAME                   901
RFS3_ACCESS                  1863
RFS3_SETATTR                 3792
RFS3_LOOKUP                  3873
RFS3_GETATTR                 7568
RFS3_WRITE                  46844

NFS3 op avg response time (usec)
================================
RFS3_GETATTR                    8
RFS3_ACCESS                     9
RFS3_LOOKUP                    10
RFS3_RENAME                   222
RFS3_SYMLINK                  344
RFS3_SETATTR                  559
RFS3_CREATE                   996
RFS3_MKDIR                   4589
RFS3_WRITE                   9409
RFS3_COMMIT                 13521

NFS3 op avg system time (usec)
==============================
RFS3_ACCESS                     6
RFS3_GETATTR                    6
RFS3_LOOKUP                     8
RFS3_SETATTR                   32
RFS3_CREATE                    72
RFS3_RENAME                    77
RFS3_SYMLINK                   96
RFS3_MKDIR                    105
RFS3_COMMIT                   112
RFS3_WRITE                    125

ZFS (non-iscsi):

NFS3 op counts
==============
RFS3_SYMLINK                    5
RFS3_MKDIR                     73
RFS3_COMMIT                   885
RFS3_CREATE                   901
RFS3_RENAME                   901
RFS3_ACCESS                  1884
RFS3_SETATTR                 3792
RFS3_LOOKUP                  3945
RFS3_GETATTR                 8479
RFS3_WRITE                  46844

NFS3 op avg response time (usec)
================================
RFS3_GETATTR                    8
RFS3_ACCESS                    10
RFS3_LOOKUP                    13
RFS3_WRITE                     82
RFS3_SETATTR                13785
RFS3_RENAME                 14780
RFS3_CREATE                 14806
RFS3_SYMLINK                16018
RFS3_MKDIR                  16591
RFS3_COMMIT                 72752

NFS3 op avg system time (usec)
==============================
RFS3_GETATTR                    7
RFS3_ACCESS                     8
RFS3_LOOKUP                     9
RFS3_SETATTR                   51
RFS3_SYMLINK                   53
RFS3_CREATE                    63
RFS3_RENAME                    67
RFS3_WRITE                     80
RFS3_MKDIR                    195
RFS3_COMMIT                  5069

ZFS (iscsi):

NFS3 op counts
==============
RFS3_FSINFO                     1
RFS3_NULL                       1
RFS3_FSSTAT                     4
RFS3_SYMLINK                    5
RFS3_MKDIR                     73
RFS3_COMMIT                   885
RFS3_CREATE                   901
RFS3_RENAME                   901
RFS3_ACCESS                  1958
RFS3_SETATTR                 3792
RFS3_LOOKUP                  3963
RFS3_GETATTR                 8514
RFS3_WRITE                  46844

NFS3 op avg response time (usec)
================================
RFS3_ACCESS                    13
RFS3_GETATTR                   14
RFS3_FSSTAT                    18
RFS3_LOOKUP                    24
RFS3_NULL                     253
RFS3_WRITE                   9931
RFS3_FSINFO                 19043
RFS3_SYMLINK               118492
RFS3_SETATTR               138609
RFS3_RENAME                146412
RFS3_CREATE                156186
RFS3_MKDIR                 223285
RFS3_COMMIT                273027

NFS3 op avg system time (usec)
==============================
RFS3_ACCESS                    11
RFS3_GETATTR                   12
RFS3_LOOKUP                    13
RFS3_NULL                      14
RFS3_FSSTAT                    15
RFS3_SYMLINK                   84
RFS3_WRITE                     91
RFS3_SETATTR                   96
RFS3_CREATE                   105
RFS3_RENAME                   112
RFS3_FSINFO                   118
RFS3_MKDIR                    387
RFS3_COMMIT                  6256

The ZFS response times seem worse than UFS in either case, but there is
a large difference between ZFS on local SATA disk and my iscsi RAIDZ
config.

On 5/4/06, Jeff A. Smith <Jeff.A.Smith at sun.com> wrote:
> Hi Joe,
>
> I've attached a very basic dtrace script to measure latency for all
> NFS3 server ops. Notice that this script measures the latency incurred
> both from NFS3 server processing and from the underlying file system.
> [...]
>
> #!/usr/sbin/dtrace -Fs
>
> fbt:nfssrv:common_dispatch:entry
> /args[0]->rq_vers == 3 && ! self->trace/
> {
>         self->trace = 1;
>         this->proc = args[0]->rq_proc;
>         self->pname = (this->proc >= 0 && this->proc <= 21) ?
>             stringof (nfssrv`rfscallnames_v3[this->proc]) :
>             stringof ("invalid proc");
>         self->time = timestamp;
>         self->vtime = vtimestamp;
>         @opcount[self->pname] = count();
> }
>
> fbt:nfssrv:common_dispatch:return
> /self->trace/
> {
>         this->usec = (timestamp - self->time) / 1000;
>         this->systm = (vtimestamp - self->vtime) / 1000;
>         @avgtime[self->pname] = avg(this->usec);
>         @qnttime[self->pname] = quantize(this->usec);
>
>         @avgsystm[self->pname] = avg(this->systm);
>         @qntsystm[self->pname] = quantize(this->systm);
>
>         self->pname = 0;
>         self->trace = 0;
> }
>
> END
> {
>         printf("\n\nNFS3 op counts\n");
>         printf(    "==============\n");
>         printa("%-20s %@12d\n", @opcount);
>
>         printf("\n\nNFS3 op avg response time (usec)\n");
>         printf(    "================================\n");
>         printa("%-20s %@12d\n", @avgtime);
>
>         printf("\n\nNFS3 op avg system time (usec)\n");
>         printf(    "==============================\n");
>         printa("%-20s %@12d\n", @avgsystm);
>
>         printf("\n\nNFS3 op quantized response time (usec)\n");
>         printf(    "======================================\n");
>         printa("%-20s %@12d\n", @qnttime);
>
>         printf("\n\nNFS3 op quantized system time (usec)\n");
>         printf(    "====================================\n");
>         printa("%-20s %@12d\n", @qntsystm);
> }
Joe Little
2006-May-05 04:57 UTC
[dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
Adding to the previous test, I also did a local copy of the same 3G
from UFS to the iscsi ZFS volume. The dtrace didn't show anything, but
this completed the fastest of all, so it sheds light on the fact that
the NFS->ZFS->iSCSI interaction is what is terrible: ZFS->iSCSI is
fastest, with NFS->UFS/ZFS being second fastest.

On 5/4/06, Jeff A. Smith <Jeff.A.Smith at sun.com> wrote:
> I've attached a very basic dtrace script to measure latency for all
> NFS3 server ops.
> [...]
Bart Smaalders
2006-May-05 15:07 UTC
[dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
Joe Little wrote:
> Perhaps I'm finding bugs in the iscsi initiator.
>
> I can give the full dtrace reports (should I attach them?) but for now
> I'll give the three top segments. I did a mixed directory of only 1000
> items of around 3GB as a first test. The NFS->UFS test and the
> NFS->ZFS (local) test were relatively fast, but NFS->ZFS (over iscsi)
> was dog slow compared with what that ZFS/iscsi link normally does.
>

Could it be that your NFS implementation is single threaded on synch.
operations?

In other words, the NFS implementation (or benchmark) is forcing lots
of synchronous IO operations that are single threaded. If you're always
waiting for data to end up on the platter before issuing the next IO
request, this would kill performance.

Perhaps a DTrace script to check workload concurrency is in order...

- Bart

--
Bart Smaalders			Solaris Kernel Performance
barts at cyber.eng.sun.com		http://blogs.sun.com/barts
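A rough sketch of what such a concurrency check could look like, reusing
the same common_dispatch entry point as the earlier server script; the
probe choice and the 0-32 bucket range are illustrative assumptions:

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Track how many NFSv3 requests are in flight at once. */
fbt:nfssrv:common_dispatch:entry
/args[0]->rq_vers == 3/
{
        inflight++;
        self->counted = 1;
        @concurrency["NFSv3 requests in flight"] = lquantize(inflight, 0, 32, 1);
}

fbt:nfssrv:common_dispatch:return
/self->counted/
{
        inflight--;
        self->counted = 0;
}

If the distribution stays pinned at 1, the client really is issuing one
synchronous request at a time, which would match the single-threading
theory above.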
Joe Little
2006-May-05 16:01 UTC
[dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
Well, I used the dtrace script posted here. The NFS implementation
(server) is Solaris 11 B38, and the client is an RHEL Linux revision
that doesn't have this problem going through other SAN-based NAS
(NetApp, EMC, etc., even iSCSI). I previously set up a Linux box as an
iscsi initiator with XFS and Linux's less-than-stellar kNFS server, and
did not see this interaction. Thus, if there are any thread issues,
they are likely on Solaris's end, or there is a particularly bad
interaction with Linux clients if and only if the Solaris backend is
iSCSI. The latter doesn't make sense.

On 5/5/06, Bart Smaalders <bart.smaalders at sun.com> wrote:
> Could it be that your NFS implementation is single threaded on synch.
> operations?
>
> In other words, the NFS implementation (or benchmark) is forcing lots
> of synchronous IO operations that are single threaded. If you're always
> waiting for data to end up on the platter before issuing the next IO
> request, this would kill performance.
>
> Perhaps a DTrace script to check workload concurrency is in order...
> [...]
Joe Little
2006-May-05 16:02 UTC
[dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
..and the client is a RHEL 4 Linux revision (sorry for any confusion).

On 5/5/06, Joe Little <jmlittle at gmail.com> wrote:
> Well, I used the dtrace script posted here. The NFS implementation
> (server) is Solaris 11 B38, and the client is an RHEL Linux revision
> that doesn't have this problem going through other SAN-based NAS
> (NetApp, EMC, etc., even iSCSI).
> [...]
Spencer Shepler
2006-May-05 16:49 UTC
[dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
On Fri, Joe Little wrote:
> Well, I used the dtrace script posted here. The NFS implementation
> (server) is Solaris 11 B38, and the client is an RHEL Linux revision
> that doesn't have this problem going through other SAN-based NAS
> (NetApp, EMC, etc., even iSCSI). I previously set up a Linux box as an
> iscsi initiator with XFS and Linux's less-than-stellar kNFS server, and
> did not see this interaction. Thus, if there are any thread issues,
> they are likely on Solaris's end, or there is a particularly bad
> interaction with Linux clients if and only if the Solaris backend is
> iSCSI. The latter doesn't make sense.

It is a server response time issue, as you have demonstrated with data.
The server in the NFS/ZFS/iSCSI path is not responding as quickly as
the other combinations, and for this particular application the overall
throughput is subpar.

Focusing on the disparity found, to understand why the NFS/ZFS/iSCSI
combo is not working well, seems like the correct path.

Spencer
Joe Little
2006-May-05 17:46 UTC
[dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
On 5/5/06, Spencer Shepler <spencer.shepler at sun.com> wrote:
> It is a server response time issue, as you have demonstrated with data.
> The server in the NFS/ZFS/iSCSI path is not responding as quickly as
> the other combinations, and for this particular application the overall
> throughput is subpar.
>
> Focusing on the disparity found, to understand why the NFS/ZFS/iSCSI
> combo is not working well, seems like the correct path.
>

That's where I'm at a loss. Has the NFS/ZFS/iSCSI path been tested by
Sun at all?
Spencer Shepler
2006-May-05 19:32 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
On Fri, Joe Little wrote:
> On 5/5/06, Spencer Shepler <spencer.shepler at sun.com> wrote:
> > Focusing on the disparity found, to understand why the NFS/ZFS/iSCSI
> > combo is not working well, seems like the correct path.
>
> That's where I'm at a loss. Has the NFS/ZFS/iSCSI path been tested by
> Sun at all?

I don't know, and this seems like a good point to move the discussion
to zfs-discuss and nfs-discuss to see if there is additional input.

Spencer
Joe Little
2006-May-05 19:55 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
Well, it was already an nfs-discuss list message. Someone else added
dtrace-discuss to it. I have already noted this to a degree on
zfs-discuss, but it seems to be mainly an NFS-specific issue at this
stage.

On 5/5/06, Spencer Shepler <spencer.shepler at sun.com> wrote:
> I don't know, and this seems like a good point to move the discussion
> to zfs-discuss and nfs-discuss to see if there is additional input.
>
> Spencer
Joe Little
2006-May-05 21:15 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
I just did another test, this time using a Linux NFS client against B38
with UFS on iscsi disks. It was close to the same speed (over 8MB/sec
average) as going to UFS on local disk or ZFS on local disk (around
20MB/sec). My UFS-formatted iscsi disk was only a single iscsi disk,
not like the RAIDZ array I have for ZFS on iscsi.

Either way, it looks like all methods of writing (local or NFS) to UFS
or ZFS are plenty fast. It took 1 minute 7 seconds to write over NFS to
UFS, and 2 minutes 15 seconds to write over NFS to UFS on iscsi. With
ZFS over iscsi, the same 3G directory sample took 50+ minutes, at low
K's/sec.

Here are the NFS->UFS->iSCSI dtrace latency times for comparison:

dtrace: script './nfs.dtrace' matched 3 probes
^C
CPU     FUNCTION
  0  |  :END

NFS3 op counts
==============
RFS3_FSSTAT                     4
RFS3_SYMLINK                    5
RFS3_MKDIR                     73
RFS3_COMMIT                   886
RFS3_CREATE                   901
RFS3_RENAME                   901
RFS3_ACCESS                  1881
RFS3_SETATTR                 3792
RFS3_LOOKUP                  3882
RFS3_GETATTR                 7579
RFS3_WRITE                  46844

NFS3 op avg response time (usec)
================================
RFS3_GETATTR                    7
RFS3_ACCESS                     8
RFS3_LOOKUP                    14
RFS3_FSSTAT                   135
RFS3_RENAME                   884
RFS3_SETATTR                 2172
RFS3_CREATE                  6410
RFS3_MKDIR                  15766
RFS3_SYMLINK                22340
RFS3_WRITE                  30577
RFS3_COMMIT                 57034

NFS3 op avg system time (usec)
==============================
RFS3_ACCESS                     6
RFS3_GETATTR                    6
RFS3_LOOKUP                     8
RFS3_FSSTAT                    23
RFS3_SETATTR                   27
RFS3_CREATE                    66
RFS3_RENAME                    75
RFS3_SYMLINK                   90
RFS3_MKDIR                    120
RFS3_WRITE                    188
RFS3_COMMIT                   306

On 5/5/06, Joe Little <jmlittle at gmail.com> wrote:
> Well, it was already an nfs-discuss list message. Someone else added
> dtrace-discuss to it. I have already noted this to a degree on
> zfs-discuss, but it seems to be mainly an NFS-specific issue at this
> stage.
> [...]
Eric Schrock
2006-May-05 21:32 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
My gut feeling is that somehow the DKIOCFLUSHWRITECACHE ioctls (which
translate to SCSI flush-write-cache requests) are throwing iSCSI for a
loop. We've exposed a number of bugs in our drivers because ZFS is the
first filesystem to actually care to issue this request.

To turn this off, you can try:

# mdb -kw
> ::walk spa | ::print spa_t spa_root_vdev | ::vdev -r
ADDR             STATE     AUX          DESCRIPTION
ffffffff82dc16c0 HEALTHY   -            root
ffffffff82dc0640 HEALTHY   -              /dev/dsk/c0d0s0
> ffffffff82dc0640::print -a vdev_t vdev_nowritecache
ffffffff82dc0af8 vdev_nowritecache = 0 (B_FALSE)
> ffffffff82dc0af8/W1
0xffffffff82dc0af8:             0               =       0x1
>

See if that makes a difference.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Spencer Shepler
2006-May-05 21:35 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
On Fri, Joe Little wrote:
> Well, it was already an nfs-discuss list message. Someone else added
> dtrace-discuss to it. I have already noted this to a degree on
> zfs-discuss, but it seems to be mainly an NFS-specific issue at this
> stage.

So I took the original data you posted and reformatted it to make it a
little easier to compare.

The first table, the overall operation counts, shows that the client is
generally sending the same number of requests. Ignore the FSINFO, NULL,
and FSSTAT numbers for now, given that they are a data collection
anomaly. The numbers are pretty much exact except for GETATTR in the
NFS/ZFS combinations. That is a little odd and may be something to look
at out of curiosity, but given the average response times in the second
table for GETATTR, it shouldn't be an issue. So the client is sending
the same number of ops. Good to confirm that.

NFS3 op counts               UFS      ZFS (non-iscsi)   ZFS (iscsi)
==================================================================
RFS3_FSINFO                    -            -                 1
RFS3_NULL                      -            -                 1
RFS3_FSSTAT                    -            -                 4
RFS3_SYMLINK                   5            5                 5
RFS3_MKDIR                    73           73                73
RFS3_COMMIT                  885          885               885
RFS3_CREATE                  901          901               901
RFS3_RENAME                  901          901               901
RFS3_ACCESS                 1863         1884              1958
RFS3_SETATTR                3792         3792              3792
RFS3_LOOKUP                 3873         3945              3963
RFS3_GETATTR                7568         8479              8514
RFS3_WRITE                 46844        46844             46844

This second table is the most interesting one. It is easy to see, when
comparing the ZFS non-iscsi and iscsi columns, that the metadata
operations CREATE, MKDIR, RENAME, SETATTR, and SYMLINK are on average
an order of magnitude greater in response time. More at the end on why
that is the source of the "throughput" issues you are observing.

NFS3 avg rsp time (usec)     UFS      ZFS (non-iscsi)   ZFS (iscsi)
==================================================================
RFS3_ACCESS                    9           10                13
RFS3_COMMIT                13521        72752            273027
RFS3_CREATE                  996        14806            156186
RFS3_FSINFO                    -            -             19043
RFS3_FSSTAT                    -            -                18
RFS3_GETATTR                   8            8                14
RFS3_LOOKUP                   10           13                24
RFS3_MKDIR                  4589        16591            223285
RFS3_NULL                      -            -               253
RFS3_RENAME                  222        14780            146412
RFS3_SETATTR                 559        13785            138609
RFS3_SYMLINK                 344        16018            118492
RFS3_WRITE                  9409           82              9931

And the system time measurements aren't too interesting, in that the
extra overhead is unlikely to be the biggest issue at this point.

NFS3 op avg sys time (usec)  UFS      ZFS (non-iscsi)   ZFS (iscsi)
==================================================================
RFS3_ACCESS                    6            8                11
RFS3_COMMIT                  112         5069              6256
RFS3_CREATE                   72           63               105
RFS3_FSINFO                    -            -               118
RFS3_FSSTAT                    -            -                15
RFS3_GETATTR                   6            7                12
RFS3_LOOKUP                    8            9                13
RFS3_MKDIR                   105          195               387
RFS3_NULL                      -            -                14
RFS3_RENAME                   77           67               112
RFS3_SETATTR                  32           51                96
RFS3_SYMLINK                  96           53                84
RFS3_WRITE                   125           80                91

------

So the average response time table is the key. With the workload you
described (creating directories and files of "smaller" size), the
response time of operations like CREATE, MKDIR, RENAME, SETATTR, and
SYMLINK becomes key. The NFS protocol requires that for these
operations (and COMMIT, for that matter) the associated file system
metadata must be on stable storage before the server can respond that
the operation is complete. Therefore, the order-of-magnitude greater
response times for those operations will slow the overall throughput to
what you are observing.

With the data that you provided for the UFS/iscsi config, there is
additional response time being added because of the iscsi backend, and
that would be expected. However, I don't have a good explanation for
why the ZFS/iscsi combination is so much more extreme than the
UFS/iscsi combination.

I do know why local access to the ZFS/iscsi config is faster: for local
access the requirement for metadata update is not present as it is for
the NFS server, so there are a lot of disk updates that can be
effectively aggregated and postponed.

Spencer
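One way to see how much of that stable-storage cost is being paid in the
ZFS intent log would be a sketch along the following lines; it assumes
the synchronous commit path goes through zil_commit() in the zfs module,
which may differ across builds:

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Time each ZIL commit, which is where synchronous NFS ops wait on stable storage. */
fbt:zfs:zil_commit:entry
{
        self->ts = timestamp;
}

fbt:zfs:zil_commit:return
/self->ts/
{
        @zil["zil_commit latency (usec)"] = quantize((timestamp - self->ts) / 1000);
        self->ts = 0;
}

Comparing the distribution on the local-disk pool against the
iscsi-backed pool should show whether the COMMIT/CREATE/MKDIR times
above are dominated by intent-log writes to the iscsi LUNs.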
Joe Little
2006-May-05 22:46 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
Thanks for the tip. In the local case, I could send to the iSCSI-backed
ZFS RAIDZ at even faster rates, with a total elapsed time of 50 seconds
(17 seconds better than UFS). However, I didn't even bother finishing
the NFS client test, since it was taking a few seconds between multiple
27K files. So, it didn't help NFS at all. I'm wondering if there is
something on the NFS end that needs changing, no?

Also, how would one easily script the mdb command below to make it
permanent?

On 5/5/06, Eric Schrock <eric.schrock at sun.com> wrote:
> My gut feeling is that somehow the DKIOCFLUSHWRITECACHE ioctls (which
> translate to SCSI flush-write-cache requests) are throwing iSCSI for a
> loop. We've exposed a number of bugs in our drivers because ZFS is the
> first filesystem to actually care to issue this request.
> [...]
Eric Schrock
2006-May-05 22:53 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
On Fri, May 05, 2006 at 03:46:08PM -0700, Joe Little wrote:
> Thanks for the tip. In the local case, I could send to the iSCSI-backed
> ZFS RAIDZ at even faster rates, with a total elapsed time of 50 seconds
> (17 seconds better than UFS). However, I didn't even bother finishing
> the NFS client test, since it was taking a few seconds between multiple
> 27K files. So, it didn't help NFS at all. I'm wondering if there is
> something on the NFS end that needs changing, no?

Keep in mind that turning off this flag may corrupt on-disk state in the
event of power loss, etc. What was the delta in the local case? 17
seconds better than UFS, but percentage-wise how much faster than the
original?

NFS has the property that it does an enormous amount of synchronous
activity, which can tickle interesting pathologies. But it's strange
that it didn't help NFS that much.

> Also, how would one easily script the mdb command below to make it
> permanent?

Unfortunately, there's no good way to do this, since the '::vdev'
command isn't pipeable, and you'd have to use '::map' with hardcoded
offsets that would be extremely brittle in the face of upgrades.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Joe Little
2006-May-05 23:10 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
On 5/5/06, Eric Schrock <eric.schrock at sun.com> wrote:
> Keep in mind that turning off this flag may corrupt on-disk state in the
> event of power loss, etc. What was the delta in the local case? 17
> seconds better than UFS, but percentage-wise how much faster than the
> original?

I believe it was only about 5-10% faster. I don't have the time results
on hand, just some dtrace latency reports.

> NFS has the property that it does an enormous amount of synchronous
> activity, which can tickle interesting pathologies. But it's strange
> that it didn't help NFS that much.

Should I also mount via async, and would that be honored on the Solaris
end? The other option mentioned with similar caveats was nocto. I just
tried with both, and the observed transfer rate was about 1.4K/s. It
was painful deleting the 3G directory via NFS, with about a 100K/s
deletion rate on these 1000 files. Of course, when I went locally the
delete was instantaneous.

I'll give the complete results of a minute of this copy in dtrace results:

NFS3 op counts
==============
RFS3_SYMLINK                    5
RFS3_MKDIR                     27
RFS3_COMMIT                    45
RFS3_RENAME                    46
RFS3_CREATE                    47
RFS3_ACCESS                   126
RFS3_WRITE                    143
RFS3_SETATTR                  229
RFS3_GETATTR                  536
RFS3_LOOKUP                  1217

NFS3 op avg response time (usec)
================================
RFS3_LOOKUP                    12
RFS3_ACCESS                    13
RFS3_GETATTR                   15
RFS3_WRITE                     42
RFS3_RENAME                190052
RFS3_CREATE                199628
RFS3_COMMIT                244217
RFS3_SETATTR               256737
RFS3_MKDIR                 258186
RFS3_SYMLINK              1150959

NFS3 op avg system time (usec)
==============================
RFS3_LOOKUP                    10
RFS3_ACCESS                    11
RFS3_GETATTR                   12
RFS3_WRITE                     40
RFS3_SETATTR                   76
RFS3_COMMIT                   100
RFS3_RENAME                   101
RFS3_CREATE                   104
RFS3_SYMLINK                  105
RFS3_MKDIR                    107

NFS3 op quantized response time (usec)
======================================
RFS3_ACCESS
     value  ------------- Distribution ------------- count
         4 |  0
         8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  106
        16 |@@@@@@  20
        32 |  0

RFS3_WRITE
     value  ------------- Distribution ------------- count
         8 |  0
        16 |@@@@@@@@@  33
        32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@  95
        64 |@@@@  15
       128 |  0

RFS3_GETATTR
     value  ------------- Distribution ------------- count
         2 |  0
         4 |  1
         8 |@@@@@@@@@@@@@@@@@@@@@@@@@@  347
        16 |@@@@@@@@@@@@@  176
        32 |@  11
        64 |  0
       128 |  1
       256 |  0

RFS3_LOOKUP
     value  ------------- Distribution ------------- count
         4 |  0
         8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  1060
        16 |@@@@  133
        32 |@  24
        64 |  0

RFS3_SYMLINK
     value  ------------- Distribution ------------- count
     32768 |  0
     65536 |@@@@@@@@  1
    131072 |@@@@@@@@  1
    262144 |  0
    524288 |  0
   1048576 |@@@@@@@@  1
   2097152 |@@@@@@@@@@@@@@@@  2
   4194304 |  0

RFS3_MKDIR
     value  ------------- Distribution ------------- count
     16384 |  0
     32768 |@@@@  3
     65536 |@@@@@@@@@@@@@@@  10
    131072 |@@@@@@@@@@@@@@@  10
    262144 |@  1
    524288 |  0
   1048576 |@@@@  3
   2097152 |  0

RFS3_CREATE
     value  ------------- Distribution ------------- count
       256 |  0
       512 |@@  2
      1024 |  0
      2048 |@  1
      4096 |  0
      8192 |  0
     16384 |@  1
     32768 |@@@@@@@@@@@@@@@@@@@@  23
     65536 |@@@@@@@@@@@@  14
    131072 |@  1
    262144 |  0
    524288 |@  1
   1048576 |@@@  4
   2097152 |  0

RFS3_RENAME
     value  ------------- Distribution ------------- count
       256 |  0
       512 |@  1
      1024 |@@  2
      2048 |  0
      4096 |  0
      8192 |  0
     16384 |  0
     32768 |@@@@@@@@@@@@@@@@@  20
     65536 |@@@@@@@@@@  12
    131072 |@@@@@  6
    262144 |  0
    524288 |@  1
   1048576 |@@@  4
   2097152 |  0

RFS3_COMMIT
     value  ------------- Distribution ------------- count
       256 |  0
       512 |@  1
      1024 |@  1
      2048 |  0
      4096 |  0
      8192 |  0
     16384 |@  1
     32768 |@@@@@@@@@@@@@@@@@  19
     65536 |@@@@@@@@@@@  12
    131072 |@@@@  4
    262144 |  0
    524288 |  0
   1048576 |@@@@@@  7
   2097152 |  0

RFS3_SETATTR
     value  ------------- Distribution ------------- count
       256 |  0
       512 |@  4
      1024 |@  6
      2048 |  0
      4096 |  0
      8192 |  0
     16384 |@  4
     32768 |@@@@@@@@@@@@@@@@  89
     65536 |@@@@@@@@@  53
    131072 |@@@@@@  34
    262144 |@  4
    524288 |@  3
   1048576 |@@@@@  30
   2097152 |  1
   4194304 |  0

NFS3 op quantized system time (usec)
====================================
RFS3_SYMLINK
     value  ------------- Distribution ------------- count
        32 |  0
        64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  5
       128 |  0

RFS3_ACCESS
     value  ------------- Distribution ------------- count
         4 |  0
         8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  112
        16 |@@@@  14
        32 |  0

RFS3_MKDIR
     value  ------------- Distribution ------------- count
        32 |  0
        64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  24
       128 |@@@@  3
       256 |  0

RFS3_RENAME
     value  ------------- Distribution ------------- count
        32 |  0
        64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  41
       128 |@@@@  5
       256 |  0

RFS3_COMMIT
     value  ------------- Distribution ------------- count
        16 |  0
        32 |@@@@@@@@@@@@@@@@  18
        64 |@@@@@@@@@  10
       128 |@@@@@@@@@@@@@@@  17
       256 |  0

RFS3_CREATE
     value  ------------- Distribution ------------- count
        32 |  0
        64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  41
       128 |@@@@  5
       256 |@  1
       512 |  0

RFS3_WRITE
     value  ------------- Distribution ------------- count
         8 |  0
        16 |@@@@@@@@@@@@@@  50
        32 |@@@@@@@@@@@@@@@@@@@@@@@  84
        64 |@@@  9
       128 |  0

RFS3_GETATTR
     value  ------------- Distribution ------------- count
         2 |  0
         4 |@@@@@@@@@  126
         8 |@@@@@@@@@@@@@@@@@@@  253
        16 |@@@@@@@@@@@  148
        32 |@  9
        64 |  0

RFS3_LOOKUP
     value  ------------- Distribution ------------- count
         4 |  0
         8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  1104
        16 |@@@  95
        32 |@  18
        64 |  0

RFS3_SETATTR
     value  ------------- Distribution ------------- count
        16 |  0
        32 |@@@@@@@@@@@@@@  81
        64 |@@@@@@@@@@@@@@@@@@@@@@@@@  143
       128 |@  4
       256 |  0

> > Also, how would one easily script the mdb command below to make it
> > permanent?
>
> Unfortunately, there's no good way to do this, since the '::vdev'
> command isn't pipeable, and you'd have to use '::map' with hardcoded
> offsets that would be extremely brittle in the face of upgrades.
> [...]
Joe Little
2006-May-05 23:14 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
And of course, just to circle back, an rsync via ssh from the client to the Solaris ZFS/iscsi server came in at 17.5MB/sec, taking 1minute 16 seconds, or about 20% longer. So, NFS (over TCP) is 1.4k/s, and encrypted ssh is 17.5MB/sec following the same network path. On 5/5/06, Joe Little <jmlittle at gmail.com> wrote:> On 5/5/06, Eric Schrock <eric.schrock at sun.com> wrote: > > On Fri, May 05, 2006 at 03:46:08PM -0700, Joe Little wrote: > > > Thanks for the tip. In the local case, I could send to the > > > iSCSI-backed ZFS RAIDZ at even faster rates, with a total elapsed time > > > of 50seconds (17 seconds better than UFS). However, I didn''t even both > > > finishing the NFS client test, since it was taking a few seconds > > > between multiple 27K files. So, it didn''t help NFS at all. I''m > > > wondering if there is something on the NFS end that needs changing, > > > no? > > > > Keep in mind that turning off this flag may corrupt on-disk state in the > > event of power loss, etc. What was the delta in the local case? 17 > > seconds better than UFS, but percentage wise how much faster than the > > original? > > > > I believe it was only about 5-10% faster. I don''t have the time > results off hand, just some dtrace latency reports. > > > NFS has the property that it does an enormous amount of synchronous > > activity, which can tickle interesting pathologies. But it''s strange > > that it didn''t help NFS that much. > > Should I also mount via async.. would this be honored on the Solaris > end? The other option mentioned with similar caveats was nocto. I just > tried with both, and the observed transfer rate was about 1.4k/s. It > was painful deleting the 3G directory via NFS, with about 100k/s > deletion rate on these 1000 files. Of course, When I went locally the > delete was instantaneous. 
> [remainder of quoted message snipped -- the dtrace op counts, latency distributions, and the mdb/::vdev exchange are in the previous post]
Spencer Shepler
2006-May-05 23:23 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
On Fri, Joe Little wrote:> On 5/5/06, Eric Schrock <eric.schrock at sun.com> wrote: > >On Fri, May 05, 2006 at 03:46:08PM -0700, Joe Little wrote: > >> Thanks for the tip. In the local case, I could send to the > >> iSCSI-backed ZFS RAIDZ at even faster rates, with a total elapsed time > >> of 50seconds (17 seconds better than UFS). However, I didn''t even both > >> finishing the NFS client test, since it was taking a few seconds > >> between multiple 27K files. So, it didn''t help NFS at all. I''m > >> wondering if there is something on the NFS end that needs changing, > >> no? > > > >Keep in mind that turning off this flag may corrupt on-disk state in the > >event of power loss, etc. What was the delta in the local case? 17 > >seconds better than UFS, but percentage wise how much faster than the > >original? > > > > I believe it was only about 5-10% faster. I don''t have the time > results off hand, just some dtrace latency reports. > > >NFS has the property that it does an enormous amount of synchronous > >activity, which can tickle interesting pathologies. But it''s strange > >that it didn''t help NFS that much. > > Should I also mount via async.. would this be honored on the Solaris > end? The other option mentioned with similar caveats was nocto. I just > tried with both, and the observed transfer rate was about 1.4k/s. It > was painful deleting the 3G directory via NFS, with about 100k/s > deletion rate on these 1000 files. Of course, When I went locally the > delete was instantaneous.I wouldn''t change any of the options at the client. The issue is at the server side and none of the other combinations that you originally pointed out have this problem, right? Mount options at the client will just muddy the waters. We need to understand if/what the NFS/ZFS/iscsi interaction is and why it is so much worse. As Eric mentioned, there may be some interesting pathologies at play here and we need to understand what they are so they can be addressed. My suggestion is additional dtrace data collection but I don''t have a specific suggestion as to how/what to track next. Because of the significant additional latency, I would be looking for big increases in the number of I/Os being generated to the iscsi backend as compared to the local attached case. I would also look for some type of serialization of I/Os that is occurring with iscsi vs. the local attach. Spencer
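One rough way to check for the serialization Spencer mentions is to watch how many I/Os are outstanding per device whenever a new request is issued. The following is only a sketch using the io provider; a distribution that never gets past 1 would mean the backend is being driven one request at a time:

#!/usr/sbin/dtrace -s

/*
 * Sketch: outstanding-I/O depth per device, sampled at each new request.
 * dev_statname covers local disks and iSCSI LUNs alike.
 */

io:::start
{
        depth[args[1]->dev_statname]++;
        @outstanding[args[1]->dev_statname] =
            lquantize(depth[args[1]->dev_statname], 0, 64, 1);
}

io:::done
/depth[args[1]->dev_statname] > 0/
{
        depth[args[1]->dev_statname]--;
}

Comparing the distributions for the iSCSI LUNs against a local-disk run of the same workload should show whether the iSCSI path is being fed one small request at a time.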
Joe Little
2006-May-06 00:59 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
Are there known i/o or iscsi dtrace scripts available? On 5/5/06, Spencer Shepler <spencer.shepler at sun.com> wrote:> On Fri, Joe Little wrote: > > On 5/5/06, Eric Schrock <eric.schrock at sun.com> wrote: > > >On Fri, May 05, 2006 at 03:46:08PM -0700, Joe Little wrote: > > >> Thanks for the tip. In the local case, I could send to the > > >> iSCSI-backed ZFS RAIDZ at even faster rates, with a total elapsed time > > >> of 50seconds (17 seconds better than UFS). However, I didn''t even both > > >> finishing the NFS client test, since it was taking a few seconds > > >> between multiple 27K files. So, it didn''t help NFS at all. I''m > > >> wondering if there is something on the NFS end that needs changing, > > >> no? > > > > > >Keep in mind that turning off this flag may corrupt on-disk state in the > > >event of power loss, etc. What was the delta in the local case? 17 > > >seconds better than UFS, but percentage wise how much faster than the > > >original? > > > > > > > I believe it was only about 5-10% faster. I don''t have the time > > results off hand, just some dtrace latency reports. > > > > >NFS has the property that it does an enormous amount of synchronous > > >activity, which can tickle interesting pathologies. But it''s strange > > >that it didn''t help NFS that much. > > > > Should I also mount via async.. would this be honored on the Solaris > > end? The other option mentioned with similar caveats was nocto. I just > > tried with both, and the observed transfer rate was about 1.4k/s. It > > was painful deleting the 3G directory via NFS, with about 100k/s > > deletion rate on these 1000 files. Of course, When I went locally the > > delete was instantaneous. > > I wouldn''t change any of the options at the client. The issue > is at the server side and none of the other combinations that you > originally pointed out have this problem, right? Mount options at the > client will just muddy the waters. > > We need to understand if/what the NFS/ZFS/iscsi interaction is and why > it is so much worse. As Eric mentioned, there may be some interesting > pathologies at play here and we need to understand what they are so > they can be addressed. > > My suggestion is additional dtrace data collection but I don''t have > a specific suggestion as to how/what to track next. > Because of the significant additional latency, I would be looking for > big increases in the number of I/Os being generated to the iscsi backend > as compared to the local attached case. I would also look for > some type of serialization of I/Os that is occurring with iscsi vs. > the local attach. > > Spencer >
Lisa Week
2006-May-06 01:43 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
These may help: http://opensolaris.org/os/community/dtrace/scripts/ Check out iosnoop.d http://www.solarisinternals.com/si/dtrace/index.php Check out iotrace.d - Lisa Joe Little wrote On 05/05/06 18:59,:> Are there known i/o or iscsi dtrace scripts available? > > On 5/5/06, Spencer Shepler <spencer.shepler at sun.com> wrote: > >> On Fri, Joe Little wrote: >> > On 5/5/06, Eric Schrock <eric.schrock at sun.com> wrote: >> > >On Fri, May 05, 2006 at 03:46:08PM -0700, Joe Little wrote: >> > >> Thanks for the tip. In the local case, I could send to the >> > >> iSCSI-backed ZFS RAIDZ at even faster rates, with a total >> elapsed time >> > >> of 50seconds (17 seconds better than UFS). However, I didn''t >> even both >> > >> finishing the NFS client test, since it was taking a few seconds >> > >> between multiple 27K files. So, it didn''t help NFS at all. I''m >> > >> wondering if there is something on the NFS end that needs changing, >> > >> no? >> > > >> > >Keep in mind that turning off this flag may corrupt on-disk state >> in the >> > >event of power loss, etc. What was the delta in the local case? 17 >> > >seconds better than UFS, but percentage wise how much faster than the >> > >original? >> > > >> > >> > I believe it was only about 5-10% faster. I don''t have the time >> > results off hand, just some dtrace latency reports. >> > >> > >NFS has the property that it does an enormous amount of synchronous >> > >activity, which can tickle interesting pathologies. But it''s strange >> > >that it didn''t help NFS that much. >> > >> > Should I also mount via async.. would this be honored on the Solaris >> > end? The other option mentioned with similar caveats was nocto. I just >> > tried with both, and the observed transfer rate was about 1.4k/s. It >> > was painful deleting the 3G directory via NFS, with about 100k/s >> > deletion rate on these 1000 files. Of course, When I went locally the >> > delete was instantaneous. >> >> I wouldn''t change any of the options at the client. The issue >> is at the server side and none of the other combinations that you >> originally pointed out have this problem, right? Mount options at the >> client will just muddy the waters. >> >> We need to understand if/what the NFS/ZFS/iscsi interaction is and why >> it is so much worse. As Eric mentioned, there may be some interesting >> pathologies at play here and we need to understand what they are so >> they can be addressed. >> >> My suggestion is additional dtrace data collection but I don''t have >> a specific suggestion as to how/what to track next. >> Because of the significant additional latency, I would be looking for >> big increases in the number of I/Os being generated to the iscsi backend >> as compared to the local attached case. I would also look for >> some type of serialization of I/Os that is occurring with iscsi vs. >> the local attach. >> >> Spencer >> > _______________________________________________ > nfs-discuss mailing list > nfs-discuss at opensolaris.org
Joe Little
2006-May-06 04:20 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
Thanks. I''m playing with it now, trying to get the most succinct test. This is one thing that bothers me: Regardless of the backend, it appears that a delete of a large tree (say the linux kernel) over NFS takes forever, but its immediate when doing so locally. Is delete over NFS really take such a different code path? On 5/5/06, Lisa Week <Lisa.Week at sun.com> wrote:> These may help: > > http://opensolaris.org/os/community/dtrace/scripts/ > Check out iosnoop.d > > http://www.solarisinternals.com/si/dtrace/index.php > Check out iotrace.d > > - Lisa > > Joe Little wrote On 05/05/06 18:59,: > > > Are there known i/o or iscsi dtrace scripts available? > > > > On 5/5/06, Spencer Shepler <spencer.shepler at sun.com> wrote: > > > >> On Fri, Joe Little wrote: > >> > On 5/5/06, Eric Schrock <eric.schrock at sun.com> wrote: > >> > >On Fri, May 05, 2006 at 03:46:08PM -0700, Joe Little wrote: > >> > >> Thanks for the tip. In the local case, I could send to the > >> > >> iSCSI-backed ZFS RAIDZ at even faster rates, with a total > >> elapsed time > >> > >> of 50seconds (17 seconds better than UFS). However, I didn''t > >> even both > >> > >> finishing the NFS client test, since it was taking a few seconds > >> > >> between multiple 27K files. So, it didn''t help NFS at all. I''m > >> > >> wondering if there is something on the NFS end that needs changing, > >> > >> no? > >> > > > >> > >Keep in mind that turning off this flag may corrupt on-disk state > >> in the > >> > >event of power loss, etc. What was the delta in the local case? 17 > >> > >seconds better than UFS, but percentage wise how much faster than the > >> > >original? > >> > > > >> > > >> > I believe it was only about 5-10% faster. I don''t have the time > >> > results off hand, just some dtrace latency reports. > >> > > >> > >NFS has the property that it does an enormous amount of synchronous > >> > >activity, which can tickle interesting pathologies. But it''s strange > >> > >that it didn''t help NFS that much. > >> > > >> > Should I also mount via async.. would this be honored on the Solaris > >> > end? The other option mentioned with similar caveats was nocto. I just > >> > tried with both, and the observed transfer rate was about 1.4k/s. It > >> > was painful deleting the 3G directory via NFS, with about 100k/s > >> > deletion rate on these 1000 files. Of course, When I went locally the > >> > delete was instantaneous. > >> > >> I wouldn''t change any of the options at the client. The issue > >> is at the server side and none of the other combinations that you > >> originally pointed out have this problem, right? Mount options at the > >> client will just muddy the waters. > >> > >> We need to understand if/what the NFS/ZFS/iscsi interaction is and why > >> it is so much worse. As Eric mentioned, there may be some interesting > >> pathologies at play here and we need to understand what they are so > >> they can be addressed. > >> > >> My suggestion is additional dtrace data collection but I don''t have > >> a specific suggestion as to how/what to track next. > >> Because of the significant additional latency, I would be looking for > >> big increases in the number of I/Os being generated to the iscsi backend > >> as compared to the local attached case. I would also look for > >> some type of serialization of I/Os that is occurring with iscsi vs. > >> the local attach. > >> > >> Spencer > >> > > _______________________________________________ > > nfs-discuss mailing list > > nfs-discuss at opensolaris.org > >
Joe Little
2006-May-06 04:48 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
Here''s some sample output. Where the I write over NFS to ZFS (no iscsi) I get high sizes for i/o: UID PID D BLOCK SIZE COMM PATHNAME 1 427 W 22416320 4096 nfsd <none> 1 427 W 22416328 4096 nfsd <none> 1 427 W 22416336 4096 nfsd <none> 1 427 W 22416344 4096 nfsd <none> 1 427 W 22474784 16384 nfsd <none> 1 427 W 22474816 16384 nfsd <none> 1 427 W 22474848 16384 nfsd <none> 1 427 W 22416352 4096 nfsd <none> 1 427 W 22416360 4096 nfsd <none> 1 427 W 22416368 4096 nfsd <none> 1 427 W 22416376 4096 nfsd <none> 1 427 W 22416384 4096 nfsd <none> 1 427 W 22416392 4096 nfsd <none> 1 427 W 22416400 4096 nfsd <none> 1 427 W 22416408 4096 nfsd <none> 1 427 W 22416416 4096 nfsd <none> 1 427 W 22416424 4096 nfsd <none> 1 427 W 22416432 4096 nfsd <none> 1 427 W 22416440 4096 nfsd <none> 1 427 W 22416448 4096 nfsd <none> 1 427 W 22416456 4096 nfsd <none> 1 427 W 22039040 8192 nfsd <none> 1 427 W 22039056 8192 nfsd <none> 1 427 W 22039072 8192 nfsd <none> 1 427 W 22416464 4096 nfsd <none> 1 427 W 22416472 4096 nfsd <none> 1 427 W 22416480 4096 nfsd <none> 1 427 W 22416488 4096 nfsd <none> 1 427 W 22416496 4096 nfsd <none> 1 427 W 22416504 4096 nfsd <none> 1 427 W 22416512 4096 nfsd <none> 1 427 W 22416520 4096 nfsd <none> 1 427 W 22416528 4096 nfsd <none> 1 427 W 22416536 4096 nfsd <none> 1 427 W 22416544 4096 nfsd <none> 1 427 W 22416552 4096 nfsd <none> 1 427 W 22416560 4096 nfsd <none> 1 427 W 22416568 4096 nfsd <none> 1 427 W 22416576 4096 nfsd <none> 1 427 W 22416584 4096 nfsd <none> 1 427 W 22416592 4096 nfsd <none> 1 427 W 22416600 4096 nfsd <none> 1 427 W 22416608 4096 nfsd <none> 1 427 W 22416616 4096 nfsd <none> 1 427 W 22416624 4096 nfsd <none> 1 427 W 22416632 4096 nfsd <none> 1 427 W 22416640 12288 nfsd <none> 1 427 W 22416664 12288 nfsd <none> 1 427 W 22416688 12288 nfsd <none> 1 427 W 22416712 4096 nfsd <none> 1 427 W 22416720 4096 nfsd <none> 1 427 W 22416728 4096 nfsd <none> 1 427 W 22416736 4096 nfsd <none> 1 427 W 22416744 4096 nfsd <none> 1 427 W 22416752 4096 nfsd <none> 1 427 W 22416760 36864 nfsd <none> 0 0 W 22416832 53248 sched <none> 1 427 W 22416936 53248 nfsd <none> 1 427 W 22417040 53248 nfsd <none> 1 427 W 22417144 4096 nfsd <none> 1 427 W 22417152 4096 nfsd <none> 1 427 W 22417160 4096 nfsd <none> 1 427 W 22417168 4096 nfsd <none> 1 427 W 22417176 4096 nfsd <none> 1 427 W 22417184 4096 nfsd <none> 1 427 W 22417192 12288 nfsd <none> 1 427 W 22417216 12288 nfsd <none> 1 427 W 22417240 12288 nfsd <none> 1 427 W 22417264 4096 nfsd <none> 1 427 W 22417272 4096 nfsd <none> 1 427 W 22417280 4096 nfsd <none> In the iscsi-backed case, I get: 1 427 W 369124510 512 nfsd <none> 1 427 W 369124510 512 nfsd <none> 1 427 W 369124510 512 nfsd <none> 1 427 W 369124510 512 nfsd <none> 1 427 W 369124510 1024 nfsd <none> 1 427 W 369124564 1024 nfsd <none> 1 427 W 369124564 512 nfsd <none> 1 427 W 369124564 512 nfsd <none> 1 427 W 369124564 512 nfsd <none> 1 427 W 369124564 512 nfsd <none> 1 427 W 369124564 512 nfsd <none> 1 427 W 369124564 512 nfsd <none> 1 427 W 369124564 1024 nfsd <none> 1 427 W 369124565 1024 nfsd <none> 1 427 W 369124566 512 nfsd <none> 1 427 W 369124566 512 nfsd <none> 1 427 W 369124565 512 nfsd <none> 1 427 W 369124565 512 nfsd <none> 1 427 W 369124565 512 nfsd <none> 1 427 W 369124565 512 nfsd <none> 1 427 W 369124565 1024 nfsd <none> 1 427 W 369124566 1024 nfsd <none> 1 427 W 369124567 512 nfsd <none> 1 427 W 369124567 512 nfsd <none> 1 427 W 369124567 512 nfsd <none> 1 427 W 369124567 512 nfsd <none> 1 427 W 369124566 512 nfsd <none> 1 427 W 369124566 512 nfsd <none> 1 
427 W 369124566 1024 nfsd <none> 1 427 W 369124567 1024 nfsd <none> 1 427 W 369124568 512 nfsd <none> 1 427 W 369124568 512 nfsd <none> 1 427 W 369124568 512 nfsd <none> 1 427 W 369124568 512 nfsd <none> 1 427 W 369124568 512 nfsd <none> 1 427 W 369124568 512 nfsd <none> 1 427 W 369124567 1024 nfsd <none> 1 427 W 369124569 1024 nfsd <none> 1 427 W 369124569 512 nfsd <none> Looks to me the bulk of my problem is poor block size scheduling. Is this tuneable for either ZFS or NFS and/or can be set? On 5/5/06, Lisa Week <Lisa.Week at sun.com> wrote:> These may help: > > http://opensolaris.org/os/community/dtrace/scripts/ > Check out iosnoop.d > > http://www.solarisinternals.com/si/dtrace/index.php > Check out iotrace.d > > - Lisa > > Joe Little wrote On 05/05/06 18:59,: > > > Are there known i/o or iscsi dtrace scripts available? > > > > On 5/5/06, Spencer Shepler <spencer.shepler at sun.com> wrote: > > > >> On Fri, Joe Little wrote: > >> > On 5/5/06, Eric Schrock <eric.schrock at sun.com> wrote: > >> > >On Fri, May 05, 2006 at 03:46:08PM -0700, Joe Little wrote: > >> > >> Thanks for the tip. In the local case, I could send to the > >> > >> iSCSI-backed ZFS RAIDZ at even faster rates, with a total > >> elapsed time > >> > >> of 50seconds (17 seconds better than UFS). However, I didn''t > >> even both > >> > >> finishing the NFS client test, since it was taking a few seconds > >> > >> between multiple 27K files. So, it didn''t help NFS at all. I''m > >> > >> wondering if there is something on the NFS end that needs changing, > >> > >> no? > >> > > > >> > >Keep in mind that turning off this flag may corrupt on-disk state > >> in the > >> > >event of power loss, etc. What was the delta in the local case? 17 > >> > >seconds better than UFS, but percentage wise how much faster than the > >> > >original? > >> > > > >> > > >> > I believe it was only about 5-10% faster. I don''t have the time > >> > results off hand, just some dtrace latency reports. > >> > > >> > >NFS has the property that it does an enormous amount of synchronous > >> > >activity, which can tickle interesting pathologies. But it''s strange > >> > >that it didn''t help NFS that much. > >> > > >> > Should I also mount via async.. would this be honored on the Solaris > >> > end? The other option mentioned with similar caveats was nocto. I just > >> > tried with both, and the observed transfer rate was about 1.4k/s. It > >> > was painful deleting the 3G directory via NFS, with about 100k/s > >> > deletion rate on these 1000 files. Of course, When I went locally the > >> > delete was instantaneous. > >> > >> I wouldn''t change any of the options at the client. The issue > >> is at the server side and none of the other combinations that you > >> originally pointed out have this problem, right? Mount options at the > >> client will just muddy the waters. > >> > >> We need to understand if/what the NFS/ZFS/iscsi interaction is and why > >> it is so much worse. As Eric mentioned, there may be some interesting > >> pathologies at play here and we need to understand what they are so > >> they can be addressed. > >> > >> My suggestion is additional dtrace data collection but I don''t have > >> a specific suggestion as to how/what to track next. > >> Because of the significant additional latency, I would be looking for > >> big increases in the number of I/Os being generated to the iscsi backend > >> as compared to the local attached case. I would also look for > >> some type of serialization of I/Os that is occurring with iscsi vs. 
> >> the local attach. > >> > >> Spencer > >> > > _______________________________________________ > > nfs-discuss mailing list > > nfs-discuss at opensolaris.org > >
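The small-write pattern suggested by the traces above can also be confirmed without eyeballing individual requests, by quantizing physical I/O sizes per device and issuing process. A minimal io provider sketch (stop it with Ctrl-C once the copy has run for a while):

#!/usr/sbin/dtrace -s

/*
 * Sketch: distribution of physical I/O sizes, keyed by device and by
 * the process that issued the I/O (async writes often show up as sched).
 */

io:::start
{
        @bytes[args[1]->dev_statname, execname] = quantize(args[0]->b_bcount);
}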
Spencer Shepler
2006-May-06 04:55 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
On Fri, Joe Little wrote:> Thanks. I''m playing with it now, trying to get the most succinct test. > This is one thing that bothers me: Regardless of the backend, it > appears that a delete of a large tree (say the linux kernel) over NFS > takes forever, but its immediate when doing so locally. Is delete over > NFS really take such a different code path?Yes. As mentioned in my other email, the NFS protocol requires that operations like REMOVE, RMDIR, CREATE have the filesystem metadata written to stable storage/disk before sending a response to the client. That is not required of local access and therefore the disparity between the two. Spencer
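Since every one of those synchronous operations has to be pushed through the ZFS intent log before the server can reply, one thing worth measuring on the server is how long each ZIL commit takes against the iSCSI-backed pool. A sketch, assuming an fbt probe on zil_commit() is available in this build (treat the function name as an assumption, not a documented interface):

#!/usr/sbin/dtrace -s

/*
 * Sketch: latency of ZFS intent log commits.  Each synchronous NFS op
 * (CREATE, REMOVE, SETATTR, COMMIT, ...) waits on one of these.
 */

fbt::zil_commit:entry
{
        self->zc = timestamp;
}

fbt::zil_commit:return
/self->zc/
{
        @calls = count();
        @lat["zil_commit latency (usec)"] = quantize((timestamp - self->zc) / 1000);
        self->zc = 0;
}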
Nicolas Williams
2006-May-07 04:53 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
On Fri, May 05, 2006 at 09:48:00PM -0700, Joe Little wrote:> Here''s some sample output. Where the I write over NFS to ZFS (no > iscsi) I get high sizes for i/o: > > In the iscsi-backed case, I get:[small block sizes]> Looks to me the bulk of my problem is poor block size scheduling. Is > this tuneable for either ZFS or NFS and/or can be set?Looking at iscsiadm(1M) (and knowing little about iSCSI) I wonder what the maxburstlength/maxrecvdataseglen parameters are set to... Nico --
Joe Little
2006-May-07 06:41 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
I would normally agree, just that I get very large block sizes in the non-NFS scenario. However, for completeness, you''ll see my standard settings allow for pretty large transfers. The maxreceivedataseglen below is the default, and increasing it may help, but it still doesn''t answer why the local non-nfs case has no problems, right? First Burst Length: 65536 Immediate Data: yes Initial Ready To Transfer (R2T): yes Max Burst Length: 262144 Max Outstanding R2T: 1 Max Receive Data Segment Length: 8192 Max Connections: 1 Header Digest: NONE Data Digest: NONE On 5/6/06, Nicolas Williams <Nicolas.Williams at sun.com> wrote:> On Fri, May 05, 2006 at 09:48:00PM -0700, Joe Little wrote: > > Here''s some sample output. Where the I write over NFS to ZFS (no > > iscsi) I get high sizes for i/o: > > > > In the iscsi-backed case, I get: > > [small block sizes] > > > Looks to me the bulk of my problem is poor block size scheduling. Is > > this tuneable for either ZFS or NFS and/or can be set? > > Looking at iscsiadm(1M) (and knowing little about iSCSI) I wonder what > the maxburstlength/maxrecvdataseglen parameters are set to... > > Nico > -- >
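For what it is worth, the values above are the initiator-node defaults; what actually got negotiated with a particular target can differ. A sketch of how to check the per-target login parameters (the target IQN is a placeholder, and the exact subcommand and flags should be verified against iscsiadm(1M)):

# iscsiadm list target -v iqn.2006-05.com.example:storage.lun0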
Nicolas Williams
2006-May-08 18:02 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
On Fri, May 05, 2006 at 11:55:17PM -0500, Spencer Shepler wrote:> On Fri, Joe Little wrote: > > Thanks. I''m playing with it now, trying to get the most succinct test. > > This is one thing that bothers me: Regardless of the backend, it > > appears that a delete of a large tree (say the linux kernel) over NFS > > takes forever, but its immediate when doing so locally. Is delete over > > NFS really take such a different code path? > > Yes. As mentioned in my other email, the NFS protocol requires > that operations like REMOVE, RMDIR, CREATE have the filesystem > metadata written to stable storage/disk before sending a response > to the client. That is not required of local access and therefore > the disparity between the two.So then multi-threading rm/rmdir on the client-side would help, no? Are there/should there be async versions of creat(2)/mkdir(2)/ rmdir(2)/link(2)/unlink(2)/...? Nico --
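A rough illustration of the client-side idea, keeping several removes in flight at once from the Linux client with GNU xargs. The mount point and the parallelism are placeholders, and the parallel rmdir pass may need to be repeated if a parent directory happens to be reached before its children:

# Delete files with several rm processes in flight, then take out the
# now-empty directories, deepest first.
find /mnt/nfs/linux-2.2.26 -type f -print0 | xargs -0 -n 64 -P 8 rm -f
find /mnt/nfs/linux-2.2.26 -depth -type d -print0 | xargs -0 -n 64 -P 8 rmdir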
Joe Little
2006-May-09 04:24 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
I was asked to also snoop the iscsi end of things, trying to find
something different between the two. iscsi being relatively opaque, it
was easiest to look for differences in the packet patterns. In the local
copy to the RAIDZ, the iscsi link would show packets of 1514 bytes in
series of 5-10, with interjecting packets of 60 or 102 bytes, generally
2-4 in number. With the NFS client hitting the RAIDZ/iscsi combo, there
would be 3-5 packets of 1514 bytes on average, with 5-7 packets of 60 or
102 bytes in between. Basically, the averages swapped, and it's likely
because of a lot more metadata and/or write confirmations going on in
the NFS case.

At this point in time, I have two very important questions:

1) Are there any options available or planned to make NFS/ZFS work more
in concert to avoid this overhead, which, with many small iscsi packets
(in the iscsi case), kills performance?

2) Is iscsi-backed storage, especially the products from the StorageTek
acquisition, in the planning matrix for supported ZFS (NAS) solutions?
Also, why hasn't this combination been tested to date, since it appears
to be an Achilles heel? Again, UFS does not have this problem, nor do
other file systems on other OSes (namely XFS, JFS, etc., which I've
tested before).

On 5/8/06, Nicolas Williams <Nicolas.Williams at sun.com> wrote:
> On Fri, May 05, 2006 at 11:55:17PM -0500, Spencer Shepler wrote:
> > On Fri, Joe Little wrote:
> > > Thanks. I''m playing with it now, trying to get the most succinct test.
> > > This is one thing that bothers me: Regardless of the backend, it
> > > appears that a delete of a large tree (say the linux kernel) over NFS
> > > takes forever, but its immediate when doing so locally. Is delete over
> > > NFS really take such a different code path?
> >
> > Yes. As mentioned in my other email, the NFS protocol requires
> > that operations like REMOVE, RMDIR, CREATE have the filesystem
> > metadata written to stable storage/disk before sending a response
> > to the client. That is not required of local access and therefore
> > the disparity between the two.
>
> So then multi-threading rm/rmdir on the client-side would help, no?
>
> Are there/should there be async versions of creat(2)/mkdir(2)/
> rmdir(2)/link(2)/unlink(2)/...?
>
> Nico
> --
>
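For anyone who wants to reproduce the packet-size comparison, the capture can be taken on the Solaris server with snoop against the iSCSI session while the copy runs; the interface name and capture file below are placeholders, and 3260 is the standard iSCSI port:

# snoop -d nge0 -o /var/tmp/iscsi-during-nfs.cap port 3260
# snoop -i /var/tmp/iscsi-during-nfs.cap -V | less

The first command records the iSCSI traffic to a file during the test; the second replays it afterwards in verbose summary form so the packet lengths and pacing can be compared between the local and NFS-driven runs.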
David Weibel
2006-May-09 15:02 UTC
[dtrace-discuss] Re: Re: [nfs-discuss] Script to trace NFSv3 client operations
A theoretical problem could occur if iSCSI and NFS are sharing a 10/100
path. When NFS requests a file, ZFS might request a larger amount of I/O
from the iSCSI target (pre-caching); this pre-cache burst could then
saturate the pipe, causing NFS to slow when it returns its data.

The Solaris iSCSI initiator team hasn't done much testing of
iSCSI->ZFS->NFS performance. We have done a lot of performance testing
with iSCSI->Raw/UFS/ZFS/etc.

Are NFS and iSCSI using the same NIC path? Is it 10/100, 1G, etc.? Is
this with the Solaris iSCSI initiator or a QLogic 4XXX HBA?

It also might be interesting if you can let us know the iSCSI storage
type. Each target I have worked with has different performance
characteristics. Some have very poor latency at certain transfer sizes.

This message posted from opensolaris.org
Joe Little
2006-May-09 22:07 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
Well, I tried some suggested iscsi tunings to no avail. I did try something else though: I brought up samba. My linux 2.2 source tree copying into the ZFS volume (in other words, SMB->ZFS->iSCSI) did far much better, taking a minute to copy 102MB. And that''s from a 100MB/sec client. My original tests of around 2TB of mixed files (mostly) small, took 51 minutes in the NFS case, but took 3.5 minutes exactly in the SMB case (again, from a 100MB/sec client). I averaged almost 9MB/sec on that transfer. These are not stellar numbers, but they are far better than the low k/sec that I see with NFS. I definitely think the bug is on the NFS server end, even considering that the SMB protocol is different. On 5/8/06, Joe Little <jmlittle at gmail.com> wrote:> > I was asked to also snoop the iscsi end of things, trying to find > something different between the two. iscsi being relatively opaque, it > was easiest to find differences in the patterns. In the local copy to > RAIDZ example, the iscsi link would show packets of 1514 in length in > series of 5-10, with interjecting packets of 60 or 102, generally 2-4 > in number. In the NFS client hitting the RAIDZ/iscsi combo, the iscsi > length would have 3-5 on average 1514 length packets with 5-7 packets > of 60 or 102 in between. Basically, the averages swapped, and its > likely because of a lot more meta data and/or write confirmations > going on in the NFS case. > > At this point in time, I have two very important questions: > > 1) Is there any options available or planned to make NFS/ZFS work more > in concert to avoid this overhead, which with many small iscsi packets > (in the iscsi case) kills performance? > > 2) Is iscsi-backed storage, especially StorageTek acquired products, > in the planning matrix for supported ZFS (NAS) solutions? Also, why > hasn''t this combination been tested to date since this appears to be > an achilles heal. Again, UFS does not have this problem, nor other > file systems on other OSes (namely, XFS, JFS, etc which I''ve tested > before) > > > On 5/8/06, Nicolas Williams <Nicolas.Williams at sun.com> wrote: > > On Fri, May 05, 2006 at 11:55:17PM -0500, Spencer Shepler wrote: > > > On Fri, Joe Little wrote: > > > > Thanks. I''m playing with it now, trying to get the most succinct > test. > > > > This is one thing that bothers me: Regardless of the backend, it > > > > appears that a delete of a large tree (say the linux kernel) over > NFS > > > > takes forever, but its immediate when doing so locally. Is delete > over > > > > NFS really take such a different code path? > > > > > > Yes. As mentioned in my other email, the NFS protocol requires > > > that operations like REMOVE, RMDIR, CREATE have the filesystem > > > metadata written to stable storage/disk before sending a response > > > to the client. That is not required of local access and therefore > > > the disparity between the two. > > > > So then multi-threading rm/rmdir on the client-side would help, no? > > > > Are there/should there be async versions of creat(2)/mkdir(2)/ > > rmdir(2)/link(2)/unlink(2)/...? > > > > Nico > > -- > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20060509/b40ad1e3/attachment.html>
David Weibel
2006-May-10 22:29 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
I have a test configuration up and running internally. I'm not sure I'm
seeing the exact same issues. For this testing I'm using GRITS. I don't
know any really great FS performance tools; I tend to do my performance
testing with vdbench against raw SCSI I/O on purpose, to avoid FS caches.
(I'm more focused on iSCSI performance in the Solaris initiator.)

configuration:
ultra 20 (amd64) - gige connection via nge
solaris nevada build 40 - nondebug

Cisco48portGigESwitch
 +-> ultra20 (amd64) host
 +-> w2100z (2x amd64) host
 +-> EqualLogic iscsi array (3 x 20G LUNs mapped to ultra20)
 |
 +-> lots of other unrelated hosts, arrays, etc...

3 x 20G iSCSI LUNs -> RAW IO <==> vdbench
    perf: ~100 MB/s avg.

3 x 20G iSCSI LUNs -> UFS <==> localhost GRITS
    perf: ~76.9 MB/s avg.

3 x 20G iSCSI LUNs -> NFS <==> w2100z (2x amd64) GRITS
    perf: ~34 MB/s avg.

3 x 20G iSCSI LUNs -> ZFS raidz <==> localhost GRITS
    perf: ~90 MB/s avg. (more bursty than ufs, 29MB/s to 36MB/s)

3 x 20G iSCSI LUNs -> ZFS raidz -> NFS <==> w2100z (2x amd64) GRITS
    perf: ~15 MB/s avg.

GRITS creates 10 directories, writes 10 files in each directory, reads
back the files, verifies their contents, deletes the files and
directories, and then repeats the process. The tool is Java-based and
designed more for file system data verification than for performance
testing.

VDBENCH is used for performance testing and benchmarking.

I will hand off my configuration to the Sun NFS teams internally to
check out.

-David

Joe Little wrote:
> Well, I tried some suggested iscsi tunings to no avail. I did try
> something else though: I brought up samba. My linux 2.2 source tree
> copying into the ZFS volume (in other words, SMB->ZFS->iSCSI) did far
> much better, taking a minute to copy 102MB. And that''s from a
> 100MB/sec client.
>
> My original tests of around 2TB of mixed files (mostly) small, took 51
> minutes in the NFS case, but took 3.5 minutes exactly in the SMB case
> (again, from a 100MB/sec client). I averaged almost 9MB/sec on that
> transfer. These are not stellar numbers, but they are far better than
> the low k/sec that I see with NFS. I definitely think the bug is on
> the NFS server end, even considering that the SMB protocol is different.
>
> On 5/8/06, *Joe Little* <jmlittle at gmail.com
> <mailto:jmlittle at gmail.com>> wrote:
>
> I was asked to also snoop the iscsi end of things, trying to find
> something different between the two. iscsi being relatively opaque, it
> was easiest to find differences in the patterns. In the local copy to
> RAIDZ example, the iscsi link would show packets of 1514 in length in
> series of 5-10, with interjecting packets of 60 or 102, generally 2-4
> in number. In the NFS client hitting the RAIDZ/iscsi combo, the iscsi
> length would have 3-5 on average 1514 length packets with 5-7 packets
> of 60 or 102 in between. Basically, the averages swapped, and its
> likely because of a lot more meta data and/or write confirmations
> going on in the NFS case.
>
> At this point in time, I have two very important questions:
>
> 1) Is there any options available or planned to make NFS/ZFS work more
> in concert to avoid this overhead, which with many small iscsi packets
> (in the iscsi case) kills performance?
>
> 2) Is iscsi-backed storage, especially StorageTek acquired products,
> in the planning matrix for supported ZFS (NAS) solutions? Also, why
> hasn''t this combination been tested to date since this appears to be
> an achilles heal.
Again, UFS does not have this problem, nor other > file systems on other OSes (namely, XFS, JFS, etc which I''ve tested > before) > > > On 5/8/06, Nicolas Williams <Nicolas.Williams at sun.com > <mailto:Nicolas.Williams at sun.com>> wrote: > > On Fri, May 05, 2006 at 11:55:17PM -0500, Spencer Shepler wrote: > > > On Fri, Joe Little wrote: > > > > Thanks. I''m playing with it now, trying to get the most > succinct test. > > > > This is one thing that bothers me: Regardless of the backend, it > > > > appears that a delete of a large tree (say the linux kernel) > over NFS > > > > takes forever, but its immediate when doing so locally. Is > delete over > > > > NFS really take such a different code path? > > > > > > Yes. As mentioned in my other email, the NFS protocol requires > > > that operations like REMOVE, RMDIR, CREATE have the filesystem > > > metadata written to stable storage/disk before sending a response > > > to the client. That is not required of local access and > therefore > > > the disparity between the two. > > > > So then multi-threading rm/rmdir on the client-side would help, no? > > > > Are there/should there be async versions of creat(2)/mkdir(2)/ > > rmdir(2)/link(2)/unlink(2)/...? > > > > Nico > > -- > > > >
Joe Little
2006-May-10 22:57 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
Thanks for setting this up. To prove me wrong, could someone grab any linux-2.2.*.tar.gz and untar into the array over NFS and measure the time. You have equallogic targets, and those are probably the most performant. I''m using SBEi targets, which go up to ERL2 and tend to also be performant (80-120MB/sec) for me. as for GRITS, I''m not finding it using a simple google search. Is that generally available, and for free? On 5/10/06, David Weibel <David.Weibel at sun.com> wrote:> I have a test configuration up and running internally. I''m > not sure I''m seeing the exact same issues. For this testing > I''m using GRITS. I don''t know any really great FS performance > tools. I tend to do my performance testing with vdbench with > raw SCSI IO on purpose to avoid FS caches. (I''m more focused > on iSCSI performance in the Solaris initiator.) > > configuration: > ultra 20(amd64) - gige connection via nge > solaris nevada build 40 - nondebug > > Cisco48portGigESwitch > +-> ultra20 (amd64) host > +-> w2100z (2x amd64) host > +-> equal logic iscsi array (3 x 20G LUNs mapped to ultra20) > | > +-> lots of other unrelated hosts, arrays, etc... > > 3 x 20G iSCSI LUNs -> RAW IO <==> vdbench > pref: ~100 MB/s arg. > > 3 x 20G iSCSI LUNs -> UFS <==> localhost GRITS > perf: ~76.9 MB/s avg. > > 3 x 20G iSCSI LUNs -> NFS <==> w2100z(2x amd64) GRITS > pref: ~34 MB/s avg. > > 3 x 20G iSCSI LUNs -> ZFS raidz <==> localhost GRITS > pref: ~90 MB/s avg. (more bursty than ufs > 29MB/s to 36MB/s) > > 3 x 20G iSCSI LUNs -> ZFS raidz -> NFS <==> > w2100(2x amd64) GRITS > pref: ~15 MB/s avg. > > GRITS creates 10 directories, write 10 files in each directory, > reads back the files, verifies their contents, deletes the files > and directories and then repeats the process. This tool is java > based and designed more for file system data verification testing > not performance. > > VDBENCH used for performance testing and benchmarking. > > I will hand off my configuration for the Sun NFS teams internally > to check out. > > -David > > Joe Little wrote: > > > Well, I tried some suggested iscsi tunings to no avail. I did try > > something else though: I brought up samba. My linux 2.2 source tree > > copying into the ZFS volume (in other words, SMB->ZFS->iSCSI) did far > > much better, taking a minute to copy 102MB. And that''s from a > > 100MB/sec client. > > > > My original tests of around 2TB of mixed files (mostly) small, took 51 > > minutes in the NFS case, but took 3.5 minutes exactly in the SMB case > > (again, from a 100MB/sec client). I averaged almost 9MB/sec on that > > transfer. These are not stellar numbers, but they are far better than > > the low k/sec that I see with NFS. I definitely think the bug is on > > the NFS server end, even considering that the SMB protocol is different. > > > > > > > > > > On 5/8/06, *Joe Little* <jmlittle at gmail.com > > <mailto:jmlittle at gmail.com>> wrote: > > > > I was asked to also snoop the iscsi end of things, trying to find > > something different between the two. iscsi being relatively opaque, it > > was easiest to find differences in the patterns. In the local copy to > > RAIDZ example, the iscsi link would show packets of 1514 in length in > > series of 5-10, with interjecting packets of 60 or 102, generally 2-4 > > in number. In the NFS client hitting the RAIDZ/iscsi combo, the iscsi > > length would have 3-5 on average 1514 length packets with 5-7 packets > > of 60 or 102 in between. 
Basically, the averages swapped, and its > > likely because of a lot more meta data and/or write confirmations > > going on in the NFS case. > > > > At this point in time, I have two very important questions: > > > > 1) Is there any options available or planned to make NFS/ZFS work more > > in concert to avoid this overhead, which with many small iscsi packets > > (in the iscsi case) kills performance? > > > > 2) Is iscsi-backed storage, especially StorageTek acquired products, > > in the planning matrix for supported ZFS (NAS) solutions? Also, why > > hasn''t this combination been tested to date since this appears to be > > an achilles heal. Again, UFS does not have this problem, nor other > > file systems on other OSes (namely, XFS, JFS, etc which I''ve tested > > before) > > > > > > On 5/8/06, Nicolas Williams <Nicolas.Williams at sun.com > > <mailto:Nicolas.Williams at sun.com>> wrote: > > > On Fri, May 05, 2006 at 11:55:17PM -0500, Spencer Shepler wrote: > > > > On Fri, Joe Little wrote: > > > > > Thanks. I''m playing with it now, trying to get the most > > succinct test. > > > > > This is one thing that bothers me: Regardless of the backend, it > > > > > appears that a delete of a large tree (say the linux kernel) > > over NFS > > > > > takes forever, but its immediate when doing so locally. Is > > delete over > > > > > NFS really take such a different code path? > > > > > > > > Yes. As mentioned in my other email, the NFS protocol requires > > > > that operations like REMOVE, RMDIR, CREATE have the filesystem > > > > metadata written to stable storage/disk before sending a response > > > > to the client. That is not required of local access and > > therefore > > > > the disparity between the two. > > > > > > So then multi-threading rm/rmdir on the client-side would help, no? > > > > > > Are there/should there be async versions of creat(2)/mkdir(2)/ > > > rmdir(2)/link(2)/unlink(2)/...? > > > > > > Nico > > > -- > > > > > > > > > >
Roch Bourbonnais - Performance Engineering
2006-May-11 08:32 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
# ptime tar xf linux-2.2.22.tar ptime tar xf linux-2.2.22.tar real 50.292 user 1.019 sys 11.417 # ptime tar xf linux-2.2.22.tar ptime tar xf linux-2.2.22.tar real 56.833 user 1.056 sys 11.581 # avg time waiting for async writes is around 3ms. How much are you getting for the tar xf ? -r
Joe Little
2006-May-11 10:56 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
How did you get the average time for async writes? My client (lacking
ptime; it's Linux) comes in at 50 minutes, not 50 seconds. I'm running
it again right now for a more accurate number. I'm untarring from a
local tar file into the NFS share.

On 5/11/06, Roch Bourbonnais - Performance Engineering
<Roch.Bourbonnais at sun.com> wrote:
>
>
> # ptime tar xf linux-2.2.22.tar
> ptime tar xf linux-2.2.22.tar
>
> real       50.292
> user        1.019
> sys        11.417
> # ptime tar xf linux-2.2.22.tar
> ptime tar xf linux-2.2.22.tar
>
> real       56.833
> user        1.056
> sys        11.581
> #
>
>
> avg time waiting for async writes is around 3ms.
> How much are you getting for the tar xf ?
>
> -r
>
>
Roch Bourbonnais - Performance Engineering
2006-May-11 11:06 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
Joe Little writes: > How did you get the average time for async writes? My client (lacking > ptime, its linux) comes in at 50 minutes, not 50 seconds. I''m running > again right now for a more accurate number. I''m untarring from a local > file on the directory to the NFS share. > I used dtrace to measure times (I used the sleep time so it gives a ballpark figure). I untared with the tar file on the NFS share. Just retimed after moving the tar file to /tmp. # ptime tar xf /tmp/linux-2.2.22.tar ptime tar xf /tmp/linux-2.2.22.tar real 49.630 user 1.033 sys 11.405 -r > > On 5/11/06, Roch Bourbonnais - Performance Engineering > <Roch.Bourbonnais at sun.com> wrote: > > > > > > # ptime tar xf linux-2.2.22.tar > > ptime tar xf linux-2.2.22.tar > > > > real 50.292 > > user 1.019 > > sys 11.417 > > # ptime tar xf linux-2.2.22.tar > > ptime tar xf linux-2.2.22.tar > > > > real 56.833 > > user 1.056 > > sys 11.581 > > # > > > > > > avg time waiting for async writes is around 3ms. > > How much are you getting for the tar xf ? > > > > -r > > > >
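A generic way to get at that kind of sleep-time figure (not necessarily what was used above) is to sum off-CPU time per kernel stack for the NFS server threads and read off the top entries; a sketch:

#!/usr/sbin/dtrace -s

/*
 * Sketch: total off-CPU (sleep) time per kernel stack for nfsd threads.
 * Interrupt with Ctrl-C, then look at the few stacks with the largest sums.
 */

sched:::off-cpu
/execname == "nfsd"/
{
        self->off = timestamp;
}

sched:::on-cpu
/self->off/
{
        @slept[stack(20)] = sum(timestamp - self->off);
        self->off = 0;
}

END
{
        trunc(@slept, 10);
        printa(@slept);
}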
Joe Little
2006-May-11 13:59 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
well, here''s my first pass result: [root at csl-prime loges1]# time tar xf /root/linux-2.2.26.tar real 114m6.662s user 0m0.049s sys 0m1.354s On 5/11/06, Roch Bourbonnais - Performance Engineering <Roch.Bourbonnais at sun.com> wrote:> > Joe Little writes: > > How did you get the average time for async writes? My client (lacking > > ptime, its linux) comes in at 50 minutes, not 50 seconds. I''m running > > again right now for a more accurate number. I''m untarring from a local > > file on the directory to the NFS share. > > > > I used dtrace to measure times (I used the sleep time so it > gives a ballpark figure). > > I untared with the tar file on the NFS share. > Just retimed after moving the tar file to /tmp. > > # ptime tar xf /tmp/linux-2.2.22.tar > ptime tar xf /tmp/linux-2.2.22.tar > > real 49.630 > user 1.033 > sys 11.405 > > > -r > > > > > On 5/11/06, Roch Bourbonnais - Performance Engineering > > <Roch.Bourbonnais at sun.com> wrote: > > > > > > > > > # ptime tar xf linux-2.2.22.tar > > > ptime tar xf linux-2.2.22.tar > > > > > > real 50.292 > > > user 1.019 > > > sys 11.417 > > > # ptime tar xf linux-2.2.22.tar > > > ptime tar xf linux-2.2.22.tar > > > > > > real 56.833 > > > user 1.056 > > > sys 11.581 > > > # > > > > > > > > > avg time waiting for async writes is around 3ms. > > > How much are you getting for the tar xf ? > > > > > > -r > > > > > > > >
Spencer Shepler
2006-May-12 00:01 UTC
[zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations
So, there is a set of us that have taken this discussion off list while we collect/share more data. Once we get a little further in diagnosis, we will summarize and bring the discussion back to the alias. Spencer On Thu, Joe Little wrote:> well, here''s my first pass result: > > [root at csl-prime loges1]# time tar xf /root/linux-2.2.26.tar > > real 114m6.662s > user 0m0.049s > sys 0m1.354s > > > On 5/11/06, Roch Bourbonnais - Performance Engineering > <Roch.Bourbonnais at sun.com> wrote: > > > >Joe Little writes: > > > How did you get the average time for async writes? My client (lacking > > > ptime, its linux) comes in at 50 minutes, not 50 seconds. I''m running > > > again right now for a more accurate number. I''m untarring from a local > > > file on the directory to the NFS share. > > > > > > >I used dtrace to measure times (I used the sleep time so it > >gives a ballpark figure). > > > >I untared with the tar file on the NFS share. > >Just retimed after moving the tar file to /tmp. > > > ># ptime tar xf /tmp/linux-2.2.22.tar > >ptime tar xf /tmp/linux-2.2.22.tar > > > >real 49.630 > >user 1.033 > >sys 11.405 > > > > > >-r > > > > > > > > On 5/11/06, Roch Bourbonnais - Performance Engineering > > > <Roch.Bourbonnais at sun.com> wrote: > > > > > > > > > > > > # ptime tar xf linux-2.2.22.tar > > > > ptime tar xf linux-2.2.22.tar > > > > > > > > real 50.292 > > > > user 1.019 > > > > sys 11.417 > > > > # ptime tar xf linux-2.2.22.tar > > > > ptime tar xf linux-2.2.22.tar > > > > > > > > real 56.833 > > > > user 1.056 > > > > sys 11.581 > > > > # > > > > > > > > > > > > avg time waiting for async writes is around 3ms. > > > > How much are you getting for the tar xf ? > > > > > > > > -r > > > > > > > > > > > >