I have a couple of performance questions.

Right now, I am transferring about 200GB of data via NFS to my new Solaris
server. I started this YESTERDAY. When writing to my ZFS pool via NFS, I
notice what I believe to be slow write speeds. My client hosts range from a
MacBook Pro running Tiger to a FreeBSD 6.2 Intel server. All clients are
connected to a 10/100/1000 switch.

* Is there anything I can tune on my server?
* Is the problem with NFS?
* Do I need to provide any other information?


PERFORMANCE NUMBERS:

(The file transfer is still going on)

bash-3.00# zpool iostat 5
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         140G  1.50T     13     91  1.45M  2.60M
tank         140G  1.50T      0     89      0  1.42M
tank         140G  1.50T      0     89  1.40K  1.40M
tank         140G  1.50T      0     94      0  1.46M
tank         140G  1.50T      0     85  1.50K  1.35M
tank         140G  1.50T      0    101      0  1.47M
tank         140G  1.50T      0     90      0  1.35M
tank         140G  1.50T      0     84      0  1.37M
tank         140G  1.50T      0     90      0  1.39M
tank         140G  1.50T      0     90      0  1.43M
tank         140G  1.50T      0     91      0  1.40M
tank         140G  1.50T      0     91      0  1.43M
tank         140G  1.50T      0     90  1.60K  1.39M

bash-3.00# zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         141G  1.50T     13     91  1.45M  2.59M
  raidz1    70.3G   768G      6     45   793K  1.30M
    c3d0        -      -      3     43   357K   721K
    c4d0        -      -      3     42   404K   665K
    c6d0        -      -      3     43   404K   665K
  raidz1    70.2G   768G      6     45   692K  1.30M
    c3d1        -      -      3     42   354K   665K
    c4d1        -      -      3     42   354K   665K
    c5d0        -      -      3     43   354K   665K
----------  -----  -----  -----  -----  -----  -----

I also decided to time a local filesystem write test:

bash-3.00# time dd if=/dev/zero of=/data/testfile bs=1024k count=1000
1000+0 records in
1000+0 records out

real    0m16.490s
user    0m0.012s
sys     0m2.547s


SERVER INFORMATION:

Solaris 10 U3
Intel Pentium 4 3.0GHz
2GB RAM
Intel NIC (e1000g0)
1x 80GB ATA drive for OS
6x 300GB SATA drives for /data
  c3d0 - Sil3112 PCI SATA card port 1
  c3d1 - Sil3112 PCI SATA card port 2
  c4d0 - Sil3112 PCI SATA card port 3
  c4d1 - Sil3112 PCI SATA card port 4
  c5d0 - Onboard Intel SATA
  c6d0 - Onboard Intel SATA


DISK INFORMATION:

bash-3.00# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1d0 <DEFAULT cyl 9961 alt 2 hd 255 sec 63>
          /pci@0,0/pci-ide@1f,1/ide@0/cmdk@0,0
       1. c3d0 <Maxtor 6-XXXXXXX-0001-279.48GB>
          /pci@0,0/pci8086,244e@1e/pci-ide@3/ide@0/cmdk@0,0
       2. c3d1 <Maxtor 6-XXXXXXX-0001-279.48GB>
          /pci@0,0/pci8086,244e@1e/pci-ide@3/ide@0/cmdk@1,0
       3. c4d0 <Maxtor 6-XXXXXXX-0001-279.48GB>
          /pci@0,0/pci8086,244e@1e/pci-ide@3/ide@1/cmdk@0,0
       4. c4d1 <Maxtor 6-XXXXXXX-0001-279.48GB>
          /pci@0,0/pci8086,244e@1e/pci-ide@3/ide@1/cmdk@1,0
       5. c5d0 <Maxtor 6-XXXXXXX-0001-279.48GB>
          /pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0
       6. c6d0 <Maxtor 6-XXXXXXX-0001-279.48GB>
          /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0
Specify disk (enter its number): ^C

(XXXXXXX = drive serial number)


ZPOOL CONFIGURATION:

bash-3.00# zpool list
NAME    SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
tank   1.64T   140G  1.50T    8%  ONLINE  -

bash-3.00# zpool status
  pool: tank
 state: ONLINE
 scrub: scrub completed with 0 errors on Tue Jun 19 07:33:05 2007
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0
            c6d0    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3d1    ONLINE       0     0     0
            c4d1    ONLINE       0     0     0
            c5d0    ONLINE       0     0     0

errors: No known data errors


ZFS CONFIGURATION:

bash-3.00# zfs list
NAME        USED  AVAIL  REFER  MOUNTPOINT
tank       93.3G  1006G  32.6K  /tank
tank/data  93.3G  1006G  93.3G  /data
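
For scale, the same numbers as rough throughput (back-of-the-envelope; this
assumes the dd test wrote the full 1000 MB and that the zpool iostat
bandwidth column is bytes per second averaged over each interval):

    Local dd write:    1000 MB / 16.49 s              ~ 61 MB/s
    NFS write rate:    ~1.4 MB/s steady per zpool iostat, about 2% of local
    Gigabit ceiling:   1 Gbit/s / 8                   ~ 125 MB/s theoretical

So the pool itself is roughly 40x faster than what the NFS copy is achieving.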
oliver soell
2007-Jun-19 20:52 UTC
[zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
I have a very similar setup on OpenSolaris b62: 5 disks in a raidz, one on an
onboard SATA port and four on 3112-based ports. I have noticed that although
this card seems like a nice cheap one, it only has two channels, so therein
lies a huge performance decrease. I have thought about getting another card
so that there is no contention on the SATA channels.

-o
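
If you want to see whether the shared Sil3112 channels are actually the
bottleneck, watching per-disk service times while the copy runs should show
it. A minimal sketch, using only stock Solaris iostat (nothing ZFS-specific):

    # extended per-device stats every 5s, named devices, idle disks suppressed;
    # disks sharing a channel on the PCI card should show noticeably higher
    # wait queues / service times than the onboard disks under the same load
    iostat -xnz 5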
Correction: the SATA controller is a Silicon Image 3114, not a 3112.

On 6/19/07, Joe S <js.lists at gmail.com> wrote:
> I have a couple of performance questions.
>
> Right now, I am transferring about 200GB of data via NFS to my new Solaris
> server. I started this YESTERDAY. When writing to my ZFS pool via NFS, I
> notice what I believe to be slow write speeds.
>
> * Is there anything I can tune on my server?
> * Is the problem with NFS?
> * Do I need to provide any other information?
Joe S wrote:
> I have a couple of performance questions.
>
> Right now, I am transferring about 200GB of data via NFS to my new
> Solaris server. I started this YESTERDAY. When writing to my ZFS pool
> via NFS, I notice what I believe to be slow write speeds. My client
> hosts range from a MacBook Pro running Tiger to a FreeBSD 6.2 Intel
> server. All clients are connected to a 10/100/1000 switch.
>
> * Is there anything I can tune on my server?
> * Is the problem with NFS?
> * Do I need to provide any other information?

If you have a lot of small files, doing this sort of thing over NFS can be
pretty painful... for a speedup, consider:

(cd <oldroot on client>; tar cf - .) | ssh joes at server '(cd <newroot on server>; tar xf -)'

- Bart

--
Bart Smaalders                  Solaris Kernel Performance
barts at cyber.eng.sun.com       http://blogs.sun.com/barts
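
Spelled out with concrete placeholder paths (the paths below are hypothetical,
user/host as in Bart's example), the idea is to ship one continuous tar stream
instead of paying an NFS round trip per small file:

    # /export/olddata and /data are placeholders for this sketch
    cd /export/olddata
    tar cf - . | ssh joes@server '(cd /data && tar xf -)'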
Mario Goebbels
2007-Jun-20 09:49 UTC
[zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
> Correction: the SATA controller is a Silicon Image 3114, not a 3112.

Do these slow speeds only appear when writing via NFS, or generally in all
scenarios? Just asking, because Solaris' ata driver doesn't initialize
settings like block mode, prefetch and such on IDE/SATA drives (that is, if
ata applies here with that chipset).

-mg
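
One way to answer the NFS-only-or-everywhere question is to time raw
sequential reads straight off the disks, comparing a drive on the Silicon
Image card against one on the onboard ports. A rough sketch (read-only, but
the numbers are only meaningful if the pool is otherwise idle; device names
are the ones from the original post):

    # raw whole-disk device (p0) on Solaris x86; bypasses ZFS entirely
    dd if=/dev/rdsk/c3d0p0 of=/dev/null bs=1024k count=1000   # Sil card port
    dd if=/dev/rdsk/c5d0p0 of=/dev/null bs=1024k count=1000   # onboard port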
After researching this further, I found that there are some known
performance issues with NFS + ZFS. I tried transferring files via SMB, and
got write speeds averaging 25MB/s.

So I will have my UNIX systems use SMB to write files to my Solaris server.
This seems weird, but it's fast. I'm sure Sun is working on fixing this. I
can't imagine running a Sun box without NFS.

On 6/20/07, Mario Goebbels <me at tomservo.cc> wrote:
> Do these slow speeds only appear when writing via NFS, or generally in
> all scenarios? Just asking, because Solaris' ata driver doesn't
> initialize settings like block mode, prefetch and such on IDE/SATA
> drives (that is, if ata applies here with that chipset).
>
> -mg
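
If you want to confirm that the NFS-vs-SMB gap comes from NFS's synchronous
commits hitting the ZIL (rather than from the network or the disks), the
commonly suggested diagnostic at the time was to disable the ZIL temporarily
and re-run the copy. Strictly a test, since it means an NFS COMMIT no longer
guarantees the data is on stable storage:

    # /etc/system -- DIAGNOSTIC ONLY; remove the line and reboot after testing.
    # With this set, data already acknowledged to NFS clients can be lost if
    # the server crashes.
    set zfs:zil_disable = 1

If throughput jumps toward the SMB numbers with this in place, the gap is
per-operation sync latency rather than raw bandwidth.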
Joe S writes:
> After researching this further, I found that there are some known
> performance issues with NFS + ZFS. I tried transferring files via SMB,
> and got write speeds averaging 25MB/s.
>
> So I will have my UNIX systems use SMB to write files to my Solaris
> server. This seems weird, but it's fast. I'm sure Sun is working on
> fixing this. I can't imagine running a Sun box without NFS.

Call me picky, but:

There is no NFS over ZFS issue (IMO/FWIW).

There is a ZFS over NVRAM issue; well understood (not related to NFS).

There is a Samba vs NFS issue; not well understood (not related to ZFS).
This last bullet is probably better suited for nfs-discuss at opensolaris.org.

If ZFS is talking to a storage array with NVRAM, then we have an issue (not
related to NFS) described by:

  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6462690
  6546690 is not the number; the bug is:
  6462690 sd driver should set SYNC_NV bit when issuing SYNCHRONIZE CACHE
          to SBC-2 devices

The above bug/rfe lies in the sd driver but is very much triggered by ZFS,
particularly when running NFS, but not only. It affects only NVRAM-based
storage and is being worked on.

If ZFS is talking to a JBOD, then the slowness is a characteristic of NFS
(not related to ZFS). So FWIW on JBOD, there is no ZFS+NFS "issue" in the
sense that I don't know how we could change ZFS to be significantly better
at NFS, nor do I know how to change NFS in a way that would help
_particularly_ ZFS. That doesn't mean there is none; I just don't know about
them. So please ping me if you highlight such an issue. And if one replaces
ZFS with some other filesystem and gets a large speedup, I'm interested
(make sure the other filesystem either runs with the write cache off, or
flushes it on NFS commit).

So that leaves us with a Samba vs NFS issue (not related to ZFS). We know
that NFS can create files _at most_ at one file per server I/O latency.
Samba appears better, and this is what we need to investigate. It might be
better in a way that NFS can borrow (maybe through some better NFSv4
delegation code), or Samba might be better by being careless with data. If
we find such an NFS improvement it will help all backend filesystems, not
just ZFS.

Which is why I say: there is no NFS over ZFS issue.

-r
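
To put rough numbers on the "one file per server I/O latency" point
(assuming ~10 ms for a synchronous write on commodity SATA drives, which is
an assumption rather than a measurement from this thread):

    1 commit per ~10 ms sync write   ->  ~100 synchronous operations/second
    observed in zpool iostat         ->  ~90 write ops/s at ~1.4 MB/s
    1.4 MB/s / 90 ops/s              ->  ~16 KB moved per synchronous operation

which is consistent with the transfer being latency-bound rather than
bandwidth-bound.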
Brian Hechinger
2007-Jun-23 01:47 UTC
[zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
On Thu, Jun 21, 2007 at 11:36:53AM +0200, Roch - PAE wrote:
> code) or Samba might be better by being careless with data.

Well, it *is* trying to be a Microsoft replacement. Gotta get it right, you
know? ;)

-brian

--
"Perl can be fast and elegant as much as J2EE can be fast and elegant. In
the hands of a skilled artisan, it can and does happen; it's just that most
of the shit out there is built by people who'd be better suited to making
sure that my burger is cooked thoroughly."  -- Jonathan Patschke
Thomas Garner
2007-Jun-23 15:59 UTC
[zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
So is it expected behavior on my Nexenta alpha 7 server for Sun's nfsd to
stop responding after 2 hours of running a bittorrent client over NFSv4 from
a Linux client, causing ZFS snapshots to hang and requiring a hard reboot to
get the world back in order?

Thomas

> There is no NFS over ZFS issue (IMO/FWIW).
>
> If ZFS is talking to a JBOD, then the slowness is a characteristic of NFS
> (not related to ZFS).
>
> Which is why I say: there is no NFS over ZFS issue.
Paul Fisher
2007-Jun-23 17:05 UTC
[zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Thomas Garner
>
> So is it expected behavior on my Nexenta alpha 7 server for Sun's nfsd to
> stop responding after 2 hours of running a bittorrent client over NFSv4
> from a Linux client, causing ZFS snapshots to hang and requiring a hard
> reboot to get the world back in order?

We have seen this behavior, but it appears to be entirely related to the
hardware having the "Intel IPMI" stuff swallow up the NFS traffic on port
623 directly in the network hardware, so it never gets to the host.

http://blogs.sun.com/shepler/entry/port_623_or_the_mount

--

paul
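
A quick way to check for that particular failure mode (generic diagnostics,
not taken from the posts above) is to see whether the NFS connection happens
to be using port 623 at either end, and whether such packets ever reach the
Solaris host at all:

    # on the Solaris server (e1000g0 is the interface from the original post):
    # if the client is sending on port 623 and nothing ever shows up here,
    # the NIC/BMC firmware is likely intercepting the traffic
    snoop -d e1000g0 port 623

    # on the Linux client: rough check for a connection using port 623
    netstat -an | grep 623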
Thomas Garner
2007-Jun-24 17:05 UTC
[zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
> We have seen this behavior, but it appears to be entirely related to the
> hardware having the "Intel IPMI" stuff swallow up the NFS traffic on port
> 623 directly in the network hardware, so it never gets to the host.
>
> http://blogs.sun.com/shepler/entry/port_623_or_the_mount

Unfortunately, this NFS hang occurs across 3 separate machines, none of
which should have this IPMI issue. It did spur me on to dig a little deeper,
though, so thanks for the encouragement that all may not be well.

Can anyone debug this? Remember that this is Nexenta Alpha 7, so it should
be b61. nfsd is totally hung (rpc timeouts) and zfs would be having problems
taking snapshots, if I hadn't disabled the hourly snapshots.

Thanks!
Thomas

[tgarner at flyingcows ~]$ rpcinfo -t filer0 nfs
rpcinfo: RPC: Timed out
program 100003 version 0 is not available

echo "::pgrep nfsd | ::walk thread | ::findstack -v" | mdb -k

stack pointer for thread 821cda00: 822d6e28
  822d6e5c swtch+0x17d()
  822d6e8c cv_wait_sig_swap_core+0x13f(8b8a9232, 8b8a9200, 0)
  822d6ea4 cv_wait_sig_swap+0x13(8b8a9232, 8b8a9200)
  822d6ee0 cv_waituntil_sig+0x100(8b8a9232, 8b8a9200, 0)
  822d6f44 poll_common+0x3e1(8069480, a, 0, 0)
  822d6f84 pollsys+0x7c()
  822d6fac sys_sysenter+0x102()
stack pointer for thread 821d2e00: 8c279d98
  8c279dcc swtch+0x17d()
  8c279df4 cv_wait_sig+0x123(8988796e, 89887970)
  8c279e2c svc_wait+0xaa(1)
  8c279f84 nfssys+0x423()
  8c279fac sys_sysenter+0x102()
stack pointer for thread a9f88800: 8c92e218
  8c92e244 swtch+0x17d()
  8c92e254 cv_wait+0x4e(8a4169ea, 8a4169e0)
  8c92e278 mv_wait_for_dma+0x32()
  8c92e2a4 mv_start+0x278(88252c78, 89833498)
  8c92e2d4 sata_hba_start+0x79(8987d23c, 8c92e304)
  8c92e308 sata_txlt_synchronize_cache+0xb7(8987d23c)
  8c92e334 sata_scsi_start+0x1b7(8987d1e4, 8987d1e0)
  8c92e368 scsi_transport+0x52(8987d1e0)
  8c92e3a4 sd_start_cmds+0x28a(8a2710c0, 0)
  8c92e3c0 sd_core_iostart+0x158(18, 8a2710c0, 8da3be70)
  8c92e3f8 sd_uscsi_strategy+0xe8(8da3be70)
  8c92e414 sd_send_scsi_SYNCHRONIZE_CACHE+0xd4(8a2710c0, 8c50074c)
  8c92e4b0 sdioctl+0x48e(1ac0080, 422, 8c50074c, 80100000, 883cee68, 0)
  8c92e4dc cdev_ioctl+0x2e(1ac0080, 422, 8c50074c, 80100000, 883cee68, 0)
  8c92e504 ldi_ioctl+0xa4(8a671700, 422, 8c50074c, 80100000, 883cee68, 0)
  8c92e544 vdev_disk_io_start+0x187(8c500580)
  8c92e554 vdev_io_start+0x18(8c500580)
  8c92e580 zio_vdev_io_start+0x142(8c500580)
  8c92e59c zio_next_stage+0xaa(8c500580)
  8c92e5b0 zio_ready+0x136(8c500580)
  8c92e5cc zio_next_stage+0xaa(8c500580)
  8c92e5ec zio_wait_for_children+0x46(8c500580, 1, 8c50076c)
  8c92e600 zio_wait_children_ready+0x18(8c500580)
  8c92e614 zio_next_stage_async+0xac(8c500580)
  8c92e624 zio_nowait+0xe(8c500580)
  8c92e660 zio_ioctl+0x94(9c6f8300, 89557c80, 89556400, 422, 0, 0)
  8c92e694 zil_flush_vdev+0x54(89557c80, 0, 0, 8c92e6e0, 9c6f8500)
  8c92e6e4 zil_flush_vdevs+0x6b(8bbe46c0)
  8c92e734 zil_commit_writer+0x35f(8bbe46c0, 3497c, 0, 4af5, 0)
  8c92e774 zil_commit+0x96(8bbe46c0, ffffffff, ffffffff, 4af5, 0)
  8c92e7e8 zfs_putpage+0x1e4(8c8ab480, 0, 0, 0, 0, 8c6c75c0)
  8c92e824 vhead_putpage+0x95(8c8ab480, 0, 0, 0, 0, 8c6c75c0)
  8c92e86c fop_putpage+0x27(8c8ab480, 0, 0, 0, 0, 8c6c75c0)
  8c92e91c rfs4_op_commit+0x153(82141dd4, b28c3100, 8c92ed8c, 8c92e948)
  8c92ea48 rfs4_compound+0x1ce(8c92ead0, 8c92ea7c, 0, 8c92ed8c, 0)
  8c92eaac rfs4_dispatch+0x65(8bf9b248, 8c92ed8c, b28c5a40, 8c92ead0)
  8c92ed10 common_dispatch+0x6b0(8c92ed8c, b28c5a40, 2, 4, 8bf9c01c, 8bf9b1f0)
  8c92ed34 rfs_dispatch+0x1f(8c92ed8c, b28c5a40)
  8c92edc4 svc_getreq+0x158(b28c5a40, 842952a0)
  8c92ee0c svc_run+0x146(898878e8)
  8c92ee2c svc_do_run+0x6e(1)
  8c92ef84 nfssys+0x3fb()
  8c92efac sys_sysenter+0x102()

<snipping out a bunch of other threads>
Sorry about that; looks like you've hit this:

  6546683 marvell88sx driver misses wakeup for mv_empty_cv
  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6546683

Fixed in snv_64.

-r

Thomas Garner writes:
> Unfortunately, this NFS hang occurs across 3 separate machines, none of
> which should have this IPMI issue.
>
> Can anyone debug this? Remember that this is Nexenta Alpha 7, so it
> should be b61. nfsd is totally hung (rpc timeouts) and zfs would be
> having problems taking snapshots, if I hadn't disabled the hourly
> snapshots.
Thomas Garner
2007-Jun-25 15:59 UTC
[zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
Thanks, Roch! Much appreciated knowing what the problem is and that a fix is
in a forthcoming release.

Thomas

On 6/25/07, Roch - PAE <Roch.Bourbonnais at sun.com> wrote:
>
> Sorry about that; looks like you've hit this:
>
>   6546683 marvell88sx driver misses wakeup for mv_empty_cv
>   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6546683
>
> Fixed in snv_64.
> -r
Regarding the bold statement:

  There is no NFS over ZFS issue

What I mean here is that, if you _do_ encounter a performance pathology not
linked to the NVRAM storage/cache-flush issue, then you _should_ complain,
or better, get someone to do an analysis of the situation. One should not
assume that some observed pathological performance of NFS/ZFS is widespread
and due to some known ZFS issue about to be fixed.

To be sure, there are lots of performance opportunities that will provide
incremental improvements, the most significant of which, the "ZFS Separate
Intent Log", was just integrated in Nevada. This opens up the field for
further NFS/ZFS performance investigations.

But the data that got this thread started seems to highlight an NFS vs
Samba opportunity, something we need to look into. Otherwise I don't think
the data produced so far has highlighted any specific NFS/ZFS issue. There
are certainly opportunities for incremental performance improvements but,
to the best of my knowledge, outside the NVRAM/flush issue on certain
storage:

  There are no known prevalent NFS over ZFS performance pathologies on record.

-r

Ref:
http://mail.opensolaris.org/pipermail/zfs-discuss/2007-June/thread.html#29026
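
For anyone wanting to try the separate intent log once on a build that has
it, the pool-side change is a one-liner; the device name below is purely
hypothetical (ideally a fast, low-latency or NVRAM-backed device):

    # add a dedicated log (slog) device to the pool; synchronous NFS commits
    # then land on this device instead of the raidz data disks
    zpool add tank log c7d0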
> So that leaves us with a Samba vs NFS issue (not related to ZFS). We know
> that NFS can create files _at most_ at one file per server I/O latency.
> Samba appears better, and this is what we need to investigate. It might be
> better in a way that NFS can borrow (maybe through some better NFSv4
> delegation code), or Samba might be better by being careless with data. If
> we find such an NFS improvement it will help all backend filesystems, not
> just ZFS.

Just curious: was this nfs-samba ghost ever caught and sent back to the
spirit realm? :)