I have a couple of performance questions.

Right now, I am transferring about 200GB of data via NFS to my new Solaris
server. I started this YESTERDAY. When writing to my ZFS pool via NFS, I
notice what I believe to be slow write speeds. My client hosts range from a
MacBook Pro running Tiger to a FreeBSD 6.2 Intel server. All clients are
connected to a 10/100/1000 switch.

* Is there anything I can tune on my server?
* Is the problem with NFS?
* Do I need to provide any other information?


PERFORMANCE NUMBERS:

(The file transfer is still going on)

bash-3.00# zpool iostat 5
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         140G  1.50T     13     91  1.45M  2.60M
tank         140G  1.50T      0     89      0  1.42M
tank         140G  1.50T      0     89  1.40K  1.40M
tank         140G  1.50T      0     94      0  1.46M
tank         140G  1.50T      0     85  1.50K  1.35M
tank         140G  1.50T      0    101      0  1.47M
tank         140G  1.50T      0     90      0  1.35M
tank         140G  1.50T      0     84      0  1.37M
tank         140G  1.50T      0     90      0  1.39M
tank         140G  1.50T      0     90      0  1.43M
tank         140G  1.50T      0     91      0  1.40M
tank         140G  1.50T      0     91      0  1.43M
tank         140G  1.50T      0     90  1.60K  1.39M

bash-3.00# zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         141G  1.50T     13     91  1.45M  2.59M
  raidz1    70.3G   768G      6     45   793K  1.30M
    c3d0        -      -      3     43   357K   721K
    c4d0        -      -      3     42   404K   665K
    c6d0        -      -      3     43   404K   665K
  raidz1    70.2G   768G      6     45   692K  1.30M
    c3d1        -      -      3     42   354K   665K
    c4d1        -      -      3     42   354K   665K
    c5d0        -      -      3     43   354K   665K
----------  -----  -----  -----  -----  -----  -----

I also decided to time a local filesystem write test:

bash-3.00# time dd if=/dev/zero of=/data/testfile bs=1024k count=1000
1000+0 records in
1000+0 records out

real    0m16.490s
user    0m0.012s
sys     0m2.547s


SERVER INFORMATION:

Solaris 10 U3
Intel Pentium 4 3.0GHz
2GB RAM
Intel NIC (e1000g0)
1x 80GB ATA drive for OS
6x 300GB SATA drives for /data
  c3d0 - Sil3112 PCI SATA card port 1
  c3d1 - Sil3112 PCI SATA card port 2
  c4d0 - Sil3112 PCI SATA card port 3
  c4d1 - Sil3112 PCI SATA card port 4
  c5d0 - Onboard Intel SATA
  c6d0 - Onboard Intel SATA


DISK INFORMATION:

bash-3.00# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1d0 <DEFAULT cyl 9961 alt 2 hd 255 sec 63>
          /pci@0,0/pci-ide@1f,1/ide@0/cmdk@0,0
       1. c3d0 <Maxtor 6-XXXXXXX-0001-279.48GB>
          /pci@0,0/pci8086,244e@1e/pci-ide@3/ide@0/cmdk@0,0
       2. c3d1 <Maxtor 6-XXXXXXX-0001-279.48GB>
          /pci@0,0/pci8086,244e@1e/pci-ide@3/ide@0/cmdk@1,0
       3. c4d0 <Maxtor 6-XXXXXXX-0001-279.48GB>
          /pci@0,0/pci8086,244e@1e/pci-ide@3/ide@1/cmdk@0,0
       4. c4d1 <Maxtor 6-XXXXXXX-0001-279.48GB>
          /pci@0,0/pci8086,244e@1e/pci-ide@3/ide@1/cmdk@1,0
       5. c5d0 <Maxtor 6-XXXXXXX-0001-279.48GB>
          /pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0
       6. c6d0 <Maxtor 6-XXXXXXX-0001-279.48GB>
          /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0
Specify disk (enter its number): ^C

(XXXXXXX = drive serial number)


ZPOOL CONFIGURATION:

bash-3.00# zpool list
NAME    SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
tank   1.64T   140G  1.50T    8%  ONLINE  -

bash-3.00# zpool status
  pool: tank
 state: ONLINE
 scrub: scrub completed with 0 errors on Tue Jun 19 07:33:05 2007
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0
            c6d0    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3d1    ONLINE       0     0     0
            c4d1    ONLINE       0     0     0
            c5d0    ONLINE       0     0     0

errors: No known data errors


ZFS CONFIGURATION:

bash-3.00# zfs list
NAME        USED  AVAIL  REFER  MOUNTPOINT
tank       93.3G  1006G  32.6K  /tank
tank/data  93.3G  1006G  93.3G  /data
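
For scale, the same numbers as rough throughput (back-of-the-envelope; this
assumes the dd test wrote the full 1000 MB and that the zpool iostat
bandwidth column is bytes per second averaged over each interval):

    Local dd write:    1000 MB / 16.49 s              ~ 61 MB/s
    NFS write rate:    ~1.4 MB/s steady per zpool iostat, about 2% of local
    Gigabit ceiling:   1 Gbit/s / 8                   ~ 125 MB/s theoretical

So the pool itself is roughly 40x faster than what the NFS copy is achieving.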
oliver soell
2007-Jun-19 20:52 UTC
[zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
I have a very similar setup on OpenSolaris b62: 5 disks in a raidz, one on an
onboard SATA port and four on 3112-based ports. I have noticed that although
this card seems like a nice cheap one, it only has two channels, so therein
lies a huge performance decrease. I have thought about getting another card
so that there is no contention on the SATA channels.

-o
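
If you want to see whether the shared Sil3112 channels are actually the
bottleneck, watching per-disk service times while the copy runs should show
it. A minimal sketch, using only stock Solaris iostat (nothing ZFS-specific):

    # extended per-device stats every 5s, named devices, idle disks suppressed;
    # disks sharing a channel on the PCI card should show noticeably higher
    # wait queues / service times than the onboard disks under the same load
    iostat -xnz 5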
Correction: the SATA controller is a Silicon Image 3114, not a 3112.

On 6/19/07, Joe S <js.lists at gmail.com> wrote:
> I have a couple of performance questions.
>
> Right now, I am transferring about 200GB of data via NFS to my new Solaris
> server. I started this YESTERDAY. When writing to my ZFS pool via NFS, I
> notice what I believe to be slow write speeds.
>
> * Is there anything I can tune on my server?
> * Is the problem with NFS?
> * Do I need to provide any other information?
Joe S wrote:
> I have a couple of performance questions.
>
> Right now, I am transferring about 200GB of data via NFS to my new
> Solaris server. I started this YESTERDAY. When writing to my ZFS pool
> via NFS, I notice what I believe to be slow write speeds. My client
> hosts range from a MacBook Pro running Tiger to a FreeBSD 6.2 Intel
> server. All clients are connected to a 10/100/1000 switch.
>
> * Is there anything I can tune on my server?
> * Is the problem with NFS?
> * Do I need to provide any other information?

If you have a lot of small files, doing this sort of thing over NFS can be
pretty painful... for a speedup, consider:

(cd <oldroot on client>; tar cf - .) | ssh joes at server '(cd <newroot on server>; tar xf -)'

- Bart

--
Bart Smaalders                  Solaris Kernel Performance
barts at cyber.eng.sun.com       http://blogs.sun.com/barts
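
Spelled out with concrete placeholder paths (the paths below are hypothetical,
user/host as in Bart's example), the idea is to ship one continuous tar stream
instead of paying an NFS round trip per small file:

    # /export/olddata and /data are placeholders for this sketch
    cd /export/olddata
    tar cf - . | ssh joes@server '(cd /data && tar xf -)'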
Mario Goebbels
2007-Jun-20 09:49 UTC
[zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
> Correction: the SATA controller is a Silicon Image 3114, not a 3112.

Do these slow speeds only appear when writing via NFS, or generally in all
scenarios? Just asking, because Solaris' ata driver doesn't initialize
settings like block mode, prefetch and such on IDE/SATA drives (that is, if
ata applies here with that chipset).

-mg
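
One way to answer the NFS-only-or-everywhere question is to time raw
sequential reads straight off the disks, comparing a drive on the Silicon
Image card against one on the onboard ports. A rough sketch (read-only, but
the numbers are only meaningful if the pool is otherwise idle; device names
are the ones from the original post):

    # raw whole-disk device (p0) on Solaris x86; bypasses ZFS entirely
    dd if=/dev/rdsk/c3d0p0 of=/dev/null bs=1024k count=1000   # Sil card port
    dd if=/dev/rdsk/c5d0p0 of=/dev/null bs=1024k count=1000   # onboard port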
After researching this further, I found that there are some known
performance issues with NFS + ZFS. I tried transferring files via SMB, and
got write speeds averaging 25MB/s.

So I will have my UNIX systems use SMB to write files to my Solaris server.
This seems weird, but it's fast. I'm sure Sun is working on fixing this. I
can't imagine running a Sun box without NFS.

On 6/20/07, Mario Goebbels <me at tomservo.cc> wrote:
> Do these slow speeds only appear when writing via NFS, or generally in
> all scenarios? Just asking, because Solaris' ata driver doesn't
> initialize settings like block mode, prefetch and such on IDE/SATA
> drives (that is, if ata applies here with that chipset).
>
> -mg
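
If you want to confirm that the NFS-vs-SMB gap comes from NFS's synchronous
commits hitting the ZIL (rather than from the network or the disks), the
commonly suggested diagnostic at the time was to disable the ZIL temporarily
and re-run the copy. Strictly a test, since it means an NFS COMMIT no longer
guarantees the data is on stable storage:

    # /etc/system -- DIAGNOSTIC ONLY; remove the line and reboot after testing.
    # With this set, data already acknowledged to NFS clients can be lost if
    # the server crashes.
    set zfs:zil_disable = 1

If throughput jumps toward the SMB numbers with this in place, the gap is
per-operation sync latency rather than raw bandwidth.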
Joe S writes:
> After researching this further, I found that there are some known
> performance issues with NFS + ZFS. I tried transferring files via SMB,
> and got write speeds averaging 25MB/s.
>
> So I will have my UNIX systems use SMB to write files to my Solaris
> server. This seems weird, but it's fast. I'm sure Sun is working on
> fixing this. I can't imagine running a Sun box without NFS.

Call me picky, but:

There is no NFS over ZFS issue (IMO/FWIW).

There is a ZFS over NVRAM issue; well understood (not related to NFS).

There is a Samba vs NFS issue; not well understood (not related to ZFS).
This last bullet is probably better suited for nfs-discuss at opensolaris.org.

If ZFS is talking to a storage array with NVRAM, then we have an issue (not
related to NFS) described by:

  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6462690
  6546690 is not the number; the bug is:
  6462690 sd driver should set SYNC_NV bit when issuing SYNCHRONIZE CACHE
          to SBC-2 devices

The above bug/rfe lies in the sd driver but is very much triggered by ZFS,
particularly when running NFS, but not only. It affects only NVRAM-based
storage and is being worked on.

If ZFS is talking to a JBOD, then the slowness is a characteristic of NFS
(not related to ZFS). So FWIW on JBOD, there is no ZFS+NFS "issue" in the
sense that I don't know how we could change ZFS to be significantly better
at NFS, nor do I know how to change NFS in a way that would help
_particularly_ ZFS. That doesn't mean there is none; I just don't know about
them. So please ping me if you highlight such an issue. And if one replaces
ZFS with some other filesystem and gets a large speedup, I'm interested
(make sure the other filesystem either runs with the write cache off, or
flushes it on NFS commit).

So that leaves us with a Samba vs NFS issue (not related to ZFS). We know
that NFS can create files _at most_ at one file per server I/O latency.
Samba appears better, and this is what we need to investigate. It might be
better in a way that NFS can borrow (maybe through some better NFSv4
delegation code), or Samba might be better by being careless with data. If
we find such an NFS improvement it will help all backend filesystems, not
just ZFS.

Which is why I say: there is no NFS over ZFS issue.

-r
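
To put rough numbers on the "one file per server I/O latency" point
(assuming ~10 ms for a synchronous write on commodity SATA drives, which is
an assumption rather than a measurement from this thread):

    1 commit per ~10 ms sync write   ->  ~100 synchronous operations/second
    observed in zpool iostat         ->  ~90 write ops/s at ~1.4 MB/s
    1.4 MB/s / 90 ops/s              ->  ~16 KB moved per synchronous operation

which is consistent with the transfer being latency-bound rather than
bandwidth-bound.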
Brian Hechinger
2007-Jun-23 01:47 UTC
[zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
On Thu, Jun 21, 2007 at 11:36:53AM +0200, Roch - PAE wrote:
> code) or Samba might be better by being careless with data.

Well, it *is* trying to be a Microsoft replacement. Gotta get it right, you
know? ;)

-brian

--
"Perl can be fast and elegant as much as J2EE can be fast and elegant. In
the hands of a skilled artisan, it can and does happen; it's just that most
of the shit out there is built by people who'd be better suited to making
sure that my burger is cooked thoroughly."  -- Jonathan Patschke
Thomas Garner
2007-Jun-23 15:59 UTC
[zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
So is it expected behavior on my Nexenta alpha 7 server for Sun's nfsd to
stop responding after 2 hours of running a bittorrent client over NFSv4 from
a Linux client, causing ZFS snapshots to hang and requiring a hard reboot to
get the world back in order?

Thomas

> There is no NFS over ZFS issue (IMO/FWIW).
>
> If ZFS is talking to a JBOD, then the slowness is a characteristic of NFS
> (not related to ZFS).
>
> Which is why I say: there is no NFS over ZFS issue.
Paul Fisher
2007-Jun-23 17:05 UTC
[zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
> From: zfs-discuss-bounces at opensolaris.org
> [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Thomas Garner
>
> So is it expected behavior on my Nexenta alpha 7 server for Sun's nfsd to
> stop responding after 2 hours of running a bittorrent client over NFSv4
> from a Linux client, causing ZFS snapshots to hang and requiring a hard
> reboot to get the world back in order?

We have seen this behavior, but it appears to be entirely related to the
hardware having the "Intel IPMI" stuff swallow up the NFS traffic on port
623 directly in the network hardware, so it never gets to the host.

http://blogs.sun.com/shepler/entry/port_623_or_the_mount

--

paul
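
A quick way to check for that particular failure mode (generic diagnostics,
not taken from the posts above) is to see whether the NFS connection happens
to be using port 623 at either end, and whether such packets ever reach the
Solaris host at all:

    # on the Solaris server (e1000g0 is the interface from the original post):
    # if the client is sending on port 623 and nothing ever shows up here,
    # the NIC/BMC firmware is likely intercepting the traffic
    snoop -d e1000g0 port 623

    # on the Linux client: rough check for a connection using port 623
    netstat -an | grep 623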
Thomas Garner
2007-Jun-24 17:05 UTC
[zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
> We have seen this behavior, but it appears to be entirely related to the
> hardware having the "Intel IPMI" stuff swallow up the NFS traffic on port
> 623 directly in the network hardware, so it never gets to the host.
>
> http://blogs.sun.com/shepler/entry/port_623_or_the_mount

Unfortunately, this NFS hang occurs across 3 separate machines, none of
which should have this IPMI issue. It did spur me on to dig a little deeper,
though, so thanks for the encouragement that all may not be well.

Can anyone debug this? Remember that this is Nexenta Alpha 7, so it should
be b61. nfsd is totally hung (rpc timeouts) and zfs would be having problems
taking snapshots, if I hadn't disabled the hourly snapshots.

Thanks!
Thomas

[tgarner at flyingcows ~]$ rpcinfo -t filer0 nfs
rpcinfo: RPC: Timed out
program 100003 version 0 is not available

echo "::pgrep nfsd | ::walk thread | ::findstack -v" | mdb -k

stack pointer for thread 821cda00: 822d6e28
  822d6e5c swtch+0x17d()
  822d6e8c cv_wait_sig_swap_core+0x13f(8b8a9232, 8b8a9200, 0)
  822d6ea4 cv_wait_sig_swap+0x13(8b8a9232, 8b8a9200)
  822d6ee0 cv_waituntil_sig+0x100(8b8a9232, 8b8a9200, 0)
  822d6f44 poll_common+0x3e1(8069480, a, 0, 0)
  822d6f84 pollsys+0x7c()
  822d6fac sys_sysenter+0x102()
stack pointer for thread 821d2e00: 8c279d98
  8c279dcc swtch+0x17d()
  8c279df4 cv_wait_sig+0x123(8988796e, 89887970)
  8c279e2c svc_wait+0xaa(1)
  8c279f84 nfssys+0x423()
  8c279fac sys_sysenter+0x102()
stack pointer for thread a9f88800: 8c92e218
  8c92e244 swtch+0x17d()
  8c92e254 cv_wait+0x4e(8a4169ea, 8a4169e0)
  8c92e278 mv_wait_for_dma+0x32()
  8c92e2a4 mv_start+0x278(88252c78, 89833498)
  8c92e2d4 sata_hba_start+0x79(8987d23c, 8c92e304)
  8c92e308 sata_txlt_synchronize_cache+0xb7(8987d23c)
  8c92e334 sata_scsi_start+0x1b7(8987d1e4, 8987d1e0)
  8c92e368 scsi_transport+0x52(8987d1e0)
  8c92e3a4 sd_start_cmds+0x28a(8a2710c0, 0)
  8c92e3c0 sd_core_iostart+0x158(18, 8a2710c0, 8da3be70)
  8c92e3f8 sd_uscsi_strategy+0xe8(8da3be70)
  8c92e414 sd_send_scsi_SYNCHRONIZE_CACHE+0xd4(8a2710c0, 8c50074c)
  8c92e4b0 sdioctl+0x48e(1ac0080, 422, 8c50074c, 80100000, 883cee68, 0)
  8c92e4dc cdev_ioctl+0x2e(1ac0080, 422, 8c50074c, 80100000, 883cee68, 0)
  8c92e504 ldi_ioctl+0xa4(8a671700, 422, 8c50074c, 80100000, 883cee68, 0)
  8c92e544 vdev_disk_io_start+0x187(8c500580)
  8c92e554 vdev_io_start+0x18(8c500580)
  8c92e580 zio_vdev_io_start+0x142(8c500580)
  8c92e59c zio_next_stage+0xaa(8c500580)
  8c92e5b0 zio_ready+0x136(8c500580)
  8c92e5cc zio_next_stage+0xaa(8c500580)
  8c92e5ec zio_wait_for_children+0x46(8c500580, 1, 8c50076c)
  8c92e600 zio_wait_children_ready+0x18(8c500580)
  8c92e614 zio_next_stage_async+0xac(8c500580)
  8c92e624 zio_nowait+0xe(8c500580)
  8c92e660 zio_ioctl+0x94(9c6f8300, 89557c80, 89556400, 422, 0, 0)
  8c92e694 zil_flush_vdev+0x54(89557c80, 0, 0, 8c92e6e0, 9c6f8500)
  8c92e6e4 zil_flush_vdevs+0x6b(8bbe46c0)
  8c92e734 zil_commit_writer+0x35f(8bbe46c0, 3497c, 0, 4af5, 0)
  8c92e774 zil_commit+0x96(8bbe46c0, ffffffff, ffffffff, 4af5, 0)
  8c92e7e8 zfs_putpage+0x1e4(8c8ab480, 0, 0, 0, 0, 8c6c75c0)
  8c92e824 vhead_putpage+0x95(8c8ab480, 0, 0, 0, 0, 8c6c75c0)
  8c92e86c fop_putpage+0x27(8c8ab480, 0, 0, 0, 0, 8c6c75c0)
  8c92e91c rfs4_op_commit+0x153(82141dd4, b28c3100, 8c92ed8c, 8c92e948)
  8c92ea48 rfs4_compound+0x1ce(8c92ead0, 8c92ea7c, 0, 8c92ed8c, 0)
  8c92eaac rfs4_dispatch+0x65(8bf9b248, 8c92ed8c, b28c5a40, 8c92ead0)
  8c92ed10 common_dispatch+0x6b0(8c92ed8c, b28c5a40, 2, 4, 8bf9c01c, 8bf9b1f0)
  8c92ed34 rfs_dispatch+0x1f(8c92ed8c, b28c5a40)
  8c92edc4 svc_getreq+0x158(b28c5a40, 842952a0)
  8c92ee0c svc_run+0x146(898878e8)
  8c92ee2c svc_do_run+0x6e(1)
  8c92ef84 nfssys+0x3fb()
  8c92efac sys_sysenter+0x102()

<snipping out a bunch of other threads>
Sorry about that; looks like you've hit this:

  6546683 marvell88sx driver misses wakeup for mv_empty_cv
  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6546683

Fixed in snv_64.

-r

Thomas Garner writes:
> Unfortunately, this NFS hang occurs across 3 separate machines, none of
> which should have this IPMI issue.
>
> Can anyone debug this? Remember that this is Nexenta Alpha 7, so it
> should be b61. nfsd is totally hung (rpc timeouts) and zfs would be
> having problems taking snapshots, if I hadn't disabled the hourly
> snapshots.
Thomas Garner
2007-Jun-25 15:59 UTC
[zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)
Thanks, Roch! Much appreciated knowing what the problem is and that a fix is
in a forthcoming release.

Thomas

On 6/25/07, Roch - PAE <Roch.Bourbonnais at sun.com> wrote:
>
> Sorry about that; looks like you've hit this:
>
>   6546683 marvell88sx driver misses wakeup for mv_empty_cv
>   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6546683
>
> Fixed in snv_64.
> -r
Regarding the bold statement:

  There is no NFS over ZFS issue

What I mean here is that, if you _do_ encounter a performance pathology not
linked to the NVRAM storage/cache-flush issue, then you _should_ complain,
or better, get someone to do an analysis of the situation. One should not
assume that some observed pathological performance of NFS/ZFS is widespread
and due to some known ZFS issue about to be fixed.

To be sure, there are lots of performance opportunities that will provide
incremental improvements, the most significant of which, the "ZFS Separate
Intent Log", was just integrated in Nevada. This opens up the field for
further NFS/ZFS performance investigations.

But the data that got this thread started seems to highlight an NFS vs
Samba opportunity, something we need to look into. Otherwise I don't think
the data produced so far has highlighted any specific NFS/ZFS issue. There
are certainly opportunities for incremental performance improvements but,
to the best of my knowledge, outside the NVRAM/flush issue on certain
storage:

  There are no known prevalent NFS over ZFS performance pathologies on record.

-r

Ref:
http://mail.opensolaris.org/pipermail/zfs-discuss/2007-June/thread.html#29026
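
For anyone wanting to try the separate intent log once on a build that has
it, the pool-side change is a one-liner; the device name below is purely
hypothetical (ideally a fast, low-latency or NVRAM-backed device):

    # add a dedicated log (slog) device to the pool; synchronous NFS commits
    # then land on this device instead of the raidz data disks
    zpool add tank log c7d0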
> So that leaves us with a Samba vs NFS issue (not related to ZFS). We know
> that NFS can create files _at most_ at one file per server I/O latency.
> Samba appears better, and this is what we need to investigate. It might be
> better in a way that NFS can borrow (maybe through some better NFSv4
> delegation code), or Samba might be better by being careless with data. If
> we find such an NFS improvement it will help all backend filesystems, not
> just ZFS.

Just curious: was this nfs-samba ghost ever caught and sent back to the
spirit realm? :)